Working paper

Can satellite data on air pollution predict industrial production?

Published on 18 November 2021
Authors: Jean-Charles Bricongne, Baptiste Meunier, Thomas Pical

Working Paper Series no. 847. The Covid-19 crisis has highlighted innovative high-frequency datasets that make it possible to measure economic impact in real time. In this vein, we explore how satellite data measuring the concentration of nitrogen dioxide (NO2, a pollutant emitted mainly by industrial activity) in the troposphere can help predict industrial production. We first show how such data must be adjusted for meteorological patterns, which can alter both data quality and pollutant emissions. We use machine learning techniques to better account for non-linearities and interactions between variables. We then find evidence that nowcasting performance for monthly industrial production improves significantly when relying on daily NO2 data, compared to benchmark models based on PMIs and auto-regressive (AR) terms. We also find evidence of heterogeneities, suggesting that the contribution of daily pollution data is particularly important during “crisis” episodes and that the elasticity of NO2 pollution to industrial production for a country depends on the share of manufacturing in value added. Because satellite-based air pollution data are available daily, free to use, granular and cover all countries, including those with limited statistics, this paper illustrates their potential for enhancing the real-time monitoring of economic activity.

Figure 1: Model's performance (out-of-sample RMSE) relative to the AR(1) benchmark

The sudden shock of the Covid-19 crisis – with some countries shutting down almost entirely in a matter of days – has put new emphasis on high-frequency data. Weekly, daily, or even hourly data have been extensively used to assess the impact of the pandemic in real time. In particular, satellite measurements of air pollution have been put forward on several occasions to show the alleged dramatic effect of factory shutdowns on air pollution.

Against this background, we assess whether satellite data on tropospheric pollution can help predict industrial production. We focus on nitrogen dioxide (NO2), a pollutant mainly emitted by industrial activity. Compared to official or alternative indicators, such data offer the advantages of timeliness, global coverage (including developing countries with limited official statistics), granularity, and free use. While an avenue for future research could be to compare the predictive performance of NO2 satellite data with that of other high-frequency indicators, the global and uniform coverage of satellite data appears to be a key advantage.

Raw satellite data are, however, far from ready to use. Our first step is to fetch the data and make them easier to process: we select and aggregate the relevant data at ZIP code level, which reduces a daily download of 4 GB spread over multiple files to a single 20 MB CSV file. As the quality of satellite data can be degraded over cloudy or snowy areas, we also clean the data. This, however, leaves a large number of missing points. Since missing data at local level might result in undesired composition effects when aggregating at national level, we interpolate them. We rely on a machine learning technique (the k-nearest neighbours algorithm) that allows the interpolation to account for both spatial and temporal correlations. Finally, NO2 pollution depends heavily on meteorological factors (temperature, wind, humidity). Given that their effect is non-linear and features interactions between variables, we use a random forest algorithm to adjust for it. The data are then aggregated at national level to match the granularity of official statistics, most notably the industrial production that we intend to nowcast.
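To make the interpolation step more concrete, here is a minimal sketch of a k-nearest-neighbours fill that treats time as a third coordinate alongside latitude and longitude. It is an illustration only, not the authors' implementation: the function name `knn_interpolate`, the inverse-distance weighting, and the `time_scale` parameter (how many degrees one day "counts" for) are all assumptions introduced here.

```python
import math

def knn_interpolate(obs, target, k=3, time_scale=1.0):
    """Estimate a missing value at target = (lat, lon, day).

    obs: list of (lat, lon, day, value) tuples holding valid readings.
    time_scale: hypothetical factor that converts a one-day gap into the
    same units as one degree of latitude/longitude.
    """
    if not obs:
        raise ValueError("need at least one valid observation")
    t_lat, t_lon, t_day = target
    scored = []
    for lat, lon, day, value in obs:
        # Euclidean distance in a combined space-time coordinate system.
        dist = math.sqrt((lat - t_lat) ** 2
                         + (lon - t_lon) ** 2
                         + (time_scale * (day - t_day)) ** 2)
        scored.append((dist, value))
    scored.sort(key=lambda pair: pair[0])
    nearest = scored[:k]
    # An exact match is returned directly to avoid dividing by zero.
    for dist, value in nearest:
        if dist == 0.0:
            return value
    # Inverse-distance weighting: closer neighbours count more.
    weights = [1.0 / dist for dist, _ in nearest]
    weighted_sum = sum(w * value for w, (_, value) in zip(weights, nearest))
    return weighted_sum / sum(weights)
```

A missing reading halfway between two valid ones (in space or in time) is then filled with their average; with more neighbours, nearby days and nearby locations both contribute. In practice the relative weight of spatial versus temporal distance would need to be tuned.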

Our second step checks the relevance of NO2 pollution from a forecasting standpoint. We rely on a panel regression over 17 emerging and 16 advanced countries to make up for the limited time span available (since December 2018 only). We find evidence that a model based on daily NO2 pollution data outperforms benchmark models based on survey data (Purchasing Managers’ Index) or auto-regressive (AR) terms. Mimicking a real-time set-up from March 2020 to December 2020, we find that this outperformance holds on every day of the month, with evidence that gains in predictive accuracy grow as the month advances and more daily data become available (see figure 1). There are additional accuracy gains when relying on a MIxed DAta Sampling (MIDAS) approach to predict monthly industrial production from daily NO2 pollution, using a panel-MIDAS recently introduced in the literature.
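The core idea of a MIDAS regression is to collapse many high-frequency observations into a single regressor using a small number of weighting parameters. The sketch below uses exponential Almon weights, a common choice in the MIDAS literature; the summary does not say which weighting scheme the panel-MIDAS uses, so the scheme, the function names and the parameters `theta1` and `theta2` are assumptions made for illustration.

```python
import math

def exp_almon_weights(n_lags, theta1, theta2):
    """Normalised exponential Almon weights for lags 1..n_lags."""
    raw = [math.exp(theta1 * j + theta2 * j * j) for j in range(1, n_lags + 1)]
    total = sum(raw)
    return [r / total for r in raw]

def midas_aggregate(daily, theta1, theta2):
    """Collapse daily values (oldest first) into one weighted regressor.

    Lag 1 is the most recent day, so the series is reversed before the
    weights are applied.
    """
    weights = exp_almon_weights(len(daily), theta1, theta2)
    most_recent_first = list(reversed(daily))
    return sum(w * x for w, x in zip(weights, most_recent_first))
```

With `theta1 = theta2 = 0` the weights are flat and the aggregate is the simple monthly average; a negative `theta1` puts more weight on the most recent days. In a full MIDAS model these parameters would be estimated jointly with the regression coefficients rather than fixed by hand.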

We finally find evidence of heterogeneities. First, accuracy gains are greater for countries that have been more affected by the Covid-19 crisis, suggesting that the contribution of high-frequency data is more important during “crisis” episodes than during “normal” times. Second, the elasticity of pollution to industrial production appears to be greater for countries with a larger share of manufacturing in value added. Lastly, we turn to business cycle detection. Building on a Markov-switching framework, we find that daily NO2 pollution data allow for a swifter detection of turning points than monthly official data, with a lead of around 2.5 months.
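As an illustration of how a Markov-switching framework can flag turning points, the sketch below runs a two-regime Hamilton filter over a series and returns the filtered probability of being in the "downturn" regime at each date. The summary does not give the model specification, so everything here is an assumption: the Gaussian regimes, the function names, and above all the fact that the regime means, standard deviations and transition probabilities are passed in as known values, whereas in practice they would be estimated (typically by maximum likelihood).

```python
import math

def normal_pdf(x, mean, std):
    """Density of a normal distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))

def hamilton_filter(series, means, stds, p_stay, prior=(0.5, 0.5)):
    """Filtered probabilities of regime 1 for a two-regime Gaussian model.

    means, stds: per-regime parameters (regime 0 = expansion, 1 = downturn).
    p_stay: (p00, p11), the probabilities of remaining in each regime.
    Returns a list of P(regime = 1 | data up to and including t).
    """
    trans = [[p_stay[0], 1.0 - p_stay[0]],
             [1.0 - p_stay[1], p_stay[1]]]
    prob = list(prior)
    filtered = []
    for x in series:
        # Prediction step: push yesterday's probabilities through the chain.
        pred = [prob[0] * trans[0][j] + prob[1] * trans[1][j] for j in (0, 1)]
        # Update step: reweight by how well each regime explains today's value.
        joint = [pred[j] * normal_pdf(x, means[j], stds[j]) for j in (0, 1)]
        total = sum(joint)
        prob = [value / total for value in joint]
        filtered.append(prob[1])
    return filtered
```

A turning point can then be dated as the first observation at which the downturn probability crosses a chosen threshold such as 0.5. Because the filter updates with every new observation, feeding it daily rather than monthly data is what allows earlier detection.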