Paris School of Economics and École des Hautes Etudes en Sciences Sociales, 48 Boulevard Jourdan, 75014, Paris, France.
Barcelona Institute for Global Health (ISGlobal), Carrer del Rosselló, 132, 08036, Barcelona, Spain.
Sci Rep. 2022 May 13;12(1):7917. doi: 10.1038/s41598-022-11939-6.
A growing literature in economics and epidemiology has exploited changes in wind patterns as a source of exogenous variation to better measure the acute health effects of air pollution. Because wind components are not randomly distributed over time and are related to other weather parameters, multivariate regression models are used to adjust for these confounding factors. However, this type of analysis relies on the model's ability to correctly adjust for all confounding factors and to extrapolate to units without empirical counterfactuals. As an alternative to current practices, and to gauge the extent of these issues, we propose a causal inference pipeline that embeds this type of observational study within a hypothetical randomized experiment. We illustrate this approach using daily data from Paris, France, over the 2008-2018 period. Using the Neyman-Rubin potential outcomes framework, we first define the treatment of interest as the effect of North-East winds on particulate matter concentrations compared to the effects of other wind directions. We then implement a matching algorithm to approximate a pairwise randomized experiment. It adjusts nonparametrically for observed confounders while avoiding model extrapolation by discarding treated days without similar control days. We find that the effective sample size for which treated and control units are comparable is surprisingly small. It is, however, reassuring that results on the matched sample are consistent with a standard regression analysis of the initial data. Finally, we carry out a quantitative bias analysis to check whether our results could be altered by an unmeasured confounder: estimated effects seem robust to a relatively large hidden bias. Our causal inference pipeline is a principled approach to improving the design of air pollution studies based on wind patterns.
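To make the matching step more concrete, the following is a minimal sketch of how treated days (North-East wind) might be paired with similar control days, with unmatched treated days discarded. The abstract does not specify the matching algorithm, so the greedy nearest-neighbour search with per-covariate tolerances (calipers) used here is an assumption for illustration only. The names `Day`, `match_pairs`, `average_pair_difference`, the covariates `temperature` and `humidity`, and all numeric values are hypothetical and are not taken from the study.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class Day:
    day_id: str
    treated: bool                 # True if the wind blows from the North-East (hypothetical coding)
    covariates: Dict[str, float]  # observed confounders, e.g. weather parameters (hypothetical names)
    pm10: float                   # outcome: particulate matter concentration (made-up values)


def is_close(a: Day, b: Day, calipers: Dict[str, float]) -> bool:
    """True if every covariate differs by no more than its tolerance."""
    return all(abs(a.covariates[k] - b.covariates[k]) <= tol
               for k, tol in calipers.items())


def distance(a: Day, b: Day, calipers: Dict[str, float]) -> float:
    """Sum of absolute covariate differences, each scaled by its (positive) tolerance."""
    return sum(abs(a.covariates[k] - b.covariates[k]) / tol
               for k, tol in calipers.items())


def match_pairs(days: List[Day], calipers: Dict[str, float]) -> List[Tuple[Day, Day]]:
    """Greedy 1:1 matching without replacement.

    Treated days with no control day inside the calipers are discarded,
    so no outcome is extrapolated to days lacking a comparable counterpart.
    """
    treated = [d for d in days if d.treated]
    controls = [d for d in days if not d.treated]
    used = set()
    pairs = []
    for t in treated:
        candidates = [c for c in controls
                      if c.day_id not in used and is_close(t, c, calipers)]
        if not candidates:
            continue  # no similar control day: drop this treated day
        best = min(candidates, key=lambda c: distance(t, c, calipers))
        used.add(best.day_id)
        pairs.append((t, best))
    return pairs


def average_pair_difference(pairs: List[Tuple[Day, Day]]) -> float:
    """Mean treated-minus-control outcome difference across matched pairs."""
    if not pairs:
        raise ValueError("no matched pairs")
    return sum(t.pm10 - c.pm10 for t, c in pairs) / len(pairs)


# Toy data, invented purely to exercise the functions above.
days = [
    Day("d1", True, {"temperature": 10.0, "humidity": 70.0}, 30.0),
    Day("d2", True, {"temperature": 25.0, "humidity": 40.0}, 45.0),
    Day("d3", False, {"temperature": 11.0, "humidity": 72.0}, 22.0),
    Day("d4", False, {"temperature": 9.0, "humidity": 60.0}, 20.0),
    Day("d5", False, {"temperature": 3.0, "humidity": 90.0}, 15.0),
]
calipers = {"temperature": 2.0, "humidity": 5.0}
pairs = match_pairs(days, calipers)
```

In this toy data, one treated day finds a control day within the tolerances and the other is dropped, which mirrors how the comparable sample can end up much smaller than the initial one. A greedy search depends on the order in which treated days are processed; an optimal matching procedure would avoid that dependence at the cost of more computation.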