Chen Jiaxin, Xu Chang, Shi Su, Li Xinyue, Jiang Yichen, He Xinling, Sun Weiran, Liu Sijin, Kan Haidong, Meng Xia
School of Public Health, Key Laboratory of Public Health Safety of the Ministry of Education and Key Laboratory of Health Technology Assessment of the Ministry of Health, Fudan University, Shanghai 200032, China.
State Key Laboratory of Environmental Chemistry and Ecotoxicology, Research Center for Eco-Environmental Sciences, Chinese Academy of Sciences, Beijing 100085, China.
Eco Environ Health. 2025 Jul 9;4(3):100170. doi: 10.1016/j.eehl.2025.100170. eCollection 2025 Sep.
Few studies have used machine learning to predict indoor ozone (O₃) levels. This study aimed to predict hourly indoor O₃ concentrations using easily accessible predictors and a machine learning algorithm. We measured indoor O₃ concentrations with low-cost sensors in 18 cities in China and combined them with ambient O₃ concentrations, meteorological factors, and a binary window status indicator (a proxy for ventilation behavior) to build random forest models. Including window status as a predictor improved model performance: the cross-validation R² increased from 0.80 to 0.83 and the root mean square error (RMSE) decreased from 7.89 to 7.21 ppb, highlighting the importance of ventilation behavior for model accuracy. The model also captured hourly variations in indoor O₃ effectively, revealing that indoor O₃ concentrations were consistently lower and more stable than outdoor levels. These differences suggest that relying solely on ambient data may misrepresent true personal exposure, underscoring the need to incorporate indoor exposure in assessments. This is the first study to apply easily accessible variables and machine learning methods to indoor O₃ prediction at a large geographic scale, showing promising potential for improving the accuracy of exposure assessments in epidemiological studies.
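The comparison described in the abstract (a random forest fitted with and without the window status indicator, scored by cross-validated R² and RMSE) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the data are synthetic, the indoor/outdoor ratios (0.6 with windows open, 0.2 with windows closed), the noise level, the 5-fold split, and the forest size are all assumptions chosen for the example, and the function names `make_synthetic_data` and `cv_metrics` are hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import KFold, cross_val_predict


def make_synthetic_data(n=1000, seed=0):
    """Generate hypothetical hourly records (NOT the study's data)."""
    rng = np.random.default_rng(seed)
    outdoor_o3 = rng.uniform(10, 80, n)      # ambient O3, ppb
    temperature = rng.normal(22, 6, n)       # degrees Celsius
    humidity = rng.uniform(30, 90, n)        # relative humidity, %
    window_open = rng.integers(0, 2, n)      # binary window status
    # Assumed indoor/outdoor ratio: higher when the window is open.
    io_ratio = np.where(window_open == 1, 0.6, 0.2)
    indoor_o3 = io_ratio * outdoor_o3 + rng.normal(0, 2, n)
    X = np.column_stack([outdoor_o3, temperature, humidity, window_open])
    return X, indoor_o3


def cv_metrics(X, y, n_splits=5, seed=0):
    """Return (R^2, RMSE) computed from out-of-fold predictions."""
    model = RandomForestRegressor(n_estimators=50, random_state=seed)
    folds = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    pred = cross_val_predict(model, X, y, cv=folds)
    rmse = float(np.sqrt(mean_squared_error(y, pred)))
    return float(r2_score(y, pred)), rmse


X, y = make_synthetic_data()
# Baseline: ambient O3 and meteorology only (first three columns).
r2_base, rmse_base = cv_metrics(X[:, :3], y)
# Full model: the same predictors plus the window status indicator.
r2_win, rmse_win = cv_metrics(X, y)

print(f"without window status: R2={r2_base:.2f}, RMSE={rmse_base:.2f} ppb")
print(f"with window status:    R2={r2_win:.2f}, RMSE={rmse_win:.2f} ppb")
```

Because the synthetic indoor signal depends strongly on window status, the gain from adding that predictor is much larger here than the 0.80 to 0.83 improvement reported in the study; the sketch only illustrates the shape of the comparison.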