Institute of Smart City and Intelligent Transportation, Southwest Jiaotong University, Chengdu 611756, Sichuan, China.
School of Transportation and Logistics, Southwest Jiaotong University, Chengdu 611756, Sichuan, China.
Accid Anal Prev. 2024 Dec;208:107778. doi: 10.1016/j.aap.2024.107778. Epub 2024 Sep 16.
To capture and explain complex, nonlinear relationships in bicycle crash frequency data while also accounting for unobserved heterogeneity, this study proposes a new hybrid framework that combines the Random Forest-based SHapley Additive exPlanations (RF-SHAP) method with a random parameter negative binomial (RPNB) regression model. First, four machine learning algorithms, namely random forest (RF), support vector machine (SVM), gradient boosting machine (GBM), and Extreme Gradient Boosting (XGBoost), were compared for variable importance calculation. The RF algorithm performed best and was integrated into an interpretable machine learning method (RF-SHAP) that provides an interpretable measure of each variable's impact, which is critical for understanding the model's prediction results. Finally, the RF-SHAP method was combined with the RPNB model to explore individual-specific variations that influence crash frequency predictions. The proposed framework was validated using 288 traffic analysis zones (TAZs) in Greater London and a range of regional risk factors for bicycle crash frequency. The results indicate that the framework improves both prediction accuracy and factor interpretation when analyzing bicycle crash frequency. The model exhibits consistent Akaike Information Criterion (AIC) and Bayesian Information Criterion (BIC) values, indicating reliable explanatory power, and its Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE) improve significantly. This suggests that the proposed model effectively combines the explanatory power of statistical models with the predictive power of data-driven models. The interpretability of SHAP values, coupled with the causal insights from RPNB, provides policymakers with actionable information for developing targeted interventions.
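The idea behind the SHAP attributions mentioned in the abstract can be illustrated with a minimal sketch that computes exact Shapley values by enumerating feature subsets. This is not the authors' implementation: the function `shapley_values`, the model `toy_model`, the zero baseline, and the choice to represent "missing" features by baseline values are all assumptions made for illustration, and tree-based SHAP tooling treats missing features differently and far more efficiently.

```python
from itertools import combinations
from math import factorial


def toy_model(z):
    # Hypothetical stand-in for a fitted model: one additive term, one interaction.
    return 2 * z[0] + z[1] * z[2]


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x relative to a baseline point.

    Assumption: a feature outside the coalition is set to its baseline value.
    The cost is exponential in the number of features, so this is only
    practical for a handful of features.
    """
    n = len(x)

    def value(subset):
        # Evaluate f with coalition features taken from x, the rest from baseline.
        z = [x[j] if j in subset else baseline[j] for j in range(n)]
        return f(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size: |S|! (n-|S|-1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


phi = shapley_values(toy_model, [1, 2, 3], [0, 0, 0])
```

In this toy case the attributions sum to the difference between the model output at the instance and at the baseline, and the interaction term is split equally between the two features involved. That additive decomposition is the property that makes per-variable impacts comparable.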