Song Zhiwei, Weng Jilin, Han Yupeng, Li Wangyu, Xu Yiya, He Yingchao, Wang Yinzhou
Shengli Clinical Medical College of Fujian Medical University, Department of Neurology, Fuzhou University Affiliated Provincial Hospital, Fujian Key Laboratory of Medical Analysis, Fujian Academy of Medical Sciences, Fuzhou, Fujian, China.
Department of Neurology, Wuyishan City Hospital, Wuyi Hospital Affiliated to Fujian Provincial Hospital, Fuzhou, Fujian, China.
BMC Public Health. 2025 Aug 21;25(1):2868. doi: 10.1186/s12889-025-24220-y.
To develop and validate a machine learning (ML) model that integrates social determinants of health (SDoH) to assess post-stroke depression (PSD), and to examine the association between SDoH and disease outcomes.
Data were obtained from the National Health and Nutrition Examination Survey (NHANES). Logistic regression was used to analyse the association between SDoH and PSD, and Cox regression was used to assess the association between SDoH and all-cause mortality in participants with PSD. The Boruta algorithm was applied for feature selection, and four ML models (CatBoost, logistic regression, multilayer perceptron, and random forest) were constructed; their predictive performance, calibration, and clinical applicability were then evaluated. SHapley Additive exPlanations (SHAP) values were computed to quantify the contribution of each feature in the best-performing model.
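The abstract does not state how discrimination and calibration were computed. As a hedged illustration of the standard definitions only, the sketch below computes a rank-based AUC (the statistic reported for CatBoost) and a binned calibration table in plain Python. The function names `roc_auc` and `calibration_table`, the bin count, and the toy data are hypothetical; this is not the authors' code and the numbers are not NHANES results.

```python
def roc_auc(labels, scores):
    """Rank-based AUC: the probability that a randomly chosen positive case
    scores higher than a randomly chosen negative case (ties count as 0.5).
    Equals the Mann-Whitney U statistic divided by n_pos * n_neg."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def calibration_table(labels, scores, n_bins=10):
    """Group predicted risks into equal-width bins and return
    (mean predicted risk, observed event rate, count) for each non-empty bin.
    In a well-calibrated model the two rates are close in every bin."""
    bins = [[] for _ in range(n_bins)]
    for y, s in zip(labels, scores):
        idx = min(int(s * n_bins), n_bins - 1)  # a score of exactly 1.0 goes to the last bin
        bins[idx].append((y, s))
    table = []
    for b in bins:
        if b:
            mean_pred = sum(s for _, s in b) / len(b)
            observed = sum(y for y, _ in b) / len(b)
            table.append((mean_pred, observed, len(b)))
    return table


# Toy data for illustration only.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(labels, scores))  # 0.75
print(calibration_table(labels, scores, n_bins=2))
```

Plotting the mean predicted risk against the observed rate per bin gives the usual calibration curve; points near the diagonal indicate good calibration.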
Logistic regression analysis revealed a significant positive association between SDoH and PSD prevalence (p for trend < 0.0001). Among the four models, CatBoost showed the best overall predictive performance (AUC = 0.966). In addition, decision curve analysis (DCA) and calibration curves indicated that the CatBoost model had considerable clinical utility and consistent predictive performance. Ten-fold cross-validation further confirmed the model's robustness and generalizability.
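Decision curve analysis compares a model's net benefit across threshold probabilities p_t with two reference strategies: treating every patient and treating none. A minimal sketch of the standard net-benefit formula, NB = TP/n - (FP/n) * p_t / (1 - p_t), is shown below, assuming a patient counts as "positive" when predicted risk >= p_t. The function names and toy data are hypothetical and do not reproduce the study's DCA.

```python
def net_benefit(labels, risks, threshold):
    """Net benefit of a model at threshold probability p_t:
    NB = TP/n - FP/n * p_t / (1 - p_t), where a case is called positive
    when its predicted risk is >= p_t."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(labels)
    tp = sum(1 for y, r in zip(labels, risks) if r >= threshold and y == 1)
    fp = sum(1 for y, r in zip(labels, risks) if r >= threshold and y == 0)
    odds = threshold / (1.0 - threshold)  # harm of a false positive relative to a true positive
    return tp / n - (fp / n) * odds


def net_benefit_treat_all(labels, threshold):
    """Reference strategy that calls every patient positive."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)

# The "treat none" reference strategy has net benefit 0 at every threshold.

# Toy data for illustration only.
labels = [0, 0, 1, 1]
risks = [0.1, 0.4, 0.35, 0.8]
for pt in (0.2, 0.3, 0.5):
    print(pt, round(net_benefit(labels, risks, pt), 4),
          round(net_benefit_treat_all(labels, pt), 4))
```

A model is considered clinically useful over the range of thresholds where its curve lies above both the treat-all and treat-none lines.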
SDoH showed a linear relationship with PSD, and CatBoost performed best in predicting PSD. SHAP values highlighted the importance of SDoH as predictors.
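SHAP values are Shapley values of a model's prediction: each feature receives its weighted average marginal contribution over all coalitions of the other features, and the contributions sum to f(x) - f(baseline). Libraries for tree models such as CatBoost use efficient tree-specific algorithms (e.g. TreeSHAP) with a background dataset; the brute-force sketch below instead assumes a single all-zero baseline and a hypothetical `toy_model`, so it only illustrates the definition and is not the study's analysis.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one instance x.

    Features outside a coalition S are replaced by their baseline value, so
    v(S) = predict(point with x on S and baseline elsewhere).
    Enumerates every coalition, so it is only practical for a few features.
    """
    n = len(x)

    def value(coalition):
        point = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return predict(point)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical risk score with an interaction term; not the CatBoost model.
def toy_model(z):
    return 0.5 + 2.0 * z[0] - 1.0 * z[1] + 0.5 * z[0] * z[2]


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
print(phi)
# Efficiency property: the contributions sum to f(x) - f(baseline).
print(sum(phi), toy_model(x) - toy_model(baseline))
```

Here the interaction term 0.5 * z[0] * z[2] is split equally between features 0 and 2, which is how Shapley values attribute jointly produced effects.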