Wei Yiqu, Xu Wanqing, Yang Shuo, Zhang Congfeng, Wang Jia, Wan Xianyao
Department of Critical Care Medicine, The First Affiliated Hospital of Dalian Medical University, Dalian, China.
Department of Critical Care Medicine, Dandong Central Hospital, Dandong, China.
Front Cell Infect Microbiol. 2025 Aug 8;15:1623109. doi: 10.3389/fcimb.2025.1623109. eCollection 2025.
Urosepsis is a subtype of sepsis with a high mortality rate, and its rank among the causes of sepsis is rising. We aimed to use machine learning (ML) methods to construct and validate an interpretable prognostic prediction model for patients with urosepsis.
Data were collected from the Medical Information Mart for Intensive Care IV (MIMIC-IV) database, version 3.1, and divided into a training cohort and a validation cohort in a 7:3 ratio. Random Forest (RF), Lasso, Boruta, and eXtreme Gradient Boosting (XGBoost) were used to identify the most influential variables in the model development dataset, and the final variables were selected based on the regularization parameter λ. Seven machine learning methods were used for model development, with 10-fold cross-validation. Accuracy and decision curve analysis (DCA) were used to evaluate model performance and select the optimal model. Internal validation metrics included the area under the receiver operating characteristic curve (AUC), sensitivity, specificity, Matthews correlation coefficient, and F1-score. Finally, SHapley Additive exPlanations (SHAP) was used to explain the ML models.
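The partitioning and decision-curve steps described above can be sketched in plain Python. This is a minimal illustration, not the authors' code: the function names (`stratified_split`, `k_fold`, `net_benefit`) are hypothetical, the split is assumed to be stratified by outcome, "ten cross validations" is assumed to mean 10-fold cross-validation on the training cohort, and the labels are toy values rather than study data. The net benefit formula is the standard one used in DCA: TP/n minus FP/n weighted by the odds of the threshold probability.

```python
import random

def stratified_split(labels, train_frac=0.7, seed=0):
    """Return (train_idx, valid_idx) with a similar outcome ratio in both parts."""
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    train, valid = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        cut = round(len(idx) * train_frac)  # 70% of each class goes to training
        train.extend(idx[:cut])
        valid.extend(idx[cut:])
    return sorted(train), sorted(valid)

def k_fold(indices, k=10, seed=0):
    """Yield (fit_idx, held_out_idx) pairs; each index is held out exactly once."""
    rng = random.Random(seed)
    shuffled = list(indices)
    rng.shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i, held_out in enumerate(folds):
        fit = [j for n, fold in enumerate(folds) if n != i for j in fold]
        yield fit, held_out

def net_benefit(y_true, y_prob, threshold):
    """Decision-curve net benefit at a single threshold probability."""
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1 - threshold)

# Toy labels only: 1389 matches the cohort size, but the outcome rate is made up.
labels = [1 if i % 4 == 0 else 0 for i in range(1389)]
train_idx, valid_idx = stratified_split(labels)
folds = list(k_fold(train_idx, k=10))
```

In a real analysis, each candidate model would be fitted on `fit_idx` and scored on `held_out_idx` for every fold, and `net_benefit` would be evaluated over a range of thresholds to draw the decision curve.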
A total of 1389 patients with urosepsis were included. Optimal predictors were selected through statistical regularization, yielding a parsimonious set of 9 variables for model development. The XGBoost model performed best, with an accuracy of 0.818 and an AUC of 0.904 (95% CI: 0.886-0.923). In internal validation, accuracy was 0.797, AUC was 0.869 (95% CI: 0.834-0.904), sensitivity was 0.797, specificity was 0.752, Matthews correlation coefficient was 0.597, and F1-score was 0.791, indicating that the model performs well in internal validation. SHAP summary plots were used to explain the XGBoost model globally.
ML demonstrates strong prognostic capability in urosepsis, with the SHAP method providing clinically intuitive explanations of model predictions. This can help clinicians identify critical prognostic factors and personalize treatment. Although the model achieved high predictive accuracy, it was derived retrospectively from a single-center database, so external validation in diverse populations, ideally through prospective multicenter studies, is needed to establish its clinical generalizability.