Hu Xiefei, Zhi Shenshen, Li Yang, Cheng Yuming, Fan Haiping, Li Haorong, Meng Zihao, Xie Jiaxin, Tang Shu, Li Wei
Department of Clinical Laboratory, Chongqing Emergency Medical Center, School of Medicine, Chongqing University Central Hospital, Chongqing University, Chongqing, China.
Peking University Chongqing Big Data Research Institute, Chongqing, China.
BMC Med Inform Decis Mak. 2025 May 14;25(1):186. doi: 10.1186/s12911-025-03020-9.
Bloodstream infection (BSI) is a severe systemic infectious disease that can lead to sepsis and multiple organ dysfunction syndrome (MODS), causing high mortality and a major global public health burden. Early identification of BSI is crucial for effective intervention, lower mortality, and better patient outcomes. However, existing diagnostic methods are limited by low specificity, long detection times, and high demands on testing platforms. Advances in artificial intelligence offer a new approach to early disease identification. This study aims to identify the optimal combination of routine laboratory data and clinical monitoring indicators, and to use machine learning algorithms to construct an early, rapid, and broadly applicable BSI risk prediction model that assists the early diagnosis of BSI in clinical practice.
Clinical data from 2582 patients with suspected BSI admitted to Chongqing University Central Hospital between January 1, 2021 and December 31, 2023 were collected for this study. The data were divided chronologically into a modeling dataset and an external validation dataset, and the modeling dataset was further divided into a training set and an internal validation set. The incidence of BSI, the distribution of pathogens, and the time to the primary microbiological report were analyzed within the training set. Feature selection combined univariate regression with machine learning (ML) algorithms. First, univariate logistic regression was used to screen for predictive factors of BSI. Then the Boruta algorithm, Lasso regression, and Recursive Feature Elimination with Cross-Validation (RFE-CV) were employed to determine the optimal combination of predictors. Based on this combination, six ML algorithms were used to construct early BSI risk prediction models. The best model was selected on the basis of performance, and the Shapley Additive Explanations (SHAP) method was used to explain it. The external validation set was used to evaluate the predictive performance and generalizability of the selected model, and the findings were ultimately applied in clinical practice.
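The chronological partition described above can be illustrated with a minimal sketch. The abstract does not report the cutoff date, the split proportions, or how the modeling dataset was subdivided, so the `cutoff` value, the `internal_fraction` of 0.3, the random shuffle, and the `Admission` record type are all assumptions made for illustration, and the records are synthetic.

```python
from dataclasses import dataclass
from datetime import date
import random


@dataclass
class Admission:
    patient_id: int
    admitted: date
    bsi: bool  # hypothetical outcome label, not a field named in the study


def chronological_split(records, cutoff, internal_fraction=0.3, seed=0):
    """Split records into training, internal validation, and external validation sets.

    Records admitted on or after `cutoff` form the external validation set.
    Earlier records form the modeling set, which is shuffled (an assumption)
    and divided into training and internal validation sets.
    """
    modeling = [r for r in records if r.admitted < cutoff]
    external = [r for r in records if r.admitted >= cutoff]
    rng = random.Random(seed)
    rng.shuffle(modeling)
    n_internal = int(len(modeling) * internal_fraction)
    internal = modeling[:n_internal]
    train = modeling[n_internal:]
    return train, internal, external


# Synthetic records for illustration only; these are not study data.
records = [
    Admission(i, date(2021 + i % 3, 1 + i % 12, 1), bsi=(i % 8 == 0))
    for i in range(100)
]
train, internal, external = chronological_split(records, cutoff=date(2023, 1, 1))
print(len(train), len(internal), len(external))
```

Holding out the most recent admissions means the selected model is evaluated on patients seen after every case used for training, which is a stricter check of generalizability than a purely random split.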
The incidence of BSI among inpatients at the Chongqing University Central Hospital was 12.91%. Following further feature selection, a set of 5 variables was determined, including white blood cell count, standard bicarbonate, base excess of extracellular fluid, interleukin-6, and body temperature. BSI early risk prediction models were constructed using six machine learning algorithms, with the XGBoost model demonstrating the best performance, achieving an AUC value of 0.782 in the internal validation set and an AUC value of 0.776 in the external validation set. This model is made publicly available as an online webpage tool for clinical use.
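For readers unfamiliar with the AUC values reported above, the metric can be read as the probability that a randomly chosen BSI patient receives a higher predicted risk than a randomly chosen non-BSI patient. The sketch below computes it by direct pairwise comparison. The function name `roc_auc` and the labels and scores are hypothetical toy values, not study data, and this is not the authors' evaluation code.

```python
def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win, matching the Mann-Whitney U formulation.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example: 1 = BSI, 0 = non-BSI; scores are made-up predicted risks.
labels = [1, 0, 1, 0, 0, 1, 0, 0]
scores = [0.9, 0.2, 0.6, 0.7, 0.1, 0.4, 0.4, 0.3]
auc = roc_auc(labels, scores)
print(round(auc, 3))
```

Under this reading, an AUC of 0.776 means the model ranks a BSI patient above a non-BSI patient in roughly 78 of 100 such pairs. The pairwise loop is quadratic in sample size, which is acceptable for illustration; a rank-based computation is preferable for large datasets.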
This study identified a set of 5 features by analyzing routine laboratory data and clinical monitoring indicators from hospitalized patients. Based on this set, a machine learning-based early risk prediction model for BSI was constructed. The model is capable of early and rapid differentiation between BSI and non-BSI patients. Its reliance on a minimal number of predictors enhances its applicability in clinical settings, particularly at the primary care level. Offering the model as an online application further improves its real-world applicability and convenience for clinical use, and could improve the efficiency of BSI diagnosis and reduce patient mortality.