Application of machine learning approaches to predict the 5-year survival status of patients with esophageal cancer.

Authors

Gong Xian, Zheng Bin, Xu Guobing, Chen Hao, Chen Chun

Affiliations

Department of Thoracic Surgery, Fujian Medical University Union Hospital, Fuzhou, China.

Key Laboratory of Cardio-Thoracic Surgery (Fujian Medical University), Fujian Province University, Fuzhou, China.

Publication information

J Thorac Dis. 2021 Nov;13(11):6240-6251. doi: 10.21037/jtd-21-1107.

Abstract

BACKGROUND

Accurate prognostic estimation for patients with esophageal cancer (EC) plays an important role in clinical decision-making. The objective of this study was to develop an effective model for predicting the 5-year survival status of EC patients using machine learning (ML) algorithms.

METHODS

We retrieved information on patients diagnosed with EC between 2010 and 2015 from the Surveillance, Epidemiology, and End Results (SEER) Program, covering 24 features. Eight ML models were applied to the selected dataset to classify EC patients by 5-year survival status: three recently developed gradient boosting machine (GBM) implementations (XGBoost, CatBoost, and LightGBM); two commonly used tree-based models (gradient boosting decision trees [GBDT] and random forest [RF]); and three other ML models (artificial neural networks [ANN], naive Bayes [NB], and support vector machines [SVM]). Model performance was measured with 5-fold cross-validation.
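
The 5-fold cross-validation step can be sketched as follows. This is a minimal standard-library illustration, not the authors' code: the abstract does not say whether folds were stratified, so stratification by survival status is an assumption here, and `fit_and_score` is a hypothetical callback standing in for fitting one of the eight models and returning its held-out score (for example, AUC).

```python
import random
from statistics import mean
from typing import Callable, List, Sequence


def stratified_kfold(labels: Sequence[int], k: int = 5, seed: int = 0) -> List[List[int]]:
    """Assign each record index to one of k folds, dealing each class out
    round-robin so every fold keeps roughly the overall class ratio.
    (Stratification is an assumption; the abstract only says 5-fold CV.)"""
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    counter = 0
    for cls in sorted(set(labels)):
        members = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(members)
        for idx in members:
            folds[counter % k].append(idx)
            counter += 1
    return folds


def cross_validate(
    labels: Sequence[int],
    fit_and_score: Callable[[List[int], List[int]], float],
    k: int = 5,
) -> float:
    """Train on k-1 folds, score on the held-out fold, and average the k scores.
    `fit_and_score(train_idx, test_idx)` is a hypothetical callback that fits one
    model on train_idx and returns its score on test_idx."""
    folds = stratified_kfold(labels, k)
    scores = []
    for held_out in range(k):
        train_idx = [i for f in range(k) if f != held_out for i in folds[f]]
        scores.append(fit_and_score(train_idx, folds[held_out]))
    return mean(scores)
```

Because the seed is fixed, every model passed to `cross_validate` sees the same five splits, so differences between models are not confounded by different fold assignments.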

RESULTS

After excluding records with missing data, the final study population comprised 10,588 patients. Feature selection was performed with the χ² test; however, the experiments showed that the complete dataset predicted outcomes better than the dataset from which non-significant features had been removed. Among the 8 models, XGBoost performed best [area under the receiver operating characteristic (ROC) curve (AUC): 0.852 for XGBoost, 0.849 for CatBoost, 0.850 for LightGBM, 0.846 for GBDT, 0.838 for RF, 0.844 for ANN, 0.833 for NB, and 0.789 for SVM]. XGBoost also achieved the best accuracy (0.875) and the lowest logistic loss (0.301). SHapley Additive exPlanations (SHAP) values calculated for the XGBoost model indicated that four features (reason no cancer-directed surgery, Surg Prim Site, age, and stage group) had the greatest impact on predicting the outcome.
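
A χ² feature screen of the kind mentioned above can be sketched as below. The abstract does not describe the exact procedure, so this assumes each categorical feature is cross-tabulated against 5-year survival status (rows = feature categories, columns = outcome) and kept when p < 0.05; the threshold, the helper names, and the toy counts are all illustrative assumptions. Continuous features such as age would have to be binned first. The statistic is Pearson's χ² without Yates' continuity correction (note that `scipy.stats.chi2_contingency` applies that correction to 2×2 tables by default).

```python
import math
from typing import Dict, List, Sequence, Tuple


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X >= x) for a chi-square variable with integer df,
    using the closed forms of the regularized upper incomplete gamma function."""
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Even df: Q(k, y) = e^-y * sum_{i<k} y^i / i!, with k = df / 2
        return math.exp(-y) * sum(y ** i / math.factorial(i) for i in range(df // 2))
    # Odd df: Q(k + 1/2, y) = erfc(sqrt(y)) + e^-y * sum_{i=1..k} y^(i-1/2) / Gamma(i + 1/2)
    tail = sum(y ** (i - 0.5) / math.gamma(i + 0.5) for i in range(1, (df - 1) // 2 + 1))
    return math.erfc(math.sqrt(y)) + math.exp(-y) * tail


def chi2_independence(table: Sequence[Sequence[int]]) -> Tuple[float, int, float]:
    """Pearson chi-square test of independence on an r x c table of counts.
    Returns (statistic, degrees of freedom, p-value). Assumes no empty rows/columns."""
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    n = sum(row_tot)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_tot[i] * col_tot[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_tot) - 1)
    return stat, df, chi2_sf(stat, df)


def select_features(tables: Dict[str, Sequence[Sequence[int]]], alpha: float = 0.05) -> List[str]:
    """Keep the (hypothetical) features whose association with the outcome has p < alpha."""
    return [name for name, t in tables.items() if chi2_independence(t)[2] < alpha]
```

Since the study found that dropping non-significant features made predictions worse, a pipeline built this way would ideally evaluate both the screened and the complete feature sets under the same cross-validation folds before committing to either.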
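
The three metrics reported for each model (AUC, accuracy, and logistic loss) are all computed from predicted probabilities on held-out records. The following is a minimal standard-library sketch on toy data, not the study's evaluation code. Which outcome is coded as 1 and the 0.5 cutoff used for accuracy are assumptions, since the abstract states neither. In practice, `roc_auc_score`, `accuracy_score` (which expects thresholded labels), and `log_loss` in scikit-learn's `sklearn.metrics` compute the same quantities.

```python
import math
from typing import Sequence


def roc_auc(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """AUC as the probability that a randomly chosen positive (label 1) is scored
    higher than a randomly chosen negative; ties count one half. O(n_pos * n_neg)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def accuracy(y_true: Sequence[int], y_prob: Sequence[float], threshold: float = 0.5) -> float:
    """Share of records whose thresholded prediction matches the true label
    (the 0.5 threshold is an assumption)."""
    hits = sum((p >= threshold) == (y == 1) for y, p in zip(y_true, y_prob))
    return hits / len(y_true)


def log_loss(y_true: Sequence[int], y_prob: Sequence[float], eps: float = 1e-15) -> float:
    """Mean negative log-likelihood of the true labels (binary cross-entropy)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip so log() never receives 0
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(y_true)
```

AUC depends only on how the probabilities rank the patients. Accuracy depends on the chosen cutoff, and log loss also rewards well-calibrated probabilities. This is why a model can lead on one metric and not another, although in this study XGBoost led on all three.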

CONCLUSIONS

The XGBoost model and the complete dataset can be used to construct an accurate prognostic model for patients diagnosed with EC, which may be applicable in clinical practice in the future.

Figure 1 (image): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1148/8662490/958d8294395a/jtd-13-11-6240-f1.jpg
