Xie Bin, Mo Mingda, Cui Haidong, Dong Yijie, Yin Hongping, Lu Zhe
School of Information Science and Technology, Hangzhou Normal University, Hangzhou 311121, China.
Department of Breast Surgery, The First Affiliated Hospital, School of Medicine, Zhejiang University, Hangzhou 311121, China.
Diagnostics (Basel). 2025 Mar 28;15(7):872. doi: 10.3390/diagnostics15070872.
Lung cancer is one of the most prevalent cancers worldwide. Accurately determining lung cancer subtypes and identifying high-risk patients support individualized treatment and follow-up. Our study aimed to establish an effective model for subtype classification and overall survival (OS) prediction in patients with lung cancer. Histopathological images, clinical data, and genetic information of lung adenocarcinoma and lung squamous cell carcinoma cases were downloaded from The Cancer Genome Atlas. An influencing factor system was optimized based on cell-nucleus, clinical, and genetic features. Four machine-learning models (light gradient boosting machine [LightGBM], extreme gradient boosting [XGBoost], random forest [RF], and adaptive boosting [AdaBoost]) and three deep-learning models (multilayer perceptron [MLP], TabNet, and convolutional neural network [CNN]) were employed for subtype classification and OS prediction. The performance of the models was comprehensively evaluated. XGBoost achieved the highest area under the curve (AUC), 0.9821, in subtype classification, whereas RF achieved the highest AUC values of 0.9134, 0.8706, and 0.8765 in predicting OS at 1, 2, and 3 years, respectively. Our study was the first to incorporate the characteristics of nuclei and the genetic information of patients to predict the subtypes and OS of patients with lung cancer. Combining these different factors with artificial intelligence methods yielded a modest improvement in AUC values over those reported in previous studies.
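For readers less familiar with the metric, the AUC values reported in the abstract can be read as the probability that a model scores a randomly chosen positive case higher than a randomly chosen negative case. The following is a minimal sketch of that definition in plain Python. It is not the authors' code: the function name `roc_auc`, the label coding (1 for lung adenocarcinoma, 0 for lung squamous cell carcinoma), and the toy scores are all hypothetical and chosen only for illustration.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via the pairwise (Mann-Whitney) formulation.

    Returns the fraction of positive/negative pairs in which the positive
    case has the higher score; tied pairs count as half a win.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data (not from the study):
# 1 = lung adenocarcinoma, 0 = lung squamous cell carcinoma.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]  # made-up model probabilities
toy_auc = roc_auc(toy_labels, toy_scores)
print(f"toy AUC = {toy_auc:.4f}")
```

In this toy example, 8 of the 9 positive/negative pairs are ordered correctly, so the AUC is 8/9. The pairwise loop is quadratic in the number of cases and is meant for clarity; a rank-based implementation from a standard library would be the usual choice on a real cohort.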