Dai Lihuan, Yin Jinxue, Xin Xin, Yao Chun, Tang Yongfang, Xia Xiaohong, Chen Yuanlin, Lai Shuying, Lu Guoliang, Huang Jie, Zhang Purong, Li Jiansheng, Chen Xiangguang, Zhong Xi
Department of Medical Imaging, Guangzhou Institute of Cancer Research, the Affiliated Cancer Hospital, Guangzhou Medical University, Guangzhou, 510095, China.
Department of Radiology, Meizhou People's Hospital, Meizhou, 514031, China.
Cancer Imaging. 2025 Mar 12;25(1):31. doi: 10.1186/s40644-025-00855-3.
Programmed death ligand 1 (PD-L1) expression status, closely related to immunotherapy outcomes, is a reliable biomarker for screening patients who may benefit from immunotherapy. Here, we developed and validated an interpretable machine learning (ML) model based on contrast-enhanced computed tomography (CECT) radiomics for preoperatively predicting PD-L1 expression status in patients with gastric cancer (GC).
We retrospectively recruited 285 GC patients from two medical centers who underwent CECT and PD-L1 testing. A PD-L1 combined positive score (CPS) of ≥ 5 was considered to indicate high PD-L1 expression. Patients from center 1 were divided into a training set (n = 143) and a validation set (n = 62), and patients from center 2 served as a test set (n = 80). Radiomics features were extracted from venous-phase CT images. After feature reduction and selection, 11 ML algorithms were employed to develop predictive models, and their performance in predicting PD-L1 expression status was evaluated using the area under the receiver operating characteristic curve (AUC). SHapley Additive exPlanations (SHAP) were used to interpret the optimal model and to visualize the decision-making process for a single individual.
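The labeling rule and the evaluation metric described above can be illustrated with a minimal sketch. It assumes nothing about the study's actual pipeline: the function names, the toy CPS values, and the toy model scores are all hypothetical, and the AUC is computed from its rank-based definition rather than with any particular library.

```python
def label_high_pdl1(cps, threshold=5.0):
    """Return 1 for high PD-L1 expression (CPS >= threshold), else 0."""
    return 1 if cps >= threshold else 0


def roc_auc(labels, scores):
    """AUC as the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case
    (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, not taken from the study.
cps_values = [1, 8, 3, 12, 5, 0]
model_scores = [0.2, 0.9, 0.4, 0.7, 0.35, 0.1]

labels = [label_high_pdl1(c) for c in cps_values]
auc = roc_auc(labels, model_scores)
print(labels, round(auc, 3))
```

The pairwise definition is quadratic in the number of cases, which is acceptable for cohorts of a few hundred patients; a library routine would normally be used in practice.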
Nine features significantly associated with PD-L1 expression status were ultimately selected to construct the predictive model. The light gradient-boosting machine (LGBM) model demonstrated the best performance for predicting high PD-L1 expression in the training, validation, and test sets, with AUCs of 0.841 (95% CI: 0.773, 0.908), 0.834 (95% CI: 0.729, 0.939), and 0.822 (95% CI: 0.718, 0.926), respectively. The SHAP summary and bar plots showed how each feature's value influenced its contribution to the model's output. The SHAP waterfall plots were used to visualize the decision-making process for a single individual.
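The idea behind a waterfall plot is that an individual prediction decomposes into a base value plus one additive contribution per feature. The sketch below illustrates that property with exact Shapley values on a hypothetical three-feature toy model, using a single reference point as the baseline. This is a simplification for illustration only; it is not the study's LGBM model, and it does not reproduce how the SHAP library estimates contributions for tree ensembles.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values for one sample x, where 'absent' features
    are replaced by the corresponding baseline value."""
    n = len(x)

    def value(subset):
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return predict(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for k in range(n):
            # Weight of a coalition of size k that excludes feature i.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                s = set(coalition)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(z):
    """Hypothetical model: one main effect plus one interaction."""
    return 0.1 + 0.3 * z[0] + 0.2 * z[1] * z[2]


x = [1.0, 1.0, 1.0]          # hypothetical patient feature vector
baseline = [0.0, 0.0, 0.0]   # hypothetical reference point

phi = shapley_values(toy_model, x, baseline)
base_value = toy_model(baseline)

# Waterfall property: base value + contributions = prediction.
print(base_value, phi, base_value + sum(phi), toy_model(x))
```

In this toy case the interaction term is split evenly between the two features that take part in it, which is the kind of per-feature attribution a waterfall plot stacks up for a single patient.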
Our CT radiomics-based LGBM model may aid in preoperatively predicting PD-L1 expression status in GC patients, and the SHAP method may improve the interpretability of this model.