Explainable machine learning model for predicting the transarterial chemoembolization response and subtypes of hepatocellular carcinoma patients.

Author information

Zhang Yunjie, Tong Songjian, Yang Junhui, Lin Jiawei, Kong Yifan, Lu Deyu, Chen Yan, Li Yingchao, Xu Linfeng, Kong Xiuyan, Zhu Guoqing, Zhang Hao, Liu Pixu, Yu Zhijie, Xia Jinglin

Affiliations

Zhejiang Key Laboratory of Intelligent Cancer Biomarker Discovery and Translation, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, 325000, China.

Department of Interventional Medicine, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, 325000, China.

Publication information

BMC Gastroenterol. 2025 Jul 7;25(1):503. doi: 10.1186/s12876-025-04105-5.

PMID:40624491

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12232800/

Abstract

BACKGROUND

Hepatocellular carcinoma (HCC) is the third leading cause of cancer-related death worldwide, and patients with intermediate-stage disease respond heterogeneously to transarterial chemoembolization (TACE). We developed a machine learning (ML) model that integrates routine clinical variables to predict TACE efficacy preoperatively, supporting individualized selection of TACE candidates and better-informed therapeutic decisions.

METHODS

This retrospective multicentre study enrolled treatment-naive HCC patients undergoing their first TACE from two independent cohorts: the First Affiliated Hospital of Wenzhou Medical University (training cohort) and Wenzhou Central Hospital (external validation cohort). Predictors were selected by recursive feature elimination (RFE), and prediction models were built with ten different ML algorithms. SHapley Additive exPlanations (SHAP) were used to interpret the models, and patients were then stratified by principal component analysis (PCA) followed by K-means clustering for prognostic analysis.
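A minimal, dependency-free sketch of the recursive feature elimination loop follows. The abstract does not name the estimator used to rank features, so this sketch assumes an L2-penalised logistic regression fitted on z-scored variables and drops the feature with the smallest absolute coefficient at each round. The helper names (`standardize`, `fit_logistic`, `rfe`), the hyperparameters and the toy data are hypothetical; this is not the authors' pipeline.

```python
import math


def standardize(X):
    """Z-score each column so coefficient magnitudes are comparable across features."""
    n, p = len(X), len(X[0])
    cols = []
    for j in range(p):
        col = [row[j] for row in X]
        mu = sum(col) / n
        sd = math.sqrt(sum((v - mu) ** 2 for v in col) / n) or 1.0  # guard constant columns
        cols.append([(v - mu) / sd for v in col])
    return [[cols[j][i] for j in range(p)] for i in range(n)]


def fit_logistic(X, y, lr=0.1, epochs=2000, l2=0.01):
    """Assumed stand-in ranking estimator: L2-penalised logistic regression via batch gradient descent."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            err = 1.0 / (1.0 + math.exp(-z)) - yi  # predicted probability minus label
            grad_b += err
            for j in range(p):
                grad_w[j] += err * xi[j]
        b -= lr * grad_b / n
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
    return w  # feature weights (intercept discarded)


def rfe(X, y, n_keep):
    """Refit on surviving features, drop the one with the smallest |weight|, repeat until n_keep remain."""
    Xs = standardize(X)
    remaining = list(range(len(X[0])))
    while len(remaining) > n_keep:
        sub = [[row[j] for j in remaining] for row in Xs]
        w = fit_logistic(sub, y)
        weakest = min(range(len(remaining)), key=lambda k: abs(w[k]))
        remaining.pop(weakest)
    return remaining  # original column indices of the retained predictors


# Hypothetical toy data: column 0 separates the classes, columns 1 and 2 are balanced noise.
X_toy = [[-2, 1, 1], [-2, -1, -1], [-1, 1, -1], [-1, -1, 1],
         [1, 1, 1], [1, -1, -1], [2, 1, -1], [2, -1, 1]]
y_toy = [0, 0, 0, 0, 1, 1, 1, 1]
print(rfe(X_toy, y_toy, n_keep=1))  # → [0]
```

On a real patient-by-variable matrix, `rfe(X, y, n_keep=10)` would mirror the 10 retained predictors reported in the Results. In practice, scikit-learn's `sklearn.feature_selection.RFE` runs the same loop around any estimator exposing `coef_` or `feature_importances_`.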

RESULTS

We retrospectively enrolled 382 patients with unresectable HCC from the First Affiliated Hospital of Wenzhou Medical University and 52 from Wenzhou Central Hospital. RFE identified 10 predictors for model construction. XGBoost and CatBoost outperformed the other algorithms, with AUCs of 0.796–0.799 on the internal test set and 0.785–0.791 on external validation, and balanced accuracies of 76.0–76.8%. SHAP analysis identified tumor burden and hepatic function markers as key determinants of TACE resistance. K-means clustering stratified patients into two prognostically distinct subgroups: Cluster B had significantly longer survival than Cluster A (HR = 0.36, 95% CI: 0.26–0.49, P < 0.001), confirming the clinical relevance of the ML-selected features.
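The two headline performance metrics have standard definitions that can be computed directly. The sketch below assumes that label 1 marks a TACE non-responder and that predictions are dichotomized at a 0.5 threshold; the abstract specifies neither, and the function names and toy scores are illustrative only.

```python
def roc_auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative (ties count 1/2)."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity (recall on class 1) and specificity (recall on class 0)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    n_pos = sum(1 for t in y_true if t == 1)
    n_neg = len(y_true) - n_pos
    return 0.5 * (tp / n_pos + tn / n_neg)


# Hypothetical predicted probabilities of non-response for four patients.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 decision threshold
print(roc_auc(y_true, scores))            # → 0.75
print(balanced_accuracy(y_true, y_pred))  # → 0.75
```

Because balanced accuracy averages sensitivity and specificity, it is less affected than plain accuracy when responders and non-responders are unevenly represented, while AUC summarizes ranking quality across all thresholds.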

CONCLUSION

We developed and validated an interpretable ML-based system integrating predictive modelling and patient clustering to individualize TACE efficacy prediction and clinical risk stratification for HCC patients.

SUPPLEMENTARY INFORMATION

The online version contains supplementary material available at 10.1186/s12876-025-04105-5.