Hou Fei, Zhu Yun, Zhao Hongbo, Cai Haolin, Wang Yinghui, Peng Xiaoqi, Lu Lin, He Rongli, Hou Yan, Li Zhenhui, Chen Ting
Department of Nuclear Medicine, Yunnan Cancer Hospital, The Third Affiliated Hospital of Kunming Medical University, Peking University Cancer Hospital Yunnan, Kunming, China.
Department of Radiology, The First Affiliated Hospital of Kunming Medical University, Kunming, China.
EClinicalMedicine. 2024 Oct 30;77:102913. doi: 10.1016/j.eclinm.2024.102913. eCollection 2024 Nov.
Distant metastasis (DM) markedly reduces survival in patients with papillary thyroid carcinoma (PTC). Early prediction of DM risk is therefore important for tailoring diagnosis and treatment plans and for improving prognosis. Previous studies have notable limitations, so new models for predicting the risk of DM in PTC are still needed. We aimed to develop and validate interpretable machine learning (ML) models for the early prediction of DM in patients with PTC using a multicenter cohort.
We collected data on patients with PTC who were admitted between June 2013 and May 2023. Data from 1430 patients at Yunnan Cancer Hospital (YCH) served as the training and internal validation set, while data from 434 patients at the First Affiliated Hospital of Kunming Medical University (KMU 1st AH) were used as the external test set. Nine ML methods, including random forest (RF), were used to construct the models. Predictive performance was compared using evaluation indicators such as the area under the receiver operating characteristic curve (AUC). The SHapley Additive exPlanation (SHAP) method was used to rank feature importance and to explain the final model.
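As a minimal illustration of the evaluation indicators used to compare models, the sketch below computes the AUC through its rank-based (Mann-Whitney) formulation, together with the Brier score, in plain Python. It is not the authors' code: the labels and predicted probabilities are made-up toy values rather than study data, and the function names `auc_score` and `brier_score` are hypothetical.

```python
from typing import Sequence


def auc_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """AUC as the probability that a random positive case is ranked
    above a random negative case (ties count as half)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    # Compare every positive/negative pair of predicted probabilities.
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and outcome
    (lower is better; 0 is a perfect forecast)."""
    if len(y_true) == 0:
        raise ValueError("Brier score needs at least one case")
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


# Toy data (hypothetical): 1 = distant metastasis, 0 = no distant metastasis.
labels = [0, 0, 1, 0, 1, 1, 0, 1]
probs = [0.10, 0.35, 0.40, 0.20, 0.80, 0.65, 0.50, 0.90]

print(f"AUC   = {auc_score(labels, probs):.4f}")
print(f"Brier = {brier_score(labels, probs):.4f}")
```

The pairwise formulation is quadratic in the number of cases, which is acceptable for a cohort of this size; a sort-based rank computation would be the usual choice for much larger datasets.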
Among the nine ML models, the RF model performed best. It accurately predicted the risk of DM in patients with PTC in both internal validation on the training set [AUC: 0.913, 95% confidence interval (CI) (0.9075-0.9185)] and the external test set [AUC: 0.8996, 95% CI (0.8483-0.9509)]. The calibration curve showed high agreement between predicted and observed risks. In the sensitivity analysis by DM site, the RF model performed especially well in predicting "lung-only metastasis", with high AUC, specificity, sensitivity, and F1 score and a low Brier score. SHAP analysis identified the variables that contributed to the model predictions. An online calculator based on the RF model was developed and made available for clinicians at https://predictingdistantmetastasis.shinyapps.io/shiny1/. Eleven variables were included in the final RF model: patient age, whether the tumor size is >2 cm, whether the tumor size is ≤1 cm, lymphocyte (LYM) count, monocyte (MONO) count, monocyte/lymphocyte ratio (MLR), thyroglobulin (TG) level, thyroid peroxidase antibody (TPOAb) level, whether the T stage is T1/2, whether the T stage is T3/4, and whether the N stage is N0.
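To make the 11-variable input concrete, the following sketch shows one way a raw patient record could be encoded into those features, including the derived MLR and the binary size and stage indicators. This is an assumption about the encoding, not the authors' preprocessing pipeline: the `PatientRecord` fields, the feature keys, and the example values are all hypothetical, and units are left unspecified because the abstract does not state them.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class PatientRecord:
    # Hypothetical raw inputs; field names are illustrative assumptions.
    age_years: float
    tumor_size_cm: float
    lym_count: float    # lymphocyte (LYM) count
    mono_count: float   # monocyte (MONO) count
    tg_level: float     # thyroglobulin (TG)
    tpoab_level: float  # thyroid peroxidase antibody (TPOAb)
    t_stage: str        # e.g. "T1a", "T2", "T3b", "T4a"
    n_stage: str        # e.g. "N0", "N1a", "N1b"


def encode_features(p: PatientRecord) -> Dict[str, float]:
    """Map a raw record to the 11 variables listed in the abstract."""
    if p.lym_count <= 0:
        raise ValueError("lymphocyte count must be positive to compute MLR")
    t = p.t_stage.strip().upper()
    n = p.n_stage.strip().upper()
    return {
        "age": p.age_years,
        "size_gt_2cm": int(p.tumor_size_cm > 2.0),
        "size_le_1cm": int(p.tumor_size_cm <= 1.0),
        "LYM": p.lym_count,
        "MONO": p.mono_count,
        # Monocyte/lymphocyte ratio, derived from the two counts.
        "MLR": p.mono_count / p.lym_count,
        "TG": p.tg_level,
        "TPOAb": p.tpoab_level,
        "T1_2": int(t.startswith(("T1", "T2"))),
        "T3_4": int(t.startswith(("T3", "T4"))),
        "N0": int(n == "N0"),
    }


# Made-up example record, not a study patient.
example = PatientRecord(45, 1.5, 2.0, 0.5, 12.0, 30.0, "T1b", "N1a")
features = encode_features(example)
for name, value in features.items():
    print(f"{name:>12}: {value}")
```

A tumor between 1 cm and 2 cm sets both size indicators to 0, which is how two binary flags can represent three size categories.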
Using large-sample, multicenter data, we developed and validated an explainable ML model for predicting the risk of DM in patients with PTC. The model helps clinicians identify high-risk patients early and provides a basis for individualized treatment plans.
This work was supported by the National Natural Science Foundation of China (Nos. 81960426, 82360345, and 82001986), the Outstanding Youth Science Foundation of Yunnan Basic Research Project (No. 202401AY070001-316), the Yunnan Province Applied and Basic Research Foundation (No. 202401AT070008), and the Ten Thousand Talent Plans for Young Top-notch Talents of Yunnan Province.