Department of Breast and Thyroid Surgery, Sichuan Provincial Hospital for Women and Children (Affiliated Women and Children's Hospital of Chengdu Medical College), Chengdu, China.
PLoS One. 2023 Jan 26;18(1):e0280340. doi: 10.1371/journal.pone.0280340. eCollection 2023.
Many researchers have used machine learning (ML) to predict the prognosis of breast cancer (BC) patients and have reported that ML models show good individualized prediction performance.
This cohort study aimed to establish a reliable data analysis model by comparing the performance of 10 common ML algorithms with that of the traditional American Joint Committee on Cancer (AJCC) stage, and to use this model in a Web application that provides individualized predictions for other users.
This study included 63145 BC patients from the Surveillance, Epidemiology, and End Results database.
Comparing the 10 ML algorithms and the 7th edition AJCC stage on the optimal test set, we found that, for 5-year overall survival, multivariate adaptive regression splines (MARS) had the highest area under the curve (AUC) value (0.831) and the highest F1-score (0.608), with relatively high sensitivity (0.737) and specificity (0.772). The AUC of MARS (0.831; 95% confidence interval: 0.820-0.842) was higher than that of each of the other ML algorithms and of the 7th edition AJCC stage (all P < 0.05). MARS, the best-performing model, was selected for Web application development (https://w12251393.shinyapps.io/app2/).
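For readers less familiar with the four metrics reported above, the following is a minimal sketch of how sensitivity, specificity, F1-score, and AUC can be computed for a binary 5-year outcome. It is not the authors' code: the function names, the toy labels and scores, the 0.5 threshold, and the convention that label 1 marks the positive class are all assumptions made for illustration, and the numbers have no connection to the SEER cohort.

```python
from typing import Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn), treating label 1 as the positive class (an assumption)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def sensitivity(tp: int, fn: int) -> float:
    # Fraction of true positives that the model flags.
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    # Fraction of true negatives that the model clears.
    return tn / (tn + fp)


def f1_score(tp: int, fp: int, fn: int) -> float:
    # Harmonic mean of precision and recall, written in count form.
    return 2 * tp / (2 * tp + fp + fn)


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Rank-based AUC: the probability that a randomly chosen positive case
    scores higher than a randomly chosen negative case (ties count as half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data, not taken from the study.
toy_labels = [1, 1, 1, 0, 0, 0, 0, 1]
toy_scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.1, 0.6]
toy_preds = [1 if s >= 0.5 else 0 for s in toy_scores]  # assumed 0.5 cut-off

tp, fp, tn, fn = confusion_counts(toy_labels, toy_preds)
print(sensitivity(tp, fn), specificity(tn, fp), f1_score(tp, fp, fn))
print(auc(toy_labels, toy_scores))
```

Sensitivity, specificity, and F1-score depend on the chosen classification threshold, whereas AUC summarizes ranking quality across all thresholds, which is why a model can lead on AUC while its F1-score remains moderate.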
This comparative study of multiple prediction models on a large dataset found that the MARS-based model performed considerably better than the other ML algorithms and the 7th edition AJCC stage in individualized survival estimation for BC patients, which is very likely to be the next step towards precision medicine.