Jia Zheng-Yi, Abulimiti Maierbiya, Wu Yun, Ma Li-Na, Li Xiao-Yu, Wang Jie
School of Pharmacy, Xinjiang Medical University, Urumqi, 830011, China.
Department of General Medicine, The First Affiliated Hospital of Xinjiang Medical University, Urumqi, 830011, China.
Heliyon. 2025 Jan 19;11(2):e42030. doi: 10.1016/j.heliyon.2025.e42030. eCollection 2025 Jan 30.
This study aimed to explore the epidemiological characteristics of acute myeloid leukemia (AML) and to build a more accurate machine-learning model for predicting the prognosis of AML patients.
We obtained clinical data for 87,090 AML patients from the SEER database, covering 1975 to 2019. First, we used Kaplan-Meier analysis to compare prognosis across patient strata. Next, we used univariate and multivariate Cox regression analysis to identify independent factors influencing the overall survival (OS) of AML patients. Finally, we applied 11 machine learning algorithms to predict 1-, 2-, and 3-year survival of AML patients. To improve predictive accuracy, we obtained the optimal parameters for each model by five-fold cross-validation with 20 cycles.
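The tuning step above can be sketched as follows. This is a minimal, hypothetical illustration, not the authors' code. It assumes that each of the "20 cycles" is one reshuffled repetition of the five-fold split, giving 5 × 20 = 100 fits per candidate parameter set. It also assumes that patients censored before a horizon are dropped when building the binary 1-, 2- or 3-year survival label, since the abstract does not say how censoring was handled. The names `horizon_label`, `repeated_kfold`, `select_best` and `fit_and_score` are illustrative only.

```python
import random
from typing import Callable, Iterator, List, Optional, Sequence, Tuple


def horizon_label(time_months: float, died: bool, horizon_months: float) -> Optional[int]:
    """Binary 'alive at the horizon' label (hypothetical helper).

    Returns 0 if the patient died at or before the horizon, 1 if they were
    still alive at the horizon, and None if they were censored before it
    (status unknown; assumption: such patients are excluded for that horizon).
    """
    if died and time_months <= horizon_months:
        return 0
    if time_months >= horizon_months:
        return 1
    return None


def repeated_kfold(n: int, k: int = 5, repeats: int = 20,
                   seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists: k folds per repeat, reshuffled every repeat."""
    rng = random.Random(seed)
    indices = list(range(n))
    for _ in range(repeats):
        rng.shuffle(indices)
        for f in range(k):
            test = sorted(indices[f::k])  # every k-th index of this shuffle
            held_out = set(test)
            train = [i for i in range(n) if i not in held_out]
            yield train, test


def select_best(param_grid: Sequence[dict],
                fit_and_score: Callable[[dict, List[int], List[int]], float],
                n: int, k: int = 5, repeats: int = 20,
                seed: int = 0) -> Tuple[Optional[dict], float]:
    """Return the candidate with the highest mean score over all k * repeats folds.

    Reusing the same seed gives every candidate identical splits, so the
    mean scores are compared on the same data partitions.
    """
    best_params, best_score = None, float("-inf")
    for params in param_grid:
        scores = [fit_and_score(params, tr, te)
                  for tr, te in repeated_kfold(n, k, repeats, seed)]
        mean_score = sum(scores) / len(scores)
        if mean_score > best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score
```

In a real pipeline, `fit_and_score` would train a classifier (for example a random forest) on the training rows and return a metric such as AUC on the held-out rows. Library tools such as scikit-learn's `RepeatedStratifiedKFold` with `GridSearchCV` implement the same idea with class-stratified folds.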
Kaplan-Meier analysis showed that patients diagnosed after 2010 had significantly higher survival than those diagnosed earlier. Older age, male gender, and non-Black race were associated with poor prognosis. Among the FAB subtypes, M3 AML had a better prognosis than the other subtypes; among the WHO subtypes, AML associated with Down syndrome had the best prognosis, followed by AML with eosinophilic abnormalities. Cox regression analysis demonstrated that gender, age, race, and family income were significantly associated with the survival of AML patients. Among the 11 machine learning models, the random forest classifier performed best on multiple evaluation metrics for predicting 1-, 2-, and 3-year survival. The XGBoost classifier and the neural network classifier also showed high accuracy and reliability at every prediction horizon.
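A comparison such as "survival after 2010 versus before 2010" pairs Kaplan-Meier curves with a significance test. The abstract does not name the test. The two-group log-rank test used in this sketch is a common choice and is an assumption here. Below is a minimal, pure-Python sketch of both computations on toy data, where `events` is 1 for death and 0 for censoring. The function names are hypothetical, and the code is not the authors' implementation.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Kaplan-Meier estimate S(t) at each distinct event (death) time.

    Subjects censored at an event time are still counted as at risk then.
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    leaving = Counter(times)  # deaths + censorings at each time
    at_risk = len(times)
    surv, curve = 1.0, []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d:
            surv *= 1.0 - d / at_risk  # product-limit step
            curve.append((t, surv))
        at_risk -= leaving[t]
    return curve


def logrank_test(times_a: Sequence[float], events_a: Sequence[int],
                 times_b: Sequence[float], events_b: Sequence[int]) -> Tuple[float, float]:
    """Two-group log-rank test; returns (chi-square statistic, p-value with 1 df)."""
    data = ([(t, e, 0) for t, e in zip(times_a, events_a)]
            + [(t, e, 1) for t, e in zip(times_b, events_b)])
    o_minus_e, variance = 0.0, 0.0
    for t in sorted({t for t, e, _ in data if e}):
        n_a = sum(1 for ti, _, g in data if g == 0 and ti >= t)
        n_b = sum(1 for ti, _, g in data if g == 1 and ti >= t)
        d_a = sum(1 for ti, e, g in data if g == 0 and ti == t and e)
        d = sum(1 for ti, e, _ in data if ti == t and e)
        n = n_a + n_b
        o_minus_e += d_a - d * n_a / n  # observed minus expected deaths in group A
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = o_minus_e ** 2 / variance
    return chi2, math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square(1)
```

For example, `kaplan_meier([1, 2, 2, 3, 4], [1, 1, 0, 1, 0])` gives survival of roughly 0.8, 0.6 and 0.3 at times 1, 2 and 3. For a cohort the size of SEER's, a vectorized library such as lifelines would be the practical choice over these O(n²) loops.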
This study clarifies the epidemiological characteristics of AML and establishes a machine-learning prediction model that shows good accuracy and reliability in predicting the prognosis of AML patients.