Training, Validating, and Testing Machine Learning Prediction Models for Endometrial Cancer Recurrence.

Authors

Gonzalez Bosquet Jesus, Polio Andrew, George Erin, Tarhini Ahmad A, Cosgrove Casey M, Huang Marilyn S, Corr Bradley, Leiser Aliza L, Salhia Bodour, Darcy Kathleen, Tarney Christopher M, Dood Rob L, Dockery Lauren E, Edge Stephen B, Cavnar Michael J, Landrum Lisa, Rounbehler Rob J, Churchman Michelle, Wagner Vincent M

Affiliations

Department of Obstetrics and Gynecology, Gynecologic Oncology, University of Iowa, Iowa City, IA.

Gynecologic Oncology, Moffitt Cancer Center, Tampa, FL.

Publication information

JCO Precis Oncol. 2025 May;9:e2400859. doi: 10.1200/PO-24-00859. Epub 2025 May 5.

Abstract

PURPOSE

Endometrial cancer (EC) is the most common gynecologic cancer in the United States with rising incidence and mortality. Despite optimal treatment, 15%-20% of all patients will recur. To better select patients for adjuvant therapy, it is important to accurately predict patients at risk for recurrence. Our objective was to train, validate, and test models of EC recurrence using lasso regression and other machine learning (ML) and deep learning (DL) analytics in a large, comprehensive data set.

METHODS

Data from patients with EC were downloaded from the Oncology Research Information Exchange Network database and stratified into three groups: low risk (International Federation of Gynecology and Obstetrics [FIGO] grade 1 or 2, stage I; N = 329); high risk (FIGO grade 3 or stage II, III, or IV; N = 324); and nonendometrioid histology (N = 239). Clinical, pathologic, genomic, and genetic data were used for the analysis. Genomic data included microRNA, long noncoding RNA, isoform, and pseudogene expression. Genetic variation included single-nucleotide variation (SNV) and copy-number variation (CNV). In the discovery phase, we selected variables informative for recurrence (P < .05) using univariate analyses of variance. We then trained, validated, and tested multivariate models built from the selected variables using lasso regression, MATLAB (ML), and TensorFlow (DL).
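To illustrate why lasso regression is useful for this kind of variable selection, the following is a minimal sketch of an L1-penalized logistic regression fitted by proximal gradient descent. It is not the authors' implementation: the logistic form, the solver, the function names (`fit_lasso_logistic`, `soft_threshold`), the hyperparameters, and the toy data are all assumptions made for illustration only.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def soft_threshold(value, threshold):
    """Shrink value toward zero; anything within the threshold becomes exactly 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_lasso_logistic(X, y, lam, lr=0.1, n_iter=2000):
    """L1-penalized logistic regression via proximal gradient descent.

    X: list of feature rows, y: list of 0/1 labels, lam: L1 penalty strength.
    Returns (weights, intercept); the intercept is not penalized.
    """
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        # Residuals: predicted probability minus observed label.
        resid = [sigmoid(b + sum(wj * xj for wj, xj in zip(w, row))) - yi
                 for row, yi in zip(X, y)]
        grad_w = [sum(r * row[j] for r, row in zip(resid, X)) / n
                  for j in range(p)]
        grad_b = sum(resid) / n
        # Gradient step on the log-loss, then soft-threshold for the L1 term.
        w = [soft_threshold(wj - lr * gj, lr * lam)
             for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


# Hypothetical toy data: column 0 tracks the outcome, column 1 is noise.
X = [[-2.0, 1.0], [-1.0, -1.0], [-1.5, 1.0], [-0.5, -1.0],
     [0.5, 1.0], [1.0, -1.0], [1.5, 1.0], [2.0, -1.0]]
y = [0, 0, 0, 0, 1, 1, 1, 1]

weights, intercept = fit_lasso_logistic(X, y, lam=0.1)
print(weights)
```

On this toy data the informative column keeps a positive weight while the noise column is shrunk to exactly zero, which is the property that makes an L1 penalty attractive when candidate variables far outnumber patients.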

RESULTS

Clinical recurrence models for the low-risk, high-risk, and high-risk nonendometrioid histology groups had areas under the curve (AUCs) of 56%, 70%, and 65%, respectively. For training, we selected models with AUC >80%: five for the low-risk group, 20 for the high-risk group, and 20 for the nonendometrioid group. The two best low-risk models included clinical data and CNVs. For the high-risk group, three of the five best-performing models included pseudogene expression. For the nonendometrioid group, pseudogene expression and SNV were overrepresented in the best models.
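For readers less familiar with the metric, the AUC can be read as the probability that a randomly chosen patient who recurred receives a higher predicted risk than a randomly chosen patient who did not. The sketch below computes it by direct pairwise comparison. The function name `auc` and the example labels and scores are hypothetical and are not data from the study.

```python
def auc(labels, scores):
    """Fraction of (recurrent, non-recurrent) pairs ranked correctly; ties count half."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one case of each class")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical example: 1 = recurrence, 0 = no recurrence.
print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

An AUC reported as 56% corresponds to 0.56 on this scale, only slightly better than the 0.5 expected from random ranking, whereas values above 0.80 indicate that most such pairs are ordered correctly.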

CONCLUSION

Prediction models of EC recurrence built with ML and DL analytics had better performance than models with clinical and pathologic data alone. Prospective validation is required to determine clinical utility.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d4ac/12058372/fb4f933cc739/po-9-e2400859-g001.jpg
