Using real-world electronic health record data to predict the development of 12 cancer-related symptoms in the context of multimorbidity.

Authors

Bandyopadhyay Anindita, Albashayreh Alaa, Zeinali Nahid, Fan Weiguo, Gilbertson-White Stephanie

Affiliations

Department of Business Analytics, University of Iowa, Iowa City, IA 52242, United States.

College of Nursing, University of Iowa, Iowa City, IA 52242, United States.

Publication information

JAMIA Open. 2024 Sep 12;7(3):ooae082. doi: 10.1093/jamiaopen/ooae082. eCollection 2024 Oct.

DOI:10.1093/jamiaopen/ooae082

PMID:39282082

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11397936/

Abstract

OBJECTIVE

This study uses electronic health record (EHR) data to predict 12 common cancer symptoms, assessing the efficacy of machine learning (ML) models in identifying symptom influencers.

MATERIALS AND METHODS

We analyzed EHR data of 8156 adults diagnosed with cancer who underwent cancer treatment from 2017 to 2020. Structured and unstructured EHR data were sourced from the Enterprise Data Warehouse for Research at the University of Iowa Hospital and Clinics. Several predictive models, including logistic regression, random forest (RF), and XGBoost, were employed to forecast symptom development. The performances of the models were evaluated by F1-score and area under the curve (AUC) on the testing set. The SHapley Additive exPlanations framework was used to interpret these models and identify the predictive risk factors associated with fatigue as an exemplar.
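
The two evaluation metrics named above can be sketched in plain Python. This is a minimal illustration, not the study's pipeline: the symptom labels, predicted probabilities, and the 0.5 decision threshold are all hypothetical, and the function names are chosen here for clarity. AUC is computed as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, and the macro average is the unweighted mean across symptoms.

```python
def f1_score(y_true, y_pred):
    """F1: harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true, scores):
    """AUC as P(score of a positive > score of a negative); ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def macro_average(values):
    """Unweighted mean, so every symptom counts equally."""
    return sum(values) / len(values)


# Hypothetical test-set labels and predicted probabilities for two symptoms.
test_sets = {
    "fatigue": ([1, 0, 1, 1, 0, 0], [0.9, 0.2, 0.6, 0.4, 0.5, 0.1]),
    "pain": ([1, 1, 0, 0, 1, 0], [0.8, 0.7, 0.3, 0.2, 0.9, 0.4]),
}

aucs, f1s = [], []
for symptom, (y_true, scores) in test_sets.items():
    # Assumed threshold: probabilities of 0.5 or more count as a predicted symptom.
    y_pred = [1 if s >= 0.5 else 0 for s in scores]
    aucs.append(roc_auc(y_true, scores))
    f1s.append(f1_score(y_true, y_pred))

macro_auc = macro_average(aucs)
macro_f1 = macro_average(f1s)
print(f"macro AUC = {macro_auc:.3f}, macro F1 = {macro_f1:.3f}")
```

AUC depends only on how the scores rank the cases, whereas F1 depends on the chosen threshold, which is one reason the two numbers can diverge for the same model.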

RESULTS

The RF model performed best, with a macro-average AUC of 0.755 and an F1-score of 0.729 across the cancer-related symptoms. For pain prediction, for instance, it achieved an AUC of 0.954 and an F1-score of 0.914. Key predictive factors varied by symptom and included clinical history, cancer characteristics, treatment modalities, and patient demographics. For example, allergy (OR = 2.3, 95% CI: 1.8-2.9) and colitis (OR = 1.9, 95% CI: 1.5-2.4) were associated with significantly higher odds of fatigue.
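
An odds ratio with a 95% confidence interval, as reported above for fatigue, can be illustrated with a short sketch. The abstract does not state how these estimates were obtained, so the function below is an assumption: it uses a simple 2x2 table and a Wald interval on the log odds ratio. The counts in the usage example are hypothetical and are not study data.

```python
import math


def odds_ratio_with_ci(exposed_cases, exposed_noncases,
                       unexposed_cases, unexposed_noncases, z=1.96):
    """Odds ratio and Wald confidence interval from a 2x2 table.

    z = 1.96 corresponds to an approximate 95% interval.
    """
    a, b, c, d = exposed_cases, exposed_noncases, unexposed_cases, unexposed_noncases
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the Wald approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts: fatigue among patients with and without a given comorbidity.
or_, ci_low, ci_high = odds_ratio_with_ci(40, 60, 20, 80)
print(f"OR = {or_:.2f}, 95% CI: {ci_low:.2f}-{ci_high:.2f}")
```

An interval whose lower bound stays above 1, as in both reported examples, indicates that the condition is associated with higher odds of the symptom at the 5% significance level.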

DISCUSSION

Our research emphasizes the critical integration of multimorbidity and patient characteristics in modeling cancer symptoms, revealing the considerable influence of chronic conditions beyond cancer itself.

CONCLUSION

We highlight the potential of ML for predicting cancer symptoms, suggesting a pathway for integrating such models into clinical systems to enhance personalized care and symptom management.
