Bandyopadhyay Anindita, Albashayreh Alaa, Zeinali Nahid, Fan Weiguo, Gilbertson-White Stephanie
Department of Business Analytics, University of Iowa, Iowa City, IA 52242, United States.
College of Nursing, University of Iowa, Iowa City, IA 52242, United States.
JAMIA Open. 2024 Sep 12;7(3):ooae082. doi: 10.1093/jamiaopen/ooae082. eCollection 2024 Oct.
This study uses electronic health record (EHR) data to predict 12 common cancer symptoms and assesses how well machine learning (ML) models identify the factors that influence those symptoms.
We analyzed EHR data of 8156 adults diagnosed with cancer who underwent cancer treatment from 2017 to 2020. Structured and unstructured EHR data were sourced from the Enterprise Data Warehouse for Research at the University of Iowa Hospital and Clinics. Several predictive models, including logistic regression, random forest (RF), and XGBoost, were used to predict symptom development. Model performance was evaluated by F1-score and area under the curve (AUC) on the testing set. The SHapley Additive exPlanations framework was used to interpret these models and, with fatigue as an exemplar, to identify the associated predictive risk factors.
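The two evaluation metrics named above can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration. AUC is computed here as the fraction of (positive, negative) pairs ranked correctly, and per-symptom scores are averaged with equal weight (macro averaging).

```python
from statistics import mean


def f1_score(y_true, y_pred):
    """F1 = 2*TP / (2*TP + FP + FN) for binary labels (1 = symptom present)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def roc_auc(y_true, scores):
    """AUC as the share of positive/negative pairs ranked correctly (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def macro_average(per_symptom):
    """Unweighted mean of per-symptom scores."""
    return mean(per_symptom.values())


# Hypothetical toy data: symptom -> (true labels, predicted probabilities).
toy = {
    "pain": ([1, 1, 0, 0], [0.9, 0.6, 0.7, 0.2]),
    "fatigue": ([1, 0, 1, 0], [0.8, 0.3, 0.4, 0.1]),
}

THRESHOLD = 0.5  # assumed cut-off for turning probabilities into labels

auc_by_symptom = {name: roc_auc(y, s) for name, (y, s) in toy.items()}
f1_by_symptom = {
    name: f1_score(y, [1 if v >= THRESHOLD else 0 for v in s])
    for name, (y, s) in toy.items()
}

print("macro AUC:", macro_average(auc_by_symptom))
print("macro F1:", macro_average(f1_by_symptom))
```

The pairwise AUC is quadratic in the number of cases, which is acceptable for a toy example; a rank-based implementation would be the usual choice for a cohort of thousands of patients.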
The RF model performed best, with a macro average AUC of 0.755 and an F1-score of 0.729 across the cancer-related symptoms predicted. For instance, the RF model achieved an AUC of 0.954 and an F1-score of 0.914 for pain prediction. Key predictive factors varied by symptom and included clinical history, cancer characteristics, treatment modalities, and patient demographics. For example, allergy (odds ratio [OR] = 2.3, 95% CI: 1.8-2.9) and colitis (OR = 1.9, 95% CI: 1.5-2.4) were associated with significantly higher odds of fatigue.
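For readers unfamiliar with how an odds ratio and its 95% CI are read, the following sketch computes both from a 2x2 table using the textbook Wald (log-scale) interval. The abstract does not state how the reported ORs were estimated, so this method is an assumption, and the function name and the counts in the usage lines are hypothetical.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval from a 2x2 table.

    a: exposed, symptom present      b: exposed, symptom absent
    c: unexposed, symptom present    d: unexposed, symptom absent
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) under the Wald approximation.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, not taken from the study.
or_value, ci_low, ci_high = odds_ratio_ci(20, 10, 10, 20)
print(f"OR = {or_value:.2f}, 95% CI: {ci_low:.2f}-{ci_high:.2f}")
```

An interval that excludes 1, as both intervals reported above do, indicates an association that is statistically significant at roughly the 5% level.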
Our research emphasizes the importance of integrating multimorbidity and patient characteristics when modeling cancer symptoms, and it reveals the considerable influence of chronic conditions other than cancer itself.
We highlight the potential of ML for predicting cancer symptoms, suggesting a pathway for integrating such models into clinical systems to enhance personalized care and symptom management.