Finkelstein Joseph, Smiley Aref, Echeverria Christina, Mooney Kathi
Department of Biomedical Informatics, The University of Utah, Salt Lake City, UT, USA.
College of Nursing, The University of Utah, Salt Lake City, UT, USA.
AMIA Annu Symp Proc. 2025 May 22;2024:427-432. eCollection 2024.
This study evaluates the utility of machine learning (ML) algorithms for early forecasting of changes in total symptom score from the daily self-reports of 339 chemotherapy patients. The dataset comprised 12 specific symptoms, with severity and distress for each symptom rated on a 1 to 10 scale, generating a "total symptom score" ranging from 0 to 230. The original dataset was unbalanced: Class I (score change ≥ 5) and Class II (score change < 5) were unevenly represented. To address this, we created a balanced dataset specifically for model training, using a stratified sampling technique to ensure equitable representation of both classes. Using the MATLAB® Classification Learner application, we investigated nine ML models, including decision trees, discriminant analysis, support vector machines (SVM), and others, each with various classifiers. The objective was to predict the total symptom score change from the preceding 3 to 5 days of symptom data. Models were trained on the balanced dataset to mitigate the impact of the original imbalance, and comparative evaluations were also conducted on the unbalanced data to assess performance differences. Certain classifiers, such as SVM, achieved the highest accuracy on the unbalanced dataset, peaking at 82%, yet these models frequently misclassified Class I as Class II. In contrast, the Ensemble algorithm with the RUSBoost classifier classified both classes more reliably on both datasets, achieving accuracies of 59%, 59.3%, and 59.4% for data from 3, 4, and 5 days prior, respectively. When the balanced dataset was used for training, these figures were 61.16%, 58.41%, and 60.05%, a slight improvement for the 3-day and 5-day windows.
Training on a balanced dataset underscores the potential of ML algorithms to improve symptom management for chemotherapy patients, offering a path to enhanced patient care and quality of life through targeted, personalized symptom monitoring.
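The labeling and balancing steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' MATLAB workflow. Several details are assumptions: the "score change" is taken to be the next day's total score minus the last score in the window, and balancing is done by randomly undersampling the larger class, which may differ from the stratified sampling the study used. The function names `label_change`, `make_windows`, and `balance_by_undersampling` are hypothetical.

```python
import random
from collections import Counter

THRESHOLD = 5  # from the abstract: Class I is a score change >= 5


def label_change(score_change):
    """Return 'I' when the score change is >= 5, otherwise 'II'."""
    return "I" if score_change >= THRESHOLD else "II"


def make_windows(daily_scores, n_days):
    """Pair each run of n_days total scores with the class of the next change.

    Assumption: the change is the next day's score minus the last score
    in the window; the abstract does not define it precisely.
    """
    samples = []
    for end in range(n_days, len(daily_scores)):
        window = daily_scores[end - n_days:end]
        change = daily_scores[end] - window[-1]
        samples.append((window, label_change(change)))
    return samples


def balance_by_undersampling(samples, seed=0):
    """Randomly undersample every class down to the size of the smallest one."""
    rng = random.Random(seed)
    by_class = {}
    for sample in samples:
        by_class.setdefault(sample[1], []).append(sample)
    smallest = min(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(rng.sample(group, smallest))
    rng.shuffle(balanced)
    return balanced


# Made-up total symptom scores for one hypothetical patient.
scores = [10, 12, 11, 20, 21, 19, 30, 31]
samples = make_windows(scores, n_days=3)
balanced = balance_by_undersampling(samples)
print(Counter(label for _, label in samples))
print(Counter(label for _, label in balanced))
```

Only the training data would be balanced this way. Evaluating on data with the original class proportions, and reporting per-class results alongside overall accuracy, is what exposes the Class I misclassification pattern the abstract describes.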