Department of Artificial Intelligence, Lviv Polytechnic National University, Lviv 79013, Ukraine.
Faculty of Mathematics and Computer Science, University of Warmia and Mazury, Olsztyn 10719, Poland.
Math Biosci Eng. 2022 Apr 13;19(6):6102-6123. doi: 10.3934/mbe.2022285.
Since December 2019, the COVID-19 pandemic has strained medical resources worldwide and caused significant mortality. It is commonly recognized that the severity of SARS-CoV-2 disease depends on both comorbidities and the state of the patient's immune system, which is reflected in several biomarkers. Developing methods for early diagnosis and disease severity prediction can reduce the burden on the health care system and increase the effectiveness of treatment and rehabilitation for patients with severe cases. This study aims to develop and validate an ensemble machine-learning model, based on clinical and immunological features, for assessing severity risk and predicting post-COVID rehabilitation duration in SARS-CoV-2 patients. The dataset, consisting of 35 features and 122 instances, was collected at the Lviv regional rehabilitation center. It contains age, gender, weight, height, BMI, CAT, 6-minute walking test, pulse, external respiration function, oxygen saturation, and 15 immunological markers, which are used to model the relationship between disease duration and biomarkers with a machine learning approach. Predictions are assessed using the area under the receiver operating characteristic curve (AUC), classification accuracy (CA), precision, recall, and F1 score. A new hybrid ensemble feature selection model for a post-COVID prediction system is proposed as an automatic feature cut-off rank identifier. A three-layer, high-accuracy stacking ensemble classification model for intelligent analysis of small medical datasets is presented. Combined with weak predictors, association rules improved the classification quality. The proposed ensemble uses a random forest model as an aggregator that generalizes the results of the weak regressors. The three-layer stacking ensemble classification model (AUC 0.978; CA 0.920; F1 score 0.921; precision 0.924; recall 0.920) outperformed five machine learning models, viz. a tree algorithm with forward pruning; a Naïve Bayes classifier; a support vector machine with an RBF kernel; logistic regression; and a calibrated learner with a sigmoid function and decision threshold optimization. Aging-related biomarkers, viz. CD3+, CD4+, CD8+, and CD22+, were examined to predict post-COVID rehabilitation duration. The best accuracy was reached with the support vector machine with a linear kernel (MAPE = 0.0787) and the random forest classifier (RMSE = 1.822). The proposed three-layer stacking ensemble classification model predicted SARS-CoV-2 disease severity based on cytokines and physiological biomarkers. The results indicate that changes in the studied biomarkers associated with disease severity can be used to monitor severity and forecast rehabilitation duration.
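The performance metrics named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' evaluation code: the function names and the toy data are hypothetical, MAPE is assumed to be expressed as a fraction rather than a percentage, and precision, recall, and F1 are shown for a single positive class, whereas the study may have used averaged multi-class variants.

```python
import math


def mape(y_true, y_pred):
    """Mean absolute percentage error, returned as a fraction (assumed convention)."""
    return sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the target."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def precision_recall_f1(y_true, y_pred, positive=1):
    """Precision, recall, and F1 score for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical rehabilitation durations in days (not taken from the study).
days_true = [10.0, 20.0, 40.0]
days_pred = [11.0, 18.0, 40.0]
duration_mape = mape(days_true, days_pred)
duration_rmse = rmse(days_true, days_pred)

# Hypothetical severity labels: 1 = severe, 0 = non-severe.
severity_true = [1, 1, 0, 0, 1]
severity_pred = [1, 0, 0, 1, 1]
precision, recall, f1 = precision_recall_f1(severity_true, severity_pred)

print(f"MAPE={duration_mape:.4f} RMSE={duration_rmse:.3f}")
print(f"precision={precision:.3f} recall={recall:.3f} F1={f1:.3f}")
```

MAPE is scale-free while RMSE is in the units of the target, which is one reason the two can rank regression models differently, as with the two best models reported above.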