Inonu University, Faculty of Medicine, Department of Biostatistics and Medical Informatics, Malatya, Turkey.
Comput Methods Programs Biomed. 2021 Apr;201:105951. doi: 10.1016/j.cmpb.2021.105951. Epub 2021 Jan 22.
The novel coronavirus (2019-nCoV) epidemic spread rapidly, causing more than 250 thousand deaths worldwide. The virus, which first presented as pneumonia, was later named Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) by the World Health Organization. SARS-CoV-2 infection begins when the virus binds to the angiotensin-converting enzyme 2 (ACE2) receptor, which plays a vital role in cardiovascular disease and the immune system, especially in conditions such as cerebrovascular disease, hypertension, and diabetes. This study aims to evaluate how well data mining methods predict death status from demographic and clinical factors (including COVID-19 severity).
The dataset comprises 1603 SARS-CoV-2 patients and 13 variables obtained from an open-access web source. It contains age, gender, chronic diseases (hypertension, diabetes, renal, cardiovascular, etc.), some enzymes (ACE, angiotensin II receptor blockers), and COVID-19 severity, which are used to predict death status with deep learning and machine learning approaches (random forest [RF], k-nearest neighbor [k-NN], extreme gradient boosting [XGBoost]). The models' hyperparameters are tuned by grid search, and predictions are assessed with performance metrics. The steps of knowledge discovery in databases are applied to extract the relevant information.
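As a rough illustration of the grid search step described above, the following minimal sketch tunes the number of neighbors of a k-NN classifier by 5-fold cross-validation. It is not the authors' pipeline: the data are synthetic, the three features (age, severity, hypertension) and the risk rule are invented for the example, and all function names (`make_synthetic_patients`, `knn_predict`, `cv_accuracy`, `grid_search`) are hypothetical. Only the Python standard library is used.

```python
import random
from collections import Counter

def make_synthetic_patients(n, seed=0):
    """Generate hypothetical (features, died) pairs; NOT the study's data."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        age = rng.uniform(20, 90)
        severity = rng.randint(0, 2)      # 0 mild, 1 moderate, 2 severe (assumed coding)
        hypertension = rng.randint(0, 1)
        # Toy rule: risk rises with age, severity, and hypertension.
        risk = 0.01 * (age - 20) + 0.25 * severity + 0.1 * hypertension
        died = 1 if risk + rng.gauss(0, 0.15) > 0.8 else 0
        # Scale features to [0, 1] so no single one dominates the distance.
        rows.append(([age / 90.0, severity / 2.0, float(hypertension)], died))
    return rows

def knn_predict(train, x, k):
    """Majority vote among the k nearest training points (squared Euclidean)."""
    nearest = sorted(
        train, key=lambda row: sum((a - b) ** 2 for a, b in zip(row[0], x))
    )[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

def cv_accuracy(data, k, folds=5):
    """Mean accuracy of k-NN over `folds` cross-validation splits."""
    scores = []
    for f in range(folds):
        test = data[f::folds]  # every folds-th row, starting at row f
        train = [row for i, row in enumerate(data) if i % folds != f]
        correct = sum(knn_predict(train, x, k) == y for x, y in test)
        scores.append(correct / len(test))
    return sum(scores) / folds

def grid_search(data, k_grid):
    """Score every candidate k and return (best_k, best_score, all_scores)."""
    scores = {k: cv_accuracy(data, k) for k in k_grid}
    best_k = max(scores, key=scores.get)
    return best_k, scores[best_k], scores

data = make_synthetic_patients(300)
best_k, best_score, scores = grid_search(data, [1, 3, 5, 7, 9])
print(f"best k = {best_k}, cross-validated accuracy = {best_score:.3f}")
```

The same pattern extends to the other models: the grid becomes a set of hyperparameter combinations (for example, tree depth and number of trees), and each combination is scored on held-out folds before the best one is selected.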
Deep learning achieved a higher accuracy (97.15%) than the classical machine learning methods (92.15% for RF and 93.4% for k-NN), but the ensemble classifier XGBoost achieved the highest accuracy (99.7%). According to XGBoost, COVID-19 severity and age were the two most important factors associated with death status, whereas deep learning identified COVID-19 severity and hypertension as the most decisive variables.
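The accuracy figures above are the share of patients whose death status was classified correctly. A minimal sketch of how such a figure can be computed from predictions is given here, together with sensitivity and specificity. The abstract reports only accuracy, so the two additional metrics, the function names, and the ten labels below are illustrative assumptions rather than results from the study.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = died)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn

def performance_metrics(y_true, y_pred):
    """Return accuracy, sensitivity, and specificity as a dict."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,
        # Share of actual deaths that were predicted as deaths.
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        # Share of actual survivors that were predicted as survivors.
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
    }

# Made-up labels for ten hypothetical patients (1 = died, 0 = survived).
y_true = [1, 0, 0, 1, 0, 0, 1, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 1, 0, 1, 0]

metrics = performance_metrics(y_true, y_pred)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Reporting sensitivity and specificity alongside accuracy shows separately how well deaths and survivals are recognized, which accuracy alone does not reveal.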
The proposed model (XGBoost) predicted death status from the demographic and clinical factors more accurately than the other algorithms. The results of this study can guide patients with certain risk factors to take early measures and access preventive health care services before they become infected with the virus.