Construction of genetic classification model for coronary atherosclerosis heart disease using three machine learning methods.

Affiliation

Department of Epidemiology and Health Statistics, School of Public Health, Capital Medical University, and Beijing Municipal Key Laboratory of Clinical Epidemiology, No. 10, Xi Toutiao You Anmenwai, Fengtai District, Beijing, 100069, China.

Publication information

BMC Cardiovasc Disord. 2022 Feb 12;22(1):42. doi: 10.1186/s12872-022-02481-4.

Abstract

BACKGROUND

Although diagnostic methods for coronary atherosclerotic heart disease (CAD) are constantly being improved, early-stage CAD is still often missed because it produces no symptoms. Gene expression levels vary during disease development; therefore, a classifier based on gene expression might contribute to CAD diagnosis. This study aimed to construct genetic classification models for CAD using gene expression data, which may provide new insight into its pathogenesis.

METHODS

All statistical analyses were performed in R 3.4.4. Three raw gene expression datasets related to CAD (GSE12288, GSE7638 and GSE66360) were downloaded from the Gene Expression Omnibus database and included in the analysis. The limma package was used to identify differentially expressed genes (DEGs) between CAD samples and healthy controls. The WGCNA package was used to identify CAD-related gene modules and hub genes, followed by recursive feature elimination to select the optimal feature genes (OFGs). Genetic classification models were then built using support vector machine (SVM), random forest (RF) and logistic regression (LR). Validation and receiver operating characteristic (ROC) curve analysis were conducted to evaluate classification performance.
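The recursive feature elimination step can be illustrated with a minimal sketch. The abstract does not state which base estimator or R implementation was used for this step, so the code below is an assumption-laden illustration in Python rather than the authors' method: it uses a small hand-written logistic regression as the ranking model, synthetic data in place of the GEO datasets, and hypothetical names (`fit_logistic`, `rfe`, `gene_0` to `gene_5`).

```python
import math
import random


def fit_logistic(X, y, lr=0.1, epochs=200):
    """Fit logistic regression by batch gradient descent; return (weights, bias)."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            err = 1.0 / (1.0 + math.exp(-z)) - yi  # predicted probability minus label
            grad_b += err
            for j in range(p):
                grad_w[j] += err * xi[j]
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def rfe(X, y, names, n_keep):
    """Recursive feature elimination: refit, drop the smallest-|weight| feature, repeat."""
    active = list(range(len(names)))
    while len(active) > n_keep:
        X_sub = [[row[j] for j in active] for row in X]
        w, _ = fit_logistic(X_sub, y)
        weakest = min(range(len(active)), key=lambda k: abs(w[k]))
        del active[weakest]
    return [names[j] for j in active]


# Synthetic "expression" matrix (not study data): only gene_0 and gene_1 carry signal.
random.seed(0)
names = ["gene_%d" % j for j in range(6)]
y = [i % 2 for i in range(200)]  # 0 = control, 1 = case
X = []
for label in y:
    row = [random.gauss(0.0, 1.0) for _ in names]
    row[0] += 2.0 * label  # shifted upward in cases
    row[1] -= 2.0 * label  # shifted downward in cases
    X.append(row)

selected = rfe(X, y, names, n_keep=2)
print(selected)
```

Ranking by absolute weight is only meaningful here because all synthetic features share the same noise scale; with real expression data the features would normally be standardized first, and the number of features to keep would be chosen by cross-validation rather than fixed in advance.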

RESULTS

In total, 374 DEGs, eight gene modules, 33 hub genes and 12 OFGs (HTR4, KISS1, CA12, CAMK2B, KLK2, DDC, CNGB1, DERL1, BCL6, LILRA2, HCK, MTF2) were identified. In validation, the accuracies of the SVM, RF and LR models were 75.58%, 63.57% and 63.95%, with areas under the ROC curve (AUC) of 0.813 (95% confidence interval [CI] 0.761-0.866, P < 0.0001), 0.727 (95% CI 0.665-0.788, P < 0.0001) and 0.783 (95% CI 0.725-0.841, P < 0.0001), respectively.
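For readers unfamiliar with how an AUC and its confidence interval are obtained from classifier scores, the following Python sketch may help. The abstract does not say how the reported intervals were computed; the Hanley-McNeil approximation used here is one common choice and is an assumption, as are the function names and the toy scores, which are invented for illustration and are not study data.

```python
import math


def auc_mann_whitney(pos, neg):
    """AUC as the probability that a case outscores a control (ties count one half)."""
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def hanley_mcneil_ci(auc, n_pos, n_neg, z=1.96):
    """Approximate 95% CI for an AUC from the Hanley-McNeil (1982) standard error."""
    q1 = auc / (2.0 - auc)
    q2 = 2.0 * auc * auc / (1.0 + auc)
    var = (auc * (1.0 - auc)
           + (n_pos - 1) * (q1 - auc * auc)
           + (n_neg - 1) * (q2 - auc * auc)) / (n_pos * n_neg)
    se = math.sqrt(var)
    # Clip to [0, 1] because the normal approximation can overshoot near the bounds.
    return max(0.0, auc - z * se), min(1.0, auc + z * se)


# Toy classifier scores (made up for illustration).
case_scores = [0.9, 0.8, 0.7, 0.6, 0.4]
control_scores = [0.5, 0.3, 0.2, 0.1]

auc = auc_mann_whitney(case_scores, control_scores)
low, high = hanley_mcneil_ci(auc, len(case_scores), len(control_scores))
print("AUC = %.3f, 95%% CI %.3f-%.3f" % (auc, low, high))
```

With only nine toy samples the interval is very wide; the much narrower intervals reported in the abstract reflect the larger validation set.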

CONCLUSIONS

In conclusion, this study identified 12 signature genes involved in the pathogenic mechanism of CAD. Among the CAD classifiers built with the three machine learning methods, the SVM model performed best.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8b0d/8840658/072214218126/12872_2022_2481_Fig1_HTML.jpg
