Identification of Diagnostic Markers for Major Depressive Disorder Using Machine Learning Methods.

Author information

Zhao Shu, Bao Zhiwei, Zhao Xinyi, Xu Mengxiang, Li Ming D, Yang Zhongli

Affiliations

State Key Laboratory for Diagnosis and Treatment of Infectious Diseases, National Clinical Research Center for Infectious Diseases, Collaborative Innovation Center for Diagnosis and Treatment of Infectious Diseases, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, China.

Research Center for Air Pollution and Health, Zhejiang University, Hangzhou, China.

Publication information

Front Neurosci. 2021 Jun 18;15:645998. doi: 10.3389/fnins.2021.645998. eCollection 2021.

Abstract

BACKGROUND

Major depressive disorder (MDD) is a global health challenge that severely impairs patients' quality of life. The disorder can manifest in many forms, with different combinations of symptoms, which makes clinical diagnosis difficult. Robust biomarkers are urgently needed to improve diagnosis and to understand the etiology of the disease. The main purpose of this study was to build a predictive model for MDD diagnosis based on peripheral blood transcriptomes.

MATERIALS AND METHODS

We collected nine RNA expression datasets of MDD patients and healthy controls from the Gene Expression Omnibus (GEO) database. After quality control and heterogeneity testing, 302 samples from six studies were deemed suitable for analysis. The R package "MetaOmics" was used for a systematic meta-analysis of the genome-wide expression data. Receiver operating characteristic (ROC) curve analysis was used to evaluate the diagnostic performance of individual genes. To obtain a better diagnostic model, we also built models with support vector machine (SVM), random forest (RF), k-nearest neighbors (kNN), and naive Bayes (NB) classifiers, using RF for feature selection.
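The abstract does not say which combination method was run in MetaOmics. As a hedged illustration of what a p-value-combination meta-analysis does, the sketch below applies Fisher's method to each gene's per-study p-values and then a Benjamini-Hochberg correction across genes to obtain FDR (q) values. Fisher's method is only one of several options such a package may offer, and the function names, gene names (`GENE_A` etc.), and p-values are hypothetical.

```python
import math

def fisher_combined_pvalue(pvalues):
    """Combine one gene's independent per-study p-values with Fisher's method.

    Under the null hypothesis, X = -2 * sum(ln p_i) follows a chi-square
    distribution with 2k degrees of freedom (k = number of studies). For even
    degrees of freedom the survival function has a closed form:
    P(X > x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    """
    k = len(pvalues)
    half_x = -sum(math.log(p) for p in pvalues)  # equals x / 2
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= half_x / i  # builds (x/2)^i / i! incrementally
        total += term
    return math.exp(-half_x) * total

def benjamini_hochberg(pvalues):
    """Return BH-adjusted p-values (q-values) in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so the q-values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical per-study p-values (six studies) for three illustrative genes.
study_pvalues = {
    "GENE_A": [0.01, 0.03, 0.02, 0.04, 0.01, 0.05],
    "GENE_B": [0.40, 0.70, 0.20, 0.90, 0.50, 0.60],
    "GENE_C": [0.002, 0.01, 0.03, 0.001, 0.02, 0.04],
}
genes = list(study_pvalues)
combined = [fisher_combined_pvalue(study_pvalues[g]) for g in genes]
qvalues = benjamini_hochberg(combined)
significant = [g for g, q in zip(genes, qvalues) if q < 0.05]
print(significant)
```

Fisher's method rewards genes that are consistently small across studies; it assumes the studies are independent, which is one reason heterogeneity screening is done before pooling.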

RESULTS

Our analysis revealed six genes (, , , , , and ) differentially expressed between MDD patients and control subjects at a false discovery rate (FDR) < 0.05. We then evaluated the diagnostic ability of each gene individually: single-gene prediction yielded area under the curve (AUC) values of 0.63 ± 0.04, 0.67 ± 0.07, 0.70 ± 0.11, 0.64 ± 0.08, 0.68 ± 0.07, and 0.62 ± 0.09, respectively. Next, we constructed SVM, RF, kNN, and NB classifiers, which achieved AUCs of 0.84 ± 0.09, 0.81 ± 0.10, 0.73 ± 0.11, and 0.83 ± 0.09, respectively, in the validation datasets, suggesting that SVM might be the best choice for constructing an MDD diagnostic model. The final SVM classifier, which included 70 feature genes, distinguished MDD samples from healthy controls with an AUC of 0.78 in an independent dataset.
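To make the reported AUC values concrete, the sketch below computes a single-gene ROC AUC as the probability that a randomly chosen MDD sample has a higher score than a randomly chosen control (the Mann-Whitney formulation, with ties counted as one half). It then summarizes several datasets as "mean ± SD". The abstract does not state what the ± spread is taken over; treating it as the sample standard deviation across validation datasets is an assumption here, and the function names and toy expression values are hypothetical.

```python
from statistics import mean, stdev

def roc_auc(scores, labels):
    """AUC = P(score of a random case > score of a random control).

    Equivalent to the Mann-Whitney U statistic divided by n_pos * n_neg;
    ties contribute one half. labels: 1 = MDD, 0 = control.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one control")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def summarize_auc(per_dataset_aucs):
    """Format AUCs from several datasets as 'mean ± SD' (sample SD)."""
    return f"{mean(per_dataset_aucs):.2f} ± {stdev(per_dataset_aucs):.2f}"

# Hypothetical expression of one gene in two toy validation datasets.
datasets = [
    ([2.1, 2.5, 1.9, 1.2, 1.4, 2.0], [1, 1, 1, 0, 0, 0]),
    ([3.0, 2.2, 2.8, 2.6, 2.1, 2.9], [1, 1, 1, 0, 0, 0]),
]
aucs = [roc_auc(scores, labels) for scores, labels in datasets]
print(aucs, summarize_auc(aucs))
```

The pairwise loop is quadratic in sample size, which is fine for a few hundred samples; a rank-based formula scales better. For a gene that is downregulated in MDD, the expression value would be negated before scoring so that its AUC is not reported below 0.5.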

CONCLUSION

This study provides new insights into potential biomarkers through meta-analysis of GEO data. Constructing different machine learning models based on these biomarkers could be a valuable approach for diagnosing MDD in clinical practice.

Figure: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8157/8249859/1145046fee95/fnins-15-645998-g001.jpg