State Key Laboratory of Tea Plant Biology and Utilization, School of Tea and Food Sciences and Technology, Anhui Agricultural University, Hefei, Anhui Province, China.
School of Marine Science, Ningbo University, Ningbo, Zhejiang, China.
Comb Chem High Throughput Screen. 2023;26(2):424-435. doi: 10.2174/1386207325666220404123433.
The clinical diagnosis of major depressive disorder (MDD) relies mainly on subjective assessment of depression-like behaviors and on clinical examination. In the present study, we aimed to develop a novel diagnostic model for specifically predicting MDD.
The human brain dataset GSE102556 and the blood datasets GSE98793 and GSE76826 were downloaded from the Gene Expression Omnibus (GEO) database. We used a novel combined algorithm, random forest (RF) plus artificial neural network (ANN), to screen gene biomarkers and establish a diagnostic model of MDD.
Using the "limma" package in R, we identified 2653 differentially expressed genes (DEGs) in the GSE102556 dataset and 1786 DEGs in the GSE98793 dataset, of which 100 were shared between the two. We applied an RF algorithm to GSE98793 TrainData 1 and selected 28 genes as biomarkers. These 28 biomarkers were then verified on GSE98793 TestData 1, where their performance was found to be perfect. We further used an ANN algorithm to optimize the weight of each gene, building the ANN model on GSE98793 TrainData 2 with the neuralnet package in R. The resulting model was validated on GSE98793 TestData 2 and on the independent blood dataset GSE76826, with AUCs of 0.903 and 0.917, respectively.
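For readers less familiar with the AUC metric reported here, it can be read as the probability that a randomly chosen MDD sample receives a higher model score than a randomly chosen control sample. The following is a minimal sketch of that pairwise definition, assuming binary labels (1 = MDD, 0 = control) and one score per sample. The function name `roc_auc` and the toy labels and scores are hypothetical illustrations and are not taken from the study or its datasets.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Counts the fraction of (positive, negative) pairs in which the
    positive sample has the higher score; ties count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 3 cases (label 1) and 3 controls (label 0).
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]
toy_auc = roc_auc(toy_labels, toy_scores)
print(round(toy_auc, 3))
```

In this toy example, 8 of the 9 case-control pairs are ranked correctly, so the AUC is 8/9. An AUC of 1.0 means every case outscores every control, and 0.5 corresponds to chance-level ranking.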
To the best of our knowledge, this is the first time that a classifier constructed from DEG biomarkers has been used as an endophenotype for the clinical diagnosis of MDD. Our results may provide a new entry point for the diagnosis, treatment, outcome prediction, prognosis, and recurrence of MDD.