A comprehensive analysis of gene expression profiling data in COVID-19 patients for discovery of specific and differential blood biomarker signatures.

Affiliations

Department of Biotechnology, Faculty of Biological Science and Technology, The University of Isfahan, Isfahan, Iran.

Department of Plant Sciences and Biotechnology, Faculty of Life Sciences and Biotechnology, Shahid Beheshti University, Tehran, Iran.

Publication details

Sci Rep. 2023 Apr 5;13(1):5599. doi: 10.1038/s41598-023-32268-2.

Abstract

COVID-19 is a newly recognized illness with a predominantly respiratory presentation. Although initial analyses have identified groups of candidate gene biomarkers for diagnosing COVID-19, clinically applicable biomarkers have yet to be identified. Disease-specific diagnostic biomarkers are therefore needed in biofluids, along with markers that distinguish COVID-19 from other infectious diseases. Such biomarkers could also deepen knowledge of the disease's pathogenesis and help guide treatment. Eight transcriptomic profiles comparing COVID-19-infected and control samples from peripheral blood (PB), lung tissue, nasopharyngeal swabs and bronchoalveolar lavage fluid (BALF) were analyzed. In the first step, to find potential COVID-19 Specific Blood Differentially expressed genes (SpeBDs), we identified pathways shared between peripheral blood and the tissues most involved in COVID-19, and retained the blood DEGs that play a role in these shared pathways. In the second step, nine datasets covering three types of influenza (H1N1, H3N2 and B) were used: potential Differential Blood DEGs of COVID-19 versus influenza (DifBDs) were obtained by extracting the DEGs involved in pathways enriched by SpeBDs but not by influenza DEGs. In the third step, a machine learning method (a wrapper feature selection approach supervised by four classifiers: k-NN, Random Forest, SVM and Naïve Bayes) was used to narrow down the SpeBDs and DifBDs and to find their most predictive combinations, yielding potential COVID-19 Specific Blood Biomarker Signatures (SpeBBSs) and COVID-19 versus influenza Differential Blood Biomarker Signatures (DifBBSs), respectively. Models built from the SpeBBSs and DifBBSs with the corresponding algorithms were then evaluated on an external dataset. Among all DEGs extracted from the PB dataset (from PB pathways shared with BALF, lung and swab), 108 unique SpeBDs were obtained.
Feature selection using Random Forest outperformed the other classifiers and selected IGKC, IGLV3-16 and SRP9 from the SpeBDs as SpeBBSs. A Random Forest model built on these genes achieved 93.09% accuracy on an external validation dataset. Eighty-three pathways enriched by SpeBDs but by none of the influenza strains were identified, involving 87 DifBDs. Feature selection with the Naïve Bayes classifier on the DifBDs selected FMNL2, IGHV3-23, IGLV2-11 and RPL31 as the most predictive DifBBSs. A Naïve Bayes model built on these genes achieved 87.2% accuracy on an external validation dataset. Our study identified several candidate blood biomarkers for the potential specific and differential diagnosis of COVID-19. These candidates merit experimental investigation to validate their diagnostic potential.
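The first two filtering steps amount to set operations over pathway memberships. The sketch below illustrates them under several assumptions that the abstract does not confirm: pathway enrichment is taken to have been run beforehand, so each gene set's enriched pathways are given as plain sets; a pathway counts as "shared" if it is enriched in PB and in at least one of BALF, lung or swab (the abstract does not say whether all three tissues are required); and the function names, genes (GENE_A, ...) and pathways (P1, ...) are hypothetical toy values, not data from the study.

```python
# Hypothetical sketch of steps 1-2: pathway-based filtering of blood DEGs.
# Assumes enrichment analysis has already been run; every gene, pathway and
# dataset value below is invented toy data, not a result of the study.


def specific_blood_degs(blood_degs, gene_pathways, pb_pathways, tissue_pathways):
    """Step 1: keep blood DEGs that belong to a pathway enriched both in
    peripheral blood and in at least one tissue (assumed 'shared' rule)."""
    shared = set()
    for enriched in tissue_pathways.values():
        shared |= pb_pathways & enriched
    spe_bds = {g for g in blood_degs if gene_pathways.get(g, set()) & shared}
    return spe_bds, shared


def differential_blood_degs(spe_bds, gene_pathways, spebd_pathways, flu_pathways):
    """Step 2: keep SpeBDs lying in pathways enriched by the SpeBDs but by
    none of the influenza DEG sets."""
    flu_any = set().union(*flu_pathways.values())
    covid_only = spebd_pathways - flu_any
    dif_bds = {g for g in spe_bds if gene_pathways.get(g, set()) & covid_only}
    return dif_bds, covid_only


# Toy inputs (hypothetical).
GENE_PATHWAYS = {
    "GENE_A": {"P1"},
    "GENE_B": {"P2", "P3"},
    "GENE_C": {"P4"},
    "GENE_D": {"P3"},
}
BLOOD_DEGS = set(GENE_PATHWAYS)
PB_PATHWAYS = {"P1", "P2", "P3", "P4"}
TISSUE_PATHWAYS = {"BALF": {"P1", "P5"}, "lung": {"P3"}, "swab": {"P6"}}
SPEBD_PATHWAYS = {"P1", "P2", "P3"}  # assumed result of enrichment on the SpeBDs
FLU_PATHWAYS = {"H1N1": {"P3"}, "H3N2": {"P7"}, "B": set()}

spe_bds, shared = specific_blood_degs(BLOOD_DEGS, GENE_PATHWAYS, PB_PATHWAYS, TISSUE_PATHWAYS)
dif_bds, covid_only = differential_blood_degs(spe_bds, GENE_PATHWAYS, SPEBD_PATHWAYS, FLU_PATHWAYS)
print(sorted(spe_bds), sorted(dif_bds))  # ['GENE_A', 'GENE_B', 'GENE_D'] ['GENE_A', 'GENE_B']
```

Here GENE_C is dropped in step 1 because its only pathway (P4) is not enriched in any tissue, and GENE_D is dropped in step 2 because its only pathway (P3) is also enriched by the toy H1N1 set.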
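The third step uses wrapper feature selection, in which candidate gene subsets are scored by the accuracy of a classifier trained on them. The abstract does not specify the search strategy or evaluation scheme, so the sketch below assumes greedy forward selection scored by leave-one-out accuracy of a 1-nearest-neighbour classifier (k-NN being one of the four classifiers named above). The function names, genes (GENE_X, ...) and expression values are hypothetical toy data.

```python
import math

# Hypothetical sketch of step 3: wrapper feature selection.
# Assumed design: greedy forward search, scored by leave-one-out accuracy of a
# 1-nearest-neighbour classifier. Expression values are invented toy data.


def knn_predict(train_X, train_y, x, k=1):
    """Majority vote among the k training samples closest to x (Euclidean)."""
    # Sorting (distance, label) pairs breaks distance ties by label order.
    nearest = sorted((math.dist(row, x), label) for row, label in zip(train_X, train_y))[:k]
    votes = {}
    for _, label in nearest:
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)


def loo_accuracy(X, y, features, k=1):
    """Leave-one-out accuracy of k-NN restricted to the given feature indices."""
    correct = 0
    for i in range(len(X)):
        train_X = [[X[j][f] for f in features] for j in range(len(X)) if j != i]
        train_y = [y[j] for j in range(len(X)) if j != i]
        test_x = [X[i][f] for f in features]
        correct += knn_predict(train_X, train_y, test_x, k) == y[i]
    return correct / len(X)


def forward_select(X, y, names, k=1):
    """Add one gene at a time, keeping the addition that most improves
    accuracy; stop when no addition improves it (assumed stopping rule)."""
    selected, best_acc = [], 0.0
    remaining = list(range(len(names)))
    while remaining:
        scored = [(loo_accuracy(X, y, selected + [f], k), f) for f in remaining]
        acc, f = max(scored, key=lambda t: t[0])  # first gene wins ties
        if acc <= best_acc:
            break
        selected.append(f)
        remaining.remove(f)
        best_acc = acc
    return [names[f] for f in selected], best_acc


# Toy data: rows are samples, columns are expression levels of three genes.
GENES = ["GENE_X", "GENE_Y", "GENE_Z"]
X = [
    [5.0, 1.0, 2.0], [5.5, 3.0, 2.2], [6.0, 2.0, 1.8], [5.2, 0.5, 2.1],
    [1.0, 1.2, 2.0], [1.5, 2.8, 1.9], [0.8, 2.1, 2.2], [1.2, 0.6, 1.7],
]
y = ["COVID"] * 4 + ["control"] * 4

print(forward_select(X, y, GENES))  # (['GENE_X'], 1.0)
```

Swapping in Random Forest, SVM or Naïve Bayes would only change the scoring function; the greedy loop stays the same. On real data, the selected subset would then be frozen and the resulting model evaluated on an independent external dataset, as the study did.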

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9b43/10076301/3eaa64fd9732/41598_2023_32268_Fig1_HTML.jpg
