Identification of infectious disease-associated host genes using machine learning techniques.

Affiliations

Biomedical Informatics Centre, ICMR-National Institute of Cholera and Enteric Diseases, Kolkata, West Bengal, India.

Department of Computer Science and Engineering, Jadavpur University, Kolkata, West Bengal, India.

Publication information

BMC Bioinformatics. 2019 Dec 27;20(1):736. doi: 10.1186/s12859-019-3317-0.

Abstract

BACKGROUND

With the global spread of multidrug resistance in pathogenic microbes, infectious diseases have emerged as a key public health concern in recent times. Identifying host genes associated with infectious diseases will improve our understanding of the mechanisms underlying their development and help to identify novel therapeutic targets.

RESULTS

We developed a classification approach based on machine learning techniques that identifies infectious disease-associated host genes by integrating sequence features with protein interaction network features. Among the methods compared, a Deep Neural Network (DNN) model using 16 selected pseudo-amino acid composition (PAAC) and network-property features achieved the highest accuracy, 86.33%, with a sensitivity of 85.61% and a specificity of 86.57%. The DNN classifier also attained an accuracy of 83.33% on a blind dataset and a sensitivity of 83.1% on an independent dataset. Furthermore, to predict previously unknown infectious disease-associated host genes, we applied the proposed DNN model to all reviewed proteins in the database. Seventy-six of the 100 highest-scoring predicted infectious disease-associated genes were also found among experimentally verified human-pathogen protein-protein interactions (PPIs). Finally, we validated the highest-scoring predicted genes by disease and gene ontology enrichment analysis and found that many of them are shared with one or more other diseases, such as cancer, metabolic diseases and immune-related diseases.
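Accuracy, sensitivity and specificity, as reported above, are standard confusion-matrix metrics. The following is a minimal sketch of how they are computed, assuming infectious disease-associated genes are the positive class; the `ConfusionCounts` type and the counts in the usage lines are hypothetical and are not the study's data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts (hypothetical helper type).

    Positive class: infectious disease-associated host gene.
    """
    tp: int  # associated genes correctly predicted as associated
    fn: int  # associated genes the classifier missed
    tn: int  # non-associated genes correctly rejected
    fp: int  # non-associated genes wrongly predicted as associated


def sensitivity(c: ConfusionCounts) -> float:
    """True positive rate: TP / (TP + FN)."""
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """True negative rate: TN / (TN + FP)."""
    return c.tn / (c.tn + c.fp)


def accuracy(c: ConfusionCounts) -> float:
    """Fraction of all genes classified correctly: (TP + TN) / total."""
    return (c.tp + c.tn) / (c.tp + c.fn + c.tn + c.fp)


if __name__ == "__main__":
    # Illustrative counts only; NOT the confusion matrix from the study.
    counts = ConfusionCounts(tp=85, fn=15, tn=270, fp=30)
    print(f"sensitivity = {sensitivity(counts):.4f}")
    print(f"specificity = {specificity(counts):.4f}")
    print(f"accuracy    = {accuracy(counts):.4f}")
```

For a single confusion matrix, accuracy is the average of sensitivity and specificity weighted by the shares of positive and negative genes, so it always lies between the two. This is consistent with the reported accuracy of 86.33% falling between 85.61% and 86.57%.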

CONCLUSIONS

To the best of our knowledge, this is the first computational method to identify infectious disease-associated host genes. The proposed method will facilitate large-scale prediction of host genes associated with infectious diseases. However, our results indicate that, for small datasets, the more advanced DNN-based method offers no significant advantage over simpler supervised machine learning techniques, such as Support Vector Machines (SVM) or Random Forests (RF), for predicting infectious disease-associated host genes. The significant overlap of infectious diseases with cancer and metabolic diseases in the disease and gene ontology enrichment analyses suggests that these diseases perturb the same cellular signaling pathways and may be treatable with drugs that tend to reverse these perturbations. Moreover, identifying novel candidate genes associated with infectious diseases would help to further explain disease pathogenesis and to develop novel therapeutics.
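Disease and gene ontology enrichment analyses usually rest on an over-representation test. The abstract does not state which tool or statistic the authors used, so the following sketch assumes the common one-sided hypergeometric test: given a background of N genes, K of which carry an annotation (one disease or GO term), it estimates how surprising it is that at least k of n selected genes (for example, the top predicted genes) carry that annotation. The function name and parameters are illustrative.

```python
from math import comb


def hypergeometric_enrichment_p(overlap: int, annotated: int, selected: int, universe: int) -> float:
    """One-sided over-representation p-value (illustrative helper).

    universe:  number of background genes (N)
    annotated: background genes carrying the annotation, e.g. one disease or GO term (K)
    selected:  size of the tested gene list, e.g. top predicted genes (n)
    overlap:   genes that are both selected and annotated (k)

    Returns P(X >= overlap) for X ~ Hypergeometric(N, K, n).
    """
    if not 0 <= annotated <= universe or not 0 <= selected <= universe:
        raise ValueError("set sizes must lie between 0 and the universe size")
    lo = max(0, selected - (universe - annotated))  # smallest feasible overlap
    hi = min(annotated, selected)  # largest feasible overlap
    if not lo <= overlap <= hi:
        raise ValueError("overlap is not feasible for these set sizes")
    # Sum the probability mass of every overlap at least as large as observed.
    # Integer arithmetic keeps the tail exact; the final division rounds once.
    tail = sum(
        comb(annotated, k) * comb(universe - annotated, selected - k)
        for k in range(overlap, hi + 1)
    )
    return tail / comb(universe, selected)
```

When many disease or GO terms are tested, the resulting p-values are normally corrected for multiple testing (for example, with the Benjamini-Hochberg procedure) before an enrichment is called significant.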

Fig. 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9d55/6935192/9221374ba8df/12859_2019_3317_Fig1_HTML.jpg
