Artificial neural networks trained to detect viral and phage structural proteins.

Affiliation

Program of Computational Science, San Diego State University, San Diego, California, United States of America.

Publication information

PLoS Comput Biol. 2012;8(8):e1002657. doi: 10.1371/journal.pcbi.1002657. Epub 2012 Aug 23.

Abstract

Phages play critical roles in the survival and pathogenicity of their hosts, via lysogenic conversion factors, and in nutrient redistribution, via cell lysis. Analyses of phage- and viral-encoded genes in environmental samples provide insights into the physiological impact of viruses on microbial communities and human health. However, phage open reading frames (ORFs) are extremely diverse: over 70% of them are dissimilar to any gene with an annotated function in GenBank. Better identification of viruses would also aid disease detection and diagnosis, vaccine development, and, more generally, understanding of the physiological potential of any environment. In contrast to enzyme function, viral structural protein function can be much more challenging to detect from sequence data because of low sequence conservation, few known conserved catalytic sites or sequence domains, and relatively limited experimental data. We have designed a method for predicting phage structural protein sequences using artificial neural networks (ANNs). First, we trained ANNs to classify viral structural proteins using amino acid frequencies as input features; these networks correctly classify a large fraction of test cases with high specificity and sensitivity. We then added an estimate of each protein's isoelectric point as a feature to ANNs that classify specialized protein families, namely major capsid and tail proteins. As expected, these more specialized ANNs are more accurate than the general structural ANNs. To validate the ANN predictions experimentally, we examined by transmission electron microscopy several ORFs that the ANNs predicted to be structural proteins but that have no significant similarity to known sequences. Some of these self-assembled into structures strongly resembling virion structures. Thus, our ANNs are new tools for identifying phage and potential prophage structural proteins that are difficult or impossible to detect by other bioinformatic analyses. The networks will be valuable when sequence data are available but in vitro propagation of the phage may be impractical or impossible.
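The abstract names two kinds of input features. All networks use amino acid frequencies, and the capsid and tail networks also use an estimated isoelectric point (pI). The following is a minimal, hypothetical sketch of how such a feature vector could be computed. It is not the authors' code.

Several choices here are assumptions and do not come from the paper:

- the pKa table;
- the Henderson-Hasselbalch charge model solved by bisection for the pI;
- scaling the pI to [0, 1];
- the helper names `aa_frequencies`, `net_charge`, `isoelectric_point`, and `feature_vector`.

```python
from collections import Counter

# The 20 standard amino acids; each frequency becomes one ANN input.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Approximate pKa values: an ASSUMPTION (one commonly used set, not taken
# from the paper). Keys are termini or one-letter residue codes.
PKA_POSITIVE = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def aa_frequencies(seq: str) -> list[float]:
    """Fraction of each standard residue in seq; other letters are ignored."""
    counts = Counter(c for c in seq.upper() if c in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


def net_charge(seq: str, ph: float) -> float:
    """Net charge at a given pH using the Henderson-Hasselbalch equation."""
    counts = Counter(seq.upper())
    charge = 0.0
    for group, pka in PKA_POSITIVE.items():
        n = 1 if group == "Nterm" else counts[group]
        # Fraction of basic groups still protonated (each carries +1).
        charge += n / (1.0 + 10.0 ** (ph - pka))
    for group, pka in PKA_NEGATIVE.items():
        n = 1 if group == "Cterm" else counts[group]
        # Fraction of acidic groups deprotonated (each carries -1).
        charge -= n / (1.0 + 10.0 ** (pka - ph))
    return charge


def isoelectric_point(seq: str, tol: float = 1e-4) -> float:
    """pH at which net_charge is zero, found by bisection on [0, 14].

    Net charge decreases monotonically as pH rises, so bisection converges.
    """
    lo, hi = 0.0, 14.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if net_charge(seq, mid) > 0.0:
            lo = mid  # still positively charged: pI lies at a higher pH
        else:
            hi = mid
    return (lo + hi) / 2.0


def feature_vector(seq: str, include_pi: bool = False) -> list[float]:
    """20 amino acid frequencies, optionally followed by pI scaled to [0, 1]."""
    features = aa_frequencies(seq)
    if include_pi:
        # Scaling by 14 (the pH range) is an ASSUMPTION to keep inputs in [0, 1].
        features.append(isoelectric_point(seq) / 14.0)
    return features


if __name__ == "__main__":
    toy = "MKTAYIAKQR"  # arbitrary toy sequence, not from the paper
    print(len(feature_vector(toy)))                   # 20
    print(len(feature_vector(toy, include_pi=True)))  # 21
    print(round(isoelectric_point("G"), 2))           # 6.1
```

The resulting 20-element vector (general structural classifier) or 21-element vector (capsid or tail classifier) would serve as the input layer of a feed-forward network. Training that network requires labeled structural and non-structural sequences, which this sketch does not include.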
