Expanding kinetoplastid genome annotation through protein structure comparison.

Author information

Trinidad-Barnech Juan Manuel, Sotelo-Silveira José, Do Porto Darío Fernández, Smircich Pablo

Affiliations

Laboratorio de Bioinformática, Departamento de Genómica, Instituto de Investigaciones Biológicas Clemente Estable, MEC, Montevideo, Uruguay.

Laboratorio de Genómica Evolutiva, Sección Biomatemática, Facultad de Ciencias, Universidad de la República, Montevideo, Uruguay.

Publication information

PLoS Pathog. 2025 Apr 21;21(4):e1013120. doi: 10.1371/journal.ppat.1013120. eCollection 2025 Apr.

Abstract

Kinetoplastids belong to the Discoba supergroup, an early-diverging eukaryotic clade. Although the amount of genomic information on these parasites has grown substantially, assigning gene functions through traditional sequence-based homology methods remains challenging. Recently, significant advances have been made in in silico protein structure prediction and in algorithms for rapid and precise large-scale protein structure comparisons. In this work, we developed a protein structure-based homology search pipeline (ASC, Annotation by Structural Comparisons) and applied it to transfer biological information to all kinetoplastid proteins available in TriTrypDB, the reference database for this lineage. Our pipeline enabled the assignment of structural similarity to a substantial portion of kinetoplastid proteins, improving current knowledge through annotation transfer. Additionally, we identified structural homologs for representatives of 6,700 uncharacterized proteins across 33 kinetoplastid species, proteins that could not be annotated using existing sequence-based tools and databases. As a result, this approach allowed us to infer potential biological information for a considerable number of kinetoplastid proteins. Among these, we identified structural homologs to ubiquitous eukaryotic proteins that are challenging to detect in kinetoplastid genomes through standard genome annotation pipelines. The results (KASC, Kinetoplastid Annotation by Structural Comparison) are openly accessible to the community at kasc.fcien.edu.uy through a user-friendly, gene-by-gene interface that enables visual inspection of the data.
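The abstract does not specify how ASC scores or filters structural matches. Purely as an illustration of the general idea of annotation transfer from a structure-similarity search, and not as the authors' implementation, the step can be pictured as: for each query protein, keep only hits below an E-value cutoff, then copy the description of the best-scoring target that carries an annotation. The tab-separated hit layout (query, target, E-value), the `max_evalue=1e-3` cutoff, and every identifier below (`parse_hits`, `transfer_annotations`, `query_1`, `target_A`, ...) are hypothetical assumptions.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One structural-search hit (hypothetical layout: query, target, E-value)."""
    query: str
    target: str
    evalue: float


def parse_hits(lines):
    """Parse tab-separated hit lines, skipping blanks and '#' comment lines.

    Assumes the first three columns are query ID, target ID and E-value.
    """
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        query, target, evalue = line.split("\t")[:3]
        hits.append(Hit(query, target, float(evalue)))
    return hits


def transfer_annotations(hits, target_annotations, max_evalue=1e-3):
    """Give each query the annotation of its lowest-E-value annotated hit.

    Hits above max_evalue, or whose target has no known annotation,
    are ignored; queries with no surviving hit are left unannotated.
    """
    best = {}
    for hit in hits:
        if hit.evalue > max_evalue or hit.target not in target_annotations:
            continue
        current = best.get(hit.query)
        if current is None or hit.evalue < current.evalue:
            best[hit.query] = hit
    return {q: (h.target, target_annotations[h.target]) for q, h in best.items()}


if __name__ == "__main__":
    # Toy input with made-up identifiers and annotations.
    table = [
        "query_1\ttarget_A\t1e-10",
        "query_1\ttarget_B\t1e-20",
        "query_2\ttarget_C\t0.5",   # fails the E-value cutoff
        "query_3\ttarget_D\t1e-8",  # target has no annotation
    ]
    annotations = {"target_A": "kinase", "target_B": "phosphatase", "target_C": "helicase"}
    print(transfer_annotations(parse_hits(table), annotations))
    # → {'query_1': ('target_B', 'phosphatase')}
```

Choosing the single best hit keeps the sketch short; a real pipeline would likely also weigh alignment coverage or structural similarity scores and report several candidate homologs for manual inspection.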
