Trinidad-Barnech Juan Manuel, Sotelo-Silveira José, Do Porto Darío Fernández, Smircich Pablo
Laboratorio de Bioinformática, Departamento de Genómica, Instituto de Investigaciones Biológicas Clemente Estable, MEC, Montevideo, Uruguay.
Laboratorio de Genómica Evolutiva, Sección Biomatemática, Facultad de Ciencias, Universidad de la República, Montevideo, Uruguay.
PLoS Pathog. 2025 Apr 21;21(4):e1013120. doi: 10.1371/journal.ppat.1013120. eCollection 2025 Apr.
Kinetoplastids belong to the Discoba supergroup, an early-diverging eukaryotic clade. Although the amount of genomic information on these parasites has grown substantially, assigning gene functions through traditional sequence-based homology methods remains challenging. Recently, significant advances have been made in in silico protein structure prediction and in algorithms for rapid and precise large-scale protein structure comparison. In this work, we developed a protein structure-based homology search pipeline (ASC, Annotation by Structural Comparisons) and applied it to transfer biological information to all kinetoplastid proteins available in TriTrypDB, the reference database for this lineage. Our pipeline identified structural similarity for a substantial portion of kinetoplastid proteins, improving current knowledge through annotation transfer. Additionally, we identified structural homologs for representatives of 6,700 uncharacterized proteins across 33 kinetoplastid species, proteins that could not be annotated using existing sequence-based tools and databases. As a result, this approach allowed us to infer potential biological information for a considerable number of kinetoplastid proteins. Among these, we identified structural homologs of ubiquitous eukaryotic proteins that are challenging to detect in kinetoplastid genomes through standard genome annotation pipelines. The results (KASC, Kinetoplastid Annotation by Structural Comparison) are openly accessible to the community at kasc.fcien.edu.uy through a user-friendly, gene-by-gene interface that enables visual inspection of the data.
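The idea of annotation transfer by structural comparison can be illustrated with a minimal Python sketch. This is not the ASC implementation: the `StructuralHit` record, the `transfer_annotations` function, the 0 to 1 score scale, and the 0.5 cutoff are all hypothetical, chosen only to show the general pattern of keeping the best-scoring structural match per query protein and copying its annotation.

```python
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class StructuralHit:
    """One structural match between a query protein and an annotated target.

    All fields are hypothetical; a real structure-comparison tool reports
    its own columns and score definitions.
    """
    query_id: str
    target_id: str
    score: float            # assumed similarity score in [0, 1], higher is better
    target_annotation: str  # functional description of the target protein


def transfer_annotations(
    hits: Iterable[StructuralHit],
    min_score: float = 0.5,  # assumed cutoff, not taken from the paper
) -> Dict[str, str]:
    """Return {query_id: annotation} using the best hit at or above min_score."""
    best: Dict[str, StructuralHit] = {}
    for hit in hits:
        if hit.score < min_score:
            continue  # too dissimilar to justify transferring an annotation
        current = best.get(hit.query_id)
        if current is None or hit.score > current.score:
            best[hit.query_id] = hit
    return {query: hit.target_annotation for query, hit in best.items()}
```

Queries with no hit above the cutoff are simply absent from the result, which mirrors the fact that some proteins remain unannotated after any homology search.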