FastAAI: efficient estimation of genome average amino acid identity and phylum-level relationships using tetramers of universal proteins.

Authors

Gerhardt Kenji, Ruiz-Perez Carlos A, Rodriguez-R Luis M, Jain Chirag, Tiedje James M, Cole James R, Konstantinidis Konstantinos T

Affiliations

School of Biological Sciences, Georgia Institute of Technology, Atlanta, GA 30332, United States.

School of Civil and Environmental Engineering, Georgia Institute of Technology, Atlanta, GA 30332, United States.

Publication information

Nucleic Acids Res. 2025 Apr 22;53(8). doi: 10.1093/nar/gkaf348.

Abstract

Estimation of whole-genome relatedness and taxonomic identification are two important bioinformatics tasks in describing environmental or clinical microbiomes. The genome-aggregate Average Nucleotide Identity is routinely used to derive the relatedness of closely related (species level) microbial and viral genomes, but it is not appropriate for more divergent genomes. Average Amino-acid Identity (AAI) can be used in the latter cases, but no current AAI implementation can efficiently compare thousands of genomes. Here we present FastAAI, a tool that estimates whole-genome pairwise relatedness using shared tetramers of universal proteins in a matter of microseconds, providing a speedup of up to 5 orders of magnitude when compared with current methods for calculating AAI or alternative whole-genome metrics. Further, FastAAI resolves distantly related genomes related at the phylum level with comparable accuracy to the phylogeny of ribosomal RNA genes, substantially improving on a known limitation of current AAI implementations. Our analysis of the resulting AAI matrices also indicated that bacterial lineages predominantly evolve gradually, rather than showing bursts of diversification, and that AAI thresholds to define classes, orders, and families are generally elusive. Therefore, FastAAI uniquely expands the toolbox for microbiome analysis and allows it to scale to millions of genomes.
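The idea of comparing genomes through shared tetramers of universal proteins can be illustrated with a minimal sketch. The abstract does not say how tetramer sharing is scored or converted to an AAI value, so the Jaccard index used here is an assumption. The function names and the dictionary layout (protein name mapped to amino-acid sequence) are also hypothetical. This is not the FastAAI implementation.

```python
from typing import Dict, Optional, Set


def tetramers(protein: str, k: int = 4) -> Set[str]:
    """Return the set of overlapping amino-acid k-mers (k=4 gives tetramers)."""
    return {protein[i:i + k] for i in range(len(protein) - k + 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard index of two sets; defined as 0.0 when both are empty."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


def mean_tetramer_similarity(
    genome_a: Dict[str, str], genome_b: Dict[str, str]
) -> Optional[float]:
    """Average the per-protein tetramer Jaccard index over shared proteins.

    Each genome is a hypothetical mapping from a universal-protein name to
    its amino-acid sequence. Returns None when no protein is shared.
    Assumption: a plain mean of Jaccard indices; the abstract does not
    describe the actual scoring or its mapping to AAI.
    """
    shared = sorted(genome_a.keys() & genome_b.keys())
    if not shared:
        return None
    scores = [
        jaccard(tetramers(genome_a[name]), tetramers(genome_b[name]))
        for name in shared
    ]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Toy sequences, far shorter than real proteins, for illustration only.
    a = {"p1": "ABCDE", "p2": "MKVLA"}
    b = {"p1": "ABCDX", "p2": "MKVLA", "p3": "QQQQQ"}
    print(mean_tetramer_similarity(a, b))
```

Because each protein reduces to a set of short strings, a pairwise comparison needs only set intersections, not sequence alignment. This is one plausible reason such an approach can be fast.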

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/355e/12034039/032cdb0eff33/gkaf348figgra1.jpg
