SCAMPP: Scaling Alignment-Based Phylogenetic Placement to Large Trees.

Author information

Wedell Eleanor, Cai Yirong, Warnow Tandy

Publication information

IEEE/ACM Trans Comput Biol Bioinform. 2023 Mar-Apr;20(2):1417-1430. doi: 10.1109/TCBB.2022.3170386. Epub 2023 Apr 3.

Abstract

Phylogenetic placement, the problem of placing a "query" sequence into a precomputed phylogenetic "backbone" tree, is useful for constructing large trees, performing taxon identification of newly obtained sequences, and other applications. The most accurate current methods, such as pplacer and EPA-ng, are based on maximum likelihood and require that the query sequence be provided within a multiple sequence alignment that includes the leaf sequences in the backbone tree. This approach enables high accuracy but also makes these likelihood-based methods computationally intensive on large backbone trees, and can even lead to them failing when the backbone trees are very large (e.g., having 50,000 or more leaves). We present SCAMPP (SCaling AlignMent-based Phylogenetic Placement), a technique to extend the scalability of these likelihood-based placement methods to ultra-large backbone trees. We show that pplacer-SCAMPP and EPA-ng-SCAMPP both scale well to ultra-large backbone trees (even up to 200,000 leaves), with accuracy that improves on APPLES and APPLES-2, two recently developed fast phylogenetic placement methods that scale to ultra-large datasets. EPA-ng-SCAMPP and pplacer-SCAMPP are available at https://github.com/chry04/PLUSplacer.
