A rapid phylogeny-based method for accurate community profiling of large-scale metabarcoding datasets.

Affiliations

Department of Integrative Biology, University of California, Berkeley, Berkeley, United States.

GLOBE Institute, University of Copenhagen, Copenhagen, Denmark.

Publication information

eLife. 2024 Aug 15;13:e85794. doi: 10.7554/eLife.85794.

DOI:10.7554/eLife.85794

PMID:39145536

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11377034/

Abstract

Environmental DNA (eDNA) is becoming an increasingly important tool in diverse scientific fields from ecological biomonitoring to wastewater surveillance of viruses. The fundamental challenge in eDNA analyses has been the bioinformatical assignment of reads to taxonomic groups. It has long been known that full probabilistic methods for phylogenetic assignment are preferable, but unfortunately, such methods are computationally intensive and are typically inapplicable to modern next-generation sequencing data. We present a fast approximate likelihood method for phylogenetic assignment of DNA sequences. Applying the new method to several mock communities and simulated datasets, we show that it identifies more reads at both high and low taxonomic levels more accurately than other leading methods. The advantage of the method is particularly apparent in the presence of polymorphisms and/or sequencing errors and when the true species is not represented in the reference database.
