Center for Bioinformatics and Computational Biology, University of Maryland, College Park, USA.
BMC Genomics. 2011;12 Suppl 2(Suppl 2):S4. doi: 10.1186/1471-2164-12-S2-S4. Epub 2011 Jul 27.
A major goal of metagenomics is to characterize the microbial composition of an environment. The most popular approach relies on 16S rRNA sequencing; however, this approach can generate biased estimates, both because the copy number of the gene differs even between closely related organisms and because of PCR artifacts. The taxonomic composition can also be determined from metagenomic shotgun sequencing data by matching individual reads against a database of reference sequences. One major limitation of prior computational methods used for this purpose is their use of a single, universal classification threshold for all genes at all taxonomic levels.
We propose that better classification results can be obtained by tuning the taxonomic classifier separately for each match length, reference gene, and taxonomic level. We present a novel taxonomic classifier, MetaPhyler (http://metaphyler.cbcb.umd.edu), which uses phylogenetic marker genes as a taxonomic reference. Results on simulated datasets demonstrate that MetaPhyler outperforms other tools commonly used in this context (CARMA, Megan and PhymmBL). We also present results from the analysis of a real metagenomic dataset.
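To make the contrast with a universal threshold concrete, here is a minimal sketch of a classifier whose cutoff depends on the reference gene, the taxonomic rank, and the alignment length. The abstract does not say how MetaPhyler derives or represents its thresholds, so everything here is an assumption for illustration: the `Hit` fields, the `classify` function, the linear form "minimum score = slope × alignment length + intercept", and all cutoff values are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

# Taxonomic ranks ordered from most specific to most general (assumed rank set).
RANKS = ["genus", "family", "order", "class", "phylum"]

# Hypothetical cutoff table: (gene, rank) -> (slope, intercept), so that
# the minimum required score = slope * alignment_length + intercept.
Cutoffs = Dict[Tuple[str, str], Tuple[float, float]]


@dataclass(frozen=True)
class Hit:
    """Best match of one read against a reference marker gene (hypothetical fields)."""
    gene: str                # reference marker gene name
    align_len: int           # length of the read-to-reference alignment
    score: float             # similarity score, e.g. a bit score
    lineage: Dict[str, str]  # rank -> taxon name of the matched reference


def classify(hit: Hit, cutoffs: Cutoffs) -> Optional[Tuple[str, str]]:
    """Return (rank, taxon) for the most specific rank whose cutoff the hit passes.

    Unlike a universal threshold, the cutoff is looked up per (gene, rank)
    and scaled by the alignment length of this particular hit.
    """
    for rank in RANKS:
        params = cutoffs.get((hit.gene, rank))
        if params is None:
            continue  # no cutoff defined for this gene at this rank
        slope, intercept = params
        if hit.score >= slope * hit.align_len + intercept:
            return rank, hit.lineage[rank]
    return None  # too divergent to assign at any rank
```

With made-up cutoffs of (1.8, 10.0) at genus and (1.5, 5.0) at family for a gene, a 100-bp alignment scoring 160 fails the genus cutoff (190) but passes the family cutoff (155), so the read is assigned at family level rather than being forced through one threshold shared by every gene and rank.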
We have introduced a novel taxonomic classification method for analyzing the microbial diversity from whole-metagenome shotgun sequences. Compared with previous approaches, MetaPhyler is much more accurate in estimating the phylogenetic composition. In addition, we have shown that MetaPhyler can be used to guide the discovery of novel organisms from metagenomic samples.