RAIphy: phylogenetic classification of metagenomics samples using iterative refinement of relative abundance index profiles.

Affiliation

Department of Electrical Engineering, 209N WSEC University of Nebraska-Lincoln, Lincoln, NE 68588-0511, USA.

Publication information

BMC Bioinformatics. 2011 Jan 31;12:41. doi: 10.1186/1471-2105-12-41.

Abstract

BACKGROUND

Computational analysis of metagenomes requires the taxonomic assignment of the genome contigs assembled from DNA reads of environmental samples. Because of the diverse nature of microbiomes, the length of the assemblies obtained can vary from a few hundred bp to a few hundred Kbp. Current taxonomic classification algorithms provide accurate classification for long contigs or for short fragments from organisms that have close relatives with annotated genomes. These are significant limitations for metagenome analysis because of the complexity of microbiomes and the paucity of existing annotated genomes.

RESULTS

We propose a robust taxonomic classification method, RAIphy, that uses a novel sequence similarity metric with iterative refinement of taxonomic models and functions effectively without these limitations. We have tested RAIphy with synthetic metagenomics data ranging from 100 bp to 50 Kbp. For sequence reads of 100 bp to 1000 bp, the sensitivity of RAIphy ranges from 38% to 81%, outperforming the currently popular composition-based methods for reads in this range. Comparison with computationally more intensive sequence similarity methods shows that RAIphy performs competitively while being significantly faster. The sensitivity-specificity characteristics for relatively longer contigs were compared with those of the PhyloPythia and TACOA algorithms; RAIphy performs better than these algorithms at varying clade levels. For an acid mine drainage (AMD) metagenome, RAIphy was able to taxonomically bin the sequence read set more accurately than the currently available methods, Phymm and MEGAN, and more accurately in two out of three tests than the much more computationally intensive method, PhymmBL.
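The abstract reports sensitivity and specificity without defining them. As a rough illustration only, the sketch below computes both per clade, assuming the definitions often used in binning evaluations: sensitivity is the fraction of a clade's fragments that were assigned correctly, and specificity is the fraction of fragments assigned to a clade that truly belong to it. The function name `per_clade_metrics` and the use of `None` for unassigned fragments are hypothetical choices made for this example, not part of RAIphy.

```python
from collections import Counter

def per_clade_metrics(true_labels, predicted_labels):
    """Return {clade: (sensitivity, specificity)}.

    Assumed definitions (not taken from the abstract):
      sensitivity = correct assignments / fragments truly in the clade
      specificity = correct assignments / fragments assigned to the clade
    A predicted label of None marks a fragment left unassigned.
    """
    true_counts = Counter(true_labels)
    pred_counts = Counter(p for p in predicted_labels if p is not None)
    correct = Counter(t for t, p in zip(true_labels, predicted_labels) if t == p)

    metrics = {}
    for clade, n_true in true_counts.items():
        sensitivity = correct[clade] / n_true
        n_pred = pred_counts[clade]
        specificity = correct[clade] / n_pred if n_pred else 0.0
        metrics[clade] = (sensitivity, specificity)
    return metrics
```

Under these definitions an unassigned fragment lowers the sensitivity of its true clade but does not affect the specificity of any clade.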

CONCLUSIONS

With the introduction of the relative abundance index metric and an iterative classification method, we propose a taxonomic classification algorithm that performs competitively for a wide range of DNA contig lengths assembled from metagenome data. Because of its speed, simplicity, and accuracy, RAIphy can be successfully used in the binning process for a broad range of metagenomic data obtained from environmental samples.
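The abstract names the two ingredients, a composition-based similarity score and iterative refinement of the taxonomic models, but does not give the formula for the relative abundance index. The sketch below therefore illustrates only the general idea, under loudly stated assumptions: the score is a stand-in (smoothed k-mer log-odds against a uniform background, not the RAI), `K = 3` is an arbitrary illustrative k-mer length, and every function name is hypothetical. Each round re-estimates the per-taxon profiles from the reference counts plus the fragments currently assigned to that taxon, then re-classifies until the labels stop changing.

```python
import math
from collections import Counter
from itertools import product

K = 3  # illustrative k-mer length, not the value used by RAIphy
ALL_KMERS = ["".join(p) for p in product("ACGT", repeat=K)]

def kmer_counts(seq, k=K):
    """Count overlapping k-mers, skipping any containing non-ACGT symbols."""
    seq = seq.upper()
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= set("ACGT"):
            counts[kmer] += 1
    return counts

def build_profile(counts):
    """Stand-in profile (NOT the RAI): add-one-smoothed log-odds of each
    k-mer frequency against a uniform background."""
    total = sum(counts.values()) + len(ALL_KMERS)
    uniform = 1.0 / len(ALL_KMERS)
    return {km: math.log(((counts[km] + 1) / total) / uniform) for km in ALL_KMERS}

def score(fragment, profile):
    """Average profile weight over the k-mers of a fragment."""
    counts = kmer_counts(fragment)
    n = sum(counts.values())
    if n == 0:
        return float("-inf")
    return sum(c * profile[km] for km, c in counts.items()) / n

def classify(fragments, profiles):
    """Assign each fragment to the taxon whose profile scores it highest."""
    return [max(profiles, key=lambda name: score(f, profiles[name])) for f in fragments]

def iterative_binning(reference_genomes, fragments, max_rounds=10):
    """Classify, fold assigned fragments into the models, and repeat."""
    base = {name: kmer_counts(seq) for name, seq in reference_genomes.items()}
    profiles = {name: build_profile(c) for name, c in base.items()}
    labels = classify(fragments, profiles)
    for _ in range(max_rounds):
        refined = {name: Counter(c) for name, c in base.items()}
        for frag, label in zip(fragments, labels):
            refined[label].update(kmer_counts(frag))
        profiles = {name: build_profile(c) for name, c in refined.items()}
        new_labels = classify(fragments, profiles)
        if new_labels == labels:  # assignments are stable, stop refining
            break
        labels = new_labels
    return labels
```

Rebuilding the refined counts from the untouched reference counts in every round keeps a fragment from being counted twice when its label does not change between rounds.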


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5bca/3038895/ac9f29ab3d40/1471-2105-12-41-1.jpg
