MetaCAA: A clustering-aided methodology for efficient assembly of metagenomic datasets.

Author information

Reddy Rachamalla Maheedhar, Mohammed Monzoorul Haque, Mande Sharmila S

Affiliation

Bio-Sciences R&D Division, TCS Innovation Labs, Tata Research Development & Design Centre, Tata Consultancy Services Ltd., 54-B Hadapsar Industrial Estate, Pune 411013, Maharashtra, India.

Publication information

Genomics. 2014 Feb-Mar;103(2-3):161-8. doi: 10.1016/j.ygeno.2014.02.007. Epub 2014 Mar 5.

DOI:10.1016/j.ygeno.2014.02.007

PMID:24607570

Abstract

A key challenge in analyzing metagenomics data pertains to assembly of sequenced DNA fragments (i.e. reads) originating from various microbes in a given environmental sample. Several existing methodologies can assemble reads originating from a single genome. However, these methodologies cannot be applied for efficient assembly of metagenomic sequence datasets. In this study, we present MetaCAA - a clustering-aided methodology which helps in improving the quality of metagenomic sequence assembly. MetaCAA initially groups sequences constituting a given metagenome into smaller clusters. Subsequently, sequences in each cluster are independently assembled using CAP3, an existing single genome assembly program. Contigs formed in each of the clusters along with the unassembled reads are then subjected to another round of assembly for generating the final set of contigs. Validation using simulated and real-world metagenomic datasets indicates that MetaCAA aids in improving the overall quality of assembly. A software implementation of MetaCAA is available at https://metagenomics.atc.tcs.com/MetaCAA.
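The two-round workflow described in the abstract can be illustrated with a minimal sketch. The abstract does not say how reads are clustered, so the GC-content binning below (`gc_bin`, `cluster_reads`) is purely a hypothetical stand-in. Likewise, `toy_assemble` is a toy greedy overlap merger used in place of CAP3 so that the example runs without external tools. It is not CAP3 and does not reflect the authors' implementation.

```python
from collections import defaultdict

def gc_bin(read, n_bins=4):
    """Hypothetical clustering key: bin a read by its GC fraction."""
    gc = sum(base in "GC" for base in read.upper())
    frac = gc / len(read) if read else 0.0
    return min(int(frac * n_bins), n_bins - 1)

def cluster_reads(reads, n_bins=4):
    """Group reads into smaller clusters (assumed criterion: GC bin)."""
    clusters = defaultdict(list)
    for read in reads:
        clusters[gc_bin(read, n_bins)].append(read)
    return clusters

def overlap(a, b, min_len):
    """Longest suffix of a equal to a prefix of b, if >= min_len; else 0."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0

def toy_assemble(seqs, min_overlap=4):
    """Toy stand-in for a single-genome assembler such as CAP3.

    Greedily merges sequences with exact suffix/prefix overlaps and
    returns (contigs, unassembled_sequences).
    """
    items = [(s, False) for s in dict.fromkeys(seqs)]  # (sequence, is_contig)
    merged = True
    while merged:
        merged = False
        for i in range(len(items)):
            for j in range(len(items)):
                if i == j:
                    continue
                k = overlap(items[i][0], items[j][0], min_overlap)
                if k:
                    joined = items[i][0] + items[j][0][k:]
                    items = [it for n, it in enumerate(items) if n not in (i, j)]
                    items.append((joined, True))
                    merged = True
                    break
            if merged:
                break
    contigs = [s for s, is_contig in items if is_contig]
    singletons = [s for s, is_contig in items if not is_contig]
    return contigs, singletons

def two_stage_assembly(reads, assemble=toy_assemble, n_bins=4):
    """Round 1: assemble each cluster independently.
    Round 2: re-assemble the pooled contigs and unassembled reads."""
    pooled, stage1_contigs = [], set()
    for cluster in cluster_reads(reads, n_bins).values():
        contigs, singletons = assemble(cluster)
        stage1_contigs.update(contigs)
        pooled.extend(contigs + singletons)
    contigs, leftovers = assemble(pooled)
    # Round-1 contigs that did not merge further still count as contigs.
    final_contigs = contigs + [s for s in leftovers if s in stage1_contigs]
    unassembled = [s for s in leftovers if s not in stage1_contigs]
    return final_contigs, unassembled

genome = "ACGTTGCAAGGCTTACATTTAAAT"  # made-up sequence for illustration
reads = [genome[0:10], genome[6:16], genome[12:24]]
final_contigs, unassembled = two_stage_assembly(reads)
```

In this made-up example the third read falls into a different GC bin from the first two, so it stays unassembled in the first round and is joined to the cluster-level contig only in the second round. That is the role the abstract assigns to the final assembly step.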

Similar articles

COGNIZER: A Framework for Functional Annotation of Metagenomic Datasets.

PLoS One. 2015 Nov 11;10(11):e0142102. doi: 10.1371/journal.pone.0142102. eCollection 2015.

Grid-Assembly: An oligonucleotide composition-based partitioning strategy to aid metagenomic sequence assembly.

J Bioinform Comput Biol. 2015 Jun;13(3):1541004. doi: 10.1142/S0219720015410048. Epub 2015 Feb 8.

A scalable assembly-free variable selection algorithm for biomarker discovery from metagenomes.

BMC Bioinformatics. 2016 Aug 19;17(1):311. doi: 10.1186/s12859-016-1186-3.

MEGAHIT v1.0: A fast and scalable metagenome assembler driven by advanced methodologies and community practices.

Methods. 2016 Jun 1;102:3-11. doi: 10.1016/j.ymeth.2016.02.020. Epub 2016 Mar 21.

Fast and simple protein-alignment-guided assembly of orthologous gene families from microbiome sequencing reads.

Microbiome. 2017 Jan 25;5(1):11. doi: 10.1186/s40168-017-0233-2.

Recovering complete and draft population genomes from metagenome datasets.

Microbiome. 2016 Mar 8;4:8. doi: 10.1186/s40168-016-0154-5.

OGRE: Overlap Graph-based metagenomic Read clustEring.

Bioinformatics. 2021 May 17;37(7):905-912. doi: 10.1093/bioinformatics/btaa760.

Estimating the composition of species in metagenomes by clustering of next-generation read sequences.

Methods. 2014 Oct 1;69(3):213-9. doi: 10.1016/j.ymeth.2014.07.009. Epub 2014 Jul 27.

BIGMAC : breaking inaccurate genomes and merging assembled contigs for long read metagenomic assembly.

BMC Bioinformatics. 2016 Oct 28;17(1):435. doi: 10.1186/s12859-016-1288-y.

Cited by

An Improved Machine Learning-Based Approach to Assess the Microbial Diversity in Major North Indian River Ecosystems.

Genes (Basel). 2023 May 14;14(5):1082. doi: 10.3390/genes14051082.

Genome-resolved metagenomics using environmental and clinical samples.

Brief Bioinform. 2021 Sep 2;22(5). doi: 10.1093/bib/bbab030.

Decoding the microbiome for the development of translational applications: Overview, challenges and pitfalls.

J Biosci. 2019 Oct;44(5).

A clinician's guide to microbiome analysis.

Nat Rev Gastroenterol Hepatol. 2017 Oct;14(10):585-595. doi: 10.1038/nrgastro.2017.97. Epub 2017 Aug 9.

Inferring Intra-Community Microbial Interaction Patterns from Metagenomic Datasets Using Associative Rule Mining Techniques.

PLoS One. 2016 Apr 28;11(4):e0154493. doi: 10.1371/journal.pone.0154493. eCollection 2016.

Metagenomic Analysis of Upwelling-Affected Brazilian Coastal Seawater Reveals Sequence Domains of Type I PKS and Modular NRPS.

Int J Mol Sci. 2015 Nov 27;16(12):28285-95. doi: 10.3390/ijms161226101.

Discovery and characterization of Alu repeat sequences via precise local read assembly.

Nucleic Acids Res. 2015 Dec 2;43(21):10292-307. doi: 10.1093/nar/gkv1089. Epub 2015 Oct 25.

Assembly of viral genomes from metagenomes.

Front Microbiol. 2014 Dec 18;5:714. doi: 10.3389/fmicb.2014.00714. eCollection 2014.
