Nantes Université, École Centrale Nantes, CNRS, LS2N, UMR 6004, Nantes, France.
Nantes Université, CNRS, INSERM, l'institut du thorax, F-44000 Nantes, France.
mSystems. 2022 Aug 30;7(4):e0043222. doi: 10.1128/msystems.00432-22. Epub 2022 Jun 15.
Metagenome-assembled genomes (MAGs) represent individual genomes recovered from metagenomic data. MAGs are extremely useful for analyzing uncultured microbial genomic diversity and for characterizing the associated functional and metabolic potential in natural environments. Recent computational developments have considerably improved MAG reconstruction but have also highlighted several limitations, such as the failure to bin sequence regions that contain repeats or have a distinct nucleotide composition. Different assembly and binning strategies are often used; however, it remains unclear which assembly strategy, in combination with which binning approach, offers the best performance for MAG recovery. Several workflows have been proposed to reconstruct MAGs, but users are usually limited to single-metagenome assembly or need to manually define sets of metagenomes to coassemble prior to genome binning. Here, we present MAGNETO, an automated workflow dedicated to MAG reconstruction, which includes a fully automated coassembly step informed by optimal clustering of metagenomic distances and implements complementary genome binning strategies to improve MAG recovery. MAGNETO is implemented as a Snakemake workflow and is available at: https://gitlab.univ-nantes.fr/bird_pipeline_registry/magneto.

Genome-resolved metagenomics has led to the discovery of previously untapped biodiversity within the microbial world. As the development of computational methods for the recovery of genomes from metagenomes continues, existing strategies need to be evaluated and compared to eventually lead to standardized computational workflows. In this study, we compared commonly used assembly and binning strategies and assessed their performance using both simulated and real metagenomic data sets. We propose a novel approach to automate coassembly, avoiding the requirement for prior knowledge to combine metagenomic information. The comparison against a previous coassembly approach demonstrates the strong impact of this step on genome binning results, as well as the benefits of informing coassembly for improving the quality of recovered genomes. MAGNETO integrates complementary assembly-binning strategies to optimize genome reconstruction and provides a complete reads-to-genomes workflow for the growing microbiome research community.
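
The idea of grouping metagenomes for coassembly by clustering their pairwise distances can be illustrated with a minimal Python sketch. It is not the MAGNETO implementation: the abstract does not state which clustering algorithm or optimality criterion is used, so average-linkage agglomerative clustering and the mean silhouette score are assumptions made here, and the sample names and distance values are invented for illustration.

```python
from itertools import combinations


def average_linkage_clusters(dist, k):
    """Merge the closest clusters (mean pairwise distance) until k remain."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > k:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            pair_sum = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
            d = pair_sum / (len(clusters[a]) * len(clusters[b]))
            if best is None or d < best[0]:
                best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return clusters


def mean_silhouette(dist, clusters):
    """Mean silhouette over all samples; singletons score 0 by convention."""
    scores = []
    for ci, members in enumerate(clusters):
        for i in members:
            if len(members) == 1:
                scores.append(0.0)
                continue
            # a: mean distance to the sample's own cluster
            a = sum(dist[i][j] for j in members if j != i) / (len(members) - 1)
            # b: mean distance to the nearest other cluster
            b = min(
                sum(dist[i][j] for j in other) / len(other)
                for cj, other in enumerate(clusters)
                if cj != ci
            )
            scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


def coassembly_groups(names, dist):
    """Pick the number of clusters with the best mean silhouette (assumed criterion)."""
    if len(names) < 3:
        return [sorted(names)]
    best_score, best_clusters = -1.0, None
    for k in range(2, len(names)):
        clusters = average_linkage_clusters(dist, k)
        score = mean_silhouette(dist, clusters)
        if score > best_score:
            best_score, best_clusters = score, clusters
    return [sorted(names[i] for i in c) for c in best_clusters]


# Hypothetical pairwise distances between five metagenomes (symmetric, made up).
names = ["S1", "S2", "S3", "S4", "S5"]
dist = [
    [0.00, 0.10, 0.15, 0.90, 0.85],
    [0.10, 0.00, 0.12, 0.88, 0.90],
    [0.15, 0.12, 0.00, 0.87, 0.92],
    [0.90, 0.88, 0.87, 0.00, 0.10],
    [0.85, 0.90, 0.92, 0.10, 0.00],
]
groups = coassembly_groups(names, dist)
print(groups)
```

Each returned group is a set of samples whose reads would be pooled into one coassembly. Choosing the number of groups from the data, rather than fixing it in advance, is what removes the need for the user to define the sets manually.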