Vollmers John, Wiegand Sandra, Kaster Anne-Kristin
Leibniz Institute DSMZ - German Collection of Microorganisms and Cell Cultures, Braunschweig, Germany.
PLoS One. 2017 Jan 18;12(1):e0169662. doi: 10.1371/journal.pone.0169662. eCollection 2017.
With the constant improvement in cost-efficiency and quality of Next Generation Sequencing technologies, shotgun-sequencing approaches such as metagenomics have become the methods of choice for studying and classifying microorganisms from various habitats. Data production has increased dramatically over the past years, and processing and analysis steps are increasingly becoming a bottleneck. The limiting factors are partly the availability of computational resources, but mainly the bioinformatics expertise needed to establish and apply appropriate processing and analysis pipelines. Fortunately, a large diversity of specialized software tools is now available. Nevertheless, choosing the most appropriate methods for answering specific biological questions can be rather challenging, especially for non-bioinformaticians. In order to provide a comprehensive overview and guide for the microbiological scientific community, we assessed the most common and freely available metagenome assembly tools with respect to their output statistics, their sensitivity for low-abundance community members, the variability in resulting community profiles, and their ease of use. In contrast to the highly anticipated "Critical Assessment of Metagenomic Interpretation" (CAMI) challenge, which uses a general mock community-based assembler comparison, we here tested assemblers on real Illumina metagenome sequencing data from natural communities of varying complexity sampled from forest soil and algal biofilms. Our observations clearly demonstrate that different assembly tools can prove optimal, depending on the sample type, available computational resources and, most importantly, the specific research goal.
In addition, we present detailed descriptions of the underlying principles and pitfalls of publicly available assembly tools from a microbiologist's perspective, and provide guidance regarding their user-friendliness and sensitivity and the reliability of the resulting phylogenetic profiles.