Golob Jonathan L, Margolis Elisa, Hoffman Noah G, Fredricks David N
Vaccine and Infectious Disease Division, Fred Hutch, 1100 Eastlake Ave E, E4-100, Seattle, WA, 98109, USA.
Division of Allergy and Infectious Diseases, University of Washington, Seattle, WA, USA.
BMC Bioinformatics. 2017 May 30;18(1):283. doi: 10.1186/s12859-017-1690-0.
Microbiome studies commonly use 16S rRNA gene amplicon sequencing to characterize microbial communities. Errors introduced at multiple steps in this process can affect the interpretation of the data. Here we evaluate the accuracy of operational taxonomic unit (OTU) generation, taxonomic classification, and alpha- and beta-diversity measures for different settings in QIIME, MOTHUR and a pplacer-based classification pipeline, using a novel software package, DECARD.
We generated 100 synthetic bacterial communities in silico, approximating human stool microbiomes, to serve as a gold standard for evaluating the overall performance of microbiome analysis software. Our synthetic data closely matched the composition and complexity of actual healthy human stool microbiomes. Genus-level taxonomic classification was correct for only 50.4-74.8% of the source organisms, with miscall rates of 11.9-23.5%. Species-level classification was less successful (6.9-18.9% correct), with miscall rates comparable to those at the genus level (12.5-26.2%). The degree of miscall varied by clade of organism, pipeline and specific settings used. OTU generation accuracy varied by strategy (closed, de novo or subsampling), reference database, algorithm and software implementation. The accuracy of Shannon diversity estimates generally correlated with OTU-generation accuracy. Beta-diversity estimates with Double Principal Coordinate Analysis (DPCoA) were more robust against errors introduced in processing than those with Weighted UniFrac. The settings suggested in the tutorials were among the worst performing in all outcomes tested.
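To make the two kinds of measures above concrete, here is a minimal Python sketch of how classification outcomes and Shannon diversity error might be scored against a synthetic gold standard. It is not DECARD's implementation: the function names, the dictionary-based inputs, and the "unresolved" category (used here for any source organism that receives no genus-level call) are assumptions made for illustration, and the toy data are invented.

```python
import math
from collections import Counter


def shannon_diversity(counts):
    """Shannon index H = -sum(p_i * ln p_i) over taxa with non-zero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def classification_rates(true_genus, called_genus):
    """Fraction of source organisms called correctly, miscalled, or unresolved.

    true_genus:   dict mapping sequence id -> true genus (the gold standard)
    called_genus: dict mapping sequence id -> genus called by a pipeline,
                  or None / missing when no genus-level call was made
                  (hypothetical input format)
    """
    outcomes = Counter()
    for seq_id, truth in true_genus.items():
        call = called_genus.get(seq_id)
        if call is None:
            outcomes["unresolved"] += 1
        elif call == truth:
            outcomes["correct"] += 1
        else:
            outcomes["miscall"] += 1
    n = len(true_genus)
    return {key: outcomes[key] / n for key in ("correct", "miscall", "unresolved")}


# Toy gold standard and pipeline output (invented values).
truth = {"s1": "Bacteroides", "s2": "Prevotella", "s3": "Blautia", "s4": "Roseburia"}
calls = {"s1": "Bacteroides", "s2": "Bacteroides", "s3": "Blautia", "s4": None}
rates = classification_rates(truth, calls)

# Absolute error between the true and pipeline-estimated Shannon diversity.
true_counts = [25, 25, 25, 25]
estimated_counts = [50, 25, 25]  # e.g. two true taxa merged into one OTU
shannon_error = abs(shannon_diversity(true_counts) - shannon_diversity(estimated_counts))

print(rates)
print(round(shannon_error, 3))
```

In this toy case, merging two taxa into a single OTU lowers the estimated Shannon diversity, which is one way OTU-generation errors could propagate into alpha-diversity estimates.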
Even when the same classification pipeline is used, the choice of OTU-generation strategy, reference database and downstream analysis methods can have a dramatic effect on the accuracy of taxonomic classification and of alpha- and beta-diversity estimation. Even minor changes in settings adversely affected the accuracy of the results, moving them far from the best-observed result. Thus, the specific details of how a pipeline is used (including OTU generation strategy, reference sets, clustering algorithm and specific software implementation) should be reported in the methods section of all microbiome studies. Researchers should evaluate their chosen pipeline and settings to confirm that they can adequately answer the research question, rather than assuming that the tutorial or standard-operating-procedure settings will be adequate or optimal.