Channing Division of Network Medicine, Department of Medicine, Brigham and Women's Hospital and Harvard Medical School, Boston, MA, USA.
Department of Pediatrics, University of California, San Diego, La Jolla, CA, USA.
Nat Methods. 2021 Jun;18(6):618-626. doi: 10.1038/s41592-021-01141-3. Epub 2021 May 13.
Accurate microbial identification and abundance estimation are crucial for metagenomics analysis. Various methods for classification of metagenomic data and estimation of taxonomic profiles, broadly referred to as metagenomic profilers, have been developed. Nevertheless, benchmarking of metagenomic profilers remains challenging because some tools are designed to report relative sequence abundance while others report relative taxonomic abundance. Here we show how misleading conclusions can be drawn by neglecting this distinction between relative abundance types when benchmarking metagenomic profilers. Moreover, we show compelling evidence that interchanging sequence abundance and taxonomic abundance will influence both per-sample summary statistics and cross-sample comparisons. We suggest that the microbiome research community pay attention to potentially misleading biological conclusions arising from this issue when benchmarking metagenomic profilers, by carefully considering the type of abundance data that were analyzed and interpreted and clearly stating the strategy used for metagenomic profiling.
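To make the distinction in the abstract concrete, the sketch below converts a relative sequence abundance profile (fraction of reads per taxon) into a relative taxonomic abundance profile (fraction of genomes per taxon) by dividing each value by the taxon's genome length and renormalizing. This is only an illustration, not the procedure used by the authors or by any particular profiler. The function name, taxon names, and genome lengths are hypothetical, and the sketch assumes one genome copy per cell and reads sampled uniformly along each genome.

```python
def sequence_to_taxonomic(seq_abundance, genome_length):
    """Convert relative sequence abundance to relative taxonomic abundance.

    seq_abundance: dict mapping taxon -> fraction of reads (sums to 1)
    genome_length: dict mapping taxon -> genome length in base pairs

    Assumption (not from the source): one genome copy per cell, so the
    number of genomes is proportional to reads divided by genome length.
    """
    # Longer genomes yield more reads per cell, so down-weight them.
    weighted = {t: seq_abundance[t] / genome_length[t] for t in seq_abundance}
    total = sum(weighted.values())
    # Renormalize so the taxonomic abundances sum to 1.
    return {t: w / total for t, w in weighted.items()}


# Hypothetical two-taxon community with equal read shares
# but a threefold difference in genome length.
seq = {"taxon_A": 0.5, "taxon_B": 0.5}
lengths = {"taxon_A": 2_000_000, "taxon_B": 6_000_000}

tax = sequence_to_taxonomic(seq, lengths)
print(tax)
```

In this toy case the two taxa contribute the same share of reads, yet taxon_A accounts for about 75% of the genomes because its genome is three times shorter. A benchmark that scored a taxonomic-abundance profiler against a sequence-abundance ground truth (or the reverse) would register this gap as error even if both profiles were correct in their own terms.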