Chorlton Samuel D
BugSeq Bioinformatics Inc., Vancouver, BC, Canada.
Front Bioinform. 2024 Mar 15;4:1278228. doi: 10.3389/fbinf.2024.1278228. eCollection 2024.
Metagenomic sequencing has revolutionized our understanding of microbiology. While metagenomic tools and approaches have been extensively evaluated and benchmarked, far less attention has been given to the reference sequence database used in metagenomic classification. Issues with reference sequence databases are pervasive. Database contamination is the most recognized issue in the literature; however, it remains relatively unmitigated in most analyses. Other common issues with reference sequence databases include taxonomic errors, inappropriate inclusion and exclusion criteria, and sequence content errors. This review covers ten common issues with reference sequence databases and the potential downstream consequences of these issues. Mitigation measures are discussed for each issue, including bioinformatic tools and database curation strategies. Together, these strategies present a path towards more accurate, reproducible and translatable metagenomic sequencing.