Ten common issues with reference sequence databases and how to mitigate them.

Author information

Chorlton Samuel D

Affiliation

BugSeq Bioinformatics Inc., Vancouver, BC, Canada.

Publication information

Front Bioinform. 2024 Mar 15;4:1278228. doi: 10.3389/fbinf.2024.1278228. eCollection 2024.

Abstract

Metagenomic sequencing has revolutionized our understanding of microbiology. While metagenomic tools and approaches have been extensively evaluated and benchmarked, far less attention has been given to the reference sequence database used in metagenomic classification. Issues with reference sequence databases are pervasive. Database contamination is the most recognized issue in the literature; however, it remains relatively unmitigated in most analyses. Other common issues with reference sequence databases include taxonomic errors, inappropriate inclusion and exclusion criteria, and sequence content errors. This review covers ten common issues with reference sequence databases and the potential downstream consequences of these issues. Mitigation measures are discussed for each issue, including bioinformatic tools and database curation strategies. Together, these strategies present a path towards more accurate, reproducible and translatable metagenomic sequencing.
