Zhou Yufei, Trujillo-González Alejandro, Nicol Simon, Huerlimann Roger, Sarre Stephen D, Gleeson Dianne
Centre for Conservation Ecology and Genomics, University of Canberra, Canberra, ACT, Australia.
Oceanic Fisheries Programme, Pacific Community, Noumea, New Caledonia.
PeerJ. 2025 Jul 14;13:e19674. doi: 10.7717/peerj.19674. eCollection 2025.
DNA barcoding is a widely used tool for species identification, and its reliability depends heavily on reference databases. Although the quality of these databases has long been debated, a critical knowledge gap remains in their comprehensive evaluation and comparison at regional scales. Marine metazoan species in the western and central Pacific Ocean (WCPO), a region characterized by high biodiversity and limited sequencing effort, exemplify this gap. This study developed a systematic workflow to assess mitochondrial cytochrome c oxidase subunit I (COI) barcode coverage and sequence quality in two reference databases commonly used for DNA barcoding: the nucleotide database of the National Center for Biotechnology Information (NCBI) and the Barcode of Life Data System (BOLD). Comparative analyses across marine phyla and WCPO regions identified significant barcode gaps and quality problems, providing insights to guide future barcoding efforts. NCBI exhibited higher barcode coverage but lower sequence quality than BOLD. Quality issues identified in both databases included over- or under-represented species, short sequences, ambiguous nucleotides, incomplete taxonomic information, conflicting records, high intraspecific distances, and low interspecific distances, likely resulting from contamination, cryptic species, sequencing errors, or inconsistent taxonomic assignment. The Barcode Index Number (BIN) system in BOLD showed potential for identifying and addressing problematic records, highlighting the benefits of curated databases. Significant barcode deficiencies and quality issues were observed in the south temperate region of the WCPO and in phyla such as Porifera, Bryozoa, and Platyhelminthes. Additionally, the COI barcode showed limited species-level resolution for certain taxa, including Scombridae and Lutjanidae.
Addressing barcode coverage gaps, improving taxonomic representation, and enhancing sequence quality will be essential for strengthening future barcoding initiatives and advancing biodiversity monitoring and conservation in the WCPO and beyond. This study highlights the need for standardized database curation and sequencing practices to improve the global reliability and applicability of DNA barcoding.
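The abstract names several measurable problems: coverage gaps, short sequences, ambiguous nucleotides, high intraspecific distances, and low interspecific distances. The Python sketch below shows one way such checks could be computed; it is not the authors' workflow. The thresholds (`MIN_LENGTH`, `MAX_AMBIGUOUS_FRAC`, `MAX_INTRA_DIST`, `MIN_INTER_DIST`), the input format (a list of `(species, sequence)` pairs with sequences already aligned to equal length), and all function names are hypothetical. Distances are uncorrected p-distances, which may differ from the distance model used in the study.

```python
from itertools import combinations

# Hypothetical thresholds chosen for illustration only; not taken from the study.
MIN_LENGTH = 500           # flag sequences shorter than this (bp, gaps removed)
MAX_AMBIGUOUS_FRAC = 0.01  # flag sequences with more than 1% non-ACGT characters
MAX_INTRA_DIST = 0.02      # flag species whose max intraspecific p-distance exceeds 2%
MIN_INTER_DIST = 0.02      # flag species pairs whose nearest sequences are closer than 2%

VALID = set("ACGT")


def coverage(checklist, records):
    """Fraction of checklist species with at least one barcode record."""
    barcoded = {species for species, _ in records}
    return len(barcoded & set(checklist)) / len(checklist)


def record_flags(seq):
    """Per-record quality flags for one COI sequence (alignment gaps ignored)."""
    seq = seq.upper().replace("-", "")
    flags = []
    if len(seq) < MIN_LENGTH:
        flags.append("short")
    if seq and sum(c not in VALID for c in seq) / len(seq) > MAX_AMBIGUOUS_FRAC:
        flags.append("ambiguous")
    return flags


def p_distance(a, b):
    """Uncorrected p-distance between two aligned sequences, skipping sites
    where either sequence has a gap or an ambiguous base."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    compared = diffs = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in VALID and y in VALID:
            compared += 1
            diffs += x != y
    if compared == 0:
        raise ValueError("no comparable sites")
    return diffs / compared


def distance_flags(records):
    """Return (species with high intraspecific distance,
    species pairs with low interspecific distance)."""
    by_species = {}
    for species, seq in records:
        by_species.setdefault(species, []).append(seq)

    high_intra = sorted(
        sp for sp, seqs in by_species.items()
        if len(seqs) > 1
        and max(p_distance(a, b) for a, b in combinations(seqs, 2)) > MAX_INTRA_DIST
    )

    low_inter = []
    for (sp1, s1), (sp2, s2) in combinations(sorted(by_species.items()), 2):
        nearest = min(p_distance(a, b) for a in s1 for b in s2)
        if nearest < MIN_INTER_DIST:
            low_inter.append((sp1, sp2))
    return high_intra, low_inter
```

A species flagged by `distance_flags` for high intraspecific distance may point to cryptic species or contamination, and a low-distance species pair may reflect misidentification or genuinely limited COI resolution. The abstract lists these as possible causes. Curated clustering such as BOLD's BIN system works on the same kind of distance information, but its actual algorithm is not reproduced here.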