Keck François, Altermatt Florian
Department of Aquatic Ecology, Eawag: Swiss Federal Institute of Aquatic Science and Technology, Dübendorf, Switzerland.
Department of Evolutionary Biology and Environmental Studies, University of Zurich, Zürich, Switzerland.
Mol Ecol Resour. 2023 Feb;23(2):511-518. doi: 10.1111/1755-0998.13723. Epub 2022 Oct 28.
DNA barcoding and metabarcoding are revolutionizing the study and survey of biodiversity. To assign taxonomic labels to the DNA sequence data retrieved, these methods depend strongly on comprehensive and accurate reference databases. Reliable databases linking biological sequences and taxonomic data can be, and often have been, produced using mainstream tools such as spreadsheet software. However, spreadsheets quickly become insufficient when the data grow to thousands of taxa and sequences to be matched, and validation operations become more complex and error-prone when performed manually. Thus, there is a clear need to provide scientists with user-friendly, reliable and powerful tools to manipulate and manage DNA reference databases in tractable, sound and efficient ways. Here, we introduce the R package refdb as an environment for semi-automatic and assisted construction of DNA reference libraries. The refdb package is a reference database manager offering a set of powerful functions to import, organize, clean, filter, audit and export the data. It is broadly applicable to the metabarcoding data typically obtained in biodiversity and biomonitoring studies. We present the main features of the package and outline how refdb can speed up reference database generation, management and handling, and thus contribute to standardization and repeatability in barcoding and metabarcoding studies.