National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, Bethesda, MD, 20892, USA.
Int J Syst Evol Microbiol. 2023 Feb;73(1). doi: 10.1099/ijsem.0.005707.
The public sequence databases are entrusted with a dual responsibility: providing an accessible archive for all submitters, and supporting data reliability and re-use for all users. Genomes from type material can act as an unambiguous reference for a taxonomic name and play an important role in comparative genomics, especially for taxon verification or reclassification. The National Center for Biotechnology Information (NCBI) collects and curates information on prokaryotic type strains and on genomes from type strains. The average nucleotide identity (ANI)-based quality control processes introduced at NCBI to verify the genomes from type strains and to improve related sequence records are detailed here. Using the curated genomes from type strains as references, the taxonomy of over 1.1 million GenBank genomes was verified. In addition, over 7000 new submissions were reclassified before acceptance to GenBank, and over 1800 genomes already in GenBank were reclassified.