Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany.
Centro de Biotecnología y Genómica de Plantas, Universidad Politécnica de Madrid (UPM) - Instituto Nacional de Investigación y Tecnología Agraria y Alimentaria (INIA), Madrid, Spain.
Nucleic Acids Res. 2019 Jan 8;47(D1):D309-D314. doi: 10.1093/nar/gky1085.
eggNOG is a public database of orthology relationships, gene evolutionary histories and functional annotations. Here, we present version 5.0, featuring a major update of the underlying genome sets, which have been expanded to 4445 representative bacteria and 168 archaea derived from 25 038 genomes, as well as 477 eukaryotic organisms and 2502 viral proteomes, all selected for diversity and filtered by genome quality. In total, 4.4M orthologous groups (OGs) distributed across 379 taxonomic levels were computed, together with their associated sequence alignments, phylogenies, hidden Markov models (HMMs) and functional descriptors. Precomputed evolutionary analysis provides fine-grained resolution of duplication/speciation events within each OG. Our benchmarks show that, despite a doubling of the number of genomes, the quality of orthology assignments and functional annotations (80% coverage) has remained essentially unchanged across this update. Finally, we improved the eggNOG online services for fast functional annotation and orthology prediction of custom genomics or metagenomics datasets. All precomputed data are publicly available for download or through API queries at http://eggnog.embl.de.