eggNOG v4.0: nested orthology inference across 3686 organisms.

Affiliations

European Molecular Biology Laboratory, Computational Biology Unit, Meyerhofstrasse 1, 69117 Heidelberg, Germany; University of Zurich and Swiss Institute of Bioinformatics, Institute of Molecular Life Sciences, Winterthurerstrasse 190, 8057 Zurich, Switzerland; Institute for Systems Biology, 401 Terry Avenue North, Seattle, WA 98109-5234, USA; Bioinformatics and Genomics Programme, Centre for Genomic Regulation (CRG), C/Dr. Aiguader 88, 08003 Barcelona, Spain; Universitat Pompeu Fabra (UPF), 08003 Barcelona, Spain; CUBE-Division of Computational Systems Biology, Department of Microbiology and Ecosystem Science, University of Vienna, Althanstraße 14, 1090 Vienna, Austria; Institute of Biological, Environmental & Rural Sciences, Aberystwyth University, Penglais, Aberystwyth, Ceredigion, SY23 3FG, UK; Biotechnology Center, TU Dresden, 01062 Dresden, Germany; Novo Nordisk Foundation Center for Protein Research, Faculty of Health Sciences, University of Copenhagen, 2200, Copenhagen N, Denmark; and Max-Delbrück-Centre for Molecular Medicine, Robert-Rössle-Strasse 10, 13092 Berlin, Germany.

Publication information

Nucleic Acids Res. 2014 Jan;42(Database issue):D231-9. doi: 10.1093/nar/gkt1253. Epub 2013 Dec 1.

Abstract

With the increasing availability of various 'omics data, high-quality orthology assignment is crucial for evolutionary and functional genomics studies. We here present the fourth version of the eggNOG database (available at http://eggnog.embl.de) that derives nonsupervised orthologous groups (NOGs) from complete genomes, and then applies a comprehensive characterization and analysis pipeline to the resulting gene families. Compared with the previous version, we have more than tripled the underlying species set to cover 3686 organisms, keeping track with genome project completions while prioritizing the inclusion of high-quality genomes to minimize error propagation from incomplete proteome sets. Major technological advances include (i) a robust and scalable procedure for the identification and inclusion of high-quality genomes, (ii) provision of orthologous groups for 107 different taxonomic levels compared with 41 in eggNOGv3, (iii) identification and annotation of particularly closely related orthologous groups, facilitating analysis of related gene families, (iv) improvements of the clustering and functional annotation approach, (v) adoption of a revised tree building procedure based on the multiple alignments generated during the process and (vi) implementation of quality control procedures throughout the entire pipeline. As in previous versions, eggNOGv4 provides multiple sequence alignments and maximum-likelihood trees, as well as broad functional annotation. Users can access the complete database of orthologous groups via a web interface, as well as through bulk download.
