Orthology prediction methods: a quality assessment using curated protein families.

Affiliation

Structural and Computational Biology Unit, European Molecular Biology Laboratory, Heidelberg, Germany.

Publication information

Bioessays. 2011 Oct;33(10):769-80. doi: 10.1002/bies.201100062. Epub 2011 Aug 19.

DOI:10.1002/bies.201100062

PMID:21853451

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3193375/

Abstract

The increasing number of sequenced genomes has prompted the development of several automated orthology prediction methods. Tests to evaluate the accuracy of predictions and to explore biases caused by biological and technical factors are therefore required. We used 70 manually curated families to analyze the performance of five public methods in Metazoa. We analyzed the strengths and weaknesses of the methods and quantified the impact of biological and technical challenges. From the latter part of the analysis, genome annotation emerged as the largest single influencer, affecting up to 30% of the performance. Generally, most methods did well in assigning orthologous groups, but they failed to assign the exact number of genes for half of the groups. The publicly available benchmark set (http://eggnog.embl.de/orthobench/) should facilitate the improvement of current orthology assignment protocols, which is of utmost importance for many fields of biology and should be tackled by a broad scientific community.
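
To make the distinction between finding the right orthologous group and recovering its exact gene content concrete, the following is a minimal sketch of how a predicted group might be compared with a curated reference group. The abstract does not describe the scoring procedure the authors used, so the function names, the gene identifiers, and the missing/extra counting scheme are illustrative assumptions, not the benchmark's actual protocol.

```python
from typing import Dict, Set


def score_group(reference: Set[str], predicted: Set[str]) -> Dict[str, object]:
    """Compare one predicted orthologous group with its curated reference.

    Hypothetical scoring scheme: count curated genes the method left out
    and genes it included that are not in the curated group.
    """
    missing = reference - predicted   # curated members absent from the prediction
    extra = predicted - reference     # predicted members absent from the curation
    return {
        "missing": len(missing),
        "extra": len(extra),
        "exact": not missing and not extra,
    }


def fraction_exact(
    reference_groups: Dict[str, Set[str]],
    predicted_groups: Dict[str, Set[str]],
) -> float:
    """Fraction of reference groups whose gene content is reproduced exactly."""
    if not reference_groups:
        raise ValueError("no reference groups supplied")
    exact = sum(
        score_group(genes, predicted_groups.get(name, set()))["exact"]
        for name, genes in reference_groups.items()
    )
    return exact / len(reference_groups)


# Toy data with made-up gene identifiers (not taken from the benchmark set).
reference_groups = {
    "famA": {"human_1", "mouse_1", "fly_1"},
    "famB": {"human_2", "mouse_2"},
}
predicted_groups = {
    "famA": {"human_1", "mouse_1", "fly_1"},
    "famB": {"human_2", "mouse_2", "mouse_3"},  # one extra paralog pulled in
}

famb_score = score_group(reference_groups["famB"], predicted_groups["famB"])
exact_fraction = fraction_exact(reference_groups, predicted_groups)
```

In this toy case the method locates both groups but reproduces the exact gene content of only one of them, which is the kind of outcome the abstract reports for half of the curated groups.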


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2741/3193375/f3428fbda95d/bies0033-0769-f2.jpg
