Computational methods for Gene Orthology inference.

Author information

National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA.

Publication information

Brief Bioinform. 2011 Sep;12(5):379-91. doi: 10.1093/bib/bbr030. Epub 2011 Jun 19.

Abstract

Accurate inference of orthologous genes is a prerequisite for most comparative genomics studies, and is also important for functional annotation of new genomes. Identification of orthologous gene sets typically involves phylogenetic tree analysis, heuristic algorithms based on sequence conservation, synteny analysis, or some combination of these approaches. The most direct tree-based methods typically rely on the comparison of an individual gene tree with a species tree. Once the two trees are accurately constructed, orthologs are straightforwardly identified by the definition of orthology as those homologs that are related by speciation, rather than gene duplication, at their most recent point of origin. Although ideal in principle for orthology identification, phylogenetic trees are computationally expensive to construct for large numbers of genes and genomes, and they often contain errors, especially at large evolutionary distances. Moreover, in many organisms, in particular prokaryotes and viruses, evolution does not appear to have followed a simple 'tree-like' mode, which makes conventional tree reconciliation inapplicable. Other, heuristic methods identify probable orthologs as the closest homologous pairs or groups of genes in a set of organisms. These approaches are faster and easier to automate than tree-based methods, with efficient implementations provided by graph-theoretical algorithms enabling comparisons of thousands of genomes. Comparisons of these two approaches show that, despite conceptual differences, they produce similar sets of orthologs, especially at short evolutionary distances. Synteny can also aid in the identification of orthologs. Often, tree-based, sequence-similarity-based, and synteny-based approaches can be combined into flexible hybrid methods.
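The heuristic idea of taking "the closest homologous pairs" as probable orthologs can be illustrated with a reciprocal best hit search between two genomes. The following is a minimal sketch, not the method of any particular tool discussed in the article: the function names, the gene identifiers, and the similarity scores are all hypothetical, and the scores are assumed to come from some pairwise sequence comparison run in both directions.

```python
def best_hits(scores):
    """Return {query: best-scoring subject} from {(query, subject): score}.

    Ties are resolved in favour of the alphabetically first subject,
    an arbitrary choice made only to keep the result deterministic.
    """
    best = {}
    for (query, subject), score in sorted(scores.items()):
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Pairs (a, b) where b is the best hit of a and a is the best hit of b."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    return sorted((a, b) for a, b in best_ab.items() if best_ba.get(b) == a)


if __name__ == "__main__":
    # Hypothetical similarity scores (higher means more similar).
    a_vs_b = {("a1", "b1"): 200, ("a1", "b2"): 150,
              ("a2", "b2"): 180, ("a3", "b2"): 170}
    b_vs_a = {("b1", "a1"): 195, ("b2", "a1"): 140,
              ("b2", "a2"): 185, ("b2", "a3"): 160}
    for pair in reciprocal_best_hits(a_vs_b, b_vs_a):
        print(pair)
```

In this toy input, gene a3 has b2 as its best hit, but b2 prefers a2, so a3 is not reported; a gene left out in this way may be a paralog or may simply lack an ortholog in the other genome. A pairwise filter like this one says nothing about speciation versus duplication events, which is the question tree-based methods address directly.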

Similar articles

3
Inferring Orthology and Paralogy.
Methods Mol Biol. 2019;1910:149-175. doi: 10.1007/978-1-4939-9074-0_5.
4
Integrating Sequence Evolution into Probabilistic Orthology Analysis.
Syst Biol. 2015 Nov;64(6):969-82. doi: 10.1093/sysbio/syv044. Epub 2015 Jun 30.
5
Automatic clustering of orthologs and in-paralogs from pairwise species comparisons.
J Mol Biol. 2001 Dec 14;314(5):1041-52. doi: 10.1006/jmbi.2000.5197.
6
Identification of mammalian orthologs using local synteny.
BMC Genomics. 2009 Dec 23;10:630. doi: 10.1186/1471-2164-10-630.
7
Orthologs, turn-over, and remolding of tRNAs in primates and fruit flies.
BMC Genomics. 2016 Aug 11;17(1):617. doi: 10.1186/s12864-016-2927-4.
8
Inferring orthology and paralogy.
Methods Mol Biol. 2012;855:259-79. doi: 10.1007/978-1-61779-582-4_9.

Cited by

1
A conserved role for centriolar satellites in translation of centrosomal and ciliary proteins.
J Cell Biol. 2025 Aug 4;224(8). doi: 10.1083/jcb.202408042. Epub 2025 May 21.
2
Caspase Domain Duplication During the Evolution of Caspase-16.
J Mol Evol. 2025 May 20. doi: 10.1007/s00239-025-10252-w.
4
Quantifying the influence of genetic context on duplicated mammalian genes.
bioRxiv. 2025 May 2:2025.04.03.647042. doi: 10.1101/2025.04.03.647042.
5
Exploration of the genetic landscape of bacterial dsDNA viruses reveals an ANI gap amid extensive mosaicism.
mSystems. 2025 Feb 18;10(2):e0166124. doi: 10.1128/msystems.01661-24. Epub 2025 Jan 29.
6
Hayai-Annotation: A functional gene prediction tool that integrates orthologs and gene ontology for network analysis in plant species.
Comput Struct Biotechnol J. 2024 Dec 16;27:117-126. doi: 10.1016/j.csbj.2024.12.011. eCollection 2025.
7
getphylo: rapid and automatic generation of multi-locus phylogenetic trees.
BMC Bioinformatics. 2025 Jan 18;26(1):21. doi: 10.1186/s12859-025-06035-1.
8
Genome sequencing of and its comparative analysis with malacostracan crustaceans.
3 Biotech. 2024 Nov;14(11):276. doi: 10.1007/s13205-024-04121-4. Epub 2024 Oct 23.
