Benchmarking orthology methods using phylogenetic patterns defined at the base of Eukaryotes.

Publication information

Brief Bioinform. 2021 May 20;22(3). doi: 10.1093/bib/bbaa206.

Abstract

Insights into the evolution of ancestral complexes and pathways are generally achieved through careful and time-intensive manual analysis, often using phylogenetic profiles of the constituent proteins. This manual analysis limits the possibility of including more protein-complex components, repeating the analyses for updated genome sets or expanding the analyses to larger scales. Automated orthology inference should allow such large-scale analyses, but substantial differences are observed between the orthologous groups generated by different approaches. We evaluate orthology methods for their ability to recapitulate a number of observations that have been made with regard to genome evolution in eukaryotes. Specifically, we investigate phylogenetic profile similarity (co-occurrence of complexes), the gene content of the last eukaryotic common ancestor, the pervasiveness of gene loss and the overlap with manually determined orthologous groups. Moreover, we compare the inferred orthologies to each other. We find that most orthology methods reconstruct a large last eukaryotic common ancestor, with substantial gene loss, and can predict interacting proteins reasonably well when applying phylogenetic co-occurrence. At the same time, derived orthologous groups show imperfect overlap with manually curated orthologous groups. There is no strong indication that any orthology method performs better than another on individual or all of these aspects. Counterintuitively, despite the orthology methods behaving similarly in large-scale evaluation, the obtained orthologous groups differ vastly from one another.

Availability and implementation

The data and code underlying this article are available on GitHub and/or upon reasonable request to the corresponding author: https://github.com/ESDeutekom/ComparingOrthologies.
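As a rough illustration of two quantities the abstract refers to, the Python sketch below scores phylogenetic profile similarity (co-occurrence) between two proteins and the overlap between two orthologous groups. It is not the authors' implementation: the Jaccard index is used here as an assumed stand-in for whatever similarity measures the article applies, and the function names, species count and gene identifiers are all hypothetical toy values.

```python
def profile_similarity(profile_a, profile_b):
    """Jaccard similarity of two presence/absence profiles.

    Each profile is a sequence of 0/1 values, one per species, in the
    same species order. Jaccard is an assumed choice for illustration.
    """
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same species")
    pairs = list(zip(profile_a, profile_b))
    both = sum(1 for a, b in pairs if a and b)      # present in both
    either = sum(1 for a, b in pairs if a or b)     # present in at least one
    return both / either if either else 0.0


def group_overlap(group_a, group_b):
    """Jaccard overlap of two orthologous groups given as gene-ID collections."""
    set_a, set_b = set(group_a), set(group_b)
    union = set_a | set_b
    return len(set_a & set_b) / len(union) if union else 0.0


# Hypothetical toy profiles over five species (1 = ortholog present).
profiles = {
    "subunitA": [1, 1, 0, 1, 0],
    "subunitB": [1, 1, 0, 1, 0],   # co-occurs perfectly with subunitA
    "unrelated": [0, 1, 1, 0, 1],
}

# Hypothetical orthologous groups from an automated method and a manual curation.
automated_group = {"geneA", "geneB", "geneC"}
manual_group = {"geneB", "geneC", "geneD"}

if __name__ == "__main__":
    print(profile_similarity(profiles["subunitA"], profiles["subunitB"]))
    print(profile_similarity(profiles["subunitA"], profiles["unrelated"]))
    print(group_overlap(automated_group, manual_group))
```

In this toy setting, proteins from the same complex would be expected to score higher on profile similarity than unrelated proteins, and a low group overlap would reflect the kind of disagreement between inferred and manually curated orthologous groups that the abstract describes.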

Figure: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5d4b/8138875/bb06804665d1/bbaa206f1.jpg
