Data Science Institute, Columbia University, New York, New York 10027.
Department of Biochemistry and Molecular Biophysics, Columbia University, New York, New York 10027.
Genetics. 2018 Oct;210(2):445-461. doi: 10.1534/genetics.118.301307. Epub 2018 Aug 17.
OrthoList, a compendium of Caenorhabditis elegans genes with human orthologs compiled in 2011 by a meta-analysis of four orthology-prediction methods, has been a popular tool for identifying conserved genes for research into biological and disease mechanisms. However, the efficacy of orthology prediction depends on the accuracy of gene-model predictions, which are continually being revised, and orthology-prediction algorithms have also been updated over time. Here we present OrthoList 2 (OL2), a new comparative genomic analysis between C. elegans and humans, and the first assessment of how changes over time affect the landscape of predicted orthologs between two species. Although we find that updates to the orthology-prediction methods significantly changed the landscape of C. elegans-human orthologs predicted by individual programs and, unexpectedly, reduced agreement among them, we also show that our meta-analysis approach "buffered" against changes in gene content. We show that adding results from more programs did not lead to many additions to the list. We also discuss reasons to avoid assigning "scores" based on support by individual orthology-prediction programs; the treatment of "legacy" genes no longer predicted by these programs; and the practical difficulties of updating due to deprecated, changed, or retired gene identifiers. In addition, we consider what other criteria may support claims of orthology and alternative approaches to find potential orthologs that elude identification by these programs. Finally, we created a new web-based tool that allows for rapid searches of OL2 by gene identifiers, protein domains [InterPro and SMART (Simple Modular Architecture Research Tool)], or human disease associations [OMIM (Online Mendelian Inheritance in Man)], and also includes available RNA-interference resources to facilitate potential translational cross-species studies.
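The "buffering" effect described above can be illustrated with a minimal sketch. It assumes, for illustration only, that the compendium is the union of genes predicted by at least one program, and it uses mean pairwise Jaccard similarity as a stand-in measure of agreement; neither is taken from the abstract. The program names (`prog_a`, `prog_b`, `prog_c`) and gene identifiers (`g1` to `g4`) are hypothetical toy data, not results from OL2.

```python
from itertools import combinations


def union_compendium(predictions):
    """Return every gene predicted by at least one program (assumed inclusion rule)."""
    return set().union(*predictions.values())


def mean_pairwise_jaccard(predictions):
    """Average Jaccard similarity over all pairs of programs (assumed agreement metric)."""
    pairs = list(combinations(predictions.values(), 2))
    if not pairs:
        return 1.0  # a single program trivially agrees with itself
    scores = [len(a & b) / len(a | b) if (a | b) else 1.0 for a, b in pairs]
    return sum(scores) / len(scores)


def legacy_genes(old_predictions, new_predictions):
    """Genes in the old compendium that no program predicts any longer."""
    return union_compendium(old_predictions) - union_compendium(new_predictions)


# Hypothetical predictions from an earlier release of three programs.
old = {
    "prog_a": {"g1", "g2", "g3"},
    "prog_b": {"g1", "g2", "g3"},
    "prog_c": {"g1", "g2", "g4"},
}

# Hypothetical predictions after each program was updated.
new = {
    "prog_a": {"g1", "g2"},
    "prog_b": {"g1", "g3"},
    "prog_c": {"g2", "g4"},
}

if __name__ == "__main__":
    print("old agreement:", round(mean_pairwise_jaccard(old), 3))
    print("new agreement:", round(mean_pairwise_jaccard(new), 3))
    print("compendium unchanged:", union_compendium(old) == union_compendium(new))
    print("legacy genes:", sorted(legacy_genes(old, new)))
```

In this toy case every individual program's output changes and agreement among programs drops, yet the union of predictions stays the same, which is the sense in which a union-based list can absorb changes in individual programs.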