Liao Irene T, Sears Karen E, Hileman Lena C, Nikolov Lachezar A
Department of Molecular, Cell, and Developmental Biology, University of California - Los Angeles, Los Angeles, California, USA.
Department of Ecology and Evolutionary Biology, University of California - Los Angeles, Los Angeles, California, USA.
Appl Plant Sci. 2024 Dec 25;13(1):e11627. doi: 10.1002/aps3.11627. eCollection 2025 Jan-Feb.
Orthology inference is crucial for comparative genomics, and multiple algorithms have been developed to identify putative orthologs for downstream analyses. Despite the abundance of proposed solutions, including publicly available benchmarks, it is difficult to assess which tool is most suitable for plant species, which commonly have complex genomic histories.
We explored the performance of four orthology inference algorithms (OrthoFinder, SonicParanoid, Broccoli, and OrthNet) on eight Brassicaceae genomes in two groups: one comprising only diploids and another comprising the diploids, two mesopolyploids, and one recent hexaploid genome.
The composition of the orthogroups reflected the species' ploidy and genomic histories, with the diploid set having a higher proportion of identical orthogroups. Although the diploid + higher ploidy set had a lower proportion of orthogroups with identical compositions, the average degree of similarity between its orthogroups did not differ from that of the diploid set.
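As an illustration of the two quantities compared here (the proportion of orthogroups with identical composition and the average similarity between orthogroups), the following is a minimal Python sketch. It assumes each algorithm's output has already been parsed into a list of gene-ID sets, and it uses the Jaccard index as the similarity measure; both the function names and the choice of Jaccard are assumptions made for illustration, not necessarily the metric used in the study.

```python
from typing import Dict, List, Set, Tuple


def jaccard(x: Set[str], y: Set[str]) -> float:
    """Jaccard index: shared genes divided by all genes in either group."""
    union = x | y
    return len(x & y) / len(union) if union else 0.0


def compare_orthogroups(
    a: List[Set[str]], b: List[Set[str]]
) -> Tuple[float, float]:
    """Compare the orthogroups of tool A against those of tool B.

    Returns (proportion of A's orthogroups with an identical group in B,
             mean best-match Jaccard similarity over A's orthogroups).
    Hypothetical helper; input parsing is left to the caller.
    """
    if not a:
        return 0.0, 0.0

    # Exact-composition lookup for B's orthogroups.
    b_exact = {frozenset(g) for g in b}
    identical = sum(1 for g in a if frozenset(g) in b_exact)

    # Map each gene to the indices of the B orthogroups that contain it,
    # so only overlapping groups are scored.
    gene_to_b: Dict[str, Set[int]] = {}
    for i, group in enumerate(b):
        for gene in group:
            gene_to_b.setdefault(gene, set()).add(i)

    best_scores = []
    for group in a:
        candidates: Set[int] = set()
        for gene in group:
            candidates |= gene_to_b.get(gene, set())
        # A group with no overlapping partner in B scores 0.
        best = max((jaccard(group, b[i]) for i in candidates), default=0.0)
        best_scores.append(best)

    return identical / len(a), sum(best_scores) / len(best_scores)


if __name__ == "__main__":
    # Toy gene IDs, invented purely for demonstration.
    tool_a = [{"At1", "Cr1"}, {"At2", "Cr2", "Cr3"}, {"At3"}]
    tool_b = [{"At1", "Cr1"}, {"At2", "Cr2"}, {"At3", "Cr3"}]
    prop_identical, mean_similarity = compare_orthogroups(tool_a, tool_b)
    print(f"identical: {prop_identical:.3f}, mean Jaccard: {mean_similarity:.3f}")
```

A sketch like this shows how the two measures can diverge: duplicated genes in polyploid genomes can split or merge groups, lowering the count of exact matches while leaving the best-match similarity high.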
Three algorithms (OrthoFinder, SonicParanoid, and Broccoli) are helpful for initial orthology predictions. Results produced using OrthNet were generally outliers but could still provide detailed information about gene colinearity. With our Brassicaceae dataset, slight discrepancies were found across the orthology inference algorithms, necessitating additional analyses, such as tree inference, to fine-tune results.