Tsai Cheng-Hung, Stajich Jason Eric
bioRxiv. 2025 Aug 1:2025.07.30.666921. doi: 10.1101/2025.07.30.666921.
Phyling is a fast, scalable, and user-friendly tool for phylogenomic reconstruction of species phylogenies directly from protein-encoded genomic data. It identifies orthologous genes by searching a sample's protein sequences against a Hidden Markov Model (HMM) marker set of single-copy orthologs retrieved from the BUSCO database. In the final step, users can choose between consensus and concatenation strategies to construct the species tree from the aligned orthologs. Phyling resolves large phylogenies efficiently by optimizing memory usage and data processing. Its checkpoint system lets users incrementally add or remove samples without repeating the entire search process. For analyses involving closely related taxa, Phyling supports nucleotide coding sequences, which may capture phylogenetic signal missed by protein sequences. Benchmark results show that Phyling runs substantially faster than OrthoFinder, a reciprocal-best-hit-based method, while achieving equal or better accuracy.
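As a rough illustration of the concatenation strategy mentioned above, the following minimal Python sketch joins per-marker alignments into a single supermatrix, padding samples that lack a marker with gap characters. It is not Phyling's implementation: the function name `concatenate_alignments`, the input layout (marker name mapped to a sample-to-aligned-sequence dictionary), and the gap-padding policy are all assumptions made for this example.

```python
from typing import Dict, List, Tuple


def concatenate_alignments(
    alignments: Dict[str, Dict[str, str]],
) -> Tuple[Dict[str, str], List[Tuple[str, int, int]]]:
    """Join per-marker alignments into one supermatrix (hypothetical helper).

    `alignments` maps a marker name to {sample name: aligned sequence}.
    Returns the concatenated sequence per sample and a list of
    (marker, start, end) partitions using 1-based inclusive coordinates.
    """
    # Every sample seen in any marker gets a row in the supermatrix.
    samples = sorted({s for aln in alignments.values() for s in aln})
    parts: Dict[str, List[str]] = {s: [] for s in samples}
    partitions: List[Tuple[str, int, int]] = []
    offset = 0

    for marker in sorted(alignments):
        aln = alignments[marker]
        # All sequences within one marker alignment must share a width.
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError(f"marker {marker!r} is empty or not aligned")
        width = lengths.pop()

        for sample in samples:
            # Assumption: a sample missing this marker is padded with gaps.
            parts[sample].append(aln.get(sample, "-" * width))

        partitions.append((marker, offset + 1, offset + width))
        offset += width

    supermatrix = {s: "".join(chunks) for s, chunks in parts.items()}
    return supermatrix, partitions


# Toy input: two markers, three samples, one sample missing from each marker.
toy = {
    "markerA": {"sp1": "MK-L", "sp2": "MKAL"},
    "markerB": {"sp1": "GT", "sp3": "GS"},
}
matrix, parts = concatenate_alignments(toy)
for name, seq in matrix.items():
    print(name, seq)
```

The partition list records where each marker sits in the supermatrix, which is the kind of information a tree-building program would need to fit a separate substitution model per marker. A consensus strategy would instead infer one tree per marker and summarize those gene trees, which this sketch does not cover.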