Garg Sriram G, Hochberg Georg K A
Evolutionary Biochemistry Group, Max Planck Institute for Terrestrial Microbiology, Marburg 35043, Germany.
Center for Synthetic Microbiology (SYNMIKRO), Philipps-University Marburg, Marburg 35043, Germany.
Mol Biol Evol. 2025 Jun 4;42(6). doi: 10.1093/molbev/msaf124.
Sequence-based maximum likelihood phylogenetics is a widely used method for inferring evolutionary relationships, which has illuminated the evolutionary histories of proteins and the organisms that harbor them. However, modern implementations with sophisticated models of sequence evolution struggle to resolve deep evolutionary relationships, which can be obscured by excessive sequence divergence and substitution saturation. Structural phylogenetics has emerged as a promising alternative because protein structure evolves much more slowly than protein sequence. Recent developments in protein structure prediction using AI have made it possible to predict protein structures for entire protein families and then to translate these structures into a sequence representation, the 3Di structural alphabet, that can in theory be fed directly into existing sequence-based phylogenetic software. Unlocking the full potential of this idea, however, requires the inference of a general substitution matrix for structural phylogenetics, which has so far been missing. Here, we infer this matrix from large datasets of protein structures and show that it results in a better fit to empirical datasets than previous approaches. We then use this matrix to revisit the question of the root of the tree of life. Using structural phylogenies of universal paralogs, we provide the first unambiguous evidence for a root between archaea and bacteria. Finally, we discuss some practical and conceptual limitations of structural phylogenetics. Our 3Di substitution matrix provides a starting point for revisiting many deep phylogenetic problems that have so far been extremely difficult to solve.