Rewiring protein sequence and structure generative models to enhance protein stability prediction.

Author information

Li Ziang, Luo Yunan

Affiliation

School of Computational Science and Engineering, Georgia Institute of Technology.

Publication information

bioRxiv. 2025 Feb 18:2025.02.13.638154. doi: 10.1101/2025.02.13.638154.

Abstract

Predicting changes in protein thermostability due to amino acid substitutions is essential for understanding human diseases and engineering useful proteins for clinical and industrial applications. Recent protein generative models, which learn probability distributions over amino acids conditioned on structural or evolutionary sequence contexts, have shown impressive performance in predicting various protein properties without task-specific training. However, their strong unsupervised prediction ability does not extend to all protein functions, and in particular their potential to improve protein stability prediction remains underexplored. In this work, we present SPURS, a novel deep learning framework that adapts and integrates two general-purpose protein generative models, a protein language model (ESM) and an inverse folding model (ProteinMPNN), into an effective stability predictor. SPURS employs a lightweight neural network module to rewire per-residue structure representations learned by ProteinMPNN into the attention layers of ESM, thereby informing and enhancing ESM's sequence representation learning. This rewiring strategy enables SPURS to harness evolutionary patterns from both sequence and structure data: the sequence likelihood distribution learned by ESM is conditioned on structure priors encoded by ProteinMPNN to predict mutation effects. We steer this integrated framework to a stability prediction model through supervised training on a recently released mega-scale thermostability dataset. Evaluations across 12 benchmark datasets showed that SPURS delivers accurate, rapid, scalable, and generalizable stability predictions, consistently outperforming current state-of-the-art methods. Notably, SPURS demonstrates remarkable versatility in protein stability and function analyses: when combined with a protein language model, it accurately identifies protein functional sites in an unsupervised manner. Additionally, it enhances current low-N protein fitness prediction models by serving as a stability prior model to improve accuracy. These results highlight SPURS as a powerful tool to advance current protein stability prediction and machine learning-guided protein engineering workflows. The source code of SPURS is available at https://github.com/luo-group/SPURS.
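For readers unfamiliar with how a generative model can score mutations "without task-specific training", the sketch below shows the generic log-odds scoring commonly used with such models: a substitution is scored as log p(mutant) minus log p(wild type) at the mutated position. This is not the SPURS predictor, which the abstract describes as trained with supervision on thermostability data. The function names, the toy sequence, and the random logits standing in for a real model's per-position output are all assumptions made for illustration.

```python
import math
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def log_softmax(logits):
    """Numerically stable log-softmax over one list of scores."""
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return [x - log_norm for x in logits]


def substitution_log_odds(logits, wild_type):
    """Return one dict per position: amino acid -> log p(mutant) - log p(wild type).

    `logits` is a hypothetical list of 20 scores per position, standing in
    for the output of a protein generative model.
    """
    table = []
    for position_logits, wt in zip(logits, wild_type):
        log_probs = dict(zip(AMINO_ACIDS, log_softmax(position_logits)))
        table.append({aa: lp - log_probs[wt] for aa, lp in log_probs.items()})
    return table


def score_mutation(log_odds, wild_type, mutation):
    """Score a substitution written like 'K2A' (wild type, 1-based position, mutant)."""
    wt, pos, mut = mutation[0], int(mutation[1:-1]) - 1, mutation[-1]
    if wild_type[pos] != wt:
        raise ValueError(f"{mutation}: position {pos + 1} is {wild_type[pos]}, not {wt}")
    return log_odds[pos][mut]


# Toy input: random logits in place of a real model (assumption for illustration).
random.seed(0)
wild_type = "MKTAYIAK"
logits = [[random.gauss(0.0, 1.0) for _ in AMINO_ACIDS] for _ in wild_type]
log_odds = substitution_log_odds(logits, wild_type)
print(score_mutation(log_odds, wild_type, "K2A"))
```

By construction the wild-type residue scores exactly zero at every position, so negative values mark substitutions the model considers less likely than the wild type and positive values mark more likely ones.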

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4d92/11870403/dcb55294e3f4/nihpp-2025.02.13.638154v1-f0001.jpg
