Predicting high-fitness viral protein variants with Bayesian active learning and biophysics.

Authors

Huot Marian, Wang Dianzhuo, Liu Jiacheng, Shakhnovich Eugene I

Affiliations

Department of Chemistry and Chemical Biology, Harvard University, Cambridge, MA 02138.

Laboratory of Physics of the Ecole Normale Supérieure, Department of Physics, CNRS UMR 8023 and Paris Sciences and Lettres Research, Sorbonne Université, Paris 75005, France.

Publication information

Proc Natl Acad Sci U S A. 2025 Jun 17;122(24):e2503742122. doi: 10.1073/pnas.2503742122. Epub 2025 Jun 9.

Abstract

The early detection of high-fitness viral variants is critical for pandemic response, yet limited experimental resources at the onset of variant emergence hinder effective identification. To address this, we introduce an active learning framework, VIRAL (Viral Identification via Rapid Active Learning), that integrates a protein language model, a Gaussian process with uncertainty estimation, and a biophysical model to predict the fitness of novel variants in a few-shot learning setting. By benchmarking on past SARS-CoV-2 data, we demonstrate that our method accelerates the identification of high-fitness variants by up to fivefold compared to random sampling while requiring experimental characterization of fewer than 1% of possible variants. We also demonstrate that our framework effectively identifies sites that are frequently mutated during natural viral evolution, particularly those enabling antibody escape while preserving ACE2 binding, with a predictive advantage of up to two years compared to baseline strategies. Through systematic analysis of different acquisition strategies, we show that incorporating uncertainty in variant selection enables broader exploration of the sequence landscape, leading to the identification of evolutionarily distant but potentially dangerous variants. Our results suggest that VIRAL could serve as an effective early warning system for identifying concerning SARS-CoV-2 variants, and potentially emerging viruses with pandemic potential, before they achieve widespread circulation.
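
The loop described in the abstract (fit a Gaussian process on a few measured variants, then use its uncertainty to choose what to measure next) can be illustrated with a minimal sketch. This is not the authors' implementation: the random vectors stand in for protein language model embeddings, the hypothetical `measure` callback stands in for an experiment, the biophysical model is left out entirely, and the upper-confidence-bound rule with weight `beta` is just one common uncertainty-aware acquisition strategy, assumed here for illustration.

```python
import numpy as np


def rbf_kernel(a, b, length_scale=3.0):
    """Squared-exponential kernel between the rows of a and the rows of b."""
    sq_dists = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    return np.exp(-0.5 * np.maximum(sq_dists, 0.0) / length_scale**2)


def gp_posterior(x_train, y_train, x_query, noise=1e-2, length_scale=3.0):
    """Posterior mean and std of a zero-mean GP with unit prior variance."""
    k_tt = rbf_kernel(x_train, x_train, length_scale) + noise * np.eye(len(x_train))
    k_qt = rbf_kernel(x_query, x_train, length_scale)
    chol = np.linalg.cholesky(k_tt)
    alpha = np.linalg.solve(chol.T, np.linalg.solve(chol, y_train))
    mean = k_qt @ alpha
    v = np.linalg.solve(chol, k_qt.T)
    var = 1.0 - np.sum(v**2, axis=0)
    return mean, np.sqrt(np.maximum(var, 0.0))


def active_learning(embeddings, measure, n_rounds=5, batch_size=4, beta=2.0, seed=0):
    """Select variants round by round using an upper-confidence-bound score."""
    rng = np.random.default_rng(seed)
    n = len(embeddings)
    # Start from a small random batch of measured variants.
    labeled = [int(i) for i in rng.choice(n, size=batch_size, replace=False)]
    fitness = {i: measure(i) for i in labeled}
    for _ in range(n_rounds):
        pool = np.array([i for i in range(n) if i not in fitness])
        x_train = embeddings[labeled]
        y_train = np.array([fitness[i] for i in labeled])
        y_mean = y_train.mean()  # center targets for the zero-mean GP
        mean, std = gp_posterior(x_train, y_train - y_mean, embeddings[pool])
        # High predicted fitness OR high uncertainty both raise the score.
        ucb = mean + y_mean + beta * std
        picks = pool[np.argsort(ucb)[-batch_size:]]
        for i in picks:
            fitness[int(i)] = measure(int(i))
            labeled.append(int(i))
    return labeled, fitness


# Synthetic stand-ins (assumptions): 200 "variants" with 8-dimensional embeddings
# and a made-up fitness landscape peaked near the point (0.5, ..., 0.5).
rng = np.random.default_rng(1)
embeddings = rng.normal(size=(200, 8))
true_fitness = -np.sum((embeddings - 0.5) ** 2, axis=1)

labeled, fitness = active_learning(embeddings, lambda i: float(true_fitness[i]))
print(len(labeled), max(fitness.values()))
```

Setting `beta=0.0` reduces the rule to greedy selection on the predicted mean, which gives a simple way to compare exploration with and without the uncertainty term on the same synthetic landscape.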


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/63b8/12184641/a368b4813d5a/pnas.2503742122fig01.jpg
