Integrating experimental feedback improves generative models for biological sequences.

Author information

Calvanese Francesco, Peinetti Giovanni, Pavlinova Polina, Nghe Philippe, Weigt Martin

Affiliations

Sorbonne Université, CNRS, Department of Computational, Quantitative and Synthetic Biology-CQSB, 75005 Paris, France.

Laboratoire de Biophysique et Evolution, UMR CNRS-ESPCI 8231 Chimie Biologie Innovation, PSL University, 75005 Paris, France.

Publication information

Nucleic Acids Res. 2025 Aug 27;53(16). doi: 10.1093/nar/gkaf832.

Abstract

Generative probabilistic models have shown promise in designing artificial RNA and protein sequences but often suffer from high rates of false positives, where sequences predicted as functional fail experimental validation. To address this critical limitation, we explore the impact of reintegrating experimental feedback into the model design process. We propose a likelihood-based reintegration scheme, which we test through extensive computational experiments on both RNA and protein datasets, as well as through wet-lab experiments on the self-splicing ribozyme from the Group I intron RNA family, where our approach demonstrates particular efficacy. We show that integrating recent experimental data enhances the model's capacity to generate functional sequences (e.g. from 6.7% to 63.7% of active designs at 45 mutations). This feedback-driven approach thus provides a significant improvement in the design of biomolecular sequences by directly tackling the false-positive challenge.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d36d/12407104/e30b452d676d/gkaf832figgra1.jpg
