Deep learning of the regulatory grammar of yeast 5' untranslated regions from 500,000 random sequences.

Affiliations

Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA.

Howard Hughes Medical Institute, University of Washington, Seattle, Washington 98195, USA.

Publication information

Genome Res. 2017 Dec;27(12):2015-2024. doi: 10.1101/gr.224964.117. Epub 2017 Nov 2.

Abstract

Our ability to predict protein expression from DNA sequence alone remains poor, reflecting our limited understanding of cis-regulatory grammar and hampering the design of engineered genes for synthetic biology applications. Here, we generate a model that predicts the protein expression of the 5' untranslated region (UTR) of mRNAs in the yeast Saccharomyces cerevisiae. We constructed a library of half a million 50-nucleotide-long random 5' UTRs and assayed their activity in a massively parallel growth selection experiment. The resulting data allow us to quantify the impact on protein expression of Kozak sequence composition, upstream open reading frames (uORFs), and secondary structure. We trained a convolutional neural network (CNN) on the random library and showed that it performs well at predicting the protein expression of both a held-out set of the random 5' UTRs and native 5' UTRs. The model was additionally used to computationally evolve highly active 5' UTRs. We confirmed experimentally that the great majority of the evolved sequences led to higher protein expression rates than the starting sequences, demonstrating the predictive power of this model.
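
As a rough illustration of the workflow the abstract describes, the Python sketch below draws random 50-nucleotide 5' UTRs, one-hot encodes them as a CNN input would typically require, flags upstream ATGs, and runs a simple greedy mutation loop. Everything here is an assumption made for illustration: the function names are hypothetical, `toy_score` is a hand-written stand-in rather than the trained CNN, and the greedy loop is only one plausible reading of "computationally evolve", not necessarily the authors' procedure.

```python
import random

ALPHABET = "ACGT"
UTR_LENGTH = 50  # UTR length stated in the abstract


def random_utr(rng, length=UTR_LENGTH):
    """Draw one random 5' UTR (written as DNA) of the given length."""
    return "".join(rng.choice(ALPHABET) for _ in range(length))


def one_hot(seq):
    """Encode a DNA string as rows of four floats in A, C, G, T order."""
    return [[1.0 if base == letter else 0.0 for letter in ALPHABET] for base in seq]


def upstream_atgs(utr):
    """Return (position, in_frame) for every ATG inside the UTR.

    Assumption: the main start codon begins immediately after the UTR, so an
    upstream ATG is in frame when its distance to the UTR end is a multiple of 3.
    """
    hits = []
    start = utr.find("ATG")
    while start != -1:
        hits.append((start, (len(utr) - start) % 3 == 0))
        start = utr.find("ATG", start + 1)
    return hits


def toy_score(utr):
    """Hypothetical stand-in for a trained model (NOT the paper's CNN).

    Penalises each upstream ATG and rewards an A at position -3,
    two illustrative features chosen for this sketch only.
    """
    score = -1.0 * len(upstream_atgs(utr))
    if utr[-3] == "A":
        score += 0.5
    return score


def evolve(utr, score_fn, rounds, rng):
    """Greedy hill climbing: keep a random point mutation if the score does not drop."""
    best, best_score = utr, score_fn(utr)
    for _ in range(rounds):
        pos = rng.randrange(len(best))
        candidate = best[:pos] + rng.choice(ALPHABET) + best[pos + 1:]
        candidate_score = score_fn(candidate)
        if candidate_score >= best_score:
            best, best_score = candidate, candidate_score
    return best, best_score
```

Calling `evolve(random_utr(random.Random(0)), toy_score, 200, random.Random(1))` returns a sequence whose score under `toy_score` is at least that of the starting sequence. In a real setting the scoring function would be the trained model's prediction, and any predicted improvement would still need experimental confirmation, as the abstract reports.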

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/efdb/5741052/c6663b2c6b56/2015f01.jpg
