Deep learning of the regulatory grammar of yeast 5' untranslated regions from 500,000 random sequences.

Affiliations

Department of Genome Sciences, University of Washington, Seattle, Washington 98195, USA.

Howard Hughes Medical Institute, University of Washington, Seattle, Washington 98195, USA.

Publication information

Genome Res. 2017 Dec;27(12):2015-2024. doi: 10.1101/gr.224964.117. Epub 2017 Nov 2.

DOI: 10.1101/gr.224964.117

PMID: 29097404

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5741052/

Abstract

Our ability to predict protein expression from DNA sequence alone remains poor, reflecting our limited understanding of cis-regulatory grammar and hampering the design of engineered genes for synthetic biology applications. Here, we generate a model that predicts the protein expression of the 5' untranslated region (UTR) of mRNAs in the yeast Saccharomyces cerevisiae. We constructed a library of half a million 50-nucleotide-long random 5' UTRs and assayed their activity in a massively parallel growth selection experiment. The resulting data allow us to quantify the impact on protein expression of Kozak sequence composition, upstream open reading frames (uORFs), and secondary structure. We trained a convolutional neural network (CNN) on the random library and showed that it performs well at predicting the protein expression of both a held-out set of the random 5' UTRs and native 5' UTRs. The model was additionally used to computationally evolve highly active 5' UTRs. We confirmed experimentally that the great majority of the evolved sequences led to higher protein expression rates than the starting sequences, demonstrating the predictive power of this model.
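The abstract mentions three computational ingredients: sequence features such as upstream start codons (uORFs) and Kozak context, a CNN that takes the 5' UTR sequence as input, and model-guided "computational evolution" of UTRs. The Python sketch below illustrates these ideas under stated assumptions. It one-hot encodes a UTR (a common CNN input format), lists upstream ATGs and their frame relative to the main ORF, and runs a greedy single-substitution hill climb. `toy_score` is a hypothetical, hand-written stand-in for a trained CNN and is not the paper's model. The greedy search is a generic technique, not necessarily the authors' exact evolution procedure, and all function names are illustrative.

```python
BASES = "ACGT"


def one_hot(seq):
    """Encode a DNA sequence as a list of [A, C, G, T] indicator rows (CNN-style input)."""
    return [[1 if base == b else 0 for b in BASES] for base in seq.upper()]


def upstream_atgs(utr):
    """Return the 0-based start positions of every ATG lying entirely inside the UTR."""
    utr = utr.upper()
    return [i for i in range(len(utr) - 2) if utr[i:i + 3] == "ATG"]


def in_frame_with_main_orf(utr, pos):
    """True if an upstream ATG at `pos` shares a reading frame with the main ATG.
    Assumption: the main ATG starts immediately after the last UTR nucleotide."""
    return (len(utr) - pos) % 3 == 0


def toy_score(utr):
    """HYPOTHETICAL stand-in for a trained CNN (weights are made up for illustration):
    penalize each upstream ATG and reward an A at position -3 relative to the main ATG."""
    utr = utr.upper()
    score = -2.0 * len(upstream_atgs(utr))
    if len(utr) >= 3 and utr[-3] == "A":
        score += 1.0
    return score


def greedy_evolve(utr, score_fn, max_steps=10):
    """At each step, apply the single-nucleotide substitution that most increases
    score_fn; stop when no substitution helps or max_steps is reached."""
    current = utr.upper()
    best = score_fn(current)
    for _ in range(max_steps):
        candidate, cand_score = None, best
        for i, old in enumerate(current):
            for new in BASES:
                if new == old:
                    continue
                mutant = current[:i] + new + current[i + 1:]
                s = score_fn(mutant)
                if s > cand_score:  # keep the first strictly better mutant
                    candidate, cand_score = mutant, s
        if candidate is None:
            break
        current, best = candidate, cand_score
    return current, best
```

For example, `upstream_atgs("GATGCCATGTTT")` returns `[1, 6]`: the ATG at position 1 is out of frame with the main ORF and the one at position 6 is in frame. `greedy_evolve("GATGCCATGTTT", toy_score)` removes both upstream ATGs and then places an A at -3, returning `("GCTGCCCTGATT", 1.0)`. In practice the scoring function would be a trained model's prediction, which is what lets the search exploit interactions that a hand-written rule cannot capture.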

Similar articles

Identification and characterization of upstream open reading frames (uORF) in the 5' untranslated regions (UTR) of genes in Saccharomyces cerevisiae.

Curr Genet. 2005 Aug;48(2):77-87. doi: 10.1007/s00294-005-0001-x. Epub 2005 Sep 14.

Folding free energies of 5'-UTRs impact post-transcriptional regulation on a genomic scale in yeast.

PLoS Comput Biol. 2005 Dec;1(7):e72. doi: 10.1371/journal.pcbi.0010072. Epub 2005 Dec 9.

An RNA structure-mediated, posttranscriptional model of human α-1-antitrypsin expression.

Proc Natl Acad Sci U S A. 2017 Nov 21;114(47):E10244-E10253. doi: 10.1073/pnas.1706539114. Epub 2017 Nov 6.

Upstream sequence elements direct post-transcriptional regulation of gene expression under stress conditions in yeast.

BMC Genomics. 2009 Jan 7;10:7. doi: 10.1186/1471-2164-10-7.

Deciphering the rules by which 5'-UTR sequences affect protein expression in yeast.

Proc Natl Acad Sci U S A. 2013 Jul 23;110(30):E2792-801. doi: 10.1073/pnas.1222534110. Epub 2013 Jul 5.

5' untranslated regions: the next regulatory sequence in yeast synthetic biology.

Biol Rev Camb Philos Soc. 2020 Apr;95(2):517-529. doi: 10.1111/brv.12575. Epub 2019 Dec 20.

Effects of sequence motifs in the yeast 3' untranslated region determined from massively parallel assays of random sequences.

Genome Biol. 2021 Oct 18;22(1):293. doi: 10.1186/s13059-021-02509-6.

Post-termination ribosome interactions with the 5'UTR modulate yeast mRNA stability.

EMBO J. 1999 Jun 1;18(11):3139-52. doi: 10.1093/emboj/18.11.3139.

Human 5' UTR design and variant effect prediction from a massively parallel translation assay.

Nat Biotechnol. 2019 Jul;37(7):803-809. doi: 10.1038/s41587-019-0164-5. Epub 2019 Jul 1.

Cited by

Multi-omic assessment of mRNA translation dynamics in liver cancer cell lines.

Sci Data. 2025 Aug 30;12(1):1520. doi: 10.1038/s41597-025-05861-5.

5' untranslated regions tune translation.

bioRxiv. 2025 Jul 14:2025.07.14.664749. doi: 10.1101/2025.07.14.664749.

Biocontrol Effect of D7-8 on Potato Common Scab and Its Complete Genome Sequence Analysis.

Microorganisms. 2025 Mar 28;13(4):770. doi: 10.3390/microorganisms13040770.

Deciphering the landscape of cis-acting sequences in natural yeast transcript leaders.

Nucleic Acids Res. 2025 Feb 27;53(5). doi: 10.1093/nar/gkaf165.

Inferring protein from transcript abundances using convolutional neural networks.

BioData Min. 2025 Feb 27;18(1):18. doi: 10.1186/s13040-025-00434-z.

Improving the generalization of protein expression models with mechanistic sequence information.

Nucleic Acids Res. 2025 Jan 24;53(3). doi: 10.1093/nar/gkaf020.

The regulatory landscape of 5' UTRs in translational control during zebrafish embryogenesis.

Dev Cell. 2025 May 19;60(10):1498-1515.e8. doi: 10.1016/j.devcel.2024.12.038. Epub 2025 Jan 15.

Active learning of enhancers and silencers in the developing neural retina.

Cell Syst. 2025 Jan 15;16(1):101163. doi: 10.1016/j.cels.2024.12.004. Epub 2025 Jan 7.

NaP-TRAP reveals the regulatory grammar in 5'UTR-mediated translation regulation during zebrafish development.

Nat Commun. 2024 Dec 30;15(1):10898. doi: 10.1038/s41467-024-55274-y.

Role of artificial intelligence in cancer detection using protein p53: A Review.

Mol Biol Rep. 2024 Dec 11;52(1):46. doi: 10.1007/s11033-024-10051-4.
