

Estimating Gene Expression and Codon-Specific Translational Efficiencies, Mutation Biases, and Selection Coefficients from Genomic Data Alone.

Author information

Gilchrist Michael A, Chen Wei-Chen, Shah Premal, Landerer Cedric L, Zaretzki Russell

Affiliations

Department of Ecology & Evolutionary Biology, University of Tennessee, Knoxville; National Institute for Mathematical and Biological Synthesis, Knoxville, Tennessee.

Department of Ecology & Evolutionary Biology, University of Tennessee, Knoxville; Center for Biologics Evaluation and Research, Food and Drug Administration, Silver Spring, Maryland.

Publication information

Genome Biol Evol. 2015 May 14;7(6):1559-79. doi: 10.1093/gbe/evv087.

Abstract

Extracting biologically meaningful information from the continuing flood of genomic data is a major challenge in the life sciences. Codon usage bias (CUB) is a general feature of most genomes and is thought to reflect the effects of both natural selection for efficient translation and mutation bias. Here we present a mechanistically interpretable, Bayesian model (ribosome overhead costs Stochastic Evolutionary Model of Protein Production Rate [ROC SEMPPR]) to extract meaningful information from patterns of CUB within a genome. ROC SEMPPR is grounded in population genetics and allows us to separate the contributions of mutational biases and natural selection against translational inefficiency on a gene-by-gene and codon-by-codon basis. Until now, the primary disadvantage of similar approaches was the need for genome-scale measurements of gene expression. Here, we demonstrate that it is possible to extract accurate estimates of codon-specific mutation biases and translational efficiencies while simultaneously generating accurate estimates of gene expression, rather than requiring such information as input. We demonstrate the utility of ROC SEMPPR using the Saccharomyces cerevisiae S288c genome. When we compare our model fits with previous approaches, we observe an exceptionally high agreement between estimates of both codon-specific parameters and gene expression levels ([Formula: see text] in all cases). We also observe strong agreement between our parameter estimates and those derived from alternative data sets. For example, our estimates of mutation bias and those from mutational accumulation experiments are highly correlated ([Formula: see text]). Our estimates of codon-specific translational inefficiencies and tRNA copy number-based estimates of ribosome pausing time ([Formula: see text]), and mRNA and ribosome profiling footprint-based estimates of gene expression ([Formula: see text]) are also highly correlated, thus supporting the hypothesis that selection against translational inefficiency is an important force driving the evolution of CUB. Surprisingly, we find that for particular amino acids, codon usage in highly expressed genes can still be largely driven by mutation bias and that failing to take mutation bias into account can lead to the misidentification of an amino acid's "optimal" codon. In conclusion, our method demonstrates that an enormous amount of biologically important information is encoded within genome-scale patterns of codon usage; accessing this information does not require gene expression measurements, but instead requires carefully formulated, biologically interpretable models.
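The interplay between mutation bias and selection described in the abstract can be illustrated with a small sketch. The abstract does not state the model's functional form, so the code below assumes a multinomial-logit form in which the probability of a synonymous codon decreases with a mutation term and with an inefficiency term scaled by gene expression. The names `delta_m`, `delta_eta`, and `phi`, and all numeric values, are hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
import math


def codon_probabilities(delta_m, delta_eta, phi):
    """Return synonymous-codon probabilities for one amino acid.

    Assumed (hypothetical) form: p_i is proportional to
    exp(-delta_m[i] - delta_eta[i] * phi), where
      delta_m[i]   : mutation-bias term for codon i (reference codon = 0)
      delta_eta[i] : translational-inefficiency term for codon i
      phi          : expression level (protein production rate) of the gene
    """
    weights = [math.exp(-m - eta * phi) for m, eta in zip(delta_m, delta_eta)]
    total = sum(weights)
    return [w / total for w in weights]


# Two synonymous codons; codon 0 is the reference (both terms are 0).
# Codon 1 is favored by mutation (negative delta_m) but is slightly
# less efficient to translate (positive delta_eta). Values are invented.
delta_m = [0.0, -1.0]
delta_eta = [0.0, 0.1]

low = codon_probabilities(delta_m, delta_eta, phi=0.1)         # weakly expressed
high = codon_probabilities(delta_m, delta_eta, phi=5.0)        # highly expressed
very_high = codon_probabilities(delta_m, delta_eta, phi=50.0)  # extreme expression

for label, probs in [("low", low), ("high", high), ("very high", very_high)]:
    print(label, [round(p, 3) for p in probs])
```

With these invented values, the mutationally favored but less efficient codon remains the more frequent one even at `phi=5.0`, and selection overturns the mutation bias only at much higher expression. Under the stated assumptions, this mirrors the abstract's point that picking the most frequent codon in highly expressed genes, without accounting for mutation bias, can misidentify the "optimal" codon.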


