Institute for Evolution and Biodiversity, University of Münster, Münster, Germany.
Department of Protein Evolution, Max Planck Institute for Biology Tübingen, Tübingen, Germany.
Mol Biol Evol. 2023 Apr 4;40(4). doi: 10.1093/molbev/msad079.
New protein-coding genes can emerge from genomic regions that previously did not contain any genes, via a process called de novo gene emergence. To synthesize a protein, DNA must be both transcribed and translated, and each process requires certain DNA sequence features. Stable transcription requires promoters and a polyadenylation signal, while translation requires at least an open reading frame. We develop mathematical models based on mutation probabilities, and the assumption of neutral evolution, to find out how quickly genes emerge and are lost. We also investigate the effect of the order in which DNA features evolve, and whether sequence composition is biased by mutation rate. We rationalize why genes are lost much more rapidly than they emerge, and why they preferentially arise in regions that are already transcribed. Our study not only answers some fundamental questions on the topic of de novo emergence but also provides a modeling framework for future studies.
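The asymmetry between gain and loss can be illustrated with a minimal sketch. This is not the authors' model: it assumes a single fixed motif (a start codon, `ATG`), one round of neutral mutation, and a uniform per-site substitution rate `mu` in which each of the three alternative bases is equally likely. The function names and the value of `mu` are hypothetical and chosen only for illustration.

```python
from itertools import product

def transition_prob(src, dst, mu):
    """Probability that src becomes dst in one round of mutation.

    Assumption: each site mutates independently with probability mu,
    and a mutated site becomes each of the 3 other bases with
    probability mu / 3.
    """
    p = 1.0
    for a, b in zip(src, dst):
        p *= (1.0 - mu) if a == b else mu / 3.0
    return p

def gain_and_loss(motif, mu):
    """Return (gain, loss) probabilities for a motif in one round.

    loss: the existing motif is changed by at least one substitution.
    gain: a uniformly random non-motif sequence of the same length
          mutates into exactly the motif.
    """
    k = len(motif)
    loss = 1.0 - transition_prob(motif, motif, mu)
    others = ["".join(t) for t in product("ACGT", repeat=k)
              if "".join(t) != motif]
    gain = sum(transition_prob(s, motif, mu) for s in others) / len(others)
    return gain, loss

# Illustrative per-site rate (an assumption, not a value from the study).
gain, loss = gain_and_loss("ATG", 1e-8)
ratio = loss / gain
print(f"gain={gain:.3e}  loss={loss:.3e}  loss/gain={ratio:.1f}")
```

Under these simplifying assumptions the motif is lost about 63 times more readily than it is gained, because any substitution at any of its three sites destroys it, whereas gaining it requires specific substitutions in one of the 63 other trinucleotides. A full gene needs several such features at once, which would make the imbalance larger still.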