Department of Pharmacology, Feinberg School of Medicine, Northwestern University, Chicago, IL, 60611, USA.
Department of Biomedical Engineering, McCormick School of Engineering, Northwestern University, Evanston, IL, 60208, USA.
Nat Commun. 2023 Nov 15;14(1):7378. doi: 10.1038/s41467-023-43266-3.
The genomic distribution of cleavage and polyadenylation (polyA) sites should be co-evolutionarily optimized with the local gene structure; otherwise, spurious polyadenylation can cause premature transcription termination and generate aberrant proteins. To obtain mechanistic insights into polyA site optimization across the human genome, we develop deep/machine learning models that identify genome-wide putative polyA sites at unprecedented nucleotide-level resolution and calculate their strength and usage in the genomic context. Our models quantitatively measure the importance of position-specific motifs and their crosstalk in polyA site formation and cleavage heterogeneity. Expression of intronic sites is governed by the surrounding splicing landscape. The usage of alternative polyA sites in terminal exons is modulated by their relative locations and their distance to downstream genes. Finally, we apply our models to reveal thousands of disease- and trait-associated genetic variants that alter polyadenylation activity. Altogether, our models represent a valuable resource to dissect the molecular mechanisms mediating genome-wide polyA site expression and to characterize their functional roles in human diseases.
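As a toy illustration of what a "position-specific motif" means in this context (not the authors' deep/machine learning models, whose architecture this abstract does not describe), the sketch below scans a sense-strand DNA sequence for the canonical polyA signal hexamer AATAAA and its common variant ATTAAA, and reports each hit's offset relative to an assumed cleavage site. The function name `pas_positions`, the 50-nt window, and the two-hexamer motif set are illustrative assumptions.

```python
# Illustrative only: canonical polyA signal (PAS) hexamer and its most common
# variant, written in the DNA alphabet (assumed motif set, not the paper's).
PAS_HEXAMERS = ("AATAAA", "ATTAAA")


def pas_positions(seq: str, cleavage_index: int, window: int = 50) -> list[tuple[str, int]]:
    """Hypothetical helper: find PAS hexamers lying fully upstream of a cleavage site.

    Assumes `seq` is the sense strand and `cleavage_index` is the 0-based index
    of the first nucleotide after the cleavage point. Returns (hexamer, offset)
    pairs, where offset = hexamer start - cleavage_index (negative = upstream).
    """
    seq = seq.upper()
    start = max(0, cleavage_index - window)
    hits = []
    # The last valid start is cleavage_index - 6, so the hexamer ends before the site.
    for i in range(start, cleavage_index - 5):
        kmer = seq[i:i + 6]
        if kmer in PAS_HEXAMERS:
            hits.append((kmer, i - cleavage_index))
    return hits


if __name__ == "__main__":
    # Synthetic sequence: AATAAA starts at index 20; assumed cleavage at index 46.
    demo = "G" * 20 + "AATAAA" + "C" * 20 + "A" * 10
    print(pas_positions(demo, 46))
```

Negative offsets mark motifs upstream of the cleavage site, where the canonical signal typically lies roughly 10 to 30 nt away. A learned model would instead weigh many motifs and their positions jointly rather than matching a fixed list.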