Kapusta Aurélie, Kronenberg Zev, Lynch Vincent J, Zhuo Xiaoyu, Ramsay LeeAnn, Bourque Guillaume, Yandell Mark, Feschotte Cédric
Department of Human Genetics, University of Utah School of Medicine, Salt Lake City, Utah, United States of America.
PLoS Genet. 2013 Apr;9(4):e1003470. doi: 10.1371/journal.pgen.1003470. Epub 2013 Apr 25.
Advances in vertebrate genomics have uncovered thousands of loci encoding long noncoding RNAs (lncRNAs). While progress has been made in elucidating the regulatory functions of lncRNAs, little is known about their origins and evolution. Here we explore the contribution of transposable elements (TEs) to the makeup and regulation of lncRNAs in human, mouse, and zebrafish. Surprisingly, TEs occur in more than two-thirds of mature lncRNA transcripts and account for a substantial portion of total lncRNA sequence (~30% in human), whereas they seldom occur in protein-coding transcripts. While TEs contribute less to lncRNA exons than expected, several TE families are strongly enriched in lncRNAs. There is also substantial interspecific variation in the coverage and types of TEs embedded in lncRNAs, partially reflecting differences in the TE landscapes of the genomes surveyed. In human, TE sequences in lncRNAs evolve under greater evolutionary constraint than their non-TE sequences, their intronic TEs, or random DNA. Consistent with functional constraint, we found that TEs contribute signals essential for the biogenesis of many lncRNAs, including ~30,000 unique sites for transcription initiation, splicing, or polyadenylation in human. In addition, we identified ~35,000 TEs marked as open chromatin located within 10 kb upstream of lncRNA genes. The density of these marks in one cell type correlates with elevated expression of the downstream lncRNA in the same cell type, suggesting that these TEs contribute to cis-regulation. These global trends are recapitulated in several lncRNAs with established functions. Finally, a subset of TEs embedded in lncRNAs are subject to RNA editing and predicted to form secondary structures likely important for function. In conclusion, TEs are nearly ubiquitous in lncRNAs and have played an important role in the lineage-specific diversification of vertebrate lncRNA repertoires.
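To make the notion of TE coverage concrete (for example, the ~30% of human lncRNA sequence attributed to TEs), the following is a minimal sketch of how such a fraction could be computed from interval annotations. It is not the authors' pipeline: the function names (`merge_intervals`, `overlap_length`, `te_coverage`), the half-open coordinate convention, and the toy coordinates are all assumptions made for illustration, and a single chromosome and strand are assumed.

```python
from typing import List, Tuple

# Hypothetical convention: half-open [start, end) coordinates on one chromosome.
Interval = Tuple[int, int]


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Sort intervals and merge any that overlap or touch."""
    merged: List[Interval] = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            # Extend the previous interval instead of starting a new one.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


def overlap_length(a: List[Interval], b: List[Interval]) -> int:
    """Total number of bases shared by two sorted, merged interval lists."""
    i = j = total = 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo < hi:
            total += hi - lo
        # Advance whichever interval ends first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total


def te_coverage(exons: List[Interval], tes: List[Interval]) -> float:
    """Fraction of exonic bases that fall inside annotated TEs."""
    exons_m = merge_intervals(exons)
    tes_m = merge_intervals(tes)
    exon_bases = sum(end - start for start, end in exons_m)
    if exon_bases == 0:
        return 0.0
    return overlap_length(exons_m, tes_m) / exon_bases


if __name__ == "__main__":
    # Toy data, not taken from the study.
    toy_exons = [(100, 200), (300, 400)]
    toy_tes = [(150, 250), (390, 500)]
    print(f"TE coverage of exons: {te_coverage(toy_exons, toy_tes):.2f}")
```

Merging the intervals first keeps overlapping TE annotations or exons shared between transcript isoforms from being counted twice, which would otherwise inflate the coverage estimate.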