Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, Massachusetts 02142, USA.
Genes Dev. 2011 Sep 15;25(18):1915-27. doi: 10.1101/gad.17446611. Epub 2011 Sep 2.
Large intergenic noncoding RNAs (lincRNAs) are emerging as key regulators of diverse cellular processes. Determining the function of individual lincRNAs remains a challenge. Recent advances in RNA sequencing (RNA-seq) and computational methods allow for an unprecedented analysis of such transcripts. Here, we present an integrative approach to define a reference catalog of >8000 human lincRNAs. Our catalog unifies previously existing annotation sources with transcripts we assembled from ∼4 billion RNA-seq reads collected across 24 tissues and cell types. We characterize each lincRNA by a panorama of >30 properties, including sequence, structural, transcriptional, and orthology features. We found that lincRNA expression is strikingly tissue-specific compared with coding genes, and that lincRNAs are typically coexpressed with their neighboring genes, albeit to an extent similar to that of pairs of neighboring protein-coding genes. We distinguish an additional subset of transcripts that have high evolutionary conservation but may include short ORFs and may serve as either lincRNAs or small peptides. Our integrated, comprehensive, yet conservative reference catalog of human lincRNAs reveals the global properties of lincRNAs and will facilitate experimental studies and further functional classification of these genes.
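The abstract does not state how "tissue-specific" is quantified. The sketch below shows one common way to score it, offered as an assumption rather than as the authors' exact procedure: normalize a gene's expression across tissues into a probability distribution, compare it by Jensen-Shannon (JS) divergence with an idealized profile expressed in a single tissue, and report 1 minus the square root of the smallest divergence. The function names and the choice of raw, non-negative input values are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def entropy(p: Sequence[float]) -> float:
    """Shannon entropy in bits; terms with probability 0 contribute 0."""
    return -sum(x * math.log2(x) for x in p if x > 0)


def js_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Jensen-Shannon divergence in bits (bounded by 0 and 1 with log base 2)."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    js = entropy(m) - (entropy(p) + entropy(q)) / 2
    return max(0.0, js)  # clamp tiny negative values caused by rounding


def tissue_specificity(expr: Sequence[float]) -> Tuple[float, int]:
    """Hypothetical helper: return (score, tissue index) for the best-matching tissue.

    A score of 1.0 means expression in exactly one tissue; lower scores
    mean the expression is spread more broadly across tissues.
    """
    total = sum(expr)
    if total <= 0:
        raise ValueError("expression vector must have a positive sum")
    p: List[float] = [x / total for x in expr]
    n = len(p)
    best_score, best_tissue = -1.0, -1
    for t in range(n):
        # Idealized profile: all expression in tissue t.
        q = [1.0 if i == t else 0.0 for i in range(n)]
        score = 1.0 - math.sqrt(js_divergence(p, q))
        if score > best_score:
            best_score, best_tissue = score, t
    return best_score, best_tissue


if __name__ == "__main__":
    print(tissue_specificity([0.0, 5.0, 0.0]))       # expressed in one tissue only
    print(tissue_specificity([1.0, 1.0, 1.0, 1.0]))  # uniformly expressed
```

Under this kind of score, a transcript detected in only one tissue scores 1.0, while one expressed evenly across four tissues scores about 0.26; comparing the score distributions of lincRNAs and coding genes is one way to make a claim like "strikingly tissue-specific" concrete.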