Combining RT-PCR-seq and RNA-seq to catalog all genic elements encoded in the human genome.

Affiliation

Center for Integrative Genomics, University of Lausanne, 1015 Lausanne, Switzerland.

Publication information

Genome Res. 2012 Sep;22(9):1698-710. doi: 10.1101/gr.134478.111.

Abstract

Within the ENCODE Consortium, GENCODE aimed to accurately annotate all protein-coding genes, pseudogenes, and noncoding transcribed loci in the human genome through manual curation and computational methods. Annotated transcript structures were assessed, and less well-supported loci were systematically, experimentally validated. Predicted exon-exon junctions were evaluated by RT-PCR amplification followed by highly multiplexed sequencing readout, a method we called RT-PCR-seq. Seventy-nine percent of all assessed junctions were confirmed by this evaluation procedure, demonstrating the high quality of the GENCODE gene set. RT-PCR-seq was also efficient for screening gene models predicted using the Human Body Map (HBM) RNA-seq data. We validated 73% of these predictions, thus confirming 1168 novel genes, mostly noncoding, which will further complement the GENCODE annotation. Our novel experimental validation pipeline is extremely sensitive, far more so than unbiased transcriptome profiling through RNA sequencing, which is becoming the norm. For example, exon-exon junctions unique to GENCODE annotated transcripts are five times more likely to be corroborated with our targeted approach than with extensive large-scale human transcriptome profiling. Data sets such as the HBM and ENCODE RNA-seq data fail to sample low-expressed transcripts. Our RT-PCR-seq targeted approach also has the advantage of identifying novel exons of known genes, as we discovered unannotated exons in ~11% of assessed introns. We thus estimate that at least 18% of known loci have yet-unannotated exons. Our work demonstrates that cataloging all of the genic elements encoded in the human genome will necessitate a coordinated effort between unbiased and targeted approaches, like RNA-seq and RT-PCR-seq.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/064d/3431487/df3563f31f50/1698fig1.jpg