Human splicing diversity and the extent of unannotated splice junctions across human RNA-seq samples on the Sequence Read Archive.

Author information

Nellore Abhinav, Jaffe Andrew E, Fortin Jean-Philippe, Alquicira-Hernández José, Collado-Torres Leonardo, Wang Siruo, Phillips Robert A, Karbhari Nishika, Hansen Kasper D, Langmead Ben, Leek Jeffrey T

Affiliations

Department of Computer Science, Johns Hopkins University, Baltimore, MD, USA.

Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA.

Publication information

Genome Biol. 2016 Dec 30;17(1):266. doi: 10.1186/s13059-016-1118-6.

Abstract

BACKGROUND

Gene annotations, such as those in GENCODE, are derived primarily from alignments of spliced cDNA sequences and protein sequences. The impact of RNA-seq data on annotation has been confined to major projects like ENCODE and Illumina Body Map 2.0.

RESULTS

We aligned 21,504 Illumina-sequenced human RNA-seq samples from the Sequence Read Archive (SRA) to the human genome and compared detected exon-exon junctions with junctions in several recent gene annotations. We found 56,861 junctions (18.6%) in at least 1000 samples that were not annotated, and their expression associated with tissue type. Junctions well expressed in individual samples tended to be annotated. Newer samples contributed few novel well-supported junctions, with the vast majority of detected junctions present in samples before 2013. We compiled junction data into a resource called intropolis available at http://intropolis.rail.bio . We used this resource to search for a recently validated isoform of the ALK gene and characterized the potential functional implications of unannotated junctions with publicly available TRAP-seq data.
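The comparison described above (keeping junctions detected in at least a threshold number of samples, then checking them against annotation) can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name `unannotated_fraction`, the junction tuple layout, and the toy data are all hypothetical, and the sketch assumes each junction's sample count has already been tallied.

```python
from typing import Dict, Set, Tuple

# Hypothetical junction key: (chromosome, start, end, strand).
Junction = Tuple[str, int, int, str]


def unannotated_fraction(
    sample_counts: Dict[Junction, int],
    annotated: Set[Junction],
    min_samples: int,
) -> Tuple[int, float]:
    """Return (count, fraction) of junctions found in at least
    `min_samples` samples that are absent from the annotation set."""
    # Keep only junctions supported by enough samples.
    recurrent = [j for j, n in sample_counts.items() if n >= min_samples]
    if not recurrent:
        return 0, 0.0
    # Of those, count the ones the annotation does not contain.
    novel = [j for j in recurrent if j not in annotated]
    return len(novel), len(novel) / len(recurrent)


# Toy data (invented for illustration; not taken from intropolis).
sample_counts = {
    ("chr1", 100, 200, "+"): 1500,
    ("chr1", 300, 400, "+"): 1200,
    ("chr2", 50, 90, "-"): 3,
    ("chr2", 500, 800, "-"): 2000,
    ("chr3", 10, 60, "+"): 1000,
}
annotated = {
    ("chr1", 100, 200, "+"),
    ("chr2", 500, 800, "-"),
    ("chr3", 10, 60, "+"),
}

n_novel, frac = unannotated_fraction(sample_counts, annotated, min_samples=1000)
print(n_novel, frac)
```

In this toy input, four junctions pass the 1000-sample threshold and one of them is missing from the annotation set. The 18.6% figure reported above is the same kind of ratio, computed over the junctions the study found in at least 1000 samples.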

CONCLUSIONS

Considering only the variation contained in annotation may suffice if an investigator is interested only in well-expressed transcript isoforms. However, genes that are not generally well expressed and nonetheless present in a small but significant number of samples in the SRA are likelier to be incompletely annotated. The rate at which evidence for novel junctions has been added to the SRA has tapered dramatically, even to the point of an asymptote. Now is perhaps an appropriate time to update incomplete annotations to include splicing present in the now-stable snapshot provided by the SRA.
