Grassi Elena, Mariella Elisa, Lembo Antonio, Molineris Ivan, Provero Paolo
Department of Molecular Biotechnology and Health Sciences, Molecular Biotechnology Center, Via Nizza 52, Torino, 10126, Italy.
Center for Translational Genomics and Bioinformatics, San Raffaele Scientific Institute, Via Olgettina 60, Milan, 20132, Italy.
BMC Bioinformatics. 2016 Oct 18;17(1):423. doi: 10.1186/s12859-016-1254-8.
Post-transcriptional regulation is a complex mechanism that plays a central role in defining multiple cellular identities from a common genome. Changes in 3' UTR length have been found to play an important role in this context, since alternative 3' UTRs can differ, for example, in microRNA-mediated regulation and in the subcellular localization of the transcripts, thus altering their fate.
We propose a strategy to identify genes undergoing regulation of 3' UTR length from RNA sequencing data obtained with standard libraries. This makes it widely applicable to data originally generated for classical differential expression analyses. We exploit alternative polyadenylation (APA) sites previously annotated in public databases. Other recently proposed approaches instead infer the location of the APA site from the data, together with the relative abundance of the isoforms. We demonstrate the reliability of our method by comparing it with the results of other methods based on microarrays or on dedicated RNA-seq libraries. We also show that using APA site databases yields higher sensitivity than a de novo site prediction approach.
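The abstract does not spell out the statistic used, so what follows is only a minimal sketch of the general idea under stated assumptions, not the package's implementation or API. It assumes that reads in a gene's 3' UTR are split at one annotated APA site into two segments. The proximal segment lies upstream of the site and is shared by the short and long isoforms. The distal segment lies downstream and belongs to the long isoform only. A relative gain of proximal reads in one condition then suggests 3' UTR shortening, and here it is tested with a two-sided Fisher's exact test. The names `UtrCounts`, `fisher_two_sided` and `shortening_test` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class UtrCounts:
    """Hypothetical per-gene read counts in one condition.

    proximal: reads upstream of the annotated APA site (short + long isoforms)
    distal:   reads downstream of the APA site (long isoform only)
    """
    proximal: int
    distal: int


def fisher_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    n = row1 + row2
    denom = math.comb(n, col1)

    def prob(x: int) -> float:
        # Hypergeometric probability of x in the top-left cell with fixed margins.
        return math.comb(row1, x) * math.comb(row2, col1 - x) / denom

    p_obs = prob(a)
    support = range(max(0, col1 - row2), min(row1, col1) + 1)
    # Sum every table no more probable than the observed one; the small
    # relative tolerance guards against floating-point ties.
    total = sum(p for p in map(prob, support) if p <= p_obs * (1 + 1e-7))
    return min(1.0, total)


def shortening_test(ref: UtrCounts, other: UtrCounts) -> tuple[float, float]:
    """Compare proximal/distal read ratios between a reference and another condition.

    Returns (score, p_value). score > 1 means relatively more proximal reads in
    `other`, consistent with a shift toward the short 3' UTR isoform there.
    Segment lengths cancel in this ratio of ratios, so they are not needed.
    """
    if min(ref.proximal, ref.distal, other.proximal, other.distal) == 0:
        raise ValueError("zero counts: add a pseudocount or skip this gene")
    score = (other.proximal / other.distal) / (ref.proximal / ref.distal)
    p_value = fisher_two_sided(ref.proximal, ref.distal, other.proximal, other.distal)
    return score, p_value


if __name__ == "__main__":
    score, p = shortening_test(UtrCounts(proximal=100, distal=100),
                               UtrCounts(proximal=150, distal=50))
    print(score)  # 3.0
    print(p)
```

A genome-wide analysis along these lines would repeat the test for every gene with an annotated APA site and then correct for multiple testing. Biological replicates, coverage biases along the transcript, and genes with several APA sites are all left out of this sketch.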
We implemented the algorithm in a Bioconductor package to facilitate its broad use in the scientific community. The approach detects 3' UTR shortening from libraries with a number of reads comparable to that needed for differential expression analyses. This makes it useful for investigating whether alternative polyadenylation is relevant in a given biological process without requiring dedicated experimental assays.