Niu Liang, Huang Weichun, Umbach David M, Li Leping
Biostatistics Branch, National Institute of Environmental Health Sciences, Research Triangle Park, NC 27709, USA.
BMC Genomics. 2014 Oct 6;15(1):862. doi: 10.1186/1471-2164-15-862.
Most genes in mammals generate, through alternative splicing, several transcript isoforms that differ in stability and translational efficiency. Such alternative splicing can be specific to a tissue or developmental stage, and this specificity is sometimes associated with disease. Thus, detecting differential isoform usage for a gene between tissues or cell lines/types (differences in the fraction of the gene's total expression contributed by each of its isoforms) is potentially important for cell and developmental biology.
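The definition of isoform usage given above can be illustrated with a minimal sketch. The function name `isoform_usage` and the expression values are hypothetical and chosen only for illustration; they are not taken from IUTA or from the data analyzed here.

```python
def isoform_usage(expression):
    """Return each isoform's fraction of the gene's total expression."""
    total = sum(expression)
    if total <= 0:
        # Usage is undefined when the gene is not expressed at all.
        raise ValueError("gene has no expression; usage is undefined")
    return [x / total for x in expression]


# Hypothetical expression values for one gene with three isoforms.
tissue_a = isoform_usage([30.0, 60.0, 10.0])
tissue_b = isoform_usage([10.0, 20.0, 70.0])

# The two tissues differ in usage even if total gene expression is equal.
print(tissue_a)
print(tissue_b)
```

Each usage vector sums to one, which is what makes it compositional data.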
We present IUTA, a new method designed to test each gene in the genome for differential isoform usage between two groups of samples. IUTA also estimates isoform usage for each gene in each sample, as well as averaged across the samples within each group. IUTA is the first method to formulate the testing problem as a test for equal means of two probability distributions under the Aitchison geometry, which is widely recognized as the most appropriate geometry for compositional data (vectors that contain the relative amount of each component of a whole). In an evaluation on simulated data, IUTA provided test results for many more genes than Cuffdiff2 did (version 2.2.0, released in March 2014), and IUTA performed better than Cuffdiff2 on the limited number of genes that Cuffdiff2 did analyze. When applied to actual mouse RNA-Seq datasets from six tissues, IUTA identified 2,073 significant genes with clear patterns of differential isoform usage between a pair of tissues. IUTA is implemented as an R package and is available at http://www.niehs.nih.gov/research/resources/software/biostatistics/iuta/index.cfm.
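To make the Aitchison geometry concrete, the following is a minimal sketch of three standard compositional operations: the centered log-ratio (clr) transform, the Aitchison distance, and the compositional mean (the closed component-wise geometric mean). It is not IUTA's implementation and does not perform IUTA's test. All function names are illustrative, and the code assumes strictly positive components, since zeros would need separate handling.

```python
import math


def closure(parts):
    """Rescale positive parts so that they sum to one."""
    total = sum(parts)
    return [p / total for p in parts]


def clr(composition):
    """Centered log-ratio transform: log of each part minus the mean log."""
    logs = [math.log(p) for p in composition]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]


def aitchison_distance(x, y):
    """Euclidean distance between the clr transforms of x and y."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(clr(x), clr(y))))


def aitchison_mean(compositions):
    """Compositional center: closed component-wise geometric mean."""
    n = len(compositions)
    k = len(compositions[0])
    geo = [
        math.exp(sum(math.log(c[i]) for c in compositions) / n)
        for i in range(k)
    ]
    return closure(geo)


# Hypothetical isoform-usage vectors for one gene in two groups of samples.
group_1 = [[0.30, 0.60, 0.10], [0.25, 0.65, 0.10]]
group_2 = [[0.10, 0.20, 0.70], [0.15, 0.20, 0.65]]

center_1 = aitchison_mean(group_1)
center_2 = aitchison_mean(group_2)
print(aitchison_distance(center_1, center_2))
```

Working with log-ratios rather than raw fractions means the distance depends only on the relative sizes of the components, so rescaling a vector leaves the result unchanged.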
Both simulation and real-data results suggest that IUTA accurately detects differential isoform usage. We believe that our analysis of RNA-Seq data from six mouse tissues represents the first comprehensive characterization of isoform usage in these tissues. IUTA will be a valuable resource for those who study the roles of alternative transcripts in cell development and disease.