Gan Ruei-Chi, Chen Ting-Wen, Wu Timothy H, Huang Po-Jung, Lee Chi-Ching, Yeh Yuan-Ming, Chiu Cheng-Hsun, Huang Hsien-Da, Tang Petrus
Department of Biological Science and Technology, National Chiao Tung University, Hsin-Chu, 300, Taiwan.
Bioinformatics Center, Molecular Medicine Research Center, Chang Gung University, Taoyuan, Taiwan.
BMC Bioinformatics. 2016 Dec 22;17(Suppl 19):513. doi: 10.1186/s12859-016-1366-1.
Next-generation sequencing promises de novo genomic and transcriptomic analysis of samples of interest. However, only a few organisms have reference genome sequences, and even fewer have well-defined or curated annotations. For transcriptome studies of organisms lacking proper reference genomes, the common strategy is de novo assembly followed by functional annotation. The analysis becomes even more complicated when multiple transcriptomes are compared.
Here, we propose a new analysis strategy and quantification methods that not only generate a virtual reference from sequencing data but also enable comparisons between transcriptomes. First, all reads from the transcriptome datasets are pooled for de novo assembly. The assembled contigs are searched against the NCBI NR database to find potential homologous sequences. Based on the search results, a set of virtual transcripts is generated to serve as a reference transcriptome. Because every dataset is quantified against the same reference, the normalized quantification values, including RC (read counts), eRPKM (estimated RPKM) and eTPM (estimated TPM), are comparable across transcriptome datasets. To demonstrate the feasibility of our strategy, we implemented it in the web service PARRoT (Pipeline for Analyzing RNA Reads of Transcriptomes), which analyzes gene expression profiles for two transcriptome sequencing datasets. To help interpret the biological meaning of comparisons among transcriptomes, PARRoT further links these virtual transcripts to their potential functions by showing their best hits in SwissProt and the NR database and by assigning GO terms. Our demo datasets showed that PARRoT can analyze two paired-end transcriptomic datasets of approximately 100 million reads within three hours.
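The shared-reference quantification can be illustrated with a short Python sketch. This is a minimal sketch under stated assumptions, not PARRoT's implementation: it assumes contigs are collapsed into virtual transcripts by their best NR hit, that a virtual transcript's length is the length of its longest member contig, and it applies the standard formulas RPKM_i = C_i × 10^9 / (L_i × N) and TPM_i = (C_i / L_i) / Σ_j (C_j / L_j) × 10^6, where C_i is the read count, L_i the length in bp and N the total count. How PARRoT derives its "estimated" eRPKM and eTPM values may differ. All function and input names are hypothetical.

```python
from collections import defaultdict


def build_virtual_counts(contig_best_hit, contig_counts, contig_lengths):
    """Collapse contigs sharing the same best homolog hit into one virtual transcript.

    Hypothetical inputs (not PARRoT's actual data model):
      contig_best_hit: contig id -> best-hit accession (e.g. parsed from a BLAST report)
      contig_counts:   contig id -> reads assigned to that contig in one sample
      contig_lengths:  contig id -> contig length in bp
    Returns (counts, lengths) keyed by virtual transcript (best-hit accession).
    """
    counts = defaultdict(int)
    lengths = defaultdict(int)
    for contig, hit in contig_best_hit.items():
        # Reads from every member contig are summed into the virtual transcript (RC).
        counts[hit] += contig_counts.get(contig, 0)
        # Assumption: virtual transcript length = longest member contig.
        lengths[hit] = max(lengths[hit], contig_lengths[contig])
    return dict(counts), dict(lengths)


def rpkm(counts, lengths):
    """Standard RPKM: reads per kilobase of transcript per million mapped reads."""
    total = sum(counts.values())
    if total == 0:
        return {t: 0.0 for t in counts}
    return {t: c * 1e9 / (lengths[t] * total) for t, c in counts.items()}


def tpm(counts, lengths):
    """Standard TPM: length-normalized rates rescaled to sum to one million."""
    rates = {t: c / lengths[t] for t, c in counts.items()}
    denom = sum(rates.values())
    if denom == 0:
        return {t: 0.0 for t in counts}
    return {t: r * 1e6 / denom for t, r in rates.items()}


# Toy example: contigs c1 and c2 share best hit P1; c3 hits P2.
best_hit = {"c1": "P1", "c2": "P1", "c3": "P2"}
sample_counts = {"c1": 60, "c2": 40, "c3": 100}
contig_len = {"c1": 1000, "c2": 500, "c3": 2000}

counts, lengths = build_virtual_counts(best_hit, sample_counts, contig_len)
print(counts)                   # {'P1': 100, 'P2': 100}
print(rpkm(counts, lengths))    # {'P1': 500000.0, 'P2': 250000.0}
```

Because every sample is quantified against the same set of virtual transcripts, the dictionaries returned for two datasets share the same keys, so their values can be compared entry by entry.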
In this study, we proposed and implemented a strategy for analyzing transcriptomes from organisms without reference genomes, which makes it possible to quantify and compare transcriptome profiles through a homolog-based virtual reference transcriptome. By using the homolog-based reference, our strategy effectively avoids problems that may arise from inconsistencies among transcriptomes. This strategy will shed light on comparative genomics for non-model organisms. We have implemented PARRoT as a web service, which is freely available at http://parrot.cgu.edu.tw .