Amin Shorash, Prentis Peter J, Gilding Edward K, Pavasovic Ana
School of Biomedical Sciences, Faculty of Health, Queensland University of Technology, GPO Box 2434, Brisbane, Qld 4001, Australia.
BMC Res Notes. 2014 Aug 1;7:488. doi: 10.1186/1756-0500-7-488.
The sequencing, de novo assembly and annotation of transcriptome datasets generated with next-generation sequencing (NGS) have enabled biologists to answer genomic questions in non-model species with unprecedented ease. Reliable and accurate de novo assembly and annotation are, however, critically important for transcriptomes assembled from short read sequences. Benchmarks of assembly and annotation reliability have typically been performed with model species. To address the reliability and accuracy of de novo transcriptome assembly in non-model species, we generated an RNAseq dataset for an intertidal gastropod mollusc species, Nerita melanotragus, and compared the assemblies produced by four de novo transcriptome assemblers (Velvet, Oases, Geneious and Trinity) on a number of quality metrics and on redundancy.
Transcriptome sequencing on the Ion Torrent PGM™ produced 1,883,624 raw reads with a mean length of 133 base pairs (bp). The Trinity and Oases de novo assemblers produced the best assemblies on all quality metrics: fewer contigs, higher N50 and average contig length, and longer contigs. Overall, the BLAST and annotation success of our assemblies was not high, with only 15-19% of contigs assigned a putative function.
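The contig-level metrics named above (number of contigs, N50, average contig length, longest contig) can be computed directly from a list of contig lengths. The following is a minimal sketch, assuming the common definition of N50 as the length of the shortest contig in the set of longest contigs that together cover at least half of the assembled bases. The function names `n50` and `assembly_summary` and the toy lengths are hypothetical illustrations, not code or data from the study.

```python
def n50(lengths):
    """Return N50: the contig length at which the running total, taken over
    contigs sorted from longest to shortest, first reaches at least half of
    the total assembled bases."""
    if not lengths:
        raise ValueError("at least one contig length is required")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


def assembly_summary(lengths):
    """Collect the contig-level metrics used to compare assemblies."""
    return {
        "contigs": len(lengths),
        "total_bp": sum(lengths),
        "mean_length": sum(lengths) / len(lengths),
        "longest": max(lengths),
        "n50": n50(lengths),
    }


# Toy contig lengths (hypothetical, not taken from the paper).
toy_lengths = [100, 200, 300, 400, 500]
summary = assembly_summary(toy_lengths)
print(summary)
```

In practice the lengths would be read from each assembler's FASTA output, and the resulting summaries compared side by side. Fewer contigs combined with a higher N50 and mean length is the pattern reported here for Trinity and Oases.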
We believe that any improvement in the annotation success of gastropod species will require more gastropod genome sequences and, in particular, more mollusc protein sequences in public databases. Overall, this paper demonstrates that reliable and accurate de novo transcriptome assemblies can be generated from short read sequencers, provided an appropriate assembly algorithm is used.