Weber Andreas P M, Weber Katrin L, Carr Kevin, Wilkerson Curtis, Ohlrogge John B
Department of Plant Biology, Michigan State University, East Lansing, MI 48824-1312, USA.
Plant Physiol. 2007 May;144(1):32-42. doi: 10.1104/pp.107.096677. Epub 2007 Mar 9.
Massively parallel sequencing of DNA by pyrosequencing technology offers much higher throughput and lower cost than conventional Sanger sequencing. Although the technology is already used extensively for genome sequencing, relatively few applications of massively parallel pyrosequencing to transcriptome analysis have been reported. To test the ability of this technology to provide unbiased representation of transcripts, we analyzed mRNA from Arabidopsis (Arabidopsis thaliana) seedlings. Two sequencing runs yielded 541,852 expressed sequence tags (ESTs) after quality control. Mapping of the ESTs to the Arabidopsis genome and to The Arabidopsis Information Resource 7.0 cDNA models indicated the following. (1) Massively parallel pyrosequencing detected transcription of 17,449 gene loci, providing very deep coverage of the transcriptome. Performing a second sequencing run increased the number of genes identified by only 10% but increased the overall sequence coverage by 50%. (2) Mapping of the ESTs to their predicted full-length transcripts indicated that all regions of the transcript were well represented regardless of transcript length or expression level. Furthermore, short, medium, and long transcripts were equally represented. (3) Over 16,000 of the ESTs that mapped to the genome were not represented in the existing dbEST database. In some cases, the ESTs provide the first experimental evidence for transcripts derived from predicted genes, and, for at least 60 locations in the genome, pyrosequencing identified likely protein-coding sequences that are not currently annotated as genes. Together, the results indicate that massively parallel pyrosequencing provides novel information that helps improve the annotation of the Arabidopsis genome. Furthermore, the unbiased representation of transcripts will be particularly useful for gene discovery and gene expression analysis in nonmodel plants with less complete genomic information.
EST sequence accession numbers in GenBank are EH795234 through EH995233 and EL000001 through EL341852.
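As a quick consistency check, the two accession ranges can be counted and compared with the 541,852 ESTs reported after quality control. The sketch below is illustrative only: it assumes each accession is a two-letter prefix followed by a six-digit number and that both ranges are inclusive and contiguous. The helper name `accession_range_size` is hypothetical and is not part of any GenBank tool.

```python
def accession_range_size(first: str, last: str) -> int:
    """Count accessions in an inclusive range that shares a two-letter prefix.

    Assumption: an accession is a two-letter prefix plus a numeric part;
    any spaces inside the accession are ignored.
    """
    first = first.replace(" ", "")
    last = last.replace(" ", "")
    if first[:2] != last[:2]:
        raise ValueError("range endpoints must share the same prefix")
    return int(last[2:]) - int(first[2:]) + 1


# Accession ranges as listed in the record above.
ranges = [("EH795234", "EH995233"), ("EL000001", "EL341852")]

sizes = [accession_range_size(first, last) for first, last in ranges]
total = sum(sizes)

print(sizes)  # number of accessions in each range
print(total)  # combined count, to compare with the reported 541,852 ESTs
```

Under these assumptions the two ranges contain 200,000 and 341,852 accessions, which together match the reported EST count.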