Uncovering full-length transcript isoforms of sugarcane cultivar Khon Kaen 3 using single-molecule long-read sequencing.

Authors

Piriyapongsa Jittima, Kaewprommal Pavita, Vaiwsri Sirintra, Anuntakarun Songtham, Wirojsirasak Warodom, Punpee Prapat, Klomsa-Ard Peeraya, Shaw Philip J, Pootakham Wirulda, Yoocha Thippawan, Sangsrakru Duangjai, Tangphatsornruang Sithichoke, Tongsima Sissades, Tragoonrung Somvong

Affiliations

National Center for Genetic Engineering and Biotechnology (BIOTEC), National Science and Technology Development Agency (NSTDA), Pathum Thani, Thailand.

Mitr Phol Sugarcane Research Center Co., Ltd., Chaiyaphum, Thailand.

Publication information

PeerJ. 2018 Oct 30;6:e5818. doi: 10.7717/peerj.5818. eCollection 2018.

Abstract

BACKGROUND

Sugarcane is an important global food crop and energy resource. To facilitate the sugarcane improvement program, genome and gene information are important for studying traits at the molecular level. Most currently available transcriptome data for sugarcane were generated using second-generation sequencing platforms, which provide short reads. The assembled transcripts from these data are limited in length, and hence may be incomplete and inaccurate, especially for long RNAs.

METHODS

We generated a transcriptome dataset from leaf tissue of the commercial Thai sugarcane cultivar Khon Kaen 3 (KK3) using PacBio RS II single-molecule long-read sequencing with the Iso-Seq method. Short-read RNA-Seq data were generated from the same RNA sample on the Ion Proton platform to reduce base-calling errors.

RESULTS

A total of 119,339 error-corrected transcripts were generated with an N50 length of 3,611 bp; these transcripts are on average longer than those in any previously reported sugarcane transcriptome dataset. Of these, 110,253 sequences (92.4%) contain an open reading frame (ORF) at least 300 bp long, with an ORF N50 of 1,416 bp. The mean lengths of the 5' and 3' untranslated regions in the 73,795 sequences with complete ORFs are 1,249 and 1,187 bp, respectively. In total, 4,774 transcripts are putatively novel full-length transcripts that do not match those from a previous Iso-Seq study of sugarcane. We annotated the functions of 68,962 putative full-length transcripts that show at least 90% coverage when compared with homologous protein-coding sequences in other plants.
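
For readers unfamiliar with the N50 statistic quoted above, the following is a minimal Python sketch of how it is commonly computed: sort the sequence lengths from longest to shortest and report the length at which the running total first reaches half of all bases. The function name `n50` and the toy lengths are illustrative assumptions and are not taken from the study's pipeline; the last line only re-derives the reported 92.4% from the two counts given in the abstract.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L
    together contain at least half of all bases.
    """
    if not lengths:
        raise ValueError("lengths must be non-empty")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Toy data (hypothetical, not from the KK3 dataset): total = 54, half = 27.
toy_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50(toy_lengths))  # 10 + 9 + 8 = 27 reaches half, so N50 is 8

# Share of transcripts with an ORF of at least 300 bp, from the reported counts.
orf_share = round(100 * 110_253 / 119_339, 1)
print(orf_share)  # 92.4
```

Because N50 weights long sequences more heavily than a plain mean or median, a transcriptome with a few very long transcripts can have a high N50 even when most transcripts are short.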

DISCUSSION

The new catalog of transcripts will be useful for genome annotation, identification of splicing variants, SNP identification, and other research pertaining to the sugarcane improvement program. The putatively novel transcripts suggest unique features of KK3, although more data from different tissues and stages of development are needed to establish a reference transcriptome of this cultivar.
