Shearman Jeremy R, Pootakham Wirulda, Sonthirod Chutima, Naktang Chaiwat, Yoocha Thippawan, Sangsrakru Duangjai, Jomchai Nukoon, Tongsima Sissades, Piriyapongsa Jittima, Ngamphiw Chumpol, Wanasen Nanchaya, Ukoskit Kittipat, Punpee Prapat, Klomsa-Ard Peeraya, Sriroth Klanarong, Zhang Jisen, Zhang Xingtan, Ming Ray, Tragoonrung Somvong, Tangphatsornruang Sithichoke
National Omics Center, National Science and Technology Development Agency, Pathum Thani, Thailand.
National Biobank of Thailand, National Science and Technology Development Agency, Pathum Thani, Thailand.
Sci Rep. 2022 Nov 28;12(1):20474. doi: 10.1038/s41598-022-24823-0.
Sugarcane accounts for a large portion of the world's sugar production. Modern commercial cultivars are complex hybrids of S. officinarum, S. spontaneum, and several other Saccharum species, resulting in an auto-allopolyploid with 8-12 copies of each chromosome. The current gold standard for genome assembly is to generate a long-read assembly and then scaffold it using chromatin conformation capture sequencing. We used the PacBio RSII and chromatin conformation capture sequencing to sequence and assemble the genome of a Southeast Asian commercial sugarcane cultivar known as Khon Kaen 3. The Khon Kaen 3 genome assembled into 104,477 contigs totalling 7 Gb, which scaffolded into 56 pseudochromosomes containing 5.2 Gb of sequence. Genome annotation produced 242,406 genes from 30,927 orthogroups. Aligning the Khon Kaen 3 genome sequence to S. officinarum and S. spontaneum revealed a high level of apparent recombination, indicating a chimeric assembly. This assembly error is explained by the high nucleotide identity between S. officinarum and S. spontaneum: 91.8% of the S. spontaneum genome aligns to S. officinarum at 94% identity. Thus, the subgenomes of commercial sugarcane are so similar that using short reads to correct long PacBio reads produced chimeric long reads. Future attempts to sequence sugarcane must take this information into account.
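As a rough aid to reading the assembly figures in the abstract, the following minimal sketch derives a few summary ratios from them. It assumes the reported sizes (7 Gb and 5.2 Gb) are rounded values, so the results are approximate; the constant and function names are illustrative and do not come from the paper.

```python
# Figures as reported in the abstract; genome sizes are rounded there,
# so every derived value below is approximate.
CONTIGS = 104_477
CONTIG_TOTAL_BP = 7.0e9
PSEUDOCHROMOSOMES = 56
SCAFFOLDED_BP = 5.2e9
GENES = 242_406
ORTHOGROUPS = 30_927


def summarize():
    """Return simple ratios derived from the reported assembly statistics."""
    return {
        # Average contig length in kilobases.
        "mean_contig_kb": CONTIG_TOTAL_BP / CONTIGS / 1e3,
        # Share of assembled sequence placed in pseudochromosomes.
        "scaffolded_fraction": SCAFFOLDED_BP / CONTIG_TOTAL_BP,
        # Average pseudochromosome length in megabases.
        "mean_pseudochromosome_mb": SCAFFOLDED_BP / PSEUDOCHROMOSOMES / 1e6,
        # Average number of annotated genes per orthogroup.
        "genes_per_orthogroup": GENES / ORTHOGROUPS,
    }


if __name__ == "__main__":
    for name, value in summarize().items():
        print(f"{name}: {value:.2f}")
```

Under these assumptions, about 74% of the assembled sequence was placed in pseudochromosomes, the mean contig length is about 67 kb, and each orthogroup holds about 7.8 annotated genes on average.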