Trujillo-Montenegro Jhon Henry, Rodríguez Cubillos María Juliana, Loaiza Cristian Darío, Quintero Manuel, Espitia-Navarro Héctor Fabio, Salazar Villareal Fredy Antonio, Viveros Valens Carlos Arturo, González Barrios Andrés Fernando, De Vega José, Duitama Jorge, Riascos John J
Centro de Investigación de la Caña de Azúcar de Colombia (CENICAÑA), Cali, Colombia.
Research Group in Bioinformatics, Department of Computer Science, Faculty of Engineering, Universidad Del Valle, Cali, Colombia.
Front Plant Sci. 2021 Aug 13;12:694859. doi: 10.3389/fpls.2021.694859. eCollection 2021.
Recent developments in High Throughput Sequencing (HTS) technologies and bioinformatics, including longer read lengths and improved genome assemblers, allow the reconstruction of complex genomes with unprecedented quality and contiguity. Sugarcane has one of the most complicated genomes among grasses, with a haploid length of 1 Gbp and a ploidy between 8 and 12. In this work, we present a genome assembly of the Colombian sugarcane hybrid CC 01-1940. Three sequencing technologies were combined for this assembly: PacBio long reads, Illumina paired-end short reads, and Hi-C reads. We achieved a median contig length of 34.94 Mbp and a total assembly length of 903.2 Mbp. We annotated a total of 63,724 protein-coding genes and performed a reconstruction and comparative analysis of the sucrose metabolism pathway. Nucleotide evolution measurements between orthologs with close species suggest that divergence between and occurred <2 million years ago. Synteny analysis between CC 01-1940 and the genome confirms the presence of translocation events between the species and a random contribution throughout the entire genome in current sugarcane hybrids. Analysis of RNA-Seq data from leaf and root tissue of contrasting sugarcane genotypes subjected to water stress treatments revealed 17,490 differentially expressed genes, of which 3,633 correspond to genes expressed exclusively in tolerant genotypes. We expect the resources presented here to serve as a source of information to improve the selection of new varieties in sugarcane breeding programs.
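Divergence times from ortholog comparisons are commonly estimated from the synonymous substitution rate (Ks) with T = Ks / (2r), where r is a neutral substitution rate and the factor 2 reflects substitutions accumulating along both lineages after the split. The abstract does not state which rate or estimator was used, so the sketch below is only an illustration under assumptions: the rate r = 6.5e-9 substitutions per synonymous site per year (a value often used for grasses), and the names `divergence_time_years` and `median_divergence_mya` are hypothetical helpers, not the authors' pipeline.

```python
from statistics import median

# ASSUMPTION: neutral substitution rate for grasses
# (substitutions per synonymous site per year); not given in the abstract.
GRASS_RATE = 6.5e-9


def divergence_time_years(ks: float, rate: float = GRASS_RATE) -> float:
    """Hypothetical helper: estimate divergence time T = Ks / (2 * rate).

    The factor 2 accounts for substitutions accumulating independently
    along both lineages since they split from their common ancestor.
    """
    if ks < 0 or rate <= 0:
        raise ValueError("ks must be >= 0 and rate must be > 0")
    return ks / (2.0 * rate)


def median_divergence_mya(ks_values: list[float], rate: float = GRASS_RATE) -> float:
    """Hypothetical helper: median divergence time, in million years,
    over a set of ortholog pairs (one Ks value per pair)."""
    if not ks_values:
        raise ValueError("need at least one Ks value")
    times = [divergence_time_years(ks, rate) for ks in ks_values]
    return median(times) / 1e6


if __name__ == "__main__":
    # Illustrative Ks values only; these are NOT data from the paper.
    example_ks = [0.020, 0.024, 0.026]
    print(f"median divergence: {median_divergence_mya(example_ks):.2f} Mya")
```

With this assumed rate, a Ks of about 0.026 corresponds to roughly 2 million years, so ortholog pairs with Ks values below that would be consistent with a split more recent than 2 million years. Taking the median over many ortholog pairs, rather than a single gene, reduces the influence of outlier genes with unusual substitution rates.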