Department of Microbiology, Faculty of Medicine, Khon Kaen University, Khon Kaen, Thailand.
Research and Diagnostic Center for Emerging Infectious Diseases (RCEID), Khon Kaen University, Khon Kaen, Thailand.
PeerJ. 2024 Aug 29;12:e17964. doi: 10.7717/peerj.17964. eCollection 2024.
Next-generation sequencing of Mycobacterium tuberculosis, the infectious agent causing tuberculosis, is improving the understanding of the genomic diversity of circulating lineages and strain-types, and informing knowledge of drug resistance mutations. An increasingly popular approach to characterizing M. tuberculosis genomes (size: 4.4 Mbp) and variants (e.g., single nucleotide polymorphisms (SNPs)) involves the assembly of sequence data.
We compared the performance of three genome assembly tools (Unicycler, RagOut, and RagTag) on sequence data from nine drug-resistant M. tuberculosis isolates (multidrug-resistant (MDR) = 1; pre-extensively drug-resistant (pre-XDR) = 8) generated using the Illumina HiSeq, Oxford Nanopore Technology (ONT) PromethION, and PacBio platforms.
Our investigation found that Unicycler-based assemblies had significantly higher genome completeness (98.7%; p-value = 0.01) than those of the other assembly tools (RagOut = 98.6%, and RagTag = 98.6%). Across isolates and sequencers, RagOut-based genome assemblies were significantly longer (p-value < 0.001) (4,418,574 ± 8,824 bp) than Unicycler and RagTag assemblies (Unicycler = 4,377,642 ± 55,257 bp, and RagTag = 4,380,711 ± 51,164 bp). RagOut-based assemblies had the fewest contigs (32) and the longest genome size (4,418,574 bp, compared with the H37Rv reference size of 4,411,532 bp) and were therefore chosen for downstream analysis. Pan-genome analysis of Illumina and PacBio hybrid assemblies revealed the greatest number of detected genes (4,639 genes; the H37Rv reference contains 3,976 genes), while Illumina and ONT hybrid assemblies produced the highest number of SNPs. The number of genes from hybrid assemblies with ONT and PacBio long-reads (mean: 4,620 genes) was greater than from short-read assembly alone (4,478 genes). Known mutations in genes associated with MDR-TB and pre-XDR-TB were detected in all nine RagOut hybrid genome assemblies.
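To make the size comparison concrete, the reported mean assembly sizes can be set against the H37Rv reference size. The following is a minimal Python sketch using only the figures quoted above; the names `size_vs_reference`, `MEAN_ASSEMBLY_BP`, and `H37RV_REFERENCE_BP` are hypothetical and chosen for illustration, and this is not the authors' analysis pipeline.

```python
# Illustrative arithmetic only: values are the means reported in the abstract,
# and all names below are hypothetical, not part of the study's software.
H37RV_REFERENCE_BP = 4_411_532

MEAN_ASSEMBLY_BP = {
    "Unicycler": 4_377_642,
    "RagOut": 4_418_574,
    "RagTag": 4_380_711,
}


def size_vs_reference(mean_sizes, reference_bp):
    """Return {tool: (difference in bp, difference as % of the reference)}."""
    return {
        tool: (size - reference_bp, 100.0 * (size - reference_bp) / reference_bp)
        for tool, size in mean_sizes.items()
    }


deviations = size_vs_reference(MEAN_ASSEMBLY_BP, H37RV_REFERENCE_BP)

# Tool whose mean assembly size is closest to the reference in absolute terms.
closest_tool = min(deviations, key=lambda tool: abs(deviations[tool][0]))

for tool, (diff_bp, diff_pct) in deviations.items():
    print(f"{tool}: {diff_bp:+,} bp ({diff_pct:+.2f}% of H37Rv)")
print(f"Closest to reference size: {closest_tool}")
```

On these means, the RagOut assemblies exceed the reference by 7,042 bp, whereas the Unicycler and RagTag assemblies fall short of it by 33,890 bp and 30,821 bp, respectively. The comparison says nothing about which bases are present, only about total length.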
Unicycler software performed best in terms of achieving contiguous genomes, whereas RagOut improved the quality of Unicycler's genome assemblies by providing a longer genome size. Overall, our approach demonstrates that short-read and long-read hybrid assembly can provide a more complete genome assembly than short-read assembly alone, detecting pan-genomes and more genes, including insertion sequences (IS), and SNPs.