Li Minghao, Guo Xiaoyu, Zhao Jin
School of Computer Science and Technology, Qingdao University, Shandong 266071, China.
Bioinform Adv. 2025 Apr 8;5(1):vbaf075. doi: 10.1093/bioadv/vbaf075. eCollection 2025.
The discontinuous transcription mechanism of coronaviruses contributes to their adaptation to different host environments and plays a critical role in their lifecycle. Accurate assembly of coronavirus transcripts is vital for understanding the virus's biological traits and for developing precise prevention and treatment strategies. However, existing assembly algorithms are designed primarily for alternative splicing events in eukaryotes and are not suited to assembling the coronavirus transcriptome, which consists of both genomic RNA and subgenomic mRNAs. Reconstructing the coronavirus transcriptome from short reads therefore remains a challenging problem.
In this work, we present VirDiG, a transcriptome assembler designed specifically for coronaviruses. VirDiG uses a discontinuous graph to assemble transcripts accurately, incorporating information from paired-end reads, sequencing depth, and start and stop codons. Experimental results on both simulated and real datasets show that VirDiG has significant advantages in reconstructing coronavirus transcriptomes over traditional assemblers designed for classical eukaryotic transcriptome assembly.
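To make the idea of combining junction evidence with start and stop codons more concrete, the following is a minimal toy sketch and not VirDiG's actual algorithm. It assumes a known reference string, a hypothetical `junctions` mapping from (donor, acceptor) positions to the number of supporting reads (a stand-in for depth and paired-end evidence), and a hypothetical `min_support` cutoff. The function names `first_orf` and `candidate_transcripts` are illustrative only.

```python
from typing import Dict, List, Optional, Tuple

START_CODON = "ATG"
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def first_orf(seq: str) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the first ATG-to-stop ORF, end exclusive."""
    start = seq.find(START_CODON)
    while start != -1:
        # Walk codon by codon in the frame set by this ATG.
        for i in range(start + 3, len(seq) - 2, 3):
            if seq[i:i + 3] in STOP_CODONS:
                return start, i + 3
        start = seq.find(START_CODON, start + 1)
    return None


def candidate_transcripts(
    genome: str,
    junctions: Dict[Tuple[int, int], int],
    min_support: int = 2,  # hypothetical threshold, not taken from the paper
) -> List[str]:
    """Return the full-length genome plus one candidate per supported jump.

    A junction (donor, acceptor) joins genome[:donor] to genome[acceptor:],
    modelling a single leader-to-body discontinuity.
    """
    candidates = [genome]
    for (donor, acceptor), support in sorted(junctions.items()):
        if support < min_support:
            continue  # too few reads span this jump
        if not 0 < donor < acceptor < len(genome):
            continue  # ignore malformed coordinates
        joined = genome[:donor] + genome[acceptor:]
        if first_orf(joined) is not None:  # keep only candidates with an ORF
            candidates.append(joined)
    return candidates


if __name__ == "__main__":
    # Toy genome: 6-nt leader, ORF1, spacer, ORF2, tail.
    toy = "ACGTAC" + "ATGAAATGA" + "CCCC" + "ATGGGGTAA" + "GG"
    for t in candidate_transcripts(toy, {(6, 19): 5, (6, 15): 1}):
        print(t, first_orf(t))
```

In this toy setting, the weakly supported jump is discarded and the well-supported one yields a shorter candidate that keeps the leader and the downstream ORF. A real assembler would have to infer the junctions and the graph from reads rather than take them as input.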
VirDiG is freely available at https://github.com/Limh616/VirDiG.git.