Department of Biochemistry and Molecular Biology, The University of Texas Medical Branch, Galveston, TX 77555, USA.
John Sealy School of Medicine, The University of Texas Medical Branch, Galveston, TX 77555, USA.
Gigascience. 2023 Mar 20;12. doi: 10.1093/gigascience/giad009.
Genetic recombination is a tremendous source of intrahost diversity in viruses and is critical for their ability to rapidly adapt to new environments or fitness challenges. While viruses are routinely characterized using high-throughput sequencing techniques, characterizing the genetic products of recombination in next-generation sequencing data remains a challenge. Viral recombination events can be highly diverse and variable in nature, including simple duplications and deletions, or more complex events such as copy/snap-back recombination, intervirus or intersegment recombination, and insertions of host nucleic acids. Due to the variable mechanisms driving virus recombination and the different selection pressures acting on the progeny, recombination junctions rarely adhere to simple canonical sites or sequences. Furthermore, numerous different events may be present simultaneously in a viral population, yielding a complex mutational landscape.
We have previously developed an algorithm called ViReMa (Virus Recombination Mapper) that bootstraps the bowtie short-read aligner to capture and annotate a wide range of recombinant species found within virus populations. Here, we have updated ViReMa to provide an "error density" function designed to accurately detect recombination events in the longer reads now routinely generated by the Illumina platforms and provide output reports for multiple types of recombinant species using standardized formats. We demonstrate the utility and flexibility of ViReMa in different settings to report deletion events in simulated data from Flock House virus, copy-back RNA species in Sendai viruses, short duplication events in HIV, and virus-to-host recombination in an archaeal DNA virus.
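The abstract does not spell out how the "error density" function works. As an illustration only, one plausible reading is that an alignment is cut short wherever mismatches cluster too tightly, so that the remainder of the read can be remapped to find the other side of a recombination junction. The minimal Python sketch below follows that assumption. The function names, the `max_errors` and `window` parameters, and their default values are hypothetical and are not taken from ViReMa itself.

```python
def first_dense_error_position(mismatch_positions, max_errors, window):
    """Return the position of the first mismatch that opens a stretch of
    `window` nucleotides holding more than `max_errors` mismatches,
    or None if no stretch is that dense. (Hypothetical helper.)"""
    positions = sorted(mismatch_positions)
    for i, start in enumerate(positions):
        # Count mismatches that fall in the half-open range [start, start + window).
        count = 0
        for p in positions[i:]:
            if p - start < window:
                count += 1
            else:
                break
        if count > max_errors:
            return start
    return None


def trim_alignment(read_length, mismatch_positions, max_errors=2, window=20):
    """Split a read at the first dense cluster of mismatches.

    Returns (aligned_length, remaining_length). The remaining segment is
    what a recombination mapper would try to align elsewhere.
    The default thresholds are assumptions chosen for illustration.
    """
    cut = first_dense_error_position(mismatch_positions, max_errors, window)
    if cut is None:
        return read_length, 0
    return cut, read_length - cut


if __name__ == "__main__":
    # Isolated mismatches at 10 and 80 are tolerated; the cluster starting
    # at 100 (three mismatches within 20 nt) triggers the cut.
    print(trim_alignment(150, [10, 80, 100, 103, 107]))  # (100, 50)
```

In this sketch, scattered sequencing errors leave the alignment intact, while a run of closely spaced mismatches is treated as evidence that the read has crossed a junction. Longer reads are more likely to contain a few scattered errors, which is why a density threshold, rather than a total mismatch count, may suit them better.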