Nanopore sequencing and full genome de novo assembly of human cytomegalovirus TB40/E reveals clonal diversity and structural variations.

Author information

Department of Zoology, University of Oxford, Oxford, United Kingdom.

Public Health Laboratories, Department of Microbiology, Hellenic Pasteur Institute, 127 Vas Sofias Ave, 11527, Athens, Greece.

Publication information

BMC Genomics. 2018 Aug 2;19(1):577. doi: 10.1186/s12864-018-4949-6.

Abstract

BACKGROUND

Human cytomegalovirus (HCMV) has a double-stranded DNA genome of approximately 235 kbp that is structurally complex, including extended GC-rich repeated regions. Genomic recombination events are frequent in HCMV cultures but have also been observed in vivo. Thus, assembling whole HCMV genomes from technologies that produce reads shorter than 500 bp is technically challenging. Here we improved the reconstruction of full HCMV genomes by applying a hybrid, de novo genome-assembly bioinformatics pipeline to data generated by the recently released MinION MkI B sequencer from Oxford Nanopore Technologies.

RESULTS

The MinION run of the HCMV (strain TB40/E) library yielded ~47,000 reads from a single R9 flowcell and ~100× average read depth across the virus genome. We developed a novel, self-correcting bioinformatics algorithm to assemble the pooled HCMV genomes in three stages. In the first stage, long contigs (N50 = 21,892) of lower accuracy were reconstructed. In the second stage, short contigs (N50 = 5686) of higher accuracy were assembled. In the final stage, the high-quality contigs served as templates for correcting the longer contigs, resulting in a high-accuracy, full genome assembly (N50 = 41,056). We were able to reconstruct a single representative haplotype without employing any scaffolding steps. The majority (98.8%) of the genomic features from the reference strain were accurately annotated on this full genome construct. Our method also allowed the detection of multiple alternative sub-genomic fragments and non-canonical structures, suggesting rearrangement events between the unique (UL/US) and the repeated (T/IRL/S) genomic regions.
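The N50 values reported for each stage summarize contig contiguity: N50 is the length of the contig at which, going from longest to shortest, the cumulative length first reaches half of the total assembly length. The following is a minimal sketch of that standard definition, not the authors' pipeline; the function name `n50` and the toy contig lengths are hypothetical and are not data from this study.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths (standard definition).

    Hypothetical helper for illustration only; not part of the published pipeline.
    """
    if not contig_lengths:
        raise ValueError("at least one contig length is required")
    total = sum(contig_lengths)
    running = 0
    # Walk contigs from longest to shortest, accumulating their lengths.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        # Stop at the first contig where the running sum covers half the assembly.
        if 2 * running >= total:
            return length


# Toy lengths (made up for illustration): total = 290, half = 145.
toy_contigs = [80, 70, 50, 40, 30, 20]
print(n50(toy_contigs))
```

Because N50 depends only on the distribution of contig lengths, a higher value indicates a more contiguous assembly but says nothing about base-level accuracy, which is why the abstract reports accuracy separately for each stage.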

CONCLUSIONS

Third-generation high-throughput sequencing technologies can accurately reconstruct full-length HCMV genomes, including their low-complexity and highly repetitive regions. Full-length HCMV genomes could prove crucial for understanding the genetic determinants and viral evolution underpinning drug resistance, virulence and pathogenesis.
