Department of Laboratory Medicine and Pathology, University of Washington Medical Center, Seattle, Washington, USA.
Department of Microbiology, University of Washington, Seattle, Washington, USA.
J Clin Microbiol. 2023 Aug 23;61(8):e0184222. doi: 10.1128/jcm.01842-22. Epub 2023 Jul 10.
Identification and analysis of clinically relevant strains of bacteria increasingly rely on whole-genome sequencing. The downstream bioinformatics steps necessary for calling variants from short-read sequences are well established but seldom validated against haploid genomes. We devised a workflow to introduce single nucleotide polymorphisms (SNPs) and indels into bacterial reference genomes and to computationally generate sequencing reads based on the mutated genomes. We then applied the method to Mycobacterium tuberculosis H37Rv, Staphylococcus aureus NCTC 8325, and Klebsiella pneumoniae HS11286, and used the synthetic reads as truth sets for evaluating several popular variant callers. Relative to deletions and SNPs, insertions proved especially challenging for most variant callers to identify correctly. With adequate read depth, however, variant callers that use high-quality soft-clipped reads and base mismatches to perform local realignment consistently had the highest precision and recall in identifying insertions and deletions ranging from 1 to 50 bp. The remaining variant callers had lower recall for insertions greater than 20 bp.
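As a rough illustration of the idea behind such a workflow, the sketch below plants SNPs, insertions, and deletions at non-overlapping positions in a reference sequence, records them as a truth set, and scores a call set against it. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names `mutate_genome` and `precision_recall`, the one-variant-per-bin placement, the 0-based positions, and the exact-match scoring are all hypothetical choices made for this example. Read simulation and read alignment are omitted.

```python
import random

BASES = "ACGT"


def mutate_genome(reference, n_snps, n_ins, n_del, max_indel=50, seed=0):
    """Return (mutated_sequence, truth) for a haploid reference string.

    truth is a list of (pos, ref, alt) tuples with 0-based positions in
    reference coordinates; indels carry one anchor base, as in VCF.
    Hypothetical helper written for illustration only.
    """
    rng = random.Random(seed)
    kinds = ["snp"] * n_snps + ["ins"] * n_ins + ["del"] * n_del
    if not kinds:
        return reference, []
    rng.shuffle(kinds)

    # Assumption: one variant per equal-sized bin, so variants never overlap.
    bin_size = len(reference) // len(kinds)
    if bin_size <= max_indel + 1:
        raise ValueError("reference too short for this many variants")

    truth, pieces, cursor = [], [], 0
    for i, kind in enumerate(kinds):
        # Leave room for the longest possible deletion inside the bin.
        pos = i * bin_size + rng.randrange(bin_size - max_indel)
        ref = reference[pos]
        if kind == "snp":
            alt = rng.choice([b for b in BASES if b != ref])
        elif kind == "ins":
            length = rng.randint(1, max_indel)
            alt = ref + "".join(rng.choice(BASES) for _ in range(length))
        else:  # deletion: remove `length` bases after the anchor base
            length = rng.randint(1, max_indel)
            ref = reference[pos:pos + 1 + length]
            alt = ref[0]
        truth.append((pos, ref, alt))
        pieces.append(reference[cursor:pos])  # unchanged sequence before the variant
        pieces.append(alt)                    # mutated allele
        cursor = pos + len(ref)               # skip the reference allele
    pieces.append(reference[cursor:])
    return "".join(pieces), truth


def precision_recall(truth, calls):
    """Score calls against truth by exact (pos, ref, alt) match."""
    truth_set, call_set = set(truth), set(calls)
    true_positives = len(truth_set & call_set)
    precision = true_positives / len(call_set) if call_set else 0.0
    recall = true_positives / len(truth_set) if truth_set else 0.0
    return precision, recall
```

Exact-match scoring is a simplification. The same indel can be written at more than one position within a repeat, so a real comparison would normally normalize variant representations before counting matches.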