Division of Foodborne, Waterborne, and Environmental Diseases, Centers for Disease Control and Prevention, Atlanta, GA, USA.
bioMérieux SA, Sint-Martens-Latem, Belgium.
Microb Genom. 2023 May;9(5). doi: 10.1099/mgen.0.001012.
Campylobacter is a leading cause of bacterial foodborne and zoonotic illnesses in the USA. Pulsed-field gel electrophoresis (PFGE) and 7-gene multilocus sequence typing (MLST) have historically been used to differentiate sporadic from outbreak isolates. Whole genome sequencing (WGS) has been shown to provide superior resolution and concordance with epidemiological data when compared with PFGE and 7-gene MLST during outbreak investigations. In this study, we evaluated epidemiological concordance for high-quality SNP (hqSNP), core genome (cg)MLST and whole genome (wg)MLST to cluster or differentiate outbreak-associated and sporadic Campylobacter jejuni and Campylobacter coli isolates. Phylogenetic hqSNP, cgMLST and wgMLST analyses were also compared using Baker's gamma index (BGI) and cophenetic correlation coefficients. Pairwise distances from all three analysis methods were compared using linear regression models. Our results showed that 68/73 sporadic C. jejuni and C. coli isolates were differentiated from outbreak-associated isolates using all three methods. There was a high correlation between cgMLST and wgMLST analyses of the isolates; the BGI, cophenetic correlation coefficient, linear regression model and Pearson correlation coefficients were >0.90. The correlation was sometimes lower when comparing hqSNP analysis to the MLST-based methods; for some outbreak isolates, the linear regression model and Pearson correlation coefficients were between 0.60 and 0.86, and the BGI and cophenetic correlation coefficient were between 0.63 and 0.86. We demonstrated that C. jejuni and C. coli isolates clustered in concordance with epidemiological data using WGS-based analysis methods. Discrepancies between allele-based and SNP-based approaches may reflect differences in how genomic variation (SNPs and indels) is captured by the two methods.
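As a rough illustration of how pairwise distances from two typing methods can be compared, the sketch below computes a Pearson correlation coefficient and a least-squares line over the upper triangles of two distance matrices. The matrices, their values, and the function names are invented for illustration; they are not data or code from the study, and the study's own software is not reproduced here.

```python
from itertools import combinations
from math import sqrt


def upper_triangle(matrix):
    """Return the pairwise distances above the diagonal as a flat list."""
    n = len(matrix)
    return [matrix[i][j] for i, j in combinations(range(n), 2)]


def pearson_and_fit(x, y):
    """Pearson r plus slope and intercept of the least-squares line y = a*x + b."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    r = sxy / sqrt(sxx * syy)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return r, slope, intercept


# Hypothetical symmetric distance matrices for four isolates (toy values).
cgmlst_alleles = [
    [0, 1, 10, 12],
    [1, 0, 11, 13],
    [10, 11, 0, 4],
    [12, 13, 4, 0],
]
hqsnp_snps = [
    [0, 3, 25, 31],
    [3, 0, 27, 30],
    [25, 27, 0, 6],
    [31, 30, 6, 0],
]

r, slope, intercept = pearson_and_fit(
    upper_triangle(cgmlst_alleles), upper_triangle(hqsnp_snps)
)
print(f"Pearson r = {r:.3f}, slope = {slope:.3f}, intercept = {intercept:.3f}")
```

Only the upper triangle is used so that each isolate pair is counted once and the zero diagonal does not inflate the correlation.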
Since cgMLST examines allele differences in genes that are common to most of the isolates being compared, it is well suited to surveillance: large genomic databases can be searched easily and efficiently for similar isolates using allelic profiles. On the other hand, an hqSNP approach is much more computationally intensive and does not scale to large sets of genomes. If further resolution between potential outbreak isolates is needed, wgMLST or hqSNP analysis can be used.
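A minimal sketch of why allelic profiles make database searches cheap: each isolate is reduced to a mapping from locus to allele number, and similarity is a count of differing loci. The locus names, isolate names, allele numbers, and the threshold of one allele are all hypothetical, and treating an uncalled locus as uninformative is an assumption of this sketch rather than a rule stated in the abstract.

```python
def allele_distance(profile_a, profile_b):
    """Count loci called in both profiles whose allele numbers differ."""
    shared_loci = profile_a.keys() & profile_b.keys()
    return sum(
        1
        for locus in shared_loci
        if profile_a[locus] is not None
        and profile_b[locus] is not None
        and profile_a[locus] != profile_b[locus]
    )


def find_similar(query, database, max_differences):
    """Return (isolate, distance) pairs within the threshold, closest first."""
    hits = [
        (name, allele_distance(query, profile))
        for name, profile in database.items()
    ]
    return sorted(
        (hit for hit in hits if hit[1] <= max_differences),
        key=lambda hit: (hit[1], hit[0]),
    )


# Hypothetical allelic profiles; None marks a locus that was not called.
database = {
    "isolate_A": {"locus1": 1, "locus2": 5, "locus3": 2, "locus4": 7},
    "isolate_B": {"locus1": 1, "locus2": 5, "locus3": 3, "locus4": 7},
    "isolate_C": {"locus1": 4, "locus2": 6, "locus3": 3, "locus4": 8},
    "isolate_D": {"locus1": 1, "locus2": None, "locus3": 2, "locus4": 9},
}
query = {"locus1": 1, "locus2": 5, "locus3": 2, "locus4": 7}

for name, distance in find_similar(query, database, max_differences=1):
    print(name, distance)
```

Each comparison is a single pass over a fixed set of loci, with no alignment or reference mapping, which is what lets this kind of search scale to large collections.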