Central Clinical School, Monash University, Melbourne, VIC 3004, Australia.
Burnet Institute for Medical Research, Melbourne, VIC 3004, Australia.
Viruses. 2022 Jun 29;14(7):1434. doi: 10.3390/v14071434.
Since its emergence in 2019, SARS-CoV-2 has spread and evolved globally, with newly emerged variants of concern (VOCs) accounting for more than 500 million COVID-19 cases and 6 million deaths. Continuous surveillance using simple genetic tools is needed to measure viral epidemiological diversity, risk of infection, and distribution among different demographics in different geographical regions. To help address this need, we developed a proof-of-concept multilocus genotyping tool and demonstrated its utility for monitoring viral populations sampled in 2020 and 2021 across six continents. We sampled 22,164 SARS-CoV-2 genomes globally from GISAID (inclusion criteria: available clinical and demographic data). They comprised two study populations, “2020 genomes” (N = 5959) sampled from December 2019 to September 2020 and “2021 genomes” (N = 16,205) sampled from 15 January to 15 March 2021. All genomes were aligned to the SARS-CoV-2 reference genome, and amino acid polymorphisms were called with quality filtering. Thereafter, 74 codons (loci) in 14 genes, including the orf1ab polygene (N = 9), orf3a, orf8, nucleocapsid (N), matrix (M), and spike (S), met the minimum allele frequency criterion of 0.01 and were selected to construct multilocus genotypes (MLGs) for the genomes. At these loci, 137 mutant/variant amino acids (alleles) were detected, with eight VOC-defining variant alleles, including N KR203&204, orf1ab (I265, F3606, and L4715), orf3a H57, orf8 S84, and S G614, being predominant globally at >35% prevalence. Their persistence and selection were associated with peaks in viral transmission and COVID-19 incidence between 2020 and 2021. Epidemiologically, older patients (≥20 years) had a higher risk of being infected with these variants than younger patients (<20 years), but this association depended on the continent of origin.
In the global population, discriminant analysis of principal components (DAPC) showed contrasting patterns of genetic clustering: three continental clusters (Africa, Asia, and North America) were observed for the 2020 global population and two (North and South America) for the 2021 global population. Within each continent, the MLG repertoires (range 40−199) sampled in 2020 and 2021 were genetically differentiated, with ≤4 MLGs per repertoire accounting for the majority of genomes sampled. These data suggested that the majority of SARS-CoV-2 infections in 2020 and 2021 were caused by genetically distinct variants that likely adapted to local populations. Indeed, four GISAID clade-defined VOCs, GRY (Alpha), GH (Beta), GR (Gamma), and G/GK (Delta), were differentiated by their MLG signatures, demonstrating the versatility of the MLG tool for variant identification. Results from this proof-of-concept multilocus genotyping demonstrate its utility for SARS-CoV-2 genomic surveillance and for monitoring the virus's spatiotemporal epidemiology and evolution, particularly in response to control interventions including COVID-19 vaccines and chemotherapies.
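
The locus selection and MLG construction described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the three loci, the toy genomes, and the function names `select_loci` and `build_mlg` are hypothetical, and treating the 0.01 criterion as "any non-reference amino acid reaches 1% frequency at the codon" is an assumption about how the threshold was applied.

```python
from collections import Counter

# Reference amino acids at three illustrative loci (a tiny subset, for illustration only).
REFERENCE = {"S:614": "D", "orf8:84": "L", "N:203": "R"}

# Toy amino acid calls, one dict per genome (hypothetical, not real GISAID data).
GENOMES = [
    {"S:614": "G", "orf8:84": "L", "N:203": "K"},
    {"S:614": "G", "orf8:84": "L", "N:203": "R"},
    {"S:614": "D", "orf8:84": "S", "N:203": "R"},
    {"S:614": "G", "orf8:84": "L", "N:203": "K"},
]


def select_loci(genomes, reference, min_freq=0.01):
    """Keep loci where at least one non-reference allele reaches min_freq.

    Assumption: the frequency criterion applies to variant alleles per locus.
    """
    selected = []
    for locus, ref_aa in reference.items():
        counts = Counter(g[locus] for g in genomes)
        total = sum(counts.values())
        if any(n / total >= min_freq for aa, n in counts.items() if aa != ref_aa):
            selected.append(locus)
    return selected


def build_mlg(genome, loci):
    """Concatenate the alleles at the selected loci into one genotype string."""
    return "|".join(f"{locus}{genome[locus]}" for locus in loci)


loci = select_loci(GENOMES, REFERENCE)
# Count how many genomes carry each multilocus genotype.
mlg_counts = Counter(build_mlg(g, loci) for g in GENOMES)

for mlg, n in mlg_counts.most_common():
    print(mlg, n)
```

Counting genomes per MLG in this way is what allows a repertoire to be summarized by its few dominant genotypes, as in the per-continent comparison above.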