Fifth Affiliated Hospital of Guangzhou Medical University, Guangzhou 510700, China.
Guangzhou Nanxin Pharmaceutical Co., Ltd., Guangzhou 510700, China.
Brief Bioinform. 2021 Nov 5;22(6). doi: 10.1093/bib/bbab307.
For epidemic prevention and control, SARS-CoV-2 subpopulations that share similar micro-epidemiological patterns and evolutionary histories need to be identified, so that the links among COVID-19 outbreaks caused by viruses with similar genetic backgrounds can be investigated in a more targeted way. Genomic sequencing analysis can uncover viral genetic diversity, but identifying SARS-CoV-2 subpopulations also requires an objective method of analysis. Herein, we detected all the mutations in 186 682 SARS-CoV-2 isolates. We found that the GC content of the SARS-CoV-2 genome has decreased over the course of its evolution, which may be conducive to viral spread, and that frameshift mutations were rare in the global population. Next, we encoded the genomic mutations in binary form and classified them with an unsupervised learning classifier, PhenoGraph. PhenoGraph identified 303 SARS-CoV-2 subpopulations, and its classification was consistent with, but more detailed and precise than, the known GISAID clades (S, L, V, G, GH, GR, GV and O). Trend analysis showed that the growth rate of SARS-CoV-2 diversity has slowed significantly. We also analyzed the temporal, spatial and phylogenetic relationships among the subpopulations and revealed, to a certain extent, the evolutionary trajectory of SARS-CoV-2. Our results thus provide a better understanding of the patterns and trends in the genomic evolution and epidemiology of SARS-CoV-2.
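The binary encoding step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact encoding, so this assumes one plausible form: each isolate becomes a row, each distinct mutation a column, and a cell is 1 when the isolate carries that mutation. The function name `encode_mutations`, the isolate names and the mutation labels are hypothetical and serve only as toy input; they are not taken from the study's data.

```python
from typing import Dict, List, Set, Tuple


def encode_mutations(
    isolates: Dict[str, Set[str]],
) -> Tuple[List[str], List[str], List[List[int]]]:
    """Turn per-isolate mutation sets into a 0/1 presence matrix.

    Returns (isolate_ids, mutation_labels, matrix), where matrix[i][j] is 1
    if isolate i carries mutation j. This layout is an assumption made for
    illustration, not the encoding documented by the authors.
    """
    isolate_ids = sorted(isolates)
    # Union of all observed mutations, sorted so column order is reproducible.
    mutations = sorted({m for muts in isolates.values() for m in muts})
    column = {m: j for j, m in enumerate(mutations)}

    matrix: List[List[int]] = []
    for iso in isolate_ids:
        row = [0] * len(mutations)
        for m in isolates[iso]:
            row[column[m]] = 1
        matrix.append(row)
    return isolate_ids, mutations, matrix


# Toy input: hypothetical isolates with illustrative mutation labels.
toy = {
    "isolate_A": {"C241T", "A23403G"},
    "isolate_B": {"C241T", "A23403G", "G28881A"},
    "isolate_C": {"T28144C"},
}
ids, muts, X = encode_mutations(toy)
for name, row in zip(ids, X):
    print(name, row)
```

A matrix of this shape is the kind of input a graph-based clustering tool would take, with one row per isolate. Any choice of clustering parameters, such as the number of nearest neighbours, would be a further assumption, since the abstract does not report them.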