Public Health Sciences Division, Fred Hutchinson Cancer Research Center, Seattle, WA, USA.
Quintepa Computing LLC, Nashville, TN, USA.
Sci Rep. 2022 Nov 9;12(1):19089. doi: 10.1038/s41598-022-23342-2.
Extensive mutations in the Omicron spike protein appear to accelerate the transmission of SARS-CoV-2, and rapid spread of infection increases the odds that additional mutants will emerge. To build an investigative framework, we applied an unsupervised machine learning approach to 4296 Omicron viral genomes collected and deposited to GISAID as of December 14, 2021, and identified a core haplotype of 28 polymutants (A67V, T95I, G339D, R346K, S371L, S373P, S375F, K417N, N440K, G446S, S477N, T478K, E484A, Q493R, G496S, Q498R, N501Y, Y505H, T547K, D614G, H655Y, N679K, P681H, N764K, K796Y, N856K, Q954H, N69K, L981F) in the spike protein and a separate core haplotype of 17 polymutants in non-spike genes: (K38, A1892) in nsp3, T492 in nsp4, (P132, V247, T280, S284) in 3C-like proteinase, I189 in nsp6, P323 in RNA-dependent RNA polymerase, I42 in Exonuclease, T9 in envelope protein, (D3, Q19, A63) in membrane glycoprotein, and (P13, R203, G204) in nucleocapsid phosphoprotein. Using these core haplotypes as a reference, we identified four newly emerging polymutants (R346, A701, I1081, N1192) in the spike protein (p values = 9.3710, 1.010, 4.7610 and 1.5610, respectively) and five additional polymutants in non-spike genes (D343G in nucleocapsid phosphoprotein, V1069I in nsp3, V94A in nsp4, F694Y in the RNA-dependent RNA polymerase and L106L/F in ORF3a) that exhibit significantly increasing trajectories (all p values < 1.0*10). Because relevant clinical data are not yet available for these newly emerging mutations, it is important to monitor them closely. Two emerging mutations may be of particular concern: the N1192S mutation in the spike protein lies in a region that is very highly conserved across all human coronaviruses and is integral to the viral fusion process, and the F694Y mutation in the RNA polymerase may induce conformational changes that could affect remdesivir binding.
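The idea of using a core haplotype as a reference can be illustrated with a minimal sketch. This is not the unsupervised learning method or the trajectory test used in the study; it only shows, with a plain set difference, how mutations outside the core spike haplotype could be tallied across genomes. The function name `non_core_counts` and the toy genomes are hypothetical, and the core list is copied as it appears in the abstract.

```python
from collections import Counter

# Core spike haplotype, copied verbatim from the list in the abstract.
CORE_SPIKE = frozenset(
    """A67V T95I G339D R346K S371L S373P S375F K417N N440K G446S S477N
    T478K E484A Q493R G496S Q498R N501Y Y505H T547K D614G H655Y N679K
    P681H N764K K796Y N856K Q954H N69K L981F""".split()
)


def non_core_counts(genomes):
    """Count, across genomes, each spike mutation absent from the core haplotype.

    `genomes` is an iterable of collections of mutation labels (hypothetical
    input format; real GISAID records would need parsing first).
    """
    counts = Counter()
    for mutations in genomes:
        # Use a set so a mutation is counted at most once per genome.
        counts.update(set(mutations) - CORE_SPIKE)
    return counts


# Hypothetical toy data, for illustration only (not real genomes).
toy_genomes = [
    {"A67V", "N501Y", "N1192S"},
    {"A67V", "D614G"},
    {"N501Y", "N1192S", "F999X"},  # "F999X" is a made-up label
]

counts = non_core_counts(toy_genomes)
for mutation, n in sorted(counts.items()):
    print(mutation, n)
```

Candidates tallied this way would still need a time-resolved frequency analysis, as described in the abstract, before they could be called emerging.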