Analysis of 3.5 million SARS-CoV-2 sequences reveals unique mutational trends with consistent nucleotide and codon frequencies.

Affiliations

Hemostasis Branch, Division of Plasma Protein Therapeutics, Office of Tissues and Advanced Therapies, Center for Biologics Evaluation and Research, US Food and Drug Administration, Silver Spring, MD, USA.

Department of Statistics, University of Connecticut, Storrs, CT, USA.

Publication information

Virol J. 2023 Feb 17;20(1):31. doi: 10.1186/s12985-023-01982-8.

Abstract

BACKGROUND

Since the onset of the SARS-CoV-2 pandemic, bioinformatic analyses have been performed to understand the nucleotide and synonymous codon usage features and mutational patterns of the virus. However, comparatively few have attempted to perform such analyses on a considerably large cohort of viral genomes while organizing the plethora of available sequence data for a month-by-month analysis to observe changes over time. Here, we aimed to perform sequence composition and mutation analysis of SARS-CoV-2, separating sequences by gene, clade, and timepoints, and contrast the mutational profile of SARS-CoV-2 to other comparable RNA viruses.

METHODS

Using a cleaned, filtered, and pre-aligned dataset of over 3.5 million sequences downloaded from the GISAID database, we computed nucleotide and codon usage statistics, including calculation of relative synonymous codon usage values. We then calculated codon adaptation index (CAI) changes and a nonsynonymous/synonymous mutation ratio (dN/dS) over time for our dataset. Finally, we compiled information on the types of mutations occurring for SARS-CoV-2 and other comparable RNA viruses, and generated heatmaps showing codon and nucleotide composition at high entropy positions along the Spike sequence.
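As an illustration of the relative synonymous codon usage (RSCU) statistic mentioned above, here is a minimal sketch. It assumes the standard genetic code and the common definition of RSCU (a codon's observed count divided by the mean count across its synonymous codons). The function name `rscu` and the toy input are hypothetical, not taken from the study's pipeline.

```python
from collections import Counter
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds):
    """Return {codon: RSCU} for an in-frame coding sequence (hypothetical helper).

    RSCU = observed count * (number of synonymous codons) / (total count
    for that amino acid). Amino acids absent from the sequence are skipped.
    """
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = (cds[i:i + 3] for i in range(0, usable, 3))
    # Codons with ambiguous bases (e.g. N) are not in the table and are dropped.
    counts = Counter(c for c in codons if c in CODON_TABLE)

    # Group codons into synonymous families by encoded amino acid.
    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)

    values = {}
    for aa, family in families.items():
        if aa == "*":  # stop codons are excluded in this sketch
            continue
        total = sum(counts[c] for c in family)
        if total == 0:
            continue
        for c in family:
            values[c] = counts[c] * len(family) / total
    return values


if __name__ == "__main__":
    # Toy input: four alanine codons, three GCT and one GCC.
    print(rscu("GCTGCTGCTGCC"))
```

A value of 1.0 means a codon is used exactly as often as expected under uniform synonymous usage; values above 1.0 indicate over-representation within its family.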

RESULTS

We show that nucleotide and codon usage metrics remain relatively consistent over the 32-month span, though there are significant differences between clades within each gene at various timepoints. CAI and dN/dS values vary substantially between timepoints and between genes, with the Spike gene showing, on average, the highest values of both CAI and dN/dS. Mutational analysis showed that SARS-CoV-2 Spike has a higher proportion of nonsynonymous mutations than analogous genes in other RNA viruses, with nonsynonymous mutations outnumbering synonymous ones by up to 20:1. However, at several specific positions, synonymous mutations were overwhelmingly predominant.
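To make the distinction between synonymous and nonsynonymous mutations concrete, the following sketch tallies codon-level differences between two aligned, in-frame coding sequences, assuming the standard genetic code. The helper `count_substitutions` and the example sequences are hypothetical. This is a raw count only: it is not the dN/dS ratio, which additionally normalizes by the number of synonymous and nonsynonymous sites, and it is not the authors' method.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def count_substitutions(ref_cds, alt_cds):
    """Count codons that differ between two aligned in-frame sequences.

    A differing codon is 'synonymous' if both codons encode the same amino
    acid and 'nonsynonymous' otherwise. Codons containing gaps or ambiguous
    bases are skipped. Hypothetical helper for illustration only.
    """
    ref, alt = ref_cds.upper(), alt_cds.upper()
    counts = {"synonymous": 0, "nonsynonymous": 0}
    for i in range(0, min(len(ref), len(alt)) - 2, 3):
        r, a = ref[i:i + 3], alt[i:i + 3]
        if r == a or r not in CODON_TABLE or a not in CODON_TABLE:
            continue
        kind = "synonymous" if CODON_TABLE[r] == CODON_TABLE[a] else "nonsynonymous"
        counts[kind] += 1
    return counts


if __name__ == "__main__":
    # Toy input: GAT->GGT changes Asp to Gly; GCT->GCC keeps Ala; AAA is unchanged.
    print(count_substitutions("GATGCTAAA", "GGTGCCAAA"))
```

Dividing the two counts gives a ratio comparable in spirit to the "up to 20:1" figure above, though the study's exact counting procedure is not described in the abstract.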

CONCLUSIONS

Our multifaceted analysis covering both the composition and mutation signature of SARS-CoV-2 gives valuable insight into the nucleotide frequency and codon usage heterogeneity of SARS-CoV-2 over time, and its unique mutational profile compared to other RNA viruses.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/506b/9936717/f1767fb312c1/12985_2023_1982_Fig1_HTML.jpg
