Terbot John W, Cooper Brandon S, Good Jeffrey M, Jensen Jeffrey D
Arizona State University, School of Life Sciences, Center for Evolution & Medicine, Tempe, Arizona, United States of America.
University of Montana, Division of Biological Sciences, Missoula, Montana, United States of America.
bioRxiv. 2023 Jul 17:2023.07.13.548462. doi: 10.1101/2023.07.13.548462.
The global impact of Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2) has led to considerable interest in detecting novel beneficial mutations and other genomic changes that may signal the development of variants of concern (VOCs). The ability to accurately detect these changes within individual patient samples is important in enabling early detection of VOCs. Such genomic scans for positive selection are best performed via comparison of empirical data to simulated data wherein evolutionary factors, including mutation and recombination rates, reproductive and infection dynamics, and purifying and background selection, can be carefully accounted for and parameterized. While there has been work to quantify these factors in SARS-CoV-2, they have yet to be integrated into a baseline model describing intra-host evolutionary dynamics. To construct such a baseline model, we developed a simulation framework that enables one to establish expectations for underlying levels and patterns of patient-level variation. By varying eight key parameters, we evaluated 12,096 different model-parameter combinations and compared them to existing empirical data. Of these, 592 models (~5%) were plausible based on the resulting mean expected number of segregating variants. These plausible models shared several commonalities shedding light on intra-host SARS-CoV-2 evolutionary dynamics: severe infection bottlenecks, low levels of reproductive skew, and a distribution of fitness effects skewed towards strongly deleterious mutations. We also describe important areas of model uncertainty and highlight additional sequence data that may help to further refine a baseline model. This study lays the groundwork for the improved analysis of existing and future SARS-CoV-2 within-patient data.