Minin Vladimir N, Bloomquist Erik W, Suchard Marc A
Department of Statistics, University of Washington, USA.
Mol Biol Evol. 2008 Jul;25(7):1459-71. doi: 10.1093/molbev/msn090. Epub 2008 Apr 11.
Kingman's coalescent process opens the door for estimation of population genetics model parameters from molecular sequences. One key parameter of interest is the effective population size. Temporal variation of this quantity characterizes the demographic history of a population. Because researchers are rarely able to choose a priori a deterministic model describing effective population size dynamics for the data at hand, nonparametric curve-fitting methods based on multiple change-point (MCP) models have been developed. We propose an alternative to change-point modeling that exploits Gaussian Markov random fields to achieve temporal smoothing of the effective population size in a Bayesian framework. The main advantage of our approach is that, in contrast to MCP models, the explicit temporal smoothing does not require strong prior decisions. To approximate the posterior distribution of the population dynamics, we use efficient, fast-mixing Markov chain Monte Carlo algorithms designed for highly structured Gaussian models. In a simulation study, we demonstrate that the proposed temporal smoothing method, named Bayesian skyride, successfully recovers "true" population size trajectories in all simulation scenarios and competes well with the MCP approaches without invoking strong prior assumptions. We apply our Bayesian skyride method to two real data sets. We analyze sequences of hepatitis C virus contemporaneously sampled in Egypt, reproducing all key known aspects of the viral population dynamics. Next, we estimate the demographic histories of human influenza A hemagglutinin sequences, serially sampled throughout three flu seasons.
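The abstract does not spell out the form of the Gaussian Markov random field prior, so the following is only a minimal sketch of the general idea, assuming a first-order random-walk GMRF placed on a vector of log effective population sizes. The function names (`rw1_structure`, `quad_form`, `gmrf_log_prior`), the variable `gamma`, and the precision `tau` are all hypothetical and chosen for illustration; this is not necessarily the exact prior or parameterization used in the paper.

```python
import math


def rw1_structure(n):
    """Structure matrix Q of a first-order random-walk GMRF on n points.

    Q is tridiagonal and satisfies
    gamma^T Q gamma == sum_i (gamma[i+1] - gamma[i])**2.
    """
    q = [[0.0] * n for _ in range(n)]
    for i in range(n - 1):
        q[i][i] += 1.0
        q[i + 1][i + 1] += 1.0
        q[i][i + 1] -= 1.0
        q[i + 1][i] -= 1.0
    return q


def quad_form(q, gamma):
    """Evaluate the quadratic form gamma^T Q gamma."""
    n = len(gamma)
    return sum(gamma[i] * q[i][j] * gamma[j] for i in range(n) for j in range(n))


def gmrf_log_prior(gamma, tau):
    """Unnormalized log density of the (assumed) smoothing prior.

    gamma: log effective population sizes, one per time interval.
    tau:   precision; larger values penalize rough trajectories more.
    """
    n = len(gamma)
    penalty = sum((gamma[i + 1] - gamma[i]) ** 2 for i in range(n - 1))
    return 0.5 * (n - 1) * math.log(tau) - 0.5 * tau * penalty


if __name__ == "__main__":
    smooth = [1.0, 1.1, 1.2, 1.3]
    rough = [1.0, 2.0, 0.5, 1.3]
    # The smooth trajectory receives the higher prior log density.
    print(gmrf_log_prior(smooth, tau=1.0) > gmrf_log_prior(rough, tau=1.0))
```

Under these assumptions, the prior penalizes squared differences between neighboring intervals, which is what makes the smoothing explicit: no number or location of change points has to be chosen in advance, and the single precision `tau` controls how strongly adjacent population sizes are tied together.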