Ji Xiang, Redelings Benjamin, Su Shuo, Bao Hongcun, Deng Wu-Min, Hong Samuel L, Baele Guy, Lemey Philippe, Suchard Marc A
Department of Mathematics, School of Science and Engineering, Tulane University, New Orleans, LA, USA.
Shanghai Institute of Infectious Disease and Biosecurity, School of Public Health, Fudan University, Shanghai, China.
Res Sq. 2025 Jun 25:rs.3.rs-6926809. doi: 10.21203/rs.3.rs-6926809/v1.
Branch-specific substitution models are popular for detecting evolutionary change-points, such as shifts in selective pressure. However, applying such models typically requires prior knowledge of change-point locations on the phylogeny or faces scalability issues with large data sets. To address both limitations, we integrate branch-specific substitution models with shrinkage priors to automatically identify change-points without prior knowledge, while simultaneously estimating distinct substitution parameters for each branch. To enable tractable inference under this high-dimensional model, we develop an analytical gradient algorithm for the branch-specific substitution parameters whose computation time is linear in the number of parameters. We apply this gradient algorithm to infer selection pressure dynamics in the evolution of the BRCA1 gene in primates and mutational dynamics in viral sequences from the recent mpox epidemic. Our novel algorithm enhances inference efficiency, achieving up to a 90-fold speedup per iteration in maximum-likelihood optimization compared with a central-difference numerical gradient method, and up to a 360-fold improvement in computational performance within a Bayesian framework when using a Hamiltonian Monte Carlo sampler compared with a conventional univariate random-walk sampler.
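To illustrate why a central-difference numerical gradient becomes costly as the number of parameters grows, the following minimal sketch counts objective evaluations on a toy function. The function `toy_objective` is a hypothetical stand-in, not the phylogenetic likelihood or the gradient algorithm described in the abstract; it only shows that central differences need two objective evaluations per parameter.

```python
def toy_objective(params):
    # Hypothetical stand-in for a log-likelihood (assumption, not the paper's model).
    return -sum((p - 1.0) ** 2 for p in params)


def toy_analytic_gradient(params):
    # Closed-form gradient of the toy objective, computed in one pass.
    return [-2.0 * (p - 1.0) for p in params]


def central_difference_gradient(f, params, h=1e-5):
    # Perturb each parameter up and down: two evaluations of f per parameter.
    grad = []
    evaluations = 0
    for i in range(len(params)):
        up = list(params)
        down = list(params)
        up[i] += h
        down[i] -= h
        grad.append((f(up) - f(down)) / (2.0 * h))
        evaluations += 2
    return grad, evaluations


params = [0.5, 2.0, -1.0, 3.0]
numeric, n_evals = central_difference_gradient(toy_objective, params)
analytic = toy_analytic_gradient(params)
print(n_evals)  # 2 evaluations per parameter
```

With N parameters, the numerical gradient needs 2N full evaluations of the objective, so when each evaluation is itself expensive the total cost grows quickly with the number of branches.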