Hui Daniel, Dudek Scott, Kiryluk Krzysztof, Walunas Theresa L, Kullo Iftikhar J, Wei Wei-Qi, Tiwari Hemant, Peterson Josh F, Chung Wendy K, Davis Brittney H, Khan Atlas, Kottyan Leah C, Limdi Nita A, Feng Qiping, Puckelwartz Megan J, Weng Chunhua, Smith Johanna L, Karlson Elizabeth W, Jarvik Gail P, Ritchie Marylyn D
Department of Genetics, Perelman School of Medicine, University of Pennsylvania, Philadelphia, United States.
Division of Nephrology, Department of Medicine, Columbia University, New York, United States.
Elife. 2025 Jan 24;12:RP88149. doi: 10.7554/eLife.88149.
Apart from ancestry, personal or environmental covariates may contribute to differences in polygenic score (PGS) performance. We analyzed the effects of covariate stratification and interaction on the body mass index (BMI) PGS across four cohorts of European (N = 491,111) and African (N = 21,612) ancestry. Stratifying on binary covariates and on quintiles of continuous covariates, 18/62 covariates had significant and replicable R² differences among strata. Covariates with the largest differences included age, sex, blood lipids, physical activity, and alcohol consumption, with R² nearly doubling between the best- and worst-performing quintiles for certain covariates. Twenty-eight covariates had significant PGS-covariate interaction effects, modifying PGS effects by nearly 20% per standard deviation change in the covariate. We observed overlap between covariates that had significant R² differences among strata and those with interaction effects: across all covariates, their main effects on BMI were correlated with their maximum R² differences and with their interaction effects (0.56 and 0.58, respectively), suggesting that high-PGS individuals have the highest R² and an increased PGS effect. Using quantile regression, we show that the effect of the PGS increases as BMI itself increases, and that these differences in effects are directly related to differences in R² when stratifying by different covariates. Given significant and replicable evidence for context-specific PGS performance and effects, we investigated ways to increase model performance that take nonlinear effects into account. Machine learning models (neural networks) increased relative model R² (mean 23%) across datasets. Finally, creating a PGS directly from GxAge genome-wide association study (GWAS) effects increased relative R² by 7.8%.
These results demonstrate that certain covariates, especially those most associated with BMI, significantly affect both PGS performance and effects across diverse cohorts and ancestries, and we provide avenues to improve model performance that consider these effects.
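The two core analyses in the abstract, PGS R² within covariate strata and PGS-by-covariate interaction effects, can be illustrated with a minimal sketch. It assumes ordinary least squares on simulated data. The `simulate` helper, its effect sizes (a hypothetical 20% change in the PGS effect per SD of age), and all function names are illustrative assumptions, not the authors' pipeline or data. The sketch computes the PGS R² within each covariate quintile. It also computes the interaction coefficient expressed as a fraction of the PGS main effect per standard deviation of the covariate.

```python
import numpy as np


def fit_r2(y, pgs):
    """R² of an OLS fit of y on an intercept plus the PGS."""
    X = np.column_stack([np.ones_like(pgs), pgs])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)


def r2_by_quintile(y, pgs, covariate):
    """Split samples into covariate quintiles and return the PGS R² in each stratum."""
    edges = np.quantile(covariate, [0.2, 0.4, 0.6, 0.8])
    stratum = np.digitize(covariate, edges)  # stratum labels 0..4
    return [fit_r2(y[stratum == k], pgs[stratum == k]) for k in range(5)]


def relative_interaction(y, pgs, covariate):
    """Fit y ~ PGS + cov + PGS*cov on standardized inputs; return the interaction
    coefficient divided by the PGS main effect, i.e. the fractional change in the
    PGS effect per SD of the covariate."""
    zp = (pgs - pgs.mean()) / pgs.std()
    zc = (covariate - covariate.mean()) / covariate.std()
    X = np.column_stack([np.ones_like(zp), zp, zc, zp * zc])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[3] / beta[1]


def simulate(n=20_000, seed=0):
    """Hypothetical toy data: the PGS effect on BMI grows by 20% per SD of age."""
    rng = np.random.default_rng(seed)
    pgs = rng.normal(size=n)
    age = rng.uniform(40, 70, size=n)
    z_age = (age - age.mean()) / age.std()
    bmi = 27 + 0.05 * age + pgs * (1 + 0.2 * z_age) + rng.normal(scale=4, size=n)
    return bmi, pgs, age


bmi, pgs, age = simulate()
print([round(r, 3) for r in r2_by_quintile(bmi, pgs, age)])  # R² per age quintile
print(round(relative_interaction(bmi, pgs, age), 3))  # roughly 0.2 by construction
```

Under this simulated interaction, the top age quintile should show a higher PGS R² than the bottom quintile. This illustrates how a covariate that modifies the PGS effect can also surface as an R² difference between strata.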