Graduate Program in Genomics and Computational Biology, University of Pennsylvania, Philadelphia, PA, USA.
Pac Symp Biocomput. 2023;28:437-448.
Polygenic risk scores (PRS) have led to enthusiasm for precision medicine. However, it is well documented that PRS do not generalize across groups differing in ancestry or sample characteristics (e.g., age). Quantifying PRS performance across different groups of study participants, using genome-wide association study (GWAS) summary statistics from multiple ancestry groups and sample sizes, and using different linkage disequilibrium (LD) reference panels, may clarify which factors limit PRS transferability. To evaluate these factors in the PRS generation process, we generated body mass index (BMI) PRS (PRSBMI) in the Electronic Medical Records and Genomics (eMERGE) network (N = 75,661). Analyses were conducted in two ancestry groups (European and African) and three age ranges (adults, teenagers, and children). For PRSBMI calculations, we evaluated five LD reference panels and three sets of GWAS summary statistics of varying sample size and ancestry. PRSBMI performance increased for both African and European ancestry individuals when using cross-ancestry GWAS summary statistics compared to European-only summary statistics (6.3% and 3.7% relative R² increase, respectively; p = 0.038 for African ancestry and p = 6.26 × 10⁻⁴ for European ancestry). The effects of LD reference panels were more pronounced in African ancestry study datasets. PRSBMI performance degraded in children: R² was less than half that of teenagers or adults. The effect of GWAS summary statistics sample size was small when modeled together with the other factors. Additionally, the potential of using a PRS generated for one trait to predict risk for comorbid diseases is not well understood, especially in the context of cross-ancestry analyses. We therefore explored clinical comorbidities from the electronic health record associated with PRSBMI and identified significant associations with type 2 diabetes and coronary atherosclerosis.
In summary, this study quantifies the effects that ancestry, GWAS summary statistic sample size, and LD reference panel have on PRS performance, especially in cross-ancestry and age-specific analyses.
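To make the "relative R² increase" metric concrete, the following is a minimal sketch and not the study's pipeline. It assumes a PRS is a weighted sum of allele dosages, that R² is the squared Pearson correlation between the score and the phenotype (the study may instead use a covariate-adjusted R²), and that the relative increase is expressed as a percentage of the baseline R². All function names are hypothetical and the data are synthetic.

```python
import numpy as np


def polygenic_score(dosages, weights):
    """Weighted sum of allele dosages (0, 1, or 2) for each individual.

    dosages: array of shape (n_individuals, n_variants)
    weights: array of shape (n_variants,), e.g. GWAS effect sizes
    """
    return np.asarray(dosages, dtype=float) @ np.asarray(weights, dtype=float)


def r_squared(score, phenotype):
    """Squared Pearson correlation (assumed definition of R² in this sketch)."""
    r = np.corrcoef(score, phenotype)[0, 1]
    return r ** 2


def relative_r2_increase(r2_baseline, r2_new):
    """Percentage change in R² relative to the baseline R²."""
    return (r2_new - r2_baseline) / r2_baseline * 100.0


if __name__ == "__main__":
    # Synthetic data only: 500 individuals, 50 variants, made-up effect sizes.
    rng = np.random.default_rng(0)
    dosages = rng.integers(0, 3, size=(500, 50))
    true_effects = rng.normal(0.0, 0.1, size=50)
    phenotype = dosages @ true_effects + rng.normal(0.0, 1.0, size=500)

    # Two hypothetical weight sets: one noisier, one closer to the true effects.
    weights_a = true_effects + rng.normal(0.0, 0.10, size=50)
    weights_b = true_effects + rng.normal(0.0, 0.05, size=50)

    r2_a = r_squared(polygenic_score(dosages, weights_a), phenotype)
    r2_b = r_squared(polygenic_score(dosages, weights_b), phenotype)
    print(f"R² (weights A): {r2_a:.4f}")
    print(f"R² (weights B): {r2_b:.4f}")
    print(f"Relative R² change: {relative_r2_increase(r2_a, r2_b):.1f}%")
```

Under this definition, a relative increase of 6.3% means the new R² is 1.063 times the baseline R². It is not an absolute gain of 6.3 percentage points of explained variance.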