Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, MA, USA.
Program in Medical and Population Genetics, Broad Institute of Harvard and MIT, Cambridge, MA, USA.
Nat Commun. 2021 Oct 18;12(1):6052. doi: 10.1038/s41467-021-25171-9.
Polygenic risk prediction is a widely investigated topic because of its promising clinical applications. Genetic variants in functional regions of the genome are enriched for complex trait heritability. Here, we introduce a method for polygenic prediction, LDpred-funct, that leverages trait-specific functional priors to increase prediction accuracy. We fit priors using the recently developed baseline-LD model, including coding, conserved, regulatory, and LD-related annotations. We analytically estimate posterior mean causal effect sizes and then use cross-validation to regularize these estimates, improving prediction accuracy for sparse architectures. We applied LDpred-funct to predict 21 highly heritable traits in the UK Biobank (avg N = 373K as training data). LDpred-funct attained a +4.6% relative improvement in average prediction accuracy (avg prediction R² = 0.144; highest R² = 0.413 for height) compared to SBayesR (the best method that does not incorporate functional information). For height, meta-analyzing training data from UK Biobank and 23andMe cohorts (N = 1107K) increased prediction R² to 0.431. Our results show that incorporating functional priors improves polygenic prediction accuracy, consistent with the functional architecture of complex traits.
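To make the idea of a posterior mean effect size under a functional prior concrete, the sketch below shows the textbook shrinkage formula for a single SNP under strong simplifying assumptions: standardized genotypes and phenotype, no LD between SNPs, and a Gaussian prior whose variance differs per SNP. It is not the LDpred-funct implementation, which accounts for LD and adds a cross-validated regularization step. The function name `posterior_mean_no_ld` and all input values are hypothetical, chosen only for illustration.

```python
def posterior_mean_no_ld(beta_hat, prior_var, n):
    """Shrink marginal effect estimates toward zero, one SNP at a time.

    Assumptions (illustrative only, not the published method):
      - genotypes and phenotype are standardized,
      - SNPs are independent (no LD),
      - true effect beta_j ~ N(0, prior_var[j]),
      - marginal estimate beta_hat[j] ~ N(beta_j, 1/n).
    Under these assumptions the posterior mean is
      beta_hat[j] * prior_var[j] / (prior_var[j] + 1/n).
    """
    if len(beta_hat) != len(prior_var):
        raise ValueError("beta_hat and prior_var must have the same length")
    noise_var = 1.0 / n  # sampling variance of each marginal estimate
    return [b * v / (v + noise_var) for b, v in zip(beta_hat, prior_var)]


# Hypothetical toy inputs: the first SNP sits in an annotation with a
# larger expected per-SNP heritability than the other two.
beta_hat = [0.010, 0.010, -0.004]
prior_var = [1e-5, 1e-6, 1e-6]

shrunk = posterior_mean_no_ld(beta_hat, prior_var, n=100_000)
```

In this toy setup the first two SNPs have the same marginal estimate, but the one with the larger prior variance is shrunk less, which is the mechanism by which a functional prior can reweight SNPs.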