Calculating Polygenic Risk Scores (PRS) in UK Biobank: A Practical Guide for Epidemiologists.

Author information

Collister Jennifer A, Liu Xiaonan, Clifton Lei

Affiliation

Nuffield Department of Population Health, University of Oxford, Oxford, United Kingdom.

Publication information

Front Genet. 2022 Feb 18;13:818574. doi: 10.3389/fgene.2022.818574. eCollection 2022.

Abstract

A polygenic risk score estimates an individual's genetic risk for a disease or trait by aggregating the effects of many common variants associated with that condition. With the increasing availability of genetic data in large cohort studies such as the UK Biobank, including this genetic risk as a covariate in statistical analyses is becoming more widespread. Previously this required specialist knowledge, but as tooling and data availability have improved, it has become more feasible for statisticians and epidemiologists to calculate existing scores themselves for use in analyses. While tutorial resources exist for conducting genome-wide association studies and generating new polygenic risk scores, fewer guides exist for the simple calculation and application of existing genetic scores. This guide uses the UK Biobank imputed data as a model data set and outlines the key steps of this process: selecting suitable polygenic risk scores from the literature, extracting the relevant genetic variants and verifying their quality, calculating the risk score, and weighing key considerations for its inclusion in statistical models. Many of the techniques in this guide will generalize to other datasets; however, we also focus on some of the specific techniques required for using data in the formats UK Biobank has selected. This includes some of the challenges faced when working with large numbers of variants, where the computation time required by some tools is impractical. We have focused on only a couple of tools, which may not be the best ones for every aspect of the process. Even so, one barrier to working with genetic data is the sheer volume of tools available and the difficulty a novice faces in assessing their viability. By discussing in depth a couple of tools that are adequate for the calculation even at large scale, we hope to make polygenic risk scores more accessible to a wider range of researchers.
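In its simplest form, the score is a weighted sum. For individual i, PRS_i = Σ_j β_j · d_ij, where β_j is the published per-allele weight of variant j and d_ij is that individual's imputed dosage (0 to 2) of the effect allele. The Python sketch below illustrates that sum together with two of the quality checks mentioned above: matching alleles between the score and the genotype data, and dropping poorly imputed variants.

It is not the authors' pipeline. The following are all hypothetical choices made for illustration:

- the `ScoreVariant` and `GenotypedVariant` records;
- the `polygenic_risk_score` function;
- the default INFO threshold of 0.3;
- the decision to drop strand-ambiguous (A/T, C/G) variants.

```python
from dataclasses import dataclass

# Strand-ambiguous (palindromic) allele pairs: the alleles are complements of
# each other, so a strand flip cannot be detected from the alleles alone.
PALINDROMIC = ({"A", "T"}, {"C", "G"})


@dataclass
class ScoreVariant:
    """One row of a published score file (hypothetical layout)."""
    rsid: str
    effect_allele: str
    other_allele: str
    weight: float  # per-allele effect size, e.g. a log odds ratio


@dataclass
class GenotypedVariant:
    """One imputed variant in the cohort (hypothetical layout)."""
    rsid: str
    allele_a: str          # allele whose dosage is stored in `dosages`
    allele_b: str
    info: float            # imputation quality (INFO) score
    dosages: list[float]   # expected count of allele_a per individual, 0 to 2


def polygenic_risk_score(score, genotypes, min_info=0.3):
    """Return (per-individual scores, rsids actually used).

    Each score is sum_j weight_j * effect-allele dosage_ij over the variants
    that pass the (assumed) quality checks below; min_info is an assumption.
    """
    by_id = {g.rsid: g for g in genotypes}
    n = len(genotypes[0].dosages) if genotypes else 0
    totals = [0.0] * n
    used = []
    for sv in score:
        g = by_id.get(sv.rsid)
        if g is None or g.info < min_info:
            continue  # variant missing from the data, or poorly imputed
        alleles = {sv.effect_allele, sv.other_allele}
        if alleles != {g.allele_a, g.allele_b}:
            continue  # alleles do not match between score and data
        if alleles in PALINDROMIC:
            continue  # conservative choice: drop strand-ambiguous variants
        # Align dosages to the effect allele: dosage(b) = 2 - dosage(a).
        if sv.effect_allele == g.allele_a:
            effect_dosages = g.dosages
        else:
            effect_dosages = [2.0 - d for d in g.dosages]
        for i, d in enumerate(effect_dosages):
            totals[i] += sv.weight * d
        used.append(sv.rsid)
    return totals, used
```

Returning the list of variants actually used makes it easy to report how many of the published variants contributed to the score. At UK Biobank scale, a pure-Python loop over every participant and variant would be slow. In practice the same sum is better delegated to a dedicated tool that streams the genotype files.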

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ee75/8894758/dc56be18c8b6/fgene-13-818574-g001.jpg