Department of Psychiatry and Neuropsychology, School for Mental Health and Neuroscience, Maastricht University Medical Centre, Maastricht, The Netherlands.
Department of Preventive Medicine, Institute of Biomedical Informatics, Bioinformatics Center, School of Basic Medical Sciences, Henan University, Kaifeng, China.
J Hum Genet. 2023 Sep;68(9):653-656. doi: 10.1038/s10038-023-01161-1. Epub 2023 May 15.
The current study was conducted to provide general guidance on model specification in polygenic risk score (PRS) analyses of the UK Biobank, such as which covariates to adjust for (i.e. age, sex, recruitment centers, and genetic batch) and how many principal components (PCs) to include. To cover behavioral, physical, and mental health outcomes, we evaluated three continuous outcomes (BMI, smoking, drinking) and two binary outcomes (major depressive disorder and educational attainment). We fitted 3280 models (656 per phenotype), each with a different set of covariates. We evaluated these model specifications by comparing regression parameters such as R², coefficients, and P values, as well as by ANOVA tests. Findings suggest that up to three PCs appear to be sufficient for controlling population stratification for most outcomes, whereas including other covariates (particularly age and sex) appears to be more essential for model performance.
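The abstract does not say how the 656 specifications per phenotype were constructed. One enumeration that happens to reproduce the count is every subset of the four covariates (2^4 = 16) crossed with 0 to 40 PCs (41 options), giving 16 × 41 = 656. The sketch below builds such a grid as regression formula strings. The grid design, the ceiling of 40 PCs, and all variable names (`prs`, `pc1`, `recruitment_centre`, and so on) are assumptions made for illustration, not details taken from the study.

```python
from itertools import combinations

# Hypothetical column names; the abstract only names the covariates in prose.
COVARIATES = ["age", "sex", "recruitment_centre", "genetic_batch"]
MAX_PCS = 40  # assumption: up to 40 genetic PCs are available per participant


def covariate_subsets(covariates):
    """Yield every subset of the covariates, including the empty set."""
    for size in range(len(covariates) + 1):
        for subset in combinations(covariates, size):
            yield list(subset)


def build_specifications(outcome, covariates=COVARIATES, max_pcs=MAX_PCS):
    """Return one formula string per (covariate subset, number of PCs) pair."""
    specs = []
    for subset in covariate_subsets(covariates):
        for n_pcs in range(max_pcs + 1):
            pcs = [f"pc{i}" for i in range(1, n_pcs + 1)]
            terms = ["prs"] + subset + pcs
            specs.append(f"{outcome} ~ " + " + ".join(terms))
    return specs


specs = build_specifications("bmi")
print(len(specs))      # 16 covariate subsets x 41 PC counts = 656
print(len(specs) * 5)  # five phenotypes -> 3280
```

Each formula could then be passed to a linear or logistic regression routine, depending on whether the outcome is continuous or binary, and the resulting R², coefficients, and P values compared across specifications.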