He Ruoyu, Ren Jingchen, Malakhov Mykhaylo M, Pan Wei
School of Statistics, University of Minnesota, Minneapolis, Minnesota, United States of America.
Division of Biostatistics and Health Data Science, School of Public Health, University of Minnesota, Minneapolis, Minnesota, United States of America.
PLoS Genet. 2025 Apr 10;21(4):e1011659. doi: 10.1371/journal.pgen.1011659. eCollection 2025 Apr.
Genome-wide association studies (GWAS) performed on large cohort and biobank datasets have identified many genetic loci associated with Alzheimer's disease (AD). However, the younger demographic of biobank participants relative to the typical age of late-onset AD has resulted in an insufficient number of AD cases, limiting the statistical power of GWAS and any downstream analyses. To mitigate this limitation, several trait imputation methods have been proposed to impute the expected future AD status of individuals who may not have yet developed the disease. This paper explores the use of imputed AD status in nonlinear transcriptome/proteome-wide association studies (TWAS/PWAS) to identify genes and proteins whose genetically regulated expression is associated with AD risk. In particular, we considered the TWAS/PWAS method DeLIVR, which utilizes deep learning to model the nonlinear effects of expression on disease. We trained transcriptome and proteome imputation models for DeLIVR on data from the Genotype-Tissue Expression (GTEx) Project and the UK Biobank (UKB), respectively, with imputed AD status in UKB participants as the outcome. Next, we performed hypothesis testing for the DeLIVR models using clinically diagnosed AD cases from the Alzheimer's Disease Sequencing Project (ADSP). Our results demonstrate that nonlinear TWAS/PWAS trained with imputed AD outcomes successfully identifies known and putative AD risk genes and proteins. Notably, we found that training with imputed outcomes can increase statistical power without inflating false positives, enabling the discovery of molecular exposures with potentially nonlinear effects on neurodegeneration.
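The train-then-test design described in the abstract can be illustrated with a toy simulation. The sketch below is not the DeLIVR implementation: a quadratic regression stands in for the deep-learning stage, expression is imputed from genotype by ordinary least squares, the outcome is treated as continuous, and all data are simulated. Every function name, sample size, and effect size here is a hypothetical choice made for illustration only. The three samples play roles loosely analogous to an expression reference panel, a large cohort with imputed outcomes used for training, and an independent cohort used only for hypothesis testing.

```python
import math
import numpy as np


def fit_ols(features, target):
    """Least-squares coefficients, with an intercept column prepended."""
    design = np.column_stack([np.ones(len(features)), features])
    coef, *_ = np.linalg.lstsq(design, target, rcond=None)
    return coef


def predict_ols(coef, features):
    """Predictions from coefficients returned by fit_ols."""
    design = np.column_stack([np.ones(len(features)), features])
    return design @ coef


def quadratic_features(x):
    """Linear and squared terms: a simple stand-in for a nonlinear learner."""
    return np.column_stack([x, x ** 2])


def two_stage_nonlinear_test(g_ref, x_ref, g_train, y_train, g_test, y_test):
    # Stage 1: learn genotype -> expression weights in the reference sample,
    # then impute genetically regulated expression in the other two samples.
    weights = fit_ols(g_ref, x_ref)
    xhat_train = predict_ols(weights, g_train)
    xhat_test = predict_ols(weights, g_test)

    # Stage 2: fit a nonlinear outcome model on the training sample only.
    beta = fit_ols(quadratic_features(xhat_train), y_train)

    # Testing: on the independent sample, check whether the learned function
    # of imputed expression is correlated with the observed outcome.
    fitted = predict_ols(beta, quadratic_features(xhat_test))
    r = np.corrcoef(fitted, y_test)[0, 1]
    n = len(y_test)
    z = r * math.sqrt(n - 2) / math.sqrt(1.0 - r ** 2)
    # Two-sided p-value from a normal approximation (reasonable for large n).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p


def simulate(n, snp_weights, effect, rng):
    """Simulated genotypes, expression, and an outcome quadratic in expression."""
    g = rng.binomial(2, 0.3, size=(n, len(snp_weights))).astype(float)
    x = g @ snp_weights + rng.normal(size=n)
    y = effect * x ** 2 + rng.normal(size=n)
    return g, x, y


def run_demo(effect, seed=0):
    rng = np.random.default_rng(seed)
    snp_weights = np.array([0.5, -0.4, 0.3, 0.6, -0.2])  # hypothetical values
    g_ref, x_ref, _ = simulate(500, snp_weights, effect, rng)
    g_train, _, y_train = simulate(5000, snp_weights, effect, rng)
    g_test, _, y_test = simulate(2000, snp_weights, effect, rng)
    return two_stage_nonlinear_test(g_ref, x_ref, g_train, y_train, g_test, y_test)


if __name__ == "__main__":
    for effect in (0.5, 0.0):
        z, p = run_demo(effect)
        print(f"effect={effect}: z={z:.2f}, p={p:.3g}")
```

Because the outcome model is fitted on one sample and tested on another, the test statistic is not contaminated by overfitting in the training step, which is the general motivation for separating training from hypothesis testing in a design like this.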