Enhancing nonlinear transcriptome- and proteome-wide association studies via trait imputation with applications to Alzheimer's disease.

Author information

He Ruoyu, Ren Jingchen, Malakhov Mykhaylo M, Pan Wei

Affiliations

School of Statistics, University of Minnesota, Minneapolis, Minnesota, United States of America.

Division of Biostatistics and Health Data Science, School of Public Health, University of Minnesota, Minneapolis, Minnesota, United States of America.

Publication information

PLoS Genet. 2025 Apr 10;21(4):e1011659. doi: 10.1371/journal.pgen.1011659. eCollection 2025 Apr.

Abstract

Genome-wide association studies (GWAS) performed on large cohort and biobank datasets have identified many genetic loci associated with Alzheimer's disease (AD). However, the younger demographic of biobank participants relative to the typical age of late-onset AD has resulted in an insufficient number of AD cases, limiting the statistical power of GWAS and any downstream analyses. To mitigate this limitation, several trait imputation methods have been proposed to impute the expected future AD status of individuals who may not have yet developed the disease. This paper explores the use of imputed AD status in nonlinear transcriptome/proteome-wide association studies (TWAS/PWAS) to identify genes and proteins whose genetically regulated expression is associated with AD risk. In particular, we considered the TWAS/PWAS method DeLIVR, which utilizes deep learning to model the nonlinear effects of expression on disease. We trained transcriptome and proteome imputation models for DeLIVR on data from the Genotype-Tissue Expression (GTEx) Project and the UK Biobank (UKB), respectively, with imputed AD status in UKB participants as the outcome. Next, we performed hypothesis testing for the DeLIVR models using clinically diagnosed AD cases from the Alzheimer's Disease Sequencing Project (ADSP). Our results demonstrate that nonlinear TWAS/PWAS trained with imputed AD outcomes successfully identifies known and putative AD risk genes and proteins. Notably, we found that training with imputed outcomes can increase statistical power without inflating false positives, enabling the discovery of molecular exposures with potentially nonlinear effects on neurodegeneration.
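The two-stage structure described in the abstract (an expression imputation model trained in a reference panel, followed by a nonlinear outcome model fitted on imputed expression in a separate cohort) can be illustrated with a toy sketch. This is not DeLIVR: the neural network is replaced by a quadratic regression, the data are simulated, the trait is continuous rather than an imputed AD status, and the hypothesis-testing step on an independent cohort is omitted. The names `WEIGHTS`, `fit_ols`, `simulate`, and `run_two_stage` are hypothetical and do not come from the paper.

```python
import random

# Hypothetical SNP effects on expression; illustrative only, not from the paper.
WEIGHTS = [0.6, -0.4, 0.5, 0.3, -0.2]

def fit_ols(X, y):
    """Least-squares coefficients for y ~ X, solved via the normal equations."""
    p = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(r[i] * r[j] for r in X) for j in range(p)]
         + [sum(r[i] * t for r, t in zip(X, y))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]

def simulate(n, rng):
    """Toy cohort: genotypes, expression, and a trait with a quadratic
    dependence on expression plus an unmeasured confounder."""
    G, expr, trait = [], [], []
    for _ in range(n):
        g = [rng.choice([0, 1, 2]) for _ in WEIGHTS]
        u = rng.gauss(0, 1)  # confounder affecting both expression and trait
        x = sum(w * s for w, s in zip(WEIGHTS, g)) + u + rng.gauss(0, 1)
        y = 0.5 * (x - 0.8) ** 2 - u + rng.gauss(0, 1)
        G.append(g)
        expr.append(x)
        trait.append(y)
    return G, expr, trait

def run_two_stage(seed=0, n_ref=2000, n_gwas=5000):
    rng = random.Random(seed)
    # Stage 1: learn SNP -> expression weights in a reference panel.
    G_ref, x_ref, _ = simulate(n_ref, rng)
    w_hat = fit_ols([[1.0] + g for g in G_ref], x_ref)
    # Stage 2: impute expression in a separate cohort (measured expression
    # is ignored there), then fit a quadratic model of the trait on it.
    G_gwas, _, y_gwas = simulate(n_gwas, rng)
    x_hat = [sum(b * v for b, v in zip(w_hat, [1.0] + g)) for g in G_gwas]
    return fit_ols([[1.0, x, x * x] for x in x_hat], y_gwas)

if __name__ == "__main__":
    b0, b1, b2 = run_two_stage()
    print(f"quadratic coefficient estimate: {b2:.2f} (simulated truth 0.5)")
```

Because imputed expression depends only on genotype, the simulated confounder does not bias the second-stage fit, so the quadratic coefficient should land near the simulated value of 0.5. A purely linear second stage would miss this U-shaped effect, which is the kind of signal nonlinear TWAS/PWAS aims to detect.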

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/02f3/12040266/f4f29edee2ff/pgen.1011659.g001.jpg
