Chang Yung-Han, Bresnahan Sean T, Taylor Head S, Harrison Tabitha A, Yu Yao, Huff Chad D, Pasaniuc Bogdan, Lindström Sara, Bhattacharya Arjun
Quantitative Sciences Program, University of Texas MD Anderson Cancer Center UTHealth Houston Graduate School of Biomedical Sciences, Houston, TX, USA.
Department of Epidemiology, University of Texas MD Anderson Cancer Center, Houston, TX, USA.
Br J Cancer. 2025 Aug 7. doi: 10.1038/s41416-025-03141-y.
Integrating genome-wide association study (GWAS) and transcriptomic datasets can identify mediators of genetic risk for cancer. Traditional methods are often insufficient because they rely on total gene expression measures and overlook alternative splicing, which generates different transcript isoforms with potentially distinct effects.
We integrate multi-tissue isoform expression data from the Genotype-Tissue Expression (GTEx) Project with GWAS summary statistics (each with more than approximately 20,000 cases) to identify isoform- and gene-level associations with six cancers (breast, endometrial, colorectal, lung, ovarian, prostate) and six related cancer subtype classifications (12 outcomes in total).
Directly modeling isoforms using isoform-level transcriptome-wide association studies (isoTWAS) significantly improves discovery of genetic associations compared to gene-level approaches, identifying 164% more significant associations (6163 vs. 2336), with isoTWAS-prioritized genes enriched 4-fold for evolutionarily constrained genes. isoTWAS tags transcriptomic associations at 52% more independent GWAS loci across the six cancers. Isoform expression mediates an estimated 63% greater proportion of cancer risk SNP heritability than gene expression does. We highlight several isoTWAS associations that demonstrate GWAS colocalization at the isoform level but not at the gene level, including CLPTM1L (lung cancer), LAMC1 (colorectal cancer), and BABAM1 (breast cancer).
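As a quick check on the arithmetic, the reported 164% gain follows from the two association counts given above (6163 for isoTWAS, 2336 for the gene-level approach). The sketch below recomputes it; the function name `percent_increase` is a hypothetical helper introduced here for illustration and is not part of the isoTWAS software.

```python
def percent_increase(new: float, old: float) -> float:
    """Relative increase of `new` over `old`, expressed in percent."""
    if old == 0:
        raise ValueError("baseline count must be non-zero")
    return (new - old) / old * 100.0


# Counts taken from the abstract: isoTWAS vs. gene-level associations.
isotwas_hits = 6163
gene_level_hits = 2336

gain = percent_increase(isotwas_hits, gene_level_hits)
print(f"isoTWAS identifies about {gain:.0f}% more associations")
```

The same calculation would apply to the other relative gains in the abstract (52% more loci, 63% greater mediated heritability), but the underlying counts for those are not given here, so they are not recomputed.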
These results underscore the importance of modeling isoforms to maximize discovery of genetic risk mechanisms for cancers.