Powerful gene set analysis in GWAS with the Generalized Berk-Jones statistic.

Affiliations

Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, United States of America.

The Donnelly Center, University of Toronto, Toronto, Ontario, Canada.

Publication information

PLoS Genet. 2019 Mar 15;15(3):e1007530. doi: 10.1371/journal.pgen.1007530. eCollection 2019 Mar.

Abstract

A common complementary strategy in Genome-Wide Association Studies (GWAS) is to perform Gene Set Analysis (GSA), which tests for the association between one phenotype of interest and an entire set of Single Nucleotide Polymorphisms (SNPs) residing in selected genes. While there exist many tools for performing GSA, popular methods often include a number of ad-hoc steps that are difficult to justify statistically, provide complicated interpretations based on permutation inference, and demonstrate poor operating characteristics. Additionally, the lack of gold standard gene set lists can produce misleading results and create difficulties in comparing analyses even across the same phenotype. We introduce the Generalized Berk-Jones (GBJ) statistic for GSA, a permutation-free parametric framework that offers asymptotic power guarantees in certain set-based testing settings. To adjust for confounding introduced by different gene set lists, we further develop a GBJ step-down inference technique that can discriminate between gene sets driven to significance by single genes and those demonstrating group-level effects. We compare GBJ to popular alternatives through simulation and re-analysis of summary statistics from a large breast cancer GWAS, and we show how GBJ can increase power by incorporating information from multiple signals in the same gene. In addition, we illustrate how breast cancer pathway analysis can be confounded by the frequency of FGFR2 in pathway lists. Our approach is further validated on two other datasets of summary statistics generated from GWAS of height and schizophrenia.
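For orientation, the classical Berk-Jones statistic that GBJ generalizes can be sketched in a few lines. The block below is only an illustration under stated assumptions: it computes the one-sided Berk-Jones statistic for independent p-values, using the common formulation as a maximum of scaled Bernoulli Kullback-Leibler divergences over the smaller half of the sorted p-values. It is not the GBJ statistic from the paper, which additionally accounts for correlation among SNPs, and the function name `berk_jones` is a hypothetical choice for this sketch.

```python
import math


def berk_jones(p_values):
    """Classical one-sided Berk-Jones statistic for independent p-values.

    Illustrative only: this is the independence-case statistic, not the
    correlation-adjusted GBJ statistic described in the abstract.
    """
    p = sorted(p_values)
    n = len(p)
    best = 0.0
    # Scan the smaller half of the ordered p-values.
    for i in range(1, n // 2 + 1):
        x = i / n                    # observed fraction of p-values <= p_(i)
        t = max(p[i - 1], 1e-300)    # null-expected fraction, clamped away from 0
        if t >= x:
            continue                 # no excess of small p-values at this rank
        # KL divergence between Bernoulli(x) and Bernoulli(t).
        kl = x * math.log(x / t) + (1 - x) * math.log((1 - x) / (1 - t))
        best = max(best, n * kl)
    return best
```

A set with no excess of small p-values yields 0, while one or more unusually small p-values push the statistic upward. In practice a p-value for the statistic would still need to be computed, which this sketch does not attempt.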

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f68f/6436759/42f6dcea127b/pgen.1007530.g001.jpg
