Designing genome-wide association studies: sample size, power, imputation, and the choice of genotyping chip.

Author information

Spencer Chris C A, Su Zhan, Donnelly Peter, Marchini Jonathan

Affiliation

Department of Statistics, University of Oxford, Oxford, United Kingdom.

Publication information

PLoS Genet. 2009 May;5(5):e1000477. doi: 10.1371/journal.pgen.1000477. Epub 2009 May 15.

Abstract

Genome-wide association studies are revolutionizing the search for the genes underlying human complex diseases. The main decisions to be made at the design stage of these studies are the choice of the commercial genotyping chip to be used and the numbers of case and control samples to be genotyped. The most common way of comparing different chips is to use a measure of coverage, but this fails to properly account for the effects of sample size, the genetic model of the disease, and linkage disequilibrium between SNPs. In this paper, we argue that the statistical power to detect a causative variant should be the major criterion in study design. Because of the complicated pattern of linkage disequilibrium (LD) in the human genome, power cannot be calculated analytically and must instead be assessed by simulation. We describe in detail a method of simulating case-control samples at a set of linked SNPs that replicates the patterns of LD in human populations, and we use it to assess power for a comprehensive set of available genotyping chips. Our results allow us to compare the performance of the chips in detecting variants with different effect sizes and allele frequencies, to examine how power changes with sample size in different populations or when multi-marker tags and genotype imputation are used, and to see how performance compares with that of a hypothetical chip containing every SNP in HapMap. A main conclusion of this study is that marked differences in genome coverage may not translate into appreciable differences in power and that, when budgetary considerations are taken into account, the most powerful design may not always correspond to the chip with the highest coverage. We also show that genotype imputation can be used to boost the power of many chips up to the level obtained from a hypothetical "complete" chip containing all the SNPs in HapMap.
Our results have been encapsulated into an R software package that allows users to design future association studies, and our methods provide a framework with which new chip sets can be evaluated.
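To make "assessing power by simulation" concrete, here is a deliberately simplified sketch. It estimates power for a single causal SNP that is genotyped directly, so LD plays no role at all. It is therefore not the authors' method, which simulates case-control samples at a set of linked SNPs so as to reproduce human LD patterns, and it is not the interface of their R package. Everything in it is an illustrative assumption: a multiplicative (per-allele) risk model, Hardy-Weinberg equilibrium, controls drawn at the population allele frequency (a rare-disease approximation), a 1-df allelic chi-square test, and a genome-wide threshold of 5e-8. The function names are hypothetical.

```python
import math
import random


def allelic_chi2_pvalue(case_alt, case_total, ctrl_alt, ctrl_total):
    """Pearson chi-square test (1 df) on a 2x2 table of allele counts."""
    a, b = case_alt, case_total - case_alt
    c, d = ctrl_alt, ctrl_total - ctrl_alt
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 1.0  # degenerate table: no evidence of association
    chi2 = n * (a * d - b * c) ** 2 / denom
    # For 1 df, P(X > chi2) = P(|Z| > sqrt(chi2)) = erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2.0))


def count_alleles(n_alleles, freq, rng):
    """Binomial draw: number of risk alleles among n_alleles independent alleles."""
    return sum(1 for _ in range(n_alleles) if rng.random() < freq)


def simulated_power(raf, odds_ratio, n_cases, n_controls,
                    alpha=5e-8, reps=1000, seed=1):
    """Fraction of simulated studies in which the causal SNP reaches alpha.

    Hypothetical illustration: the causal SNP itself is typed, so no LD.
    """
    rng = random.Random(seed)
    # Multiplicative model (assumed): case genotypes stay in HWE with a
    # shifted allele frequency whose allelic odds ratio vs. controls is
    # exactly odds_ratio; controls are assumed to match the population.
    case_freq = raf * odds_ratio / (raf * odds_ratio + 1.0 - raf)
    hits = 0
    for _ in range(reps):
        a = count_alleles(2 * n_cases, case_freq, rng)
        c = count_alleles(2 * n_controls, raf, rng)
        if allelic_chi2_pvalue(a, 2 * n_cases, c, 2 * n_controls) < alpha:
            hits += 1
    return hits / reps
```

For example, `simulated_power(0.2, 1.3, 2000, 2000, reps=200)` estimates power for a risk allele at 20% frequency with an odds ratio of 1.3 in 2,000 cases and 2,000 controls; raising `reps` reduces the Monte Carlo error. If the chip carried only a tag SNP in partial LD with the causal variant, the case-control frequency difference at the tag would be diluted and power would fall. That is why, as the abstract explains, realistic power estimates require simulating the LD structure rather than a single SNP in isolation.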
