Quantifying uncertainty in genotype calls.

Affiliation

Department of Biostatistics, Johns Hopkins University, Baltimore, MD 21205, USA.

Publication information

Bioinformatics. 2010 Jan 15;26(2):242-9. doi: 10.1093/bioinformatics/btp624. Epub 2009 Nov 11.

DOI: 10.1093/bioinformatics/btp624

PMID: 19906825

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2804295/

Abstract

MOTIVATION

Genome-wide association studies (GWAS) are used to discover genes underlying complex, heritable disorders for which less powerful study designs have failed in the past. The number of GWAS has skyrocketed recently, with findings reported in top journals and the mainstream media. Microarrays are the genotype calling technology of choice in GWAS because they permit more than a million single nucleotide polymorphisms (SNPs) to be explored simultaneously. The starting point for the statistical analyses that GWAS use to determine association between loci and disease is making genotype calls (AA, AB or BB). However, the raw data (microarray probe intensities) are heavily processed before these calls are reached, and various sophisticated statistical procedures have been proposed for transforming raw data into genotype calls. We find that variability in microarray output quality across different SNPs, different arrays and different sample batches has a substantial influence on the accuracy of the genotype calls made by existing algorithms. Failure to account for these sources of variability can adversely affect the quality of the findings reported by GWAS.
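To make the calling step concrete, the sketch below summarizes a pair of allele-specific probe intensities (A and B) as the log-ratio M = log2(A) - log2(B) and assigns AA, AB or BB with fixed cutoffs. The cutoffs, function names and intensity values are hypothetical, and this naive rule is neither CRLMM nor Birdseed; it only illustrates how a batch-wide shift in M, one of the sources of variability described above, can silently change calls when the caller does not model batches.

```python
import math

# Hypothetical cutoffs on M = log2(A) - log2(B); real callers estimate
# cluster locations from the data rather than using fixed thresholds.
AA_CUTOFF = 1.0   # M above this -> AA (A allele dominates)
BB_CUTOFF = -1.0  # M below this -> BB (B allele dominates)


def log_ratio(intensity_a: float, intensity_b: float) -> float:
    """Summarize a pair of allele-specific intensities as M = log2(A/B)."""
    return math.log2(intensity_a) - math.log2(intensity_b)


def naive_call(m: float) -> str:
    """Assign AA, AB or BB from M using fixed cutoffs (no uncertainty)."""
    if m > AA_CUTOFF:
        return "AA"
    if m < BB_CUTOFF:
        return "BB"
    return "AB"


def call_batch(pairs, batch_shift: float = 0.0) -> list:
    """Call every (A, B) intensity pair after adding a batch-wide shift to M."""
    return [naive_call(log_ratio(a, b) + batch_shift) for a, b in pairs]


if __name__ == "__main__":
    samples = [(4000.0, 1000.0), (2000.0, 1800.0), (900.0, 3600.0), (2600.0, 1200.0)]
    print(call_batch(samples))                    # unshifted batch
    print(call_batch(samples, batch_shift=-0.6))  # same samples, shifted batch
```

With these toy inputs the fourth sample (M of about 1.12) is called AA in the unshifted data but AB after a shift of -0.6, although nothing about the sample itself changed.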

RESULTS

We developed a method based on an enhanced version of the multi-level model used by CRLMM version 1. The two key differences are that we now account for variability across batches and improve the assessment of each individual call. The new model permits the development of quality metrics for SNPs, samples and batches of samples. Using three independent datasets, we demonstrate that CRLMM version 2 outperforms both CRLMM version 1 and Birdseed, the algorithm provided by Affymetrix. The main advantage of the new approach is that it enables the identification of low-quality SNPs, samples and batches.
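One way to picture a call-specific assessment is to treat each genotype as a cluster in M with batch-specific centers, report the posterior probability of the chosen genotype as the call's confidence, and average those confidences by SNP, sample or batch to obtain simple quality metrics. The sketch below is a simplified three-component Gaussian mixture built on stated assumptions (fixed, hypothetical cluster centers in `BATCH_CENTERS`, a shared spread `SIGMA` and equal priors); it is not the multi-level model used by CRLMM, and a real method would estimate these quantities from the data.

```python
import math
from collections import defaultdict
from statistics import mean

GENOTYPES = ("AA", "AB", "BB")

# Hypothetical batch-specific cluster centers for M = log2(A) - log2(B).
# A real method would estimate these from the data; here they are fixed.
BATCH_CENTERS = {
    "batch1": {"AA": 2.0, "AB": 0.0, "BB": -2.0},
    "batch2": {"AA": 1.4, "AB": -0.6, "BB": -2.6},  # whole batch shifted by -0.6
}
SIGMA = 0.5  # assumed common spread of every cluster
PRIOR = {g: 1.0 / 3.0 for g in GENOTYPES}  # assumed equal genotype priors


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Density of a normal distribution N(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def call_with_confidence(m: float, batch: str) -> tuple[str, float]:
    """Return the most probable genotype and its posterior probability."""
    centers = BATCH_CENTERS[batch]
    weights = {g: PRIOR[g] * normal_pdf(m, centers[g], SIGMA) for g in GENOTYPES}
    total = sum(weights.values())
    best = max(GENOTYPES, key=lambda g: weights[g])
    return best, weights[best] / total


def _mean_by_key(groups: dict) -> dict:
    """Average the confidences collected under each key."""
    return {key: mean(values) for key, values in groups.items()}


def quality_metrics(records):
    """Call every record and average call confidence per SNP, sample and batch.

    records: iterable of (snp_id, sample_id, batch, m) tuples.
    """
    calls = []
    by_snp, by_sample, by_batch = defaultdict(list), defaultdict(list), defaultdict(list)
    for snp, sample, batch, m in records:
        genotype, confidence = call_with_confidence(m, batch)
        calls.append((snp, sample, genotype, confidence))
        by_snp[snp].append(confidence)
        by_sample[sample].append(confidence)
        by_batch[batch].append(confidence)
    return calls, _mean_by_key(by_snp), _mean_by_key(by_sample), _mean_by_key(by_batch)


if __name__ == "__main__":
    data = [
        ("snp1", "s1", "batch1", 2.0),
        ("snp1", "s2", "batch1", 0.9),
        ("snp1", "s3", "batch2", 1.4),
    ]
    calls, snp_q, sample_q, batch_q = quality_metrics(data)
    for call in calls:
        print(call)
    print(batch_q)
```

In this toy run the value M = 0.9 in batch1 lies between the AA and AB clusters, so its AB call gets a posterior of roughly 0.69 and lowers the averages for sample s2 and batch1, while M = 1.4 in the shifted batch2 remains a confident AA call because that batch's centers moved with it.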

AVAILABILITY

Software implementing the method described in this article is available as free and open-source code in the crlmm R/BioConductor package.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
