Social, Genetic and Developmental Psychiatry, Institute of Psychiatry, King's College London, De Crespigny Park, London, UK.
BMC Genomics. 2013 May 1;14:293. doi: 10.1186/1471-2164-14-293.
As the most stable and experimentally accessible epigenetic mark, DNA methylation is of great interest to the research community. The landscape of DNA methylation across tissues, through development and in disease pathogenesis is not yet well characterized, so there is a need for rapid and cost-effective methods for assessing genome-wide levels of DNA methylation. The Illumina Infinium HumanMethylation450 (450K) BeadChip is a very useful addition to the available methods for DNA methylation analysis, but its complex design, which incorporates two different assay methods, requires careful consideration. Accordingly, several normalization schemes have been published. We have taken advantage of known DNA methylation patterns associated with genomic imprinting and X-chromosome inactivation (XCI), in addition to the performance of SNP genotyping assays present on the array, to derive three independent metrics, which we use to test alternative schemes of correction and normalization. These metrics also have potential utility as quality scores for datasets.
The standard index of DNA methylation at any specific CpG site is β = M/(M + U + 100) where M and U are methylated and unmethylated signal intensities, respectively. Betas (βs) calculated from raw signal intensities (the default GenomeStudio behavior) perform well, but using 11 methylomic datasets we demonstrate that quantile normalization methods produce marked improvement, even in highly consistent data, by all three metrics. The commonly used procedure of normalizing betas is inferior to the separate normalization of M and U, and it is also advantageous to normalize Type I and Type II assays separately. More elaborate manipulation of quantiles proves to be counterproductive.
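The β index and the idea of normalizing M and U separately can be illustrated with a minimal sketch. This is not the wateRmelon implementation: the function names (`beta_value`, `quantile_normalize`, `normalized_betas`) are hypothetical, the input is assumed to be one list of intensities per sample with probes in the same order, and ties are broken by position for simplicity.

```python
from statistics import mean


def beta_value(m, u, offset=100.0):
    """Beta = M / (M + U + offset), the index given in the text."""
    return m / (m + u + offset)


def quantile_normalize(samples):
    """Quantile-normalize equal-length samples (one list per sample).

    Each value is replaced by the mean, across samples, of the values
    that share its rank. Ties are broken by position (a simplification).
    """
    n = len(samples[0])
    if any(len(s) != n for s in samples):
        raise ValueError("all samples must have the same number of probes")
    # Mean intensity at each rank across all samples.
    sorted_samples = [sorted(s) for s in samples]
    rank_means = [mean(col) for col in zip(*sorted_samples)]
    normalized = []
    for s in samples:
        # Probe indices ordered from lowest to highest intensity.
        order = sorted(range(n), key=lambda i: s[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        normalized.append(out)
    return normalized


def normalized_betas(m_samples, u_samples, offset=100.0):
    """Normalize M and U independently, then compute betas per probe."""
    m_norm = quantile_normalize(m_samples)
    u_norm = quantile_normalize(u_samples)
    return [
        [beta_value(m, u, offset) for m, u in zip(ms, us)]
        for ms, us in zip(m_norm, u_norm)
    ]
```

To follow the text's further recommendation, the same function could be called once on the Type I probes and once on the Type II probes, so that each assay type is normalized against its own intensity distribution.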
Careful selection of preprocessing steps can minimize variance and thus improve statistical power, especially for the detection of the small absolute DNA methylation changes likely to be associated with complex disease phenotypes. For the convenience of the research community we have created a user-friendly R software package called wateRmelon, downloadable from Bioconductor and compatible with the existing methylumi, minfi and IMA packages, that allows others to apply the same normalization methods and data quality tests to 450K data.