InterpolatedXY: a two-step strategy to normalize DNA methylation microarray data avoiding sex bias.

Affiliations

School of Computer Science and Electronic Engineering, University of Essex, Colchester CO4 3SQ, UK.

Diamond Light Source Ltd., Oxfordshire OX11 0DE, UK.

Publication information

Bioinformatics. 2022 Aug 10;38(16):3950-3957. doi: 10.1093/bioinformatics/btac436.

Abstract

MOTIVATION

Data normalization is an essential step to reduce technical variation within and between arrays. Because of their different karyotypes and the effects of X chromosome inactivation, females and males exhibit distinct methylation patterns on the sex chromosomes, which makes it a significant challenge to normalize sex chromosome data without introducing bias. Existing methods do not provide unbiased solutions for normalizing sex chromosome data; usually, they simply process autosomes and sex chromosomes indiscriminately.

RESULTS

Here, we demonstrate that ignoring this sex difference introduces artificial sex bias, especially for thousands of autosomal CpGs. We present a novel two-step strategy (interpolatedXY) to address this issue, which is applicable to all quantile-based normalization methods. Under this strategy, the autosomal CpGs are first normalized independently by conventional methods, such as funnorm or dasen; the corrected methylation values of sex chromosome-linked CpGs are then estimated as the weighted average of their nearest neighbors on autosomes. The two-step strategy can also be applied to non-quantile-based normalization methods, as well as to other array-based data types. Moreover, we propose a useful concept, the sex explained fraction of variance, to quantitatively measure the normalization effect.
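
The second step and the proposed metric can be illustrated with a minimal Python sketch. It is not the wateRmelon implementation, and it rests on assumptions the abstract does not spell out. It assumes the "nearest neighbors" of a sex-linked CpG are the two autosomal probes in the same sample whose raw values bracket it, and that the weights come from linear interpolation. It also assumes the sex explained fraction of variance is the between-sex sum of squares divided by the total sum of squares for a single CpG. The function names `interpolate_sex_probes` and `sex_explained_fraction` are hypothetical, and step one (autosomal normalization by a method such as dasen or funnorm) is taken as already done.

```python
from bisect import bisect_left


def interpolate_sex_probes(raw_auto, norm_auto, raw_sex):
    """Estimate corrected values for sex-linked probes in ONE sample.

    raw_auto  : raw autosomal values
    norm_auto : the same autosomal probes after step-one normalization
    raw_sex   : raw values of sex chromosome-linked probes
    Assumption: neighbors are the bracketing autosomal probes by raw value.
    """
    pairs = sorted(zip(raw_auto, norm_auto))
    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    corrected = []
    for v in raw_sex:
        i = bisect_left(xs, v)
        if i == 0:                 # at or below the lowest autosomal value
            corrected.append(ys[0])
        elif i == len(xs):         # above the highest autosomal value
            corrected.append(ys[-1])
        else:                      # weighted average of the two neighbors
            w = (v - xs[i - 1]) / (xs[i] - xs[i - 1])
            corrected.append((1 - w) * ys[i - 1] + w * ys[i])
    return corrected


def sex_explained_fraction(values, sexes):
    """Fraction of one CpG's variance across samples explained by sex.

    Assumption: between-sex sum of squares over total sum of squares.
    """
    grand = sum(values) / len(values)
    total = sum((v - grand) ** 2 for v in values)
    if total == 0:
        return 0.0
    groups = {}
    for v, s in zip(values, sexes):
        groups.setdefault(s, []).append(v)
    between = sum(
        len(g) * (sum(g) / len(g) - grand) ** 2 for g in groups.values()
    )
    return between / total
```

Under these assumptions, an autosomal CpG whose values split cleanly by sex after normalization would score close to 1, and one with no sex difference would score close to 0. This is the kind of artificial bias the metric is meant to expose.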

AVAILABILITY AND IMPLEMENTATION

The proposed methods are available by calling the function 'adjustedDasen' or 'adjustedFunnorm' in the latest wateRmelon package (https://github.com/schalkwyk/wateRmelon), with methods compatible with all the major workflows, including minfi.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
