An effective processing pipeline for harmonizing DNA methylation data from Illumina's 450K and EPIC platforms for epidemiological studies.

Affiliations

Department of Biostatistics and Informatics, Colorado School of Public Health, University of Colorado Anschutz Medical Campus, Aurora, CO, USA.

Department of Epidemiology, Colorado School of Public Health, University of Colorado Anschutz Medical Campus, Aurora, CO, USA.

Publication information

BMC Res Notes. 2021 Sep 8;14(1):352. doi: 10.1186/s13104-021-05741-2.

Abstract

OBJECTIVE

Illumina BeadChip arrays are commonly used to generate DNA methylation data for large epidemiological studies. Technology updates over time create challenges for data harmonization within and between studies, many of which obtained data from both the older 450K and the newer EPIC platforms. The pre-processing pipeline for DNA methylation data is not trivial, and it influences the downstream analyses. Combining platforms adds a level of technical variability that recommended pipelines have not yet taken into account. Our study evaluated how well various tools harmonize data across platform versions at each step of the pre-processing pipeline, including quality control (QC), normalization, batch effect adjustment, and correction for genomic inflation. We illustrate our novel approach using 450K and EPIC data from the Diabetes Autoimmunity Study in the Young (DAISY) prospective cohort.
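As an illustration of the genomic inflation step named above, the inflation of epigenome-wide test statistics is often summarized by the factor lambda, the median of the squared z-scores divided by the median of a chi-square distribution with one degree of freedom. The sketch below is a minimal example under that assumption; the function names `genomic_inflation` and `deflate` are hypothetical, and the abstract does not state which inflation-correction tool the study used.

```python
import statistics

# Median of a chi-square distribution with 1 degree of freedom.
CHI2_1DF_MEDIAN = 0.454936423119572


def genomic_inflation(z_scores):
    """Return lambda = median(z^2) / median of chi-square(1 df).

    Values well above 1 suggest inflated test statistics.
    """
    chi2 = [z * z for z in z_scores]
    return statistics.median(chi2) / CHI2_1DF_MEDIAN


def deflate(z_scores, lam):
    """Simple genomic-control style rescaling (an assumed approach).

    Divides every z-score by sqrt(lambda) when lambda exceeds 1;
    otherwise returns the z-scores unchanged.
    """
    if lam <= 1.0:
        return list(z_scores)
    scale = lam ** 0.5
    return [z / scale for z in z_scores]


if __name__ == "__main__":
    # Toy z-scores for five CpG sites (made-up numbers for illustration).
    z = [0.5, -1.0, 1.5, -2.0, 0.1]
    lam = genomic_inflation(z)
    print("lambda before:", lam)
    print("lambda after:", genomic_inflation(deflate(z, lam)))
```

Computing lambda separately for the 450K and EPIC results, before and after each pre-processing choice, is one way to compare how much inflation each platform carries.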

RESULTS

We found that normalization and probe filtering had the largest effect on data harmonization. Employing a meta-analysis was an effective and easily executed method for accounting for platform variability. Correcting for genomic inflation also helped with harmonization. We present guidelines for studies seeking to harmonize data from the 450K and EPIC platforms, which include using technical replicates to evaluate the pre-processing steps and employing a meta-analysis.
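To make the meta-analysis idea concrete, the sketch below combines a per-CpG effect estimate from a 450K analysis with the matching estimate from an EPIC analysis. It assumes a fixed-effect, inverse-variance weighted model and independent samples on each platform; the abstract does not say which meta-analysis model was used, and the function name `fixed_effect_meta` and the input numbers are hypothetical.

```python
import math


def fixed_effect_meta(beta_450k, se_450k, beta_epic, se_epic):
    """Inverse-variance weighted fixed-effect meta-analysis for one CpG.

    Returns the pooled effect, its standard error, the z-score,
    and a two-sided p-value from the normal approximation.
    """
    # Each platform is weighted by the inverse of its variance.
    w_450k = 1.0 / se_450k ** 2
    w_epic = 1.0 / se_epic ** 2
    total_weight = w_450k + w_epic

    beta = (w_450k * beta_450k + w_epic * beta_epic) / total_weight
    se = math.sqrt(1.0 / total_weight)
    z = beta / se
    # Two-sided p-value: P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return beta, se, z, p


if __name__ == "__main__":
    # Made-up per-platform estimates for a single CpG site.
    beta, se, z, p = fixed_effect_meta(0.02, 0.01, 0.04, 0.01)
    print(f"pooled beta={beta:.4f} se={se:.4f} z={z:.2f} p={p:.2e}")
```

Because each platform is analyzed on its own and only summary statistics are pooled, platform-specific technical variation stays within each platform's estimate rather than being mixed into a single combined dataset.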

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b07e/8424820/6b76de7579c7/13104_2021_5741_Fig1_HTML.jpg
