A Statistical Framework for the Analysis of ChIP-Seq Data.

Authors

Kuan Pei Fen, Chung Dongjun, Pan Guangjin, Thomson James A, Stewart Ron, Keleş Sündüz

Affiliations

Departments of Statistics and of Biostatistics and Medical Informatics.

Genome Center of Wisconsin and Morgridge Institute for Research.

Publication information

J Am Stat Assoc. 2011;106(495):891-903. doi: 10.1198/jasa.2011.ap09706. Epub 2012 Jan 24.

Abstract

Chromatin immunoprecipitation followed by sequencing (ChIP-Seq) has revolutionized experiments for genome-wide profiling of DNA-binding proteins, histone modifications, and nucleosome occupancy. As the cost of sequencing decreases, many researchers are switching from microarray-based technologies (ChIP-chip) to ChIP-Seq for genome-wide studies of transcriptional regulation. Despite its increasing and well-deserved popularity, little work has investigated and accounted for sources of bias in the ChIP-Seq technology. These biases typically arise from both the standard pre-processing protocol and the underlying DNA sequence of the generated data. We study data from a naked DNA sequencing experiment, which sequences non-cross-linked DNA after deproteinizing and shearing, to understand the factors affecting the background distribution of data generated in a ChIP-Seq experiment. We introduce a background model that accounts for apparent sources of bias such as mappability and GC content, and develop a flexible mixture model named MOSAiCS for detecting peaks in both one- and two-sample analyses of ChIP-Seq data. We illustrate that our model fits observed ChIP-Seq data well and further demonstrate the advantages of MOSAiCS over commonly used tools for ChIP-Seq data analysis with several case studies.
