School of Computer Science and Engineering, Nanyang Technological University, 50 Nanyang Avenue, Singapore, 639798, Singapore.
School of Biological Sciences, Nanyang Technological University, 60 Nanyang Drive, Singapore, 637551, Singapore.
BMC Bioinformatics. 2020 Nov 7;21(1):506. doi: 10.1186/s12859-020-03832-8.
Hi-C and its variant techniques have been developed to capture the spatial organization of chromatin. Normalization of Hi-C contact maps is essential for accurate modeling and interpretation of high-throughput chromatin conformation capture (3C) experiments. Hi-C correction tools were originally developed to normalize systematic biases in karyotypically normal cell lines. However, the vast majority of available Hi-C datasets are derived from cancer cell lines that carry multi-level DNA copy number variations (CNVs). CNV regions display over- or under-representation of interaction frequencies compared to CN-neutral regions. Therefore, CNV-driven bias must be removed from the chromatin interaction data of cancer cell lines to generate a euploid-equivalent contact map.
We developed the HiCNAtra framework to compute high-resolution CNV profiles from Hi-C or 3C-seq data of cancer cell lines and to correct chromatin contact maps for systematic biases, including CNV-associated bias. First, we introduce a novel 'entire-fragment' counting method for better estimation of the read depth (RD) signal from Hi-C reads, which recapitulates the whole-genome sequencing (WGS)-derived coverage signal. Second, HiCNAtra employs a multimodal-based hierarchical CNV calling approach, which outperformed the OneD and HiNT tools, to accurately identify CNVs of cancer cell lines. Third, incorporating CNV information with other systematic biases, HiCNAtra simultaneously estimates the contribution of each bias and explicitly corrects the interaction matrix using Poisson regression. HiCNAtra normalization removes CNV-induced artifacts from the contact map, generating a heatmap with a homogeneous signal. When benchmarked against the OneD, CAIC, and ICE methods on the MCF7 cancer cell line, the HiCNAtra-corrected heatmap achieves the least 1D signal variation without distorting the inherent chromatin interaction signal. Additionally, HiCNAtra-corrected contact frequencies show lower correlation with each of the systematic bias sources than those produced by OneD's explicit method. Visual inspection of CNV profiles and contact maps of cancer cell lines reveals that HiCNAtra is the most robust Hi-C correction tool for ameliorating CNV-induced bias.
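The explicit correction step can be illustrated with a minimal Python sketch of Poisson regression on bin-pair counts. This is not HiCNAtra's MATLAB implementation: the function names (`fit_poisson`, `design_row`, `correct`), the choice of GC content as the second bias covariate, and all toy numbers are assumptions made for illustration. The idea shown is only the general one stated above: fit all bias contributions jointly with a log-link Poisson model, then divide the observed counts by the fitted bias term.

```python
import math
from itertools import combinations_with_replacement


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def fit_poisson(X, y, iters=50, tol=1e-10):
    """Newton-Raphson fit of log E[y] = X @ beta (column 0 is the intercept)."""
    p = len(X[0])
    beta = [math.log(sum(y) / len(y))] + [0.0] * (p - 1)
    for _ in range(iters):
        mu = [math.exp(sum(b * x for b, x in zip(beta, row))) for row in X]
        grad = [sum((yi - mi) * row[j] for yi, mi, row in zip(y, mu, X))
                for j in range(p)]
        hess = [[sum(mi * row[j] * row[k] for mi, row in zip(mu, X))
                 for k in range(p)] for j in range(p)]
        step = solve(hess, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


def design_row(cn_i, cn_j, gc_i, gc_j):
    """Intercept, log copy-number product relative to diploid, log GC product."""
    return [1.0, math.log(cn_i * cn_j / 4.0), math.log(gc_i * gc_j)]


def correct(X, y, beta):
    """Divide each count by the fitted bias term (intercept excluded)."""
    return [yi / math.exp(sum(b * x for b, x in zip(beta[1:], row[1:])))
            for row, yi in zip(X, y)]


# Hypothetical toy data: six bins with assumed copy numbers and GC fractions.
copy_number = [2, 2, 4, 4, 2, 3]
gc_content = [0.40, 0.45, 0.50, 0.42, 0.55, 0.48]
true_beta = [math.log(50.0), 1.0, 0.5]  # assumed generating coefficients

pairs = list(combinations_with_replacement(range(6), 2))
X = [design_row(copy_number[i], copy_number[j], gc_content[i], gc_content[j])
     for i, j in pairs]
# Noise-free expected counts, so the fit should recover true_beta.
y = [math.exp(sum(b * x for b, x in zip(true_beta, row))) for row in X]

beta = fit_poisson(X, y)
corrected = correct(X, y, beta)
```

In this toy setting every bin pair has the same underlying contact level, so the corrected values should be flat across pairs regardless of copy number. Real contact maps would also need a genomic-distance term and noisy integer counts, which this sketch leaves out.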
HiCNAtra is a Hi-C-based computational tool that provides an analytical and visualization framework for DNA copy number profiling and chromatin contact map correction of karyotypically abnormal cell lines. HiCNAtra is open-source software implemented in MATLAB and is available at https://github.com/AISKhalil/HiCNAtra.