Institut Curie, PSL Research University, Paris, F-75005, France.
INSERM, U900, Paris, F-75005, France.
BMC Bioinformatics. 2018 Sep 6;19(1):313. doi: 10.1186/s12859-018-2256-5.
Normalization is essential to ensure accurate analysis and proper interpretation of sequencing data, and chromosome conformation capture data such as Hi-C pose particular challenges. Although several methods have been proposed, the most widely used normalization of Hi-C data casts the estimation of unwanted effects as a matrix balancing problem, relying on the assumption that all genomic regions interact equally with each other (equal visibility).
To explore the effect of copy-number variations on Hi-C data normalization, we first propose a simulation model that predicts the effects of large copy-number changes on a diploid Hi-C contact map. We then show that the standard approaches relying on equal visibility fail to correct for unwanted effects in the presence of copy-number variations. We thus propose a simple extension to matrix balancing methods that models these effects. Our approach can either retain the copy-number variation effects (LOIC) or remove them (CAIC). We show that this leads to better downstream analysis of the three-dimensional organization of rearranged genomes.
Taken together, our results highlight the importance of using dedicated methods for the analysis of Hi-C cancer data. Both CAIC and LOIC methods perform well on simulated and real Hi-C data sets, each fulfilling different needs.
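For readers unfamiliar with the matrix balancing formulation mentioned above, the following is a minimal sketch of iterative balancing on a toy contact map. It illustrates only the equal-visibility assumption (every bin is forced to the same total contact count); it is not the LOIC or CAIC procedure, whose details are not given in this abstract. The function name `balance`, its parameters, and the toy matrix are hypothetical, and the sketch assumes a symmetric matrix with no empty bins.

```python
def balance(matrix, n_iter=200, tol=1e-10):
    """Iteratively rescale a symmetric, non-negative contact matrix so that
    all row sums become equal (the "equal visibility" assumption).

    Hypothetical helper for illustration. Assumes no bin has a zero row sum;
    practical pipelines typically filter such bins before balancing.
    Returns the balanced matrix and the per-bin bias factors.
    """
    n = len(matrix)
    work = [row[:] for row in matrix]   # working copy, input is left untouched
    bias = [1.0] * n                    # accumulated per-bin bias
    for _ in range(n_iter):
        sums = [sum(row) for row in work]
        mean = sum(sums) / n
        # Relative visibility of each bin in the current matrix.
        scale = [s / mean for s in sums]
        for i in range(n):
            for j in range(n):
                work[i][j] /= scale[i] * scale[j]
        bias = [b * s for b, s in zip(bias, scale)]
        # Stop once every bin already had (nearly) the same visibility.
        if max(abs(s - 1.0) for s in scale) < tol:
            break
    return work, bias


# Toy 3-bin contact map (made-up counts, symmetric, no empty bins).
toy = [
    [10.0, 4.0, 2.0],
    [4.0, 20.0, 6.0],
    [2.0, 6.0, 8.0],
]
balanced, bias = balance(toy)
row_sums = [sum(row) for row in balanced]
```

After balancing, every entry of `row_sums` is the same up to numerical tolerance, and multiplying `balanced[i][j]` by `bias[i] * bias[j]` recovers the original count. Because a bin with a copy-number gain is rescaled like any other bin, this kind of balancing cannot distinguish a genuine copy-number effect from a technical bias, which is the limitation the abstract describes.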