Molecular and Computational Biology, Department of Biological Sciences, University of Southern California, Los Angeles, California, USA.
Nat Protoc. 2018 May;13(5):915-926. doi: 10.1038/nprot.2018.008. Epub 2018 Apr 5.
Chromosome conformation capture technologies such as Hi-C are widely used to investigate the spatial organization of genomes. Because genome structures can vary considerably between individual cells of a population, interpreting ensemble-averaged Hi-C data can be challenging, in particular for long-range and interchromosomal interactions. We pioneered a probabilistic approach for the generation of a population of distinct diploid 3D genome structures consistent with all the chromatin-chromatin interaction probabilities from Hi-C experiments. Each structure in the population is a physical model of the genome in 3D. Analysis of these models yields new insights into the causes and the functional properties of the genome's organization in space and time. We provide a user-friendly software package, called PGS, which runs on local machines (for practice runs) and high-performance computing platforms. PGS takes a genome-wide Hi-C contact frequency matrix, along with information about genome segmentation, and produces an ensemble of 3D genome structures entirely consistent with the input. The software automatically generates an analysis report and provides tools to extract and analyze the 3D coordinates of specific domains. Basic Linux command-line knowledge is sufficient for using this software. A typical running time of the pipeline is ~3 days with 300 cores on a computer cluster to generate a population of 1,000 diploid genome structures at topologically associating domain (TAD)-level resolution.
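To make concrete what it means for a population of structures to be consistent with contact probabilities, the following is a minimal sketch, not PGS code and not its API. It assumes each domain is a sphere with a known radius, that the population is stored as a NumPy array of shape (structures, domains, 3), and that two domains count as "in contact" when their centers lie within a hypothetical `contact_factor` times the sum of their radii. The function name, the array layout, and the contact criterion are all illustrative assumptions.

```python
import numpy as np


def contact_probability(coords, radii, contact_factor=2.0):
    """Fraction of structures in which each pair of domains is in contact.

    coords: array of shape (n_structures, n_domains, 3), domain centers.
    radii:  array of shape (n_domains,), sphere radius of each domain.
    contact_factor: hypothetical cutoff multiplier (an assumption, not a
        PGS parameter); a pair is in contact when the center distance is
        at most contact_factor * (r_i + r_j).
    """
    coords = np.asarray(coords, dtype=float)
    radii = np.asarray(radii, dtype=float)
    n_structures, n_domains, _ = coords.shape

    # Pairwise contact cutoff, shape (n_domains, n_domains).
    cutoff = contact_factor * (radii[:, None] + radii[None, :])

    # Accumulate contacts one structure at a time to keep memory modest.
    counts = np.zeros((n_domains, n_domains))
    for structure in coords:
        diff = structure[:, None, :] - structure[None, :, :]
        dist = np.linalg.norm(diff, axis=-1)
        counts += dist <= cutoff

    return counts / n_structures


# Toy population: 4 structures, 3 domains of radius 1.
# Domains 0 and 1 are close in two structures and far apart in the other two;
# domain 2 is always far from both.
toy_radii = np.array([1.0, 1.0, 1.0])
toy_coords = np.array([
    [[0, 0, 0], [d, 0, 0], [100, 0, 0]] for d in (1.0, 1.0, 10.0, 10.0)
])
probabilities = contact_probability(toy_coords, toy_radii)
print(probabilities)
```

In this toy population, domains 0 and 1 are in contact in half of the structures, so their ensemble contact probability is 0.5 even though no single structure has a "half contact". That is the sense in which a population of distinct structures can reproduce ensemble-averaged Hi-C data.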