Laboratory of Zebrafish Developmental Genomics, International Institute of Molecular and Cell Biology in Warsaw, Warsaw, Poland.
RIKEN Center for Integrative Medical Sciences, Yokohama, 230-0045, Japan.
BMC Bioinformatics. 2023 Jan 11;24(1):14. doi: 10.1186/s12859-022-05084-0.
Identifying the transcription factors (TFs) that drive the gene expression changes in a given experiment is a common goal for researchers. Existing methods rely on predicted transcription factor binding sites (TFBSs) to model changes in motif activity. Such methods work only for TFs that have a motif, and they assume that a TF's binding profile is the same in all cell types.
Given the wealth of ChIP-seq data available for a wide range of TFs in various cell types, we propose that gene expression modeling can be done using ChIP-seq "signatures" directly, skipping the motif-finding and TFBS-prediction steps. We present xcore, an R package for modeling TF activity from ChIP-seq signatures and the user's gene expression data. We also provide xcoredata, a companion data package containing a collection of preprocessed ChIP-seq signatures. We demonstrate that xcore yields biologically relevant predictions using transforming growth factor beta-induced epithelial-mesenchymal transition time courses, rinderpest infection time courses, and an embryonic stem cell to cardiomyocyte differentiation time course profiled with Cap Analysis Gene Expression (CAGE).
xcore provides a simple analytical framework for gene expression modeling using linear models that can be easily incorporated into differential expression analysis pipelines. Taking advantage of public ChIP-seq databases, xcore can identify meaningful molecular signatures and relevant ChIP-seq experiments.
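The idea of regressing expression changes on ChIP-seq signatures can be illustrated with a minimal sketch. It is written in plain Python rather than R, and it is not xcore's API: the function names, the toy binary signature matrix, the signature labels TF_A and TF_B, and the optional `penalty` term are all assumptions made for illustration. The abstract states only that linear models are used.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_signature_activities(signatures, expression_change, penalty=0.0):
    """Fit one coefficient ("activity") per signature.

    signatures: genes x signatures matrix (1 if the ChIP-seq experiment
        shows binding near the gene, else 0); hypothetical toy encoding.
    expression_change: per-gene expression change, e.g. a log fold change.
    penalty: optional L2 penalty (an assumption, not taken from the abstract).

    Solves the normal equations (X^T X + penalty * I) w = X^T y.
    """
    n_sig = len(signatures[0])
    xtx = [
        [
            sum(row[i] * row[j] for row in signatures)
            + (penalty if i == j else 0.0)
            for j in range(n_sig)
        ]
        for i in range(n_sig)
    ]
    xty = [
        sum(row[i] * y for row, y in zip(signatures, expression_change))
        for i in range(n_sig)
    ]
    return solve_linear_system(xtx, xty)


# Toy data: six genes, two hypothetical signatures (TF_A, TF_B).
signatures = [
    [1, 0],
    [1, 0],
    [1, 1],
    [0, 1],
    [0, 1],
    [0, 0],
]
# Expression changes generated from activities TF_A = 2 and TF_B = -1.
expression_change = [2.0, 2.0, 1.0, -1.0, -1.0, 0.0]

activities = fit_signature_activities(signatures, expression_change)
shrunk = fit_signature_activities(signatures, expression_change, penalty=1.0)
```

In this toy setting the unpenalized fit recovers the activities used to generate the data, and a positive `penalty` shrinks the coefficients toward zero. With many overlapping ChIP-seq signatures, some form of regularization is a common choice, though this sketch makes no claim about what xcore does internally.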