Budden David M, Hurley Daniel G, Crampin Edmund J
Brief Bioinform. 2015 Jul;16(4):616-28. doi: 10.1093/bib/bbu034. Epub 2014 Sep 16.
Predictive modelling of gene expression provides a powerful framework for exploring the regulatory logic underpinning transcriptional regulation. Recent studies have demonstrated the utility of such models in identifying dysregulation of gene and miRNA expression associated with abnormal patterns of transcription factor (TF) binding or nucleosomal histone modifications (HMs). Despite the growing popularity of such approaches, a comparative review of the various modelling algorithms and feature extraction methods is lacking. We define and compare three methods of quantifying pairwise gene-TF/HM interactions and discuss their suitability for integrating the heterogeneous chromatin immunoprecipitation (ChIP)-seq binding patterns exhibited by TFs and HMs. We then construct log-linear and ϵ-support vector regression models from various mouse embryonic stem cell (mESC) and human lymphoblastoid (GM12878) data sets, considering both ChIP-seq-derived and position weight matrix (PWM)-derived in silico TF binding. The two algorithms are evaluated in terms of both their predictive accuracy and their ability to identify the established regulatory roles of individual TFs and HMs. Our results demonstrate that TF binding and HMs are highly predictive of gene expression as measured by mRNA transcript abundance, irrespective of the algorithm or cell type chosen, and for both ChIP-seq-derived and PWM-derived TF binding. To encourage other researchers to explore and extend these results, our framework is implemented using open-source software and made available as a preconfigured bootable virtual environment.
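As an illustration of the log-linear approach named above, the following is a minimal sketch rather than the authors' implementation. It assumes each gene has one non-negative score per TF or HM, regresses log-transformed expression on log-transformed scores by ordinary least squares, and uses synthetic data throughout. The function names, the pseudocount of 1 and the simulated coefficients are all hypothetical choices made for this example.

```python
import numpy as np


def fit_log_linear(scores, expression, pseudocount=1.0):
    """Least-squares fit of log(expression + c) = b0 + sum_j b_j * log(score_j + c).

    The pseudocount c is an assumption of this sketch, used to avoid log(0).
    Returns the coefficient vector [b0, b1, ..., bk].
    """
    X = np.log(np.asarray(scores, dtype=float) + pseudocount)
    y = np.log(np.asarray(expression, dtype=float) + pseudocount)
    design = np.column_stack([np.ones(len(y)), X])  # prepend intercept column
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def predict_log_expression(coef, scores, pseudocount=1.0):
    """Predict log(expression + c) from per-gene TF/HM scores."""
    X = np.log(np.asarray(scores, dtype=float) + pseudocount)
    return coef[0] + X @ coef[1:]


def r_squared(y_true, y_pred):
    """Coefficient of determination between observed and predicted values."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot


# Synthetic example: 500 genes, 4 hypothetical TF/HM features.
rng = np.random.default_rng(0)
n_genes, n_features = 500, 4
scores = rng.gamma(shape=2.0, scale=5.0, size=(n_genes, n_features))

# Simulated "true" effects: intercept, then one slope per feature.
# A positive slope mimics an activator, a negative slope a repressor.
true_coef = np.array([3.0, 1.0, -0.5, 0.3, 0.0])
log_expr = (
    true_coef[0]
    + np.log(scores + 1.0) @ true_coef[1:]
    + rng.normal(0.0, 0.1, size=n_genes)
)
expression = np.exp(log_expr) - 1.0  # invert the log(x + 1) transform

coef = fit_log_linear(scores, expression)
fit_quality = r_squared(log_expr, predict_log_expression(coef, scores))
```

In a sketch like this, the sign and size of each fitted slope give a rough reading of a regulator's role, which is one reason a linear form is easy to interpret. An ϵ-support vector regression could be fitted to the same log-transformed features, for example with scikit-learn's `sklearn.svm.SVR`, which exposes an `epsilon` parameter. Any real comparison would also need held-out genes or cross-validation rather than the in-sample fit shown here.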