Phuong Tu Minh, Lee Doheon, Lee Kwang Hyung
Department of BioSystems, Korea Advanced Institute of Science and Technology, 373-1 Guseong-dong Yuseong-gu, Daejeon 305-701, Korea.
Bioinformatics. 2004 Mar 22;20(5):750-7. doi: 10.1093/bioinformatics/btg480. Epub 2004 Jan 29.
The transcription of a gene is largely determined by short sequence motifs that serve as binding sites for transcription factors. Recent findings suggest direct relationships between the motifs and gene expression levels. In this work, we present a method for identifying regulatory motifs. Our method makes use of tree-based techniques for recovering the relationships between motifs and gene expression levels.
We treat regulatory motifs and gene expression levels as predictor variables and responses, respectively, and use a regression tree model to identify the structural relationships between them. The regression tree methodology is extended to handle responses from multiple experiments by modifying the split function. The significance of regulatory elements is determined by analyzing tree structures and using a variable importance measure. When applied to two data sets of the yeast Saccharomyces cerevisiae, the method successfully identifies most of the regulatory motifs that are known to control gene transcription under the given experimental conditions, and suggests several new putative motifs. Analysis of the tree structures also reconfirms several pairs of motifs that are known to regulate gene transcription in combination.
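The idea of extending a regression tree split to responses from multiple experiments can be illustrated with a minimal sketch. The abstract does not specify the modified split function, so the criterion below (squared error summed over all experiments) is an assumption chosen for illustration, not the authors' exact formulation. The names `node_sse` and `best_split` and the toy data are hypothetical.

```python
import numpy as np

def node_sse(Y):
    """Squared deviation from the per-experiment node mean, summed over
    all genes (rows) and all experiments (columns)."""
    if Y.shape[0] == 0:
        return 0.0
    return float(((Y - Y.mean(axis=0)) ** 2).sum())

def best_split(X, Y):
    """Pick the motif whose presence/absence split most reduces the
    multi-experiment squared error (an assumed split criterion).

    X: genes x motifs matrix, nonzero where a motif occurs in a promoter.
    Y: genes x experiments matrix of expression levels.
    Returns (motif_index, error_reduction); motif_index is None if no
    motif yields a positive reduction.
    """
    parent = node_sse(Y)
    best_j, best_gain = None, 0.0
    for j in range(X.shape[1]):
        has_motif = X[:, j] > 0
        # Skip motifs that do not partition the genes into two groups.
        if has_motif.all() or not has_motif.any():
            continue
        gain = parent - node_sse(Y[has_motif]) - node_sse(Y[~has_motif])
        if gain > best_gain:
            best_j, best_gain = j, gain
    return best_j, best_gain

# Toy data: 4 genes, 2 candidate motifs, 2 experiments (all values made up).
X = np.array([[1, 0], [1, 1], [0, 0], [0, 1]])
Y = np.array([[2.0, 1.8], [2.2, 2.0], [-0.1, 0.1], [0.1, -0.1]])
motif, gain = best_split(X, Y)
print(motif, round(gain, 2))
```

Applying `best_split` recursively to each child node would grow a full tree. Summing each motif's error reductions over the nodes where it is chosen gives one common form of variable importance, though the measure used in the paper may differ.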