Faculty of Health Sciences, University of Macau, Taipa, Macau, China.
Int J Biol Sci. 2018 Sep 7;14(12):1724-1731. doi: 10.7150/ijbs.28850. eCollection 2018.
Next-generation sequencing coupled to chromatin immunoprecipitation (ChIP-seq), DNase I hypersensitivity (DNase-seq) and the assay for transposase-accessible chromatin (ATAC-seq) has generated enormous amounts of data and markedly improved our understanding of the transcriptional and epigenetic control of gene expression. To take advantage of these datasets and provide clues about which factors potentially regulate the expression of a gene of interest, including transcription factors, epigenetic regulators and histone modifications, a tool is needed that can query multiple datasets simultaneously using gene symbols or genomic coordinates as search terms. In this study, we annotated the peaks of thousands of ChIP-seq datasets generated by the ENCODE project, as well as ChIP-seq/DNase-seq/ATAC-seq datasets deposited in Gene Expression Omnibus (GEO) and curated by the Cistrome project. We built a MySQL database called TFmapper that contains these annotations and the associated metadata. Through a web interface, it allows users without bioinformatics expertise to search across thousands of datasets and identify the factors that target a genomic region or gene of interest in a specified sample. Users can also visualize multiple peaks in genome browsers and download the corresponding sequences. TFmapper will help users explore the vast amount of publicly available ChIP-seq/DNase-seq/ATAC-seq data and perform integrative analyses to understand the regulation of a gene of interest. The web server is freely accessible at http://www.tfmapper.org/.
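At its core, the region search described above is an interval-overlap query over a table of annotated peaks. The sketch below illustrates the idea and is not TFmapper's implementation. Several things here are assumptions: the `peaks` table schema, the function names `find_factors_by_region` and `find_factors_by_gene`, and the dataset IDs and gene names, which are all hypothetical. Python's built-in `sqlite3` also stands in for MySQL so that the example runs without a server. Peaks are assumed to use BED-style 0-based, half-open coordinates.

```python
import sqlite3

# Hypothetical schema (not TFmapper's actual schema): one row per annotated peak.
# Coordinates are assumed to be 0-based, half-open [peak_start, peak_end), as in BED files.
SCHEMA = """
CREATE TABLE peaks (
    dataset_id   TEXT,     -- illustrative dataset identifier
    factor       TEXT,     -- TF, epigenetic regulator, or histone mark
    sample       TEXT,     -- cell line or tissue
    chrom        TEXT,
    peak_start   INTEGER,
    peak_end     INTEGER,
    nearest_gene TEXT      -- gene symbol assigned during peak annotation
);
CREATE INDEX idx_peaks_loc ON peaks (chrom, peak_start);
CREATE INDEX idx_peaks_gene ON peaks (nearest_gene);
"""

# Made-up toy rows, for illustration only.
TOY_PEAKS = [
    ("DS1", "CTCF",    "K562",  "chr1", 1000, 1500, "GENE_A"),
    ("DS2", "POLR2A",  "K562",  "chr1", 1400, 1800, "GENE_A"),
    ("DS3", "H3K27ac", "HepG2", "chr1", 5000, 5600, "GENE_B"),
    ("DS4", "CTCF",    "HepG2", "chr2", 1000, 1500, "GENE_C"),
]


def build_demo_db():
    """Create an in-memory database filled with the toy peaks."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO peaks VALUES (?, ?, ?, ?, ?, ?, ?)", TOY_PEAKS)
    return conn


def find_factors_by_region(conn, chrom, start, end, sample=None):
    """Return (factor, dataset_id) pairs whose peaks overlap [start, end) on chrom.

    Half-open intervals [a, b) and [c, d) overlap iff a < d and c < b.
    """
    sql = ("SELECT DISTINCT factor, dataset_id FROM peaks "
           "WHERE chrom = ? AND peak_start < ? AND peak_end > ?")
    params = [chrom, end, start]
    if sample is not None:  # optionally restrict the search to one sample
        sql += " AND sample = ?"
        params.append(sample)
    sql += " ORDER BY factor, dataset_id"
    return conn.execute(sql, params).fetchall()


def find_factors_by_gene(conn, symbol, sample=None):
    """Return (factor, dataset_id) pairs with peaks annotated to a gene symbol."""
    sql = "SELECT DISTINCT factor, dataset_id FROM peaks WHERE nearest_gene = ?"
    params = [symbol]
    if sample is not None:
        sql += " AND sample = ?"
        params.append(sample)
    sql += " ORDER BY factor, dataset_id"
    return conn.execute(sql, params).fetchall()


if __name__ == "__main__":
    conn = build_demo_db()
    print(find_factors_by_region(conn, "chr1", 1450, 1460, sample="K562"))
    print(find_factors_by_gene(conn, "GENE_B"))
```

With the toy rows, querying chr1:1450-1460 in K562 returns both the CTCF and POLR2A peaks. The region chr1:1500-1600 returns only POLR2A, because the CTCF peak ends exactly at 1500 under half-open coordinates. A plain B-tree index on `(chrom, peak_start)` only partly helps overlap queries. A production database holding millions of peaks would more likely use an interval-aware scheme, such as the UCSC genome browser's binning index.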