TFBSFootprinter: a multiomics tool for prediction of transcription factor binding sites in vertebrate species.

Author information

Barker Harlan R, Parkkila Seppo, Tolvanen Martti E E

Affiliations

Tampere University Hospital and Faculty of Medicine and Health Technology, Tampere University, Tampere, Finland.

Department of Clinical Chemistry, Fimlab Laboratories PLC, Tampere University Hospital, Tampere, Finland.

Publication information

Transcription. 2025 Apr-Jun;16(2-3):204-223. doi: 10.1080/21541264.2025.2521764. Epub 2025 Jul 11.

Abstract

BACKGROUND

Transcription factor (TF) proteins play a critical role in regulating eukaryotic gene expression through sequence-specific binding to genomic locations known as transcription factor binding sites (TFBSs). Accurate prediction of TFBSs is essential for understanding gene regulation and disease mechanisms, and for drug discovery. Such studies are relevant not only in humans but also in model organisms and in domesticated and wild animals. However, current tools for the automatic analysis of TFBSs in gene promoter regions are of limited use across multiple species. To our knowledge, no existing tool supports automatic analysis of TFBSs in gene promoter regions for many species.

METHODOLOGY AND FINDINGS

The TFBSFootprinter tool combines multiomic, transcription-relevant data to predict functional TFBSs more accurately in 317 vertebrate species. In humans, these data include vertebrate sequence conservation (GERP), proximity to transcription start sites (FANTOM5), correlation of expression between target genes and the TFs predicted to bind their promoters (FANTOM5), overlap with ChIP-Seq TF metaclusters (GTRD), overlap with ATAC-Seq peaks (ENCODE), eQTLs (GTEx), and the observed/expected CpG ratio (Ensembl). In non-human vertebrates, they include GERP, proximity to transcription start sites, and the CpG ratio. TFBSFootprinter analyses take an Ensembl transcript ID as input for ease of use and require minimal setup. TFBSFootprinter was benchmarked on a manually curated, experimentally verified TFBS dataset. Using all multiomic data, it outperformed the other methods (average area under the receiver operating characteristic curve, 0.881) compared with DeepBind (0.798), DeepSEA (0.682), FIMO (0.817), and traditional PWM (0.854). Selecting the best overall combination of multiomic data improved the results further (0.910). Additionally, we determined the combination of multiomic data that provides the best model of binding for each TF. TFBSFootprinter is available as Conda and Python packages.
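The observed/expected CpG ratio listed above is a standard promoter sequence statistic. The abstract does not specify how TFBSFootprinter computes or windows it. The following is therefore only a minimal sketch, assuming the common Gardiner-Garden and Frommer definition: obs/exp = (CpG count × length) / (C count × G count). The function names and the `window` and `step` parameters are hypothetical and are not part of the TFBSFootprinter API.

```python
def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio of a DNA sequence.

    Assumes the Gardiner-Garden & Frommer definition:
        (count of 'CG' dinucleotides * length) / (count of C * count of G)
    """
    s = seq.upper()
    n = len(s)
    c = s.count("C")
    g = s.count("G")
    if c == 0 or g == 0:
        # No C or no G means no expected CpGs; report 0 instead of dividing by zero.
        return 0.0
    # 'CG' cannot overlap with itself, so str.count gives the exact dinucleotide count.
    cg = s.count("CG")
    return cg * n / (c * g)


def windowed_cpg_obs_exp(seq: str, window: int = 200, step: int = 1) -> list[float]:
    """Hypothetical sliding-window profile of the CpG obs/exp ratio.

    Returns one value per full-length window; sequences shorter than
    `window` yield an empty list.
    """
    return [
        cpg_obs_exp(seq[i:i + window])
        for i in range(0, len(seq) - window + 1, step)
    ]
```

For example, `cpg_obs_exp("CCGG")` returns 1.0 (one CpG, with two Cs and two Gs in a 4-bp sequence). A sliding window is one plausible way to attach a local CpG value to each candidate binding site. It is an illustrative assumption, not a description of the tool's implementation.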

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b8e8/12258250/d6366807d3fc/KTRN_A_2521764_F0001_OC.jpg