Identifying the DNA methylation preference of transcription factors using ProtBERT and SVM.

Author information

Li Yanchao, Zou Quan, Dai Qi, Stalin Antony, Luo Ximei

Affiliations

School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu, Sichuan, China.

Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China, Chengdu, Sichuan, China.

Publication information

PLoS Comput Biol. 2025 May 13;21(5):e1012513. doi: 10.1371/journal.pcbi.1012513. eCollection 2025 May.

Abstract

Transcription factors (TFs) affect gene expression by binding to specific DNA sequences, and this binding may be modulated by DNA methylation. A subset of TFs that serve as methylation readers preferentially bind to certain methylated DNA; these are defined as TFPMs. Identifying TFPMs enhances our understanding of DNA methylation's role in gene regulation, but their experimental identification is resource-demanding. In this study, we propose a novel two-step computational approach to classify TFs and TFPMs. First, we employ a fine-tuned ProtBERT model to distinguish TFs from non-TFs. Second, we combine the Reduced Amino Acid Category (RAAC) with K-mer features and a support vector machine (SVM) to predict the potential of TFs to bind to methylated DNA. Comparative experiments demonstrate that our proposed methods outperform all existing approaches and emphasize the efficiency of our computational framework in classifying TFs and TFPMs. Cross-species validation on an independent mouse dataset further demonstrates the generalizability of our proposed framework. In addition, we conducted predictions on all human transcription factors and found that most of the top 20 proteins belong to the Krueppel C2H2-type zinc-finger family. Some studies have demonstrated a partial correlation between this family and DNA methylation and confirmed the preference of some of its members, which supports the robustness of our approach.
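
The second step described in the abstract (RAAC combined with K-mer features, then an SVM) can be illustrated with a minimal sketch. The abstract does not say which reduced alphabet or which value of k was used, so the six-group physicochemical clustering in `REDUCED_GROUPS`, the default `k=2`, and the function names `reduce_sequence` and `raac_kmer_features` are all assumptions made for illustration, not the authors' implementation.

```python
from collections import Counter
from itertools import product

# Hypothetical reduced alphabet: a generic physicochemical grouping of the
# 20 standard amino acids. It is NOT necessarily the RAAC scheme used in the paper.
REDUCED_GROUPS = {
    "a": "AVLIMC",  # aliphatic / hydrophobic
    "r": "FWYH",    # aromatic
    "p": "STNQ",    # polar, uncharged
    "k": "KR",      # positively charged
    "d": "DE",      # negatively charged
    "g": "GP",      # small / conformationally special
}
AA_TO_GROUP = {aa: g for g, members in REDUCED_GROUPS.items() for aa in members}


def reduce_sequence(seq):
    """Map each residue to its group label.

    Residues outside the 20 standard amino acids (e.g. X) are dropped,
    which makes their neighbours adjacent in the reduced string.
    """
    return "".join(AA_TO_GROUP[aa] for aa in seq.upper() if aa in AA_TO_GROUP)


def raac_kmer_features(seq, k=2):
    """Return normalized k-mer frequencies over the reduced alphabet."""
    reduced = reduce_sequence(seq)
    alphabet = sorted(REDUCED_GROUPS)
    # Fixed ordering of all possible k-mers so every protein yields
    # a vector of the same length (len(alphabet) ** k).
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    counts = Counter(reduced[i:i + k] for i in range(len(reduced) - k + 1))
    total = sum(counts.values())
    return [counts[km] / total if total else 0.0 for km in kmers]


# Toy sequence, invented for the example (not a real transcription factor).
features = raac_kmer_features("MKRSTDEGPFWAVL", k=2)
print(len(features))  # 36, i.e. 6 groups ** 2
```

Vectors like `features`, computed for proteins with known labels, could then be passed to an SVM classifier such as scikit-learn's `sklearn.svm.SVC`. Reducing the alphabet before counting keeps the feature space small (36 dimensions here, versus 400 for dipeptides over the full 20-letter alphabet), which is one plausible reason to pair RAAC with K-mer counting when labeled TFPM data are limited.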

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/0e3b/12121914/612ae4d728ec/pcbi.1012513.g001.jpg
