GraphPro: An interpretable graph neural network-based model for identifying promoters in multiple species.

Author information

College of Science, Dalian Jiaotong University, Dalian, 116028, China.

College of Software, Dalian Jiaotong University, Dalian, 116028, China.

Publication information

Comput Biol Med. 2024 Sep;180:108974. doi: 10.1016/j.compbiomed.2024.108974. Epub 2024 Aug 2.

Abstract

Promoters are DNA sequences that bind with RNA polymerase to initiate transcription, regulating this process through interactions with transcription factors. Accurate identification of promoters is crucial for understanding gene expression regulation mechanisms and developing therapeutic approaches for various diseases. However, experimental techniques for promoter identification are often expensive, time-consuming, and inefficient, necessitating the development of accurate and efficient computational models for this task. Enhancing the model's ability to recognize promoters across multiple species and improving its interpretability pose significant challenges. In this study, we introduce a novel interpretable model based on graph neural networks, named GraphPro, for multi-species promoter identification. Initially, we encode the sequences using k-tuple nucleotide frequency pattern, dinucleotide physicochemical properties, and dna2vec. Subsequently, we construct two feature extraction modules based on convolutional neural networks and graph neural networks. These modules aim to extract specific motifs from the promoters, learn their dependencies, and capture the underlying structural features of the promoters, providing a more comprehensive representation. Finally, a fully connected neural network predicts whether the input sequence is a promoter. We conducted extensive experiments on promoter datasets from eight species, including Human, Mouse, and Escherichia coli. The experimental results show that the average Sn, Sp, Acc and MCC values of GraphPro are 0.9123, 0.9482, 0.8840 and 0.7984, respectively. Compared with previous promoter identification methods, GraphPro not only achieves better recognition accuracy on multiple species, but also outperforms all previous methods in cross-species prediction ability. 
Furthermore, by visualizing GraphPro's decision process and analyzing the sequences matching the transcription factor binding motifs captured by the model, we validate its significant advantages in biological interpretability. The source code for GraphPro is available at https://github.com/liuliwei1980/GraphPro.
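
The four reported metrics (Sn, Sp, Acc, MCC) can all be derived from a binary confusion matrix. The sketch below assumes the standard definitions of sensitivity, specificity, accuracy, and the Matthews correlation coefficient; the abstract does not spell out the formulas. The function name `binary_metrics` and the example counts are hypothetical and are not taken from the GraphPro repository or its results.

```python
import math


def binary_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Compute Sn, Sp, Acc and MCC from confusion-matrix counts.

    Assumes the standard definitions; this is an illustration,
    not the evaluation code used by the GraphPro authors.
    """
    total = tp + tn + fp + fn
    # Sensitivity: fraction of true promoters that were recovered.
    sn = tp / (tp + fn) if (tp + fn) else 0.0
    # Specificity: fraction of non-promoters correctly rejected.
    sp = tn / (tn + fp) if (tn + fp) else 0.0
    # Accuracy: fraction of all sequences classified correctly.
    acc = (tp + tn) / total if total else 0.0
    # MCC: correlation between predicted and true labels, in [-1, 1].
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"Sn": sn, "Sp": sp, "Acc": acc, "MCC": mcc}


# Made-up counts, for illustration only.
metrics = binary_metrics(tp=90, tn=80, fp=20, fn=10)
print(metrics)
```

MCC is often reported alongside accuracy because it stays informative when the promoter and non-promoter classes are unbalanced.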
