HydRA: Deep-learning models for predicting RNA-binding capacity from protein interaction association context and protein sequence.

Affiliations

Department of Cellular and Molecular Medicine, University of California, San Diego, La Jolla, CA, USA; Institute for Genomic Medicine and UCSD Stem Cell Program, University of California, San Diego, La Jolla, CA, USA; Stem Cell Program, University of California, San Diego, La Jolla, CA, USA.

Department of Physiology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore.

Publication information

Mol Cell. 2023 Jul 20;83(14):2595-2611.e11. doi: 10.1016/j.molcel.2023.06.019. Epub 2023 Jul 7.

Abstract

RNA-binding proteins (RBPs) control RNA metabolism to orchestrate gene expression and, when dysfunctional, underlie human diseases. Proteome-wide discovery efforts predict thousands of RBP candidates, many of which lack canonical RNA-binding domains (RBDs). Here, we present a hybrid ensemble RBP classifier (HydRA), which leverages information from both intermolecular protein interactions and internal protein sequence patterns to predict RNA-binding capacity with unparalleled specificity and sensitivity using support vector machines (SVMs), convolutional neural networks (CNNs), and Transformer-based protein language models. Occlusion mapping by HydRA robustly detects known RBDs and predicts hundreds of uncharacterized RNA-binding associated domains. Enhanced CLIP (eCLIP) for HydRA-predicted RBP candidates reveals transcriptome-wide RNA targets and confirms RNA-binding activity for HydRA-predicted RNA-binding associated domains. HydRA accelerates construction of a comprehensive RBP catalog and expands the diversity of RNA-binding associated domains.
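
The occlusion mapping mentioned in the abstract can be illustrated with a minimal sketch. This is not HydRA's implementation: the window size, the `X` mask character, the averaging over overlapping windows, and the `toy_score` stand-in classifier are all assumptions chosen for illustration. The idea is only that masking a stretch of residues and measuring how much the predicted score drops gives a per-residue importance profile.

```python
from typing import Callable, List


def occlusion_map(
    seq: str,
    score: Callable[[str], float],
    window: int = 5,
    mask: str = "X",
) -> List[float]:
    """Per-residue importance: mean score drop over all windows covering a residue."""
    if not 1 <= window <= len(seq):
        raise ValueError("window must be between 1 and len(seq)")
    base = score(seq)
    drops = [0.0] * len(seq)
    counts = [0] * len(seq)
    for start in range(len(seq) - window + 1):
        # Replace one window of residues with the mask character and rescore.
        occluded = seq[:start] + mask * window + seq[start + window:]
        delta = base - score(occluded)
        for i in range(start, start + window):
            drops[i] += delta
            counts[i] += 1
    return [d / c for d, c in zip(drops, counts)]


def toy_score(seq: str) -> float:
    """Hypothetical stand-in for a trained classifier: fraction of R/K residues."""
    return sum(ch in "RK" for ch in seq) / len(seq)


if __name__ == "__main__":
    sequence = "AAAAARKRKRAAAAA"  # made-up sequence with a basic patch in the middle
    importance = occlusion_map(sequence, toy_score, window=3)
    for residue, value in zip(sequence, importance):
        print(f"{residue}\t{value:.3f}")
```

With a real model, `score` would wrap the classifier's predicted RNA-binding probability, and contiguous high-importance stretches would be the candidate RNA-binding associated regions.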
