Department of Cellular and Molecular Medicine, University of California, San Diego, La Jolla, CA, USA; Institute for Genomic Medicine and UCSD Stem Cell Program, University of California, San Diego, La Jolla, CA, USA; Stem Cell Program, University of California, San Diego, La Jolla, CA, USA.
Department of Physiology, Yong Loo Lin School of Medicine, National University of Singapore, Singapore, Singapore.
Mol Cell. 2023 Jul 20;83(14):2595-2611.e11. doi: 10.1016/j.molcel.2023.06.019. Epub 2023 Jul 7.
RNA-binding proteins (RBPs) control RNA metabolism to orchestrate gene expression and, when dysfunctional, underlie human diseases. Proteome-wide discovery efforts predict thousands of RBP candidates, many of which lack canonical RNA-binding domains (RBDs). Here, we present a hybrid ensemble RBP classifier (HydRA), which leverages information from both intermolecular protein interactions and internal protein sequence patterns to predict RNA-binding capacity with unparalleled specificity and sensitivity using support vector machines (SVMs), convolutional neural networks (CNNs), and Transformer-based protein language models. Occlusion mapping by HydRA robustly detects known RBDs and predicts hundreds of uncharacterized RNA-binding associated domains. Enhanced CLIP (eCLIP) for HydRA-predicted RBP candidates reveals transcriptome-wide RNA targets and confirms RNA-binding activity for HydRA-predicted RNA-binding associated domains. HydRA accelerates construction of a comprehensive RBP catalog and expands the diversity of RNA-binding associated domains.
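The occlusion mapping mentioned in the abstract can be illustrated with a minimal sketch. This is not HydRA's implementation: the scoring function `toy_score`, the window size, and the mask character `X` are all assumptions made for illustration, and a real application would substitute a trained classifier for the scorer. The idea shown is only the general technique: mask a sliding window of residues, measure how much the predicted score drops, and attribute that drop to the masked positions.

```python
from typing import Callable, List


def occlusion_map(
    seq: str,
    score: Callable[[str], float],
    window: int = 5,
    mask: str = "X",
) -> List[float]:
    """Return, for each residue, the mean score drop over all windows covering it."""
    if not 1 <= window <= len(seq):
        raise ValueError("window must be between 1 and the sequence length")

    base = score(seq)                 # score of the intact sequence
    total = [0.0] * len(seq)          # summed score drops per residue
    count = [0] * len(seq)            # number of windows covering each residue

    for start in range(len(seq) - window + 1):
        # Replace one window of residues with the mask character.
        occluded = seq[:start] + mask * window + seq[start + window:]
        drop = base - score(occluded)
        for i in range(start, start + window):
            total[i] += drop
            count[i] += 1

    # Every residue is covered by at least one window, so count is never zero.
    return [t / c for t, c in zip(total, count)]


def toy_score(seq: str) -> float:
    """Hypothetical stand-in for a classifier: fraction of K/R residues."""
    return sum(ch in "KR" for ch in seq) / len(seq)


if __name__ == "__main__":
    sequence = "AAAAAKRKRKAAAAA"      # made-up sequence with a K/R-rich stretch
    drops = occlusion_map(sequence, toy_score, window=3)
    for residue, drop in zip(sequence, drops):
        print(residue, round(drop, 3))
```

Residues whose masking lowers the score most are the ones the scorer relies on. With a real RBP classifier in place of `toy_score`, contiguous high-drop regions would be the candidate RNA-binding associated domains.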