SVM based prediction of RNA-binding proteins using binding residues and evolutionary information.

Affiliation

Department of Biology, McGill University, Montreal, QC, H3A 1B1, Canada.

Publication information

J Mol Recognit. 2011 Mar-Apr;24(2):303-13. doi: 10.1002/jmr.1061.

Abstract

RNA-binding proteins (RBPs) play a crucial role in transcription and gene regulation. This paper describes a support vector machine (SVM) based method for discriminating and classifying RNA-binding and non-binding proteins using sequence features. With a threshold of 30% interacting residues, the RNA-binding residue prediction method PPRINT achieved a Matthews correlation coefficient (MCC) of 0.32. BLAST and PSI-BLAST identified RBPs with coverage of 32.63% and 33.16%, respectively, at an e-value of 1e-4. The SVM models developed with amino acid, dipeptide, and four-part amino acid compositions showed MCCs of 0.60, 0.46, and 0.53, respectively. This is the first study in which evolutionary information in the form of a position-specific scoring matrix (PSSM) profile has been successfully used for predicting RBPs. We achieved a maximum MCC of 0.62 using an SVM model based on PSSM, called PSSM-400. Finally, we developed different hybrid approaches and achieved a maximum MCC of 0.66. We also developed a method for predicting three subclasses of RNA-binding proteins (rRNA-, tRNA-, and mRNA-binding proteins). The performance of the method was also evaluated on an independent dataset of 69 RBPs and 100 non-RBPs (NBPs). Additional benchmarking was performed using gene ontology (GO) based annotation. Based on the hybrid approach, a web server, RNApred, has been developed for predicting RNA-binding proteins from amino acid sequences (http://www.imtech.res.in/raghava/rnapred/).
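
The composition features and the MCC named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names are hypothetical, the percentage normalization and the handling of non-standard residues are assumptions, and the four-part composition, PSSM-400 profile, and SVM training are not shown.

```python
from itertools import product
from math import sqrt

# The 20 standard amino acids; the ordering is an arbitrary choice for this sketch.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(seq):
    """Return a 20-element vector: percentage of each standard residue."""
    residues = [aa for aa in seq.upper() if aa in AMINO_ACIDS]
    if not residues:
        raise ValueError("sequence contains no standard amino acids")
    return [100.0 * residues.count(aa) / len(residues) for aa in AMINO_ACIDS]


def dipeptide_composition(seq):
    """Return a 400-element vector: percentage of each adjacent residue pair."""
    seq = seq.upper()
    # Assumption: pairs containing a non-standard residue are skipped.
    pairs = [
        seq[i:i + 2]
        for i in range(len(seq) - 1)
        if seq[i] in AMINO_ACIDS and seq[i + 1] in AMINO_ACIDS
    ]
    if not pairs:
        raise ValueError("sequence contains no standard dipeptides")
    keys = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]
    return [100.0 * pairs.count(k) / len(pairs) for k in keys]


def matthews_corrcoef(tp, tn, fp, fn):
    """MCC from confusion-matrix counts; defined as 0.0 when the denominator is zero."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom
```

Vectors of this kind (20 or 400 values per protein) are the sort of fixed-length input an SVM classifier expects, and the MCC summarizes binary predictions on a scale from -1 to 1.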
