Sequence-based prediction of protein-protein interactions using weighted sparse representation model combined with global encoding.

Authors

Huang Yu-An, You Zhu-Hong, Chen Xing, Chan Keith, Luo Xin

Affiliations

College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, Guangdong, 518060, China.

School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, Jiangsu, 221116, China.

Publication information

BMC Bioinformatics. 2016 Apr 26;17(1):184. doi: 10.1186/s12859-016-1035-4.

Abstract

BACKGROUND

Proteins are essential molecules that participate in virtually every aspect of cellular function within an organism, often acting in pairs. Although high-throughput technologies have generated a considerable amount of protein-protein interaction (PPI) data for various species, experimental methods are both time-consuming and expensive. In addition, they are usually associated with high rates of both false positive and false negative results. Accordingly, a number of computational approaches have been developed to predict protein interactions effectively and accurately. However, most of these methods perform worse when other biological data sources (e.g., protein structure information, protein domains, or gene neighborhood information) are not available. Therefore, there is an urgent need for effective computational methods that predict PPIs using protein sequence information alone.

RESULTS

In this study, we present a novel computational model that combines a weighted sparse representation based classifier (WSRC) with global encoding (GE) of amino acid sequences. Two kinds of protein descriptors, composition and transition, are extracted to represent each protein sequence. On the basis of this feature representation, the weighted sparse representation based classifier is used to predict the protein interaction class. When the proposed method was evaluated on PPI data from S. cerevisiae, Human and H. pylori, it achieved high prediction accuracies of 96.82 %, 97.66 % and 92.83 %, respectively. Extensive experiments were also performed on cross-species PPI prediction, and the prediction accuracies were likewise very promising.
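As a rough illustration of the two descriptor types named above, the following Python sketch computes a composition value and a transition value from binary-encoded sequences. It is a simplified sketch under stated assumptions, not the authors' implementation: the six-class grouping of amino acids, the ten three-versus-three partitions, and all function names (`partitions`, `binary_encode`, `composition`, `transition`, `describe`, `pair_features`) are assumptions introduced here for illustration, and any splitting of sequences into sub-sequences is omitted.

```python
from itertools import combinations

# Assumed grouping of the 20 amino acids into six classes (illustrative only).
CLASSES = ["AVLIMC", "FWYH", "STNQ", "KR", "DE", "GP"]


def partitions():
    """Return the 10 ways to split the six classes into two halves of three.

    Each partition is represented by the set of residues encoded as 1.
    Requiring class 0 in that half keeps one of each complementary pair.
    """
    halves = []
    for trio in combinations(range(len(CLASSES)), 3):
        if 0 in trio:
            halves.append(set("".join(CLASSES[i] for i in trio)))
    return halves


def binary_encode(seq, ones):
    """Map each residue to 1 if it is in `ones`, otherwise 0."""
    return [1 if aa in ones else 0 for aa in seq]


def composition(bits):
    """Fraction of positions encoded as 1 (0.0 for an empty sequence)."""
    return sum(bits) / len(bits) if bits else 0.0


def transition(bits):
    """Fraction of adjacent position pairs whose symbols differ."""
    if len(bits) < 2:
        return 0.0
    changes = sum(1 for a, b in zip(bits, bits[1:]) if a != b)
    return changes / (len(bits) - 1)


def describe(seq):
    """Concatenate composition and transition over all 10 partitions."""
    features = []
    for ones in partitions():
        bits = binary_encode(seq, ones)
        features.extend([composition(bits), transition(bits)])
    return features


def pair_features(seq_a, seq_b):
    """Feature vector for a protein pair: both descriptors, concatenated."""
    return describe(seq_a) + describe(seq_b)
```

In this sketch each protein yields 20 values (two per partition), so a protein pair is represented by a 40-dimensional vector that could be passed to any classifier. The dimensionality used in the study itself is not stated in the abstract and may differ.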

CONCLUSIONS

To further evaluate the proposed method, we compared its performance with that of a method based on the support vector machine (SVM). The results show that the proposed method achieved a significant improvement. Thus, the proposed method is a very efficient means of predicting PPIs and may be a useful supplementary tool for future proteomics studies.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ff05/4845433/06255d0567ee/12859_2016_1035_Fig1_HTML.jpg
