Rashid Mamoon, Saha Sudipto, Raghava Gajendra Ps
Bioinformatics Centre, Institute of Microbial Technology, Sector-39A, Chandigarh, India.
BMC Bioinformatics. 2007 Sep 13;8:337. doi: 10.1186/1471-2105-8-337.
In the past, a number of methods have been developed for predicting the subcellular location of eukaryotic, prokaryotic (Gram-negative and Gram-positive bacteria) and human proteins, but no method has been developed for mycobacterial proteins, which may represent a repertoire of potent immunogens of this dreaded pathogen. In this study, an attempt has been made to develop a method for predicting the subcellular location of mycobacterial proteins.
The models were trained and tested on 852 mycobacterial proteins and evaluated using five-fold cross-validation. First, an SVM (Support Vector Machine) model was developed using amino acid composition; it achieved an overall accuracy of 82.51% with an average accuracy (mean of class-wise accuracies) of 68.47%. To utilize evolutionary information, an SVM model was developed using PSSM (Position-Specific Scoring Matrix) profiles obtained from PSI-BLAST (Position-Specific Iterated BLAST); it achieved an overall accuracy of 86.62% with an average accuracy of 73.71%. In addition, HMM (Hidden Markov Model) and MEME/MAST (Multiple EM for Motif Elicitation/Motif Alignment and Search Tool) models were developed, as well as hybrid models that combined two or more models. We achieved a maximum overall accuracy of 86.8% with an average accuracy of 89.00% using a combination of the PSSM-based SVM model and MEME/MAST. The performance of our method was compared with that of existing methods developed for predicting the subcellular location of Gram-positive bacterial proteins.
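The two quantities that recur in these results, the amino acid composition feature vector and the distinction between overall accuracy and average (mean class-wise) accuracy, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the alphabetical residue ordering, and the class labels in the usage example are assumptions made for illustration only.

```python
from collections import Counter

# The 20 standard amino acids in one-letter code (ordering is an assumption).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(sequence):
    """Return a 20-element list with the fraction of each standard residue.

    Non-standard characters (e.g. X, B, Z) are ignored in both the counts
    and the denominator.
    """
    counts = Counter(sequence.upper())
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[aa] / total for aa in AMINO_ACIDS]


def overall_and_average_accuracy(true_labels, predicted_labels):
    """Return (overall accuracy, mean of class-wise accuracies).

    Overall accuracy weights every protein equally, so large classes
    dominate; the average accuracy weights every class equally.
    """
    if not true_labels or len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must be non-empty and of equal length")
    totals = Counter()   # number of proteins per true class
    correct = Counter()  # correctly predicted proteins per true class
    for truth, predicted in zip(true_labels, predicted_labels):
        totals[truth] += 1
        if truth == predicted:
            correct[truth] += 1
    overall = sum(correct.values()) / len(true_labels)
    average = sum(correct[c] / totals[c] for c in totals) / len(totals)
    return overall, average


if __name__ == "__main__":
    # Hypothetical labels, not data from the study.
    truth = ["cytoplasmic"] * 8 + ["secretory"] * 2
    predicted = ["cytoplasmic"] * 8 + ["cytoplasmic", "secretory"]
    print(overall_and_average_accuracy(truth, predicted))
```

With imbalanced classes the two measures can differ substantially, which is why both are reported: a classifier that favours the largest class can reach a high overall accuracy while its mean class-wise accuracy remains low.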
A highly accurate method has been developed for predicting the subcellular location of mycobacterial proteins. The method also predicts a very important class of proteins, the membrane-attached proteins. It will be useful in annotating newly sequenced or hypothetical mycobacterial proteins. Based on this study, a freely accessible web server, TBpred (http://www.imtech.res.in/raghava/tbpred/), has been developed.