PSSM-based prediction of DNA binding sites in proteins.

Authors

Ahmad Shandar, Sarai Akinori

Affiliation

Department of Bioinformatics and Bioscience, Kyushu Institute of Technology, Iizuka 820-8502, Fukuoka, Japan.

Publication

BMC Bioinformatics. 2005 Feb 19;6:33. doi: 10.1186/1471-2105-6-33.

Abstract

BACKGROUND

Detection of DNA-binding sites in proteins is of enormous interest for technologies targeting gene regulation and manipulation. We have previously shown that information about a residue and its sequence neighbors can be used to predict DNA-binding candidates in a protein sequence. This sequence-based prediction method is applicable even when no sequence homology with a previously known DNA-binding protein is observed. Here we implement a neural-network-based algorithm that uses the evolutionary information of amino acid sequences, in the form of their position-specific scoring matrices (PSSMs), to improve the prediction of DNA-binding sites.
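As an illustration of how a residue and its sequence neighbors might be presented to a neural network, the sketch below flattens a sliding window of PSSM rows into one feature vector per residue. The function name `window_features`, the window half-width, the zero padding at the sequence ends, and the toy matrix are all assumptions made for this example; the abstract does not specify the encoding actually used.

```python
import numpy as np


def window_features(pssm, half_width=2):
    """Build one flattened window of PSSM rows for every residue.

    pssm is an (L, 20) array: one row of 20 substitution scores per residue.
    Positions that fall outside the sequence are filled with zeros
    (an assumption of this sketch, not a detail taken from the abstract).
    """
    pssm = np.asarray(pssm, dtype=float)
    length, n_cols = pssm.shape
    pad = np.zeros((half_width, n_cols))
    padded = np.vstack([pad, pssm, pad])
    size = 2 * half_width + 1
    # Row i of the result covers residues i - half_width .. i + half_width.
    return np.stack([padded[i:i + size].ravel() for i in range(length)])


# Toy 5-residue "PSSM" with made-up scores, used only to show the shapes.
toy_pssm = np.arange(5 * 20).reshape(5, 20)
features = window_features(toy_pssm, half_width=1)
print(features.shape)
```

Each row of `features` could then serve as the input vector for a per-residue binding/non-binding classifier.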

RESULTS

The average of sensitivity and specificity obtained with PSSMs is up to 8.7% better than that of prediction from sequence information alone. Much smaller data sets can be used to generate PSSMs with minimal loss of prediction accuracy.
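The performance measure named here, the average of sensitivity and specificity, can be computed from per-residue labels as in the sketch below. It assumes the standard definitions (sensitivity = TP / (TP + FN), specificity = TN / (TN + FP)); the abstract does not spell out its definitions, and the labels in the example are made up.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels (1 = binding)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


def mean_sens_spec(y_true, y_pred):
    """Average of sensitivity and specificity."""
    sens, spec = sensitivity_specificity(y_true, y_pred)
    return (sens + spec) / 2


# Made-up labels for six residues, for illustration only.
y_true = [1, 1, 0, 0, 0, 1]
y_pred = [1, 0, 0, 1, 0, 1]
score = mean_sens_spec(y_true, y_pred)
print(round(score, 3))
```

Averaging the two rates keeps the score informative when binding residues are a small minority of the sequence, which is where plain accuracy tends to mislead.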

CONCLUSION

One problem with PSSM-derived prediction is that it requires lengthy and time-consuming alignments against large sequence databases. To speed up the generation of PSSMs, we tried different reference data sets (sequence space) against which a target protein is scanned in PSI-BLAST iterations. We find that a very small set of proteins can be used as the reference data set without losing much of the prediction value. This makes the generation of PSSMs very rapid and even amenable to use at the genome level. A web server has been developed to provide these predictions of DNA-binding sites for any new protein from its amino acid sequence.
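To make the idea of scanning a target against a small reference set concrete, the sketch below assembles a PSI-BLAST command line that writes an ASCII PSSM. It assumes the NCBI BLAST+ `psiblast` program and its `-query`, `-db`, `-num_iterations`, and `-out_ascii_pssm` options; the abstract does not say which PSI-BLAST release or parameters the authors used, and the file names, database name, and iteration count here are hypothetical.

```python
def build_psiblast_command(query_fasta, reference_db, pssm_out, iterations=3):
    """Return the argument list for one PSI-BLAST run that saves a PSSM.

    reference_db names a pre-formatted protein database; a small reference
    set simply means pointing this at a small database (assumed setup).
    """
    return [
        "psiblast",
        "-query", query_fasta,
        "-db", reference_db,
        "-num_iterations", str(iterations),
        "-out_ascii_pssm", pssm_out,
    ]


# Hypothetical file and database names, for illustration only.
command = build_psiblast_command("target.fasta", "small_reference_db", "target.pssm")
print(" ".join(command))
```

The list can be passed to `subprocess.run(command, check=True)` on a machine where BLAST+ is installed and the database has been built.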

AVAILABILITY

Online predictions based on this method are available at http://www.netasa.org/dbs-pssm/

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d4fc/550660/94a6dec037b9/1471-2105-6-33-1.jpg
