Michel Mirco, Hayat Sikander, Skwark Marcin J, Sander Chris, Marks Debora S, Elofsson Arne
Department of Biochemistry and Biophysics, Stockholm University, 10691 Stockholm, Sweden, Science for Life Laboratory, Stockholm University, Box 1031, 17121 Solna, Sweden, Department of Systems Biology, Harvard Medical School, Boston, MA, USA, Department of Information and Computer Science, Aalto University, PO Box 15400, FI-00076 Aalto, Finland and Computational Biology, Memorial Sloan-Kettering Cancer Center, New York, NY, USA.
Bioinformatics. 2014 Sep 1;30(17):i482-8. doi: 10.1093/bioinformatics/btu458.
It has recently been shown that the quality of protein contact prediction from evolutionary information improves significantly when direct and indirect information are separated. For sufficiently large protein families, the predicted contacts contain enough information to predict the structures of many of these families. However, contact prediction methods have improved since the first such studies. Here, we ask how much the final models improve when better contact predictions are used.
In a small benchmark of 15 proteins, we show that the TM-scores of top-ranked models improve by 33% on average when using PconsFold compared with the original version of EVfold. In a larger benchmark, we find that model quality improves by 15-30% when using PconsC in comparison with earlier contact prediction methods. Further, using Rosetta instead of CNS does not significantly improve global model accuracy, but the chemistry of models generated with Rosetta is improved.
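As a rough illustration of the metric behind these numbers, the sketch below evaluates the standard TM-score formula for one fixed superposition and averages per-protein relative improvements. It is not the authors' evaluation code. It assumes the per-residue distances are already computed (a real TM-score is the maximum over superpositions), that short chains use a floor of 0.5 on d0, and that "on average" means the mean of per-protein relative changes; the abstract does not specify any of these, and all function names are hypothetical.

```python
def tm_d0(target_length: int) -> float:
    """Length-dependent distance scale d0 of the TM-score."""
    # Assumption: use a fixed floor of 0.5 for short chains, where the
    # cube-root formula would give very small or undefined values.
    if target_length <= 21:
        return 0.5
    return 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8


def tm_score(distances, target_length: int) -> float:
    """TM-score for ONE given superposition.

    distances: distances (in Angstrom) between aligned residue pairs of
    model and native structure. Unaligned residues contribute nothing.
    """
    d0 = tm_d0(target_length)
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length


def mean_relative_improvement(baseline, improved) -> float:
    """Mean of per-protein relative TM-score changes (assumed averaging scheme)."""
    ratios = [(new - old) / old for old, new in zip(baseline, improved)]
    return sum(ratios) / len(ratios)


if __name__ == "__main__":
    # Made-up numbers, for illustration only; not results from the paper.
    print(tm_score([1.0, 2.0, 4.0, 8.0], target_length=100))
    print(mean_relative_improvement([0.40, 0.50], [0.60, 0.50]))
```

Averaging per-protein ratios weights every protein equally, whereas comparing the two mean TM-scores would weight well-predicted proteins more heavily; the two schemes can give noticeably different percentages on a 15-protein benchmark.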
PconsFold is a fully automated pipeline for ab initio protein structure prediction based on evolutionary information. PconsFold is based on PconsC contact prediction and uses the Rosetta folding protocol. Due to its modularity, the contact prediction tool can be easily exchanged. The source code of PconsFold is available on GitHub at https://www.github.com/ElofssonLab/pcons-fold under the MIT license. PconsC is available from http://c.pcons.net/.
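The modularity described above can be pictured as a pipeline that takes the contact predictor and the folding step as interchangeable components. The following is a minimal sketch under that assumption only; it does not reproduce the actual PconsFold interface, and every name in it (`run_pipeline`, `toy_predictor`, `toy_folder`, the `Contact` tuple layout) is hypothetical.

```python
from typing import Callable, List, Tuple

# Hypothetical data layout: (residue i, residue j, predicted contact score).
Contact = Tuple[int, int, float]
ContactPredictor = Callable[[str], List[Contact]]
Folder = Callable[[str, List[Contact]], str]


def run_pipeline(sequence: str,
                 predict_contacts: ContactPredictor,
                 fold: Folder) -> str:
    """Predict contacts, rank them by score, and hand them to the folding step."""
    contacts = sorted(predict_contacts(sequence), key=lambda c: c[2], reverse=True)
    return fold(sequence, contacts)


def toy_predictor(sequence: str) -> List[Contact]:
    # Placeholder predictor: pairs every residue with the one 5 positions ahead.
    return [(i, i + 5, 1.0) for i in range(len(sequence) - 5)]


def toy_folder(sequence: str, contacts: List[Contact]) -> str:
    # Placeholder folding step: returns a description instead of coordinates.
    return f"model of {len(sequence)} residues from {len(contacts)} contacts"


if __name__ == "__main__":
    print(run_pipeline("A" * 10, toy_predictor, toy_folder))
```

Because the pipeline depends only on the callable signatures, swapping in a different contact predictor would mean passing a different function, with no change to the folding step.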
Supplementary data are available at Bioinformatics online.