Department of Structural Biology, Stanford University, Stanford, CA 94305, USA.
Proteins. 2010 Sep;78(12):2668-78. doi: 10.1002/prot.22781.
Protein structure refinement is an important but unsolved problem; it must be solved if we are to predict biological function that is very sensitive to structural details. Specifically, the Critical Assessment of Techniques for Protein Structure Prediction (CASP) shows that predictions in the comparative modeling category are often less accurate than the template on which the homology model is based. Here we describe a refinement protocol that consistently refines submitted predictions in all categories at CASP7. The protocol uses direct energy minimization of a knowledge-based potential of mean force derived from the interaction statistics of 167 atom types (Summa and Levitt, Proc Natl Acad Sci USA 2007; 104:3177-3182). Our protocol is thus computationally very efficient: a typical protein model (300 residues) takes only a few minutes of CPU time. For predictions with low and medium homology to known PDB structures (Global Distance Test score, or GDT_TS, between 50 and 80%), we observe an average structural improvement of 1% in GDT_TS. We also observe a marked improvement in the stereochemistry of the models. The level of improvement varies among CASP participants, but we see large improvements (>10% increase in GDT_TS) even for models predicted by the best-performing groups at CASP7. In addition, our protocol consistently improved the best predicted models in the refinement category at CASP7 and CASP8. These improvements in structure and stereochemistry demonstrate the usefulness of our computationally inexpensive, powerful and automatic refinement protocol.
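The GDT_TS values quoted above can be made concrete with a simplified sketch. GDT_TS is the average, over distance cutoffs of 1, 2, 4 and 8 Å, of the percentage of Cα atoms in the model that lie within that cutoff of the corresponding atom in the reference structure. The function name `gdt_ts` and the toy coordinates are hypothetical, and the sketch assumes the two structures are already superposed with residues matched one to one; the full GDT procedure also searches over superpositions to maximize each count, which is omitted here.

```python
import math


def gdt_ts(model, reference, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT_TS (in percent) for two pre-superposed C-alpha traces.

    model, reference: equal-length sequences of (x, y, z) coordinates in
    angstroms, with residue i of the model matched to residue i of the
    reference. No superposition search is performed (an assumption of
    this sketch, not part of the published protocol).
    """
    if len(model) != len(reference) or not model:
        raise ValueError("model and reference must be non-empty and equal in length")

    # Per-residue deviation between matched C-alpha atoms.
    distances = [math.dist(m, r) for m, r in zip(model, reference)]

    # Fraction of residues within each cutoff, then averaged over cutoffs.
    fractions = [
        sum(d <= cutoff for d in distances) / len(distances)
        for cutoff in cutoffs
    ]
    return 100.0 * sum(fractions) / len(cutoffs)


# Toy example: four residues displaced along x by 0.5, 1.5, 3.0 and 9.0 angstroms.
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
model = [(0.5, 0.0, 0.0), (5.3, 0.0, 0.0), (10.6, 0.0, 0.0), (20.4, 0.0, 0.0)]
print(gdt_ts(model, reference))  # 56.25
```

In the toy example, one, two, three and three of the four residues fall within the 1, 2, 4 and 8 Å cutoffs respectively, giving (25 + 50 + 75 + 75) / 4 = 56.25. On this scale, the 1% average improvement reported above corresponds to a one-point change in this score.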