Di Francesco V, McQueen P, Garnier J, Munson P J
National Institutes of Health, Bethesda, MD 20892-5626, USA.
Proc Int Conf Intell Syst Mol Biol. 1997;5:100-3.
Here we propose an approach to include global structural information in the secondary structure prediction procedure, based on hidden Markov models (HMMs) of protein folds. We first identify the fold, or 'topology', of a protein by means of the HMMs of topology families of proteins. The most likely structural model for that protein is then used to modify the sequence of secondary structure states previously obtained with a prediction algorithm. Our goal is to investigate how including global structural information in the secondary structure prediction scheme, by means of the HMMs, affects prediction accuracy. We find that when the HMM of the predicted topology of a protein is used to adjust the secondary structure sequence originally predicted with the Quadratic-Logistic method, the cross-validated prediction accuracy (Q3) improves by 3%. The topology is correctly predicted in 68% of the cases. We conclude that this HMM-based approach is a promising tool for effectively incorporating global structural information in the secondary structure prediction scheme.
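The two-step procedure described above (pick the most likely topology, then let its model adjust the first-pass prediction) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes each topology is reduced to a left-to-right string of segments (for example "CHC"), that the first-pass predictor supplies per-residue probabilities for helix (H), strand (E) and coil (C), and that transition probabilities are uniform and can be ignored. The names `viterbi`, `refine` and `q3`, and the toy numbers, are hypothetical.

```python
import math

NEG_INF = float("-inf")


def viterbi(probs, segments):
    """Best alignment of a left-to-right segment model to one protein.

    probs: one dict per residue, {"H": p, "E": p, "C": p}, standing in for
           the output of a first-pass predictor (assumption).
    segments: ordered segment states of one topology, e.g. "CHC"; every
              segment must cover at least one residue.
    Returns (log_score, state_string).
    """
    n, m = len(probs), len(segments)
    if n < m:
        return NEG_INF, ""

    def emit(i, j):
        # Floor the probability so that log() is always defined.
        return math.log(max(probs[i][segments[j]], 1e-12))

    score = [[NEG_INF] * m for _ in range(n)]
    back = [[0] * m for _ in range(n)]
    score[0][0] = emit(0, 0)
    for i in range(1, n):
        for j in range(m):
            stay = score[i - 1][j]
            move = score[i - 1][j - 1] if j > 0 else NEG_INF
            best, prev = (stay, j) if stay >= move else (move, j - 1)
            if best > NEG_INF:
                score[i][j] = best + emit(i, j)
                back[i][j] = prev

    # The path must end in the last segment of the topology.
    if score[n - 1][m - 1] == NEG_INF:
        return NEG_INF, ""
    path, j = [], m - 1
    for i in range(n - 1, -1, -1):
        path.append(segments[j])
        j = back[i][j]
    return score[n - 1][m - 1], "".join(reversed(path))


def refine(probs, topologies):
    """Choose the best-scoring topology and return (name, adjusted states)."""
    scored = {name: viterbi(probs, seg) for name, seg in topologies.items()}
    name = max(scored, key=lambda k: scored[k][0])
    return name, scored[name][1]


def q3(predicted, observed):
    """Fraction of residues whose three-state label matches."""
    hits = sum(p == o for p, o in zip(predicted, observed))
    return hits / len(observed)


# Toy data: the first-pass argmax prediction is "CHHEHC", with an isolated
# strand residue inside a helix.
probs = [
    {"H": 0.10, "E": 0.10, "C": 0.80},
    {"H": 0.60, "E": 0.20, "C": 0.20},
    {"H": 0.70, "E": 0.10, "C": 0.20},
    {"H": 0.35, "E": 0.40, "C": 0.25},
    {"H": 0.60, "E": 0.20, "C": 0.20},
    {"H": 0.10, "E": 0.10, "C": 0.80},
]
topologies = {"alpha": "CHC", "beta": "CEC"}
first_pass = "".join(max(p, key=p.get) for p in probs)
topology, adjusted = refine(probs, topologies)
```

In this toy case the "alpha" model scores highest and the isolated strand residue is replaced by helix, so the adjusted string agrees with an all-helix core where the first-pass string does not. A real topology HMM would also carry learned transition and segment-length distributions, which this sketch leaves out.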