Podell Sheila, Gribskov Michael
San Diego Supercomputer Center, University of California San Diego, La Jolla CA 92093-0537, USA.
BMC Genomics. 2004 Jun 17;5(1):37. doi: 10.1186/1471-2164-5-37.
N-terminal myristoylation plays a vital role in membrane targeting and signal transduction in plant responses to environmental stress. Although N-myristoyltransferase enzymatic function is conserved across plant, animal, and fungal kingdoms, exact substrate specificities vary, making it difficult to predict protein myristoylation accurately within specific taxonomic groups.
A new method for predicting N-terminal myristoylation sites specifically in plants has been developed and statistically tested for sensitivity, specificity, and robustness. Compared to previously available methods, the new model is both more sensitive in detecting known positives and more selective in avoiding false positives. Scores of myristoylated and non-myristoylated proteins are more widely separated than with other methods, which greatly reduces ambiguity and the number of sequences that give intermediate, uninformative results. The prediction model is available at http://plantsp.sdsc.edu/myrist.html.
The superior performance of the new model is due to three factors: a plant-specific training set covering 266 unique sequence examples from 40 different species, a probability-based hidden Markov model used to obtain predictive scores, and a threshold cutoff value chosen to provide maximum positive-negative discrimination. The new model has been used to predict 589 plant proteins likely to contain N-terminal myristoylation signals, and to analyze the functional families in which these proteins occur.
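The abstract does not state which criterion was used to choose the cutoff, so the following is only a minimal sketch of one common way to pick a threshold that maximizes positive-negative discrimination. It assumes Youden's J statistic (sensitivity + specificity - 1) as the criterion. The function names and the toy scores are hypothetical illustrations and are not taken from the published model or its data.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(
    pos: Sequence[float], neg: Sequence[float], threshold: float
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) when scores >= threshold are called positive."""
    true_pos = sum(s >= threshold for s in pos)  # known positives detected
    true_neg = sum(s < threshold for s in neg)   # known negatives rejected
    return true_pos / len(pos), true_neg / len(neg)


def best_threshold(pos: Sequence[float], neg: Sequence[float]) -> Tuple[float, float]:
    """Pick the observed score that maximizes Youden's J (an assumed criterion)."""
    candidates = sorted(set(pos) | set(neg))

    def youden_j(t: float) -> float:
        sens, spec = sensitivity_specificity(pos, neg, t)
        return sens + spec - 1

    best = max(candidates, key=youden_j)
    return best, youden_j(best)


# Hypothetical toy scores, for illustration only (not real model output).
toy_pos = [5.5, 6.0, 7.5, 8.0]    # scores of known myristoylated sequences
toy_neg = [-3.0, -1.5, 0.5, 6.5]  # scores of known non-myristoylated sequences

cutoff, j = best_threshold(toy_pos, toy_neg)
print(f"cutoff={cutoff}, J={j:.2f}")
```

With these toy values the sketch selects the lowest positive score as the cutoff, because lowering it further gains no sensitivity and raising it loses positives faster than it removes the single high-scoring negative. With real scores, the wider the separation between the two score distributions, the less the chosen cutoff depends on the exact criterion.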