Li Zhao, Tang Jijun, Guo Fei
School of Computer Science and Technology, Tianjin University, 92 Weijin Road, Nankai District, Tianjin, P.R. China.
School of Computational Science and Engineering, University of South Carolina, Columbia, United States of America.
PLoS One. 2016 Feb 1;11(2):e0147467. doi: 10.1371/journal.pone.0147467. eCollection 2016.
The 14-3-3 proteins are a highly conserved family of homodimeric and heterodimeric molecules expressed in all eukaryotic cells. In human cells, this family consists of seven distinct but highly homologous 14-3-3 isoforms. 14-3-3σ, which is regulated by major tumor suppressor genes, is the only isoform directly linked to cancer in epithelial cells. For each 14-3-3 isoform, we have 1,000 peptide motifs with experimentally measured binding affinity values. In this paper, we present a novel method for identifying peptide motifs that bind to the 14-3-3σ isoform. First, we propose a sampling criterion to build a predictor for each new peptide sequence. Then, we select nine physicochemical properties of amino acids to describe each peptide motif. We also use auto-cross covariance to extract correlations between the properties of amino acids at any two positions. Finally, we use elastic net, which combines the penalties of ridge regression and the least absolute shrinkage and selection operator (LASSO), to predict the affinity values of peptide motifs. We test our method on the 1,000 known peptide motifs for each of the seven 14-3-3 isoforms. On the 14-3-3σ isoform, our method achieves an overall Pearson product-moment correlation coefficient (PCC) of 0.84 and a root mean squared error (RMSE) of 252.31 for the N-terminal sublibrary, and 0.77 and 269.13, respectively, for the C-terminal sublibrary. We predict the affinity values of 16,000 peptide sequences, and the predicted relative binding abilities across six permuted positions are similar to the experimental values. We identify phosphopeptides that preferentially bind to 14-3-3σ over the other isoforms. At several positions of the peptide motifs, the amino acids favored by our predictions fall into the same categories as those in the experimentally determined substrate specificity of phosphopeptides binding to 14-3-3σ. Our method is fast and reliable, and it is a general computational method that can be used for peptide-protein binding identification in proteomics research.
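To make the feature-extraction and regression steps concrete, here is a minimal pure-Python sketch that encodes a peptide with auto-cross covariance (ACC), fits an elastic net by cyclic coordinate descent, and scores predictions with PCC and RMSE. It is not the authors' implementation. The abstract does not list the nine property scales, their normalization, the maximum lag, or the regularization settings, so the property table, `max_lag`, `alpha` and `l1_ratio` are caller-supplied assumptions, and all function names are hypothetical. The ACC terms follow the common definition ACC(j1, j2, lag) = 1/(L − lag) · Σ_{i=1}^{L−lag} (P_{i,j1} − P̄_{j1})(P_{i+lag,j2} − P̄_{j2}), where j1 = j2 gives an auto-covariance term and j1 ≠ j2 a cross-covariance term. The elastic net minimizes (1/2n)‖y − Xw − b‖² + α(ρ‖w‖₁ + ½(1 − ρ)‖w‖²), the scikit-learn/glmnet-style parametrization (ρ = `l1_ratio`), assumed here rather than taken from the paper.

```python
import math
from typing import Dict, List, Sequence, Tuple


def acc_features(peptide: str, props: Dict[str, Sequence[float]], max_lag: int) -> List[float]:
    """Auto-cross covariance (ACC) encoding of one peptide.

    props (hypothetical, caller-supplied, assumed already normalized) maps each
    residue letter to a property vector of common length m.
    Returns m * m * max_lag values for lags 1..max_lag.
    """
    L = len(peptide)
    if not 1 <= max_lag < L:
        raise ValueError("max_lag must satisfy 1 <= max_lag < len(peptide)")
    # P[i][j] is property j of the residue at position i
    P = [list(props[aa]) for aa in peptide]
    m = len(P[0])
    mean = [sum(P[i][j] for i in range(L)) / L for j in range(m)]
    feats = []
    for lag in range(1, max_lag + 1):
        for j1 in range(m):
            for j2 in range(m):
                # Covariance of property j1 at position i with property j2 at i + lag
                s = sum((P[i][j1] - mean[j1]) * (P[i + lag][j2] - mean[j2])
                        for i in range(L - lag))
                feats.append(s / (L - lag))
    return feats


def soft_threshold(z: float, t: float) -> float:
    """Proximal operator of the L1 penalty: shrink z toward zero by t."""
    return math.copysign(max(abs(z) - t, 0.0), z)


def elastic_net(X: List[List[float]], y: List[float], alpha: float = 0.1,
                l1_ratio: float = 0.5, n_iter: int = 1000,
                tol: float = 1e-8) -> Tuple[List[float], float]:
    """Fit an elastic net by cyclic coordinate descent; returns (weights, intercept).

    Minimizes (1/2n)||y - Xw - b||^2
              + alpha * (l1_ratio * ||w||_1 + 0.5 * (1 - l1_ratio) * ||w||^2).
    The default alpha and l1_ratio are placeholders, not values from the paper.
    """
    n, p = len(X), len(X[0])
    # Centre X and y so the unpenalized intercept drops out of the updates
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    r = [yi - y_mean for yi in y]  # residual y_c - Xc w for w = 0
    w = [0.0] * p
    sq = [sum(row[j] ** 2 for row in Xc) / n for j in range(p)]
    for _ in range(n_iter):
        max_step = 0.0
        for j in range(p):
            if sq[j] == 0.0:
                continue  # constant feature: leave its weight at zero
            # Correlation of feature j with the partial residual that excludes feature j
            rho = sum(Xc[i][j] * (r[i] + Xc[i][j] * w[j]) for i in range(n)) / n
            new_wj = soft_threshold(rho, alpha * l1_ratio) / (sq[j] + alpha * (1.0 - l1_ratio))
            delta = new_wj - w[j]
            if delta != 0.0:
                for i in range(n):
                    r[i] -= Xc[i][j] * delta  # keep r = y_c - Xc w up to date
                w[j] = new_wj
                max_step = max(max_step, abs(delta))
        if max_step < tol:
            break
    b = y_mean - sum(x_mean[j] * w[j] for j in range(p))
    return w, b


def predict(X: List[List[float]], w: List[float], b: float) -> List[float]:
    """Linear prediction b + w . x for each feature row."""
    return [b + sum(wj * xj for wj, xj in zip(w, row)) for row in X]


def pcc(a: Sequence[float], b: Sequence[float]) -> float:
    """Pearson product-moment correlation coefficient."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def rmse(a: Sequence[float], b: Sequence[float]) -> float:
    """Root mean squared error."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))
```

Under these assumptions, a setup resembling the one described would encode every peptide of a sublibrary with `acc_features`, fit `elastic_net` on a training split of those feature rows and their measured affinities, and report `pcc` and `rmse` on held-out peptides. With nine properties the ACC vector has 81 × `max_lag` entries; the loop-based code keeps the sketch dependency-free at the cost of speed, and a library solver would be the practical choice for the full data set.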