Webb-Robertson Bobbie-Jo M, Matzke Melissa M, Datta Susmita, Payne Samuel H, Kang Jiyun, Bramer Lisa M, Nicora Carrie D, Shukla Anil K, Metz Thomas O, Rodland Karin D, Smith Richard D, Tardiff Mark F, McDermott Jason E, Pounds Joel G, Waters Katrina M
From the ‡Applied Statistics and Computational Modeling, Pacific Northwest National Laboratory, Richland, WA 99354;
§Computational Biology & Bioinformatics, Pacific Northwest National Laboratory, Richland, WA 99354;
Mol Cell Proteomics. 2014 Dec;13(12):3639-46. doi: 10.1074/mcp.M113.030932.
As the capability of mass spectrometry-based proteomics has matured, tens of thousands of peptides can be measured simultaneously, which offers a systems view of protein expression. However, a major challenge is that, with this increase in throughput, estimating protein quantities from the natively measured peptides has become a computational task. A limitation of existing computationally driven protein quantification methods is that most ignore protein variation, such as alternative splicing of the RNA transcript, post-translational modifications, or other possible proteoforms, which will affect a significant fraction of the proteome. The consequence of this assumption is that statistical inference at the protein level, and consequently downstream analyses such as network and pathway modeling, have only limited power for biomarker discovery. Here, we describe a Bayesian Proteoform Quantification model (BP-Quant) that uses statistically derived peptide signatures to identify peptides that fall outside the dominant pattern, or to detect the existence of multiple overexpressed patterns, in order to improve relative protein abundance estimates. It is a research-driven approach that uses the objectives of the experiment, defined in the context of a standard statistical hypothesis, to identify a set of peptides that exhibit similar statistical behavior relating to a protein. This approach infers that changes in relative protein abundance can be used as a surrogate for changes in function, without necessarily taking into account the effect of differential post-translational modifications, processing, or splicing in altering protein function. We verify the approach using a dilution study from mouse plasma samples and demonstrate that BP-Quant achieves accuracy similar to that of current state-of-the-art methods at proteoform identification, with significantly better specificity. BP-Quant is available as MATLAB® and R packages.
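The idea of a peptide signature and a dominant pattern can be illustrated with a small sketch. This is not the BP-Quant model itself: the abstract does not specify the Bayesian computation, so the example below substitutes a plain majority count for it. The function names, the significance threshold `alpha`, and the encoding of each hypothesis test as -1, 0, or +1 are all assumptions made for illustration only.

```python
from collections import Counter


def signature(p_values, effect_signs, alpha=0.05):
    """Encode one peptide's test results as a tuple of -1, 0, or +1.

    Hypothetical encoding: a comparison contributes its effect sign
    (+1 or -1) when its p-value is below alpha, and 0 otherwise.
    """
    return tuple(s if p < alpha else 0 for p, s in zip(p_values, effect_signs))


def split_by_dominant(signatures):
    """Split a protein's peptides by whether they share the most common signature.

    A simple majority count stands in for the Bayesian step here;
    it is an assumption, not the published method.
    """
    counts = Counter(signatures.values())
    dominant, _ = counts.most_common(1)[0]
    inside = [pep for pep, sig in signatures.items() if sig == dominant]
    outside = [pep for pep, sig in signatures.items() if sig != dominant]
    return dominant, inside, outside


# Hypothetical peptides of one protein, each tested in two group comparisons.
example = {
    "pepA": signature([0.01, 0.20], [1, -1]),
    "pepB": signature([0.03, 0.50], [1, 1]),
    "pepC": signature([0.60, 0.01], [1, -1]),
}
dominant, inside, outside = split_by_dominant(example)
```

In this toy input, the peptides that share the most frequent signature would be the ones retained for the relative abundance estimate, while the remainder would be flagged as a possible separate proteoform.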