Key Laboratory of Functional Protein Research of Guangdong Higher Education Institutes, Institute of Life and Health Engineering, College of Life Science and Technology, Jinan University, Guangzhou 510632, China.
J Proteome Res. 2017 Dec 1;16(12):4446-4454. doi: 10.1021/acs.jproteome.7b00463. Epub 2017 Oct 11.
Multiple search engines based on various models have been developed to search MS/MS spectra against a reference database, and they give different results for the same data set. How to integrate these results efficiently with minimal compromise on false discoveries is an open question, because an independent, reliable, and highly sensitive standard has been lacking. We used translating mRNA sequencing (RNC-seq) results as a standard to evaluate strategies for integrating protein identifications from various search engines. We used seven mainstream search engines (Andromeda, Mascot, OMSSA, X!Tandem, pFind, InsPecT, and ProVerB) to search the same label-free MS data sets of the human cell lines Hep3B, MHCCLM3, and MHCC97H from the Chinese C-HPP Consortium for Chromosomes 1, 8, and 20. As expected, taking the union of the seven engines increased false identifications, whereas taking their intersection markedly decreased identification power. We found that accepting proteins identified by at least two of the seven engines maximized protein identification power while minimizing the ratio of suspicious to translation-supported identifications (STR), as monitored by our RNC-seq-based STR index. Furthermore, this strategy also significantly improved peptide coverage of the protein amino acid sequences. In summary, we demonstrated a simple strategy that significantly improves the performance of shotgun mass spectrometry by integrating multiple search engines at the protein level, maximizing the use of existing MS spectra without additional experimental work.
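The "at least two out of seven engines" rule and the STR ratio can be illustrated with a small sketch. This is not the authors' implementation: the function names, the toy protein identifiers, and the reading of "suspicious" as identified proteins lacking RNC-seq support are assumptions made here for illustration only.

```python
from collections import Counter


def integrate_by_votes(engine_results, min_engines=2):
    """Return proteins reported by at least `min_engines` search engines.

    engine_results maps an engine name to the set of protein IDs it reported.
    min_engines=1 gives the union; min_engines=len(engine_results) gives
    the intersection.
    """
    votes = Counter()
    for proteins in engine_results.values():
        votes.update(set(proteins))  # each engine votes once per protein
    return {protein for protein, n in votes.items() if n >= min_engines}


def str_index(identified, translated):
    """Ratio of suspicious to translation-supported identifications.

    Assumption: a protein is 'translation-supported' if it appears in the
    RNC-seq-derived set `translated`, and 'suspicious' otherwise.
    """
    supported = identified & translated
    suspicious = identified - translated
    if not supported:
        return float("inf") if suspicious else 0.0
    return len(suspicious) / len(supported)


# Hypothetical toy data: three engines and made-up protein IDs.
engine_results = {
    "Andromeda": {"P1", "P2", "P3"},
    "Mascot": {"P1", "P2", "P4"},
    "OMSSA": {"P1", "P5"},
}
translated = {"P1", "P2", "P3"}  # proteins with RNC-seq evidence (made up)

for k in range(1, len(engine_results) + 1):
    kept = integrate_by_votes(engine_results, min_engines=k)
    print(k, len(kept), round(str_index(kept, translated), 3))
```

In this toy case, raising the threshold from one engine (union) to two removes the proteins that only a single engine reported, while requiring all engines (intersection) leaves far fewer identifications, which mirrors the trade-off described in the abstract.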