From the ‡Translational Gastrointestinal Oncology, Department of Pathology, Netherlands Cancer Institute, Amsterdam, the Netherlands.
§Oncoproteomics Laboratory, Department of Medical Oncology, VU University Medical Center, Amsterdam, the Netherlands.
Mol Cell Proteomics. 2017 Oct;16(10):1850-1863. doi: 10.1074/mcp.TIR117.000056. Epub 2017 Jul 26.
Proteogenomics, the comprehensive integration of genomics and proteomics data, is a powerful approach for identifying novel protein biomarkers. This is especially true for proteins that differ structurally between disease and control conditions. Because tumor development is associated with aberrant splicing, we focus on this rich source of cancer-specific biomarkers. To this end, we developed Splicify, a proteogenomic pipeline that detects differentially expressed protein isoforms. Splicify integrates massively parallel RNA sequencing data with tandem mass spectrometry proteomics data to identify protein isoforms resulting from differential splicing between two conditions. Proof of concept was obtained by applying Splicify to RNA sequencing and mass spectrometry data from the colorectal cancer cell line SW480, before and after siRNA-mediated downmodulation of the splicing factors SF3B1 and SRSF1. These analyses revealed 2172 and 149 differentially expressed isoforms with peptide confirmation upon knock-down of SF3B1 and SRSF1, respectively, compared with their controls. Splice variants identified included RAC1, OSBPL3, MKI67, and SYK. One additional sample, obtained after SF3B1 downmodulation, was analyzed by PacBio Iso-Seq full-length transcript sequencing. This analysis verified the alternative splicing identified by Splicify and additionally revealed novel splicing events that are not represented in the human reference genome annotation. Splicify thus offers a validated proteogenomic data analysis pipeline for identifying disease-specific protein biomarkers that result from alternative mRNA splicing. Splicify is publicly available on GitHub (https://github.com/NKI-TGO/SPLICIFY) and is suitable for addressing both basic research questions using pre-clinical model systems and translational research questions using patient-derived samples, enabling the identification of clinically relevant biomarkers.
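The idea of "peptide confirmation" for a splicing event can be illustrated with a minimal sketch. This is not Splicify's implementation; it only shows the general principle under stated assumptions: two hypothetical isoforms of one gene (exon inclusion vs. exon skipping) are digested in silico with a simplified trypsin rule (cleave after K or R unless followed by P, no missed cleavages), peptides unique to each isoform are collected, and these are intersected with a hypothetical set of peptides already identified by a mass spectrometry search. The function names `tryptic_digest` and `peptide_evidence`, the toy sequences, and the observed peptide set are all illustrative assumptions; leucine/isoleucine ambiguity and post-translational modifications are ignored.

```python
def tryptic_digest(protein: str, min_len: int = 6) -> set[str]:
    """Simplified in-silico trypsin digest (hypothetical helper).

    Cleaves after K or R unless the next residue is P; no missed cleavages.
    Peptides shorter than min_len are discarded, as they are rarely informative.
    """
    peptides = []
    start = 0
    for i, aa in enumerate(protein):
        at_end = i + 1 == len(protein)
        if aa in "KR" and (at_end or protein[i + 1] != "P"):
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):  # C-terminal peptide without K/R
        peptides.append(protein[start:])
    return {p for p in peptides if len(p) >= min_len}


def peptide_evidence(isoforms: dict[str, str], observed: set[str]) -> dict[str, set[str]]:
    """For each isoform, return its isoform-specific peptides that were observed by MS."""
    digests = {name: tryptic_digest(seq) for name, seq in isoforms.items()}
    evidence = {}
    for name, peps in digests.items():
        # Peptides shared with any other isoform cannot discriminate between them.
        others = set().union(*(p for n, p in digests.items() if n != name))
        evidence[name] = (peps - others) & observed
    return evidence


if __name__ == "__main__":
    # Toy isoforms: exon "VVDEEGHRNDA" is included in one and skipped in the other.
    isoforms = {
        "inclusion": "MSEQKLLGATVVDEEGHRNDAWQLLPAPSEK",
        "skipping": "MSEQKLLGATWQLLPAPSEK",
    }
    # Hypothetical peptides identified in the knock-down sample.
    observed = {"LLGATWQLLPAPSEK", "ELVISLIVESK"}
    print(peptide_evidence(isoforms, observed))
    # prints {'inclusion': set(), 'skipping': {'LLGATWQLLPAPSEK'}}
```

In this toy case the exon-exon junction peptide `LLGATWQLLPAPSEK` exists only in the skipping isoform, so observing it supports that isoform at the protein level; a real pipeline would additionally handle missed cleavages, search-engine scoring, and false discovery rate control.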