Department of Bioinformatics and Computational Biology, The University of Texas MD Anderson Cancer Center, Houston, TX.
Department of Cell, Development and Cancer Biology, Knight Cancer Institute, Oregon Health & Science University, Portland, OR.
JCO Clin Cancer Inform. 2020 Apr;4:357-366. doi: 10.1200/CCI.19.00144.
Predicting cancer dependencies from molecular data can help stratify patients and identify novel therapeutic targets. Recently available large-scale cancer cell line dependency data allow a systematic assessment of the predictive power of diverse molecular features; however, protein expression data have not been rigorously evaluated. Using protein expression data generated by reverse-phase protein arrays (RPPAs), we aimed to assess their power to predict cancer dependencies and to develop a related analytic tool for community use.
Using a machine learning framework, we analyzed feature importance based on cancer dependency and multiomic data from the DepMap and Cancer Cell Line Encyclopedia (CCLE) projects. We assessed the consistency of cancer dependency data between the CRISPR/Cas9 and short hairpin RNA (shRNA) perturbation platforms. For a fair comparison, we focused on a set of genes with robust dependency data and four available expression-related features (copy number alteration, DNA methylation, messenger RNA expression, and protein expression). We then performed same-gene predictions, in which a gene's dependency is predicted from that gene's own molecular features, and compared the predictive power of the different feature types.
For the genes surveyed, protein expression data had substantial predictive power for cancer dependencies and were the best predictive feature for the CRISPR/Cas9-based dependency data. We also developed a user-friendly protein-dependency analytic module and integrated it with The Cancer Proteome Atlas (TCPA), allowing researchers to explore and analyze our results intuitively.
This study provides a systematic assessment of how well different expression-related features of a gene predict the cancer dependencies of cell lines. Our results suggest that protein expression data are a highly valuable information resource for understanding tumor vulnerabilities and identifying therapeutic opportunities.