Azuaje Francisco, Kim Sang-Yoon, Perez Hernandez Daniel, Dittmar Gunnar
Quantitative Biology Unit, Luxembourg Institute of Health (LIH), Strassen L-1445, Luxembourg.
J Clin Med. 2019 Sep 25;8(10):1535. doi: 10.3390/jcm8101535.
Proteomics data encode molecular features of diagnostic value and accurately reflect key underlying biological mechanisms in cancers. Histopathology imaging is a well-established clinical approach to cancer diagnosis. The predictive relationship between large-scale proteomics and H&E-stained histopathology images remains largely uncharacterized. Here, we investigate such associations by applying machine learning, including deep neural networks, to proteomics and histology imaging datasets generated by the Clinical Proteomic Tumor Analysis Consortium (CPTAC) from clear cell renal cell carcinoma patients. We report robust correlations between a set of diagnostic proteins and the predictions generated by an imaging-based classification model. Proteins significantly correlated with the histology-based predictions are significantly implicated in immune responses, extracellular matrix reorganization, and metabolism. Moreover, we show that, owing to strong gene-protein expression correlations, the genes encoding these proteins also reliably recapitulate the biological associations with imaging-derived predictions. Our findings offer novel insights into the integrative modeling of histology and omics data through machine learning, and provide a methodological basis for new research opportunities in this and other cancer types.
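The core analysis described in the abstract, relating per-sample protein abundance to the output of an imaging-based classifier, can be illustrated with a minimal sketch. The abstract does not state which correlation statistic was used, so Spearman rank correlation is an assumption made here for illustration. The protein names, abundance values, and prediction scores are hypothetical toy data, not results from the study, and significance testing is omitted.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical toy data: one classifier score per sample (for example, a
# predicted tumor probability) and abundances of two made-up proteins
# measured in the same five samples.
tumor_probability = [0.05, 0.10, 0.80, 0.90, 0.95]
protein_abundance = {
    "PROTEIN_A": [1.0, 1.2, 3.1, 3.5, 4.0],
    "PROTEIN_B": [5.0, 4.1, 2.2, 1.9, 1.0],
}

# Correlate each protein's abundance with the imaging-derived predictions.
correlations = {
    name: spearman(abundance, tumor_probability)
    for name, abundance in protein_abundance.items()
}

for name, rho in correlations.items():
    print(f"{name}: rho = {rho:+.2f}")
```

In a real analysis, each protein's correlation would also be tested for significance and corrected for multiple comparisons before any pathway-level interpretation.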