
GD-Net: An Integrated Multimodal Information Model Based on Deep Learning for Cancer Outcome Prediction and Informative Feature Selection.

Author Information

Lin Junqi, Deng Weizhen, Wei Junyu, Zheng Jinyong, Chen Kenan, Chai Hua, Zeng Tao, Tang Hui

Affiliations

School of Mathematics, Foshan University, Foshan, China.

Guangzhou National Laboratory, Guangzhou, China.

Publication Information

J Cell Mol Med. 2024 Dec;28(23):e70221. doi: 10.1111/jcmm.70221.

Abstract

Multimodal information provides valuable resources for cancer prognosis and survival prediction. However, the computational integration of these heterogeneous data poses significant challenges because of the complex interactions between molecules from different biological modalities and the limited sample size. Here, we introduce GD-Net, a Graph Deep learning algorithm that improves the accuracy of survival prediction (average accuracy of 72%) through early fusion of multimodal information and includes an interpretable, lightweight XGBoost module to efficiently extract informative features. We first applied GD-Net to eight cancer datasets and achieved superior performance compared with benchmark methods, with an average 7.9% higher C-index value. Ablation experiments strongly supported that multimodal integration significantly improves accuracy over single-modality models. In a deep case study of liver cancer, 319 differential genes, 15 differential miRNAs and 155 differentially methylated genes were identified as informative features on the basis of the predicted risk subgroups, and we then statistically and biologically validated the efficacy of these key molecules in internal and external test datasets. These comprehensive independent validations demonstrate that GD-Net is accurate and competitive in predicting different cancer outcomes in real time and is an effective tool for identifying new multimodal prognostic biomarkers.
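
To make two ideas from the abstract concrete, below is a minimal Python sketch (not the authors' released code) of early fusion of per-modality feature matrices by simple concatenation, and of Harrell's concordance index (C-index), the metric the abstract uses to score predicted risks against observed survival times. All variable names, array shapes and the toy data are illustrative assumptions, not values from the paper.

```python
# Minimal sketch, assuming early fusion = feature-wise concatenation and
# C-index = Harrell's pairwise concordance; not the GD-Net implementation.
import numpy as np

def early_fusion(*modalities: np.ndarray) -> np.ndarray:
    """Concatenate per-sample feature matrices from several modalities
    (e.g. gene expression, miRNA, methylation) along the feature axis."""
    return np.concatenate(modalities, axis=1)

def concordance_index(time: np.ndarray, event: np.ndarray, risk: np.ndarray) -> float:
    """Harrell's C-index: fraction of usable patient pairs in which the
    patient with the shorter observed survival time also has the higher
    predicted risk. Ties in predicted risk count as 0.5."""
    concordant, usable = 0.0, 0
    n = len(time)
    for i in range(n):
        for j in range(n):
            # A pair (i, j) is usable if patient i had an observed event
            # strictly before patient j's recorded time.
            if event[i] == 1 and time[i] < time[j]:
                usable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / usable if usable else float("nan")

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy modalities: 50 patients, hypothetical feature counts per modality.
    expr = rng.normal(size=(50, 100))
    mirna = rng.normal(size=(50, 20))
    meth = rng.normal(size=(50, 80))
    fused = early_fusion(expr, mirna, meth)          # shape (50, 200)
    time = rng.exponential(scale=365, size=50)       # survival times in days
    event = rng.integers(0, 2, size=50)              # 1 = event observed, 0 = censored
    risk = -time + rng.normal(scale=50, size=50)     # toy risk scores
    print(fused.shape, round(concordance_index(time, event, risk), 3))
```

In this framing, the reported "average 7.9% higher C-index" means GD-Net's risk scores order patients by survival more consistently than the benchmark methods' scores on the same held-out pairs.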

Figure (JCMM-28-e70221-g002): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/80b7/11615516/2c13225f9cad/JCMM-28-e70221-g002.jpg
