Shen Jiquan, Guo Xuanhui, Bai Hanwen, Luo Junwei
School of Software, Henan Polytechnic University, Jiaozuo, China.
Front Bioinform. 2024 Jul 15;4:1403826. doi: 10.3389/fbinf.2024.1403826. eCollection 2024.
Identifying cancer subtypes is important in medicine: accurate subtype identification supports both cancer treatment and prognosis. Currently, most methods for cancer subtype identification are based on single-omics data, such as gene expression data. However, multi-omics data capture complementary characteristics of a cancer and can therefore improve the accuracy of subtype identification. How to extract features from multi-omics data for cancer subtype identification is thus the main challenge currently faced by researchers. In this paper, we propose a cancer subtype identification method named CAEM-GBDT, which takes gene expression data, miRNA expression data, and DNA methylation data as input and adopts a convolutional autoencoder network to identify cancer subtypes. The method extracts features from the input data through a convolutional encoder layer. A convolutional self-attention module is embedded within this layer to learn higher-level representations of the multi-omics data. The high-level representations extracted by the convolutional encoder are then concatenated with the input to the decoder. A Gradient Boosting Decision Tree (GBDT) is used for cancer subtype identification. In the experiments, we compare CAEM-GBDT with existing cancer subtype identification methods. The experimental results demonstrate that the proposed CAEM-GBDT outperforms the compared methods. The source code is available on GitHub at https://github.com/gxh-1/CAEM-GBDT.git.
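The data flow described in the abstract (three omics inputs, convolutional feature extraction, then a classifier) can be illustrated with a minimal standard-library sketch. It is not the authors' implementation. The abstract does not state how the inputs are preprocessed or fused, so the per-feature min-max scaling, the sample-wise concatenation, the fixed averaging kernel, the helper names `min_max_scale`, `concat_omics`, and `conv1d`, and the toy values are all assumptions made for illustration. In the real method the convolution kernels are learned and a self-attention module is embedded in the encoder, neither of which is reproduced here.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def min_max_scale(block: Sequence[Sequence[float]]) -> Matrix:
    """Scale each feature (column) of one omics block to [0, 1] (assumed preprocessing)."""
    columns = list(zip(*block))
    lows = [min(col) for col in columns]
    spans = [max(col) - min(col) for col in columns]
    # A constant feature has span 0; map it to 0.0 instead of dividing by zero.
    return [
        [(v - lo) / span if span else 0.0 for v, lo, span in zip(row, lows, spans)]
        for row in block
    ]


def concat_omics(*blocks: Sequence[Sequence[float]]) -> Matrix:
    """Join omics blocks sample by sample (assumed fusion step).

    Every block must list the same samples in the same order.
    """
    n_samples = len(blocks[0])
    if any(len(b) != n_samples for b in blocks):
        raise ValueError("all omics blocks must cover the same samples")
    return [[v for b in blocks for v in b[i]] for i in range(n_samples)]


def conv1d(signal: Sequence[float], kernel: Sequence[float]) -> List[float]:
    """'Valid' 1D cross-correlation, the basic operation of a convolutional layer."""
    k = len(kernel)
    return [
        sum(signal[i + j] * kernel[j] for j in range(k))
        for i in range(len(signal) - k + 1)
    ]


# Toy data: 2 samples with 3 genes, 2 miRNAs, and 2 methylation sites (made-up values).
gene = [[2.0, 4.0, 6.0], [4.0, 8.0, 6.0]]
mirna = [[1.0, 0.0], [3.0, 10.0]]
methyl = [[0.2, 0.9], [0.6, 0.1]]

fused = concat_omics(min_max_scale(gene), min_max_scale(mirna), min_max_scale(methyl))

# A fixed averaging kernel stands in for the learned encoder filters.
features = [conv1d(row, [0.5, 0.5]) for row in fused]
```

In a full pipeline, the `features` rows would be replaced by the representations learned by the trained encoder and passed, together with subtype labels, to a gradient-boosting classifier such as scikit-learn's `GradientBoostingClassifier`.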