School of Computer Science and Engineering, The University of Aizu, Aizuwakamatsu, Fukushima, 965-8580, Japan.
Department of Computer Science and Engineering, Rajshahi University of Engineering & Technology, Rajshahi, 6204, Bangladesh.
Sci Rep. 2023 Mar 7;13(1):3771. doi: 10.1038/s41598-023-30851-1.
Hepatocellular carcinoma (HCC) is the most common lethal malignancy of the liver worldwide. It is therefore important to identify key genes in order to uncover the molecular mechanisms of HCC and to improve its diagnostic and therapeutic options. This study employed a set of statistical and machine learning approaches to identify key candidate genes for HCC. Three microarray datasets, downloaded from the Gene Expression Omnibus (GEO) database, were used in this work. First, normalization and identification of differentially expressed genes (DEGs) were performed with limma for each dataset. Then, a support vector machine (SVM) was applied to determine the differentially expressed discriminative genes (DEDGs) among the DEGs of each dataset, and the DEDGs common to all three identified sets were selected. Enrichment analysis was performed on the common DEDGs using DAVID. A protein-protein interaction (PPI) network was constructed using STRING, and the central hub genes were identified with CytoHubba based on five criteria: degree, maximum neighborhood component (MNC), maximal clique centrality (MCC), closeness centrality, and betweenness centrality. In parallel, significant modules were selected from the PPI network using MCODE scores, and their associated genes were identified. In addition, metadata were created by listing all hub genes reported in previous studies, and significant meta-hub genes were identified as those occurring in more than three previous studies. Finally, six key candidate genes (TOP2A, CDC20, ASPM, PRC1, NUSAP1, and UBE2C) were determined as the genes shared among the central hub genes, hub module genes, and significant meta-hub genes. Two independent test datasets (GSE76427 and TCGA-LIHC) were used to validate these key candidate genes using the area under the curve (AUC). Moreover, the prognostic potential of these six key candidate genes was evaluated on the TCGA-LIHC cohort using survival analysis.
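Once the upstream tools have run, the gene-selection steps described above reduce to set intersections plus a frequency filter, and the validation step reduces to an AUC. The following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes the per-dataset DEDG lists (from limma and SVM), per-criterion gene rankings exported from CytoHubba, MCODE module genes, and hub-gene lists from previous studies are already available as plain gene lists. It does not reimplement limma, SVM, STRING, CytoHubba, or MCODE. All function names are hypothetical. The abstract does not state how the five CytoHubba criteria were combined, so a top-k intersection is assumed here. The AUC is the rank-based (Mann-Whitney) estimate, which may differ from the exact procedure used in the study. The gene names G1, G2, ... are toy placeholders, not results.

```python
from __future__ import annotations

from collections import Counter
from typing import Iterable, Sequence


def common_dedgs(dedg_sets: Sequence[Iterable[str]]) -> set[str]:
    """Genes selected as discriminative (DEDGs) in every dataset."""
    sets = [set(s) for s in dedg_sets]
    return set.intersection(*sets) if sets else set()


def central_hub_genes(rankings: dict[str, Sequence[str]], top_k: int) -> set[str]:
    """Genes in the top_k of every ranking (e.g. degree, MNC, MCC, closeness, betweenness).

    Assumption: each ranking is sorted best-first (e.g. exported from CytoHubba);
    combining the criteria by intersection is illustrative, not taken from the paper.
    """
    tops = [set(ranking[:top_k]) for ranking in rankings.values()]
    return set.intersection(*tops) if tops else set()


def meta_hub_genes(prior_hub_lists: Iterable[Iterable[str]], min_studies: int = 3) -> set[str]:
    """Genes reported as hubs in more than `min_studies` previous studies."""
    counts: Counter[str] = Counter()
    for hubs in prior_hub_lists:
        counts.update(set(hubs))  # a gene counts at most once per study
    return {gene for gene, n in counts.items() if n > min_studies}


def key_candidate_genes(central: set[str], module: set[str], meta: set[str]) -> set[str]:
    """Genes shared by the central-hub, hub-module, and meta-hub sets."""
    return central & module & meta


def auc(case_scores: Sequence[float], control_scores: Sequence[float]) -> float:
    """Rank-based AUC: probability that a case scores above a control (ties count 0.5)."""
    if not case_scores or not control_scores:
        raise ValueError("need at least one case and one control score")
    wins = 0.0
    for c in case_scores:
        for k in control_scores:
            if c > k:
                wins += 1.0
            elif c == k:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


if __name__ == "__main__":
    # Toy placeholder inputs (hypothetical; not data from the study).
    dedgs = common_dedgs([{"G1", "G2", "G3", "G4"},
                          {"G1", "G2", "G4", "G5"},
                          {"G1", "G2", "G4", "G6"}])
    rankings = {
        "degree": ["G1", "G2", "G4", "G3"],
        "MCC": ["G2", "G1", "G3", "G4"],
        "closeness": ["G1", "G4", "G2", "G3"],
    }
    central = central_hub_genes(rankings, top_k=3)
    meta = meta_hub_genes([{"G1", "G2"}, {"G1", "G2"}, {"G1"}, {"G1", "G2"}, {"G1", "G5"}])
    key = key_candidate_genes(central, {"G1", "G2", "G7"}, meta)
    print(sorted(dedgs), sorted(central), sorted(meta), sorted(key))
    # ['G1', 'G2', 'G4'] ['G1', 'G2'] ['G1'] ['G1']
    print(round(auc([0.9, 0.8, 0.4], [0.3, 0.5, 0.4]), 3))
    # 0.833
```

Note the strict "greater than" in `meta_hub_genes`: a gene reported in exactly three studies is excluded, which matches the "more than three previous studies" threshold. In the toy run, G2 appears in three prior lists and is therefore dropped.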