Hasan Md Al Mehedi, Maniruzzaman Md, Huang Jie, Shin Jungpil
Department of Computer Science & Engineering, Rajshahi University of Engineering & Technology, Rajshahi, Bangladesh.
Statistics Discipline, Khulna University, Khulna, Bangladesh.
PLoS One. 2025 Feb 5;20(2):e0318215. doi: 10.1371/journal.pone.0318215. eCollection 2025.
Hepatocellular carcinoma (HCC) is the most prevalent and deadly form of liver cancer, and its mortality rate is gradually increasing worldwide. Existing studies used genetic datasets taken from various platforms but focused only on the differentially expressed genes (DEGs) common to all platforms. Consequently, these studies may have missed some genes that are important in the investigation of HCC. To address this limitation, we took datasets from multiple platforms and designed a statistical and machine learning-based system to determine platform-independent key genes (KGs) for HCC patients. DEGs were determined from each dataset using limma. Individual combined DEGs (icDEGs) were identified for each platform, and grand combined DEGs (gcDEGs) were then determined from the icDEGs of all platforms. Differentially expressed discriminative genes (DEDGs) were determined based on classification accuracy using a support vector machine. We constructed a protein-protein interaction (PPI) network on the DEDGs and identified hub genes using maximal clique centrality (MCC). We determined the optimal modules using the MCODE scores of the PPI network and selected the combined genes of those modules. We also pooled the genes reported in previous studies to form metadata, referred to as meta-hub genes. Finally, six KGs (CDC20, TOP2A, CENPF, DLGAP5, UBE2C, and RACGAP1) were selected by intersecting the overlapping hub genes, meta-hub genes, and hub module genes. The discriminative power of the six KGs and their prognostic potential were evaluated using the area under the curve (AUC) and survival analysis.
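The gene-set logic described above can be sketched with plain Python sets. This is only an illustrative sketch, not the authors' implementation: it assumes that "combined" means a set union (the abstract does not say so explicitly), the function names `combine_degs` and `select_key_genes` are hypothetical, and the toy gene lists (including the placeholders `GENE_A`, `GENE_B`, `GENE_C`) are made up for demonstration and are not the study's data. Only the final intersection step is stated directly in the abstract.

```python
def combine_degs(deg_sets):
    """Combine several DEG sets into one (ASSUMPTION: 'combined' = union)."""
    combined = set()
    for degs in deg_sets:
        combined |= set(degs)
    return combined


def select_key_genes(hub_genes, meta_hub_genes, hub_module_genes):
    """Key genes = genes present in all three lists (intersection)."""
    return sorted(set(hub_genes) & set(meta_hub_genes) & set(hub_module_genes))


# Toy, made-up inputs: per-platform lists of per-dataset DEG sets.
platform_degs = {
    "platform_1": [{"CDC20", "TOP2A", "GENE_A"}, {"CENPF", "GENE_B"}],
    "platform_2": [{"DLGAP5", "UBE2C", "GENE_A"}, {"RACGAP1", "GENE_C"}],
}

# icDEGs: one combined set per platform; gcDEGs: combined across platforms.
ic_degs = {name: combine_degs(sets) for name, sets in platform_degs.items()}
gc_degs = combine_degs(ic_degs.values())

# Toy stand-ins for the three gene lists intersected in the final step.
six = {"CDC20", "TOP2A", "CENPF", "DLGAP5", "UBE2C", "RACGAP1"}
hub_genes = six | {"GENE_A"}
meta_hub_genes = six | {"GENE_B"}
hub_module_genes = six | {"GENE_C"}

key_genes = select_key_genes(hub_genes, meta_hub_genes, hub_module_genes)
print(key_genes)
```

In this sketch the placeholder genes each appear in only one of the three lists, so they drop out of the intersection. The DEDG selection (SVM accuracy), hub-gene ranking (MCC), and module detection (MCODE) steps are deliberately left out, since the abstract gives no parameters for them.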