IEEE/ACM Trans Comput Biol Bioinform. 2023 Nov-Dec;20(6):3786-3799. doi: 10.1109/TCBB.2023.3322753. Epub 2023 Dec 25.
Biomarkers associated with hepatocellular carcinoma (HCC) are important for understanding biological response mechanisms to internal or external intervention. This study aimed to identify key candidate genes for HCC using machine learning (ML) and statistics-based bioinformatics models. Differentially expressed genes (DEGs) were identified in each of four datasets using limma, and the genes common to the four DEG sets were selected. Protein-protein interaction networks were then constructed using STRING, and Cytoscape was used to determine hub genes, significant modules, and their associated genes. In parallel, three ML-based techniques, namely support vector machine (SVM), least absolute shrinkage and selection operator-logistic regression (LASSO-LR), and partial least squares-discriminant analysis (PLS-DA), were applied to the common DEGs to determine the genes that discriminate HCC. In addition, a set of meta hub genes was compiled by listing all hub genes reported in existing studies, so that other findings could be incorporated into the analysis. Finally, seven key candidate genes (ASPM, CCNB1, CDK1, DLGAP5, KIF20A, MT1X, and TOP2A) were identified by intersecting the hub genes, the significant-module genes, the discriminative genes from SVM, LASSO-LR, and PLS-DA, and the meta hub genes from existing studies. Three additional independent test datasets were used to validate these seven key candidate genes using the area under the curve (AUC) computed from the receiver operating characteristic (ROC) curve.
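The final two steps of the abstract, intersecting several gene lists and scoring a gene by AUC, can be illustrated with a minimal sketch. This is not the authors' code: the function names `common_genes` and `auc`, the placeholder gene labels, and the toy expression values are all assumptions made for illustration, and the AUC is computed with the standard pairwise (Mann-Whitney) formulation rather than any particular package.

```python
from functools import reduce


def common_genes(*gene_lists):
    """Return, in sorted order, the genes present in every input list."""
    return sorted(reduce(set.intersection, (set(g) for g in gene_lists)))


def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive sample has the higher score; ties count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Hypothetical gene lists standing in for hub genes, module genes,
# and genes selected by one ML method (placeholder names, not real results).
hub_genes = ["GENE_A", "GENE_B", "GENE_C"]
module_genes = ["GENE_A", "GENE_B", "GENE_D"]
svm_genes = ["GENE_B", "GENE_A", "GENE_E"]
candidates = common_genes(hub_genes, module_genes, svm_genes)

# Toy expression values for one candidate gene (1 = tumor, 0 = normal).
expression = [2.1, 3.4, 0.9, 2.5, 2.8, 0.7]
is_tumor = [1, 1, 0, 0, 1, 0]
gene_auc = auc(expression, is_tumor)

print(candidates, round(gene_auc, 3))
```

In practice an established routine such as scikit-learn's `roc_auc_score` would likely be preferred for the validation step; the pairwise version here only makes the definition explicit.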