School of Information Engineering, Nanchang Institute of Technology, Jiangxi, 330099, People's Republic of China.
Jiangxi Province Key Laboratory of Water Information Cooperative Sensing and Intelligent Processing, Jiangxi, 330099, People's Republic of China.
Med Biol Eng Comput. 2022 Mar;60(3):663-681. doi: 10.1007/s11517-021-02476-x. Epub 2022 Jan 13.
Microarray gene expression datasets typically contain a large number of genes but only a small number of samples. Moreover, only a few of these genes are relevant to cancer, which makes gene selection challenging. We therefore propose XGBoost-MOGA, a two-stage gene selection approach that combines extreme gradient boosting (XGBoost) with a multi-objective optimization genetic algorithm (MOGA) for cancer classification on microarray datasets. In the first stage, the genes are ranked by an ensemble-based feature selection method using XGBoost. This stage effectively removes irrelevant genes and yields a group of the genes most relevant to the class. In the second stage, XGBoost-MOGA searches this group for an optimal gene subset using a multi-objective optimization genetic algorithm. We performed comprehensive experiments comparing XGBoost-MOGA with other state-of-the-art feature selection methods, using two well-known learning classifiers on 14 publicly available microarray expression datasets. The experimental results show that XGBoost-MOGA yields significantly better results than previous state-of-the-art algorithms in terms of various evaluation criteria, such as accuracy, F-score, precision, and recall.
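The two-stage procedure described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and several parts are assumptions made only for illustration: stage one ranks genes by the absolute difference of class means as a dependency-free stand-in for XGBoost importance scores; labels are assumed to be binary (0 and 1); the two objectives (maximize leave-one-out accuracy of a nearest-centroid classifier, minimize subset size) are hypothetical, since the abstract does not name the objectives; and all function names (`rank_genes`, `loo_accuracy`, `dominates`, `moga`) and the synthetic data are invented here.

```python
import random

def rank_genes(X, y, top_k):
    """Stage 1 (stand-in): rank genes by |mean(class 0) - mean(class 1)|.
    The paper ranks with XGBoost; this proxy only keeps the sketch dependency-free."""
    def score(j):
        a = [row[j] for row, label in zip(X, y) if label == 0]
        b = [row[j] for row, label in zip(X, y) if label == 1]
        return abs(sum(a) / len(a) - sum(b) / len(b))
    return sorted(range(len(X[0])), key=score, reverse=True)[:top_k]

def loo_accuracy(X, y, genes):
    """Leave-one-out accuracy of a nearest-centroid classifier restricted to `genes`."""
    if not genes:
        return 0.0
    correct = 0
    for i, sample in enumerate(X):
        dist = {}
        for label in set(y):
            rows = [X[k] for k in range(len(X)) if k != i and y[k] == label]
            centroid = [sum(r[j] for r in rows) / len(rows) for j in genes]
            dist[label] = sum((sample[j] - c) ** 2 for j, c in zip(genes, centroid))
        correct += min(dist, key=dist.get) == y[i]
    return correct / len(X)

def dominates(f, g):
    """True if f is no worse than g on both objectives and differs from g.
    A fitness is (accuracy, subset size): maximize the first, minimize the second."""
    return f[0] >= g[0] and f[1] <= g[1] and f != g

def moga(X, y, candidates, pop_size=20, generations=15, seed=0):
    """Stage 2: evolve bit masks over `candidates`; return the final Pareto front."""
    rng = random.Random(seed)
    n = len(candidates)
    cache = {}

    def fitness(mask):
        if mask not in cache:
            genes = [g for g, bit in zip(candidates, mask) if bit]
            cache[mask] = (loo_accuracy(X, y, genes), len(genes))
        return cache[mask]

    def front_of(masks):
        fits = [fitness(m) for m in masks]
        return [m for m, f in zip(masks, fits) if not any(dominates(g, f) for g in fits)]

    def random_mask():
        return tuple(rng.randint(0, 1) for _ in range(n))

    pop = [random_mask() for _ in range(pop_size)]
    for _ in range(generations):
        children = []
        for _ in range(pop_size):
            a, b = rng.sample(pop, 2)
            cut = rng.randrange(1, n)                  # one-point crossover
            child = a[:cut] + b[cut:]
            # bit-flip mutation with probability 1/n per gene
            children.append(tuple(bit ^ (rng.random() < 1 / n) for bit in child))
        merged = list(dict.fromkeys(pop + children))   # drop duplicate masks
        front = front_of(merged)
        rest = [m for m in merged if m not in front]
        pop = (front + rest)[:pop_size]                # non-dominated masks survive first
        while len(pop) < pop_size:                     # refill if deduplication shrank it
            pop.append(random_mask())
    final = front_of(list(dict.fromkeys(pop)))
    return [([g for g, bit in zip(candidates, m) if bit], fitness(m)) for m in final]

# Synthetic demo (assumed data): 20 samples, 30 genes, only genes 0 and 1 informative.
data_rng = random.Random(1)
y = [0] * 10 + [1] * 10
X = [[data_rng.gauss(4.0 * label if j < 2 else 0.0, 1.0) for j in range(30)] for label in y]
top = rank_genes(X, y, top_k=8)
for genes, (acc, size) in moga(X, y, top):
    print(genes, acc, size)
```

Each printed line is one trade-off on the final Pareto front: a gene subset, its leave-one-out accuracy, and its size. In practice the stand-in ranking would be replaced by importance scores from a trained XGBoost model, and a single subset would be chosen from the front according to how much accuracy one is willing to trade for a smaller gene panel.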