Hybrid gene selection approach using XGBoost and multi-objective genetic algorithm for cancer classification.

Affiliations

School of Information Engineering, Nanchang Institute of Technology, Jiangxi, 330099, People's Republic of China.

Jiangxi Province Key Laboratory of Water Information Cooperative Sensing and Intelligent Processing, Jiangxi, 330099, People's Republic of China.

Publication information

Med Biol Eng Comput. 2022 Mar;60(3):663-681. doi: 10.1007/s11517-021-02476-x. Epub 2022 Jan 13.

DOI:10.1007/s11517-021-02476-x

PMID:35028863

Abstract

Microarray gene expression data are often accompanied by a large number of genes and a small number of samples. However, only a few of these genes are relevant to cancer, resulting in significant gene selection challenges. Hence, we propose a two-stage gene selection approach by combining extreme gradient boosting (XGBoost) and a multi-objective optimization genetic algorithm (XGBoost-MOGA) for cancer classification in microarray datasets. In the first stage, the genes are ranked using an ensemble-based feature selection using XGBoost. This stage can effectively remove irrelevant genes and yield a group comprising the most relevant genes related to the class. In the second stage, XGBoost-MOGA searches for an optimal gene subset based on the most relevant genes' group using a multi-objective optimization genetic algorithm. We performed comprehensive experiments to compare XGBoost-MOGA with other state-of-the-art feature selection methods using two well-known learning classifiers on 14 publicly available microarray expression datasets. The experimental results show that XGBoost-MOGA yields significantly better results than previous state-of-the-art algorithms in terms of various evaluation criteria, such as accuracy, F-score, precision, and recall.
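
The two stages described above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not state which objectives the genetic algorithm optimizes, so the sketch assumes two common ones: maximize classification accuracy and minimize the number of selected genes. The importance scores are assumed to come from a fitted XGBoost model, and the names `top_k_genes`, `dominates`, and `pareto_front` are hypothetical. Only the ranking filter and the Pareto-dominance test are shown; crossover, mutation, and classifier training are omitted.

```python
from typing import Dict, FrozenSet, List, Tuple

# A candidate is (gene subset, accuracy of a classifier trained on that subset).
Candidate = Tuple[FrozenSet[str], float]


def top_k_genes(importance: Dict[str, float], k: int) -> List[str]:
    """Stage 1 (sketch): keep the k genes with the highest importance score.

    `importance` is assumed to hold per-gene scores from a fitted XGBoost
    model; ties are broken by gene name so the result is deterministic.
    """
    ranked = sorted(importance, key=lambda gene: (-importance[gene], gene))
    return ranked[:k]


def dominates(a: Candidate, b: Candidate) -> bool:
    """Return True if `a` Pareto-dominates `b`.

    Assumed objectives: higher accuracy is better, fewer genes is better.
    `a` dominates `b` when it is no worse on both and strictly better on one.
    """
    size_a, acc_a = len(a[0]), a[1]
    size_b, acc_b = len(b[0]), b[1]
    no_worse = acc_a >= acc_b and size_a <= size_b
    strictly_better = acc_a > acc_b or size_a < size_b
    return no_worse and strictly_better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Stage 2 (sketch): keep the candidates that no other candidate dominates."""
    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates)
    ]
```

In a full multi-objective genetic algorithm, a filter like `pareto_front` would be applied in every generation to decide which gene subsets survive, and the final front would offer a trade-off between subset size and accuracy rather than a single answer.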

Similar articles

A two-stage hybrid biomarker selection method based on ensemble filter and binary differential evolution incorporating binary African vultures optimization.

BMC Bioinformatics. 2023 Apr 4;24(1):130. doi: 10.1186/s12859-023-05247-7.

Improved intelligent water drop-based hybrid feature selection method for microarray data processing.

Comput Biol Chem. 2023 Apr;103:107809. doi: 10.1016/j.compbiolchem.2022.107809. Epub 2023 Jan 13.

A novel bio-inspired hybrid multi-filter wrapper gene selection method with ensemble classifier for microarray data.

Neural Comput Appl. 2023;35(16):11531-11561. doi: 10.1007/s00521-021-06459-9. Epub 2021 Sep 12.

Optimizing cancer classification: a hybrid RDO-XGBoost approach for feature selection and predictive insights.

Cancer Immunol Immunother. 2024 Oct 9;73(12):261. doi: 10.1007/s00262-024-03843-x.

Two-stage feature selection for classification of gene expression data based on an improved Salp Swarm Algorithm.

Math Biosci Eng. 2022 Sep 19;19(12):13747-13781. doi: 10.3934/mbe.2022641.

MEvA-X: a hybrid multiobjective evolutionary tool using an XGBoost classifier for biomarkers discovery on biomedical datasets.

Bioinformatics. 2023 Jul 1;39(7). doi: 10.1093/bioinformatics/btad384.

Diagnostic classification of cancers using extreme gradient boosting algorithm and multi-omics data.

Comput Biol Med. 2020 Jun;121:103761. doi: 10.1016/j.compbiomed.2020.103761. Epub 2020 Apr 16.

C-HMOSHSSA: Gene selection for cancer classification using multi-objective meta-heuristic and machine learning methods.

Comput Methods Programs Biomed. 2019 Sep;178:219-235. doi: 10.1016/j.cmpb.2019.06.029. Epub 2019 Jun 29.

Genetic Bee Colony (GBC) algorithm: A new gene selection method for microarray cancer classification.

Comput Biol Chem. 2015 Jun;56:49-60. doi: 10.1016/j.compbiolchem.2015.03.001. Epub 2015 Mar 18.

Cited by

GPS: Harnessing data fusion strategies to improve the accuracy of machine learning-based genomic and phenotypic selection.

Plant Commun. 2025 Aug 11;6(8):101416. doi: 10.1016/j.xplc.2025.101416. Epub 2025 Jun 11.

Identification of novel therapeutic targets in hepatitis-B virus-associated membranous nephropathy using scRNA-seq and machine learning.

Sci Rep. 2025 May 29;15(1):18959. doi: 10.1038/s41598-025-03625-0.

An interpreting machine learning models to predict amputation risk in patients with diabetic foot ulcers: a multi-center study.

Front Endocrinol (Lausanne). 2025 Mar 25;16:1526098. doi: 10.3389/fendo.2025.1526098. eCollection 2025.

A comprehensive learning based swarm optimization approach for feature selection in gene expression data.

Heliyon. 2024 Sep 2;10(17):e37165. doi: 10.1016/j.heliyon.2024.e37165. eCollection 2024 Sep 15.

Decoding temporal heterogeneity in NSCLC through machine learning and prognostic model construction.

World J Surg Oncol. 2024 Jun 13;22(1):156. doi: 10.1186/s12957-024-03435-0.

A universal inverse design methodology for microfluidic mixers.

Biomicrofluidics. 2024 Mar 25;18(2):024102. doi: 10.1063/5.0185494. eCollection 2024 Mar.

A novel feature selection algorithm for identifying hub genes in lung cancer.

Sci Rep. 2023 Dec 7;13(1):21671. doi: 10.1038/s41598-023-48953-1.

A bio-inspired convolution neural network architecture for automatic breast cancer detection and classification using RNA-Seq gene expression data.

Sci Rep. 2023 Sep 5;13(1):14644. doi: 10.1038/s41598-023-41731-z.

Integrative Prognostic Machine Learning Models in Mantle Cell Lymphoma.

Cancer Res Commun. 2023 Aug 2;3(8):1435-1446. doi: 10.1158/2767-9764.CRC-23-0083. eCollection 2023 Aug.

Construction and validation of prognostic models in critically Ill patients with sepsis-associated acute kidney injury: interpretable machine learning approach.

J Transl Med. 2023 Jun 22;21(1):406. doi: 10.1186/s12967-023-04205-4.

References

A Novel XGBoost Method to Infer the Primary Lesion of 20 Solid Tumor Types From Gene Expression Data.

Front Genet. 2021 Feb 3;12:632761. doi: 10.3389/fgene.2021.632761. eCollection 2021.

Implication of Ataxia-Telangiectasia-mutated kinase in epithelium-mesenchyme transition.

Carcinogenesis. 2021 Apr 30;42(4):640-649. doi: 10.1093/carcin/bgab002.

Inhibition of ABCC6 Transporter Modifies Cytoskeleton and Reduces Motility of HepG2 Cells via Purinergic Pathway.

Cells. 2020 Jun 5;9(6):1410. doi: 10.3390/cells9061410.

Ube2S regulates Wnt/β-catenin signaling and promotes the progression of non-small cell lung cancer.

Int J Med Sci. 2020 Jan 14;17(2):274-279. doi: 10.7150/ijms.40243. eCollection 2020.

Gene Expression Value Prediction Based on XGBoost Algorithm.

Front Genet. 2019 Nov 12;10:1077. doi: 10.3389/fgene.2019.01077. eCollection 2019.

TRF1 as a major contributor for telomeres' shortening in the context of obesity.

Free Radic Biol Med. 2018 Dec;129:286-295. doi: 10.1016/j.freeradbiomed.2018.09.039. Epub 2018 Sep 27.

CD59 is a potential biomarker of esophageal squamous cell carcinoma radioresistance by affecting DNA repair.

Cell Death Dis. 2018 Aug 30;9(9):887. doi: 10.1038/s41419-018-0895-0.

Genetic algorithm based cancerous gene identification from microarray data using ensemble of filter methods.

Med Biol Eng Comput. 2019 Jan;57(1):159-176. doi: 10.1007/s11517-018-1874-4. Epub 2018 Aug 1.

Radiation Dose Exposure for Lumbar Transforaminal Epidural Steroid Injections and Facet Joint Blocks Under CT vs. Fluoroscopic Guidance.

Pain Pract. 2018 Jul;18(6):798-804. doi: 10.1111/papr.12677. Epub 2018 Feb 5.

Development of a two-stage gene selection method that incorporates a novel hybrid approach using the cuckoo optimization algorithm and harmony search for cancer classification.

J Biomed Inform. 2017 Mar;67:11-20. doi: 10.1016/j.jbi.2017.01.016. Epub 2017 Feb 3.
