Diabetes prediction model based on GA-XGBoost and stacking ensemble algorithm.

Affiliation

College of Computer Science and Engineering, Sichuan University of Science and Engineering, Yibin, China.

Publication information

PLoS One. 2024 Sep 30;19(9):e0311222. doi: 10.1371/journal.pone.0311222. eCollection 2024.

Abstract

Diabetes is an incurable, lifelong chronic disease with far-reaching effects on patients. Early intervention is therefore crucial: it can significantly improve patient prognosis and provide valuable reference information for clinical treatment. This study used the BRFSS (Behavioral Risk Factor Surveillance System) dataset, which is publicly available on the Kaggle platform, with the aim of providing a scientific basis for the early diagnosis and treatment of diabetes through machine learning. First, the dataset was balanced using several sampling methods; second, a Stacking model based on GA-XGBoost (an XGBoost model optimized by a genetic algorithm) was constructed to predict diabetes risk; finally, the interpretability of the model was analyzed using Shapley values. The results show: (1) Random oversampling, ADASYN, SMOTE, and SMOTEENN were used to balance the data, of which SMOTEENN proved the most efficient and effective at handling the class imbalance. (2) The GA-XGBoost model used a genetic algorithm to optimize the hyperparameters of XGBoost and so improve its predictive accuracy. Combined with the better-performing LightGBM and random forest models, it formed a two-layer Stacking model. This model outperformed the single machine learning models in predictive performance and offers a new approach to model ensembling. (3) Shapley value analysis identified features with a significant impact on the prediction of diabetes, such as age and body mass index. This analysis makes the model more transparent and gives doctors and patients more precise support for treatment decisions. In summary, by adopting machine learning techniques and a model ensembling strategy, this study improved the accuracy of diabetes risk prediction and provides a tool for the early diagnosis and personalized treatment of diabetes.
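
The abstract does not describe how the genetic algorithm was set up, so the following is only a minimal sketch of the general idea behind genetic-algorithm hyperparameter search, not the authors' GA-XGBoost. Everything in it is an assumption made for illustration: the three hyperparameters and their bounds, the operators (tournament selection, uniform crossover, random-reset mutation, elitism), and in particular `fitness`, a toy stand-in for what would in practice be a cross-validated score of an XGBoost model trained with the candidate parameters.

```python
import random

# Hypothetical search space: (low, high) bounds for three XGBoost-style
# hyperparameters. These ranges are illustrative, not taken from the paper.
BOUNDS = {
    "learning_rate": (0.01, 0.3),
    "max_depth": (3, 10),
    "subsample": (0.5, 1.0),
}
INT_PARAMS = {"max_depth"}  # parameters sampled as integers


def fitness(params):
    """Toy stand-in objective (assumption). A real run would return the
    cross-validated score of a model trained with `params`. This function
    peaks at learning_rate=0.1, max_depth=6, subsample=0.8."""
    return -(
        ((params["learning_rate"] - 0.1) / 0.29) ** 2
        + ((params["max_depth"] - 6) / 7) ** 2
        + ((params["subsample"] - 0.8) / 0.5) ** 2
    )


def sample_gene(name, rng):
    """Draw one hyperparameter value uniformly from its bounds."""
    lo, hi = BOUNDS[name]
    return rng.randint(lo, hi) if name in INT_PARAMS else rng.uniform(lo, hi)


def tournament(pop, scores, rng, k=3):
    """Pick k random individuals and return the fittest of them."""
    contenders = rng.sample(range(len(pop)), k)
    return pop[max(contenders, key=lambda i: scores[i])]


def crossover(a, b, rng):
    """Uniform crossover: each gene comes from either parent with p=0.5."""
    return {name: (a[name] if rng.random() < 0.5 else b[name]) for name in BOUNDS}


def mutate(ind, rng, rate=0.2):
    """Random-reset mutation: resample each gene with probability `rate`."""
    return {
        name: (sample_gene(name, rng) if rng.random() < rate else value)
        for name, value in ind.items()
    }


def genetic_search(pop_size=20, generations=15, seed=0):
    """Return the best individual found and the best fitness per generation."""
    rng = random.Random(seed)
    pop = [{name: sample_gene(name, rng) for name in BOUNDS} for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        scores = [fitness(ind) for ind in pop]
        best = pop[max(range(pop_size), key=lambda i: scores[i])]
        history.append(fitness(best))
        next_pop = [dict(best)]  # elitism: the current best always survives
        while len(next_pop) < pop_size:
            child = crossover(
                tournament(pop, scores, rng), tournament(pop, scores, rng), rng
            )
            next_pop.append(mutate(child, rng))
        pop = next_pop
    best = max(pop, key=fitness)
    history.append(fitness(best))
    return best, history


if __name__ == "__main__":
    best_params, best_per_generation = genetic_search()
    print(best_params)
    print(best_per_generation[0], best_per_generation[-1])
```

Because the best individual is copied unchanged into each new generation, the best fitness can never decrease from one generation to the next. To adapt the sketch to a real model, only `fitness` and `BOUNDS` would need to change; the tuned model could then serve as one base learner in a stacking ensemble alongside others, as the abstract describes.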


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c6c7/11441666/c1cdd5aff621/pone.0311222.g001.jpg
