Enhancing type 2 diabetes mellitus prediction by integrating metabolomics and tree-based boosting approaches.

Affiliations

Department of Biostatistics and Medical Informatics, Faculty of Medicine, Inonu University, Malatya, Türkiye.

Central Labs, King Khalid University, Abha, Saudi Arabia.

Publication information

Front Endocrinol (Lausanne). 2024 Nov 11;15:1444282. doi: 10.3389/fendo.2024.1444282. eCollection 2024.

Abstract

BACKGROUND

Type 2 diabetes mellitus (T2DM) is a global health problem characterized by insulin resistance and hyperglycemia. Early detection and accurate prediction of T2DM are crucial for effective management and prevention. This study explores the integration of machine learning (ML) and explainable artificial intelligence (XAI) approaches, applied to metabolomics panel data, to identify biomarkers and develop predictive models for T2DM.

METHODS

Metabolomics data from patients with T2DM (n = 31) and healthy controls (n = 34) were analyzed for biomarker discovery (mostly amino acids, fatty acids, and purines) and T2DM prediction. Feature selection was performed using least absolute shrinkage and selection operator (LASSO) regression to enhance the models' accuracy and interpretability. Three advanced tree-based ML algorithms (KTBoost: Kernel-Tree Boosting; XGBoost: eXtreme Gradient Boosting; NGBoost: Natural Gradient Boosting) were employed to predict T2DM from the selected biomarkers. The SHapley Additive exPlanations (SHAP) method was used to explain how each metabolomics biomarker contributed to the model's predictions.
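The workflow described above (LASSO feature selection followed by a tree-based boosting classifier) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the data are synthetic, scikit-learn's `GradientBoostingClassifier` stands in for KTBoost, XGBoost, and NGBoost, and the `alpha` value, feature count, and 5-fold cross-validation scheme are arbitrary choices that the abstract does not specify.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import Lasso
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for a metabolomics panel (assumption): 65 samples,
# matching the study's total sample size, and 40 hypothetical metabolites.
X, y = make_classification(
    n_samples=65, n_features=40, n_informative=6, n_redundant=2, random_state=0
)

# LASSO step: keep the features whose L1-penalized coefficient is non-zero.
# alpha=0.02 is an illustrative value, not one reported in the abstract.
selector = SelectFromModel(Lasso(alpha=0.02))

# Boosting step: a generic gradient-boosted tree classifier, used here only
# as a stand-in for the three algorithms named in the abstract.
model = make_pipeline(
    StandardScaler(), selector, GradientBoostingClassifier(random_state=0)
)

# Scaling and selection sit inside the pipeline, so they are refit on each
# training fold and the held-out fold never influences feature selection.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
auc_scores = cross_val_score(model, X, y, cv=cv, scoring="roc_auc")
print("Mean cross-validated AUC:", auc_scores.mean())

# Refit on all data to see how many features the LASSO step retains.
model.fit(X, y)
n_selected = int(model.named_steps["selectfrommodel"].get_support().sum())
print("Features retained by LASSO:", n_selected)
```

Placing the selection step inside the pipeline is a design choice for this sketch: with only 65 samples, selecting features on the full dataset before cross-validation would tend to inflate the estimated performance.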

RESULTS

The study identified multiple metabolites associated with T2DM, with LASSO feature selection highlighting the most important biomarkers. KTBoost [Accuracy: 0.938; CI: (0.880-0.997), Sensitivity: 0.971; CI: (0.847-0.999), Area under the Curve (AUC): 0.965; CI: (0.937-0.994)] handled the complex metabolomics data effectively and outperformed the other two models. According to the SHAP analysis of the KTBoost model, high levels of phenyllactate (pla) and taurine, as well as low concentrations of cysteine, L-aspartate, and L-cysteate, are strongly associated with the presence of T2DM.
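To show how an interval like the one reported for accuracy can arise, the sketch below computes a normal-approximation (Wald) confidence interval for a proportion. The abstract states neither the method nor the confidence level, so both are assumptions here, as is the count of 61 correct predictions out of 65 (chosen because 61/65 rounds to the reported 0.938). Under those assumptions the result matches the reported accuracy interval.

```python
from math import sqrt
from statistics import NormalDist


def wald_interval(successes: int, n: int, level: float = 0.95):
    """Normal-approximation (Wald) confidence interval for a proportion."""
    p = successes / n
    z = NormalDist().inv_cdf(0.5 + level / 2)  # about 1.96 for 95%
    half_width = z * sqrt(p * (1 - p) / n)
    return p, p - half_width, p + half_width


# Assumed counts: 61 of 65 samples classified correctly (not stated in the
# abstract; inferred from accuracy 0.938 and the total sample size).
p, lo, hi = wald_interval(61, 65)
print(f"{p:.3f} ({lo:.3f}, {hi:.3f})")  # → 0.938 (0.880, 0.997)
```

The Wald interval is known to behave poorly for proportions near 0 or 1 in small samples, so an interval whose upper bound sits this close to 1.0 should be read with some caution.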

CONCLUSION

The integration of metabolomics profiling and XAI offers a promising approach to predicting T2DM. The use of tree-based algorithms, in particular KTBoost, provides a robust framework for analyzing complex datasets and improves the prediction accuracy of T2DM onset. Future research should focus on validating these biomarkers and models in larger, more diverse populations to solidify their clinical utility.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/45e9/11586166/237761841733/fendo-15-1444282-g001.jpg
