Yang Penglu, Yang Bin
The First Clinical School & Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei, China.
Health Management Center, Union Hospital, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, Hubei, China.
PLoS One. 2025 Feb 24;20(2):e0318226. doi: 10.1371/journal.pone.0318226. eCollection 2025.
This study aimed to develop and compare four machine learning models (logistic regression, random forest, XGBoost, and neural networks) for predicting diabetic retinopathy (DR) from clinical and biochemical data.
A dataset of 3,000 diabetic patients, including 1,500 with DR, was obtained from the National Population Health Science Data Center. Significant predictors were identified, and four predictive models were developed. Model performance was assessed using accuracy, precision, recall, F1-score, and area under the curve (AUC).
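The evaluation metrics named above can be illustrated with a minimal sketch. The abstract does not state how the authors computed them, so the function names (`classification_metrics`, `roc_auc`), the 0.5 decision threshold, and the six-patient toy data below are all assumptions made for illustration, not the study's code or data.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Accuracy, precision, recall and F1 for binary labels (1 = DR, 0 = no DR)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive case is scored above
    a random negative case (ties count as half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: true labels and predicted DR probabilities.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

metrics = classification_metrics(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

Accuracy, precision, recall, and F1 depend on the chosen threshold, whereas AUC is computed from the ranking of the scores alone. This is one reason the two kinds of metric are usually reported together.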
Random forest and XGBoost demonstrated superior performance, achieving accuracies of 95.67% and 94.67%, respectively, with AUC values of 0.991 and 0.989. Logistic regression yielded an accuracy of 76.50% (AUC: 0.828), while neural networks achieved 82.67% accuracy (AUC: 0.927). Key predictors included 24-hour urinary microalbumin, HbA1c, and serum creatinine.
The study identifies random forest and XGBoost as effective tools for early DR detection and underscores the importance of renal and glycemic markers in risk assessment. These findings support the integration of machine learning models into clinical decision-making for improved patient outcomes in diabetes management.