Surya Janani, Kashyap Himanshu, Nadig Ramya R, Raman Rajiv
Epidemiology and Biostatistics, National Institute of Epidemiology, Chennai, IND.
Shri Bhagwan Mahavir Vitreoretinal Services, Medical Research Foundation, Sankara Nethralaya, Chennai, IND.
Cureus. 2023 Sep 24;15(9):e45853. doi: 10.7759/cureus.45853. eCollection 2023 Sep.
This study aimed to develop a predictive risk score model, based on deep learning (DL), that is independent of fundus photography and relies entirely on systemic data obtained through targeted screening in a population-based study, to diagnose diabetic retinopathy (DR) in the Indian population.
The study applied machine learning to datasets from a cross-sectional, population-based study. A total of 1425 subjects (1175 with known diabetes and 250 with newly diagnosed diabetes) were included. We applied five machine learning algorithms to predict diabetic retinopathy in our datasets: random forest (RF), logistic regression (LR), support vector machines (SVM), artificial neural networks (ANN), and decision trees (DT). In the first experiment, we used a percentage split, randomly dividing the dataset into a training set (80%) and a test set (20%). In the second experiment, we performed a three-way data split to avoid overestimating predictive performance, randomly dividing the dataset into a training set (60%), a validation set (20%), and a test set (20%). We also integrated five-fold cross-validation with the percentage split to evaluate our method. Predictive performance was assessed using the receiver operating characteristic (ROC) curve, the area under the curve (AUC), accuracy (Acc), sensitivity, and specificity.
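The three evaluation protocols described above (80/20 percentage split, 60/20/20 three-way split, and five-fold cross-validation) can be sketched as index-level splits. This is a minimal standard-library illustration, not the authors' code. The function names and the fixed random seed are hypothetical. The sketch assumes simple random (unstratified) splitting over all 1425 subjects, because the abstract does not say whether splits were stratified by DR status or exactly how cross-validation was combined with the percentage split.

```python
import random


def percentage_split(n, train_frac=0.8, seed=0):
    """Randomly split indices 0..n-1 into train and test sets (e.g. 80/20).

    Assumption: simple random split, no stratification by DR status.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = round(n * train_frac)
    return idx[:cut], idx[cut:]


def three_way_split(n, train_frac=0.6, val_frac=0.2, seed=0):
    """Randomly split indices into train/validation/test (e.g. 60/20/20).

    The validation set is for model selection/tuning; the test set is held
    out so that reported performance is not overestimated.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    n_train = round(n * train_frac)
    n_val = round(n * val_frac)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]


def k_fold_indices(n, k=5, seed=0):
    """Yield (train, test) index pairs for k-fold cross-validation.

    Every index lands in exactly one test fold; fold sizes differ by at most 1.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = idx[start:start + size]
        train = idx[:start] + idx[start + size:]
        yield train, test
        start += size


# Usage with the study's sample size (1425 subjects).
train, test = percentage_split(1425)
print(len(train), len(test))  # prints: 1140 285

tr, va, te = three_way_split(1425)
print(len(tr), len(va), len(te))  # prints: 855 285 285

for fold_train, fold_test in k_fold_indices(1425, k=5):
    print(len(fold_train), len(fold_test))  # prints: 1140 285 (five times)
```

Each train/test index pair would then be used to fit and score the five classifiers, for example with scikit-learn's `RandomForestClassifier`. scikit-learn's `train_test_split` and `StratifiedKFold` are library equivalents of these helpers and also support stratifying on the outcome label.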
The RF classifier achieved the best prediction performance in the percentage split, with an AUC, Acc, and sensitivity of 0.91, 0.89, and 0.90, respectively. With the three-way data split, it attained an AUC of 0.86 and an Acc of 0.85. Likewise, with five-fold cross-validation it performed best, achieving an AUC, Acc, sensitivity, and specificity of 0.90, 0.97, 0.91, and 0.75, respectively.
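The metrics reported above can be computed from true labels and classifier outputs as follows. This is a minimal standard-library sketch. It assumes binary labels (1 = DR present, 0 = absent) and a hypothetical 0.5 threshold on predicted probabilities. The toy data are invented for illustration and are not from the study. The pairwise AUC uses the Mann-Whitney formulation, which equals the area under the ROC curve. It is O(n²) and is meant for clarity; scikit-learn's `roc_auc_score` computes the same quantity efficiently.

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, TN, FP, FN) for binary labels where 1 = DR present."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def classification_metrics(y_true, y_pred):
    """Accuracy, sensitivity and specificity from hard predictions."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),  # fraction of DR cases detected
        "specificity": tn / (tn + fp),  # fraction of non-DR cases correctly ruled out
    }


def roc_auc(y_true, scores):
    """AUC = P(random positive scores higher than random negative); ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (invented data): labels and predicted probabilities of DR.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # hypothetical 0.5 threshold

print(classification_metrics(y_true, y_pred))
# prints: {'accuracy': 0.75, 'sensitivity': 0.75, 'specificity': 0.75}
print(roc_auc(y_true, scores))  # prints: 0.9375
```

Sensitivity, specificity and accuracy depend on the chosen threshold, whereas AUC summarizes ranking quality over all thresholds. This is why the ROC curve and AUC are typically reported alongside the threshold-based metrics.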
Because the RF classifier achieved the best performance, we propose using it to identify diabetic retinopathy during targeted screening of the general population.