Ji Seok Min, Kim Jeewuan, Kim Kyu Min
Department of Cancer AI and Digital Health, Graduate School of Cancer Science and Policy, National Cancer Center, Siheung, Gyeonggi-do, Republic of Korea.
Department of Health Administration, Gyeonggi College of Science and Technology, Siheung, Gyeonggi-do, Republic of Korea.
BMC Health Serv Res. 2025 Aug 7;25(1):1040. doi: 10.1186/s12913-025-13139-0.
Despite the National Health Insurance (NHI) system implemented in South Korea, concerns persist regarding access to health coverage for low-income households. To address this issue, this study aims to use machine learning-based data mining techniques to classify whether such households will face catastrophic health expenditures (CHEs).
A total of 4,031 low-income individuals were identified from the 2019 Korea Health Panel Survey. The classification model was developed using six machine learning algorithms: random forest, gradient boosting, decision tree, ridge regression, neural network, and AdaBoost. Ten-fold cross-validation was carried out to ensure the reliability of the results. Models were evaluated on the area under the receiver operating characteristic curve (AUROC) as well as accuracy, precision, recall, and F1 score.
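The evaluation procedure described above can be sketched as follows. This is a minimal illustration rather than the authors' pipeline: it assumes scikit-learn, uses synthetic stand-in data (the Korea Health Panel Survey variables are not reproduced here), and makes up its own hyperparameters, stratified folds, and feature scaling. Weighted averaging of precision, recall, and F1 is also an assumption; weighted recall equals accuracy by definition, which is consistent with the reported figures but is not confirmed by the source.

```python
import warnings

from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import RidgeClassifier
from sklearn.model_selection import StratifiedKFold, cross_validate
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

warnings.filterwarnings("ignore", category=ConvergenceWarning)

# Synthetic stand-in data (assumption): roughly 26% positives, echoing the
# reported CHE incidence. The features are NOT the survey variables.
X, y = make_classification(
    n_samples=600,
    n_features=10,
    n_informative=5,
    weights=[0.74, 0.26],
    random_state=0,
)

# The six algorithm families named in the abstract; all settings are
# illustrative defaults, not the authors' configuration.
models = {
    "Random forest": RandomForestClassifier(random_state=0),
    "Gradient boosting": GradientBoostingClassifier(random_state=0),
    "Decision tree": DecisionTreeClassifier(random_state=0),
    "Ridge": make_pipeline(StandardScaler(), RidgeClassifier()),
    "Neural network": make_pipeline(
        StandardScaler(), MLPClassifier(max_iter=500, random_state=0)
    ),
    "AdaBoost": AdaBoostClassifier(random_state=0),
}

# Metrics listed in the abstract; the "_weighted" averaging is an assumption.
SCORING = [
    "roc_auc",
    "accuracy",
    "precision_weighted",
    "recall_weighted",
    "f1_weighted",
]

# Ten folds as in the abstract; stratification and shuffling are assumptions.
cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)

results = {}
for name, model in models.items():
    scores = cross_validate(model, X, y, cv=cv, scoring=SCORING)
    # Average each metric over the ten held-out folds.
    results[name] = {m: scores[f"test_{m}"].mean() for m in SCORING}

# Rank models by mean AUROC, highest first.
for name, summary in sorted(results.items(), key=lambda kv: -kv[1]["roc_auc"]):
    print(name, {m: round(v, 3) for m, v in summary.items()})
```

On synthetic data the ranking carries no meaning; the sketch only shows how the same ten folds and metric set can be applied uniformly to each candidate model before comparing mean scores.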
The incidence of CHE in low-income households was 26.2%. The AdaBoost model showed the best classification performance, with an AUROC of 89.8%, accuracy of 83.1%, precision of 82.4%, recall of 83.1%, and F1 score of 82.1%. Economic activity, chronic disease, and age were significant factors associated with CHEs. Accordingly, individuals who were over 65, had chronic conditions, and were unemployed had the highest likelihood of incurring CHE.
It is essential to identify low-income households at risk of CHEs before they face the economic burden. This research is expected to provide fundamental data that can aid in developing an integrated support program to prevent and manage CHEs more effectively.