Marelli Ariane J, Li Chao, Liu Aihua, Nguyen Hanh, Moroz Harry, Brophy James M, Guo Liming, Buckeridge David L, Tang Jian, Yang Archer Y, Li Yue
McGill University Health Centre, McGill Adult Unit for Congenital Heart Disease Excellence, Montreal, Québec, Canada.
Department of Epidemiology, Biostatistics, and Occupational Health, McGill University, Montreal, Québec, Canada.
JACC Adv. 2023 Dec 25;3(2):100801. doi: 10.1016/j.jacadv.2023.100801. eCollection 2024 Feb.
As interest grows in using large claims databases for medical practice and research, efficiently identifying patients with the disease of interest is an essential step.
This study aims to establish a machine learning (ML) approach to identify patients with congenital heart disease (CHD) in large claims databases.
We used data from the Quebec claims and hospitalization databases from 1983 to 2000. The study included 19,187 patients, of whom 3,784 were labeled as true CHD patients using a clinician-developed algorithm with manual audits, which served as the gold standard. To build an accurate, automated ML-based CHD classification system, we evaluated ML methods including Gradient Boosting Decision Tree, Support Vector Machine, and Decision Tree, and compared them with regularized logistic regression. The Area Under the Precision-Recall Curve was used as the evaluation metric. External validation was conducted on a data set updated to 2010 that contained different subjects.
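As an illustration of the evaluation metric named above, the following is a minimal sketch of one common estimator of the Area Under the Precision-Recall Curve, namely average precision computed from ranked scores. The function name `average_precision` and the toy labels and scores are hypothetical and are not taken from the study; the abstract does not state which estimator or software the authors used.

```python
from typing import Sequence


def average_precision(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Average precision: mean of precision values at the rank of each true positive.

    Assumption: labels are 0/1 and tied scores are broken by input order.
    """
    if len(y_true) != len(scores):
        raise ValueError("y_true and scores must have the same length")
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("at least one positive label is required")

    # Rank cases from highest to lowest predicted score.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    true_positives = 0
    precision_sum = 0.0
    for rank, i in enumerate(order, start=1):
        if y_true[i] == 1:
            true_positives += 1
            # Precision among the top `rank` cases.
            precision_sum += true_positives / rank
    return precision_sum / n_pos


# Hypothetical toy data: 1 = CHD, 0 = non-CHD.
toy_labels = [1, 0, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.3, 0.1]
toy_ap = average_precision(toy_labels, toy_scores)
print(f"average precision: {toy_ap:.4f}")
```

Because precision is computed only over cases ranked as likely positives, this metric is less flattered by a large pool of true negatives than accuracy would be, which may be relevant here since 3,784 of 19,187 patients (about 20%) were labeled as CHD.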
Among the ML methods evaluated, Gradient Boosting Decision Tree performed best in identifying true CHD patients, with an Area Under the Precision-Recall Curve of 99.3%, a sensitivity of 98.0%, and a specificity of 99.7%. External validation yielded similar model performance.
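To show how sensitivity and specificity relate to the share of flagged patients who truly have CHD, the sketch below derives the positive predictive value they would imply. It assumes, purely for illustration, that the reported sensitivity and specificity apply to a cohort with the study's overall case mix (3,784 CHD patients among 19,187); the abstract does not report a positive predictive value, and the helper name `implied_ppv` is hypothetical.

```python
def implied_ppv(sensitivity: float, specificity: float, n_pos: int, n_neg: int) -> float:
    """Positive predictive value implied by sensitivity, specificity, and class counts."""
    expected_true_pos = sensitivity * n_pos            # CHD patients correctly flagged
    expected_false_pos = (1.0 - specificity) * n_neg   # non-CHD patients wrongly flagged
    return expected_true_pos / (expected_true_pos + expected_false_pos)


# Counts from the abstract; applying the reported rates to them is an assumption.
n_total = 19187
n_chd = 3784
n_non_chd = n_total - n_chd

ppv = implied_ppv(sensitivity=0.980, specificity=0.997, n_pos=n_chd, n_neg=n_non_chd)
print(f"implied PPV: {ppv:.3f}")
```

Under these assumptions, roughly 99 of every 100 patients flagged by the classifier would be true CHD cases; the figure would differ for a cohort with a different proportion of CHD patients.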
This study shows that the tedious and time-consuming clinical inspection used to identify CHD patients in large claims databases can be replaced by an extremely efficient ML algorithm. Our findings demonstrate that ML methods can automate complicated algorithms for identifying patients with complex diseases.