Machine Learning and Natural Language Processing to Improve Classification of Atrial Septal Defects in Electronic Health Records.

Author information

Guo Yuting, Shi Haoming, Book Wendy M, Ivey Lindsey Carrie, Rodriguez Fred H, Sameni Reza, Raskind-Hood Cheryl, Robichaux Chad, Downing Karrie F, Sarker Abeed

Affiliations

Department of Biomedical Informatics, School of Medicine, Emory University, Atlanta, Georgia, USA.

Department of Biomedical Engineering, Georgia Institute of Technology, Atlanta, Georgia, USA.

Publication information

Birth Defects Res. 2025 Mar;117(3):e2451. doi: 10.1002/bdr2.2451.

DOI:10.1002/bdr2.2451

PMID:40035168

Full text link: https://pmc.ncbi.nlm.nih.gov/articles/PMC11955907/

Abstract

BACKGROUND

International Classification of Diseases (ICD) codes can accurately identify patients with certain congenital heart defects (CHDs). In ICD-defined CHD data sets, the code for secundum atrial septal defect (ASD) is the most common, but it has a low positive predictive value for CHD, which can lead to erroneous conclusions being drawn from such data sets. Public health surveillance needs methods that reduce the false positive rate for CHD among individuals captured with the ASD ICD code.

METHODS

We propose a two-level classification system, consisting of a CHD classification model and an ASD classification model, that categorizes cases with an ASD ICD code into three groups: ASD, other CHD, or no CHD (including patent foramen ovale). In the proposed approach, a machine learning model that leverages structured data is combined with a text classification system. We compare the performance of three classification strategies: support vector machines (SVMs) using text-based features, a robustly optimized Transformer-based model (RoBERTa), and a scalable tree boosting system (XGBoost) using non-text-based features.
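The two-level design described above can be illustrated with a minimal Python sketch. It assumes the CHD model runs first and the ASD model runs only on cases flagged as CHD; the abstract does not state this order, nor how the structured-data model and the text classifier are combined, so the routing here is an assumption. The names `classify_case`, `toy_chd_model`, and `toy_asd_model`, and the keyword rules inside them, are hypothetical stand-ins for the trained SVM, RoBERTa, or XGBoost models, not the authors' implementation.

```python
from typing import Callable, Dict

Record = Dict[str, object]
BinaryModel = Callable[[Record], bool]


def classify_case(record: Record, is_chd: BinaryModel, is_asd: BinaryModel) -> str:
    """Route one ASD-coded case through two binary classifiers.

    Assumed order: level 1 decides CHD vs. no CHD; level 2 splits
    CHD cases into ASD vs. other CHD.
    """
    if not is_chd(record):
        return "no CHD"
    return "ASD" if is_asd(record) else "other CHD"


def toy_chd_model(record: Record) -> bool:
    # Hypothetical keyword rule standing in for a trained CHD classifier.
    note = str(record.get("note", "")).lower()
    return any(term in note for term in ("septal defect", "tetralogy"))


def toy_asd_model(record: Record) -> bool:
    # Hypothetical keyword rule standing in for a trained ASD classifier.
    note = str(record.get("note", "")).lower()
    return "secundum" in note


if __name__ == "__main__":
    # Invented example notes, for illustration only.
    cases = [
        {"note": "Secundum atrial septal defect, 8 mm."},
        {"note": "Tetralogy of Fallot, repaired."},
        {"note": "Patent foramen ovale only."},
    ]
    for case in cases:
        print(case["note"], "->", classify_case(case, toy_chd_model, toy_asd_model))
```

Passing the two models as arguments keeps the routing independent of the model type, so any pairing (for example, XGBoost at the CHD level and an SVM at the ASD level) could be swapped in without changing `classify_case`.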

RESULTS

Using SVMs for both CHD and ASD classification gave the best performance for the ASD and no CHD groups, with F scores of 0.53 (±0.05) and 0.78 (±0.02), respectively. Combining XGBoost for CHD classification with an SVM for ASD classification performed best for the other CHD group (F score: 0.39 [±0.03]).
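For readers less familiar with the metrics, the sketch below shows how a per-class F score and a "mean (± spread)" summary could be computed. It assumes the F score is the balanced F1 (the harmonic mean of positive predictive value and recall) and that the ± values are standard deviations over repeated runs; the abstract states neither, so both are assumptions. The function names and all counts are hypothetical and do not reproduce the study's data.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def ppv(tp: int, fp: int) -> float:
    """Positive predictive value (precision): TP / (TP + FP)."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp: int, fn: int) -> float:
    """Recall (sensitivity): TP / (TP + FN)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f_score(tp: int, fp: int, fn: int) -> float:
    """Balanced F score (F1), assumed here to be the reported metric."""
    p, r = ppv(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r) if (p + r) else 0.0


def summarize(scores: Sequence[float]) -> Tuple[float, float]:
    """Mean and sample standard deviation over repeated runs (assumed)."""
    return mean(scores), stdev(scores)


if __name__ == "__main__":
    # Hypothetical (TP, FP, FN) counts for one class across three runs.
    runs = [(8, 2, 2), (7, 3, 3), (9, 1, 4)]
    per_run = [f_score(tp, fp, fn) for tp, fp, fn in runs]
    avg, sd = summarize(per_run)
    print(f"F score: {avg:.2f} (±{sd:.2f})")
```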

CONCLUSIONS

This study demonstrates that patients' clinical notes and machine learning can be used to classify cases at a finer granularity than ICD codes alone, in particular with a higher positive predictive value for CHD. The proposed approach can improve CHD surveillance.
