Department of Population Health Sciences, Weill Cornell Medicine, New York, New York, USA.
Information Technologies and Services, Weill Cornell Medicine, New York, New York, USA.
J Am Med Inform Assoc. 2021 Nov 25;28(12):2716-2727. doi: 10.1093/jamia/ocab170.
Social determinants of health (SDoH) are nonclinical factors that impact patient health risks and clinical outcomes. Leveraging SDoH in clinical decision-making can potentially improve diagnosis, treatment planning, and patient outcomes. Despite increased interest in capturing SDoH in electronic health records (EHRs), such information is typically locked in unstructured clinical notes. Natural language processing (NLP) is the key technology to extract SDoH information from clinical text and expand its utility in patient care and research. This article presents a systematic review of the state-of-the-art NLP approaches and tools that focus on identifying and extracting SDoH data from unstructured clinical text in EHRs.
A broad literature search was conducted in February 2021 using 3 scholarly databases (ACL Anthology, PubMed, and Scopus) following Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines. A total of 6402 publications were initially identified, and after applying the study inclusion criteria, 82 publications were selected for the final review.
Smoking status (n = 27), substance use (n = 21), homelessness (n = 20), and alcohol use (n = 15) are the most frequently studied SDoH categories. Homelessness (n = 7) and other less-studied SDoH (eg, education, financial problems, social isolation and support, family problems) are mostly identified using rule-based approaches. In contrast, machine learning approaches are popular for identifying smoking status (n = 13), substance use (n = 9), and alcohol use (n = 9).
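To make the idea of a rule-based approach concrete, the following is a minimal sketch of keyword matching over a clinical note. It is not drawn from any of the reviewed studies: the pattern dictionary `SDOH_PATTERNS`, the function `find_sdoh_mentions`, and the example note are all hypothetical, and the keyword lists are illustrative assumptions rather than validated lexicons.

```python
import re

# Hypothetical keyword patterns for three SDoH categories.
# These lists are illustrative assumptions, not validated lexicons.
SDOH_PATTERNS = {
    "smoking_status": re.compile(
        r"\b(smok(?:es|er|ing|ed)|tobacco|cigarettes?)\b", re.IGNORECASE
    ),
    "alcohol_use": re.compile(r"\b(alcohol|etoh)\b", re.IGNORECASE),
    "homelessness": re.compile(
        r"\b(homeless(?:ness)?|unstable housing)\b", re.IGNORECASE
    ),
}


def find_sdoh_mentions(note):
    """Return a dict mapping each matched category to its matched strings.

    This only detects mentions; it does not handle negation
    ("denies smoking") or temporality ("former smoker").
    """
    mentions = {}
    for category, pattern in SDOH_PATTERNS.items():
        hits = [match.group(0) for match in pattern.finditer(note)]
        if hits:
            mentions[category] = hits
    return mentions


# Made-up example note, for illustration only.
note = "Patient is currently homeless and reports smoking one pack per day."
print(find_sdoh_mentions(note))
```

A real rule-based system would typically add negation and context handling on top of such patterns, whereas a machine learning approach would instead learn these cues from annotated notes.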
NLP offers significant potential to extract SDoH data from narrative clinical notes, which in turn can aid in the development of screening tools, risk prediction models, and clinical decision support systems.