Vest Joshua R, Wu Wei, Gregory Megan E, Kasturi Suranga N, Mendonca Eneida A, Bian Jiang, Magoc Tanja, Grannis Shaun, McNamee Cassidy, Harle Christopher A
Department of Health Policy & Management, Indiana University Richard M. Fairbanks School of Public Health, Indianapolis.
Center for Biomedical Informatics, Regenstrief Institute, Indianapolis, Indiana.
JAMA Netw Open. 2025 Aug 1;8(8):e2527426. doi: 10.1001/jamanetworkopen.2025.27426.
IMPORTANCE: Organizations use health-related social needs (HRSN) information to identify patients in need of referrals, to increase clinician awareness, to improve analytics, and for quality reporting.
OBJECTIVE: To contrast the performance of screening questionnaires, natural language processing (NLP) of clinical notes, rule-based computable phenotypes, and machine learning (ML) classification models in measuring HRSNs.
DESIGN, SETTING, AND PARTICIPANTS: This cross-sectional study assessed 4 measurement approaches for 5 HRSNs in parallel. Each approach was treated as a screening test. Data included notes from adult patients treated at primary care clinics in 2 health systems in Indianapolis, Indiana, from January 2022 to June 2023. Data were analyzed from December 2024 to February 2025.
EXPOSURES: Reference standard instruments measured food insecurity, housing instability, financial strain, transportation barriers, and history of legal problems. Participants completed the HRSN screening questions in the electronic health record (EHR). NLP algorithms, gradient-boosted decision tree ML classifiers, and refined versions of human-defined rule-based computable phenotypes were applied to participants' EHR data from the preceding 12 months.
MAIN OUTCOMES AND MEASURES: Sensitivity, specificity, area under the curve (AUC), and positive predictive value (PPV) described the performance of each approach against the reference standard measures. False-negative rates were used to explore fairness.
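For readers less familiar with these screening-test metrics, the Python sketch below shows how each one follows from a 2×2 cross-tabulation of a single binary approach against the reference standard. The names (`TwoByTwo`, `tabulate`, `screening_metrics`) and the toy data are hypothetical, and the abstract does not state how AUC was computed for each approach. The `auc_binary` entry covers only the special case of a yes/no test, such as a screening question, where the ROC curve has a single operating point; a continuous ML score would instead need its threshold swept.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class TwoByTwo:
    tp: int  # approach positive, reference standard positive
    fp: int  # approach positive, reference standard negative
    fn: int  # approach negative, reference standard positive
    tn: int  # approach negative, reference standard negative


def tabulate(test: Sequence[int], reference: Sequence[int]) -> TwoByTwo:
    """Cross-tabulate one binary approach (1 = need flagged) against the reference."""
    if len(test) != len(reference):
        raise ValueError("test and reference must have the same length")
    pairs = list(zip(test, reference))
    return TwoByTwo(
        tp=sum(1 for t, r in pairs if t and r),
        fp=sum(1 for t, r in pairs if t and not r),
        fn=sum(1 for t, r in pairs if not t and r),
        tn=sum(1 for t, r in pairs if not t and not r),
    )


def screening_metrics(c: TwoByTwo) -> dict[str, float]:
    """Screening-test metrics; assumes no denominator is zero."""
    sensitivity = c.tp / (c.tp + c.fn)
    specificity = c.tn / (c.tn + c.fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": c.tp / (c.tp + c.fp),
        "fnr": c.fn / (c.tp + c.fn),  # false-negative rate = 1 - sensitivity
        # A yes/no test yields one ROC operating point, so the trapezoidal
        # AUC through (0, 0), (1 - spec, sens), (1, 1) is (sens + spec) / 2.
        "auc_binary": (sensitivity + specificity) / 2,
    }


# Hypothetical toy data (1 = need present); not study data.
REFERENCE = [1, 1, 0, 1, 1, 0, 0, 1, 0, 0]
APPROACH = [1, 1, 1, 0, 0, 0, 0, 1, 0, 0]
metrics = screening_metrics(tabulate(APPROACH, REFERENCE))
print({name: round(value, 3) for name, value in metrics.items()})
```

Unlike sensitivity and specificity, PPV depends on how common the need is, so the same approach can have a different PPV in a population with a different prevalence.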
RESULTS: Data from a total of 1252 adult patients (407 [32.51%] aged 30 to 49 years; 821 [65.58%] female) were assessed, including 94 (7.51%) who identified as Hispanic, 602 (48.08%) as non-Hispanic Black or African American, and 442 (35.30%) as non-Hispanic White. The screening questions method had the strongest overall performance for food insecurity (AUC, 0.94; 95% CI, 0.93-0.95), housing instability (AUC, 0.78; 95% CI, 0.75-0.80), transportation barriers (AUC, 0.77; 95% CI, 0.74-0.79), and legal problems (AUC, 0.81; 95% CI, 0.77-0.85). The screening questions had poor performance for financial strain (AUC, 0.62; 95% CI, 0.60-0.65). The PPV for screening tools ranged from 0.77 to 0.92, indicating utility for individual-level decision-making. NLP and rule-based computable phenotypes had poor performance. ML classification resulted in higher sensitivities than the other methods. False-negative rates indicated differential, unfair performance for all measurement approaches by gender, race and ethnicity, and age groups.
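The fairness finding rests on comparing false-negative rates across demographic groups, that is, how often each approach misses a need the reference standard detects. A minimal sketch of such a comparison, assuming binary flags and a single grouping variable, is shown below in Python. The function names, the max-minus-min gap summary, and the toy groups are hypothetical choices for illustration, not the study's reported method.

```python
from collections import defaultdict
from typing import Sequence


def fnr_by_group(
    test: Sequence[int], reference: Sequence[int], group: Sequence[str]
) -> dict[str, float]:
    """False-negative rate per subgroup, among reference-positive patients only.

    Groups with no reference-positive patients are left out, since their
    false-negative rate is undefined.
    """
    if not len(test) == len(reference) == len(group):
        raise ValueError("all inputs must have the same length")
    positives: dict[str, int] = defaultdict(int)  # patients with the need, per group
    missed: dict[str, int] = defaultdict(int)  # of those, how many were not flagged
    for t, r, g in zip(test, reference, group):
        if r:
            positives[g] += 1
            if not t:
                missed[g] += 1
    return {g: missed[g] / n for g, n in positives.items()}


def max_fnr_gap(rates: dict[str, float]) -> float:
    """Largest difference in false-negative rate between any two groups."""
    if len(rates) < 2:
        return 0.0
    return max(rates.values()) - min(rates.values())


# Hypothetical toy data for two illustrative groups; not study data.
REFERENCE = [1, 1, 1, 0, 1, 1, 1, 0]
APPROACH = [1, 0, 1, 1, 0, 0, 1, 0]
GROUP = ["A", "A", "A", "A", "B", "B", "B", "B"]
rates = fnr_by_group(APPROACH, REFERENCE, GROUP)
print(rates, max_fnr_gap(rates))
```

Restricting the denominator to reference-positive patients means the comparison asks only whether patients who truly have a need are missed at different rates, which is the harm a false negative represents here.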
CONCLUSIONS AND RELEVANCE: In this cross-sectional study of HRSN measurement, no approach performed strongly for every HRSN, and every approach showed evidence of unfair performance. These findings suggest that practitioners, health care and public health organizations, researchers, and policymakers who rely on a single method to collect HRSN data will likely underestimate patients' true social burden.
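The conclusion reflects a basic property of screening: a patient missed by one approach may still be caught by another. As an illustration only (the abstract does not report evaluating combined approaches), the Python sketch below applies a hypothetical OR-rule to two made-up binary flags. Because the union flags every patient that either approach flags, its sensitivity can never fall below that of any single component, although its specificity can drop. All names and data are hypothetical.

```python
from typing import Sequence


def sensitivity(test: Sequence[int], reference: Sequence[int]) -> float:
    """Share of reference-positive patients that the test flags."""
    if len(test) != len(reference):
        raise ValueError("test and reference must have the same length")
    flags_among_positives = [t for t, r in zip(test, reference) if r]
    if not flags_among_positives:
        raise ValueError("reference standard has no positive patients")
    return sum(1 for t in flags_among_positives if t) / len(flags_among_positives)


def union_flags(*approaches: Sequence[int]) -> list[int]:
    """OR-rule: flag a patient if any approach flags them."""
    if len({len(a) for a in approaches}) > 1:
        raise ValueError("all approaches must cover the same patients")
    return [int(any(flags)) for flags in zip(*approaches)]


# Hypothetical toy data for one HRSN; not study data.
REFERENCE = [1, 1, 1, 1, 0, 0]
QUESTIONNAIRE = [1, 0, 1, 0, 0, 0]
CLASSIFIER = [0, 1, 1, 0, 1, 0]
combined = union_flags(QUESTIONNAIRE, CLASSIFIER)
for name, flags in [
    ("questionnaire", QUESTIONNAIRE),
    ("classifier", CLASSIFIER),
    ("union", combined),
]:
    print(name, sensitivity(flags, REFERENCE))
```

In this toy case the union also flags the fifth patient, whom the reference standard marks negative, so the gain in sensitivity comes with a false positive that neither the questionnaire alone would have produced.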