McCoy Thomas H, Perlis Roy H
Center for Quantitative Health and Department of Psychiatry, Massachusetts General Hospital, Boston, MA, United States.
Department of Psychiatry, Harvard Medical School, Boston, MA, United States.
J Mood Anxiety Disord. 2024 Jul 20;8:100079. doi: 10.1016/j.xjmad.2024.100079. eCollection 2024 Dec.
We sought to characterize the ability of large language models to estimate NIMH Research Domain Criteria (RDoC) dimensions from narrative clinical notes of adult psychiatric inpatients, deriving an estimate of the overall symptom burden in each domain. We extracted consecutive admissions to a psychiatric inpatient unit between December 23, 2009 and September 27, 2015 from the electronic health records of a large academic medical center. Admission and discharge notes were scored with a HIPAA-compliant instance of a large language model (gpt-4-1106-preview). To examine convergent validity, the resulting estimates were correlated with those derived using an earlier method; to examine predictive validity, they were tested for association with length of hospitalization and probability of readmission. The cohort included 3619 individuals, 1779 female (49%) and 1840 male (51%), with a mean age of 44 (SD=16.6). We identified modest correlations between LLM-derived RDoC scores and a previously validated scoring method, with Kendall's tau ranging from 0.07 for the arousal domain to 0.27 for the positive and cognitive domains (p < .001 for all). For admission notes, greater scores on the cognitive, sensorimotor, negative, and social domains were significantly associated with longer hospitalization in linear regression models that included sociodemographic features (p < .01 for all); positive valence was associated with shorter hospitalization (p < .001). For discharge notes, social, arousal, and positive valence scores were associated with likelihood of readmission within 180 days in adjusted logistic regression models (p < .05 for social and arousal, p < .001 for positive valence). Overall, LLM-derived estimates of RDoC psychopathology demonstrated promising convergent and predictive validity, suggesting that this approach may make real-world application of the RDoC framework more feasible.
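As an illustration of the convergent-validity step, the rank correlation between two sets of domain scores can be sketched as below. This is a minimal sketch, not the authors' analysis code: the abstract does not state which variant of Kendall's tau was used, so the tie-corrected tau-b is assumed here, and the function name `kendall_tau_b` and the example scores are hypothetical.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def kendall_tau_b(x, y):
    """Tie-corrected Kendall rank correlation (tau-b) between two score lists.

    Assumption: tau-b is used for illustration; the abstract does not say
    which tau variant the authors computed.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    if n < 2:
        raise ValueError("at least two observations are required")

    concordant = 0
    discordant = 0
    # Compare every pair of observations; pairs tied on x or y count as neither.
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1

    total_pairs = n * (n - 1) // 2
    # Number of pairs tied within x and within y, used for the tie correction.
    tied_x = sum(t * (t - 1) // 2 for t in Counter(x).values())
    tied_y = sum(t * (t - 1) // 2 for t in Counter(y).values())

    denominator = sqrt((total_pairs - tied_x) * (total_pairs - tied_y))
    if denominator == 0:
        raise ValueError("tau-b is undefined when all values in x or y are tied")
    return (concordant - discordant) / denominator


if __name__ == "__main__":
    # Hypothetical scores for one domain: LLM-derived vs. the earlier method.
    llm_scores = [2, 5, 3, 8, 7]
    prior_scores = [1, 4, 4, 6, 9]
    print(kendall_tau_b(llm_scores, prior_scores))
```

The pairwise loop is quadratic in the number of notes, which is fine for a sketch; for a cohort of several thousand admissions, a library routine that also reports a p-value would be the more practical choice.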