Evaluating Hospital Course Summarization by an Electronic Health Record-Based Large Language Model.

Authors

Small William R, Austrian Jonathan, O'Donnell Luke, Burk-Rafel Jesse, Hochman Katherine A, Goodman Adam, Zaretsky Jonah, Martin Jacob, Johnson Stephen, Major Vincent J, Jones Simon, Henke Christian, Verplanke Benjamin, Osso Jwan, Larson Ian, Saxena Archana, Mednick Aron, Simonis Choumika, Han Joseph, Kesari Ravi, Wu Xinyuan, Heery Lauren, Desel Tenzin, Baskharoun Samuel, Figman Noah, Farooq Umar, Shah Kunal, Jahan Nusrat, Kim Jeong Min, Testa Paul, Feldman Jonah

Affiliations

Department of Health Informatics, New York University Langone Medical Center Information Technology.

Department of Medicine, New York University Grossman School of Medicine.

Publication

JAMA Netw Open. 2025 Aug 1;8(8):e2526339. doi: 10.1001/jamanetworkopen.2025.26339.

Abstract

IMPORTANCE

Hospital course (HC) summarization represents an increasingly onerous discharge summary component for physicians. Literature supports large language models (LLMs) for HC summarization, but whether physicians can effectively partner with electronic health record-embedded LLMs to draft HCs is unknown.

OBJECTIVE

To compare the editing effort required by time-constrained resident physicians to improve LLM- vs physician-generated HCs toward a novel 4Cs (complete, concise, cohesive, and confabulation-free) HC.

DESIGN, SETTING, AND PARTICIPANTS

Quality improvement study using a convenience sample of 10 internal medicine resident editors, 8 hospitalist evaluators, and randomly selected general medicine admissions in December 2023 lasting 4 to 8 days at New York University Langone Health.

EXPOSURES

Residents and hospitalists reviewed randomly assigned patient medical records for 10 minutes. Residents, blinded to author type, edited each HC pair (physician and LLM) for quality in 3 minutes, followed by comparative ratings by attending hospitalists.

MAIN OUTCOMES AND MEASURES

Editing effort was quantified by analyzing the edits that occurred on the HC pairs after controlling for length (percentage edited) and the degree to which the original HCs' meaning was altered (semantic change). Hospitalists compared edited HC pairs with A/B testing on the 4Cs (5-point Likert scales converted to 10-point bidirectional scales).
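As a rough illustration of how a length-controlled "percentage edited" metric might be computed (the abstract does not specify the exact method; this sketch assumes a word-token alignment using Python's difflib):

```python
import difflib

def percent_edited(original: str, edited: str) -> float:
    """Share of the original's word tokens not preserved in the edited
    version, via difflib's longest-matching-blocks alignment.
    0.0 = untouched, 100.0 = fully rewritten."""
    a, b = original.split(), edited.split()
    matcher = difflib.SequenceMatcher(a=a, b=b)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return 100.0 * (1.0 - matched / max(len(a), 1))

# Lightly edited sentence: 2 of 10 original word tokens are touched
before = "Patient admitted with chest pain and ruled out for MI"
after = "Patient admitted with chest pain, ruled out for MI"
print(round(percent_edited(before, after), 1))  # 20.0
```

Normalizing by the original's length lets edit burden be compared across HCs of different sizes, which is the point of "controlling for length" above.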

RESULTS

Among 100 admissions, compared with physician HCs, residents edited a smaller percentage of LLM HCs (LLM mean [SD], 31.5% [16.6%] vs physicians, 44.8% [20.0%]; P < .001). Additionally, LLM HCs required less semantic change (LLM mean [SD], 2.4% [1.6%] vs physicians, 4.9% [3.5%]; P < .001). Attending physicians deemed LLM HCs to be more complete (mean [SD] difference LLM vs physicians on 10-point bidirectional scale, 3.00 [5.28]; P < .001), similarly concise (mean [SD], -1.02 [6.08]; P = .20), and cohesive (mean [SD], 0.70 [6.14]; P = .60), but with more confabulations (mean [SD], -0.98 [3.53]; P = .002). The composite scores were similar (mean [SD] difference LLM vs physician on 40-point bidirectional scale, 1.70 [14.24]; P = .46).
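A quick consistency check on the reported figures: assuming the 40-point composite difference is the sum of the four 10-point subscale differences (an inference from the scale arithmetic, not stated explicitly above), the numbers line up:

```python
# Reported mean differences (LLM minus physician) on the four 10-point subscales
complete, concise, cohesive, confabulation_free = 3.00, -1.02, 0.70, -0.98

# Summing the subscales reproduces the reported 40-point composite difference
composite = complete + concise + cohesive + confabulation_free
print(round(composite, 2))  # 1.7
```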

CONCLUSIONS AND RELEVANCE

Electronic health record-embedded LLM HCs required less editing than physician-generated HCs to approach a quality standard, resulting in HCs that were comparably or more complete, concise, and cohesive, but contained more confabulations. Despite the potential influence of artificial time constraints, this study supports the feasibility of a physician-LLM partnership for writing HCs and provides a basis for monitoring LLM HCs in clinical practice.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f4ba/12351420/5aefe712c750/jamanetwopen-e2526339-g001.jpg
