Zhu XingCe, Dai Wei, Evans Richard, Geng Xueyu, Mu Aruhan, Liu Zhiyong
School of Medicine and Health Management, Tongji Medical College, Huazhong University of Science and Technology, Wuhan, China.
Faculty of Computer Science, Dalhousie University, Halifax, NS, Canada.
JMIR Med Inform. 2025 Aug 7;13:e76636. doi: 10.2196/76636.
Stroke has a major impact on global health, causing long-term disability and straining health care resources. Generative large language models (gLLMs) have emerged as promising tools to help address these challenges, but their applications and reported performance in stroke care require comprehensive mapping and synthesis.
The aim of this scoping review was to consolidate a fragmented evidence base and examine the current landscape, shortcomings, and future directions in the design, reporting, and evaluation of gLLM-based interventions in stroke care.
This scoping review followed the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews) guidelines and the Population, Concept, and Context (PCC) framework. We searched 6 major scientific databases in December 2024 for gLLM-based interventions across the stroke care pathway and mapped their key characteristics and outcomes.
A total of 25 studies met the predefined eligibility criteria and were included in the analysis. Retrospective designs predominated (n=16, 64%). Key applications of gLLMs included clinical decision-making support (n=10, 40%), administrative assistance (n=9, 36%), direct patient interaction (n=5, 20%), and automated literature review (n=1, 4%). Implementations mainly used off-the-shelf generative pretrained transformer models accessed through task-prompted chat interfaces. The included studies reported 5 key challenges in implementing gLLM-based interventions: ensuring factual alignment, maintaining system robustness, enhancing interpretability, optimizing efficiency, and facilitating clinical adoption.
The application of gLLMs in stroke care is promising but still new, and most interventions reflect early-stage or relatively simple implementations. Critical gaps in research and clinical translation persist. To support the development of clinically impactful and trustworthy applications, we propose an actionable framework that prioritizes real-world evidence, mandates transparent technical reporting, broadens evaluation beyond output accuracy, strengthens validation of advanced task adaptation strategies, and investigates mechanisms for safe and effective human-gLLM interaction.