AI-driven evidence synthesis: data extraction of randomized controlled trials with large language models.

Authors

Liu Jiayi, Lai Honghao, Zhao Weilong, Huang Jiajie, Xia Danni, Liu Hui, Luo Xufei, Wang Bingyi, Pan Bei, Hou Liangying, Chen Yaolong, Ge Long

Affiliations

Department of Health Policy and Health Management, School of Public Health, Lanzhou University, Lanzhou, China.

Evidence-Based Social Science Research Center, School of Public Health, Lanzhou University, Lanzhou, China.

Publication information

Int J Surg. 2025 Mar 1;111(3):2722-2726. doi: 10.1097/JS9.0000000000002215.

DOI: 10.1097/JS9.0000000000002215

PMID: 39903558

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12372713/

Abstract

The advancement of large language models (LLMs) presents promising opportunities to enhance evidence synthesis efficiency, particularly in data extraction processes, yet existing prompts for data extraction remain limited, focusing primarily on commonly used items without accommodating diverse extraction needs. This research letter developed structured prompts for LLMs and evaluated their feasibility in extracting data from randomized controlled trials (RCTs). Using Claude (Claude-2) as the platform, we designed comprehensive structured prompts comprising 58 items across six Cochrane Handbook domains and tested them on 10 randomly selected RCTs from published Cochrane reviews. The results demonstrated high accuracy with an overall correct rate of 94.77% (95% CI: 93.66% to 95.73%), with domain-specific performance ranging from 77.97% to 100%. The extraction process proved efficient, requiring only 88 seconds per RCT. These findings substantiate the feasibility and potential value of LLMs in evidence synthesis when guided by structured prompts, marking a significant advancement in systematic review methodology.
