Harnessing artificial intelligence in bariatric surgery: comparative analysis of ChatGPT-4, Bing, and Bard in generating clinician-level bariatric surgery recommendations.

Affiliations

Division of General Surgery, McMaster University, Hamilton, Ontario, Canada; Harvard T.H. Chan School of Public Health, Harvard University, Boston, Massachusetts.

Department of Surgery, Brigham and Women's Hospital, Boston, Massachusetts.

Publication information

Surg Obes Relat Dis. 2024 Jul;20(7):603-608. doi: 10.1016/j.soard.2024.03.011. Epub 2024 Mar 24.

Abstract

BACKGROUND

The formulation of clinical recommendations pertaining to bariatric surgery is essential in guiding healthcare professionals. However, the extensive and continuously evolving body of literature in bariatric surgery makes it considerably challenging to stay abreast of the latest developments and to acquire information efficiently. Artificial intelligence (AI) has the potential to streamline access to the salient points of clinical recommendations in bariatric surgery.

OBJECTIVES

The study aims to appraise the quality and readability of AI-chat-generated answers to frequently asked clinical inquiries in the field of bariatric and metabolic surgery.

SETTING

Remote.

METHODS

Question prompts for input into AI large language models (LLMs) were created based on pre-existing clinical practice guidelines regarding bariatric and metabolic surgery. The prompts were submitted to 3 LLMs: OpenAI ChatGPT-4, Microsoft Bing, and Google Bard. The responses from each LLM were entered into a spreadsheet for randomized and blinded duplicate review. Accredited bariatric surgeons in North America independently assessed the appropriateness of each recommendation using a 5-point Likert scale. Scores of 4 and 5 were deemed appropriate, while scores of 1-3 indicated lack of appropriateness. A Flesch Reading Ease (FRE) score was calculated to assess the readability of the responses generated by each LLM.
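The two scoring steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' analysis code: the function names are hypothetical, the FRE calculation uses the standard published formula (206.835 − 1.015 × words per sentence − 84.6 × syllables per word), and the syllable counter is a crude vowel-group heuristic assumed here for demonstration. The abstract does not state which tool the authors used to compute FRE.

```python
import re


def is_appropriate(score: int) -> bool:
    """Dichotomize a 5-point Likert score: 4 and 5 count as appropriate."""
    if not 1 <= score <= 5:
        raise ValueError("Likert score must be between 1 and 5")
    return score >= 4


def proportion_appropriate(scores: list[int]) -> float:
    """Fraction of ratings deemed appropriate (score of 4 or 5)."""
    if not scores:
        raise ValueError("at least one score is required")
    return sum(is_appropriate(s) for s in scores) / len(scores)


def flesch_reading_ease(words: int, sentences: int, syllables: int) -> float:
    """Standard Flesch Reading Ease formula; higher means easier to read."""
    if words <= 0 or sentences <= 0:
        raise ValueError("words and sentences must be positive")
    return 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)


def count_syllables(word: str) -> int:
    """Heuristic (assumption): one syllable per run of vowels, minimum one."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def fre_from_text(text: str) -> float:
    """Approximate FRE for raw text using the heuristic counters above."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return flesch_reading_ease(len(words), len(sentences), syllables)
```

For example, `proportion_appropriate([5, 4, 3, 2, 4])` treats three of the five ratings as appropriate. Dedicated readability tools use more careful syllable and sentence detection, so scores from this heuristic may differ from theirs.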

RESULTS

There was a significant difference between the 3 LLMs in their 5-point Likert scores, with mean values of 4.46 (SD .82), 3.89 (.80), and 3.11 (.72) for ChatGPT-4, Bard, and Bing, respectively (P < .001). There was also a significant difference between the 3 LLMs in the proportion of appropriate answers, with ChatGPT-4 at 85.7%, Bard at 74.3%, and Bing at 25.7% (P < .001). The mean FRE scores for ChatGPT-4, Bard, and Bing were 21.68 (SD 2.78), 42.89 (4.03), and 14.64 (5.09), respectively, with higher scores representing easier readability.

CONCLUSIONS

LLM-based AI chat models can effectively generate appropriate responses to clinical questions related to bariatric surgery, though the performance of different models can vary greatly. Therefore, caution should be taken when interpreting clinical information provided by LLMs, and clinician oversight is necessary to ensure accuracy. Future investigation is warranted to explore how LLMs might enhance healthcare provision and clinical decision-making in bariatric surgery.
