

AI in Hand Surgery: Assessing Large Language Models in the Classification and Management of Hand Injuries.

Authors

Pressman Sophia M, Borna Sahar, Gomez-Cabello Cesar A, Haider Syed Ali, Forte Antonio Jorge

Affiliations

Division of Plastic Surgery, Mayo Clinic, Jacksonville, FL 32224, USA.

Center for Digital Health, Mayo Clinic, Rochester, MN 55905, USA.

Publication

J Clin Med. 2024 May 11;13(10):2832. doi: 10.3390/jcm13102832.

Abstract

Background: OpenAI's ChatGPT (San Francisco, CA, USA) and Google's Gemini (Mountain View, CA, USA) are two large language models that show promise in improving and expediting medical decision making in hand surgery. Evaluating the applications of these models within the field of hand surgery is warranted. This study aims to evaluate ChatGPT-4 and Gemini in classifying hand injuries and recommending treatment. Methods: Gemini and ChatGPT were each given 68 fictionalized clinical vignettes of hand injuries, twice. The models were asked to use a specific classification system and recommend surgical or nonsurgical treatment. Classifications were scored based on correctness. Results were analyzed using descriptive statistics, a paired two-tailed t-test, and sensitivity testing. Results: Gemini, correctly classifying 70.6% of hand injuries, demonstrated superior classification ability over ChatGPT (mean score 1.46 vs. 0.87, p-value < 0.001). For management, ChatGPT demonstrated higher sensitivity in recommending surgical intervention compared to Gemini (98.0% vs. 88.8%), but lower specificity (68.4% vs. 94.7%). When compared to ChatGPT, Gemini demonstrated greater response replicability. Conclusions: Large language models like ChatGPT and Gemini show promise in assisting medical decision making, particularly in hand surgery, with Gemini generally outperforming ChatGPT. These findings emphasize the importance of considering the strengths and limitations of different models when integrating them into clinical practice.
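
For readers less familiar with the management metrics, the sketch below shows how sensitivity and specificity for a "recommend surgery" decision are conventionally computed from a 2x2 confusion table. The abstract does not report the underlying counts. The counts used here are hypothetical, chosen only because they sum to 136 responses (68 vignettes, each given twice) and reproduce the reported percentages. They should not be read as the study's actual data.

```python
def sensitivity(tp: int, fn: int) -> float:
    """Share of truly surgical cases for which surgery was recommended."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Share of truly nonsurgical cases for which nonsurgical care was recommended."""
    return tn / (tn + fp)


# HYPOTHETICAL counts (assumption, not taken from the paper):
# 98 surgical and 38 nonsurgical responses per model, 136 in total.
counts = {
    "ChatGPT": {"tp": 96, "fn": 2, "tn": 26, "fp": 12},
    "Gemini": {"tp": 87, "fn": 11, "tn": 36, "fp": 2},
}

for model, c in counts.items():
    sens = 100 * sensitivity(c["tp"], c["fn"])
    spec = 100 * specificity(c["tn"], c["fp"])
    print(f"{model}: sensitivity {sens:.1f}%, specificity {spec:.1f}%")
```

Under these assumed counts, the trade-off described in the abstract becomes concrete: a model that recommends surgery more readily misses fewer surgical cases (higher sensitivity) but also recommends surgery for more nonsurgical cases (lower specificity).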


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/863c/11122623/eab062fa5045/jcm-13-02832-g001.jpg
