AI in Hand Surgery: Assessing Large Language Models in the Classification and Management of Hand Injuries.

Authors

Pressman Sophia M, Borna Sahar, Gomez-Cabello Cesar A, Haider Syed Ali, Forte Antonio Jorge

Affiliations

Division of Plastic Surgery, Mayo Clinic, Jacksonville, FL 32224, USA.

Center for Digital Health, Mayo Clinic, Rochester, MN 55905, USA.

Publication information

J Clin Med. 2024 May 11;13(10):2832. doi: 10.3390/jcm13102832.

DOI:10.3390/jcm13102832

PMID:38792374

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11122623/

Abstract

Background: OpenAI's ChatGPT (San Francisco, CA, USA) and Google's Gemini (Mountain View, CA, USA) are two large language models that show promise in improving and expediting medical decision making in hand surgery. Evaluating the applications of these models within the field of hand surgery is warranted. This study aims to evaluate ChatGPT-4 and Gemini in classifying hand injuries and recommending treatment.

Methods: Gemini and ChatGPT were given 68 fictionalized clinical vignettes of hand injuries twice. The models were asked to use a specific classification system and recommend surgical or nonsurgical treatment. Classifications were scored based on correctness. Results were analyzed using descriptive statistics, a paired two-tailed t-test, and sensitivity testing.

Results: Gemini, correctly classifying 70.6% of hand injuries, demonstrated superior classification ability over ChatGPT (mean score 1.46 vs. 0.87, p-value < 0.001). For management, ChatGPT demonstrated higher sensitivity in recommending surgical intervention compared to Gemini (98.0% vs. 88.8%), but lower specificity (68.4% vs. 94.7%). When compared to ChatGPT, Gemini demonstrated greater response replicability.

Conclusions: Large language models like ChatGPT and Gemini show promise in assisting medical decision making, particularly in hand surgery, with Gemini generally outperforming ChatGPT. These findings emphasize the importance of considering the strengths and limitations of different models when integrating them into clinical practice.
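As a reading aid for the sensitivity and specificity figures above, the following is a minimal sketch of how those two ratios are computed when "surgical" is treated as the positive class. The counts are illustrative assumptions, picked only so that the ratios round to the ChatGPT percentages quoted in the abstract. The abstract does not report the underlying confusion matrix, so these numbers should not be read as the study's data.

```python
def sensitivity(true_pos: int, false_neg: int) -> float:
    """Share of truly surgical cases that were recommended for surgery."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg: int, false_pos: int) -> float:
    """Share of truly nonsurgical cases that were recommended nonsurgical care."""
    return true_neg / (true_neg + false_pos)


# Hypothetical counts (assumption, not taken from the paper).
true_pos, false_neg = 96, 2    # surgical cases: correctly flagged vs. missed
true_neg, false_pos = 26, 12   # nonsurgical cases: correctly spared vs. over-treated

sens = sensitivity(true_pos, false_neg)
spec = specificity(true_neg, false_pos)

print(f"sensitivity = {sens:.1%}")  # sensitivity = 98.0%
print(f"specificity = {spec:.1%}")  # specificity = 68.4%
```

Read this way, a model with high sensitivity and lower specificity rarely misses a case that needs surgery but more often recommends surgery where nonsurgical care would do.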

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/863c/11122623/eab062fa5045/jcm-13-02832-g001.jpg

Similar articles

Gemini AI vs. ChatGPT: A comprehensive examination alongside ophthalmology residents in medical knowledge.

Graefes Arch Clin Exp Ophthalmol. 2025 Feb;263(2):527-536. doi: 10.1007/s00417-024-06625-4. Epub 2024 Sep 15.

Reliability of large language models for advanced head and neck malignancies management: a comparison between ChatGPT 4 and Gemini Advanced.

Eur Arch Otorhinolaryngol. 2024 Sep;281(9):5001-5006. doi: 10.1007/s00405-024-08746-2. Epub 2024 May 25.

Comparison of Gemini Advanced and ChatGPT 4.0's Performances on the Ophthalmology Resident Ophthalmic Knowledge Assessment Program (OKAP) Examination Review Question Banks.

Cureus. 2024 Sep 17;16(9):e69612. doi: 10.7759/cureus.69612. eCollection 2024 Sep.

Large Language Models for Intraoperative Decision Support in Plastic Surgery: A Comparison between ChatGPT-4 and Gemini.

Medicina (Kaunas). 2024 Jun 8;60(6):957. doi: 10.3390/medicina60060957.

Comparative performance of artificial intelligence models in rheumatology board-level questions: evaluating Google Gemini and ChatGPT-4o.

Clin Rheumatol. 2024 Nov;43(11):3507-3513. doi: 10.1007/s10067-024-07154-5. Epub 2024 Sep 28.

Redefining Healthcare With Artificial Intelligence (AI): The Contributions of ChatGPT, Gemini, and Co-pilot.

Cureus. 2024 Apr 7;16(4):e57795. doi: 10.7759/cureus.57795. eCollection 2024 Apr.

Unlocking Health Literacy: The Ultimate Guide to Hypertension Education From ChatGPT Versus Google Gemini.

Cureus. 2024 May 8;16(5):e59898. doi: 10.7759/cureus.59898. eCollection 2024 May.

Comparative Evaluation of AI Models Such as ChatGPT 3.5, ChatGPT 4.0, and Google Gemini in Neuroradiology Diagnostics.

Cureus. 2024 Aug 25;16(8):e67766. doi: 10.7759/cureus.67766. eCollection 2024 Aug.

Comparative Analysis of Large Language Models in Emergency Plastic Surgery Decision-Making: The Role of Physical Exam Data.

J Pers Med. 2024 Jun 8;14(6):612. doi: 10.3390/jpm14060612.

Cited by

Assessment of Recommendations Provided to Athletes Regarding Sleep Education by GPT-4o and Google Gemini: Comparative Evaluation Study.

JMIR Form Res. 2025 Jul 8;9:e71358. doi: 10.2196/71358.

The Role of Artificial Intelligence Large Language Models in Personalized Rehabilitation Programs for Knee Osteoarthritis: An Observational Study.

J Med Syst. 2025 Jun 3;49(1):73. doi: 10.1007/s10916-025-02207-x.

Accuracy of Large Language Models When Answering Clinical Research Questions: Systematic Review and Network Meta-Analysis.

J Med Internet Res. 2025 Apr 30;27:e64486. doi: 10.2196/64486.

The role of artificial intelligence in predicting injured structures based on clinical images of lacerations in the volar aspect of the hand and forearm.

J Hand Microsurg. 2025 Apr 9;17(4):100255. doi: 10.1016/j.jham.2025.100255. eCollection 2025 Jul.

Using Large Language Models to Retrieve Critical Data from Clinical Processes and Business Rules.

Bioengineering (Basel). 2024 Dec 28;12(1):17. doi: 10.3390/bioengineering12010017.

Clinical and Surgical Applications of Large Language Models: A Systematic Review.

J Clin Med. 2024 May 22;13(11):3041. doi: 10.3390/jcm13113041.

References

AI and Ethics: A Systematic Review of the Ethical Considerations of Large Language Model Use in Surgery Research.

Healthcare (Basel). 2024 Apr 13;12(8):825. doi: 10.3390/healthcare12080825.

Large language models as assistance for glaucoma surgical cases: a ChatGPT vs. Google Gemini comparison.

Graefes Arch Clin Exp Ophthalmol. 2024 Sep;262(9):2945-2959. doi: 10.1007/s00417-024-06470-5. Epub 2024 Apr 4.

ChatGPT Earns American Board Certification in Hand Surgery.

Hand Surg Rehabil. 2024 Jun;43(3):101688. doi: 10.1016/j.hansur.2024.101688. Epub 2024 Mar 27.

ChatGPT's Response Consistency: A Study on Repeated Queries of Medical Examination Questions.

Eur J Investig Health Psychol Educ. 2024 Mar 8;14(3):657-668. doi: 10.3390/ejihpe14030043.

Experimenting With the New Frontier: Artificial Intelligence-Powered Chat Bots in Hand Surgery.

Hand (N Y). 2024 Mar 25:15589447241238372. doi: 10.1177/15589447241238372.

Comparison of emergency medicine specialist, cardiologist, and chat-GPT in electrocardiography assessment.

Am J Emerg Med. 2024 Jun;80:51-60. doi: 10.1016/j.ajem.2024.03.017. Epub 2024 Mar 15.

Exploring AI-chatbots' capability to suggest surgical planning in ophthalmology: ChatGPT versus Google Gemini analysis of retinal detachment cases.

Br J Ophthalmol. 2024 Sep 20;108(10):1457-1469. doi: 10.1136/bjo-2023-325143.

Repeatability, reproducibility, and diagnostic accuracy of a commercial large language model (ChatGPT) to perform emergency department triage using the Canadian triage and acuity scale.

CJEM. 2024 Jan;26(1):40-46. doi: 10.1007/s43678-023-00616-w. Epub 2024 Jan 11.

Navigating the Ethical Landmines of ChatGPT: Implications of Intelligent Chatbots in Plastic Surgery Clinical Practice.

Plast Reconstr Surg Glob Open. 2023 Sep 15;11(9):e5290. doi: 10.1097/GOX.0000000000005290. eCollection 2023 Sep.

Can AI Think Like a Plastic Surgeon? Evaluating GPT-4's Clinical Judgment in Reconstructive Procedures of the Upper Extremity.

Plast Reconstr Surg Glob Open. 2023 Dec 13;11(12):e5471. doi: 10.1097/GOX.0000000000005471. eCollection 2023 Dec.
