Yokoe Takuji, Roversi Giulia, Sevivas Nuno, Kamei Naosuke, Diniz Pedro, Pereira Hélder
Orthopaedic Department, Centro Hospitalar Póvoa de Varzim, Vila do Conde, Portugal.
Division of Orthopaedic Surgery, Department of Medicine of Sensory and Motor Organs, Faculty of Medicine, University of Miyazaki, Miyazaki, Japan.
J Exp Orthop. 2025 Aug 5;12(3):e70393. doi: 10.1002/jeo2.70393. eCollection 2025 Jul.
To evaluate the accuracy of ChatGPT-4's answers to clinical questions on the surgical treatment of chronic lateral ankle instability (CLAI), using the consensus statements developed by the ESSKA-AFAS Ankle Instability Group (AIG) as the reference standard. This study simulated a clinical setting in which non-expert clinicians treat patients with CLAI.
The large language model (LLM) ChatGPT-4 was used on 10 February 2025 to answer 17 questions on the surgical management of CLAI that were developed by the ESSKA-AFAS AIG. The responses were compared with the consensus statements developed by the ESSKA-AFAS AIG, and the consistency and accuracy of ChatGPT's answers were evaluated against the experts' answers. Consistency with the consensus statements was assessed with the question, 'Is the answer by ChatGPT in agreement with those by the experts? (Yes or No)'. Four scoring categories were used to evaluate the quality of ChatGPT's answers: Accuracy, Overconclusiveness (a recommendation proposed despite the lack of consensus), Supplementary (additional information not covered by the consensus statement), and Incompleteness.
Of the 17 questions on the surgical management of CLAI, 11 answers (64.7%) were in agreement with the experts' consensus statements. The percentages of ChatGPT's answers rated 'Yes' for Accuracy and Supplementary were 64.7% (11/17) and 70.6% (12/17), respectively, and the percentages rated 'No' for Overconclusiveness and Incompleteness were 76.5% (13/17) and 88.2% (15/17), respectively.
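To illustrate the simple tallying behind these rates, the following is a minimal Python sketch, not part of the study: the `ratings` structure and `rate` helper are hypothetical stand-ins for per-question Yes/No ratings, shown only to make the arithmetic (e.g., 11/17 = 64.7%) explicit.

```python
# Hypothetical per-question ratings (question id -> category -> "Yes"/"No").
# The entries below are placeholders, not the study's actual data.
ratings = {
    1: {"Agreement": "Yes", "Accuracy": "Yes", "Overconclusiveness": "No",
        "Supplementary": "Yes", "Incompleteness": "No"},
    # ... entries for the remaining 16 questions ...
}

def rate(category: str, value: str) -> float:
    """Percentage of rated questions whose rating for `category` equals `value`."""
    hits = sum(1 for r in ratings.values() if r.get(category) == value)
    return 100.0 * hits / len(ratings) if ratings else 0.0

# With the study's counts, rate("Accuracy", "Yes") would yield
# 11/17 = 64.7%, and rate("Incompleteness", "No") 15/17 = 88.2%.
print(f'Accuracy "Yes": {rate("Accuracy", "Yes"):.1f}%')
```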
The present study showed that ChatGPT-4 could not answer queries on the surgical management of CLAI at the level of foot and ankle experts. However, ChatGPT showed promising potential for application in the management of patients with CLAI.
Level IV.