Yokoe Takuji, Roversi Giulia, Sevivas Nuno, Kamei Naosuke, Diniz Pedro, Pereira Hélder
Orthopaedic Department, Centro Hospitalar Póvoa de Varzim, Vila do Conde, Portugal.
Division of Orthopaedic Surgery, Department of Medicine of Sensory and Motor Organs, Faculty of Medicine, University of Miyazaki, Miyazaki, Japan.
J Exp Orthop. 2025 Aug 5;12(3):e70393. doi: 10.1002/jeo2.70393. eCollection 2025 Jul.
To evaluate the accuracy of ChatGPT-4's answers to clinical questions on the surgical treatment of chronic lateral ankle instability (CLAI), using the consensus statements developed by the ESSKA-AFAS Ankle Instability Group (AIG) as the reference standard. This study simulated a clinical setting in which non-expert clinicians treat patients with CLAI.
The large language model (LLM) ChatGPT-4 was used on 10 February 2025 to answer 17 questions on the surgical management of CLAI that were developed by the ESSKA-AFAS AIG. The responses were compared with the consensus statements developed by the ESSKA-AFAS AIG, and the consistency and accuracy of ChatGPT's answers were evaluated against the experts' answers. Consistency with the consensus statements was assessed with the question, 'Is the answer by ChatGPT in agreement with those by the experts? (Yes or No)'. Four scoring categories were used to evaluate the quality of ChatGPT's answers: Accuracy, Overconclusiveness (a recommendation proposed despite the lack of consensus), Supplementary (additional information not covered by the consensus statement), and Incompleteness.
Of the 17 questions on the surgical management of CLAI, 11 answers (64.7%) were in agreement with the experts' consensus statements. The percentages of ChatGPT's answers rated 'Yes' for Accuracy and Supplementary were 64.7% (11/17) and 70.6% (12/17), respectively, and the percentages rated 'No' for Overconclusiveness and Incompleteness were 76.5% (13/17) and 88.2% (15/17), respectively.
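To illustrate the simple tallying behind these rates, the following is a minimal Python sketch, not part of the study: the `ratings` structure and `rate` helper are hypothetical stand-ins for per-question Yes/No ratings, shown only to make the arithmetic (e.g., 11/17 = 64.7%) explicit.

```python
# Hypothetical per-question ratings (question id -> category -> "Yes"/"No").
# The entries below are placeholders, not the study's actual data.
ratings = {
    1: {"Agreement": "Yes", "Accuracy": "Yes", "Overconclusiveness": "No",
        "Supplementary": "Yes", "Incompleteness": "No"},
    # ... entries for the remaining 16 questions ...
}

def rate(category: str, value: str) -> float:
    """Percentage of rated questions whose rating for `category` equals `value`."""
    hits = sum(1 for r in ratings.values() if r.get(category) == value)
    return 100.0 * hits / len(ratings) if ratings else 0.0

# With the study's counts, rate("Accuracy", "Yes") would yield
# 11/17 = 64.7%, and rate("Incompleteness", "No") 15/17 = 88.2%.
print(f'Accuracy "Yes": {rate("Accuracy", "Yes"):.1f}%')
```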
The present study showed that ChatGPT-4 could not answer queries on the surgical management of CLAI at the level of foot and ankle experts. However, ChatGPT showed promising potential for application in the management of patients with CLAI.
Level IV.