Can generative artificial intelligence pass the orthopaedic board examination?

Author Information

Isleem Ula N, Zaidat Bashar, Ren Renee, Geng Eric A, Burapachaisri Aonnicha, Tang Justin E, Kim Jun S, Cho Samuel K

Affiliations

Department of Orthopaedic Surgery, Icahn School of Medicine at Mount Sinai, New York, NY, USA.

Publication Information

J Orthop. 2023 Nov 5;53:27-33. doi: 10.1016/j.jor.2023.10.026. eCollection 2024 Jul.

Abstract

BACKGROUND

Resident training programs in the US use the Orthopaedic In-Training Examination (OITE), developed by the American Academy of Orthopaedic Surgeons (AAOS), to assess residents' current knowledge and to identify residents at risk of failing the American Board of Orthopaedic Surgery (ABOS) examination. Optimal strategies for OITE preparation are constantly being explored, and there may be a role for Large Language Models (LLMs) in orthopaedic resident education. ChatGPT, an LLM launched in late 2022, has demonstrated the ability to produce accurate, detailed answers, potentially enabling it to aid in medical education and clinical decision-making. The purpose of this study is to evaluate the performance of ChatGPT on Orthopaedic In-Training Examinations, using Self-Assessment Examination (SAE) questions from the AAOS database and approved literature as a proxy for the orthopaedic board examination.

METHODS

A total of 301 SAE questions from the AAOS database and associated AAOS literature were input into ChatGPT's interface in a question-and-multiple-choice format, and the responses were analyzed to determine which answer choice was selected. A new chat was used for every question. All answers were recorded, categorized, and compared to the answer keys of the OITE and SAE exams, noting whether each answer was right or wrong.
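The grading protocol above (fresh chat per question, record the selected choice, compare against the answer key) can be sketched as follows. This is a minimal illustration, not the authors' actual tooling: `Question`, `ask_model`, and `grade` are hypothetical names, and `ask_model` is a stub standing in for the real ChatGPT interface so the sketch stays self-contained.

```python
# Hypothetical sketch of the study's scoring protocol. Each question would be
# posed in a fresh session; here ask_model is a stub so no API key is needed.
from dataclasses import dataclass

@dataclass
class Question:
    stem: str       # question text
    choices: dict   # e.g. {"A": "...", "B": "..."}
    answer: str     # exam answer key, e.g. "B"
    category: str   # "management", "diagnosis", or "knowledge recall"

def ask_model(question: Question) -> str:
    """Stand-in for a fresh-chat model call; a real run would query the LLM.
    Always returns "A" so the sketch remains runnable offline."""
    return "A"

def grade(questions: list) -> dict:
    """Tally overall and per-category accuracy, mirroring the RESULTS layout."""
    buckets = {}
    for q in questions:
        correct = int(ask_model(q) == q.answer)
        buckets.setdefault("overall", []).append(correct)
        buckets.setdefault(q.category, []).append(correct)
    return {name: sum(marks) / len(marks) for name, marks in buckets.items()}
```

Starting a new chat per question, as the authors did, avoids the model conditioning its answer on earlier questions in the same session.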

RESULTS

Of the 301 questions asked, ChatGPT correctly answered 183 (60.8%). The subjects with the highest percentage of correctly answered questions were basic science (81%), oncology (72.7%), shoulder and elbow (71.9%), and sports (71.4%). The questions were further subdivided into 3 groups: management, diagnosis, and knowledge recall. There were 86 management questions, of which 47 were correct (54.7%); 45 diagnosis questions, of which 32 were correct (71.7%); and 168 knowledge recall questions, of which 102 were correct (60.7%).
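As a quick consistency check, the reported percentages can be recomputed from the raw counts stated above; this sketch uses only those counts.

```python
# Recompute the reported percentages from the raw correct/total counts.
reported = {
    "overall":          (183, 301),
    "management":       (47, 86),
    "diagnosis":        (32, 45),
    "knowledge recall": (102, 168),
}
for name, (correct, total) in reported.items():
    pct = round(100 * correct / total, 1)
    print(f"{name}: {correct}/{total} = {pct}%")
```

Most figures match the abstract; note, however, that 32/45 rounds to 71.1%, slightly below the 71.7% printed above, and the three subgroup totals (86 + 45 + 168 = 299) fall two short of 301.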

CONCLUSIONS

ChatGPT has the potential to provide orthopedic educators and trainees with accurate clinical conclusions for the majority of board-style questions, although its reasoning should be carefully analyzed for accuracy and clinical validity. As such, its usefulness in a clinical educational context is currently limited but rapidly evolving.

CLINICAL RELEVANCE

ChatGPT can access a multitude of medical data and may help provide accurate answers to clinical questions.


Similar Articles

1
Can generative artificial intelligence pass the orthopaedic board examination?
J Orthop. 2023 Nov 5;53:27-33. doi: 10.1016/j.jor.2023.10.026. eCollection 2024 Jul.
2
A Shadow of Doubt: Is There Implicit Bias Among Orthopaedic Surgery Faculty and Residents Regarding Race and Gender?
Clin Orthop Relat Res. 2024 Jul 1;482(7):1145-1155. doi: 10.1097/CORR.0000000000002933. Epub 2024 Jan 12.
5
Home treatment for mental health problems: a systematic review.
Health Technol Assess. 2001;5(15):1-139. doi: 10.3310/hta5150.
6
Health professionals' experience of teamwork education in acute hospital settings: a systematic review of qualitative literature.
JBI Database System Rev Implement Rep. 2016 Apr;14(4):96-137. doi: 10.11124/JBISRIR-2016-1843.
9
Management of urinary stones by experts in stone disease (ESD 2025).
Arch Ital Urol Androl. 2025 Jun 30;97(2):14085. doi: 10.4081/aiua.2025.14085.
10
Cost-effectiveness of using prognostic information to select women with breast cancer for adjuvant systemic therapy.
Health Technol Assess. 2006 Sep;10(34):iii-iv, ix-xi, 1-204. doi: 10.3310/hta10340.

Cited By

1
Exploring the role of artificial intelligence in Turkish orthopedic progression exams.
Acta Orthop Traumatol Turc. 2025 Mar 17;59(1):18-26. doi: 10.5152/j.aott.2025.24090.
3
Assessing the performance of ChatGPT-4o on the Turkish Orthopedics and Traumatology Board Examination.
Jt Dis Relat Surg. 2025 Apr 5;36(2):304-310. doi: 10.52312/jdrs.2025.1958.
5
Comparative performance of artificial intelligence models in rheumatology board-level questions: evaluating Google Gemini and ChatGPT-4o.
Clin Rheumatol. 2024 Nov;43(11):3507-3513. doi: 10.1007/s10067-024-07154-5. Epub 2024 Sep 28.
6
Comparative performance of artificial ıntelligence models in physical medicine and rehabilitation board-level questions.
Rev Assoc Med Bras (1992). 2024 Jul 19;70(7):e20240241. doi: 10.1590/1806-9282.20240241. eCollection 2024.
7
Exploring the impact of rehabilitation on post-surgical recovery in elbow fracture patients: a cohort study.
Musculoskelet Surg. 2025 Mar;109(1):33-39. doi: 10.1007/s12306-024-00848-8. Epub 2024 Jul 18.

