Instituto Paulista de Estudos e Pesquisas em Oftalmologia, Vision Institute - São Paulo (SP), Brazil.
Massachusetts Institute of Technology, Institute for Medical Engineering and Science - Cambridge (MA), USA.
Rev Assoc Med Bras (1992). 2023 Sep 25;69(10):e20230848. doi: 10.1590/1806-9282.20230848. eCollection 2023.
This study aimed to evaluate the performance of ChatGPT-4.0 on the 2022 Brazilian National Examination for Medical Degree Revalidation (Revalida) and to assess its use as a tool for providing feedback on the quality of the examination.
Two independent physicians entered all examination questions into ChatGPT-4.0. After comparing the model's outputs with the examination's answer key, they classified each large language model answer as adequate, inadequate, or indeterminate. In cases of disagreement, they adjudicated the item and reached a consensus on the accuracy of the ChatGPT answer. Performance across medical themes, and between nullified and non-nullified questions, was compared using chi-square tests.
On the Revalida examination, ChatGPT-4.0 answered 71 questions (87.7%) correctly and 10 (12.3%) incorrectly. The proportion of correct answers did not differ significantly across medical themes (p=0.4886). The model's accuracy was lower on nullified questions (71.4%), but the difference between the nullified and non-nullified groups was not statistically significant (p=0.241).
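As an illustration of the statistical comparison described above, the sketch below runs a chi-square test on a plausible reconstruction of the nullified versus non-nullified contingency table, in Python with SciPy. The nullified-question counts (5 correct out of 7) are an assumption inferred from the reported 71.4% accuracy, not data published in the abstract, so the computed p-value need not match the reported 0.241.

```python
from scipy.stats import chi2_contingency

# 2x2 contingency table. Rows: correct / incorrect answers.
# Columns: non-nullified / nullified questions.
# Nullified counts (5 of 7 correct) are ASSUMED from the reported 71.4%
# accuracy; the abstract does not give the raw breakdown.
observed = [
    [66, 5],  # correct answers
    [8, 2],   # incorrect answers
]

# chi2_contingency applies Yates' continuity correction by default for 2x2 tables
chi2, p, dof, expected = chi2_contingency(observed)
print(f"chi2 = {chi2:.3f}, p = {p:.3f}, dof = {dof}")
```

Note that with so few nullified items, one expected cell count falls below 5, where Fisher's exact test (scipy.stats.fisher_exact) is often preferred; the published p-value may therefore reflect different counts or a different test variant.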
ChatGPT-4.0 showed satisfactory performance on the 2022 Brazilian National Examination for Medical Degree Revalidation, although the large language model performed worse on subjective questions and public healthcare themes. These results suggest that the overall quality of the Revalida examination questions is satisfactory and corroborate the decisions to nullify the annulled questions.