A Comparative Analysis of AI Models in Complex Medical Decision-Making Scenarios: Evaluating ChatGPT, Claude AI, Bard, and Perplexity.

Authors

Uppalapati Vamsi Krishna, Nag Deb Sanjay

Affiliation

Department of Anesthesiology, Tata Main Hospital, Jamshedpur, IND.

Publication information

Cureus. 2024 Jan 18;16(1):e52485. doi: 10.7759/cureus.52485. eCollection 2024 Jan.

DOI:10.7759/cureus.52485

PMID:38371109

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC10874112/

Abstract

This study rigorously evaluates the performance of four artificial intelligence (AI) language models (ChatGPT, Claude AI, Google Bard, and Perplexity AI) on four key metrics: accuracy, relevance, clarity, and completeness. We used a mix of research methods, gathering opinions across 14 scenarios, to help ensure that the findings were accurate and dependable. Claude AI performed better than the other models because it gave complete responses, with average scores of 3.64 for relevance and 3.43 for completeness. ChatGPT performed consistently well, whereas Google Bard's responses were unclear and varied greatly, making them difficult to understand and inconsistent. These results show where AI language models perform well and where they fall short in making medical suggestions, which can guide their use and inform future AI-based technology. The study shows that AI abilities match complex medical scenarios.

Similar articles

A Comparative Analysis of AI Models in Complex Medical Decision-Making Scenarios: Evaluating ChatGPT, Claude AI, Bard, and Perplexity.

Cureus. 2024 Jan 18;16(1):e52485. doi: 10.7759/cureus.52485. eCollection 2024 Jan.

Assessing the Accuracy of Information on Medication Abortion: A Comparative Analysis of ChatGPT and Google Bard AI.

Cureus. 2024 Jan 2;16(1):e51544. doi: 10.7759/cureus.51544. eCollection 2024 Jan.

Radiologic Decision-Making for Imaging in Pulmonary Embolism: Accuracy and Reliability of Large Language Models-Bing, Claude, ChatGPT, and Perplexity.

Indian J Radiol Imaging. 2024 Jul 4;34(4):653-660. doi: 10.1055/s-0044-1787974. eCollection 2024 Oct.

The performance of artificial intelligence models in generating responses to general orthodontic questions: ChatGPT vs Google Bard.

Am J Orthod Dentofacial Orthop. 2024 Jun;165(6):652-662. doi: 10.1016/j.ajodo.2024.01.012. Epub 2024 Mar 15.

Pilot Testing of a Tool to Standardize the Assessment of the Quality of Health Information Generated by Artificial Intelligence-Based Models.

Cureus. 2023 Nov 24;15(11):e49373. doi: 10.7759/cureus.49373. eCollection 2023 Nov.

Understanding the Landscape: The Emergence of Artificial Intelligence (AI), ChatGPT, and Google Bard in Gastroenterology.

Cureus. 2024 Jan 8;16(1):e51848. doi: 10.7759/cureus.51848. eCollection 2024 Jan.

Generative artificial intelligence chatbots may provide appropriate informational responses to common vascular surgery questions by patients.

Vascular. 2025 Feb;33(1):229-237. doi: 10.1177/17085381241240550. Epub 2024 Mar 18.

The performance of artificial intelligence large language model-linked chatbots in surgical decision-making for gastroesophageal reflux disease.

Surg Endosc. 2024 May;38(5):2320-2330. doi: 10.1007/s00464-024-10807-w. Epub 2024 Apr 17.

The ability of artificial intelligence tools to formulate orthopaedic clinical decisions in comparison to human clinicians: An analysis of ChatGPT 3.5, ChatGPT 4, and Bard.

J Orthop. 2023 Dec 1;50:1-7. doi: 10.1016/j.jor.2023.11.063. eCollection 2024 Apr.

Evaluation of the Performance of Generative AI Large Language Models ChatGPT, Google Bard, and Microsoft Bing Chat in Supporting Evidence-Based Dentistry: Comparative Mixed Methods Study.

J Med Internet Res. 2023 Dec 28;25:e51580. doi: 10.2196/51580.

Cited by

ChatGPT's role in the rapidly evolving hematologic cancer landscape.

Future Sci OA. 2025 Dec;11(1):2546259. doi: 10.1080/20565623.2025.2546259. Epub 2025 Sep 3.

Evaluating the Accuracy, Completeness, and Readability of Chatbot Responses to Refractive Surgery-Related Patient Questions: A Comparative Analysis of ChatGPT and Google Gemini.

Cureus. 2025 Jul 29;17(7):e88980. doi: 10.7759/cureus.88980. eCollection 2025 Jul.

From dictation to diagnosis: enhancing radiology reporting with integrated speech recognition in multimodal large language models.

Eur Radiol. 2025 Aug 15. doi: 10.1007/s00330-025-11929-y.

Chatbot for the Return of Positive Genetic Screening Results for Hereditary Cancer Syndromes: Prompt Engineering Project.

JMIR Cancer. 2025 Jun 10;11:e65848. doi: 10.2196/65848.

Digital transformation of nephrology POCUS education-Integrating a multiagent, artificial intelligence, and human collaboration-enhanced curriculum with expert feedback.

Digit Health. 2025 Mar 28;11:20552076251328807. doi: 10.1177/20552076251328807. eCollection 2025 Jan-Dec.

Evaluating the Use of Generative Artificial Intelligence to Support Genetic Counseling for Rare Diseases.

Diagnostics (Basel). 2025 Mar 10;15(6):672. doi: 10.3390/diagnostics15060672.

Generative AI Decision-Making Attributes in Complex Health Services: A Rapid Review.

Cureus. 2025 Jan 30;17(1):e78257. doi: 10.7759/cureus.78257. eCollection 2025 Jan.

Opportunities and Challenges of Chatbots in Ophthalmology: A Narrative Review.

J Pers Med. 2024 Dec 21;14(12):1165. doi: 10.3390/jpm14121165.

Quality of Information Provided by Artificial Intelligence Chatbots Surrounding the Reconstructive Surgery for Head and Neck Cancer: A Comparative Analysis Between ChatGPT4 and Claude2.

Clin Otolaryngol. 2025 Mar;50(2):330-335. doi: 10.1111/coa.14261. Epub 2024 Dec 4.

Assessing AI efficacy in medical knowledge tests: A study using Taiwan's internal medicine exam questions from 2020 to 2023.

Digit Health. 2024 Oct 18;10:20552076241291404. doi: 10.1177/20552076241291404. eCollection 2024 Jan-Dec.

References cited in this article

Perioperative Management for Non-Thyroidal Surgery in Thyroid Dysfunction.

Indian J Endocrinol Metab. 2022 Sep-Oct;26(5):428-434. doi: 10.4103/ijem.ijem_273_22. Epub 2022 Nov 22.

Postoperative outcomes of resectable periampullary cancer accompanied by obstructive jaundice with and without preoperative endoscopic biliary drainage.

Front Oncol. 2022 Nov 10;12:1040508. doi: 10.3389/fonc.2022.1040508. eCollection 2022.

Interstitial lung disease following coronavirus disease 2019.

Curr Opin Pulm Med. 2022 Sep 1;28(5):399-406. doi: 10.1097/MCP.0000000000000900.

Defining AMIA's artificial intelligence principles.

J Am Med Inform Assoc. 2022 Mar 15;29(4):585-591. doi: 10.1093/jamia/ocac006.

Prehospital management of burns requiring specialized burn centre evaluation: a single physician-based emergency medical service experience.

Scand J Trauma Resusc Emerg Med. 2020 Aug 20;28(1):84. doi: 10.1186/s13049-020-00771-4.

Adverse intraoperative events during surgical repair of ruptured cerebral aneurysms: a systematic review.

Neurosurg Rev. 2021 Jun;44(3):1273-1285. doi: 10.1007/s10143-020-01312-4. Epub 2020 Jun 16.

Are Tracheotomies Required for Patients Undergoing Composite Mandibular Resections for Oral Cancer?

J Oral Maxillofac Surg. 2020 Aug;78(8):1427-1435. doi: 10.1016/j.joms.2020.03.027. Epub 2020 Apr 6.

Ludwig's Angina: Anesthetic Management.

Anesth Prog. 2019 Summer;66(2):103-110. doi: 10.2344/anpr-66-01-13.

Postoperative outcomes of patients with chronic obstructive pulmonary disease undergoing coronary artery bypass grafting surgery: A meta-analysis.

Medicine (Baltimore). 2019 Feb;98(6):e14388. doi: 10.1097/MD.0000000000014388.

Long-Term Survival After Arterial Versus Atrial Switch in d-Transposition of the Great Arteries.

Ann Thorac Surg. 2018 Dec;106(6):1827-1833. doi: 10.1016/j.athoracsur.2018.06.084. Epub 2018 Aug 31.
