Özcivelek Tuğgen, Özcan Berna
Department of Prosthodontics, Gülhane Faculty of Dentistry, University of Health Sciences, Gen Dr Tevfik Saglam St. No:1 Kecioren, Ankara, Turkey.
BMC Oral Health. 2025 May 31;25(1):871. doi: 10.1186/s12903-025-06267-w.
Artificial intelligence chatbots have the potential to inform and guide patients by providing human-like responses to questions about dental and maxillofacial prostheses. Information regarding the accuracy and quality of these responses is limited. This in-silico study aimed to evaluate the accuracy, quality, readability, understandability, and actionability of responses from the DeepSeek-R1, ChatGPT-o1, ChatGPT-4, and Dental GPT chatbots.
The four chatbots were queried with 35 of the questions most frequently asked by patients about their prostheses. The accuracy, quality, and the understandability and actionability of the responses were assessed by two prosthodontists using a five-point Likert scale, the Global Quality Score, and the Patient Education Materials Assessment Tool for Printed Materials, respectively. Readability was scored with the Flesch-Kincaid Grade Level and Flesch Reading Ease. Inter-rater agreement was assessed with Cohen's kappa. Differences between chatbots were analyzed using the Kruskal-Wallis test, one-way ANOVA, and post-hoc tests.
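As an illustration of the evaluation workflow described above, the sketch below shows how the readability metrics, the inter-rater agreement, and the between-chatbot comparison could be computed in Python. The libraries (textstat, scikit-learn, scipy), the example responses, and the rater scores are assumptions chosen for illustration; they are not the tools or data reported by the study.

```python
# A minimal sketch of the scoring pipeline, assuming chatbot responses are
# available as plain-text strings. All inputs below are hypothetical.
import textstat
from sklearn.metrics import cohen_kappa_score
from scipy.stats import kruskal

# Hypothetical example data: responses from each chatbot.
responses = {
    "DeepSeek-R1": ["A removable denture should be cleaned daily with a soft brush."],
    "ChatGPT-o1":  ["Rinse your prosthesis after every meal and soak it overnight."],
    "ChatGPT-4":   ["Dentures require regular maintenance and periodic check-ups."],
    "Dental GPT":  ["Soak the prosthesis overnight in a non-abrasive denture cleanser."],
}

# Readability: Flesch-Kincaid Grade Level and Flesch Reading Ease per response.
readability = {
    bot: [(textstat.flesch_kincaid_grade(r), textstat.flesch_reading_ease(r))
          for r in texts]
    for bot, texts in responses.items()
}

# Inter-rater agreement on accuracy (five-point Likert scores from two raters;
# the scores here are invented for demonstration).
rater1 = [5, 4, 4, 3]
rater2 = [5, 4, 3, 3]
kappa = cohen_kappa_score(rater1, rater2, weights="quadratic")

# Compare a readability metric (here FRE) across chatbots with Kruskal-Wallis.
fre_per_bot = [[fre for _, fre in scores] for scores in readability.values()]
h_stat, p_value = kruskal(*fre_per_bot)
print(f"kappa={kappa:.2f}, Kruskal-Wallis H={h_stat:.2f}, p={p_value:.3f}")
```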
The chatbots differed significantly in accuracy and readability (p < .05). Dental GPT recorded the highest accuracy score, whereas ChatGPT-4 had the lowest. In readability, DeepSeek-R1 performed best, while Dental GPT performed worst. Quality, understandability, actionability, and reader education scores showed no significant differences.
Although accuracy varies among chatbots, the domain-specifically trained AI tool and ChatGPT-o1 demonstrated superior accuracy. Even when overall accuracy is high, misinformation in health care can have serious consequences. Enhancing the readability of responses is essential, and chatbots should be chosen accordingly. The accuracy and readability of information provided by chatbots should be monitored to safeguard public health.