Molena Kelly F, Macedo Ana P, Ijaz Anum, Carvalho Fabrício K, Gallo Maria Julia D, Wanderley Garcia de Paula E Silva Francisco, de Rossi Andiara, Mezzomo Luis A, Mugayar Leda Regina F, Queiroz Alexandra M
Department of Pediatric Dentistry, School of Dentistry of Ribeirão Preto at University of São Paulo, Ribeirão Preto, BRA.
Department of Dental Materials and Prosthesis, School of Dentistry of Ribeirão Preto at University of São Paulo, Ribeirão Preto, BRA.
Cureus. 2024 Jul 29;16(7):e65658. doi: 10.7759/cureus.65658. eCollection 2024 Jul.
Artificial intelligence (AI) can serve as a tool for diagnosis and knowledge acquisition, particularly in dentistry, and has sparked debate over its application in clinical decision-making.
This study aims to evaluate the accuracy, completeness, and reliability of the responses generated by Chat Generative Pre-Trained Transformer (ChatGPT) 3.5 in dentistry, using expert-formulated questions.
Experts were invited to create three questions each, with answers and corresponding references, according to their fields of specialization. A Likert scale was used to rate the level of agreement between the experts' answers and the ChatGPT responses. Statistical analysis compared the descriptive and binary question groups in terms of accuracy and completeness. Questions with low accuracy were re-evaluated, and the subsequent responses were compared against the originals for improvement. The Wilcoxon test was used (α = 0.05).
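For illustration only, the paired comparison between initial and re-evaluated responses could be run with a Wilcoxon signed-rank test in Python; this is a minimal sketch assuming the Likert-scale scores are held in two equal-length lists, and the variable names and example values are hypothetical rather than the study's data.

from scipy.stats import wilcoxon

# Hypothetical Likert-scale accuracy scores for the same questions,
# before and after re-evaluation (not the study's actual data).
initial_scores = [4, 2, 6, 3, 5, 1, 4, 6, 2, 5]
reevaluated_scores = [5, 4, 6, 5, 6, 3, 5, 6, 4, 6]

# Paired, non-parametric comparison at alpha = 0.05.
stat, p_value = wilcoxon(initial_scores, reevaluated_scores)
print(f"Wilcoxon statistic = {stat:.2f}, p = {p_value:.3f}")
if p_value < 0.05:
    print("Significant difference between initial and re-evaluated responses.")
else:
    print("No significant difference detected.")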
Ten experts across six dental specialties generated 30 binary and descriptive dental questions with references. The accuracy score had a median of 5.50 and a mean of 4.17; for completeness, the median was 2.00 and the mean was 2.07. No difference was observed between descriptive and binary responses in either accuracy or completeness. However, re-evaluated responses showed a significant improvement in accuracy (median 5.50 vs. 6.00; mean 4.17 vs. 4.80; p = 0.042) and completeness (median 2.0 vs. 2.0; mean 2.07 vs. 2.30; p = 0.011). References were more often incorrect than correct, with no difference between descriptive and binary questions.
ChatGPT initially demonstrated good accuracy and completeness, which improved further over time with machine learning (ML). However, some inaccurate answers and references persisted. Human critical discernment remains essential for managing complex clinical cases and for advancing theoretical knowledge and evidence-based practice.