Biochemistry Department, Faculty of Medicine, Kufa University, Najaf, Iraq.
Najaf Health Directorate, Ministry of Health, Baghdad, Iraq.
Sci Rep. 2024 Apr 8;14(1):8233. doi: 10.1038/s41598-024-58964-1.
With the release of ChatGPT at the end of 2022, a new era of thinking and technology use began. Artificial intelligence models (AIs) such as Gemini (Bard), Copilot (Bing), and ChatGPT-3.5 have the potential to affect every aspect of our lives, including the interpretation of laboratory data. This study aimed to assess the accuracy of ChatGPT-3.5, Copilot, and Gemini responses in evaluating biochemical data. Biochemical laboratory data from ten simulated patients, comprising serum urea, creatinine, glucose, cholesterol, triglycerides, low-density lipoprotein cholesterol (LDL-c), high-density lipoprotein cholesterol (HDL-c), and HbA1c, were interpreted by the three AIs, and the responses were then evaluated by three raters. The study used two approaches: the first included all biochemical data, and the second included only kidney function data. In the first approach, Copilot had the highest accuracy, followed by Gemini and ChatGPT-3.5. The Friedman test with Dunn's post-hoc test showed that Copilot had the highest mean rank; pairwise comparisons revealed significant differences for Copilot vs. ChatGPT-3.5 (P = 0.002) and Copilot vs. Gemini (P = 0.008). In the second approach, Copilot again had the highest accuracy, and the Friedman test with Dunn's post-hoc analysis again gave it the highest mean rank. The Wilcoxon signed-rank test showed no significant difference (P = 0.5) between Copilot's responses when given all laboratory data and when given only kidney function data. Copilot is more accurate in interpreting biochemical data than Gemini and ChatGPT-3.5. Its consistent responses across different data subsets highlight its reliability in this context.
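The Friedman comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' analysis: the abstract does not name the software used, the score matrix below is invented purely for illustration, and the function names (`average_ranks`, `friedman`, `p_value_three_groups`) are hypothetical. Each row stands for one simulated patient and each column for one model's rated accuracy; a higher score receives a higher rank, so the best model ends up with the highest mean rank. Dunn's post-hoc comparisons and the Wilcoxon signed-rank test are not shown.

```python
import math


def average_ranks(values):
    """Rank values ascending (1 = smallest); tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        i = j + 1
    return ranks


def friedman(scores):
    """Return (mean rank per column, tie-corrected Friedman chi-square).

    scores: one row per case (block), one column per model (treatment).
    Raises ZeroDivisionError if every row is fully tied.
    """
    n, k = len(scores), len(scores[0])
    rank_rows = [average_ranks(row) for row in scores]
    rank_sums = [sum(row[j] for row in rank_rows) for j in range(k)]
    mean_ranks = [s / n for s in rank_sums]
    stat = 12.0 / (n * k * (k + 1)) * sum(s * s for s in rank_sums) - 3 * n * (k + 1)
    # Tie correction: sum (t^3 - t) over every group of tied scores in a row.
    ties = 0
    for row in scores:
        counts = {}
        for v in row:
            counts[v] = counts.get(v, 0) + 1
        ties += sum(t ** 3 - t for t in counts.values())
    stat /= 1 - ties / (n * k * (k * k - 1))
    return mean_ranks, stat


def p_value_three_groups(stat):
    """Chi-square upper tail for 2 degrees of freedom (valid for 3 models only)."""
    return math.exp(-stat / 2)


# HYPOTHETICAL ratings, NOT the study data.
# Columns: ChatGPT-3.5, Gemini, Copilot; rows: ten simulated patients.
example_scores = [
    [3, 4, 5], [2, 3, 5], [3, 3, 4], [4, 4, 5], [2, 4, 4],
    [3, 4, 5], [3, 2, 5], [4, 3, 5], [2, 3, 4], [3, 4, 5],
]
mean_ranks, stat = friedman(example_scores)
print("mean ranks:", mean_ranks)
print("Friedman chi-square:", stat, "p:", p_value_three_groups(stat))
```

In practice a statistics library would normally be used instead, for example `scipy.stats.friedmanchisquare`; the hand-written version is shown only to make the ranking step behind "highest mean rank" explicit.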