Response accuracy of ChatGPT 3.5 Copilot and Gemini in interpreting biochemical laboratory data a pilot study.

Affiliations

Biochemistry Department, Faculty of Medicine, Kufa University, Najaf, Iraq.

Najaf Health Directorate, Ministry of Health, Baghdad, Iraq.

Publication information

Sci Rep. 2024 Apr 8;14(1):8233. doi: 10.1038/s41598-024-58964-1.

DOI:10.1038/s41598-024-58964-1

PMID:38589613

Full text link: https://pmc.ncbi.nlm.nih.gov/articles/PMC11002004/

Abstract

The release of ChatGPT at the end of 2022 marked a new era in how technology is used. Artificial intelligence models (AIs) such as Gemini (Bard), Copilot (Bing), and ChatGPT-3.5 have the potential to affect every aspect of our lives, including the interpretation of laboratory data. This study aimed to assess the accuracy of ChatGPT-3.5, Copilot, and Gemini responses in evaluating biochemical data. Biochemical laboratory data for ten simulated patients, comprising serum urea, creatinine, glucose, cholesterol, triglycerides, low-density lipoprotein cholesterol (LDL-c), high-density lipoprotein cholesterol (HDL-c), and HbA1c, were interpreted by the three AIs, and the responses were then evaluated by three raters. The study used two approaches: the first included all biochemical data, and the second included only kidney function data. In the first approach, Copilot had the highest accuracy, followed by Gemini and ChatGPT-3.5. The Friedman test with Dunn's post-hoc test showed that Copilot had the highest mean rank, and pairwise comparisons revealed significant differences between Copilot and ChatGPT-3.5 (P = 0.002) and between Copilot and Gemini (P = 0.008). In the second approach, Copilot again had the highest accuracy, and the Friedman test with Dunn's post-hoc analysis again showed it to have the highest mean rank. The Wilcoxon signed-rank test found no significant difference (P = 0.5) between Copilot's responses when all laboratory data were supplied and when only kidney function data were supplied. Copilot is more accurate in interpreting biochemical data than Gemini and ChatGPT-3.5, and its consistent responses across data subsets highlight its reliability in this context.
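As a rough illustration of the Friedman test named in the abstract, the sketch below ranks three models within each simulated case and compares their mean ranks. Everything in it is an assumption for illustration: the 1 to 5 scores are hypothetical and are not the study's data, the function names are invented here, and the abstract does not state which software the authors used. Dunn's post-hoc test and the Wilcoxon signed-rank test are not shown.

```python
import math


def average_ranks(values):
    """Rank values in ascending order; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # ranks are 1-based
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def friedman_test(rows):
    """Friedman test for exactly three related samples.

    rows: one [score_a, score_b, score_c] list per case.
    Returns (mean_ranks, statistic, p_value), with the usual tie correction.
    """
    n, k = len(rows), len(rows[0])
    if k != 3:
        raise ValueError("this sketch only handles three groups (df = 2)")
    rank_sums = [0.0] * k
    tie_term = 0.0
    for row in rows:
        for col, rank in enumerate(average_ranks(row)):
            rank_sums[col] += rank
        for value in set(row):
            t = row.count(value)
            tie_term += t**3 - t
    q = 12.0 / (n * k * (k + 1)) * sum(s * s for s in rank_sums) - 3.0 * n * (k + 1)
    correction = 1.0 - tie_term / (n * k * (k * k - 1))
    if correction == 0:
        raise ValueError("all scores are tied within every case")
    q /= correction
    # For df = 2 the chi-square survival function is exactly exp(-q / 2).
    p_value = math.exp(-q / 2.0)
    return [s / n for s in rank_sums], q, p_value


# Hypothetical rater scores (NOT the study's data): one row per simulated
# patient, columns in the order Copilot, Gemini, ChatGPT-3.5.
hypothetical_scores = [
    [5, 4, 3], [5, 3, 3], [4, 4, 2], [5, 4, 4], [4, 3, 2],
    [5, 5, 3], [4, 3, 3], [5, 4, 2], [3, 4, 3], [5, 3, 4],
]

mean_ranks, statistic, p_value = friedman_test(hypothetical_scores)
for name, rank in zip(["Copilot", "Gemini", "ChatGPT-3.5"], mean_ranks):
    print(f"{name}: mean rank {rank:.2f}")
print(f"Friedman statistic = {statistic:.3f}, p = {p_value:.4f}")
```

A higher mean rank here means the model tended to receive the higher score within each case. A significant Friedman result only says the models differ somewhere; identifying which pairs differ requires a post-hoc procedure such as Dunn's test.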
