Evaluating ChatGPT-3.5 and ChatGPT-4.0 Responses on Hyperlipidemia for Patient Education.

Author information

Lee Thomas J, Rao Abhinav K, Campbell Daniel J, Radfar Navid, Dayal Manik, Khrais Ayham

Affiliations

Department of Medicine, Rutgers University New Jersey Medical School, Newark, USA.

Department of Medicine, Trident Medical Center, Charleston, USA.

Publication information

Cureus. 2024 May 25;16(5):e61067. doi: 10.7759/cureus.61067. eCollection 2024 May.

Introduction

Hyperlipidemia is prevalent worldwide and affects a significant number of US adults. It contributes substantially to ischemic heart disease and to millions of deaths annually. With the increasing use of the internet for health information, tools like ChatGPT (OpenAI, San Francisco, CA, USA) have gained traction. ChatGPT version 4.0, launched in March 2023, offers enhanced features over its predecessor but requires a monthly fee. This study compares the accuracy, comprehensibility, and response length of the free and paid versions of ChatGPT for patient education on hyperlipidemia.

Materials and methods

ChatGPT versions 3.5 and 4.0 were each given 25 questions from the Cleveland Clinic's frequently asked questions (FAQs) on hyperlipidemia, posed in three different ways: with no prompting (Form 1), patient-friendly prompting (Form 2), and physician-level prompting (Form 3). Responses were categorized as incorrect, partially correct, or correct. Additionally, the reading grade level and word count of each response were recorded for analysis.

Results

Overall, scoring frequencies for ChatGPT version 3.5 were: five (6.67%) incorrect, 18 (24.00%) partially correct, and 52 (69.33%) correct. Scoring frequencies for ChatGPT version 4.0 were: one (1.33%) incorrect, 18 (24.00%) partially correct, and 56 (74.67%) correct. The proportion of correct answers did not differ significantly between ChatGPT version 3.5 and ChatGPT version 4.0 (p = 0.586). ChatGPT version 3.5 had a significantly higher reading grade level than version 4.0 (p = 0.0002). ChatGPT version 3.5 had a significantly higher word count than version 4.0 (p = 0.0073).

Discussion

There was no significant difference in accuracy between the free and paid versions on hyperlipidemia FAQs. Both versions provided accurate but sometimes only partially complete responses. Version 4.0 offered more concise and readable information, in line with the readability of most online medical resources, although it still exceeded the National Institutes of Health's (NIH's) recommended eighth-grade reading level. The paid version demonstrated superior adaptability in tailoring responses based on the input.

Conclusion

Both versions of ChatGPT provide reliable medical information, with the paid version offering more adaptable and readable responses. Healthcare providers can recommend ChatGPT as a source of patient education, regardless of the version used. Future research should explore diverse question formulations and ChatGPT's handling of incorrect information.
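The accuracy figures reported in the Results can be rechecked from the counts alone. The sketch below recomputes the percentages from the reported counts (75 responses per version, i.e., 25 questions in three prompt forms) and then compares the proportion of correct answers with a Yates-corrected chi-square test. The abstract does not name the statistical test the authors used, so the choice of test here is an assumption made for illustration; the function names `percentages` and `yates_chi2_p` are likewise hypothetical.

```python
import math

# Counts exactly as reported in the Results (75 responses per version).
counts = {
    "3.5": {"incorrect": 5, "partially correct": 18, "correct": 52},
    "4.0": {"incorrect": 1, "partially correct": 18, "correct": 56},
}


def percentages(version_counts):
    """Convert raw category counts to percentages rounded to two decimals."""
    total = sum(version_counts.values())
    return {label: round(100 * n / total, 2) for label, n in version_counts.items()}


def yates_chi2_p(a, b, c, d):
    """Two-sided p-value of a Yates-corrected chi-square test on [[a, b], [c, d]].

    ASSUMPTION: the abstract does not state which test produced p = 0.586;
    this test is used here only as a plausible illustration.
    """
    n = a + b + c + d
    row1, row2, col1, col2 = a + b, c + d, a + c, b + d
    chi2 = n * max(abs(a * d - b * c) - n / 2, 0) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, the chi-square survival function is erfc(sqrt(chi2 / 2)).
    return math.erfc(math.sqrt(chi2 / 2))


pct_35 = percentages(counts["3.5"])
pct_40 = percentages(counts["4.0"])

# 2x2 table: correct vs. not correct (incorrect + partially correct) per version.
p_value = yates_chi2_p(52, 5 + 18, 56, 1 + 18)

print(pct_35)
print(pct_40)
print(f"p = {p_value:.3f}")
```

Under this assumed test, the p-value falls close to the reported p = 0.586, which is consistent with the conclusion that the two versions do not differ significantly in the proportion of correct answers.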
