

A Comparison of Prostate Cancer Screening Information Quality on Standard and Advanced Versions of ChatGPT, Google Gemini, and Microsoft Copilot: A Cross-Sectional Study.

Author Information

Owens Otis L, Leonard Michael

Affiliation

College of Social Work, University of South Carolina, Columbia, SC, USA.

Publication Information

Am J Health Promot. 2025 Jun;39(5):766-776. doi: 10.1177/08901171251316371. Epub 2025 Jan 24.

Abstract

Purpose: Artificially intelligent (AI) chatbots have the potential to produce information to support shared prostate cancer (PrCA) decision-making. Therefore, our purpose was to evaluate and compare the accuracy, completeness, readability, and credibility of responses from standard and advanced versions of popular chatbots: ChatGPT-3.5, ChatGPT-4.0, Microsoft Copilot, Microsoft Copilot Pro, Google Gemini, and Google Gemini Advanced. We also investigated whether prompting chatbots for low-literacy PrCA information would improve the readability of responses. Lastly, we determined whether the responses were appropriate for African-American men, who have the worst PrCA outcomes.

Approach: The study used a cross-sectional approach to examine the quality of responses solicited from chatbots.

Participants: The study did not include human subjects.

Method: Eleven frequently asked PrCA questions, based on resources produced by the Centers for Disease Control and Prevention (CDC) and the American Cancer Society (ACS), were posed to each chatbot twice (once for low-literacy populations). A coding/rating form containing the questions, with key points/answers from the ACS or CDC, was used to facilitate the rating process. Accuracy and completeness were rated dichotomously (i.e., yes/no). Credibility was determined by whether a trustworthy medical or health-related organization was cited. Readability was determined using a Flesch-Kincaid readability score calculator into which chatbot responses were entered individually. Average accuracy, completeness, credibility, and readability percentages or scores were calculated using Excel.

Results: All chatbots were accurate, but the completeness, readability, and credibility of responses varied. Soliciting low-literacy responses significantly improved readability, but sometimes to the detriment of completeness. All chatbots recognized the higher PrCA risk in African-American men and tailored screening recommendations accordingly. Microsoft Copilot Pro had the best overall performance on standard screening questions. Microsoft Copilot outperformed the other chatbots on responses for low-literacy populations.

Conclusions: AI chatbots are useful tools for learning about PrCA screening but should be combined with healthcare provider advice.
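The study itself scored readability with an off-the-shelf Flesch-Kincaid calculator and aggregated ratings in Excel; the Python sketch below is only an illustration of the standard Flesch-Kincaid formulas and of averaging dichotomous (yes/no) ratings into percentages. The syllable counter is a rough heuristic, and the chatbot names and rating values in the usage example are hypothetical, not the study's data.

```python
import re

def flesch_kincaid(text: str) -> dict:
    """Illustrative Flesch Reading Ease and Flesch-Kincaid Grade Level
    for a single chatbot response, using the standard published formulas."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    n_words = max(1, len(words))

    def count_syllables(word: str) -> int:
        # Rough heuristic: count vowel groups; real calculators use dictionaries.
        return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

    n_syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = n_words / sentences
    syllables_per_word = n_syllables / n_words

    reading_ease = 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word
    grade_level = 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59
    return {"reading_ease": round(reading_ease, 1), "grade_level": round(grade_level, 1)}

print(flesch_kincaid("The PSA test measures prostate-specific antigen in the blood."))

# Dichotomous completeness ratings (1 = yes, 0 = no) averaged into percentages,
# mirroring the kind of aggregation the abstract describes doing in Excel.
ratings = {"Chatbot A": [1, 1, 0, 1], "Chatbot B": [1, 1, 1, 1]}  # hypothetical values
for chatbot, scores in ratings.items():
    print(chatbot, f"{100 * sum(scores) / len(scores):.0f}% complete")
```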

