ChatGPT-4.0 or DeepSeek-V3? Comparative analysis of answers to the most frequently asked questions by total knee replacement candidate patients.

Author information

Gök Ümit, Pamuk Çağdaş, Serttaş Muhammed Fatih, Güçlü Seyit Ali, Çelik Veysel Emre, Gültekin Alper

Affiliations

Orthopedics and Traumatology Clinic, University of Health Science Kocaeli City Hospital, Kocaeli, Turkey.

Private Silivri Anadolu Hospital Orthopedics and Traumatology Clinic, Istanbul, Turkey.

Publication information

Medicine (Baltimore). 2025 Aug 22;104(34):e43951. doi: 10.1097/MD.0000000000043951.

DOI:10.1097/MD.0000000000043951

PMID:40859557

Abstract

BACKGROUND

Total knee arthroplasty (TKA) is a surgical intervention that significantly improves patients' quality of life, but the preoperative period can leave patients uncertain, anxious, and poorly informed. In recent years, artificial intelligence (AI)-powered chatbots and large language models have begun to play important roles in patient information processes in healthcare. This study compared the answers given by Chat Generative Pretrained Transformer (ChatGPT)-4.0 and DeepSeek-V3 to the 10 questions that patients most frequently ask about TKA before surgery, and used orthopedists' evaluations to assess how effectively AI supports the patient information process.

METHODS

The 10 questions that TKA patients most want answered about the preoperative, intraoperative, and postoperative periods were identified using Google Trends, patient forums, and clinical experience. These questions were posed to ChatGPT-4.0 and DeepSeek-V3, and the answers were recorded. Five orthopedists, each with at least 5 years of surgical experience, rated the answers on a Likert scale (1-5) according to criteria such as scientific accuracy, explanatory power, understandability for the patient, and level of detail.

RESULTS

The mean Likert score of ChatGPT-4.0 (4.7 ± 0.2) was higher than that of DeepSeek-V3 (3.5 ± 0.3) (P < .05). ChatGPT-4.0 provided more comprehensive and detailed information, whereas DeepSeek-V3 gave superficial answers, particularly to questions such as "life of the prosthesis," "postoperative complications," and "return to daily activities."

CONCLUSION

Our study showed that ChatGPT-4.0 is more effective than DeepSeek-V3 at providing patient information about total knee replacement. AI-supported systems are a fast and accessible source of information for patient education; however, this information must be reviewed by medical authorities for accuracy. Future studies should include larger patient populations to increase the reliability of AI-based patient information systems and ensure their integration into clinical practice.

LEVEL OF EVIDENCE

Level 5.
