Assessing the Accuracy of Artificial Intelligence Models in Scoliosis Classification and Suggested Therapeutic Approaches.

Authors

Fabijan Artur, Zawadzka-Fabijan Agnieszka, Fabijan Robert, Zakrzewski Krzysztof, Nowosławska Emilia, Polis Bartosz

Affiliations

Department of Neurosurgery, Polish-Mother's Memorial Hospital Research Institute, 93-338 Lodz, Poland.

Department of Rehabilitation Medicine, Faculty of Health Sciences, Medical University of Lodz, 90-419 Lodz, Poland.

Publication information

J Clin Med. 2024 Jul 9;13(14):4013. doi: 10.3390/jcm13144013.

Abstract

Open-source artificial intelligence models (OSAIMs) are increasingly being applied in various fields, including IT and medicine, offering promising solutions for diagnostic and therapeutic interventions. In response to the growing interest in AI for clinical diagnostics, we evaluated several OSAIMs (ChatGPT 4, Microsoft Copilot, Gemini, PopAi, You Chat, Claude, and the specialized PMC-LLaMA 13B), assessing their ability to classify scoliosis severity and recommend treatments based on radiological descriptions from AP radiographs. Our study employed a two-stage methodology in which descriptions of single-curve scoliosis were first evaluated by two independent neurosurgeons and then analyzed by the AI models. Statistical analysis involved the Shapiro-Wilk test for normality, with non-normal distributions described using medians and interquartile ranges. Inter-rater reliability was assessed using Fleiss' kappa, and performance metrics such as accuracy, sensitivity, specificity, and F1 score were used to evaluate the AI systems' classification accuracy. The analysis indicated that although some AI systems, such as ChatGPT 4, Copilot, and PopAi, accurately reflected the recommended Cobb angle ranges for disease severity and treatment, others, such as Gemini and Claude, required further calibration. In particular, PMC-LLaMA 13B expanded the classification range for moderate scoliosis, potentially influencing clinical decisions and delaying interventions. These findings highlight the need for the continuous refinement of AI models to enhance their clinical applicability.
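As an illustration of the statistics named in the abstract, the following is a minimal sketch of how Fleiss' kappa and one-vs-rest accuracy, sensitivity, specificity, and F1 score could be computed for severity labels. It is not the authors' code: the function names, the label set ("mild", "moderate", "severe"), and the example data are all hypothetical, and the abstract does not state which software was used.

```python
from collections import Counter

def fleiss_kappa(ratings, categories):
    """Fleiss' kappa for a list of items, each rated by the same number of raters.

    ratings: list of lists of labels (one inner list per item).
    categories: all possible labels (hypothetical label set in this sketch).
    """
    n_items = len(ratings)
    n_raters = len(ratings[0])
    # counts[i][j] = number of raters who assigned category j to item i
    counts = [[Counter(item)[c] for c in categories] for item in ratings]
    # Observed agreement per item, then averaged over items
    p_i = [(sum(n * n for n in row) - n_raters) / (n_raters * (n_raters - 1))
           for row in counts]
    p_bar = sum(p_i) / n_items
    # Expected agreement from the overall category proportions
    p_j = [sum(row[j] for row in counts) / (n_items * n_raters)
           for j in range(len(categories))]
    p_e = sum(p * p for p in p_j)
    if p_e == 1.0:
        # Only one category was ever used; kappa is undefined.
        # Returning 1.0 here is a convention chosen for this sketch.
        return 1.0
    return (p_bar - p_e) / (1 - p_e)

def one_vs_rest_metrics(y_true, y_pred, positive):
    """Accuracy, sensitivity, specificity and F1 for one class against the rest."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)
    return {
        "accuracy": (tp + tn) / len(pairs),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
    }

if __name__ == "__main__":
    labels = ["mild", "moderate", "severe"]
    # Hypothetical data: three raters (for example, three models) labelling four cases
    ratings = [
        ["mild", "mild", "mild"],
        ["moderate", "moderate", "severe"],
        ["severe", "severe", "severe"],
        ["moderate", "mild", "moderate"],
    ]
    print("Fleiss' kappa:", fleiss_kappa(ratings, labels))

    # Hypothetical reference labels versus one model's predictions
    reference = ["mild", "moderate", "severe", "moderate"]
    predicted = ["mild", "moderate", "moderate", "severe"]
    print(one_vs_rest_metrics(reference, predicted, positive="moderate"))
```

Treating each severity class as "positive" in turn gives per-class sensitivity and specificity. A model that widens the moderate range would be expected to show lower sensitivity for the neighbouring classes.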

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ee64/11278075/ebc98361efd2/jcm-13-04013-g001.jpg
