

Effectiveness of generative AI-large language models' recognition of veteran suicide risk: a comparison with human mental health providers using a risk stratification model.

Author Information

Lauderdale Sean A, Schmitt Randee, Wuckovich Breanna, Dalal Natashaa, Desai Hela, Tomlinson Shealyn

Affiliations

Department of Psychological and Behavioral Sciences, University of Houston - Clear Lake, Houston, TX, United States.

Publication Information

Front Psychiatry. 2025 Apr 3;16:1544951. doi: 10.3389/fpsyt.2025.1544951. eCollection 2025.

Abstract

BACKGROUND

With over 6,300 United States military veterans dying by suicide annually, the Veterans Health Administration (VHA) is exploring innovative strategies, including artificial intelligence (AI), for suicide risk assessment. Machine learning has predominantly been used for this purpose, but the application of generative AI-large language models (GAI-LLMs) remains unexplored.

OBJECTIVE

This study evaluates the effectiveness of GAI-LLMs, specifically ChatGPT-3.5, ChatGPT-4o, and Google Gemini, in using the VHA's Risk Stratification Table for identifying suicide risks and making treatment recommendations in response to standardized veteran vignettes.

METHODS

We compared the GAI-LLMs' assessments and recommendations for both acute and chronic suicide risks to evaluations by mental health care providers (MHCPs). Four vignettes, representing varying levels of suicide risk, were used.
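
As a rough illustration only (not taken from the paper, whose exact prompts are not reported in the abstract), the vignette-rating procedure could be implemented against a chat-completion API such as the OpenAI Python client. The rubric text, prompt wording, model name, and vignette below are all placeholders:

```python
# Hypothetical sketch of the rating procedure: one GAI-LLM is given an
# abbreviated risk-stratification rubric and a standardized vignette and
# asked to rate acute/chronic suicide risk and recommend a level of care.
# The rubric, prompt wording, and vignette are assumptions, not the study's.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

RUBRIC = (
    "Using the VHA Risk Stratification Table, rate the veteran's ACUTE "
    "and CHRONIC suicide risk (low / intermediate / high) and recommend "
    "a level of care."
)

def rate_vignette(vignette: str, model: str = "gpt-4o") -> str:
    """Ask one model to rate a single standardized veteran vignette."""
    response = client.chat.completions.create(
        model=model,
        temperature=0,  # minimize run-to-run variation in ratings
        messages=[
            {"role": "system", "content": RUBRIC},
            {"role": "user", "content": vignette},
        ],
    )
    return response.choices[0].message.content

# One placeholder vignette; the study used four, at varying risk levels.
print(rate_vignette("Vignette: a 45-year-old veteran presenting with ..."))
```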

RESULTS

GAI-LLMs' assessments showed discrepancies with MHCPs', particularly rating the most acute case as less acute and the least acute case as more acute. For chronic risk, GAI-LLMs' evaluations were generally in line with MHCPs', except for one vignette that the GAI-LLMs rated as higher in chronic risk. Variation across GAI-LLMs was also observed. Notably, ChatGPT-3.5 produced lower acute risk ratings than ChatGPT-4o and Google Gemini, while ChatGPT-4o assigned higher chronic risk ratings and recommended hospitalization for all veterans. Treatment planning by GAI-LLMs was predicted by chronic, but not acute, risk ratings.
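
The last sentence implies a regression of recommended treatment intensity on the two risk ratings. A minimal sketch of that kind of analysis follows, with made-up placeholder data; the study's actual data, coding, and statistical model are not given in the abstract:

```python
# Illustrative only: regress recommended treatment intensity on acute and
# chronic risk ratings. The data below are fabricated placeholders coded
# 1-3; a significant chronic_risk coefficient alongside a non-significant
# acute_risk coefficient would match the pattern reported above.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "acute_risk":   [1, 2, 3, 1, 2, 3, 2, 1],  # 1 = low ... 3 = high
    "chronic_risk": [1, 1, 2, 2, 3, 3, 2, 3],
    "treatment":    [1, 1, 2, 2, 3, 3, 2, 3],  # 1 = outpatient ... 3 = inpatient
})

fit = smf.ols("treatment ~ acute_risk + chronic_risk", data=df).fit()
print(fit.summary())
```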

CONCLUSION

While GAI-LLMs offer the potential for suicide risk assessment comparable to that of MHCPs, significant variation exists across different GAI-LLMs in both risk evaluation and treatment recommendations. Continued MHCP oversight is essential to ensure accuracy and appropriate care.

IMPLICATIONS

These findings highlight the need for further research into optimizing GAI-LLMs for consistent and reliable use in clinical settings, ensuring they complement rather than replace human expertise.


Figure (fpsyt-16-1544951-g001): https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7df1/12003356/5f8936a97a3d/fpsyt-16-1544951-g001.jpg
