Campellone Timothy R, Flom Megan, Montgomery Robert M, Bullard Lauren, Pirner Maddison C, Pavez Aaron, Morales Michelle, Harper Devin, Oddy Catherine, O'Connor Tom, Daniels Jade, Eaneff Stephanie, Forman-Hoffman Valerie L, Sackett Casey, Darcy Alison
Woebot Health, San Francisco, CA, United States.
J Med Internet Res. 2025 May 23;27:e67365. doi: 10.2196/67365.
General awareness of and exposure to generative artificial intelligence (AI) have increased recently. This transformative technology has the potential to create a more dynamic and engaging user experience in digital mental health interventions (DMHIs). However, if not appropriately used and controlled, it can introduce risks to users that may result in harm and erode trust. At the time this trial was conducted, no approach to safely implementing generative AI in a DMHI had been rigorously evaluated.
This study aims to explore the user relationship, experience, safety, and technical guardrails of a DMHI using generative AI compared with a rules-based intervention.
We conducted a 2-week exploratory randomized controlled trial (RCT) with 160 adult participants randomized to receive a generative AI (n=81) or rules-based (n=79) version of a conversation-based DMHI. We collected self-report measures of the user relationship (client satisfaction, working alliance bond, and accuracy of empathic listening and reflection) and measures of the user experience (engagement metrics, adverse events, and technical guardrail success). We describe and validate technical guardrails for handling user inputs (eg, detecting potentially concerning language and off-topic responses) and model outputs (eg, withholding medical advice and diagnoses), with examples illustrating how they worked, as sketched below. Safety monitoring for adverse events was conducted throughout the trial, and the success of the technical guardrails created for the generative arm was assessed post trial.
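To make the two-stage guardrail pattern concrete, the sketch below shows one plausible way input and output checks could be wired around a generative model call. This is a minimal illustration, not the trial's actual implementation: all names (check_user_input, check_model_output, CONCERN_PATTERNS, respond) and the pattern lists are hypothetical assumptions, and the paper does not specify how its guardrails were built.

```python
# Hypothetical sketch of the input/output guardrail pattern described above.
# Patterns and function names are illustrative assumptions, not the trial's code.
import re
from dataclasses import dataclass


@dataclass
class GuardrailResult:
    allowed: bool
    reason: str = ""


# Input-side guardrail: flag potentially concerning language so the system
# can route the user to a fixed, clinically vetted safety response.
CONCERN_PATTERNS = [r"\bhurt myself\b", r"\bend my life\b", r"\bsuicid\w*\b"]


def check_user_input(text: str) -> GuardrailResult:
    for pattern in CONCERN_PATTERNS:
        if re.search(pattern, text, re.IGNORECASE):
            return GuardrailResult(False, "concerning language detected")
    return GuardrailResult(True)


# Output-side guardrail: block generated replies that drift into diagnosis
# or medical advice, falling back to a rules-based response instead.
FORBIDDEN_OUTPUT_PATTERNS = [
    r"\byou (may|might|probably) have\b",                  # diagnosis-like phrasing
    r"\b(take|increase|stop) (your )?med(ication)?s?\b",   # medical advice
]


def check_model_output(text: str) -> GuardrailResult:
    for pattern in FORBIDDEN_OUTPUT_PATTERNS:
        if re.search(pattern, text, re.IGNORECASE):
            return GuardrailResult(False, "output violates guardrail")
    return GuardrailResult(True)


def respond(user_text: str, generate, fallback) -> str:
    """Run both guardrails around a generative model call."""
    if not check_user_input(user_text).allowed:
        return fallback("safety")       # vetted crisis/safety script
    reply = generate(user_text)         # LLM call, supplied by the caller
    if not check_model_output(reply).allowed:
        return fallback("redirect")     # vetted rules-based reply
    return reply
```

In a deployment like the one the abstract describes, the fallback branch would return pre-approved rules-based content rather than generated text, which is what allows guardrail success to be audited post trial against the full set of generated statements.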
Most measures of the user relationship and experience appeared similar across the generative and rules-based arms. The generative arm appeared more accurate at detecting and responding to user statements with empathy (98% vs 69% accuracy). There were no serious or device-related adverse events, and the technical guardrails were 100% successful in a posttrial review of generated statements. A majority of participants in both arms reported an increase in positive sentiment about AI at the end of the trial (62% and 66%, respectively).
This trial provides initial evidence that, with the right guardrails and processes, generative AI can be successfully used in a DMHI while maintaining the user experience and relationship. It also offers an initial blueprint for technical and conversational guardrails that can be replicated to build a safe DMHI.
ClinicalTrials.gov NCT05948670; https://clinicaltrials.gov/study/NCT05948670.