Department of Emergency Medicine, Vagelos College of Physicians and Surgeons, Columbia University Irving Medical Center, New York, New York, USA.
Department of Psychiatry, New York University Grossman School of Medicine, New York, New York, USA.
Psychol Med. 2022 Apr;52(5):957-967. doi: 10.1017/S0033291720002718. Epub 2020 Aug 3.
Visual and auditory signs of patient functioning have long been used for clinical diagnosis, treatment selection, and prognosis. Direct measurement and quantification of these signals aim to improve the consistency, sensitivity, and scalability of clinical assessment. In the current study, we investigate whether machine learning-based computer vision (CV), semantic, and acoustic analysis can capture clinical features from free speech responses to a brief interview conducted 1 month post-trauma that accurately classify major depressive disorder (MDD) and posttraumatic stress disorder (PTSD).
N = 81 patients admitted to the emergency department (ED) of a Level 1 trauma center following a life-threatening traumatic event participated in an open-ended qualitative interview with a paraprofessional about their experience 1 month after admission. A deep neural network was used to extract facial features of emotion and their intensity, movement parameters, speech prosody, and natural language content. These features served as inputs to classify PTSD and MDD cross-sectionally.
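To make the analysis pipeline concrete, the following is a minimal Python sketch of multimodal feature fusion for cross-sectional classification. The abstract does not specify feature dimensions, the fusion scheme, or the final classifier; the feature blocks, their sizes, and the regularized logistic regression used here are all illustrative assumptions, not the authors' implementation.

    # Minimal sketch: concatenate per-patient video, audio, and language
    # feature vectors and fit a cross-validated classifier.
    # All dimensions and the classifier choice are assumptions for illustration.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_predict
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler

    n_patients = 81
    rng = np.random.default_rng(0)

    # Placeholder per-patient feature blocks (dimensions are illustrative only):
    face_feats = rng.normal(size=(n_patients, 16))      # facial emotion intensities
    motion_feats = rng.normal(size=(n_patients, 8))     # movement parameters
    prosody_feats = rng.normal(size=(n_patients, 12))   # speech prosody statistics
    language_feats = rng.normal(size=(n_patients, 32))  # natural language content

    X = np.hstack([face_feats, motion_feats, prosody_feats, language_feats])
    y_ptsd = rng.integers(0, 2, size=n_patients)        # placeholder PTSD labels

    clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    # Cross-validated probability estimates for the positive class:
    proba = cross_val_predict(clf, X, y_ptsd, cv=5, method="predict_proba")[:, 1]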
Both video- and audio-based markers contributed to good discriminatory classification accuracy. The algorithm discriminated PTSD status at 1 month after ED admission with an area under the curve (AUC) of 0.90 (weighted average precision = 0.83, recall = 0.84, and F1-score = 0.83), and depression status at 1 month after ED admission with an AUC of 0.86 (weighted average precision = 0.83, recall = 0.82, and F1-score = 0.82).
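The reported metrics (AUC and weighted-average precision, recall, and F1) can be computed from out-of-sample predicted probabilities as in the short sketch below. The abstract does not state the tooling used; scikit-learn and the example labels and probabilities here are assumptions for illustration, not study data.

    # Computing AUC and weighted-average precision/recall/F1 from predictions.
    # y_true and proba are illustrative placeholders, not study data.
    import numpy as np
    from sklearn.metrics import roc_auc_score, precision_recall_fscore_support

    y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0, 1, 0])
    proba = np.array([0.92, 0.15, 0.70, 0.81, 0.34, 0.08, 0.66, 0.41, 0.88, 0.22])

    auc = roc_auc_score(y_true, proba)
    y_pred = (proba >= 0.5).astype(int)
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="weighted", zero_division=0
    )
    print(f"AUC={auc:.2f}  precision={precision:.2f}  recall={recall:.2f}  F1={f1:.2f}")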
Direct clinical observation during post-trauma free speech, analyzed with deep learning, identifies digital markers that can be used to classify MDD and PTSD status.