Lyu Sihua, Ren Xiaopeng, Du Yihua, Zhao Nan
CAS Key Laboratory of Behavioral Science, Institute of Psychology, Chinese Academy of Sciences, Beijing, China.
Department of Psychology, University of Chinese Academy of Sciences, Beijing, China.
Front Psychiatry. 2023 Feb 9;14:1121583. doi: 10.3389/fpsyt.2023.1121583. eCollection 2023.
In recent years, research has used psycholinguistic features in public discourse, networking behaviors on social media, and profile information to train models for depression detection. However, the most widely adopted approach to extracting psycholinguistic features relies on the Linguistic Inquiry and Word Count (LIWC) dictionary and various affective lexicons, while other features related to cultural factors and suicide risk remain unexplored. Moreover, relying on social networking behavioral features and profile features limits the generalizability of such models. Therefore, our study aimed to build a depression prediction model for text-only social media data using a wider range of linguistic features potentially related to depression, and to illuminate the relationship between linguistic expression and depression.
We collected depression scores and past Weibo posts from 789 users and extracted a total of 117 lexical features using the Simplified Chinese Linguistic Inquiry and Word Count, the Chinese Suicide Dictionary, the Chinese Version of Moral Foundations Dictionary, the Chinese Version of Moral Motivation Dictionary, and the Chinese Individualism/Collectivism Dictionary.
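Dictionary-based features of this kind are usually word frequencies: for each dictionary category, the share of a user's words that belong to that category. The sketch below illustrates that idea under several assumptions that are not taken from the study: posts are already segmented into words (Chinese text has no spaces, so a segmenter would be needed first), matching is by exact word (real LIWC-style dictionaries also support wildcard stems), and `toy_lexicon` is a hypothetical two-category stand-in for the much larger dictionaries named above. It is not the authors' pipeline.

```python
from collections import Counter
from typing import Dict, List, Set


def category_frequencies(tokens: List[str], lexicon: Dict[str, Set[str]]) -> Dict[str, float]:
    """Return, for each category, the percentage of tokens that belong to it (LIWC-style)."""
    total = len(tokens)
    if total == 0:
        # No text for this user: every category frequency is zero.
        return {category: 0.0 for category in lexicon}
    counts = Counter(tokens)
    return {
        category: 100.0 * sum(counts[word] for word in words) / total
        for category, words in lexicon.items()
    }


# Hypothetical toy lexicon (illustration only, not a real dictionary).
toy_lexicon = {
    "negative_emotion": {"难过", "痛苦"},
    "first_person_singular": {"我"},
}

# One user's posts, assumed to be pre-segmented and concatenated into a token list.
user_tokens = ["我", "今天", "很", "难过", "我", "好", "痛苦"]
features = category_frequencies(user_tokens, toy_lexicon)
print(features)
```

Stacking one such feature vector per user (117 columns in the study) gives the design matrix that a regression model is then trained on.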
Results showed that all the dictionaries contributed to the prediction. The best-performing model was linear regression: the Pearson correlation coefficient between predicted and self-reported values was 0.33, the R-squared was 0.10, and the split-half reliability was 0.75.
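The three evaluation figures above can be made concrete with a small sketch. It assumes common definitions rather than the study's exact procedure: R-squared is taken as the coefficient of determination 1 − SS_res/SS_tot, and split-half reliability is assumed to mean correlating predictions made from two halves of each user's posts and stepping that correlation up with the Spearman-Brown formula. The function names and the sample numbers are hypothetical.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def spearman_brown(r_half: float) -> float:
    """Step up a correlation between two halves to full-length reliability."""
    return 2.0 * r_half / (1.0 + r_half)


def split_half_reliability(pred_half_a: Sequence[float], pred_half_b: Sequence[float]) -> float:
    """Assumed procedure: correlate per-user predictions from two post halves, then correct."""
    return spearman_brown(pearson_r(pred_half_a, pred_half_b))


# Hypothetical scores for five users (not data from the study).
self_reported = [10.0, 22.0, 15.0, 30.0, 8.0]
predicted = [14.0, 18.0, 16.0, 24.0, 12.0]
print(pearson_r(self_reported, predicted), r_squared(self_reported, predicted))
```

On held-out data, R-squared computed this way need not equal the square of Pearson's r (it can even be negative), which is why the two figures are reported separately.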
This study not only developed a predictive model applicable to text-only social media data but also demonstrated the importance of taking cultural psychological factors and suicide-related expressions into account when calculating word frequencies. Our research provides a more comprehensive understanding of how lexicons related to cultural psychology and suicide risk are associated with depression, and could contribute to the recognition of depression.