Social Reminiscence in Older Adults' Everyday Conversations: Automated Detection Using Natural Language Processing and Machine Learning.

Affiliations

Department of Management, Technology, and Economics, ETH Zurich, Zurich, Switzerland.

Department of Psychology, University of Zurich, Zurich, Switzerland.

Publication Information

J Med Internet Res. 2020 Sep 15;22(9):e19133. doi: 10.2196/19133.

Abstract

BACKGROUND

Reminiscence is the act of thinking or talking about personal experiences that occurred in the past. It is a central task of old age that is essential for healthy aging, and it serves multiple functions, such as decision-making and introspection, transmitting life lessons, and bonding with others. Studying social reminiscence behavior in everyday life makes it possible to generate data and to detect reminiscence in general conversations.

OBJECTIVE

The aims of this original paper are to (1) preprocess coded transcripts of older adults' conversations in German with natural language processing (NLP) and (2) implement and evaluate learning strategies that use different NLP features and machine learning algorithms to detect reminiscence in a corpus of transcripts.

METHODS

The methods in this study comprise (1) collecting and coding transcripts of older adults' conversations in German, (2) preprocessing the transcripts to generate NLP features (bag-of-words models, part-of-speech tags, and pretrained German word embeddings), and (3) training machine learning models to detect reminiscence using random forests, support vector machines, and adaptive and extreme gradient boosting algorithms. The data set comprises 2214 transcripts, of which 109 contain reminiscence. Because of this class imbalance, we introduced three learning strategies: (1) class-weighted learning, (2) a meta-classifier consisting of a voting ensemble, and (3) data augmentation with the Synthetic Minority Oversampling Technique (SMOTE) algorithm. For each learning strategy, we performed cross-validation on a random sample of the training set of transcripts. On the test data, we computed the area under the curve (AUC), average precision (AP), precision, recall, F1 score, and specificity for all combinations of NLP features, algorithms, and learning strategies.
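The three imbalance strategies can be illustrated with a minimal, standard-library Python sketch. It is not the authors' code, and all function names are hypothetical. `balanced_class_weights` assumes the common "balanced" heuristic, n_samples / (n_classes * n_c), which is the rule behind scikit-learn's `class_weight="balanced"`; `smote` is a simplified SMOTE that interpolates a random minority sample toward one of its k nearest minority neighbours; `majority_vote` is plain hard voting. The abstract does not state which weighting rule, neighbour count, or vote rule was actually used.

```python
import math
import random


def balanced_class_weights(counts):
    """Weight each class by n_samples / (n_classes * n_c) (assumed 'balanced' heuristic)."""
    n_samples = sum(counts.values())
    n_classes = len(counts)
    return {label: n_samples / (n_classes * n) for label, n in counts.items()}


def smote(minority, n_new, k=5, seed=0):
    """Simplified SMOTE: create n_new synthetic minority vectors.

    Each synthetic point lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)  # cannot have more neighbours than other samples
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        others = [m for j, m in enumerate(minority) if j != i]
        neighbours = sorted(others, key=lambda m: math.dist(x, m))[:k]
        nn = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nn)])
    return synthetic


def majority_vote(predictions):
    """Hard voting over 0/1 labels; predictions[m][i] is model m's label for sample i.

    A sample is labelled 1 only if more than half of the models say 1 (ties -> 0).
    """
    n_models = len(predictions)
    return [int(2 * sum(votes) > n_models) for votes in zip(*predictions)]


# Class counts from the abstract: 109 reminiscence vs 2214 - 109 = 2105 other.
weights = balanced_class_weights({"reminiscence": 109, "other": 2105})
print({label: round(w, 2) for label, w in weights.items()})
# {'reminiscence': 10.16, 'other': 0.53}
```

Under this heuristic a misclassified reminiscence transcript costs roughly 19 times as much as a misclassified non-reminiscence one. Class weighting changes only the loss, whereas SMOTE changes the training data itself; in either case the reweighting or augmentation is typically applied to the training folds only, so that the test data keep their natural imbalance.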

RESULTS

Class-weighted support vector machines on bag-of-words features outperformed all other classifiers (AUC=0.91, AP=0.56, precision=0.5, recall=0.45, F1=0.48, specificity=0.98), followed by support vector machines on SMOTE-augmented data and word embedding features (AUC=0.89, AP=0.54, precision=0.35, recall=0.59, F1=0.44, specificity=0.94). Within the meta-classifier strategy, adaptive and extreme gradient boosting algorithms trained on word embeddings and bag-of-words features outperformed all other classifiers and NLP features; however, the meta-classifier strategy performed worse than the other strategies, with highly imbalanced precision-recall trade-offs.
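The reported metrics can be recomputed from a classifier's outputs. The following standard-library sketch (function names hypothetical, not taken from the paper) derives precision, recall, specificity, and F1 from confusion-matrix counts, computes AUC as the probability that a positive transcript outranks a negative one (assuming AUC refers to the ROC curve), and computes AP as the mean precision at the rank of each positive, which is the non-interpolated definition; the abstract does not state which AP variant was used.

```python
def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def confusion_metrics(tp, fp, tn, fn):
    """Precision, recall (sensitivity), specificity, and F1 from confusion-matrix counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return {"precision": precision, "recall": recall,
            "specificity": specificity, "f1": f1(precision, recall)}


def roc_auc(labels, scores):
    """P(random positive scores above random negative), ties count 1/2.

    Assumes both classes are present in labels.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Mean precision at the rank of each positive (assumes no tied scores)."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits, precisions = 0, []
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0


# F1 from the reported (rounded) precision/recall pairs; reported F1: 0.48 and 0.44.
print(round(f1(0.5, 0.45), 3), round(f1(0.35, 0.59), 3))  # 0.474 0.439

# Share of reminiscence transcripts in the whole corpus.
print(round(109 / 2214, 3))  # 0.049
```

The recomputed F1 values agree with the reported ones up to rounding of the inputs: a recall of 0.455 instead of 0.45 already gives about 0.476. A classifier that ranks transcripts at random has an expected AP close to the positive-class prevalence, so, assuming the test split roughly mirrors the corpus prevalence of about 0.049, the reported AP values of 0.54 to 0.56 are more than ten times that chance level, while the AUC of 0.91 is less informative under such strong imbalance.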

CONCLUSIONS

This study provides evidence that NLP and machine learning pipelines can be applied to automatically detect reminiscence in older adults' everyday conversations in German. The methods and findings could inform the design of unobtrusive computer systems that detect social reminiscence in older adults' everyday lives in real time and classify its functions. With further improvements, such systems could be deployed in health interventions that aim to improve older adults' well-being by promoting self-reflection and suggesting coping strategies for cases of dysfunctional reminiscence, which can undermine physical and mental health.
