Speech emotion recognition via graph-based representations.

Affiliations

Institute of Computer Science, Foundation for Research and Technology-Hellas, Heraklion, GR-700 13, Greece.

Computer Science Department, University of Crete, Heraklion, GR-700 13, Greece.

Publication information

Sci Rep. 2024 Feb 23;14(1):4484. doi: 10.1038/s41598-024-52989-2.

DOI:10.1038/s41598-024-52989-2

PMID:38396002

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10891082/

Abstract

Speech emotion recognition (SER) has attracted growing interest over recent decades as part of affective computing. As a result, a variety of engineering approaches have been developed to address the SER problem, exploiting different features, learning algorithms, and datasets. In this paper, we propose applying graph theory to the classification of emotionally colored speech signals. Graph theory provides tools for extracting statistical as well as structural information from any time series, and we propose using this information as a novel feature set. Furthermore, we suggest assigning a unique feature-based identity to each emotion of each speaker. Emotion classification is performed by a Random Forest classifier in a Leave-One-Speaker-Out Cross-Validation (LOSO-CV) scheme. The proposed method is compared with two state-of-the-art approaches involving well-known hand-crafted features as well as deep learning architectures operating on mel-spectrograms. Experimental results on three datasets, EMODB (German, acted), AESDD (Greek, acted), and DEMoS (Italian, in-the-wild), show that the proposed method outperforms the comparative methods on all three. Specifically, we observe an average increase in unweighted average recall (UAR) of almost [Formula: see text], [Formula: see text] and [Formula: see text], respectively.
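To make the idea of graph-derived features concrete, the following is a minimal sketch rather than the authors' pipeline. It assumes the natural visibility graph as the mapping from a time series to a graph; the abstract does not state which construction is used, so that choice is an assumption. The function names (`natural_visibility_edges`, `graph_features`, `leave_one_speaker_out`) and the particular statistics are hypothetical, chosen only for illustration. The last helper shows how a leave-one-speaker-out split keeps all utterances of one speaker out of training.

```python
from statistics import mean, pstdev


def natural_visibility_edges(series):
    """Return the edge set of the natural visibility graph of `series`.

    Assumption: this construction is used here for illustration only.
    Samples i < j are linked when every sample k between them lies strictly
    below the straight line joining (i, series[i]) and (j, series[j]).
    """
    n = len(series)
    edges = set()
    for i in range(n - 1):
        for j in range(i + 1, n):
            visible = True
            for k in range(i + 1, j):
                # Height of the line of sight from i to j at position k.
                line_at_k = series[j] + (series[i] - series[j]) * (j - k) / (j - i)
                if series[k] >= line_at_k:
                    visible = False
                    break
            if visible:
                edges.add((i, j))
    return edges


def graph_features(series):
    """Compute a few illustrative (hypothetical) statistics of the graph."""
    n = len(series)
    edges = natural_visibility_edges(series)
    degree = [0] * n
    for i, j in edges:
        degree[i] += 1
        degree[j] += 1
    return {
        "mean_degree": mean(degree),
        "std_degree": pstdev(degree),
        "max_degree": max(degree),
        "density": len(edges) / (n * (n - 1) / 2),
    }


def leave_one_speaker_out(speaker_ids):
    """Yield (held_out_speaker, train_indices, test_indices) per speaker."""
    for speaker in sorted(set(speaker_ids)):
        test = [i for i, s in enumerate(speaker_ids) if s == speaker]
        train = [i for i, s in enumerate(speaker_ids) if s != speaker]
        yield speaker, train, test


if __name__ == "__main__":
    print(graph_features([1, 3, 2, 4]))
    for fold in leave_one_speaker_out(["a", "b", "a", "c"]):
        print(fold)
```

The sketch stops before classification. In practice the feature dictionaries would be turned into vectors and passed to a Random Forest implementation, for example scikit-learn's `RandomForestClassifier` together with `LeaveOneGroupOut` for the speaker-wise folds. The brute-force graph construction is cubic in the series length, so it suits short frames rather than whole recordings.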

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/eede/10891082/d25964858113/41598_2024_52989_Fig1_HTML.jpg
