Research on Speech Emotion Recognition Based on Deep Neural Network


Bibliographic Details
Published in: 2021 IEEE 6th International Conference on Signal and Image Processing (ICSIP), pp. 795-799
Main Authors: Li, Heng; Zhang, Xue; Wang, Ming-Jiang
Format: Conference Proceeding
Language: English
Published: IEEE, 22.10.2021
DOI: 10.1109/ICSIP52628.2021.9689043

Summary: Human emotion is a concrete form of human communication, and research on emotion recognition is steadily increasing. In recent years, researchers have paid more attention to multi-modal emotion recognition. This paper presents a deep neural network for emotion recognition based on the speech spectrum. Spectrograms contain comprehensive information about speech and are useful for emotion recognition. We combine a convolutional neural network (CNN) with a long short-term memory (LSTM) network: the CNN extracts features from the speech spectrogram, and the LSTM preserves the temporal information, for the emotion recognition task. This study uses the University of Southern California's Interactive Emotional Dyadic Motion Capture (IEMOCAP) dataset. With the speech spectrogram as input and six emotion categories, the final weighted accuracy is 61% and the unweighted accuracy is 56%.
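The summary reports both a weighted and an unweighted accuracy without defining them. The sketch below assumes the convention common in speech emotion recognition work: weighted accuracy (WA) is the overall fraction of correctly classified utterances, and unweighted accuracy (UA) is the mean of the per-class recalls. The function name and the toy labels are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def weighted_and_unweighted_accuracy(y_true, y_pred):
    """Return (WA, UA) for two equal-length sequences of class labels.

    Assumed definitions (not confirmed by the paper):
      WA = correct predictions / all samples
      UA = average over classes of (correct in class / samples in class)
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")

    total = defaultdict(int)    # samples per true class
    correct = defaultdict(int)  # correctly predicted samples per true class
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1

    wa = sum(correct.values()) / len(y_true)
    ua = sum(correct[c] / total[c] for c in total) / len(total)
    return wa, ua


# Toy, imbalanced example: 4 "neutral" utterances and 2 "happy" utterances.
y_true = ["neutral", "neutral", "neutral", "neutral", "happy", "happy"]
y_pred = ["neutral", "neutral", "neutral", "happy", "happy", "sad"]

wa, ua = weighted_and_unweighted_accuracy(y_true, y_pred)
print(f"WA = {wa:.3f}, UA = {ua:.3f}")
```

In this toy example the majority class is recognized better than the minority class, so WA comes out higher than UA. Under the assumed definitions, a gap in that direction, like the 61% versus 56% reported above, is what class imbalance of this kind would produce.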