Multimodal Emotion Recognition based on Face and Speech using Deep Convolution Neural Network and Long Short Term Memory
| Published in | Circuits, Systems, and Signal Processing, Vol. 44, No. 9, pp. 6622–6649 |
|---|---|
| Main Authors | , |
| Format | Journal Article |
| Language | English |
| Published | New York: Springer US, 01.09.2025 (Springer Nature B.V.) |
| ISSN | 0278-081X; 1531-5878 |
| DOI | 10.1007/s00034-025-03080-2 |
Summary: Multimodal emotion recognition (MER) is crucial for analyzing a person's mental behavior and health and for enhancing the performance of human–computer interaction systems. Various deep learning-based MER systems have been presented in the last decade. However, their outcomes are limited by poor feature representation, weak correlation between short- and long-term features, security issues, low generalization capability, low reliability of the emotional-modality systems, and the high computational complexity of deep learning models. This paper presents an MER scheme based on facial images and speech data using a parallel deep convolutional neural network (PDCNN) and bidirectional long short-term memory (BiLSTM) to improve the system's reliability, security, and robustness. The PDCNN offers superior generalization capability and feature representation, while the BiLSTM provides better long-term dependency modeling, temporal representation, and correlation between the short- and long-term attributes of the multimodal data. A novel hybrid Particle Swarm Optimization based on Multi-Attribute Utility Theory and the Archimedes Optimization Algorithm (PMA) selects the crucial features of the facial-expression and speech data to minimize the computational complexity of the PDCNN-BiLSTM framework. On the BAUM dataset, the scheme achieves an overall accuracy of 99.22%, precision of 0.9967, recall of 0.9933, and an F1-score of 0.9949, improving on traditional techniques.
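The abstract describes the architecture only at a high level. Below is a minimal PyTorch sketch of the general idea: two parallel CNN branches (one per modality, for face frames and speech spectrogram slices) whose per-frame features are fused and passed to a BiLSTM for temporal modeling. All layer sizes, the class count, and the fusion-by-concatenation choice are illustrative assumptions, not the paper's reported configuration; the PMA feature-selection step is omitted.

```python
# Illustrative sketch only: layer sizes, class count, and concatenation fusion
# are assumptions, not the paper's configuration.
import torch
import torch.nn as nn

class CNNBranch(nn.Module):
    """One convolutional branch of the parallel DCNN (face or speech)."""
    def __init__(self, in_channels: int, out_dim: int = 128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.AdaptiveAvgPool2d((4, 4)),
        )
        self.proj = nn.Linear(64 * 4 * 4, out_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, channels, H, W) -> per-frame embeddings (batch, time, out_dim)
        b, t = x.shape[:2]
        feats = self.conv(x.flatten(0, 1)).flatten(1)
        return self.proj(feats).view(b, t, -1)

class MERModel(nn.Module):
    """Parallel CNN branches fused by concatenation, then a BiLSTM classifier."""
    def __init__(self, num_classes: int = 7):
        super().__init__()
        self.face_branch = CNNBranch(in_channels=3)    # RGB face crops
        self.speech_branch = CNNBranch(in_channels=1)  # log-mel spectrogram slices
        self.bilstm = nn.LSTM(input_size=256, hidden_size=128,
                              batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * 128, num_classes)

    def forward(self, face: torch.Tensor, speech: torch.Tensor) -> torch.Tensor:
        fused = torch.cat([self.face_branch(face), self.speech_branch(speech)], dim=-1)
        out, _ = self.bilstm(fused)         # temporal modeling of fused features
        return self.classifier(out[:, -1])  # last time step -> emotion logits

# Example: 8 clips, 16 time steps, 64x64 face crops and 64x64 spectrogram slices.
model = MERModel()
logits = model(torch.randn(8, 16, 3, 64, 64), torch.randn(8, 16, 1, 64, 64))
print(logits.shape)  # torch.Size([8, 7])
```

Concatenation is only one plausible fusion strategy; the paper's PMA step, which prunes features before the recurrent stage to cut computational cost, would slot in between the branch outputs and the BiLSTM.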