Learning Expression Features via Deep Residual Attention Networks for Facial Expression Recognition From Video Sequences

Bibliographic Details
Published in: Technical Review - IETE, Vol. 38, No. 6, pp. 602-610
Main Authors: Zhao, Xiaoming; Chen, Gang; Chuang, Yuelong; Tao, Xin; Zhang, Shiqing
Format: Journal Article
Language: English
Published: Taylor & Francis, 02.11.2021
ISSN: 0256-4602, 0974-5971
DOI: 10.1080/02564602.2020.1814168

Summary: Facial expression recognition from video sequences is currently an active research topic in computer vision, pattern recognition, and artificial intelligence. Owing to the semantic gap between hand-designed features extracted from affective videos and subjective emotions, recognizing facial expressions from video sequences remains challenging. To tackle this problem, this paper proposes a new method for facial expression recognition from video sequences based on deep residual attention networks. First, because the intensity of emotional expression differs across the local areas of a facial image, deep residual attention networks are employed to learn high-level affective features for each frame of the facial expression images in a video sequence; these networks integrate deep residual networks with a spatial attention mechanism. Then, average pooling is performed over the frame-level features to produce a fixed-length global video-level representation. Finally, the global video-level representations are fed into a multi-layer perceptron to classify the facial expressions in video sequences. Experimental results on two public video emotion datasets, BAUM-1s and RML, demonstrate the effectiveness of the proposed method.
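
The sketch below illustrates the pipeline described in the summary: per-frame features from a residual backbone modulated by spatial attention, average pooling across frames into a video-level feature, and a multi-layer perceptron classifier. It is a minimal PyTorch-style sketch, not the authors' exact model; the layer sizes, the particular spatial-attention formulation, the number of residual blocks, and the six-class output are illustrative assumptions.

import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Produces a per-pixel attention map and reweights the feature map."""
    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, x):                    # x: (B, C, H, W)
        attn = torch.sigmoid(self.conv(x))   # (B, 1, H, W) attention weights
        return x * attn                      # emphasize emotionally salient regions

class ResidualAttentionBlock(nn.Module):
    """A residual block whose output branch is modulated by spatial attention."""
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
        )
        self.attn = SpatialAttention(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        # residual connection plus attention-weighted branch
        return self.relu(x + self.attn(self.body(x)))

class VideoExpressionClassifier(nn.Module):
    """Frame-level residual-attention features -> average pooling -> MLP."""
    def __init__(self, num_classes=6, channels=64, feat_dim=256):
        super().__init__()
        self.stem = nn.Conv2d(3, channels, 7, stride=2, padding=3)
        self.blocks = nn.Sequential(*[ResidualAttentionBlock(channels) for _ in range(3)])
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.proj = nn.Linear(channels, feat_dim)
        self.mlp = nn.Sequential(             # multi-layer perceptron classifier
            nn.Linear(feat_dim, 128),
            nn.ReLU(inplace=True),
            nn.Linear(128, num_classes),
        )

    def forward(self, frames):                # frames: (B, T, 3, H, W)
        b, t = frames.shape[:2]
        x = frames.flatten(0, 1)              # process each frame independently
        x = self.blocks(self.stem(x))
        x = self.pool(x).flatten(1)           # (B*T, channels)
        x = self.proj(x).view(b, t, -1)       # per-frame feature vectors
        video_feat = x.mean(dim=1)            # average pooling over frames
        return self.mlp(video_feat)           # expression logits

# Example: classify a batch of two 16-frame clips of 112x112 face crops.
logits = VideoExpressionClassifier()(torch.randn(2, 16, 3, 112, 112))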