Learning Expression Features via Deep Residual Attention Networks for Facial Expression Recognition From Video Sequences

Bibliographic Details
Published in: Technical Review - IETE, Vol. 38, No. 6, pp. 602-610
Main Authors: Zhao, Xiaoming; Chen, Gang; Chuang, Yuelong; Tao, Xin; Zhang, Shiqing
Format: Journal Article
Language: English
Published: Taylor & Francis, 02.11.2021
ISSN: 0256-4602, 0974-5971
DOI: 10.1080/02564602.2020.1814168

Summary: Facial expression recognition from video sequences is currently an active research topic in computer vision, pattern recognition, and artificial intelligence. Owing to the semantic gap between hand-designed features extracted from affective videos and subjective emotions, recognizing facial expressions from video sequences remains challenging. To tackle this problem, this paper proposes a new method for facial expression recognition from video sequences based on deep residual attention networks. First, because the intensity of emotional expression differs across the local areas of a facial image, deep residual attention networks are employed to learn high-level affective features for each frame of the facial expression images in a video sequence; these networks integrate deep residual networks with a spatial attention mechanism. Then, average pooling is performed over the frame-level features to produce a fixed-length global video-level representation. Finally, the global video-level representations are fed into a multi-layer perceptron to classify the facial expressions in video sequences. Experimental results on two public video emotion datasets, BAUM-1s and RML, demonstrate the effectiveness of the proposed method.
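
The sketch below illustrates the pipeline described in the summary: per-frame features from a residual backbone modulated by spatial attention, average pooling across frames into a video-level feature, and a multi-layer perceptron classifier. It is a minimal PyTorch-style sketch, not the authors' exact model; the layer sizes, the particular spatial-attention formulation, the number of residual blocks, and the six-class output are illustrative assumptions.

import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Produces a per-pixel attention map and reweights the feature map."""
    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, x):                    # x: (B, C, H, W)
        attn = torch.sigmoid(self.conv(x))   # (B, 1, H, W) attention weights
        return x * attn                      # emphasize emotionally salient regions

class ResidualAttentionBlock(nn.Module):
    """A residual block whose output branch is modulated by spatial attention."""
    def __init__(self, channels):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
        )
        self.attn = SpatialAttention(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        # residual connection plus attention-weighted branch
        return self.relu(x + self.attn(self.body(x)))

class VideoExpressionClassifier(nn.Module):
    """Frame-level residual-attention features -> average pooling -> MLP."""
    def __init__(self, num_classes=6, channels=64, feat_dim=256):
        super().__init__()
        self.stem = nn.Conv2d(3, channels, 7, stride=2, padding=3)
        self.blocks = nn.Sequential(*[ResidualAttentionBlock(channels) for _ in range(3)])
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.proj = nn.Linear(channels, feat_dim)
        self.mlp = nn.Sequential(             # multi-layer perceptron classifier
            nn.Linear(feat_dim, 128),
            nn.ReLU(inplace=True),
            nn.Linear(128, num_classes),
        )

    def forward(self, frames):                # frames: (B, T, 3, H, W)
        b, t = frames.shape[:2]
        x = frames.flatten(0, 1)              # process each frame independently
        x = self.blocks(self.stem(x))
        x = self.pool(x).flatten(1)           # (B*T, channels)
        x = self.proj(x).view(b, t, -1)       # per-frame feature vectors
        video_feat = x.mean(dim=1)            # average pooling over frames
        return self.mlp(video_feat)           # expression logits

# Example: classify a batch of two 16-frame clips of 112x112 face crops.
logits = VideoExpressionClassifier()(torch.randn(2, 16, 3, 112, 112))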