A New Joint Training Method for Facial Expression Recognition with Inconsistently Annotated and Imbalanced Data


Bibliographic Details
Published in	Electronics (Basel) Vol. 13; no. 19; p. 3891
Main Authors	Chen, Tao; Zhang, Dong; Lee, Dah-Jye
Format	Journal Article
Language	English
Published	Basel MDPI AG 01.10.2024
Subjects	Accuracy; Affective computing; Algorithms; Annotations; Background noise; Data augmentation; Data collection; Datasets; Deep learning; Face recognition; Methods; Neural networks; Canada
ISSN	2079-9292
DOI	10.3390/electronics13193891

Summary:	Facial expression recognition (FER) plays a crucial role in various applications, including human–computer interaction and affective computing. Jointly training an FER network on multiple datasets is a promising strategy to enhance its performance. However, widespread annotation inconsistencies and class imbalances among FER datasets pose significant challenges to this approach. This paper proposes a new multi-dataset joint training method, Sample Selection and Paired Augmentation Joint Training (SSPA-JT), to address these challenges. SSPA-JT models annotation inconsistency as a label noise problem and selects clean samples from auxiliary datasets, which expands the overall dataset size while maintaining consistent annotation standards. Additionally, a dynamic matching algorithm pairs clean tail-class samples with noisy samples, which enriches the tail classes with diverse background information. Experimental results demonstrate that, by addressing both annotation inconsistency and class imbalance during multi-dataset joint training, SSPA-JT achieves performance superior or comparable to that of existing methods. It achieves state-of-the-art accuracies of 92.44% on RAF-DB and 98.22% on CAER-S, improvements of 0.2% and 3.65%, respectively, over existing methods.
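The two steps named in the summary (selecting clean auxiliary samples, then pairing tail-class clean samples with noisy ones) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes small-loss filtering as the selection rule and a simple round-robin pairing in place of the paper's dynamic matching algorithm, and it assumes each pair keeps the clean sample's label. The sample keys, function names, and thresholds are all hypothetical.

```python
from collections import Counter
from itertools import cycle


def select_clean(samples, loss_threshold):
    """Split auxiliary samples into clean and noisy by loss value.

    Assumption: a small loss under a model trained on the target dataset
    means the label agrees with the target annotation standard.
    Each sample is a dict with hypothetical keys 'id', 'label', 'loss'.
    """
    clean = [s for s in samples if s["loss"] <= loss_threshold]
    noisy = [s for s in samples if s["loss"] > loss_threshold]
    return clean, noisy


def tail_classes(clean, tail_fraction=0.5):
    """Treat a class as 'tail' if it has fewer than tail_fraction * the
    largest class count among clean samples (an assumed definition)."""
    counts = Counter(s["label"] for s in clean)
    if not counts:
        return set()
    cutoff = tail_fraction * max(counts.values())
    return {label for label, n in counts.items() if n < cutoff}


def pair_tail_with_noisy(clean, noisy, tail):
    """Pair every noisy sample with a clean tail-class sample, round-robin.

    Assumption: the pair inherits the clean sample's label, and the noisy
    sample contributes only background variety.
    """
    tail_clean = [s for s in clean if s["label"] in tail]
    if not tail_clean:
        return []
    return [(t["id"], n["id"], t["label"]) for t, n in zip(cycle(tail_clean), noisy)]


# Toy auxiliary dataset with made-up loss values.
samples = [
    {"id": "a1", "label": "happy", "loss": 0.1},
    {"id": "a2", "label": "happy", "loss": 0.2},
    {"id": "a3", "label": "happy", "loss": 0.3},
    {"id": "a4", "label": "fear", "loss": 0.2},
    {"id": "a5", "label": "sad", "loss": 2.1},
    {"id": "a6", "label": "happy", "loss": 1.8},
]
clean, noisy = select_clean(samples, loss_threshold=1.0)
tail = tail_classes(clean)
pairs = pair_tail_with_noisy(clean, noisy, tail)
print(pairs)
```

In this toy run, "fear" is the only tail class, so its single clean sample is reused for both noisy samples. A real pipeline would then blend each pair into an augmented training image, a step the summary does not specify and the sketch leaves out.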