Data Augmentation for 3DMM-based Arousal-Valence Prediction for HRI

Bibliographic Details
Published in: IEEE RO-MAN, pp. 2015-2022
Main Authors: Cruz, Christian Arzate; Sechayk, Yotam; Igarashi, Takeo; Gomez, Randy
Format: Conference Proceeding
Language: English
Published: IEEE, 26.08.2024
ISSN: 1944-9437
DOI: 10.1109/RO-MAN60168.2024.10731438

Summary: Humans use multiple communication channels to interact with each other. For instance, body gestures or facial expressions are commonly used to convey intent. The use of such non-verbal cues has motivated the development of prediction models. One such approach is predicting arousal and valence (AV) from facial expressions. However, making these models accurate for human-robot interaction (HRI) settings is challenging, as it requires handling multiple subjects, difficult conditions, and a wide range of facial expressions. In this paper, we propose a data augmentation (DA) technique to improve the performance of AV predictors using 3D morphable models (3DMM). We then apply this approach in an HRI setting with a mediator robot and a group of three humans. Our augmentation method creates synthetic sequences for underrepresented values in the AV space of the SEWA dataset, which is the most comprehensive dataset with continuous AV labels. Results show that our DA method improves the accuracy and robustness of AV prediction in real-time applications. The accuracy of our models on the SEWA dataset is 0.793 for both arousal and valence.
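Note: The abstract only outlines the augmentation pipeline, so the Python fragment below is an illustrative sketch of the general idea rather than the authors' implementation: bin the continuous AV plane to locate underrepresented regions, then synthesize new 3DMM expression-coefficient sequences targeting those regions. The function names, the binning granularity, and the Gaussian perturbation used as a stand-in generator are all assumptions not taken from the paper.

```python
import numpy as np

def underrepresented_av_bins(av_labels, n_bins=10, min_share=0.01):
    # Histogram the continuous arousal-valence plane (labels assumed
    # to lie in [-1, 1]) and return indices of bins whose share of
    # samples falls below min_share.
    hist, a_edges, v_edges = np.histogram2d(
        av_labels[:, 0], av_labels[:, 1],
        bins=n_bins, range=[[-1.0, 1.0], [-1.0, 1.0]],
    )
    share = hist / hist.sum()
    return np.argwhere(share < min_share), a_edges, v_edges

def synthesize_sequence(expr_coeffs, noise_scale=0.05, rng=None):
    # Create a synthetic variant of a 3DMM expression-coefficient
    # sequence (frames x coefficients) by Gaussian perturbation;
    # a hypothetical stand-in for the paper's generator.
    rng = rng or np.random.default_rng(0)
    return expr_coeffs + rng.normal(0.0, noise_scale, size=expr_coeffs.shape)
```

Under these assumptions, a training pipeline would oversample sequences whose AV labels fall in the sparse bins, pass their coefficient tracks through the generator, and mix the resulting synthetic sequences with the real SEWA data when fitting the AV predictor.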