Realtime Interpersonal Human Synchrony Detection Based on Action Segmentation

Bibliographic Details
Published in Intelligent Robotics and Applications, pp. 331–340
Main Authors Chen, Bowen, Zhang, Jiamin, Liu, Zuode, Lin, Ruihan, Ren, Weihong, Yu, Luodi, Liu, Honghai
Format Book Chapter
Language English
Published Cham: Springer International Publishing, 2022
SeriesLecture Notes in Computer Science
ISBN 9783031138430; 3031138430
ISSN 0302-9743; 1611-3349
DOI 10.1007/978-3-031-13844-7_32


Summary: IS (Interpersonal Synchrony), in which a follower (participant) tries to perform the same action as a raiser (a human or a metronome), is an essential social-interaction skill. Evaluating interpersonal synchrony is valuable for early autism screening. However, research on IS evaluation is limited, and current approaches usually evaluate the IS task with a "motion energy" measure computed from imprecise corner detection of the participant, which is not robust in an uncontrolled clinical environment. Moreover, these approaches require manually marking the start and end anchors of the specified action segment, which is labor-intensive. In this paper, we construct a real-time action segmentation model that automatically recognizes each person's action class frame by frame. A simple yet efficient backbone classifies the action class directly, instead of extracting motion features (e.g., optical flow) with high computational complexity. Specifically, given an action video, a sliding window stacks frames at a fixed window size and feeds them to a ResNet-like action classification branch (ACB), which classifies the current action label. To further improve action-boundary accuracy and eliminate over-segmentation noise, we incorporate a boundary prediction branch (BPB), combined with a majority-voting strategy, to refine the classifications produced by the ACB. The IS overlap can then be computed easily by comparing the two action timelines belonging to the raiser and the follower. To evaluate the proposed model, we collect 200K annotated images from 40 subjects performing 2 tasks (nod and clap) under 2 conditions (interpersonal and human-metronome). Experimental results demonstrate that our model achieves 87.1% accuracy at 200 FPS and can precisely locate the start and end of an action in real time.
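The two post-processing ideas in the summary, majority-vote smoothing of per-frame labels and computing IS overlap by comparing two action timelines, can be sketched as follows. This is an illustrative reconstruction, not the authors' implementation; the window size and label values are assumptions.

```python
from collections import Counter

def majority_vote_smooth(labels, window=5):
    """Replace each per-frame label with the majority label in a
    sliding window, suppressing isolated over-segmentation noise."""
    half = window // 2
    smoothed = []
    for i in range(len(labels)):
        lo, hi = max(0, i - half), min(len(labels), i + half + 1)
        smoothed.append(Counter(labels[lo:hi]).most_common(1)[0][0])
    return smoothed

def is_overlap(raiser, follower):
    """Fraction of frames on which the raiser's and the follower's
    action timelines carry the same action label."""
    assert len(raiser) == len(follower)
    matches = sum(r == f for r, f in zip(raiser, follower))
    return matches / len(raiser)

# Hypothetical timelines: the follower lags the raiser by one frame.
raiser = ["nod"] * 6 + ["rest"] * 4
follower = ["nod"] * 5 + ["rest"] * 5
print(is_overlap(raiser, follower))  # 0.9
```

A higher overlap indicates tighter synchrony; the paper's BPB additionally refines the boundary frames before this comparison, which the sketch above does not model.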