Beyond Joints: Learning Representations From Primitive Geometries for Skeleton-Based Action Recognition and Detection

Bibliographic Details
Published in: IEEE Transactions on Image Processing, Vol. 27, No. 9, pp. 4382-4394
Main Authors: Wang, Hongsong; Wang, Liang
Format: Journal Article
Language: English
Published: United States: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.09.2018
ISSN: 1057-7149, 1941-0042
DOI: 10.1109/TIP.2018.2837386

More Information
Summary: Skeleton-based action recognition has recently become popular owing to the development of cost-effective depth sensors and fast pose estimation algorithms. Traditional methods based on pose descriptors often fail on large-scale datasets due to the limited representation power of engineered features. Recent recurrent neural network (RNN)-based approaches mostly focus on the temporal evolution of body joints and neglect the geometric relations among them. In this paper, we aim to leverage the geometric relations among joints for action recognition. We introduce three primitive geometries: joints, edges, and surfaces. Accordingly, a generic end-to-end RNN-based network is designed to accommodate the three inputs. For action recognition, a novel viewpoint transformation layer and temporal dropout layers are employed in the RNN-based network to learn robust representations. For action detection, we first perform frame-wise action classification and then apply a novel multi-scale sliding-window algorithm. Experiments on large-scale 3D action recognition benchmark datasets show that joints, edges, and surfaces are effective and complementary for different actions. Our approaches dramatically outperform existing state-of-the-art methods on both action recognition and action detection.
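
To make the three inputs concrete, below is a minimal sketch of how the primitive geometries could be derived from raw joint positions, assuming edges are difference vectors between connected joints and surfaces are characterized by unit normals of joint triplets. The toy skeleton topology (BONES, TRIPLETS) is a hypothetical example, not the paper's exact definition.

```python
import numpy as np

# Hypothetical 5-joint toy skeleton: (parent, child) pairs for edges.
BONES = [(0, 1), (1, 2), (1, 3), (1, 4)]
# Hypothetical joint triplets spanning body surfaces.
TRIPLETS = [(0, 2, 3), (2, 3, 4)]

def primitive_geometries(joints):
    """joints: (T, J, 3) array of 3D joint positions over T frames."""
    # Edges: vector from parent to child joint for every bone.
    edges = np.stack([joints[:, c] - joints[:, p] for p, c in BONES], axis=1)
    # Surfaces: unit normal of the plane spanned by each joint triplet.
    normals = []
    for a, b, c in TRIPLETS:
        n = np.cross(joints[:, b] - joints[:, a], joints[:, c] - joints[:, a])
        n /= np.linalg.norm(n, axis=-1, keepdims=True) + 1e-8
        normals.append(n)
    surfaces = np.stack(normals, axis=1)
    return joints, edges, surfaces

# Example: a random 30-frame sequence of 5 joints.
j, e, s = primitive_geometries(np.random.randn(30, 5, 3))
print(j.shape, e.shape, s.shape)  # (30, 5, 3) (30, 4, 3) (30, 2, 3)
```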
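The abstract names two robustness mechanisms, a viewpoint transformation layer and temporal dropout layers. The sketch below illustrates the underlying ideas at the data level only, assuming the viewpoint transformation amounts to rotating a sequence into another view and temporal dropout to randomly masking whole frames; the paper's actual layers are in-network components and may differ.

```python
import numpy as np

def random_view_rotation(joints, rng):
    """Rotate a (T, J, 3) sequence about the vertical (y) axis.
    Assumption: a view change is modeled as a single rigid rotation."""
    theta = rng.uniform(-np.pi, np.pi)
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, 0.0, s],
                  [0.0, 1.0, 0.0],
                  [-s, 0.0, c]])
    return joints @ R.T

def temporal_dropout(joints, rng, p=0.2):
    """Zero out entire frames with probability p (train-time only)."""
    keep = rng.random(joints.shape[0]) > p
    return joints * keep[:, None, None]

rng = np.random.default_rng(0)
seq = rng.standard_normal((30, 5, 3))
aug = temporal_dropout(random_view_rotation(seq, rng), rng)
print(aug.shape)  # (30, 5, 3)
```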
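For detection, the abstract describes frame-wise classification followed by a multi-scale sliding window over the per-frame scores. A minimal sketch follows; the window lengths, stride, threshold, and mean-score rule are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def detect(frame_scores, window_lengths=(20, 40, 80), stride=10, thresh=0.4):
    """frame_scores: (T, C) per-frame class probabilities.
    Returns (start, end, class_id, score) candidate segments."""
    T = frame_scores.shape[0]
    detections = []
    for w in window_lengths:            # multiple temporal scales
        for start in range(0, max(T - w, 0) + 1, stride):
            mean = frame_scores[start:start + w].mean(axis=0)
            cls = int(mean.argmax())    # dominant class in the window
            if mean[cls] > thresh:
                detections.append((start, start + w, cls, float(mean[cls])))
    return detections

# Example: random scores for a 100-frame clip with 3 classes.
scores = np.random.dirichlet(np.ones(3), size=100)
for seg in detect(scores)[:5]:
    print(seg)
```

Overlapping candidates from different scales would typically be merged with non-maximum suppression before reporting final segments.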