Beyond Joints: Learning Representations From Primitive Geometries for Skeleton-Based Action Recognition and Detection

Bibliographic Details
Published in: IEEE Transactions on Image Processing, Vol. 27, No. 9, pp. 4382-4394
Main Authors: Wang, Hongsong; Wang, Liang
Format: Journal Article
Language: English
Published: United States: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 01.09.2018
ISSN: 1057-7149, 1941-0042
DOI: 10.1109/TIP.2018.2837386

More Information
Summary: Skeleton-based action recognition has recently become popular owing to the development of cost-effective depth sensors and fast pose estimation algorithms. Traditional methods based on pose descriptors often fail on large-scale datasets due to the limited representation power of engineered features. Recent recurrent neural network (RNN)-based approaches mostly focus on the temporal evolution of body joints and neglect the geometric relations among them. In this paper, we aim to leverage the geometric relations among joints for action recognition. We introduce three primitive geometries: joints, edges, and surfaces. Accordingly, a generic end-to-end RNN-based network is designed to accommodate the three inputs. For action recognition, a novel viewpoint transformation layer and temporal dropout layers are employed in the RNN-based network to learn robust representations. For action detection, we first perform frame-wise action classification and then apply a novel multi-scale sliding-window algorithm. Experiments on large-scale 3D action recognition benchmark datasets show that joints, edges, and surfaces are effective and complementary for different actions. Our approaches dramatically outperform existing state-of-the-art methods on both action recognition and action detection.
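
To make the three inputs concrete, below is a minimal sketch of how the primitive geometries could be derived from raw joint positions, assuming edges are difference vectors between connected joints and surfaces are characterized by unit normals of joint triplets. The toy skeleton topology (BONES, TRIPLETS) is a hypothetical example, not the paper's exact definition.

```python
import numpy as np

# Hypothetical 5-joint toy skeleton: (parent, child) pairs for edges.
BONES = [(0, 1), (1, 2), (1, 3), (1, 4)]
# Hypothetical joint triplets spanning body surfaces.
TRIPLETS = [(0, 2, 3), (2, 3, 4)]

def primitive_geometries(joints):
    """joints: (T, J, 3) array of 3D joint positions over T frames."""
    # Edges: vector from parent to child joint for every bone.
    edges = np.stack([joints[:, c] - joints[:, p] for p, c in BONES], axis=1)
    # Surfaces: unit normal of the plane spanned by each joint triplet.
    normals = []
    for a, b, c in TRIPLETS:
        n = np.cross(joints[:, b] - joints[:, a], joints[:, c] - joints[:, a])
        n /= np.linalg.norm(n, axis=-1, keepdims=True) + 1e-8
        normals.append(n)
    surfaces = np.stack(normals, axis=1)
    return joints, edges, surfaces

# Example: a random 30-frame sequence of 5 joints.
j, e, s = primitive_geometries(np.random.randn(30, 5, 3))
print(j.shape, e.shape, s.shape)  # (30, 5, 3) (30, 4, 3) (30, 2, 3)
```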
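The abstract names two robustness mechanisms, a viewpoint transformation layer and temporal dropout layers. The sketch below illustrates the underlying ideas at the data level only, assuming the viewpoint transformation amounts to rotating a sequence into another view and temporal dropout to randomly masking whole frames; the paper's actual layers are in-network components and may differ.

```python
import numpy as np

def random_view_rotation(joints, rng):
    """Rotate a (T, J, 3) sequence about the vertical (y) axis.
    Assumption: a view change is modeled as a single rigid rotation."""
    theta = rng.uniform(-np.pi, np.pi)
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, 0.0, s],
                  [0.0, 1.0, 0.0],
                  [-s, 0.0, c]])
    return joints @ R.T

def temporal_dropout(joints, rng, p=0.2):
    """Zero out entire frames with probability p (train-time only)."""
    keep = rng.random(joints.shape[0]) > p
    return joints * keep[:, None, None]

rng = np.random.default_rng(0)
seq = rng.standard_normal((30, 5, 3))
aug = temporal_dropout(random_view_rotation(seq, rng), rng)
print(aug.shape)  # (30, 5, 3)
```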
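For detection, the abstract describes frame-wise classification followed by a multi-scale sliding window over the per-frame scores. A minimal sketch follows; the window lengths, stride, threshold, and mean-score rule are illustrative assumptions, not the paper's exact algorithm.

```python
import numpy as np

def detect(frame_scores, window_lengths=(20, 40, 80), stride=10, thresh=0.4):
    """frame_scores: (T, C) per-frame class probabilities.
    Returns (start, end, class_id, score) candidate segments."""
    T = frame_scores.shape[0]
    detections = []
    for w in window_lengths:            # multiple temporal scales
        for start in range(0, max(T - w, 0) + 1, stride):
            mean = frame_scores[start:start + w].mean(axis=0)
            cls = int(mean.argmax())    # dominant class in the window
            if mean[cls] > thresh:
                detections.append((start, start + w, cls, float(mean[cls])))
    return detections

# Example: random scores for a 100-frame clip with 3 classes.
scores = np.random.dirichlet(np.ones(3), size=100)
for seg in detect(scores)[:5]:
    print(seg)
```

Overlapping candidates from different scales would typically be merged with non-maximum suppression before reporting final segments.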