Hierarchical intertwined graph representation learning for skeleton-based action recognition

Bibliographic Details
Published in: Scientific Reports, Vol. 15, No. 1, Article 35447 (14 pages)
Main Authors: Zhang, Xi; Tan, Caiyan; Yuan, Yuan; Yan, Jiexing
Format: Journal Article
Language: English
Published: London, Nature Publishing Group UK, 10 October 2025
ISSN: 2045-2322
DOI: 10.1038/s41598-025-19399-4

Summary: Graph Convolutional Networks (GCNs) have emerged as a leading approach for human skeleton-based action recognition, owing to their capacity to represent skeletal joints as adaptive graphs that effectively capture complex spatial relationships for feature aggregation. However, existing methods predominantly emphasize either spatial context within individual frames or holistic temporal sequences, often overlooking the interplay of spatial topology across multiple temporal scales. This limitation hinders the model’s ability to fully understand complex actions, especially those involving interactions that vary across different temporal phases. To address this challenge, we propose a Hierarchical Intertwined Graph Learning Framework (HI-GCN), which comprises two key modules: Intertwined Context Graph Convolution and Shifted Window Temporal Transformer. The former module integrates spatial–temporal information from adjacent frames at various temporal scales, thereby refining spatial relationship representations and capturing subtle topological variations that conventional GCNs tend to miss. The latter module advances temporal dependency modeling by applying shifted temporal windows with multi-scale receptive fields. Experimental results demonstrate that HI-GCN surpasses current state-of-the-art methods on multiple skeleton-based action recognition benchmarks, achieving accuracies of 93.3% on NTU RGB+D 60 (cross-subject), 90.3% on NTU RGB+D 120 (cross-subject), and 97.0% on NW-UCLA.
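The summary only names the two modules, so the following is a minimal sketch of the kind of operations they describe, not the authors' implementation. It assumes skeleton features of shape (batch, channels, frames, joints); the class names IntertwinedContextGC and ShiftedWindowTemporalAttention, the temporal scales, window size, and all hyperparameters are illustrative assumptions.

# Hypothetical sketch (PyTorch) of the two ideas named in the abstract.
# Tensor layout assumed: x has shape (B, C, T, V) = (batch, channels, frames, joints).
import torch
import torch.nn as nn
import torch.nn.functional as F

class IntertwinedContextGC(nn.Module):
    """Graph convolution whose joint aggregation is informed by adjacent
    frames at several temporal scales. Illustrative only."""
    def __init__(self, in_ch, out_ch, num_joints, scales=(1, 2, 3)):
        super().__init__()
        self.scales = scales
        # one learnable adjacency per temporal scale (assumed design)
        self.adj = nn.Parameter(torch.randn(len(scales), num_joints, num_joints) * 0.01)
        self.theta = nn.Conv2d(in_ch, out_ch * len(scales), kernel_size=1)
        self.proj = nn.Conv2d(out_ch * len(scales), out_ch, kernel_size=1)

    def forward(self, x):                         # x: (B, C, T, V)
        feats = self.theta(x).chunk(len(self.scales), dim=1)
        out = []
        for s, (scale, f) in enumerate(zip(self.scales, feats)):
            # blend each frame with neighbours `scale` steps away in time
            left = torch.roll(f, shifts=scale, dims=2)
            right = torch.roll(f, shifts=-scale, dims=2)
            ctx = (f + left + right) / 3.0
            # aggregate over joints with the learned adjacency for this scale
            out.append(torch.einsum('bctv,vw->bctw', ctx, self.adj[s].softmax(-1)))
        return F.relu(self.proj(torch.cat(out, dim=1)))

class ShiftedWindowTemporalAttention(nn.Module):
    """Self-attention within fixed-length temporal windows; windows are
    shifted by half their size when `shift=True`. Illustrative only."""
    def __init__(self, dim, window=8, heads=4, shift=False):
        super().__init__()
        self.window, self.shift = window, shift
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):                         # x: (B, C, T, V)
        B, C, T, V = x.shape
        offset = self.window // 2 if self.shift else 0
        x = torch.roll(x, shifts=-offset, dims=2)
        pad = (-T) % self.window
        x = F.pad(x, (0, 0, 0, pad))              # pad frames to a window multiple
        Tp = T + pad
        # fold (windows x joints) into the batch, attend over frames in a window
        seq = x.permute(0, 3, 2, 1).reshape(B * V * (Tp // self.window), self.window, C)
        seq, _ = self.attn(seq, seq, seq)
        x = seq.reshape(B, V, Tp, C).permute(0, 3, 2, 1)[:, :, :T]
        return torch.roll(x, shifts=offset, dims=2)

# Toy usage on random skeleton features: 2 clips, 64 frames, 25 joints (NTU-style)
x = torch.randn(2, 3, 64, 25)
gc = IntertwinedContextGC(3, 32, num_joints=25)
att = ShiftedWindowTemporalAttention(32, window=8, shift=True)
print(att(gc(x)).shape)                           # torch.Size([2, 32, 64, 25])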