Hierarchical intertwined graph representation learning for skeleton-based action recognition
| Published in | Scientific Reports, Vol. 15, No. 1, Article 35447 (14 pages) |
|---|---|
| Main Authors | , , , |
| Format | Journal Article |
| Language | English |
| Published | London: Nature Publishing Group UK, 10 October 2025 |
| Subjects | |
| ISSN | 2045-2322 |
| DOI | 10.1038/s41598-025-19399-4 |
| Summary: | Graph Convolutional Networks (GCNs) have emerged as a leading approach for human skeleton-based action recognition, owing to their capacity to represent skeletal joints as adaptive graphs that effectively capture complex spatial relationships for feature aggregation. However, existing methods predominantly emphasize either spatial context within individual frames or holistic temporal sequences, often overlooking the interplay of spatial topology across multiple temporal scales. This limitation hinders the model’s ability to fully understand complex actions, especially those involving interactions that vary across different temporal phases. To address this challenge, we propose a Hierarchical Intertwined Graph Learning Framework (HI-GCN), which comprises two key modules: Intertwined Context Graph Convolution and Shifted Window Temporal Transformer. The former module integrates spatial–temporal information from adjacent frames at various temporal scales, thereby refining spatial relationship representations and capturing subtle topological variations that conventional GCNs tend to miss. The latter module advances temporal dependency modeling by applying shifted temporal windows with multi-scale receptive fields. Experimental results demonstrate that HI-GCN surpasses current state-of-the-art methods on multiple skeleton-based action recognition benchmarks, achieving accuracies of 93.3% on NTU RGB+D 60 (cross-subject), 90.3% on NTU RGB+D 120 (cross-subject), and 97.0% on NW-UCLA. | 
|---|---|
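The two modules described in the summary lend themselves to short sketches. First, a minimal sketch of the Intertwined Context Graph Convolution idea: joint features are pooled over temporal windows of several scales, a data-dependent joint-joint adjacency is inferred from each pooled context, and joints are aggregated with it. All class, method, and parameter names below (`IntertwinedContextGC`, `scales`, the quarter-width embedding) are illustrative assumptions, not the authors' code; the paper's exact formulation may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class IntertwinedContextGC(nn.Module):
    """Sketch (not the authors' code): multi-scale temporal context pooling
    feeding an adaptive joint-joint adjacency, then graph aggregation."""

    def __init__(self, in_channels, out_channels, num_joints, scales=(1, 3, 5)):
        super().__init__()
        self.scales = scales
        # Static skeletal topology, kept as a learnable residual bias.
        self.static_adj = nn.Parameter(torch.eye(num_joints))
        embed = max(out_channels // 4, 1)
        self.theta = nn.ModuleList(nn.Conv2d(in_channels, embed, 1) for _ in scales)
        self.phi = nn.ModuleList(nn.Conv2d(in_channels, embed, 1) for _ in scales)
        self.proj = nn.Conv2d(in_channels * len(scales), out_channels, 1)

    def forward(self, x):
        # x: (N, C, T, V) = batch, channels, frames, joints.
        outs = []
        for s, theta, phi in zip(self.scales, self.theta, self.phi):
            # Temporal context at scale s: average over a window of s frames
            # (odd s keeps the frame count unchanged).
            ctx = F.avg_pool2d(x, (s, 1), stride=1, padding=(s // 2, 0)) if s > 1 else x
            # Infer a joint-joint affinity from the pooled context.
            q = theta(ctx).mean(dim=2)                       # (N, E, V)
            k = phi(ctx).mean(dim=2)                         # (N, E, V)
            adj = torch.softmax(torch.einsum('nev,new->nvw', q, k), dim=-1)
            adj = adj + self.static_adj                      # refine static topology
            # Aggregate joint features with the scale-specific adjacency.
            outs.append(torch.einsum('nctv,nvw->nctw', x, adj))
        return self.proj(torch.cat(outs, dim=1))
```

Second, a sketch of the shifted-window attention pattern that the Shifted Window Temporal Transformer module is built on: self-attention runs inside fixed-length temporal windows, and alternating layers shift the windows by half their length so information flows across window boundaries. Again, `window`, `heads`, and the half-window shift are assumed defaults for illustration.

```python
import torch
import torch.nn as nn

class ShiftedWindowTemporalAttention(nn.Module):
    """Sketch: windowed temporal self-attention with an optional half-window
    shift, in the spirit of shifted-window transformers."""

    def __init__(self, dim, window=8, heads=4, shift=False):
        super().__init__()
        self.window, self.shift = window, shift
        self.norm = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):
        # x: (B, T, D) = sequences of frame-level features.
        B, T, D = x.shape
        w = self.window
        if self.shift:                                   # shift by half a window
            x = torch.roll(x, shifts=-w // 2, dims=1)
        pad = (w - T % w) % w                            # pad T to a window multiple
        if pad:
            x = torch.cat([x, x.new_zeros(B, pad, D)], dim=1)
        xw = x.reshape(-1, w, D)                         # (B * num_windows, w, D)
        h = self.norm(xw)
        y, _ = self.attn(h, h, h)                        # attention within each window
        y = (xw + y).reshape(B, T + pad, D)[:, :T]       # residual, drop padding
        if self.shift:
            y = torch.roll(y, shifts=w // 2, dims=1)     # undo the shift
        return y
```

Stacking such layers in pairs (`shift=False`, then `shift=True`) yields overlapping temporal receptive fields, and varying `window` across stages would give the multi-scale receptive fields the summary mentions.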