Enhanced Topology Representation Learning for Skeleton-Based Human Action Recognition

Bibliographic Details
Published in Procedia computer science, Vol. 246, pp. 3093-3102
Main Authors Anh, Vu Ho Tran; Nguyen, Thi-Oanh
Format Journal Article
Language English
Published Elsevier B.V 2024
ISSN 1877-0509
DOI 10.1016/j.procs.2024.09.363

Abstract We propose an enhanced topology representation learning method for the Skeleton-Based Human Action Recognition problem. In this work, we investigate the application of an adaptive graph convolutional layer within the Spatial-Temporal Graph Convolutional Network (ST-GCN) to learn a flexible topology and enhance representation through regularization loss. We assess the effect of using an adaptive graph, which differs for each input to define the neighbors of a joint, instead of using a fixed heuristic graph. Additionally, by controlling the latent space, our model encodes a more effective latent representation for each action class, which can be easily differentiated by the classifier. Moreover, we evaluate the performance of the proposed method with a three-stream network and explore the potential for improved performance through the use of late fusion ensemble techniques on models trained with different modalities. Our proposal achieved promising results on multiple skeleton-based action recognition benchmarks, with an accuracy of 89.06% on the NTU RGB+D (NTU 60) cross-subject split and 87.89% on the Northwestern-UCLA (NUCLA) dataset, representing approximately 0.5% and 10% improvements over the baseline model on these datasets, respectively.
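The adaptive topology and the late-fusion ensemble mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. The affinity form used here (a softmax over embedded joint-feature similarities, added to the fixed skeleton graph) is an assumption borrowed from the common adaptive-GCN pattern, and all function names, parameter names, and the equal default fusion weights are hypothetical. The regularization loss is left out because the abstract does not specify its form.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along `axis`."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def adaptive_adjacency(x, w_theta, w_phi):
    """Input-dependent joint affinities, shape (V, V); each row sums to 1.

    x: (V, C) features of V joints.
    w_theta, w_phi: (C, D) embedding weights (hypothetical names that
    stand in for learned parameters).
    """
    return softmax((x @ w_theta) @ (x @ w_phi).T, axis=-1)


def adaptive_graph_conv(x, a_fixed, w_theta, w_phi, w_out):
    """Aggregate joint features over the fixed graph plus the adaptive one."""
    a = a_fixed + adaptive_adjacency(x, w_theta, w_phi)
    return a @ x @ w_out  # shape (V, C_out)


def late_fusion(stream_logits, weights=None):
    """Weighted sum of per-stream softmax scores; returns class indices (N,)."""
    if weights is None:
        # Assumption: equal weights when none are given.
        weights = [1.0] * len(stream_logits)
    fused = sum(
        w * softmax(np.asarray(logits, dtype=float))
        for w, logits in zip(weights, stream_logits)
    )
    return fused.argmax(axis=-1)
```

Because the adjacency is computed from `x`, two different skeleton inputs produce two different neighbor sets for the same joint, which is the contrast with a fixed heuristic graph that the abstract draws. `late_fusion` expects one `(N, K)` logit array per separately trained stream, for example one per input modality.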
Author Nguyen, Thi-Oanh
Anh, Vu Ho Tran
Author_xml – sequence: 1
  givenname: Vu Ho Tran
  surname: Anh
  fullname: Anh, Vu Ho Tran
  email: vu.hta194885@sis.hust.edu.vn
– sequence: 2
  givenname: Thi-Oanh
  surname: Nguyen
  fullname: Nguyen, Thi-Oanh
  email: oanh.nguyenthi@hust.edu.vn
ContentType Journal Article
Copyright 2024
Copyright_xml – notice: 2024
Discipline Computer Science
EISSN 1877-0509
EndPage 3102
ExternalDocumentID 10.1016/j.procs.2024.09.363
10_1016_j_procs_2024_09_363
S1877050924023913
IsDoiOpenAccess true
IsOpenAccess true
IsPeerReviewed true
IsScholarly true
Keywords NUCLA
Skeleton-based human action recognition
Multi-stream network
Regularization
Graph Convolutional Networks
NTU
Graph Neural Networks
Language English
License This is an open access article under the CC BY-NC-ND license.
OpenAccessLink https://www.sciencedirect.com/science/article/pii/S1877050924023913
PageCount 10
PublicationCentury 2000
PublicationDate 2024
2024-00-00
PublicationDateYYYYMMDD 2024-01-01
PublicationDate_xml – year: 2024
  text: 2024
PublicationDecade 2020
PublicationTitle Procedia computer science
PublicationYear 2024
Publisher Elsevier B.V
Publisher_xml – name: Elsevier B.V
StartPage 3093
URI https://dx.doi.org/10.1016/j.procs.2024.09.363
https://doi.org/10.1016/j.procs.2024.09.363
UnpaywallVersion publishedVersion
Volume 246