Enhanced Topology Representation Learning for Skeleton-Based Human Action Recognition

We propose an enhanced topology representation learning method for the Skeleton-Based Human Action Recognition problem. In this work, we investigate the application of an adaptive graph convolutional layer within the Spatial-Temporal Graph Convolutional Network (ST-GCN) to learn a flexible topology...

Bibliographic Details
Published in	Procedia Computer Science, Vol. 246, pp. 3093-3102
Main Authors	Anh, Vu Ho Tran; Nguyen, Thi-Oanh
Format	Journal Article
Language	English
Published	Elsevier B.V., 2024
Subjects	Graph Convolutional Networks; Graph Neural Networks; Multi-stream network; NTU; NUCLA; Regularization; Skeleton-based human action recognition
ISSN	1877-0509
DOI	10.1016/j.procs.2024.09.363

Abstract	We propose an enhanced topology representation learning method for the Skeleton-Based Human Action Recognition problem. In this work, we investigate the application of an adaptive graph convolutional layer within the Spatial-Temporal Graph Convolutional Network (ST-GCN) to learn a flexible topology and enhance representation through regularization loss. We assess the effect of using an adaptive graph, which differs for each input to define the neighbors of a joint, instead of using a fixed heuristic graph. Additionally, by controlling the latent space, our model encodes a more effective latent representation for each action class, which can be easily differentiated by the classifier. Moreover, we evaluate the performance of the proposed method with a three-stream network and explore the potential for improved performance through the use of late fusion ensemble techniques on models trained with different modalities. Our proposal achieved promising results on multiple skeleton-based action recognition benchmarks, with an accuracy of 89.06% on the NTU RGB+D (NTU 60) cross-subject split and 87.89% on the Northwestern-UCLA (NUCLA) dataset, representing approximately 0.5% and 10% improvements over the baseline model on these datasets, respectively.
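The abstract contrasts a fixed heuristic skeleton graph with an adaptive graph that differs for each input. The following is a minimal sketch of that idea only, not the authors' layer: it assumes the input-dependent part is a row-wise softmax over dot-product similarity between joint features, added to the fixed adjacency. The function names, the toy 3-joint skeleton, and the absence of learned projections are all assumptions made for illustration.

```python
import math


def row_softmax(row):
    """Numerically stable softmax over one row of similarity scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_adjacency(fixed_adj, joint_feats):
    """Hypothetical helper: fixed adjacency plus an input-dependent term."""
    n = len(joint_feats)
    # Dot-product similarity between every pair of joint feature vectors.
    similarity = [
        [sum(a * b for a, b in zip(joint_feats[i], joint_feats[j])) for j in range(n)]
        for i in range(n)
    ]
    # Each row becomes a distribution over candidate neighbours of joint i.
    data_adj = [row_softmax(row) for row in similarity]
    return [[fixed_adj[i][j] + data_adj[i][j] for j in range(n)] for i in range(n)]


def aggregate(adj, joint_feats):
    """Graph-convolution style aggregation: out_i = sum_j adj[i][j] * x_j."""
    n = len(joint_feats)
    dim = len(joint_feats[0])
    return [
        [sum(adj[i][j] * joint_feats[j][d] for j in range(n)) for d in range(dim)]
        for i in range(n)
    ]


# Toy chain skeleton 0 - 1 - 2 with self-loops (assumed, not an NTU layout).
fixed = [[1.0, 1.0, 0.0], [1.0, 1.0, 1.0], [0.0, 1.0, 1.0]]
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
adj = adaptive_adjacency(fixed, feats)
out = aggregate(adj, feats)
```

In this toy input, joints 0 and 2 are not linked in the fixed graph, yet the input-dependent term gives them a nonzero weight, which is the kind of flexibility the abstract attributes to an adaptive topology.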
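The abstract also mentions late fusion of models trained on different modalities in a three-stream network. A rough sketch of one common form of late fusion is given here, assuming a weighted sum of per-stream softmax scores followed by an argmax. The stream names (`joint`, `bone`, `motion`), the equal default weights, and the example logits are hypothetical; the record does not state which modalities or weights the authors used.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of class logits."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def late_fusion(stream_logits, weights=None):
    """Fuse per-stream class scores by a weighted sum of softmax outputs.

    stream_logits: dict mapping a stream name to its list of class logits.
    weights: optional dict of per-stream weights (defaults to 1.0 each).
    Returns (predicted_class_index, fused_scores).
    """
    names = list(stream_logits)
    if weights is None:
        weights = {name: 1.0 for name in names}
    num_classes = len(stream_logits[names[0]])
    fused = [0.0] * num_classes
    for name in names:
        probs = softmax(stream_logits[name])
        for c in range(num_classes):
            fused[c] += weights[name] * probs[c]
    predicted = max(range(num_classes), key=lambda c: fused[c])
    return predicted, fused


# Made-up logits for three classes from three independently trained streams.
logits = {
    "joint": [2.0, 1.0, 0.1],
    "bone": [0.5, 2.5, 0.2],
    "motion": [0.3, 2.0, 0.1],
}
predicted_class, fused_scores = late_fusion(logits)
```

With these made-up logits the joint stream alone favours class 0, while the fused scores favour class 1, showing how an ensemble can overrule a single stream.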
Author	Anh, Vu Ho Tran (vu.hta194885@sis.hust.edu.vn); Nguyen, Thi-Oanh (oanh.nguyenthi@hust.edu.vn)
Discipline	Computer Science
IsDoiOpenAccess	true
IsOpenAccess	true
IsPeerReviewed	true
IsScholarly	true
License	This is an open access article under the CC BY-NC-ND license.
OpenAccessLink	https://www.sciencedirect.com/science/article/pii/S1877050924023913
PageCount	10