Enhanced Topology Representation Learning for Skeleton-Based Human Action Recognition
We propose an enhanced topology representation learning method for the Skeleton-Based Human Action Recognition problem. In this work, we investigate the application of an adaptive graph convolutional layer within the Spatial-Temporal Graph Convolutional Network (ST-GCN) to learn a flexible topology...
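The adaptive-topology idea described above can be illustrated with a small sketch. It assumes one common formulation of adaptive graph convolution, in which a fixed skeleton adjacency is augmented with a softmax-normalized similarity computed from the joint features of each input. This record does not specify the authors' exact layer, so the function names (`adaptive_adjacency`, `graph_conv`), the dot-product similarity, and the toy 3-joint skeleton are illustrative assumptions, not the paper's implementation.

```python
import math


def softmax(row):
    """Numerically stable softmax over one list of floats."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_adjacency(features, fixed_adj):
    """Return a per-input adjacency: fixed skeleton graph + data-dependent term.

    features  : list of V joint feature vectors (hypothetical toy input)
    fixed_adj : V x V heuristic adjacency (assumed to include self-loops)
    """
    n = len(features)
    # Pairwise dot-product similarity between the joints of THIS input.
    sim = [[sum(a * b for a, b in zip(fi, fj)) for fj in features]
           for fi in features]
    # Row-wise softmax turns similarities into soft neighbor weights.
    data_dependent = [softmax(row) for row in sim]
    return [[fixed_adj[i][j] + data_dependent[i][j] for j in range(n)]
            for i in range(n)]


def graph_conv(features, adj):
    """Aggregate neighbor features: out[i] = sum_j adj[i][j] * features[j]."""
    n, dim = len(features), len(features[0])
    return [[sum(adj[i][j] * features[j][c] for j in range(n))
             for c in range(dim)]
            for i in range(n)]


# Toy 3-joint chain (0-1-2) with self-loops; joints 0 and 2 are not linked.
fixed = [[1, 1, 0],
         [1, 1, 1],
         [0, 1, 1]]
sample_a = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
sample_b = [[0.5, 0.5], [2.0, 0.0], [0.0, 0.0]]

adj_a = adaptive_adjacency(sample_a, fixed)
adj_b = adaptive_adjacency(sample_b, fixed)
out_a = graph_conv(sample_a, adj_a)
```

Because the data-dependent term is recomputed for every sample, `adj_a` and `adj_b` differ, and joints that are not physically connected (here joints 0 and 2) still receive a non-zero weight.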
| Published in | Procedia computer science Vol. 246; pp. 3093 - 3102 |
|---|---|
| Main Authors | Anh, Vu Ho Tran; Nguyen, Thi-Oanh |
| Format | Journal Article |
| Language | English |
| Published | Elsevier B.V., 2024 |
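| Subjects | Graph Convolutional Networks; Graph Neural Networks; Multi-stream network; NTU; NUCLA; Regularization; Skeleton-based human action recognition |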
| ISSN | 1877-0509 |
| DOI | 10.1016/j.procs.2024.09.363 |
| Abstract | We propose an enhanced topology representation learning method for the Skeleton-Based Human Action Recognition problem. In this work, we investigate the application of an adaptive graph convolutional layer within the Spatial-Temporal Graph Convolutional Network (ST-GCN) to learn a flexible topology and enhance representation through regularization loss. We assess the effect of using an adaptive graph, which differs for each input to define the neighbors of a joint, instead of using a fixed heuristic graph. Additionally, by controlling the latent space, our model encodes a more effective latent representation for each action class, which can be easily differentiated by the classifier. Moreover, we evaluate the performance of the proposed method with a three-stream network and explore the potential for improved performance through the use of late fusion ensemble techniques on models trained with different modalities. Our proposal achieved promising results on multiple skeleton-based action recognition benchmarks, with an accuracy of 89.06% on the NTU RGB+D (NTU 60) cross-subject split and 87.89% on the Northwestern-UCLA (NUCLA) dataset, representing approximately 0.5% and 10% improvements over the baseline model on these datasets, respectively. |
|---|---|
| Author | Nguyen, Thi-Oanh; Anh, Vu Ho Tran |
| Author_xml | – sequence: 1 givenname: Vu Ho Tran surname: Anh fullname: Anh, Vu Ho Tran email: vu.hta194885@sis.hust.edu.vn – sequence: 2 givenname: Thi-Oanh surname: Nguyen fullname: Nguyen, Thi-Oanh email: oanh.nguyenthi@hust.edu.vn |
| Cites_doi | 10.1109/CVPR52688.2022.01955 10.3390/e22090999 10.1609/aaai.v32i1.12328 10.1109/ICCV48922.2021.01311 10.1109/TIP.2020.3028207 |
| ContentType | Journal Article |
| Copyright | 2024 |
| DatabaseName | ScienceDirect Open Access Titles Elsevier:ScienceDirect:Open Access CrossRef Unpaywall for CDI: Periodical Content Unpaywall |
| DatabaseTitle | CrossRef |
| Discipline | Computer Science |
| EISSN | 1877-0509 |
| EndPage | 3102 |
| IsDoiOpenAccess | true |
| IsOpenAccess | true |
| IsPeerReviewed | true |
| IsScholarly | true |
| Keywords | NUCLA; Skeleton-based human action recognition; Multi-stream network; Regularization; Graph Convolutional Networks; NTU; Graph Neural Networks |
| Language | English |
| License | This is an open access article under the CC BY-NC-ND license. |
| OpenAccessLink | https://www.sciencedirect.com/science/article/pii/S1877050924023913 |
| PageCount | 10 |
| PublicationDate | 2024 |
| PublicationTitle | Procedia computer science |
| PublicationYear | 2024 |
| Publisher | Elsevier B.V |
| StartPage | 3093 |
| SubjectTerms | Graph Convolutional Networks; Graph Neural Networks; Multi-stream network; NTU; NUCLA; Regularization; Skeleton-based human action recognition |
| Title | Enhanced Topology Representation Learning for Skeleton-Based Human Action Recognition |
| URI | https://dx.doi.org/10.1016/j.procs.2024.09.363 https://doi.org/10.1016/j.procs.2024.09.363 |
| UnpaywallVersion | publishedVersion |
| Volume | 246 |
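
The abstract above mentions a late fusion ensemble over models trained on different modalities. The following is a minimal sketch of score-level late fusion, assuming each stream outputs a vector of per-class scores that are combined by a weighted sum before taking the arg-max. The stream names (`joint`, `bone`, `motion`), the scores, and the weights are hypothetical placeholders; the record does not state which modalities or fusion weights the authors used.

```python
def late_fusion(stream_scores, weights=None):
    """Fuse per-class scores from several independently trained streams.

    stream_scores : dict mapping a stream name to its list of class scores
    weights       : optional dict mapping a stream name to a float weight
                    (defaults to equal weighting)
    Returns (predicted_class_index, fused_scores).
    """
    names = list(stream_scores)
    if weights is None:
        weights = {name: 1.0 for name in names}
    num_classes = len(stream_scores[names[0]])
    fused = [0.0] * num_classes
    for name in names:
        scores = stream_scores[name]
        if len(scores) != num_classes:
            raise ValueError("all streams must score the same set of classes")
        # Weighted sum of scores, accumulated class by class.
        for cls, score in enumerate(scores):
            fused[cls] += weights[name] * score
    predicted = max(range(num_classes), key=fused.__getitem__)
    return predicted, fused


# Hypothetical softmax outputs of three streams for one clip, three classes.
scores = {
    "joint": [0.6, 0.3, 0.1],
    "bone": [0.2, 0.5, 0.3],
    "motion": [0.1, 0.6, 0.3],
}
equal_pred, equal_fused = late_fusion(scores)
weighted_pred, _ = late_fusion(scores, {"joint": 3.0, "bone": 1.0, "motion": 1.0})
```

With equal weights the two streams that agree outvote the third; raising one stream's weight can flip the decision, which is why fusion weights are usually tuned on a validation split.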