Trunk-branch contrastive network with multi-view deformable aggregation for multi-view action recognition
Multi-view action recognition aims to identify actions in a given multi-view scene. Traditional studies first extracted refined features from each view and then performed pairwise interaction and integration, but they potentially overlooked critical local features in each view.
Published in | Pattern recognition Vol. 169; p. 111923 |
Main Authors | Yang, Yingyuan; Liang, Guoyuan; Wang, Can; Wu, Xiaojun |
Format | Journal Article |
Language | English |
Published | Elsevier Ltd, 01.01.2026 |
Subjects | Action recognition; Contrastive learning; Deformable attention; Multi-view video analytics |
Online Access | https://dx.doi.org/10.1016/j.patcog.2025.111923 |
ISSN | 0031-3203 |
DOI | 10.1016/j.patcog.2025.111923 |
Abstract | Multi-view action recognition aims to identify actions in a given multi-view scene. Traditional studies first extracted refined features from each view and then performed pairwise interaction and integration, but they potentially overlooked critical local features in each view. When observing objects from multiple perspectives, individuals typically form a comprehensive impression and subsequently fill in specific details. Drawing inspiration from this cognitive process, we propose a novel trunk-branch contrastive network (TBCNet) for RGB-based multi-view action recognition. Distinctively, TBCNet first obtains fused features in the trunk block and then implicitly supplements vital details provided by the branch block via contrastive learning, generating a more informative and comprehensive action representation. Within this framework, we construct two core components: multi-view deformable aggregation (MVDA) and trunk-branch contrastive learning. MVDA, employed in the trunk block, effectively facilitates multi-view feature fusion and adaptive cross-view spatio-temporal correlation: a global aggregation module (GAM) emphasizes significant spatial information, and a composite relative position bias (CRPB) captures intra- and cross-view relative positions. Moreover, a trunk-branch contrastive loss is constructed between the aggregated features and the refined details from each view. By assigning two distinct weights to positive and negative samples, a weighted trunk-branch contrastive loss is proposed to extract valuable information and emphasize subtle inter-class differences. The effectiveness of TBCNet is verified by extensive experiments on four datasets: NTU RGB+D 60, NTU RGB+D 120, PKU-MMD, and N-UCLA. Compared with other RGB-based methods, our approach achieves state-of-the-art performance under cross-subject and cross-setting protocols. |
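The record gives only a qualitative description of the weighted trunk-branch contrastive loss, not its formula. As a rough illustration of the idea (contrasting the fused trunk feature against per-view branch features, with separate weights on positive and negative samples), here is a minimal PyTorch sketch; the function name, the placement of the weights inside the softmax denominator, and the hyperparameters `temperature`, `w_pos`, and `w_neg` are assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a weighted trunk-branch contrastive loss
# (assumed form; the paper's exact formulation is not in this record).
import torch
import torch.nn.functional as F

def weighted_trunk_branch_contrastive_loss(
    trunk_feats: torch.Tensor,   # (B, D) fused features from the trunk block
    branch_feats: torch.Tensor,  # (B, V, D) refined per-view branch features
    labels: torch.Tensor,        # (B,) integer action labels
    temperature: float = 0.07,   # assumed hyperparameter
    w_pos: float = 1.0,          # assumed weight on positive samples
    w_neg: float = 1.0,          # assumed weight on negative samples
) -> torch.Tensor:
    B, V, D = branch_feats.shape
    trunk = F.normalize(trunk_feats, dim=-1)                      # (B, D)
    branch = F.normalize(branch_feats.reshape(B * V, D), dim=-1)  # (B*V, D)

    # Similarity of each fused trunk feature to every per-view branch feature.
    sim = trunk @ branch.t() / temperature                        # (B, B*V)

    # A branch feature is a positive if it carries the anchor's action label
    # (this includes the anchor's own views).
    branch_labels = labels.repeat_interleave(V)                   # (B*V,)
    pos_mask = labels.unsqueeze(1).eq(branch_labels.unsqueeze(0)).float()

    # Distinct weights for positives and negatives in the denominator.
    weights = w_pos * pos_mask + w_neg * (1.0 - pos_mask)
    log_prob = sim - (weights * sim.exp()).sum(dim=1, keepdim=True).log()

    # Average the log-likelihood over positives, SupCon-style.
    loss = -(pos_mask * log_prob).sum(dim=1) / pos_mask.sum(dim=1).clamp(min=1.0)
    return loss.mean()

# Example: batch of 8 clips, 3 views, 256-D features, 5 action classes.
loss = weighted_trunk_branch_contrastive_loss(
    torch.randn(8, 256), torch.randn(8, 3, 256), torch.randint(0, 5, (8,)),
    w_pos=1.0, w_neg=1.2)
```

With w_pos = w_neg = 1.0 this reduces to a standard supervised contrastive loss whose contrast set is the pool of branch features; choosing w_neg > w_pos would penalize similarity to other classes more heavily, which would match the stated goal of emphasizing subtle inter-class differences.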
ArticleNumber | 111923 |
Author | Yang, Yingyuan; Liang, Guoyuan; Wu, Xiaojun; Wang, Can |
Author_xml | – sequence: 1 givenname: Yingyuan orcidid: 0009-0000-3937-207X surname: Yang fullname: Yang, Yingyuan organization: Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
– sequence: 2 givenname: Guoyuan orcidid: 0000-0002-5207-6969 surname: Liang fullname: Liang, Guoyuan email: gy.liang@siat.ac.cn organization: Guangdong Provincial Key Laboratory of Robotics and Intelligent System, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
– sequence: 3 givenname: Can orcidid: 0000-0002-0914-3994 surname: Wang fullname: Wang, Can organization: Guangdong Provincial Key Laboratory of Robotics and Intelligent System, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
– sequence: 4 givenname: Xiaojun orcidid: 0000-0003-4988-5420 surname: Wu fullname: Wu, Xiaojun organization: School of Intelligent Science and Engineering, Harbin Institute of Technology Shenzhen, Shenzhen 518055, China |
ContentType | Journal Article |
Copyright | 2025 Elsevier Ltd |
DOI | 10.1016/j.patcog.2025.111923 |
DatabaseName | CrossRef |
DatabaseTitle | CrossRef |
Discipline | Computer Science |
ExternalDocumentID | 10_1016_j_patcog_2025_111923 S0031320325005837 |
ISSN | 0031-3203 |
IsPeerReviewed | true |
IsScholarly | true |
Keywords | Action recognition; Multi-view video analytics; Deformable attention; Contrastive learning |
Language | English |
ORCID | 0000-0002-0914-3994 0000-0003-4988-5420 0000-0002-5207-6969 0009-0000-3937-207X |
ParticipantIDs | crossref_primary_10_1016_j_patcog_2025_111923 elsevier_sciencedirect_doi_10_1016_j_patcog_2025_111923 |
PublicationCentury | 2000 |
PublicationDate | January 2026 |
PublicationDateYYYYMMDD | 2026-01-01 |
PublicationDecade | 2020 |
PublicationTitle | Pattern recognition |
PublicationYear | 2026 |
Publisher | Elsevier Ltd |
SourceID | crossref elsevier |
SourceType | Index Database Publisher |
StartPage | 111923 |
SubjectTerms | Action recognition; Contrastive learning; Deformable attention; Multi-view video analytics |
Title | Trunk-branch contrastive network with multi-view deformable aggregation for multi-view action recognition |
URI | https://dx.doi.org/10.1016/j.patcog.2025.111923 |
Volume | 169 |