Trunk-branch contrastive network with multi-view deformable aggregation for multi-view action recognition

Bibliographic Details
Published in: Pattern Recognition, Vol. 169, p. 111923
Main Authors: Yang, Yingyuan; Liang, Guoyuan; Wang, Can; Wu, Xiaojun
Format: Journal Article
Language: English
Published: Elsevier Ltd, 01.01.2026
Subjects: Action recognition; Multi-view video analytics; Deformable attention; Contrastive learning
Online Access: https://dx.doi.org/10.1016/j.patcog.2025.111923
ISSN: 0031-3203
DOI: 10.1016/j.patcog.2025.111923


Abstract
Multi-view action recognition aims to identify actions in a given multi-view scene. Traditional studies first extract refined features from each view and then perform pairwise interaction and integration, but they risk overlooking critical local features in individual views. When observing objects from multiple perspectives, people typically form a comprehensive impression first and fill in specific details afterwards. Drawing inspiration from this cognitive process, we propose a novel trunk-branch contrastive network (TBCNet) for RGB-based multi-view action recognition. Distinctively, TBCNet first obtains fused features in the trunk block and then implicitly supplements vital details provided by the branch block via contrastive learning, generating a more informative and comprehensive action representation. Within this framework, we construct two core components: multi-view deformable aggregation (MVDA) and trunk-branch contrastive learning. MVDA, employed in the trunk block, facilitates multi-view feature fusion and adaptive cross-view spatio-temporal correlation; a global aggregation module (GAM) emphasizes significant spatial information, and a composite relative position bias (CRPB) captures intra- and cross-view relative positions. Moreover, a trunk-branch contrastive loss is constructed between the aggregated features and the refined details from each view. By incorporating two distinct weights for positive and negative samples, a weighted trunk-branch contrastive loss extracts valuable information and emphasizes subtle inter-class differences. The effectiveness of TBCNet is verified by extensive experiments on four datasets: NTU RGB+D 60, NTU RGB+D 120, PKU-MMD, and N-UCLA. Compared with other RGB-based methods, our approach achieves state-of-the-art performance under cross-subject and cross-setting protocols.
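
The abstract describes enough of the MVDA idea to sketch its shape. Below is a minimal, hypothetical PyTorch sketch, assuming the standard deformable-attention pattern (a query predicts sampling offsets, values are gathered by grid sampling): here a fused trunk query samples each view's feature map and the samples are attention-weighted and averaged across views. The class name, the num_points parameter, and the plain mean fusion across views are assumptions for illustration; the paper's actual GAM and CRPB components are not reproduced.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiViewDeformableAggregation(nn.Module):
    """Toy sketch (not the authors' implementation): a fused trunk query
    predicts sampling locations in every view's feature map; the sampled
    values are attention-weighted and mean-fused across views."""

    def __init__(self, dim, num_points=4):
        super().__init__()
        self.num_points = num_points
        self.offset_head = nn.Linear(dim, 2 * num_points)  # (x, y) per sample point
        self.attn_head = nn.Linear(dim, num_points)        # one weight per point
        self.proj = nn.Linear(dim, dim)

    def forward(self, query, view_feats):
        # query:      (B, D)          fused feature from the trunk block
        # view_feats: (B, V, D, H, W) spatial feature maps, one per view
        B, V, D, H, W = view_feats.shape
        P = self.num_points
        # Sampling locations in grid_sample's normalized [-1, 1] coordinates.
        locs = torch.tanh(self.offset_head(query)).view(B, 1, P, 2)
        locs = locs.expand(B, V, P, 2).reshape(B * V, 1, P, 2)
        feats = view_feats.reshape(B * V, D, H, W)
        sampled = F.grid_sample(feats, locs, align_corners=False)  # (B*V, D, 1, P)
        sampled = sampled.view(B, V, D, P)
        attn = self.attn_head(query).softmax(dim=-1).view(B, 1, 1, P)
        # Weighted sum over sampled points, then mean-fuse across the V views.
        fused = (sampled * attn).sum(dim=-1).mean(dim=1)            # (B, D)
        return self.proj(fused)
```

As a smoke test, `MultiViewDeformableAggregation(dim=256)(torch.randn(2, 256), torch.randn(2, 3, 256, 14, 14))` returns a (2, 256) fused representation; the deformable sampling is what lets the aggregation attend to view-dependent spatial locations rather than a fixed grid.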
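The weighted trunk-branch contrastive loss can likewise be sketched, assuming a supervised InfoNCE-style objective between trunk embeddings (aggregated multi-view features) and branch embeddings (refined per-view details), with separate weights for positive and negative pairs as the abstract describes. The names w_pos, w_neg, and tau and the exact weighting scheme are assumptions, not the authors' published formulation.

```python
import torch
import torch.nn.functional as F

def weighted_trunk_branch_contrastive_loss(trunk_feats, branch_feats, labels,
                                           w_pos=1.0, w_neg=1.0, tau=0.07):
    # trunk_feats:  (B, D) aggregated multi-view features from the trunk block
    # branch_feats: (B, D) refined per-view details from the branch block
    # labels:       (B,)   action labels; a shared label marks a positive pair
    z_t = F.normalize(trunk_feats, dim=1)
    z_b = F.normalize(branch_feats, dim=1)
    sim = z_t @ z_b.t() / tau                                   # (B, B) similarities

    pos = (labels.unsqueeze(0) == labels.unsqueeze(1)).float()  # (B, B) positive mask
    # Distinct weights for positive and negative pairs (hypothetical scheme):
    # w_neg > w_pos sharpens the penalty on hard negatives, emphasizing
    # subtle inter-class differences.
    w = pos * w_pos + (1.0 - pos) * w_neg

    exp_sim = w * sim.exp()
    log_prob = torch.log(exp_sim) - torch.log(exp_sim.sum(dim=1, keepdim=True))
    # Supervised-contrastive style: average the log-probability over each
    # anchor's positives, then over the batch.
    loss = -(log_prob * pos).sum(dim=1) / pos.sum(dim=1).clamp(min=1.0)
    return loss.mean()
```

Under these assumptions, setting w_neg above w_pos up-weights negatives in the denominator, pushing apart classes that differ only in fine-grained details, while the trunk-branch pairing itself is what lets the fused representation absorb view-specific cues from the branches.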
Authors
– Yang, Yingyuan (ORCID: 0009-0000-3937-207X), Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
– Liang, Guoyuan (ORCID: 0000-0002-5207-6969; email: gy.liang@siat.ac.cn), Guangdong Provincial Key Laboratory of Robotics and Intelligent System, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
– Wang, Can (ORCID: 0000-0002-0914-3994), Guangdong Provincial Key Laboratory of Robotics and Intelligent System, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen 518055, China
– Wu, Xiaojun (ORCID: 0000-0003-4988-5420), School of Intelligent Science and Engineering, Harbin Institute of Technology Shenzhen, Shenzhen 518055, China
Copyright: 2025 Elsevier Ltd
Discipline: Computer Science