Fine-Grained Video Captioning via Graph-based Multi-Granularity Interaction Learning

Bibliographic Details
Published in: IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 44, No. 2, pp. 666-683
Main Authors: Yan, Yichao; Zhuang, Ning; Ni, Bingbing; Zhang, Jian; Xu, Minghao; Zhang, Qiang; Zhang, Zheng; Cheng, Shuo; Tian, Qi; Xu, Yi; Yang, Xiaokang; Zhang, Wenjun
Format: Journal Article
Language: English
Published: United States, IEEE, 01.02.2022 (The Institute of Electrical and Electronics Engineers, Inc.)
ISSN: 0162-8828, 1939-3539
EISSN: 2160-9292, 1939-3539
DOI: 10.1109/TPAMI.2019.2946823
PMID: 31613750

Abstract: Learning to generate continuous, detailed linguistic descriptions for multi-subject interactive videos has particular applications in team sports auto-narrative. In contrast to traditional video captioning, this task is more challenging as it requires simultaneously modeling fine-grained individual actions, uncovering the spatio-temporal dependency structures of frequent group interactions, and then accurately mapping these complex interaction details into long, detailed commentary. To explicitly address these challenges, we propose a novel framework, Graph-based Learning for Multi-Granularity Interaction Representation (GLMGIR), for the fine-grained team sports auto-narrative task. A multi-granular interaction modeling module is proposed to extract interactive actions among subjects in a progressive way, encoding both intra- and inter-team interactions. Based on these multi-granular representations, a multi-granular attention module is developed to consider action/event descriptions at multiple spatio-temporal resolutions. Both modules are integrated seamlessly and work collaboratively to generate the final narrative. In the meantime, to facilitate reproducible research, we collect a new video dataset from YouTube.com called the Sports Video Narrative dataset (SVN). It marks a novel direction, containing 6K team sports videos (i.e., NBA basketball games) with 10K ground-truth narrative sentences. Furthermore, as previous metrics such as METEOR (used in the coarse-grained video captioning task) do not cope well with the fine-grained sports narrative task, we develop a novel evaluation metric named Fine-grained Captioning Evaluation (FCE), which measures how accurately the generated linguistic description reflects fine-grained action details as well as the overall spatio-temporal interactional structure. Extensive experiments on our SVN dataset demonstrate the effectiveness of the proposed framework for fine-grained team sports video auto-narrative.
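The abstract names two architectural components: a multi-granular interaction module that progressively encodes intra- and inter-team interactions over the players, and a multi-granular attention module over the resulting representations. Below is a minimal sketch of the graph-based multi-granularity idea only, not the authors' GLMGIR implementation: the class name, feature size, two-team adjacency construction, and mean-neighbor propagation are all hypothetical stand-ins.

```python
# Minimal sketch (not the paper's code) of multi-granularity interaction
# encoding over a player graph: individual -> intra-team -> inter-team.
# All shapes and module names are illustrative assumptions.
import torch
import torch.nn as nn


class MultiGranularInteraction(nn.Module):
    """Encodes individual, intra-team, and inter-team interactions."""

    def __init__(self, feat_dim: int = 128):
        super().__init__()
        self.intra = nn.Linear(feat_dim, feat_dim)  # message passing within a team
        self.inter = nn.Linear(feat_dim, feat_dim)  # message passing across teams
        self.act = nn.ReLU()

    def propagate(self, x: torch.Tensor, adj: torch.Tensor, lin: nn.Linear) -> torch.Tensor:
        # Normalized graph propagation: average neighbor features, then transform.
        deg = adj.sum(-1, keepdim=True).clamp(min=1.0)
        return self.act(lin(adj @ x / deg))

    def forward(self, players: torch.Tensor, team_ids: torch.Tensor):
        # players: (N, D) per-player features; team_ids: (N,) with values in {0, 1}.
        same = (team_ids[:, None] == team_ids[None, :]).float()
        intra_adj = same - torch.eye(len(players))  # teammates only (no self-loop)
        inter_adj = 1.0 - same                      # opponents only
        h_intra = self.propagate(players, intra_adj, self.intra)  # finer granularity
        h_inter = self.propagate(h_intra, inter_adj, self.inter)  # coarser granularity
        # Three granularities: individual, intra-team, inter-team.
        return players, h_intra, h_inter


# Toy usage: 10 players (5 per team) with 128-d appearance/motion features.
enc = MultiGranularInteraction(128)
feats = torch.randn(10, 128)
teams = torch.tensor([0] * 5 + [1] * 5)
indiv, intra, inter = enc(feats, teams)
print(indiv.shape, intra.shape, inter.shape)  # torch.Size([10, 128]) x 3
```

The sketch only illustrates the granularity hierarchy (individual features, teammate-aggregated features, opponent-aggregated features); the paper's module operates on spatio-temporal video features and feeds a narrative decoder with multi-granular attention.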
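The abstract also introduces the FCE metric, arguing that n-gram metrics such as METEOR miss fine-grained action details. FCE itself is defined in the paper; purely as a loose illustration of scoring recovered action details rather than surface n-gram overlap, one could compare structured (subject, action) pairs extracted from the generated and reference narratives. The extraction step is assumed away here, and the F1 scoring is a hypothetical simplification, not the FCE definition.

```python
# Loose illustration (not the paper's FCE metric) of detail-level scoring:
# F1 over (subject, action) pairs from generated vs. reference narratives.
# A pair-extraction step is assumed to exist; pairs are given directly here.

def detail_f1(pred_pairs: set, ref_pairs: set) -> float:
    """F1 over (subject, action) pairs; 0.0 when either side is empty."""
    if not pred_pairs or not ref_pairs:
        return 0.0
    tp = len(pred_pairs & ref_pairs)
    if tp == 0:
        return 0.0
    precision = tp / len(pred_pairs)
    recall = tp / len(ref_pairs)
    return 2 * precision * recall / (precision + recall)


ref = {("Curry", "three-pointer"), ("Green", "assist")}
pred = {("Curry", "three-pointer"), ("Thompson", "rebound")}
print(round(detail_f1(pred, ref), 3))  # 0.5
```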
Author details:
– Yichao Yan (ORCID 0000-0003-3209-8965), yanyichao@sjtu.edu.cn, Shanghai Jiao Tong University, Shanghai, China
– Ning Zhuang (ORCID 0000-0002-9605-0891), ningzhuang@sjtu.edu.cn, Shanghai Jiao Tong University, Shanghai, China
– Bingbing Ni (ORCID 0000-0001-7339-028X), nibingbing@sjtu.edu.cn, Shanghai Jiao Tong University, Shanghai, China
– Jian Zhang (ORCID 0000-0003-4410-3741), stevenash0822@sjtu.edu.cn, Shanghai Jiao Tong University, Shanghai, China
– Minghao Xu (ORCID 0000-0001-7468-8790), xuminghao118@sjtu.edu.cn, Shanghai Jiao Tong University, Shanghai, China
– Qiang Zhang (ORCID 0000-0002-8142-1362), zhangqiang2016@sjtu.edu.cn, Shanghai Jiao Tong University, Shanghai, China
– Zheng Zhang (ORCID 0000-0002-7170-3884), 123derrick@sjtu.edu.cn, Shanghai Jiao Tong University, Shanghai, China
– Shuo Cheng (ORCID 0000-0002-4477-9875), acccheng94@gmail.com, Shanghai Jiao Tong University, Shanghai, China
– Qi Tian (ORCID 0000-0002-5165-4325), qitian@cs.utsa.edu, University of Texas at San Antonio, San Antonio, TX, USA
– Yi Xu, xuyi@sjtu.edu.cn, Shanghai Jiao Tong University, Shanghai, China
– Xiaokang Yang (ORCID 0000-0003-4029-3322), yangxiaokang@sjtu.edu.cn, MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University, Shanghai, China
– Wenjun Zhang (ORCID 0000-0001-8799-1182), zhangwenjun@sjtu.edu.cn, Shanghai Jiao Tong University, Shanghai, China
CODEN: ITPIDJ
Copyright: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 2022
Discipline: Engineering; Computer Science
Genre: Original research; Research Support, Non-U.S. Gov't; Journal Article
Funding:
– Higher Education Discipline Innovation Project (111 Project), grants B07022 and 150633
– Shanghai Key Laboratory of Digital Media Processing and Transmission
– CCF-Tencent Open Fund
– State Key Research and Development Program, grant 2016YFB1001003
– National Natural Science Foundation of China, grants 61976137 and U1611461
– SJTU-BIGO Joint Research Fund
– MoE-China Mobile Research Fund Project, grant MCM20180702
License: https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html; https://doi.org/10.15223/policy-029; https://doi.org/10.15223/policy-037
Subject Terms: Algorithms; Datasets; Descriptions; Evaluation; Feature extraction; fine-grained; Games; graphCNN; Humans; Interaction models; Learning; Linguistics; Measurement; Modelling; Modules; multiple granularity; Natural language processing; representation learning; Representations; Sentences; Sports; Task analysis; Team sports; Three-dimensional displays; Video; Video caption
Online access: https://ieeexplore.ieee.org/document/8865609; https://www.ncbi.nlm.nih.gov/pubmed/31613750; https://www.proquest.com/docview/2617491520; https://www.proquest.com/docview/2306212620