Fine-Grained Video Captioning via Graph-based Multi-Granularity Interaction Learning

Bibliographic Details
Published in: IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 44, No. 2, pp. 666-683
Main Authors: Yan, Yichao; Zhuang, Ning; Ni, Bingbing; Zhang, Jian; Xu, Minghao; Zhang, Qiang; Zhang, Zheng; Cheng, Shuo; Tian, Qi; Xu, Yi; Yang, Xiaokang; Zhang, Wenjun
Format: Journal Article
Language: English
Published: United States, IEEE, 01.02.2022 (The Institute of Electrical and Electronics Engineers, Inc.)
ISSN: 0162-8828, 1939-3539
EISSN: 2160-9292, 1939-3539
DOI: 10.1109/TPAMI.2019.2946823
PMID: 31613750

Abstract: Learning to generate continuous, detailed linguistic descriptions for multi-subject interactive videos has particular applications in team sports auto-narrative. In contrast to traditional video captioning, this task is more challenging as it requires simultaneously modeling fine-grained individual actions, uncovering the spatio-temporal dependency structures of frequent group interactions, and then accurately mapping these complex interaction details into long, detailed commentary. To explicitly address these challenges, we propose a novel framework, Graph-based Learning for Multi-Granularity Interaction Representation (GLMGIR), for the fine-grained team sports auto-narrative task. A multi-granular interaction modeling module is proposed to extract interactive actions among subjects in a progressive way, encoding both intra- and inter-team interactions. Based on these multi-granular representations, a multi-granular attention module is developed to consider action/event descriptions at multiple spatio-temporal resolutions. Both modules are integrated seamlessly and work collaboratively to generate the final narrative. In the meantime, to facilitate reproducible research, we collect a new video dataset from YouTube.com called the Sports Video Narrative dataset (SVN). It marks a novel direction, containing 6K team sports videos (i.e., NBA basketball games) with 10K ground-truth narrative sentences. Furthermore, as previous metrics such as METEOR (used in the coarse-grained video captioning task) do not cope well with the fine-grained sports narrative task, we develop a novel evaluation metric named Fine-grained Captioning Evaluation (FCE), which measures how accurately the generated linguistic description reflects fine-grained action details as well as the overall spatio-temporal interactional structure. Extensive experiments on our SVN dataset demonstrate the effectiveness of the proposed framework for fine-grained team sports video auto-narrative.
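The abstract names two architectural components: a multi-granular interaction module that progressively encodes intra- and inter-team interactions over the players, and a multi-granular attention module over the resulting representations. Below is a minimal sketch of the graph-based multi-granularity idea only, not the authors' GLMGIR implementation: the class name, feature size, two-team adjacency construction, and mean-neighbor propagation are all hypothetical stand-ins.

```python
# Minimal sketch (not the paper's code) of multi-granularity interaction
# encoding over a player graph: individual -> intra-team -> inter-team.
# All shapes and module names are illustrative assumptions.
import torch
import torch.nn as nn


class MultiGranularInteraction(nn.Module):
    """Encodes individual, intra-team, and inter-team interactions."""

    def __init__(self, feat_dim: int = 128):
        super().__init__()
        self.intra = nn.Linear(feat_dim, feat_dim)  # message passing within a team
        self.inter = nn.Linear(feat_dim, feat_dim)  # message passing across teams
        self.act = nn.ReLU()

    def propagate(self, x: torch.Tensor, adj: torch.Tensor, lin: nn.Linear) -> torch.Tensor:
        # Normalized graph propagation: average neighbor features, then transform.
        deg = adj.sum(-1, keepdim=True).clamp(min=1.0)
        return self.act(lin(adj @ x / deg))

    def forward(self, players: torch.Tensor, team_ids: torch.Tensor):
        # players: (N, D) per-player features; team_ids: (N,) with values in {0, 1}.
        same = (team_ids[:, None] == team_ids[None, :]).float()
        intra_adj = same - torch.eye(len(players))  # teammates only (no self-loop)
        inter_adj = 1.0 - same                      # opponents only
        h_intra = self.propagate(players, intra_adj, self.intra)  # finer granularity
        h_inter = self.propagate(h_intra, inter_adj, self.inter)  # coarser granularity
        # Three granularities: individual, intra-team, inter-team.
        return players, h_intra, h_inter


# Toy usage: 10 players (5 per team) with 128-d appearance/motion features.
enc = MultiGranularInteraction(128)
feats = torch.randn(10, 128)
teams = torch.tensor([0] * 5 + [1] * 5)
indiv, intra, inter = enc(feats, teams)
print(indiv.shape, intra.shape, inter.shape)  # torch.Size([10, 128]) x 3
```

The sketch only illustrates the granularity hierarchy (individual features, teammate-aggregated features, opponent-aggregated features); the paper's module operates on spatio-temporal video features and feeds a narrative decoder with multi-granular attention.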
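The abstract also introduces the FCE metric, arguing that n-gram metrics such as METEOR miss fine-grained action details. FCE itself is defined in the paper; purely as a loose illustration of scoring recovered action details rather than surface n-gram overlap, one could compare structured (subject, action) pairs extracted from the generated and reference narratives. The extraction step is assumed away here, and the F1 scoring is a hypothetical simplification, not the FCE definition.

```python
# Loose illustration (not the paper's FCE metric) of detail-level scoring:
# F1 over (subject, action) pairs from generated vs. reference narratives.
# A pair-extraction step is assumed to exist; pairs are given directly here.

def detail_f1(pred_pairs: set, ref_pairs: set) -> float:
    """F1 over (subject, action) pairs; 0.0 when either side is empty."""
    if not pred_pairs or not ref_pairs:
        return 0.0
    tp = len(pred_pairs & ref_pairs)
    if tp == 0:
        return 0.0
    precision = tp / len(pred_pairs)
    recall = tp / len(ref_pairs)
    return 2 * precision * recall / (precision + recall)


ref = {("Curry", "three-pointer"), ("Green", "assist")}
pred = {("Curry", "three-pointer"), ("Thompson", "rebound")}
print(round(detail_f1(pred, ref), 3))  # 0.5
```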
Author details:
– Yichao Yan (ORCID 0000-0003-3209-8965), yanyichao@sjtu.edu.cn, Shanghai Jiao Tong University, Shanghai, China
– Ning Zhuang (ORCID 0000-0002-9605-0891), ningzhuang@sjtu.edu.cn, Shanghai Jiao Tong University, Shanghai, China
– Bingbing Ni (ORCID 0000-0001-7339-028X), nibingbing@sjtu.edu.cn, Shanghai Jiao Tong University, Shanghai, China
– Jian Zhang (ORCID 0000-0003-4410-3741), stevenash0822@sjtu.edu.cn, Shanghai Jiao Tong University, Shanghai, China
– Minghao Xu (ORCID 0000-0001-7468-8790), xuminghao118@sjtu.edu.cn, Shanghai Jiao Tong University, Shanghai, China
– Qiang Zhang (ORCID 0000-0002-8142-1362), zhangqiang2016@sjtu.edu.cn, Shanghai Jiao Tong University, Shanghai, China
– Zheng Zhang (ORCID 0000-0002-7170-3884), 123derrick@sjtu.edu.cn, Shanghai Jiao Tong University, Shanghai, China
– Shuo Cheng (ORCID 0000-0002-4477-9875), acccheng94@gmail.com, Shanghai Jiao Tong University, Shanghai, China
– Qi Tian (ORCID 0000-0002-5165-4325), qitian@cs.utsa.edu, University of Texas at San Antonio, San Antonio, TX, USA
– Yi Xu, xuyi@sjtu.edu.cn, Shanghai Jiao Tong University, Shanghai, China
– Xiaokang Yang (ORCID 0000-0003-4029-3322), yangxiaokang@sjtu.edu.cn, MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University, Shanghai, China
– Wenjun Zhang (ORCID 0000-0001-8799-1182), zhangwenjun@sjtu.edu.cn, Shanghai Jiao Tong University, Shanghai, China
CODEN: ITPIDJ
Copyright: The Institute of Electrical and Electronics Engineers, Inc. (IEEE), 2022
Discipline: Engineering; Computer Science
Genre: Original research; Research Support, Non-U.S. Gov't; Journal Article
Funding:
– Higher Education Discipline Innovation Project (111 Project), grants B07022 and 150633
– Shanghai Key Laboratory of Digital Media Processing and Transmission
– CCF-Tencent Open Fund
– State Key Research and Development Program, grant 2016YFB1001003
– National Natural Science Foundation of China, grants 61976137 and U1611461
– SJTU-BIGO Joint Research Fund
– MoE-China Mobile Research Fund Project, grant MCM20180702
License: https://ieeexplore.ieee.org/Xplorehelp/downloads/license-information/IEEE.html; https://doi.org/10.15223/policy-029; https://doi.org/10.15223/policy-037
Subject Terms: Algorithms; Datasets; Descriptions; Evaluation; Feature extraction; fine-grained; Games; graphCNN; Humans; Interaction models; Learning; Linguistics; Measurement; Modelling; Modules; multiple granularity; Natural language processing; representation learning; Representations; Sentences; Sports; Task analysis; Team sports; Three-dimensional displays; Video; Video caption
Online access: https://ieeexplore.ieee.org/document/8865609; https://www.ncbi.nlm.nih.gov/pubmed/31613750; https://www.proquest.com/docview/2617491520; https://www.proquest.com/docview/2306212620