Fine-Grained Video Captioning via Graph-based Multi-Granularity Interaction Learning
| Published in | IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 44, No. 2, pp. 666-683 |
|---|---|
| Main Authors | Yichao Yan, Ning Zhuang, Bingbing Ni, Jian Zhang, Minghao Xu, Qiang Zhang, Zheng Zhang, Shuo Cheng (Shanghai Jiao Tong University, Shanghai, China); Qi Tian (University of Texas at San Antonio, San Antonio, TX, USA); Yi Xu, Xiaokang Yang, Wenjun Zhang (Shanghai Jiao Tong University, Shanghai, China) |
| Format | Journal Article |
| Language | English |
| Published | United States: IEEE (The Institute of Electrical and Electronics Engineers, Inc.), 01.02.2022 |
| Subjects | Algorithms; Datasets; Descriptions; Evaluation; Feature extraction; fine-grained; Games; graph CNN; Humans; Interaction models; Learning; Linguistics; Measurement; Modelling; Modules; multiple granularity; Natural language processing; representation learning; Representations; Sentences; Sports; Task analysis; Team sports; Three-dimensional displays; Video; Video captioning |
| Funding | National Natural Science Foundation of China (61976137, U1611461); State Key Research and Development Program (2016YFB1001003); Higher Education Discipline Innovation (111) Project (B07022, 150633); Shanghai Key Laboratory of Digital Media Processing and Transmission; MoE-China Mobile Research Fund Project (MCM20180702); CCF-Tencent Open Fund; SJTU-BIGO Joint Research Fund |
| ISSN | 0162-8828 |
| EISSN | 1939-3539; 2160-9292 |
| DOI | 10.1109/TPAMI.2019.2946823 |
| PMID | 31613750 |
| Abstract | Learning to generate continuous, highly detailed linguistic descriptions of multi-subject interactive videos has a direct application in team sports auto-narrative. In contrast to traditional video captioning, this task is more challenging because it requires simultaneously modeling fine-grained individual actions, uncovering the spatio-temporal dependency structure of frequent group interactions, and then accurately mapping these complex interaction details into long, detailed commentary. To address these challenges explicitly, we propose Graph-based Learning for Multi-Granularity Interaction Representation (GLMGIR), a novel framework for the fine-grained team sports auto-narrative task. A multi-granular interaction modeling module progressively extracts interactive actions among subjects, encoding both intra- and inter-team interactions. On top of these multi-granular representations, a multi-granular attention module considers action/event descriptions at multiple spatio-temporal resolutions. Both modules are integrated seamlessly and work collaboratively to generate the final narrative. To facilitate reproducible research, we also collect a new video dataset from YouTube, the Sports Video Narrative (SVN) dataset, comprising 6K team sports videos (NBA basketball games) annotated with 10K ground-truth narrative sentences. Furthermore, because existing metrics such as METEOR, designed for the coarse-grained video captioning task, do not cope well with the fine-grained sports narrative task, we develop a novel evaluation metric, Fine-grained Captioning Evaluation (FCE), which measures how accurately a generated description reflects fine-grained action details as well as the overall spatio-temporal interaction structure. Extensive experiments on the SVN dataset demonstrate the effectiveness of the proposed framework for fine-grained team sports video auto-narrative. |
|---|---|
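The abstract describes a graph-style encoder that refines per-player features through message passing and then pools them at several granularities (individual, intra-team, inter-team). The paper's own implementation is not reproduced in this record; below is a minimal PyTorch sketch of that general idea, assuming per-player feature vectors and a learned soft adjacency. All names here (`GraphInteractionLayer`, `MultiGranularEncoder`, `team_mask`) are illustrative, not from the authors' code.

```python
# Hypothetical sketch in the spirit of GLMGIR, not the authors' released code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphInteractionLayer(nn.Module):
    """One round of message passing over a learned player-interaction graph."""
    def __init__(self, dim):
        super().__init__()
        self.affinity = nn.Linear(dim, dim, bias=False)  # pairwise affinity scores
        self.update = nn.Linear(2 * dim, dim)            # fuse self + neighborhood

    def forward(self, x):                                # x: (batch, players, dim)
        scores = self.affinity(x) @ x.transpose(1, 2)    # (B, P, P) affinities
        adj = F.softmax(scores, dim=-1)                  # soft adjacency matrix
        msg = adj @ x                                    # aggregate neighbor features
        return torch.relu(self.update(torch.cat([x, msg], dim=-1)))

class MultiGranularEncoder(nn.Module):
    """Stacks interaction layers, then pools at player / team / game granularity."""
    def __init__(self, dim, num_layers=2):
        super().__init__()
        self.layers = nn.ModuleList(
            GraphInteractionLayer(dim) for _ in range(num_layers))

    def forward(self, x, team_mask):                     # team_mask: (players,) bool
        for layer in self.layers:
            x = layer(x)                                 # refined per-player features
        team_a = x[:, team_mask].mean(dim=1)             # intra-team summary, team A
        team_b = x[:, ~team_mask].mean(dim=1)            # intra-team summary, team B
        game = x.mean(dim=1)                             # inter-team (global) summary
        return x, team_a, team_b, game

# Toy usage: batch of 2 clips, 10 players, 128-d appearance/pose features.
feats = torch.randn(2, 10, 128)
mask = torch.tensor([True] * 5 + [False] * 5)            # first five players = team A
players, team_a, team_b, game = MultiGranularEncoder(128)(feats, mask)
```

A decoder with the multi-granular attention the abstract mentions would then attend over the four returned tensors at each generated word; that step is omitted here for brevity.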
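FCE itself is defined in the paper and is not reproduced here. As a loose illustration of why a fine-grained metric differs from plain n-gram overlap, the sketch below scores action words from a made-up basketball action vocabulary separately from generic unigram overlap; `ACTIONS`, `alpha`, and the blending scheme are assumptions for illustration only.

```python
# Loose illustration of the idea behind a fine-grained caption metric;
# the actual FCE definition is in the paper. The vocabulary is invented.
from collections import Counter

ACTIONS = {"dunk", "layup", "pass", "steal", "block", "rebound", "shoot"}

def action_f1(candidate, reference):
    """F1 over action words, so a missing 'dunk' is penalized explicitly."""
    cand = Counter(w for w in candidate.lower().split() if w in ACTIONS)
    ref = Counter(w for w in reference.lower().split() if w in ACTIONS)
    if not cand or not ref:
        return 0.0
    overlap = sum((cand & ref).values())
    p, r = overlap / sum(cand.values()), overlap / sum(ref.values())
    return 2 * p * r / (p + r) if p + r else 0.0

def fine_grained_score(candidate, reference, alpha=0.5):
    """Blend action-level F1 with plain unigram overlap (a stand-in for METEOR)."""
    cand, ref = candidate.lower().split(), reference.lower().split()
    unigram = sum((Counter(cand) & Counter(ref)).values()) / max(len(cand), 1)
    return alpha * action_f1(candidate, reference) + (1 - alpha) * unigram

print(fine_grained_score("James drives and finishes with a dunk",
                         "LeBron James attacks the rim for a huge dunk"))
```

On this example pair, a caption that drops the word "dunk" loses the action-level term even when many other words overlap, which is the kind of sensitivity to fine-grained action details that the abstract says METEOR lacks.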