Sparse Adversarial Examples Attacking on Video Captioning Model

Bibliographic Details
Published in: Ji suan ji ke xue (Computer Science), Vol. 50, No. 12, pp. 330-336
Main authors: Qiu, Jiangxing; Tang, Xueming; Wang, Tianmei; Wang, Chen; Cui, Yongquan; Luo, Ting
Format: Journal article
Language: Chinese
Published: Chongqing: Guojia Kexue Jishu Bu; Editorial office of Computer Science, 2023-12-01
ISSN: 1002-137X
DOI: 10.11896/jsjkx.221100068

Abstract: Although multi-modal deep learning models such as image captioning models have been shown to be vulnerable to adversarial examples, the adversarial susceptibility of video captioning remains under-examined. There are two main reasons for this. First, unlike an image captioning system, a video captioning model takes a stream of frames rather than a single image as input, so perturbing every frame of a video would be computationally enormous. Second, unlike a video recognition model, whose output is a single word, a video captioning model outputs a more complex semantic description. To address these problems and study the robustness of video captioning models, this paper proposes a sparse adversarial attack method. First, drawing on the idea of saliency maps in image object recognition models, a method is proposed to measure the contribution of different frames to the output of the video captioning model, and an L2-norm-based optimization objective function suited to video captioning models is designed. …
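The abstract outlines two steps: scoring how much each frame contributes to the caption, then optimizing an L2-norm-based objective over a sparse set of frames. This record does not give the exact frame-contribution measure or objective, so the following Python sketch is only an illustration under stated assumptions, not the authors' method. It assumes (1) a hypothetical toy captioner, `ToyCaptioner`, whose loss is the cross-entropy of a chosen target caption; (2) a gradient-magnitude saliency score per frame, i.e. the L2 norm of the loss gradient over that frame, estimated by central finite differences; and (3) an objective in the spirit of the Carlini-Wagner L2 attack, ‖δ‖₂² + c·ℓ(x + δ, y*), minimized only over the top-k frames. All names and parameters (`k`, `c`, `lr`, `steps`) are hypothetical.

```python
import math
import random


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


class ToyCaptioner:
    """Hypothetical stand-in for a video captioner: output position l scores
    each vocabulary word with its own linear map over every frame."""

    def __init__(self, n_frames, dim, vocab, length, seed=0):
        rng = random.Random(seed)
        # W[l][t][v] is a weight vector of size `dim`
        self.W = [[[[rng.gauss(0, 1) for _ in range(dim)] for _ in range(vocab)]
                   for _ in range(n_frames)] for _ in range(length)]

    def logits(self, video):
        return [[sum(sum(w * x for w, x in zip(W_l[t][v], frame))
                     for t, frame in enumerate(video))
                 for v in range(len(W_l[0]))]
                for W_l in self.W]

    def caption_loss(self, video, target):
        # Cross-entropy of the target caption, summed over output positions.
        return -sum(math.log(softmax(z)[y])
                    for z, y in zip(self.logits(video), target))


def numeric_grad(f, video, h=1e-5):
    """Central finite-difference gradient of scalar f w.r.t. every frame value."""
    grad = []
    for frame in video:
        row = []
        for d in range(len(frame)):
            orig = frame[d]
            frame[d] = orig + h
            up = f(video)
            frame[d] = orig - h
            down = f(video)
            frame[d] = orig  # restore the input exactly
            row.append((up - down) / (2 * h))
        grad.append(row)
    return grad


def frame_saliency(f, video):
    # Per-frame score: L2 norm of the loss gradient over that frame's values.
    return [math.sqrt(sum(g * g for g in row)) for row in numeric_grad(f, video)]


def select_key_frames(scores, k):
    top = sorted(range(len(scores)), key=lambda t: scores[t], reverse=True)[:k]
    return sorted(top)


def sparse_l2_attack(f, video, frames, c=1.0, lr=0.1, steps=30):
    """Minimise ||delta||_2^2 + c * f(video + delta), perturbing only `frames`."""
    def add(d):
        return [[x + dx for x, dx in zip(fr, dfr)] for fr, dfr in zip(video, d)]

    def objective(d):
        return sum(dx * dx for row in d for dx in row) + c * f(add(d))

    delta = [[0.0] * len(fr) for fr in video]
    best = objective(delta)
    for _ in range(steps):
        g = numeric_grad(objective, delta)
        step = lr
        while step > 1e-8:  # backtracking: accept only steps that lower the objective
            # Masked gradient step: frames outside `frames` stay unperturbed.
            cand = [[dx - step * gx if t in frames else 0.0
                     for dx, gx in zip(row, grow)]
                    for t, (row, grow) in enumerate(zip(delta, g))]
            value = objective(cand)
            if value < best:
                delta, best = cand, value
                break
            step /= 2
    return delta, add(delta)


T, D, V, L = 6, 4, 5, 3   # frames, feature size, vocabulary size, caption length
rng = random.Random(1)
video = [[rng.gauss(0, 1) for _ in range(D)] for _ in range(T)]
model = ToyCaptioner(T, D, V, L)
target = [2, 0, 4]        # hypothetical target caption (token ids)


def target_loss(v):
    return model.caption_loss(v, target)


key_frames = select_key_frames(frame_saliency(target_loss, video), k=2)
delta, adversarial = sparse_l2_attack(target_loss, video, key_frames)
print(key_frames, target_loss(video), target_loss(adversarial))
```

Masking the update to the selected frames is what makes the sketch sparse: unselected frames are never perturbed, so the optimization runs over k frames rather than the whole video. A practical attack on a real captioning network would use the framework's automatic differentiation instead of finite differences.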

Copyright: Guojia Kexue Jishu Bu, 2023
Subject terms: Frames (data processing); Image contrast; Object recognition
Keywords: multi-modal; video captioning; adversarial example; saliency map; keyframe selection
Full text: https://doaj.org/article/bd1873b6183d4db1824b70823b6148c3 (open access, DOAJ); https://www.proquest.com/docview/2918339029 (ProQuest)