Sparse Adversarial Examples Attacking on Video Captioning Model

Bibliographic Details
Published in	Ji suan ji ke xue Vol. 50; no. 12; pp. 330 - 336
Main Authors	Qiu, Jiangxing; Tang, Xueming; Wang, Tianmei; Wang, Chen; Cui, Yongquan; Luo, Ting
Format	Journal Article
Language	Chinese
Published	Chongqing: Guojia Kexue Jishu Bu; Editorial office of Computer Science, 2023-12-01
Subjects	Frames (data processing); Image contrast; multi-modal; video caption; adversarial example; saliency map; keyframe selection; Object recognition
ISSN	1002-137X
DOI	10.11896/jsjkx.221100068

Abstract	Although multi-modal deep learning models such as image captioning models have been shown to be vulnerable to adversarial examples, the adversarial susceptibility of video caption generation remains under-examined. There are two main reasons for this. First, unlike an image captioning system, a video captioning model takes a stream of images rather than a single picture as input, so perturbing every frame of a video would be computationally very expensive. Second, compared with a video recognition model, the output of the model is not a single word but a more complex semantic description. To address these problems and study the robustness of video captioning models, this paper proposes a sparse adversarial attack method. First, a method based on the idea of saliency maps from image object recognition models is proposed to verify the contribution of different frames to the video captioning model's output, and an L2-norm-based optimization objective function suited to video captioning models is designed. ...
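
The two ideas named in the abstract (scoring frames by saliency so that only a few are perturbed, and an L2-norm term in the attack objective) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names `frame_saliency`, `select_keyframes` and `sparse_l2_objective`, the use of a per-frame gradient norm as the saliency score, the top-k selection rule, and the weight `lam` are all assumptions made for illustration, since the record does not give the paper's exact formulation.

```python
import numpy as np


def frame_saliency(grad):
    """Per-frame saliency score (assumed: L2 norm of the loss gradient).

    grad has shape (T, H, W, C): the gradient of some caption loss with
    respect to each pixel of each of the T frames.
    """
    flat = grad.reshape(grad.shape[0], -1)
    return np.sqrt((flat ** 2).sum(axis=1))


def select_keyframes(saliency, k):
    """Indices of the k most salient frames, returned in temporal order."""
    k = max(0, min(k, len(saliency)))
    top = np.argsort(saliency)[len(saliency) - k:]
    return np.sort(top)


def sparse_l2_objective(caption_loss, delta, mask, lam):
    """Hypothetical objective: caption loss + lam * ||mask * delta||_2^2.

    mask is a 0/1 vector of length T; only frames with mask == 1
    contribute to the perturbation penalty.
    """
    masked = delta * mask[:, None, None, None]
    return caption_loss + lam * float((masked ** 2).sum())


if __name__ == "__main__":
    # Toy gradient: frames 1 and 3 carry all of the gradient energy.
    grad = np.zeros((4, 2, 2, 3))
    grad[1] = 1.0
    grad[3] = 2.0

    scores = frame_saliency(grad)
    keyframes = select_keyframes(scores, k=2)

    # Build a frame mask so only the selected keyframes are perturbed.
    mask = np.zeros(4)
    mask[keyframes] = 1.0
    delta = np.ones_like(grad)
    print(keyframes, sparse_l2_objective(1.0, delta, mask, lam=0.5))
```

In a real attack the gradient would come from backpropagation through the captioning model and the objective would be minimized iteratively; the sketch only shows how a frame mask restricts the perturbation to a sparse set of keyframes.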
Copyright	Copyright Guojia Kexue Jishu Bu 2023
Discipline	Computer Science
OpenAccessLink	https://doaj.org/article/bd1873b6183d4db1824b70823b6148c3