Sparse Adversarial Examples Attacking on Video Captioning Model

Bibliographic Details
Published in: Ji suan ji ke xue (Computer Science), Vol. 50, No. 12, pp. 330-336
Main Authors: Qiu, Jiangxing; Tang, Xueming; Wang, Tianmei; Wang, Chen; Cui, Yongquan; Luo, Ting
Format: Journal Article
Language: Chinese
Published: Chongqing: Guojia Kexue Jishu Bu (Editorial Office of Computer Science), 01.12.2023
ISSN: 1002-137X
DOI: 10.11896/jsjkx.221100068

Summary: Although multi-modal deep learning models such as image captioning models have been shown to be vulnerable to adversarial examples, the adversarial susceptibility of video caption generation remains under-examined. There are two main reasons for this. On the one hand, in contrast to an image captioning system, the input of a video captioning model is a stream of images rather than a single picture, so the computation would be enormous if every frame of a video were perturbed. On the other hand, compared with a video recognition model, the output is not a single word but a more complex semantic description. To address these problems and study the robustness of video captioning models, this paper proposes a sparse adversarial attack method. First, drawing on the idea of saliency maps in image object recognition models, a method is proposed to measure the contribution of different frames to the output of the video captioning model, and an L-norm based optimization objective function suited to video captioning models is designed...
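The summary describes ranking frames by their contribution to the caption output (a saliency-map style analysis) and then perturbing only the most influential frames under a norm constraint. The following is a minimal, hypothetical sketch of that general idea, not the authors' implementation: it assumes a differentiable PyTorch captioning model `model(video, caption_tokens)` that returns a caption cross-entropy loss, and the function names, hyperparameters, and the L_inf budget (standing in for the paper's L-norm objective, whose exact form is not given in this record) are all illustrative assumptions.

```python
import torch

def frame_saliency(model, video, caption_tokens):
    """Score each frame by the gradient magnitude of the caption loss,
    analogous to saliency maps in image recognition (illustrative only)."""
    video = video.clone().requires_grad_(True)     # video: (T, C, H, W) in [0, 1]
    loss = model(video, caption_tokens)            # assumed: caption cross-entropy
    loss.backward()
    return video.grad.abs().flatten(1).sum(dim=1)  # one saliency score per frame

def sparse_attack(model, video, caption_tokens, k=3, eps=8 / 255, steps=10):
    """Perturb only the k most salient frames (PGD-style, L_inf budget)."""
    top_k = frame_saliency(model, video, caption_tokens).topk(k).indices
    mask = torch.zeros_like(video)
    mask[top_k] = 1.0                              # restrict changes to chosen frames
    adv = video.clone()
    for _ in range(steps):
        adv = adv.detach().requires_grad_(True)
        model(adv, caption_tokens).backward()      # ascend the caption loss
        with torch.no_grad():
            adv = adv + (eps / steps) * adv.grad.sign() * mask
            adv = video + (adv - video).clamp(-eps, eps)  # stay within the budget
            adv = adv.clamp(0.0, 1.0)
    return adv.detach(), top_k
```

Perturbing only the top-k frames is what makes the attack sparse: the cost of the gradient computation is paid once for frame selection, after which the optimization touches a small subset of the input stream.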