Sparse Adversarial Example Attack on Video Captioning Models
Published in: Ji suan ji ke xue (Computer Science), Vol. 50, No. 12, pp. 330-336
Main Authors:
Format: Journal Article
Language: Chinese
Published: Chongqing: Editorial office of Computer Science; Guojia Kexue Jishu Bu, 01.12.2023
ISSN: 1002-137X
DOI: 10.11896/jsjkx.221100068
Summary: Although multi-modal deep learning models such as image captioning models have been shown to be vulnerable to adversarial examples, adversarial susceptibility in video caption generation remains under-examined. There are two main reasons for this. First, unlike an image captioning system, a video captioning model takes a stream of images rather than a single picture as input, so perturbing every frame of a video would incur an enormous computational cost. Second, compared with a video recognition model, the output is not a single word but a more complex semantic description. To address these problems and study the robustness of video captioning models, this paper proposes a sparse adversarial attack method. First, a method based on the idea of saliency maps from image object recognition models is proposed to measure the contribution of different frames to the video captioning model's output, and an L-norm-based optimization objective function suited to video captioning models is designed…
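The two ideas the summary names, scoring each frame's contribution via gradient saliency and encouraging frame-level sparsity through an L-norm penalty, can be sketched as follows. This is a minimal illustration, not the paper's method: the function names are invented, the gradients are assumed to come from backpropagating the caption loss through the model, and the choice of a group-sparse L2,1 norm (the summary does not specify which L norm is used) is an assumption.

```python
import numpy as np

def frame_saliency(grads):
    """Per-frame saliency score from the gradient of the caption loss.

    grads: array of shape (T, H, W, C) -- gradient of the captioning
    loss with respect to each of the T input frames (assumed given).
    Returns one score per frame: the L1 magnitude of its gradient.
    """
    return np.abs(grads).reshape(grads.shape[0], -1).sum(axis=1)

def select_key_frames(grads, k):
    """Pick the k frames whose gradients most influence the caption loss."""
    scores = frame_saliency(grads)
    return np.argsort(scores)[::-1][:k]

def l21_norm(delta):
    """Group-sparse L2,1 norm of a perturbation delta of shape (T, ...).

    Sums the per-frame L2 norms; minimizing this term drives entire
    frames' perturbations to zero, yielding a sparse attack.
    """
    per_frame = np.sqrt((delta.reshape(delta.shape[0], -1) ** 2).sum(axis=1))
    return per_frame.sum()
```

With such scores, an attacker would perturb only the top-k salient frames while the L2,1 penalty keeps the perturbation confined to few frames.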