Robustification of Deep Net Classifiers by Key Based Diversified Aggregation with Pre-Filtering

Bibliographic Details
Published in	Proceedings - International Conference on Image Processing, pp. 2294-2298
Main Authors	Taran, Olga; Rezaeifar, Shideh; Holotyak, Taras; Voloshynovskiy, Slava
Format	Conference Proceeding
Language	English
Published	IEEE 01.09.2019
Subjects	Adversarial attacks; black / gray-box; Classification algorithms; Computer architecture; defense; Discrete cosine transforms; machine learning; Microsoft Windows; non-gradient / gradient based attacks; Perturbation methods; Training
ISSN	2381-8549
DOI	10.1109/ICIP.2019.8803714

More Information
Summary:	In this paper, we address the vulnerability of machine learning systems to adversarial attacks. We propose and investigate a Key based Diversified Aggregation (KDA) mechanism as a defense strategy. The KDA assumes that the attacker (i) knows the architecture of the classifier and the defense strategy used, (ii) has access to the training data set, but (iii) does not know the secret key. The robustness of the system is achieved by a specially designed key based randomization. The proposed randomization prevents gradient back-propagation and the creation of a "bypass" system. The randomization is performed simultaneously in several channels, and a multi-channel aggregation stabilizes the results of randomization by aggregating the soft outputs of the classifiers in all channels. The experimental evaluation demonstrates high robustness and universality of the KDA against the most efficient gradient based attacks, such as those proposed by N. Carlini and D. Wagner [1], and against non-gradient based sparse adversarial perturbations such as the OnePixel attack [2].
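To make the multi-channel idea in the summary concrete, here is a minimal NumPy sketch of key based randomization followed by soft-output aggregation. It is an illustration under stated assumptions, not the authors' implementation: the randomization is assumed to be a secret per-channel sign flip of 2-D DCT coefficients (suggested only by the "Discrete cosine transforms" keyword), the pre-filtering step named in the title is not modeled, and the per-channel classifiers are untrained random linear maps. All function names (`dct_matrix`, `make_key`, `randomize`, `kda_predict`) are hypothetical.

```python
import numpy as np

def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II matrix C, so that C @ x is the 1-D DCT of x."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    c = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    c[0, :] = np.sqrt(1.0 / n)
    return c

def dct2(img: np.ndarray) -> np.ndarray:
    """2-D DCT of a square image (square shape is an assumption of this sketch)."""
    c = dct_matrix(img.shape[0])
    return c @ img @ c.T

def make_key(shape, seed: int) -> np.ndarray:
    """Hypothetical secret key for one channel: a random +/-1 sign-flip mask."""
    rng = np.random.default_rng(seed)
    return rng.choice([-1.0, 1.0], size=shape)

def randomize(img: np.ndarray, key: np.ndarray) -> np.ndarray:
    """Assumed key based randomization: flip the signs of selected DCT coefficients."""
    return dct2(img) * key

def softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max()  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def kda_predict(img, keys, classifiers):
    """Feed each channel its own randomized input and average the soft outputs."""
    probs = [softmax(clf(randomize(img, key))) for key, clf in zip(keys, classifiers)]
    avg = np.mean(probs, axis=0)
    return int(np.argmax(avg)), avg

# Usage with toy data: 5 channels, 10 classes, 8x8 "images".
n, num_classes, channels = 8, 10, 5
keys = [make_key((n, n), seed=1000 + j) for j in range(channels)]  # kept secret by the defender
rng = np.random.default_rng(0)
weights = [rng.normal(size=(num_classes, n * n)) for _ in range(channels)]
# Untrained linear stand-ins for per-channel classifiers (illustration only).
classifiers = [lambda x, W=W: W @ x.ravel() for W in weights]
img = rng.random((n, n))
label, probs = kda_predict(img, keys, classifiers)
```

The design point the sketch tries to convey is that an attacker who knows every function above but not the `keys` cannot reproduce the exact inputs each classifier sees, while averaging softmax outputs over several independently keyed channels smooths out the effect of any single channel's randomization.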