Exploring Generative Adversarial Networks for Augmenting Network Intrusion Detection Tasks

Bibliographic Details
Published in ACM Transactions on Multimedia Computing, Communications, and Applications Vol. 21; no. 1; pp. 1-19
Main Authors Constantin, Mihai Gabriel; Stanciu, Dan-Cristian; Ştefan, Liviu-Daniel; Dogariu, Mihai; Mihăilescu, Dan; Ciobanu, George; Bergeron, Matt; Liu, Winston; Belov, Konstantin; Radu, Octavian; Ionescu, Bogdan
Format Journal Article
Language English
Published New York, NY ACM 23.12.2024
ISSN 1551-6857, 1551-6865
DOI 10.1145/3689636

Summary: The advent of generative networks and their adoption in numerous domains and communities have led to a wave of innovation and breakthroughs in AI and machine learning. Generative Adversarial Networks (GANs) have expanded the scope of what is possible with machine learning, allowing for new applications in areas such as computer vision, natural language processing, and creative AI. GANs, in particular, have been used for a wide range of tasks, including image and video generation, data augmentation, style transfer, and anomaly detection. They have also been used for medical imaging and drug discovery, where they can generate synthetic data to augment small datasets, reduce the need for expensive experiments, and lower the number of real patients that must be included in medical trials. Given these developments, we propose using the power of GANs to create and augment flow-based network traffic datasets. We evaluate a series of GAN architectures, including Wasserstein, conditional, energy-based, gradient penalty, and LSTM-GANs. We evaluate their performance on a set of flow-based network traffic data collected from 16 subjects who used their computers for home, work, and study purposes. The performance of these GAN architectures is described according to metrics that involve networking principles, data distribution among a collection of flows, and temporal data distribution. Given the tendency of network intrusion detection datasets to have a very imbalanced data distribution, i.e., a large number of samples in the “normal traffic” category and a comparatively low number of samples assigned to the “intrusion” categories, we test our GANs by augmenting the intrusion data and checking whether this helps intrusion detection neural networks in their task. We publish the resulting UPBFlow dataset and code on GitHub.