Unsupervised Clustering for a Comparative Methodology of Machine Learning Models to Detect Domain-Generated Algorithms Based on an Alphanumeric Features Analysis

Bibliographic Details
Published in	Journal of Network and Systems Management, Vol. 32, no. 1, p. 18
Main Authors	Hassaoui, Mohamed; Hanini, Mohamed; El Kafhali, Said
Format	Journal Article
Language	English
Published	New York: Springer US, 01.03.2024; Springer Nature B.V.
Subjects	Algorithms; Artificial intelligence; Clustering; Command and control; Communications Engineering; Computer Communication Networks; Computer Science; Computer Systems Organization and Communication Networks; Cybersecurity; DGA; Domain generation algorithms; Domain names; Information Systems and Communication Service; Machine learning; Methodology; Networks; Neural networks; Operations Research/Decision Theory
ISSN	1064-7570 1573-7705
DOI	10.1007/s10922-023-09793-6

Summary:	Domain Generation Algorithms (DGAs) are often used to generate large numbers of domain names in order to maintain command and control (C2) between an infected computer and the bot master. By generating a large number of domain names as needed, attackers can mask their C2 servers and escape detection. Many malware families have switched to a stealthier contact approach, so traditional detection methods have become ineffective. Over the past decades, many researchers have started to use artificial intelligence to build systems able to detect DGAs in traffic, but these works do not evaluate their models on the same data. This article proposes a methodology, based on unsupervised clustering, for comparing machine learning models, and then applies it to study the best-performing neural network and traditional machine learning methods for detecting DGAs. We extracted 21 linguistic features based on alphanumeric and n-gram analysis, and we studied the correlation between these features in order to reduce their number. We examine these machine learning algorithms in detail and discuss the strengths and drawbacks of each method for specific classes of DGA, in order to propose a new switch-case model that could reliably detect DGAs.
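
Illustrative sketch: The Summary mentions extracting alphanumeric and n-gram features from domain names and using the correlation between features to reduce their number. The record does not list the article's 21 features, so the Python sketch below uses a handful of hypothetical ones (length, digit ratio, vowel ratio, character entropy, distinct bigram count) and a simple Pearson-correlation filter. The function names, the feature set, and the 0.95 threshold are all assumptions made for illustration, not the authors' method.

```python
import math
from collections import Counter

VOWELS = set("aeiou")


def alphanumeric_features(domain):
    """Compute a few illustrative features from the leftmost label.

    These features are hypothetical stand-ins, not the article's 21 features.
    """
    label = domain.lower().split(".")[0]
    n = len(label)
    if n == 0:
        raise ValueError("empty domain label")
    counts = Counter(label)
    # Shannon entropy of the character distribution, in bits.
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())
    bigrams = {label[i:i + 2] for i in range(n - 1)}
    return {
        "length": float(n),
        "digit_ratio": sum(ch.isdigit() for ch in label) / n,
        "vowel_ratio": sum(ch in VOWELS for ch in label) / n,
        "entropy": entropy,
        "distinct_bigrams": float(len(bigrams)),
    }


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either column is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if std_x == 0 or std_y == 0:
        return 0.0
    return cov / (std_x * std_y)


def drop_correlated(rows, threshold=0.95):
    """Greedily keep features whose |r| with every kept feature is below threshold."""
    kept = []
    for name in rows[0]:
        column = [row[name] for row in rows]
        if all(abs(pearson(column, [row[k] for row in rows])) < threshold
               for k in kept):
            kept.append(name)
    return kept


rows = [alphanumeric_features(d)
        for d in ("abcd.com", "abcdef.com", "abcdefgh.com")]
kept = drop_correlated(rows)
```

On this toy input the distinct bigram count grows in lockstep with the label length, so the filter keeps `length` and drops `distinct_bigrams`. A real comparison would compute features over labelled benign and DGA domain lists, which this sketch does not include.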