A comparative analysis of machine learning techniques for detecting probing attack with SHAP algorithm

Bibliographic Details
Published in	Expert systems with applications Vol. 271; p. 126718
Main Authors	Rabbi, Fazla; Ibne Hossain, Niamat Ullah; Das, Saikat
Format	Journal Article
Language	English
Published	Elsevier Ltd 01.05.2025
ISSN	0957-4174
DOI	10.1016/j.eswa.2025.126718

Summary:	Internet-based network security has become a major global concern as people, businesses, and countries grow more dependent on networked services. It is therefore vital to deploy an intrusion detection system (IDS) that can protect computer networks from potential threats and data leakage. IDS capabilities are steadily improving with the growth of machine learning (ML) methods. In this research, we present an intrusion detection method that uses several ML algorithms to detect probe attacks in the NSL-KDD dataset. A probe attack targets potential weak points of a network to learn its structure and vulnerabilities. The objective of this study is to build the best-performing ML model, one that provides the lowest possible false positive rate, the lowest run time, and the highest possible F1 score. To that end, we developed several ML models: Neural Network (NN), Random Forest (RF), K-Nearest Neighbor (KNN), Bagging Classifier, and Extreme Gradient Boosting Classifier (XGBoost). Cross-validation, sampling methods, and hyperparameter tuning were applied to these models to improve their efficiency. In addition, the SHAP algorithm was used to interpret the models' predictions and identify the features that most influence cyber-attack detection. A comparative analysis of all the models we built shows that XGBoost performs best, with a 92.93% F1 score, the lowest false positive rate (2.35%), and the shortest runtime (13 s). Furthermore, our feature importance study shows that the “src_bytes” (source bytes) feature, which offers information on the number of bytes an attacker sends to each port during the scanning phase, has the greatest influence on identifying probing attacks.
Compared to existing research on probe attack detection, our proposed model delivers fast and accurate anomaly detection with negligible false positives. It also outperforms traditional probe attack detection in computational efficiency and in handling diverse network scenarios in the presence of high traffic volumes and dynamic environments.
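
The summary ranks models by F1 score and false positive rate. As a minimal sketch of how those two metrics are derived from binary predictions, the Python below counts confusion-matrix cells and applies the standard definitions. The function names and the toy label lists are illustrative assumptions, not the authors' code or data, and the label 1 is assumed to mean "probe attack".

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary task."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive and truth != positive:
            fp += 1
        elif pred != positive and truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def f1_score(tp, fp, fn):
    """F1 is the harmonic mean of precision and recall."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def false_positive_rate(fp, tn):
    """Share of benign samples that were wrongly flagged as attacks."""
    return fp / (fp + tn) if (fp + tn) else 0.0


# Hypothetical toy labels (1 = probe attack, 0 = normal traffic).
y_true = [1, 1, 0, 0, 1, 0, 0, 0]
y_pred = [1, 0, 0, 1, 1, 0, 0, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
print(f"F1 = {f1_score(tp, fp, fn):.4f}")
print(f"FPR = {false_positive_rate(fp, tn):.4f}")
```

A low false positive rate matters for an IDS because every false alarm on benign traffic costs analyst time, which is presumably why the study reports it alongside F1 rather than relying on accuracy alone.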