A Comparative Analysis of Metaheuristic Feature Selection Methods in Software Vulnerability Prediction

Bibliographic Details
Published in: E-informatica: Software Engineering Journal, Vol. 19, No. 1
Main Authors: Deepali Bassi, Hardeep Singh
Format: Journal Article
Language: English
Published: Wroclaw University of Science and Technology, 01.01.2025
ISSN: 1897-7979, 2084-4840
DOI: 10.37190/e-inf250103

Summary: Background: Early identification of software vulnerabilities is an essential step in achieving software security. In the era of artificial intelligence, software vulnerability prediction models (VPMs) are built using machine learning and deep learning approaches, and their effectiveness helps improve software quality. Handling imbalanced datasets and reducing dimensionality are important factors that affect the performance of VPMs. Aim: The current study applies novel metaheuristic approaches to feature subset selection. Method: This paper performs a comparative analysis of forty-eight combinations of eight machine learning techniques and six metaheuristic feature selection methods on four public datasets. Results: The experimental results reveal that VPM performance improves after applying the feature selection methods, for both metrics-based and text-mining-based datasets. Additionally, the study applies the Wilcoxon signed-rank test to the results of the metrics-based and text-features-based VPMs to evaluate which outperforms the other. Furthermore, it identifies the best-performing feature selection algorithm for each dataset based on AUC. Finally, this paper outperforms the benchmark studies in terms of F1-score. Conclusion: The results show that GWO (Grey Wolf Optimizer) performed satisfactorily on all the datasets.
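
As a rough illustration of the kind of pipeline the abstract describes (not the authors' code), the sketch below performs wrapper feature selection with a binary Grey Wolf Optimizer, scores candidate feature subsets by cross-validated AUC using a random forest classifier, and then compares per-fold AUCs with and without selection via the Wilcoxon signed-rank test. The synthetic dataset, classifier choice, and GWO parameters are illustrative assumptions only.

# Sketch: binary GWO wrapper feature selection scored by cross-validated AUC,
# followed by a Wilcoxon signed-rank comparison of per-fold AUCs.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from scipy.stats import wilcoxon

rng = np.random.default_rng(0)
# Synthetic stand-in for a vulnerability dataset (assumption, not the paper's data).
X, y = make_classification(n_samples=300, n_features=30, n_informative=8, random_state=0)

def auc_of(mask):
    # Fitness = mean cross-validated AUC of the classifier on the selected features.
    if mask.sum() == 0:
        return 0.0
    clf = RandomForestClassifier(n_estimators=30, random_state=0)
    return cross_val_score(clf, X[:, mask], y, cv=5, scoring="roc_auc").mean()

def binary_gwo(n_features, n_wolves=6, n_iter=10):
    # Wolf positions stay continuous in [0, 1]; a 0.5 threshold yields the feature mask.
    pos = rng.random((n_wolves, n_features))
    fit = np.array([auc_of(p > 0.5) for p in pos])
    for t in range(n_iter):
        a = 2 - 2 * t / n_iter                               # exploration coefficient decays to 0
        alpha, beta, delta = pos[np.argsort(fit)[::-1][:3]]  # three best wolves lead the pack
        for i in range(n_wolves):
            new = np.zeros(n_features)
            for leader in (alpha, beta, delta):
                r1, r2 = rng.random(n_features), rng.random(n_features)
                A, C = 2 * a * r1 - a, 2 * r2
                new += leader - A * np.abs(C * leader - pos[i])
            pos[i] = np.clip(new / 3, 0, 1)                  # average of the three leader-guided moves
            fit[i] = auc_of(pos[i] > 0.5)
    best = pos[np.argmax(fit)] > 0.5
    return best, fit.max()

mask, best_auc = binary_gwo(X.shape[1])
print(f"selected {mask.sum()} / {X.shape[1]} features, AUC = {best_auc:.3f}")

# Paired comparison of per-fold AUCs with vs. without feature selection.
clf = RandomForestClassifier(n_estimators=30, random_state=0)
auc_all = cross_val_score(clf, X, y, cv=10, scoring="roc_auc")
auc_sel = cross_val_score(clf, X[:, mask], y, cv=10, scoring="roc_auc")
print("Wilcoxon signed-rank p-value:", wilcoxon(auc_sel, auc_all).pvalue)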