Software Defect Prediction Using Non-Dominated Sorting Genetic Algorithm and $k$-Nearest Neighbour Classifier

Bibliographic Details
Published in: E-informatica : software engineering journal Vol. 18; no. 1
Main Authors: Mohammad Azzeh, Ali Bou Nassif, Manar Abu Talib, Hajra Iqbal
Format: Journal Article
Language: English
Published: Wroclaw University of Science and Technology 01.01.2024
ISSN: 1897-7979, 2084-4840
DOI: 10.37190/e-inf240103

Summary: Background: Software Defect Prediction (SDP) is a vital step in software development. SDP aims to identify the modules most likely to be defect-prone before the testing phase begins, which helps allocate resources and reduces the cost of testing. Aim: Although many machine learning algorithms have been used to classify software modules based on static code metrics, the k-Nearest Neighbors (kNN) method does not greatly improve defect prediction because it requires careful set-up of multiple configuration parameters before it can be used. To address this issue, we used the Non-dominated Sorting Genetic Algorithm (NSGA-II) to optimize the parameters of the kNN classifier in order to improve SDP accuracy. We used NSGA-II because the existing accuracy metrics often behave differently and can lead to opposite judgments when evaluating SDP models: changing one parameter might improve one accuracy measure while worsening the others. Method: The proposed NSGAII-kNN model was evaluated against the classical kNN model and state-of-the-art machine learning algorithms such as Support Vector Machine (SVM), Naïve Bayes (NB), and Random Forest (RF) classifiers. Results: Results indicate that the GA-optimized kNN model yields a higher Matthews Correlation Coefficient (MCC) and higher balanced accuracy based on ten datasets. Conclusion: The paper concludes that integrating GA with kNN improved defect prediction when applied to both small and large datasets.
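
The summary's point that accuracy measures can disagree, and that this motivates a non-dominated (Pareto) comparison, can be illustrated with a minimal sketch. This is not the authors' NSGAII-kNN implementation: it covers only the dominance check that non-dominated sorting relies on, with no genetic operators and no kNN training. The three configurations `config_a`, `config_b`, `config_c` and their confusion-matrix counts are invented for illustration (a hypothetical set of 100 modules, 20 of them defective) and do not come from the paper.

```python
import math


def mcc(tp, fp, tn, fn):
    """Matthews Correlation Coefficient from confusion-matrix counts."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def balanced_accuracy(tp, fp, tn, fn):
    """Mean of the recall on defective modules and the recall on clean modules."""
    tpr = tp / (tp + fn) if (tp + fn) else 0.0
    tnr = tn / (tn + fp) if (tn + fp) else 0.0
    return (tpr + tnr) / 2


def dominates(a, b):
    """True if score tuple a is no worse than b everywhere and better somewhere."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def non_dominated(scores):
    """Names of the candidates that no other candidate dominates."""
    return [
        name
        for name, score in scores.items()
        if not any(
            dominates(other, score)
            for other_name, other in scores.items()
            if other_name != name
        )
    ]


# Hypothetical confusion counts (tp, fp, tn, fn); illustrative only.
confusions = {
    "config_a": (10, 5, 75, 10),
    "config_b": (16, 24, 56, 4),
    "config_c": (8, 10, 70, 12),
}

# Both objectives are maximised: (MCC, balanced accuracy).
scores = {
    name: (mcc(*counts), balanced_accuracy(*counts))
    for name, counts in confusions.items()
}

front = non_dominated(scores)
for name in front:
    print(name, scores[name])
```

With these made-up counts, `config_a` has the higher MCC while `config_b` has the higher balanced accuracy, so neither dominates the other and both stay on the front; `config_c` is worse on both measures and is dropped. A full NSGA-II run would repeat this ranking over evolving populations of parameter settings.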