Heterogeneous Fault Prediction Using Feature Selection and Supervised Learning Algorithms

Bibliographic Details
Published in: Vietnam Journal of Computer Science, Vol. 9, no. 3, pp. 261-284
Main Authors: Arora, Rashmi; Kaur, Arvinder
Format: Journal Article
Language: English
Published: World Scientific Publishing Company, 01.08.2022
ISSN: 2196-8888, 2196-8896
DOI: 10.1142/S2196888822500142

Summary: Software Fault Prediction (SFP) is a prominent research area in software engineering. Fault prediction carried out within a single software project is known as within-project fault prediction. However, local data repositories often do not hold enough data to build a within-project prediction model. Cross-project fault prediction (CPFP), proposed in recent years, addresses this by building a prediction model on one project and using it to predict faults in another. CPFP, however, requires the training and testing datasets to use the same set of metrics, so traditional CPFP approaches are difficult to apply across projects with different metric sets. Heterogeneous Fault Prediction (HFP) is a special case of CPFP that allows faults to be predicted across projects with different metrics. The proposed framework builds an HFP model by applying feature selection to both the source and target datasets and then training a prediction model with supervised machine learning techniques. The approach is applied to two open-source projects, Linux and MySQL, and prediction is evaluated using the Area Under Curve (AUC) performance measure. The key results are as follows. First, the approach gives significantly better prediction performance for heterogeneous projects than for cross projects. Second, feature selection combined with feature mapping has a significant effect on HFP models. Third, non-parametric statistical analyses (the Friedman test and the Nemenyi post-hoc test) show that Logistic Regression performed significantly better than the other supervised learning algorithms in HFP models.
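
The workflow described in the summary (feature selection on source and target, feature mapping, a supervised learner, AUC evaluation) can be illustrated with a minimal Python sketch. It is not the authors' implementation. Everything specific in it is an assumption made for illustration: the synthetic data standing in for the Linux and MySQL datasets, the ANOVA F-score selector on the source, the variance-based selector on the target, the Kolmogorov-Smirnov p-value used to pair metrics, and the helper names `select_source`, `match_metrics`, `make_project` and `run_hfp`.

```python
import numpy as np
from scipy.stats import ks_2samp
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score


def zscore(X):
    """Standardize each metric so differently scaled metrics are comparable."""
    return (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-12)


def select_source(X, y, k):
    """Assumed selector: keep the k source metrics with the highest F-score."""
    return SelectKBest(f_classif, k=k).fit(X, y).get_support(indices=True)


def match_metrics(Zs, Zt):
    """Assumed mapping: greedily pair each source metric with the unused
    target metric whose distribution is most similar (highest KS p-value)."""
    pairs, used = [], set()
    for i in range(Zs.shape[1]):
        scores = [-1.0 if j in used else ks_2samp(Zs[:, i], Zt[:, j]).pvalue
                  for j in range(Zt.shape[1])]
        j = int(np.argmax(scores))
        used.add(j)
        pairs.append((i, j))
    return pairs


def make_project(rng, n, n_metrics):
    """Synthetic stand-in for a project: the first three metrics carry signal."""
    y = rng.integers(0, 2, size=n)
    X = rng.normal(size=(n, n_metrics))
    X[:, :3] += 1.5 * y[:, None]
    return X, y


def run_hfp(Xs, ys, Xt, yt, k=3):
    src_idx = select_source(Xs, ys, k)
    # Target labels are held out for evaluation, so the target side uses an
    # unsupervised criterion (highest variance) in this sketch.
    tgt_idx = np.argsort(Xt.var(axis=0))[-k:]
    Zs, Zt = zscore(Xs[:, src_idx]), zscore(Xt[:, tgt_idx])
    pairs = match_metrics(Zs, Zt)
    src_cols = [i for i, _ in pairs]
    tgt_cols = [j for _, j in pairs]
    # Train on the source project, score the target through the mapped metrics.
    model = LogisticRegression(max_iter=1000).fit(Zs[:, src_cols], ys)
    scores = model.predict_proba(Zt[:, tgt_cols])[:, 1]
    return roc_auc_score(yt, scores), pairs


rng = np.random.default_rng(0)
Xs, ys = make_project(rng, 300, 6)   # source project with 6 metrics
Xt, yt = make_project(rng, 200, 8)   # target project with 8 different metrics
auc, pairs = run_hfp(Xs, ys, Xt, yt)
print(f"AUC on target: {auc:.3f}; metric pairs (source, target): {pairs}")
```

The source and target here deliberately have different numbers of metrics, which is the situation that rules out plain CPFP. Swapping `LogisticRegression` for other scikit-learn classifiers would give the per-algorithm AUC values that a Friedman test could then compare.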