Assessing the impact of tuning parameter in instance selection based bug resolution classification
| Published in | Information and software technology Vol. 188; p. 107874 |
|---|---|
| Main Authors | , , , |
| Format | Journal Article |
| Language | English |
| Published | Elsevier B.V, 01.12.2025 |
| ISSN | 0950-5849 | 
| DOI | 10.1016/j.infsof.2025.107874 | 
| Summary: | Software maintenance is time-consuming and requires significant effort for bug resolution and various types of software enhancement. Estimating software maintenance effort is challenging for open source software (OSS), which lacks historical data about direct effort expressed in terms of man-days, compared to proprietary software for which such effort data is available. Therefore, maintenance effort in the OSS context can only be estimated indirectly through other features, such as OSS bug reports, and other approaches, such as bug resolution prediction models using a number of machine learning (ML) techniques. Although these bug reports are at times large, they need to be preprocessed before they can be used. In this context, instance selection (IS) has been presented in the literature as a way of reducing the size of datasets by selecting a subset of instances. Additionally, ML techniques often require fine-tuning of numerous parameters to achieve optimal predictions. This is typically done using tuning parameter (TP) methods.
The empirical study reported here investigated the impact of TP methods together with instance selection algorithms (ISAs) on the performance of bug resolution prediction ML classifiers on five datasets: Eclipse JDT, Eclipse Platform, KDE, LibreOffice, and Apache.
To this end, a set of 480 ML classifiers was built using 60 datasets: the five original ones, 15 datasets reduced using the Edited Nearest Neighbor (ENN), Repeated Edited Nearest Neighbor (RENN), and all-k Nearest Neighbor (AllkNN) single ISAs, and 40 datasets reduced using the Bagging, Random Feature Subsets, and Voting ensemble ISAs, together with four ML techniques (k Nearest Neighbor (kNN), Support Vector Machine (SVM), Voted Perceptron (VP), and Random Tree (RT)) under Grid Search (GS) and Default Parameter (DP) configurations. The classifiers were evaluated using the Accuracy, Precision, and Recall performance criteria with ten-fold cross-validation. These classifiers were then compared to determine how parameter tuning and IS can enhance bug resolution prediction performance.
The findings revealed that (1) using GS with single ISAs enhanced the performance of the built ML classifiers, (2) using GS with homogeneous and heterogeneous ensemble ISAs enhanced the performance of the built ML classifiers, and (3) associating GS and SVM with RENN (either used as a single ISA or implemented as a base algorithm for ensemble ISAs) gave the best performance.
|
|---|---|
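The instance selection step summarized above can be illustrated with a minimal sketch of ENN and RENN. This is not the paper's implementation (the record does not name the tooling used); it is an illustrative Python/scikit-learn version under the standard definitions: ENN discards any instance whose label disagrees with the majority label of its k nearest neighbors, and RENN repeats ENN until no further instance is removed. The synthetic dataset and k=3 are assumptions for demonstration only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neighbors import KNeighborsClassifier

def enn_select(X, y, k=3):
    """Edited Nearest Neighbor: keep only instances whose label agrees
    with the majority label of their k nearest neighbors."""
    knn = KNeighborsClassifier(n_neighbors=k)
    knn.fit(X, y)
    # On the training set each point is its own nearest neighbor,
    # so request k+1 neighbors and drop the first column.
    _, idx = knn.kneighbors(X, n_neighbors=k + 1)
    neigh_labels = y[idx[:, 1:]]
    # Majority vote among the k neighbors of each instance.
    votes = np.array([np.bincount(row, minlength=y.max() + 1).argmax()
                      for row in neigh_labels])
    keep = votes == y
    return X[keep], y[keep]

def renn_select(X, y, k=3):
    """Repeated ENN: apply ENN until no instance is removed."""
    while True:
        X_new, y_new = enn_select(X, y, k)
        if len(y_new) == len(y):
            return X_new, y_new
        X, y = X_new, y_new

# Illustrative synthetic data standing in for a bug-report dataset.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_red, y_red = renn_select(X, y, k=3)
print(len(X), "->", len(X_red))
```

AllkNN, the third single ISA named in the record, follows the same pattern but reapplies the edit for every neighborhood size from 1 up to k.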
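The tuning-and-evaluation protocol (Grid Search versus default parameters, scored on Accuracy, Precision, and Recall with ten-fold cross-validation) can likewise be sketched. The parameter grid, the SVM settings, and the synthetic data below are assumptions for illustration; the study's actual search spaces are not given in this record.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, cross_validate
from sklearn.svm import SVC

# Illustrative stand-in for one of the (possibly IS-reduced) datasets.
X, y = make_classification(n_samples=300, n_features=10, random_state=0)

# Hypothetical grid; a GS configuration searches it with 10-fold CV,
# while a DP configuration would simply use SVC() as-is.
param_grid = {"C": [0.1, 1, 10], "kernel": ["linear", "rbf"]}
gs = GridSearchCV(SVC(), param_grid, cv=10, scoring="accuracy")
gs.fit(X, y)

# Score the tuned model on the three criteria used in the study,
# again under ten-fold cross-validation.
scores = cross_validate(SVC(**gs.best_params_), X, y, cv=10,
                        scoring=["accuracy", "precision", "recall"])
print(gs.best_params_)
print(scores["test_accuracy"].mean())
```

Comparing these cross-validated scores against those of the default-parameter model on the original versus IS-reduced datasets mirrors the comparison the study reports.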