Comparative Study of Various Hyperparameter Tuning on Random Forest Classification With SMOTE and Feature Selection Using Genetic Algorithm in Software Defect Prediction
| Published in | Journal of electronics, electromedical engineering, and medical informatics Vol. 6; no. 2; pp. 137 - 147 |
|---|---|
| Main Authors | Suryadi, Mulia Kevin; Herteno, Rudy; Saputro, Setyo Wahyu; Faisal, Mohammad Reza; Nugroho, Radityo Adi |
| Format | Journal Article |
| Language | English |
| Published | 2024-04-01 |
| ISSN | 2656-8632 |
| DOI | 10.35882/jeeemi.v6i2.375 |
| Abstract | Software defect prediction is necessary for desktop and mobile applications. Random Forest defect prediction performance can be significantly increased by optimizing its parameters rather than using the defaults. However, the parameter tuning step is commonly neglected. Random Forest has numerous tunable parameters, so adjusting them manually is time-consuming, reduces the efficiency of Random Forest, and yields suboptimal results. This research aims to improve the performance of Random Forest classification by using SMOTE to balance the data, a Genetic Algorithm for feature selection, and hyperparameter tuning to optimize performance. It also aims to determine which hyperparameter tuning method produces the greatest improvement in Random Forest classification. The study uses the NASA MDP collection, which comprises 13 datasets. The method combines SMOTE to handle imbalanced data, Genetic Algorithm feature selection, Random Forest classification, and hyperparameter tuning methods including Grid Search, Random Search, Optuna, Bayesian (with Hyperopt), Hyperband, TPE, and Nevergrad. Performance was evaluated using accuracy and AUC values. In terms of accuracy improvement, the three best methods are Nevergrad, TPE, and Hyperband. In terms of AUC improvement, the three best methods are Hyperband, Optuna, and Random Search. Nevergrad on average improves accuracy by about 3.9% and Hyperband on average improves AUC by about 3.51%. This study indicates that hyperparameter tuning improves Random Forest performance and that, among all the hyperparameter tuning methods used, Hyperband has the best hyperparameter tuning performance with the highest average increase in both accuracy and AUC. The implication of this research is to increase the use of hyperparameter tuning in software defect prediction and improve software defect prediction performance. |
| Author | Faisal, Mohammad Reza Herteno, Rudy Nugroho, Radityo Adi Suryadi, Mulia Kevin Saputro, Setyo Wahyu  | 
    
| Author_xml | 1. Suryadi, Mulia Kevin (ORCID 0009-0006-2954-6236); 2. Herteno, Rudy (ORCID 0000-0003-0637-8090); 3. Saputro, Setyo Wahyu (ORCID 0009-0007-9250-7704); 4. Faisal, Mohammad Reza (ORCID 0000-0001-5748-7639); 5. Nugroho, Radityo Adi (ORCID 0000-0002-7326-7668) |
    
| CitedBy_id | crossref_primary_10_1016_j_rineng_2025_104123 | 
    
| ContentType | Journal Article | 
    
| Discipline | Engineering | 
    
| EISSN | 2656-8632 | 
    
| EndPage | 147 | 
    
| IsDoiOpenAccess | true | 
    
| IsOpenAccess | true | 
    
| IsPeerReviewed | true | 
    
| IsScholarly | true | 
    
| Issue | 2 | 
    
| Language | English | 
    
| License | https://creativecommons.org/licenses/by-sa/4.0 cc-by-sa  | 
    
| ORCID | 0000-0003-0637-8090 0000-0002-7326-7668 0009-0006-2954-6236 0009-0007-9250-7704 0000-0001-5748-7639  | 
    
| OpenAccessLink | https://proxy.k.utb.cz/login?url=https://doi.org/10.35882/jeeemi.v6i2.375 | 
    
| PageCount | 11 | 
    
| PublicationDate | 2024-04-01 | 
    
| PublicationTitle | Journal of electronics, electromedical engineering, and medical informatics | 
    
| PublicationYear | 2024 | 
    
| StartPage | 137 | 
    
| Title | Comparative Study of Various Hyperparameter Tuning on Random Forest Classification With SMOTE and Feature Selection Using Genetic Algorithm in Software Defect Prediction | 
    
| URI | https://doi.org/10.35882/jeeemi.v6i2.375 | 
    
| Volume | 6 | 
    
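
The abstract names SMOTE as the step that balances the defect data before classification. As a rough illustration only, and not the authors' implementation, the core idea of SMOTE (creating synthetic minority samples by interpolating between a minority sample and one of its nearest minority neighbours) can be sketched in plain Python. The function name `smote`, the parameters `n_new`, `k`, and `seed`, and the toy data are all assumptions made for this sketch; a real study would more likely rely on a library implementation such as the `SMOTE` class in imbalanced-learn.

```python
import math
import random


def smote(minority, n_new, k=3, seed=0):
    """Return n_new synthetic minority samples (hypothetical helper).

    Each synthetic sample lies on the line segment between a randomly
    chosen minority sample and one of its k nearest minority neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Pick a random point on the segment from base to neighbour.
        gap = rng.random()
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Toy data (made up): 3 defective modules versus 6 clean ones,
# each described by two software metrics.
defective = [(1.0, 2.0), (1.5, 1.8), (2.0, 2.6)]
clean_count = 6

new_rows = smote(defective, n_new=clean_count - len(defective))
balanced_defective = defective + new_rows
print(len(balanced_defective), "defective rows after oversampling")
```

Because every synthetic row is a convex combination of two existing minority rows, the oversampled class stays inside the region already occupied by real defective modules. In a pipeline like the one the abstract describes, oversampling of this kind would normally be applied to the training split only, so that evaluation data stays untouched.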