Comparative Study of Various Hyperparameter Tuning on Random Forest Classification With SMOTE and Feature Selection Using Genetic Algorithm in Software Defect Prediction

Bibliographic Details
Published in Journal of Electronics, Electromedical Engineering, and Medical Informatics, Vol. 6, no. 2, pp. 137-147
Main Authors Suryadi, Mulia Kevin; Herteno, Rudy; Saputro, Setyo Wahyu; Faisal, Mohammad Reza; Nugroho, Radityo Adi
Format Journal Article
Language English
Published 01.04.2024
ISSN 2656-8632
DOI 10.35882/jeeemi.v6i2.375

Abstract Software defect prediction is necessary for desktop and mobile applications. Optimizing the parameters of a Random Forest can significantly improve its defect prediction performance compared with the default parameters, yet the parameter tuning step is commonly neglected. Random Forest has numerous tunable parameters, so adjusting them manually is time-consuming, inefficient, and likely to yield suboptimal results. This research aims to improve the performance of Random Forest classification by using SMOTE to balance the data, a Genetic Algorithm for feature selection, and hyperparameter tuning to optimize the classifier. It also aims to determine which hyperparameter tuning method produces the greatest improvement for Random Forest classification. The study uses 13 datasets from NASA MDP. The method combines SMOTE to handle imbalanced data, Genetic Algorithm feature selection, Random Forest classification, and the hyperparameter tuning methods Grid Search, Random Search, Optuna, Bayesian optimization (with Hyperopt), Hyperband, TPE, and Nevergrad. Performance is evaluated using accuracy and AUC. In terms of accuracy improvement, the three best methods are Nevergrad, TPE, and Hyperband. In terms of AUC improvement, the three best methods are Hyperband, Optuna, and Random Search. On average, Nevergrad improves accuracy by about 3.9% and Hyperband improves AUC by about 3.51%. The study indicates that hyperparameter tuning improves Random Forest performance and concludes that, among all the tuning methods used, Hyperband performs best, with the highest average increase in both accuracy and AUC.
The implication of this research is to encourage wider use of hyperparameter tuning in software defect prediction and thereby improve prediction performance.
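
As an illustration of the SMOTE step named in the abstract, the following is a minimal pure-Python sketch of the interpolation idea behind SMOTE: each synthetic minority sample is placed on the line segment between a minority sample and one of its nearest minority neighbours. It is not the authors' implementation. The function name `smote`, the parameters `n_new`, `k`, and `seed`, and the toy data are assumptions made for this example, and the sketch assumes at least two minority samples.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Return n_new synthetic points interpolated between minority samples.

    minority: list of equal-length feature lists (the rare class,
    e.g. defective modules). Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding itself)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Toy data (illustrative only): 8 clean modules, 3 defective ones.
clean = [[float(i), float(i % 3)] for i in range(8)]
defective = [[10.0, 10.0], [11.0, 10.5], [10.5, 12.0]]

# Generate enough synthetic defective samples to match the majority class.
new_points = smote(defective, n_new=len(clean) - len(defective), k=2)
balanced_defective = defective + new_points
print(len(clean), len(balanced_defective))
```

In a pipeline like the one the abstract describes, oversampling of this kind would normally be applied to the training split only, so that synthetic samples do not leak into the data used to measure accuracy and AUC.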
ORCID iDs
Suryadi, Mulia Kevin: 0009-0006-2954-6236
Herteno, Rudy: 0000-0003-0637-8090
Saputro, Setyo Wahyu: 0009-0007-9250-7704
Faisal, Mohammad Reza: 0000-0001-5748-7639
Nugroho, Radityo Adi: 0000-0002-7326-7668
Open Access Yes
Peer Reviewed Yes
License CC BY-SA 4.0 (https://creativecommons.org/licenses/by-sa/4.0)