Comparative investigation of bagging enhanced machine learning for early detection of HCV infections using class imbalance technique with feature selection

Bibliographic Details
Published in	PloS one Vol. 20; no. 6; p. e0326488
Main Authors	Tusher, Ekramul Haque; Ismail, Mohd Arfian; Akib, Abdullah; Gabralla, Lubna A.; Ibrahim, Ashraf Osman; Som, Hafizan Mat; Remli, Muhammad Akmal
Format	Journal Article
Language	English
Published	United States: Public Library of Science (PLoS), 26.06.2025
Subjects	Accuracy; Adult; Algorithms; Area Under Curve; Bagging; Bayes Theorem; Biology and life sciences; Care and treatment; Classification; Computer and Information Sciences; Control; Data mining; Data science; Datasets; Decision making; Decision Trees; Diagnosis; Disease; Early Diagnosis; Engineering and Technology; Ensemble learning; Feature selection; Female; Hepacivirus; Hepatitis C; Hepatitis C - diagnosis; Hepatitis C virus; Humans; Identification and classification; Infections; Learning algorithms; Liver cancer; Liver cirrhosis; Machine Learning; Male; Mean; Medical diagnosis; Medical research; Medicine and health sciences; Middle Aged; Oversampling; Physical Sciences; Recall; Research and Analysis Methods; Risk factors; Standard deviation; Support Vector Machine; Support vector machines; Variance analysis; Malaysia
ISSN	1932-6203
DOI	10.1371/journal.pone.0326488

Summary:	Around 1.5 million new cases of Hepatitis C Virus (HCV) are diagnosed globally each year (World Health Organization, 2023). Consequently, there is a pressing need for early diagnostic methods for HCV. This study investigates the prognostic accuracy of several ensemble machine learning (ML) models for diagnosing HCV infection. The study utilizes a dataset comprising demographic information of 615 individuals suspected of having HCV infection. Additionally, the research employs oversampling and undersampling techniques to address class imbalances in the dataset and conducts feature reduction using the F-test in one-way analysis of variance. Ensemble ML methods, including Support Vector Machine (SVM), k-Nearest Neighbors (k-NN), Logistic Regression (LR), Random Forest (RF), Naïve Bayes (NB), and Decision Tree (DT), are used to predict HCV infection. The performance of these ensemble methods is evaluated using metrics such as accuracy, recall, precision, F1 score, G-mean, balanced accuracy, cross-validation (CV), area under the curve (AUC), standard deviation, and error rate. Compared with previous studies, the Bagging k-NN model demonstrated superior performance under oversampling conditions, achieving 98.37% accuracy, 98.23% CV score, 97.67% precision, 97.93% recall, 98.18% selectivity, 97.79% F1 score, 98.06% balanced accuracy, 98.05% G-mean, a 1.63% error rate, 0.98 AUC, and a standard deviation of 0.192. This study highlights the potential of ensemble ML approaches in improving the diagnosis of HCV. The findings provide a foundation for developing accurate predictive methods for HCV diagnosis.
Bibliography:	Competing Interests: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
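
Note on the evaluation metrics: the summary lists accuracy, precision, recall, selectivity, F1 score, balanced accuracy, G-mean, and error rate. The record does not say how the authors computed them, so the following is only a minimal sketch assuming the standard binary definitions, with "selectivity" read as specificity (true-negative rate). The function names and the confusion-matrix counts are hypothetical and chosen purely for illustration; the paper may average over several classes instead.

```python
import math


def confusion_metrics(tp, fp, tn, fn):
    """Standard binary metrics from confusion-matrix counts.

    Assumes every denominator is non-zero (each class is both present
    and predicted at least once).
    """
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)          # sensitivity, true-positive rate
    specificity = tn / (tn + fp)     # "selectivity" in the summary (assumed)
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": accuracy,
        "error_rate": 1.0 - accuracy,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
        "balanced_accuracy": (recall + specificity) / 2,
        "g_mean": math.sqrt(recall * specificity),
    }


def balanced_scores(recall, specificity):
    """Balanced accuracy and G-mean from the two class-wise rates."""
    return (recall + specificity) / 2, math.sqrt(recall * specificity)


# Hypothetical counts, not taken from the paper.
metrics = confusion_metrics(tp=90, fp=5, tn=95, fn=10)

# Recall and selectivity as reported in the summary (97.93% and 98.18%).
reported_balanced, reported_g_mean = balanced_scores(0.9793, 0.9818)
```

Under these assumed definitions, the reported recall and selectivity give a balanced accuracy and a G-mean of roughly 98.05 to 98.06%, which is within rounding of the figures quoted in the summary. The reported 1.63% error rate is likewise the complement of the 98.37% accuracy.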