Optimizing ensemble machine learning models for accurate liver disease prediction in healthcare


Bibliographic Details
Published in PloS one Vol. 20; no. 8; p. e0330899
Main Authors El Atifi, W., El Rhazouani, O., Khan, Fida Muhammad, Sekkat, H.
Format Journal Article
Language English
Published United States: Public Library of Science (PLoS), 28.08.2025
ISSN 1932-6203
DOI 10.1371/journal.pone.0330899

Summary: Liver disease encompasses a range of conditions affecting the liver, including hepatitis, cirrhosis, fatty liver, and liver cancer. It can be caused by infections, alcohol abuse, obesity, or genetic factors, and it often progresses silently until advanced stages. Early detection and lifestyle adjustments are essential for effective management and for preventing severe liver damage. This study explores the application of machine learning (ML) techniques to predict liver disease, using a dataset to compare the performance of several ensemble classifiers: the Random Forest Classifier, the AdaBoost Classifier, and the Gradient Boosting Classifier. After feature extraction and selection, followed by hyperparameter tuning with RandomizedSearchCV and GridSearchCV, we aimed to determine the best model for liver disease prediction in terms of accuracy, precision, recall, and F1-score. The Random Forest Classifier, optimized with GridSearchCV, achieved the highest accuracy, at 85.17%. Its per-class metrics indicate balanced performance: precision of 0.85 for both the presence and the absence of the disease, recall of 0.81 for its presence and 0.87 for its absence, and F1-scores of 0.83 and 0.85, respectively. These results suggest that the classifier could be considered for use as a diagnostic tool for liver disease. The AdaBoost Classifier and the Gradient Boosting Classifier also performed relatively well, though neither outperformed the Random Forest Classifier. The research shows the potential of ensemble ML techniques for diagnosing medical conditions, including liver diseases, for which early diagnosis is critical.
The results add evidence for the applicability of ML models in clinical practice, with the potential to improve diagnosis and, consequently, patient outcomes. Future studies will build on these models by testing them on larger and more diverse datasets, incorporating deep learning, and applying the approach to other disease domains. This work offers a starting point for ML-driven innovation in healthcare aimed at advancing disease diagnosis and treatment.
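
The workflow the summary describes (tune several ensemble classifiers, then compare them on accuracy, precision, recall, and F1-score) can be sketched roughly as follows with scikit-learn. This is an illustrative sketch only, not the authors' code: the record does not name the dataset or the tuned parameter ranges, so the synthetic data from `make_classification`, the small parameter grids, the 3-fold cross-validation, and the `candidates` and `results` names are all assumptions made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    GradientBoostingClassifier,
    RandomForestClassifier,
)
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic stand-in data (assumption): the record does not name the dataset.
X, y = make_classification(
    n_samples=400, n_features=10, n_informative=5, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Hypothetical search grids: the tuned ranges are not given in the summary.
candidates = {
    "RandomForest": (
        RandomForestClassifier(random_state=0),
        {"n_estimators": [50, 100], "max_depth": [None, 5]},
    ),
    "AdaBoost": (
        AdaBoostClassifier(random_state=0),
        {"n_estimators": [50, 100], "learning_rate": [0.5, 1.0]},
    ),
    "GradientBoosting": (
        GradientBoostingClassifier(random_state=0),
        {"n_estimators": [50, 100], "learning_rate": [0.05, 0.1]},
    ),
}

results = {}
for name, (estimator, grid) in candidates.items():
    # Exhaustive grid search with 3-fold cross-validation on the training split.
    search = GridSearchCV(estimator, grid, scoring="accuracy", cv=3)
    search.fit(X_train, y_train)
    # Evaluate the refitted best model on the held-out test split.
    y_pred = search.predict(X_test)
    results[name] = {
        "best_params": search.best_params_,
        "accuracy": accuracy_score(y_test, y_pred),
        # Per-class precision, recall, and F1-score as a nested dict.
        "report": classification_report(y_test, y_pred, output_dict=True),
    }

for name, res in results.items():
    print(name, res["best_params"], f"accuracy={res['accuracy']:.3f}")
```

Swapping `GridSearchCV` for `RandomizedSearchCV` (which takes `param_distributions` and `n_iter` instead of an exhaustive grid) gives the other tuning strategy the summary mentions. Scores on synthetic data say nothing about the figures reported in the article.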
Competing Interests: No authors have competing interests.