Explainability enhanced liver disease diagnosis technique using tree selection and stacking ensemble-based random forest model

Bibliographic Details
Published in	Informatics and Health Vol. 2; no. 1; pp. 17 - 40
Main Authors	Mamun, Mohammad; Chowdhury, Safiul Haque; Hossain, Muhammad Minoar; Khatun, M.R.; Iqbal, Sadiq
Format	Journal Article
Language	English
Published	Elsevier B.V.; KeAi Communications Co., Ltd, 01.03.2025
Subjects	Diagnosis; Explainable artificial intelligence (XAI); Feature optimization; Liver disease; Machine learning
ISSN	2949-9534
DOI	10.1016/j.infoh.2025.01.001

More Information
Summary:	Liver disease (LD) significantly impacts global health, requiring accurate diagnostic methods. This study aims to develop an automated system for LD prediction using machine learning (ML) and explainable artificial intelligence (XAI), improving both diagnostic precision and interpretability. The research systematically analyzes two distinct datasets of liver health indicators. Preprocessing, including the feature optimization methods Forward Feature Selection (FFS), Backward Feature Selection (BFS), and Recursive Feature Elimination (RFE), is applied to enhance data quality. ML models, namely Support Vector Machines (SVM), Naive Bayes (NB), Random Forest (RF), K-Nearest Neighbors (KNN), Decision Trees (DT), and a novel Tree Selection and Stacking Ensemble-based RF (TSRF), are then assessed on the datasets to diagnose LD. The final model is selected using cross-validation and performance metrics such as accuracy, precision, and specificity, and XAI methods are used to explain its decisions. The analysis identifies TSRF as the most effective model, achieving a peak accuracy of 99.92 % on Dataset-1 without feature optimization and 88.88 % on Dataset-2 with RFE optimization. XAI techniques, including SHAP and LIME plots, highlight the key features influencing model predictions and provide insight into the reasoning behind classification outcomes. The findings highlight TSRF's potential to improve LD diagnosis, with XAI enhancing transparency and trust in ML models. Despite high accuracy and interpretability, limitations such as dataset bias and lack of clinical validation remain. Future work will focus on integrating advanced XAI, diversifying datasets, and applying the approach in clinical settings for reliable diagnostics.
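The summary names cross-validation and metrics such as accuracy, precision, and specificity but does not say exactly how they were computed. A minimal pure-Python sketch of these evaluation pieces follows, assuming binary labels where 1 means LD present and plain, unshuffled k-fold splits (the study may have used stratified or shuffled folds). `binary_metrics` and `k_fold_indices` are hypothetical helper names, not the authors' code.

```python
from typing import Dict, Iterator, List, Sequence, Tuple


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Hypothetical helper: confusion-matrix metrics for 0/1 labels.

    Assumes label 1 = liver disease present (positive class).
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num: int, den: int) -> float:
        # Return 0.0 instead of dividing by zero when a class is absent.
        return num / den if den else 0.0

    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "precision": ratio(tp, tp + fp),    # share of predicted positives that are correct
        "recall": ratio(tp, tp + fn),       # sensitivity
        "specificity": ratio(tn, tn + fp),  # share of true negatives correctly rejected
    }


def k_fold_indices(n: int, k: int) -> Iterator[Tuple[List[int], List[int]]]:
    """Hypothetical helper: yield (train, test) index lists for plain k-fold CV.

    Folds are contiguous and unshuffled; the first n % k folds get one extra sample.
    """
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, test
        start += size
```

For example, `binary_metrics([1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0])` counts two true positives, two true negatives, one false positive, and one false negative, so accuracy is 4/6 and precision and specificity are both 2/3. A cross-validated score would average such metrics over the test folds produced by `k_fold_indices`.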
• Performance comparison of different ML models for the prediction of LD using multiple datasets.
• Analysis of the effect of different feature optimization techniques on ML-based LD diagnosis.
• Development of a novel hybrid ML model, TSRF, for the diagnosis of LD.
• Exploration of the reasoning behind the model's decisions through XAI.
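Forward Feature Selection, one of the feature optimization techniques compared, can be sketched as a greedy wrapper: start from an empty subset and repeatedly add the feature whose inclusion most improves a score, stopping when no addition helps. This is a generic illustration under stated assumptions, not the study's implementation. `forward_select` is a hypothetical name, and `score` stands in for whatever evaluation is used, typically the mean cross-validated accuracy of a classifier trained on the candidate subset. Backward selection mirrors the loop by starting from all features and removing one at a time, while RFE instead repeatedly drops the features a fitted model ranks least important.

```python
from typing import Callable, FrozenSet, List, Optional, Sequence, Tuple


def forward_select(
    features: Sequence[str],
    score: Callable[[FrozenSet[str]], float],
    max_features: Optional[int] = None,
) -> Tuple[List[str], float]:
    """Hypothetical greedy forward feature selection.

    `score` is an assumed callback that rates a feature subset (higher is
    better), e.g. mean cross-validated accuracy of a model using only it.
    """
    selected: List[str] = []
    remaining = list(features)
    best_score = float("-inf")
    limit = len(remaining) if max_features is None else max_features

    while remaining and len(selected) < limit:
        # Score every one-feature extension of the current subset and keep the best.
        candidate, cand_score = max(
            ((f, score(frozenset(selected + [f]))) for f in remaining),
            key=lambda pair: pair[1],
        )
        if cand_score <= best_score:
            break  # No remaining feature improves the score, so stop early.
        selected.append(candidate)
        remaining.remove(candidate)
        best_score = cand_score

    return selected, best_score
```

With a toy additive scorer that assigns weights 0.30, 0.20, 0.05, and -0.10 to hypothetical features `feat_a` to `feat_d`, the loop selects `feat_a`, `feat_b`, and `feat_c`, then stops because adding `feat_d` would lower the score. Passing the scorer as a callback keeps the search independent of any particular classifier or validation scheme.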