Explainability enhanced liver disease diagnosis technique using tree selection and stacking ensemble-based random forest model
| Published in | Informatics and Health Vol. 2; no. 1; pp. 17 - 40 |
|---|---|
| Main Authors | , , , , |
| Format | Journal Article |
| Language | English |
| Published | Elsevier B.V; 01.03.2025; KeAi Communications Co., Ltd |
| Subjects | |
| ISSN | 2949-9534 2949-9534 |
| DOI | 10.1016/j.infoh.2025.01.001 |
| Summary: | Liver disease (LD) significantly impacts global health, requiring accurate diagnostic methods. This study aims to develop an automated system for LD prediction using machine learning (ML) and explainable artificial intelligence (XAI), enhancing diagnostic precision and interpretability.
This research systematically analyzes two distinct datasets encompassing liver health indicators. A combination of preprocessing techniques, including feature optimization methods such as Forward Feature Selection (FFS), Backward Feature Selection (BFS), and Recursive Feature Elimination (RFE), is applied to enhance data quality. Next, ML models, namely Support Vector Machines (SVM), Naive Bayes (NB), Random Forest (RF), K-nearest neighbors (KNN), Decision Trees (DT), and a novel Tree Selection and Stacking Ensemble-based RF (TSRF), are evaluated on the datasets to diagnose LD. Finally, the best model is selected using cross-validation and performance metrics such as accuracy, precision, and specificity, and XAI methods are applied to convey the selected model's interpretability.
The analysis reveals TSRF as the most effective model, achieving a peak accuracy of 99.92 % on Dataset-1 without feature optimization and 88.88 % on Dataset-2 with RFE optimization. XAI techniques, including SHAP and LIME plots, highlight key features influencing model predictions, providing insights into the reasoning behind classification outcomes.
The findings highlight TSRF's potential in improving LD diagnosis, using XAI to enhance transparency and trust in ML models. Despite high accuracy and interpretability, limitations such as dataset bias and lack of clinical validation remain. Future work focuses on integrating advanced XAI, diversifying datasets, and applying the approach in clinical settings for reliable diagnostics.
• Performance comparison of different ML models for the prediction of LD using multiple datasets.
• Analysis of the effect of different feature optimization techniques for ML-based LD diagnosis.
• Development of a novel hybrid ML model, namely TSRF, for the diagnosis of LD.
• Exploration of the reasoning behind the model's decisions through XAI. |
|---|---|
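
The summary names accuracy, precision, and specificity as the metrics used to compare models. As a reading aid, the following minimal sketch shows how these three quantities are conventionally derived from binary predictions. It is not taken from the article: the function names, the label encoding (1 = liver disease, 0 = healthy), and the sample labels are all illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts of the four outcomes of a binary classifier."""
    tp: int  # diseased, predicted diseased
    fp: int  # healthy, predicted diseased
    tn: int  # healthy, predicted healthy
    fn: int  # diseased, predicted healthy


def confusion_counts(y_true, y_pred, positive=1):
    """Tally outcomes; `positive` is the label treated as 'disease' (assumed 1)."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1
    return ConfusionCounts(tp, fp, tn, fn)


def _ratio(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def evaluate(counts):
    """Compute the three metrics named in the summary."""
    total = counts.tp + counts.fp + counts.tn + counts.fn
    return {
        # Share of all cases classified correctly.
        "accuracy": _ratio(counts.tp + counts.tn, total),
        # Share of predicted-positive cases that are truly positive.
        "precision": _ratio(counts.tp, counts.tp + counts.fp),
        # Share of truly negative cases that are predicted negative.
        "specificity": _ratio(counts.tn, counts.tn + counts.fp),
    }


# Hypothetical labels, not data from the study.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]

counts = confusion_counts(y_true, y_pred)
scores = evaluate(counts)
print(counts)
print(scores)
```

In a cross-validated comparison such as the one the summary describes, these values would typically be computed per fold and then averaged for each candidate model; the exact protocol used in the article is not stated in this record.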