Improved liver disease prediction from clinical data through an evaluation of ensemble learning approaches

Bibliographic Details
Published in	BMC medical informatics and decision making Vol. 24; no. 1; pp. 160 - 24
Main Authors	Ganie, Shahid Mohammad, Dutta Pramanik, Pijush Kanti, Zhao, Zhongming
Format	Journal Article
Language	English
Published	London: BioMed Central (BioMed Central Ltd; Springer Nature B.V; BMC), 07.06.2024
Subjects	Accuracy; Algorithms; Bagging; Boosting; Business metrics; Classification; Commonality; Datasets; Diagnosis; Disease prediction; Ensemble learning; Fatalities; Feature selection; Gradient boosting; Health Informatics; Humans; India; Information Systems and Communication Service; Liver; Liver disease; Liver Diseases; Machine Learning; Management of Computing and Information Systems; Medical research; Medicine; Medicine & Public Health; Methods; Performance measurement; Predictions; Prognosis; Support vector machines; Voting
ISSN	1472-6947
DOI	10.1186/s12911-024-02550-y

Summary:	Purpose: Liver disease causes two million deaths annually, accounting for 4% of all deaths globally. Prediction or early detection of the disease by applying machine learning algorithms to large clinical datasets has become promising and potentially powerful, but such methods are often limited by the complexity of the data. In this regard, ensemble learning has shown promising results. There is an urgent need to evaluate different algorithms and then suggest a robust ensemble algorithm for liver disease prediction. Method: Three ensemble approaches comprising nine algorithms are evaluated on a large dataset of liver patients with 30,691 samples and 11 features. Various preprocessing procedures are used to feed the proposed model with better-quality data, together with appropriate hyperparameter tuning and feature selection. Results: Each algorithm's performance is extensively evaluated with several positive and negative performance metrics, along with runtime. Gradient boosting is found to have the best overall performance, with 98.80% accuracy and 98.50% each for precision, recall, and F1-score. Conclusions: The proposed model with gradient boosting outperformed several recent similar works on most metrics, suggesting its efficacy in predicting liver disease. It can be further applied to predict other diseases that share common predictive indicators.
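
The summary names voting among the ensemble approaches and reports accuracy, precision, recall, and F1-score alongside "negative" metrics. The following is a minimal sketch of hard (majority) voting and of how such metrics follow from confusion-matrix counts. It is not the authors' implementation: the function names, the three hypothetical base models, and the toy labels are all assumptions made for illustration, and the false positive and false negative rates are only one plausible reading of "negative" metrics.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Combine several models' label lists into one list by hard voting."""
    combined = []
    for votes in zip(*predictions_per_model):
        # Pick the label that most models predicted for this sample.
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined


def binary_metrics(y_true, y_pred, positive=1):
    """Compute common binary classification metrics from confusion counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    def ratio(numerator, denominator):
        # Return 0.0 instead of raising when a class is absent.
        return numerator / denominator if denominator else 0.0

    precision = ratio(tp, tp + fp)
    recall = ratio(tp, tp + fn)
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "precision": precision,
        "recall": recall,
        "f1": ratio(2 * precision * recall, precision + recall),
        "false_positive_rate": ratio(fp, fp + tn),
        "false_negative_rate": ratio(fn, fn + tp),
    }


# Toy labels (hypothetical): 1 = liver disease, 0 = no liver disease.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
# Predictions from three hypothetical base models, each wrong on two samples.
model_a = [1, 1, 0, 0, 0, 1, 1, 1]
model_b = [1, 0, 1, 0, 1, 1, 0, 1]
model_c = [1, 1, 1, 0, 0, 0, 0, 0]

y_vote = majority_vote([model_a, model_b, model_c])

for name, preds in [("model_a", model_a), ("model_b", model_b),
                    ("model_c", model_c), ("vote", y_vote)]:
    print(name, binary_metrics(y_true, preds))
```

In this constructed toy data each base model errs on a different pair of samples, so each scores 0.75 accuracy while the majority vote recovers every label. Real clinical data rarely behaves this cleanly; the example only shows why combining models with uncorrelated errors can help.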