Comparing different supervised machine learning algorithms for disease prediction

Bibliographic Details
Published in	BMC Medical Informatics and Decision Making, Vol. 19, no. 1, article 281 (16 pp.)
Main Authors	Uddin, Shahadat, Khan, Arif, Hossain, Md Ekramul, Moni, Mohammad Ali
Format	Journal Article
Language	English
Published	London: BioMed Central, 21.12.2019
Subjects	Algorithms; Analysis; Bayes Theorem; Bayesian analysis; Classification; Clinical Decision Rules; Data Mining; Datasets; Diabetes; Disease; Disease prediction; Health Informatics; Health risks; Humans; Identification methods; Information Systems and Communication Service; Learning algorithms; Machine Learning; Management of Computing and Information Systems; Medical data; Medical research; Medicine; Medicine & Public Health; modeling; Research Article; Risk Factors; Supervised machine learning algorithm; Support Vector Machine; Support vector machines; technology; Trends; Taiwan
ISSN	1472-6947
DOI	10.1186/s12911-019-1004-8

Summary:	Background: Supervised machine learning algorithms have been a dominant method in the data mining field. Disease prediction using health data has recently emerged as a potential application area for these methods. This study aims to identify the key trends among different types of supervised machine learning algorithms, and their performance and usage for disease risk prediction. Methods: Extensive research efforts were made to identify studies that applied more than one supervised machine learning algorithm to the prediction of a single disease. Two databases (Scopus and PubMed) were searched using different types of search terms. In total, 48 articles were selected for the comparison among variants of supervised machine learning algorithms for disease prediction. Results: The Support Vector Machine (SVM) algorithm was applied most frequently (in 29 studies), followed by the Naïve Bayes algorithm (in 23 studies). However, the Random Forest (RF) algorithm showed comparatively superior accuracy: of the 17 studies in which it was applied, RF showed the highest accuracy in 9 (53%). It was followed by SVM, which had the highest accuracy in 41% of the studies in which it was considered. Conclusion: This study provides a wide overview of the relative performance of different variants of supervised machine learning algorithms for disease prediction. This information on relative performance can aid researchers in selecting an appropriate supervised machine learning algorithm for their studies.
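Note: The percentages in the Summary are shares of the studies in which an algorithm was applied, not of all 48 articles. A minimal Python sketch of that arithmetic follows, using only the RF counts stated in the Summary; the function name best_rate_percent is hypothetical and is not taken from the article.

```python
def best_rate_percent(times_best: int, times_applied: int) -> int:
    """Whole-percent share of studies in which an algorithm had the
    highest accuracy, out of the studies in which it was applied.

    Hypothetical helper for illustration; not from the article.
    """
    if times_applied <= 0:
        raise ValueError("times_applied must be positive")
    if not 0 <= times_best <= times_applied:
        raise ValueError("times_best must lie between 0 and times_applied")
    return round(100 * times_best / times_applied)


# Counts stated in the Summary: RF was applied in 17 studies
# and showed the highest accuracy in 9 of them.
rf_rate = best_rate_percent(9, 17)
print(f"RF had the highest accuracy in {rf_rate}% of the studies where it was applied")
```

The denominator differs per algorithm (17 studies for RF, 29 for SVM), so these rates are not directly comparable to the raw usage counts.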