Modelling the map reduce based optimal gradient boosted tree classification algorithm for diabetes mellitus diagnosis system

Bibliographic Details
Published in	Journal of ambient intelligence and humanized computing Vol. 12; no. 2; pp. 1717 - 1730
Main Authors	Selvi, R. Thanga; Muthulakshmi, I.
Format	Journal Article
Language	English
Published	Springer Berlin Heidelberg, Berlin/Heidelberg, 01.02.2021; Springer Nature B.V.
Subjects	Algorithms Artificial Intelligence Big Data Chronic illnesses Classification Classifiers Cluster analysis Clustering Computational Intelligence Customer relationship management Data acquisition Data mining Datasets Diabetes Diabetes mellitus Engineering Heterogeneity Machine learning Medical diagnosis Original Research Patients Robotics and Automation User Interfaces and Human Computer Interaction Vector quantization China Big data Clustering Diabetes mellitus Classification Hadoop Gradient boosting
ISSN	1868-5137 1868-5145
DOI	10.1007/s12652-020-02242-1

Summary:	In recent years, the term big data has become popular; it refers to heterogeneous data of massive volume that is updated and multiplied every fraction of a second. This paper discusses the application of big data and its impact on the medical domain. Big data models and methods are being used seamlessly to manage the exponential growth of data in the healthcare domain. At present, it is difficult to foresee how machine learning and big data will affect the medical field. At the same time, there is a significant increase in the number of people suffering from diabetes mellitus (DM) in various healthcare centers. This study develops a new MapReduce-based optimal data classifier (MRODC) technique to diagnose DM efficiently. The presented MRODC model involves several stages: the Hadoop ecosystem, data acquisition, and classification based on a gradient boosting tree (GBT). To further improve the classification results of the GBT, an improved k-means clustering approach is integrated into it. Traditional k-means clustering generates the seed value randomly, which strongly affects the clustering outcome. In the improved k-means clustering, a new mechanism sets the seed value based on the minimal clustering error (CE). Detailed experiments were carried out on the benchmark PIMA Indians Diabetes dataset from several perspectives. The simulation results show that the presented MRODC model consistently outperforms the compared methods, with a maximum precision of 99.23, recall of 97.48, accuracy of 97.79, F-score of 98.34, and κ value of 95.02.
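The MapReduce stage can be illustrated, very loosely, with a pure-Python sketch that mimics the map, shuffle, and reduce phases in memory. It is not Hadoop code and not the authors' pipeline: as a hypothetical job, it computes the mean feature vector of each class label over data split into partitions. The function names (`map_partition`, `shuffle`, `reduce_values`, `class_means`) and the toy records are assumptions made for illustration.

```python
from collections import defaultdict
from functools import reduce


def map_partition(records):
    """Map phase: emit one (label, (feature_vector, 1)) pair per record."""
    return [(label, (list(features), 1)) for features, label in records]


def shuffle(pairs):
    """Shuffle phase: group mapper output by key (the framework does this in Hadoop)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_values(values):
    """Reduce phase: add up feature vectors and counts for one key."""
    def combine(a, b):
        sums_a, n_a = a
        sums_b, n_b = b
        return [x + y for x, y in zip(sums_a, sums_b)], n_a + n_b
    return reduce(combine, values)


def class_means(partitions):
    """Hypothetical job: mean feature vector per class label across all partitions."""
    pairs = [pair for part in partitions for pair in map_partition(part)]
    means = {}
    for label, values in shuffle(pairs).items():
        sums, n = reduce_values(values)
        means[label] = [s / n for s in sums]
    return means


# Toy records (made up): ([glucose, age], label), 1 = diabetic, 0 = non-diabetic,
# split into two partitions as if stored on two nodes.
partitions = [
    [([148, 50], 1), ([85, 31], 0)],
    [([183, 64], 1), ([89, 21], 0)],
]
print(class_means(partitions))  # {1: [165.5, 57.0], 0: [87.0, 26.0]}
```

Because each partition is mapped independently and the reducer only needs sums and counts, the same job could be spread across nodes without moving raw records to a single machine.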
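The abstract does not spell out how the seed is chosen from the minimal clustering error. One plausible reading, sketched here under that assumption, is to run ordinary Lloyd's k-means from several candidate random seeds and keep the run whose clustering error, taken here as the within-cluster sum of squared distances, is smallest. The helper names (`kmeans`, `min_ce_kmeans`) and the candidate seed list are hypothetical.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, seed, max_iter=100):
    """Standard Lloyd's k-means; the seed fixes which points become initial centroids."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        if new_labels == labels:  # converged: assignments stopped changing
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster becomes empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    # Clustering error (CE), assumed here to be the within-cluster sum of squares
    ce = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, ce


def min_ce_kmeans(points, k, candidate_seeds):
    """Assumed 'improved' variant: try each candidate seed, keep the lowest-CE run."""
    best = None
    for seed in candidate_seeds:
        centroids, labels, ce = kmeans(points, k, seed)
        if best is None or ce < best["ce"]:
            best = {"seed": seed, "centroids": centroids, "labels": labels, "ce": ce}
    return best


# Toy 2-D points (made up) forming two well-separated groups
points = [(1, 1), (1, 2), (2, 1), (10, 10), (10, 11), (11, 10)]
best = min_ce_kmeans(points, k=2, candidate_seeds=range(10))
print(best["seed"], best["labels"], round(best["ce"], 3))
```

Selecting the seed by CE trades extra k-means runs for less sensitivity to an unlucky random start, which is the weakness of traditional k-means that the abstract points out.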
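For the classification stage, a minimal gradient boosting sketch for binary labels is shown below. Each round fits a one-split regression tree (a stump) to the negative gradient of the log-loss, y − p, and adds it to the running log-odds scaled by a learning rate. This is a generic textbook variant with leaf values set to the mean residual (no Newton step), not the authors' GBT configuration; the stump depth, `n_rounds`, `learning_rate`, and all function names are assumptions.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_stump(X, r):
    """Fit a depth-1 regression tree to residuals r by minimising squared error."""
    best = None
    for f in range(len(X[0])):
        for t in sorted({row[f] for row in X}):
            left = [ri for row, ri in zip(X, r) if row[f] <= t]
            right = [ri for row, ri in zip(X, r) if row[f] > t]
            if not left or not right:
                continue
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - lv) ** 2 for v in left) + sum((v - rv) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, f, t, lv, rv)
    if best is None:  # every feature is constant: fall back to a single leaf
        mean = sum(r) / len(r)
        return 0, float("inf"), mean, mean
    return best[1:]


def stump_predict(stump, x):
    f, t, lv, rv = stump
    return lv if x[f] <= t else rv


def fit_gbt(X, y, n_rounds=50, learning_rate=0.3):
    """Gradient boosting for 0/1 labels with log-loss; y must contain both classes."""
    p = sum(y) / len(y)
    f0 = math.log(p / (1 - p))  # initial score: log-odds of the positive class
    scores = [f0] * len(X)
    stumps = []
    for _ in range(n_rounds):
        # Negative gradient of log-loss with respect to the score: y - sigmoid(score)
        residuals = [yi - sigmoid(s) for yi, s in zip(y, scores)]
        stump = fit_stump(X, residuals)
        stumps.append(stump)
        scores = [s + learning_rate * stump_predict(stump, x) for s, x in zip(scores, X)]
    return f0, stumps, learning_rate


def predict_proba(model, x):
    f0, stumps, lr = model
    return sigmoid(f0 + lr * sum(stump_predict(s, x) for s in stumps))


def predict(model, x):
    return int(predict_proba(model, x) >= 0.5)


# Toy one-feature data (made up): the label switches from 0 to 1 above x = 3
X = [[1], [2], [3], [4], [5], [6]]
y = [0, 0, 0, 1, 1, 1]
model = fit_gbt(X, y)
print([predict(model, x) for x in X])  # [0, 0, 0, 1, 1, 1]
```

Real GBT libraries grow deeper trees and use second-order leaf updates; stumps are used here only to keep the boosting loop readable.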
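All of the reported measures can be derived from a binary confusion matrix. The sketch below computes precision, recall, accuracy, F-score, and Cohen's κ from 0/1 labels. The abstract's figures appear to be on a 0 to 100 scale, so the fractions returned here would need to be multiplied by 100 for comparison. The function name `binary_metrics` and the toy labels are hypothetical.

```python
def binary_metrics(y_true, y_pred):
    """Precision, recall, accuracy, F-score and Cohen's kappa for 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    n = tp + tn + fp + fn
    # Guard against empty denominators by returning 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    accuracy = (tp + tn) / n
    f_score = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # Chance agreement: P(both say 1) + P(both say 0), from the marginal totals
    p_e = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    # Kappa is undefined when p_e == 1; returning 1.0 there is an arbitrary convention
    kappa = (accuracy - p_e) / (1 - p_e) if p_e != 1 else 1.0
    return {"precision": precision, "recall": recall, "accuracy": accuracy,
            "f_score": f_score, "kappa": kappa}


# Toy labels (made up): 3 TP, 1 FN, 3 TN, 1 FP
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]
print(binary_metrics(y_true, y_pred))
# precision, recall, accuracy and f_score are 0.75; kappa is 0.5
```

Kappa is lower than accuracy here because half of the agreement would be expected by chance alone given the balanced labels.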