Modelling the map reduce based optimal gradient boosted tree classification algorithm for diabetes mellitus diagnosis system

Bibliographic Details
Published in: Journal of Ambient Intelligence and Humanized Computing, Vol. 12, no. 2, pp. 1717-1730
Main Authors: Selvi, R. Thanga; Muthulakshmi, I.
Format: Journal Article
Language: English
Published: Berlin/Heidelberg: Springer Berlin Heidelberg, 01.02.2021; Springer Nature B.V.
ISSN: 1868-5137, 1868-5145
DOI: 10.1007/s12652-020-02242-1

Summary: In recent years, the term big data has become popular; it refers to heterogeneous data of massive volume that is updated and multiplied every fraction of a second. This paper discusses the application of big data and its impact on the medical domain. Big data models and methods are being used seamlessly to manage the exponential growth of data in the healthcare domain. At present, it is difficult to foresee how machine learning and big data will affect the medical field. At the same time, the number of people suffering from diabetes mellitus (DM) in various healing centers is rising significantly. This study develops a new map reduce based optimal data classifier (MRODC) technique to diagnose DM efficiently. The MRODC model involves several stages: the Hadoop ecosystem, data acquisition, and classification based on a gradient boosting tree (GBT). To further improve the classification results of the GBT, an improved k-means clustering approach is integrated into it. Traditional k-means clustering generates the seed value randomly, which strongly affects the clustering outcome. In the improved k-means clustering, a new mechanism sets the seed value based on the minimal clustering error (CE). Detailed experiments were carried out on the benchmark PIMA Indians Diabetes dataset from several perspectives. The simulation results showed that the MRODC model consistently outperforms the compared methods, with a highest precision of 99.23, recall of 97.48, accuracy of 97.79, F-score of 98.34, and κ value of 95.02.
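The summary places the classifier inside the Hadoop ecosystem but does not describe its map and reduce functions. As a rough illustration of the MapReduce pattern only, the sketch below simulates map, shuffle, and reduce in a single Python process to compute per-class feature means. The record layout (glucose, BMI, outcome), the function names `mapper`, `shuffle`, `reducer`, and `map_reduce`, and all values are hypothetical; this is not the authors' Hadoop job.

```python
from collections import defaultdict

# Hypothetical records (glucose, bmi, outcome); values are made up for illustration.
records = [
    (150.0, 34.0, 1),
    (90.0, 26.0, 0),
    (170.0, 30.0, 1),
    (100.0, 28.0, 0),
]

def mapper(record):
    """Map step: emit (class label, (glucose, bmi, count)) for one record."""
    glucose, bmi, outcome = record
    yield outcome, (glucose, bmi, 1)

def shuffle(pairs):
    """Shuffle step: group all mapped values by key, as the framework would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reducer(key, values):
    """Reduce step: sum partial aggregates and return per-class feature means."""
    glucose_sum = sum(v[0] for v in values)
    bmi_sum = sum(v[1] for v in values)
    count = sum(v[2] for v in values)
    return key, (glucose_sum / count, bmi_sum / count)

def map_reduce(data):
    """Run map, shuffle, and reduce sequentially in one process."""
    pairs = (pair for record in data for pair in mapper(record))
    return dict(reducer(k, vs) for k, vs in sorted(shuffle(pairs).items()))

print(map_reduce(records))  # {0: (95.0, 27.0), 1: (160.0, 32.0)}
```

In a real Hadoop deployment the shuffle is performed by the framework across nodes. Emitting a count alongside each sum keeps the reduce step associative, so partial sums could also be combined early.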
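The summary states that the improved k-means sets the seed value based on the minimal clustering error (CE), without defining CE or the candidate seeds. One possible reading is sketched below under two assumptions: CE is the within-cluster sum of squared distances, and a fixed list of candidate random seeds is tried, with the seed whose k-means run gives the lowest CE being kept. The names `kmeans`, `improved_kmeans`, and `candidate_seeds` are hypothetical, and this is not presented as the paper's exact mechanism.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, seed, max_iter=100):
    """Standard Lloyd k-means whose initial centroids are drawn with a seeded RNG."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(max_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: sq_dist(p, centroids[j]))
            clusters[nearest].append(p)
        # New centroid = mean of its members; an empty cluster keeps its old centroid.
        updated = [
            tuple(sum(dim) / len(c) for dim in zip(*c)) if c else centroids[j]
            for j, c in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    # Assumed clustering error (CE): sum of squared distances to the nearest centroid.
    ce = sum(min(sq_dist(p, c) for c in centroids) for p in points)
    return centroids, ce

def improved_kmeans(points, k, candidate_seeds):
    """Keep the seed whose k-means run gives the minimal clustering error."""
    best_seed, best_centroids, best_ce = None, None, float("inf")
    for seed in candidate_seeds:
        centroids, ce = kmeans(points, k, seed)
        if ce < best_ce:
            best_seed, best_centroids, best_ce = seed, centroids, ce
    return best_seed, best_centroids, best_ce

points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
seed, centroids, ce = improved_kmeans(points, k=2, candidate_seeds=range(10))
print(seed, sorted(centroids), round(ce, 4))
```

On these two well-separated groups the selected run places centroids near (1/3, 1/3) and (31/3, 31/3), with CE = 8/3. Selecting by minimal CE over several seeded runs reduces sensitivity to a single unlucky random initialization.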
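The summary reports precision, recall, accuracy, F-score, and κ. Since Cohen's κ cannot exceed 1, the reported figures appear to be scaled by 100, although the record does not say so. The sketch below shows the standard definitions for a binary confusion matrix, with κ computed from observed accuracy and chance agreement. The function name `classification_metrics` and the counts are hypothetical and not taken from the paper.

```python
def classification_metrics(tp, fp, fn, tn):
    """Binary-classification metrics from confusion-matrix counts."""
    n = tp + fp + fn + tn
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / n
    f_score = 2 * precision * recall / (precision + recall)
    # Expected agreement by chance, from predicted and actual class marginals.
    p_e = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    kappa = (accuracy - p_e) / (1 - p_e)
    return {"precision": precision, "recall": recall, "accuracy": accuracy,
            "f_score": f_score, "kappa": kappa}

# Hypothetical counts, not taken from the paper.
metrics = classification_metrics(tp=40, fp=10, fn=10, tn=40)
print({name: round(100 * value, 2) for name, value in metrics.items()})
```

With these counts, precision, recall, accuracy, and F-score are all 0.8. Chance agreement is 0.5, so κ = (0.8 - 0.5) / (1 - 0.5) = 0.6.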