Modelling the map reduce based optimal gradient boosted tree classification algorithm for diabetes mellitus diagnosis system
| Published in | Journal of ambient intelligence and humanized computing Vol. 12; no. 2; pp. 1717 - 1730 |
|---|---|
| Main Authors | , |
| Format | Journal Article |
| Language | English |
| Published | Berlin/Heidelberg; Springer Berlin Heidelberg; 01.02.2021; Springer Nature B.V |
| Subjects | |
| ISSN | 1868-5137 1868-5145 |
| DOI | 10.1007/s12652-020-02242-1 |
| Summary: | The term big data has recently become popular; it refers to heterogeneous data in massive quantities that is updated and multiplied every fraction of a second. This paper discusses the application of big data and its impact on the medical domain. Big data models and methods are used seamlessly to manage exponential data growth in the healthcare domain. At present, it is difficult to foresee how machine learning and big data will affect the medical field. At the same time, the number of people suffering from diabetes mellitus (DM) at various healthcare centers has increased significantly. This study develops a new MapReduce-based optimal data classifier (MRODC) technique to diagnose DM efficiently. The presented MRODC model involves several stages: the Hadoop ecosystem, data acquisition, and classification based on a gradient boosting tree (GBT). To further improve the results of the GBT classifier, an improved K-means clustering approach is integrated into it. Traditional K-means clustering generates its seed value randomly, which greatly affects the clustering outcome. The improved K-means clustering introduces a mechanism that sets the seed value based on the minimal clustering error (CE). Detailed experiments are carried out on the benchmark PIMA Indians Diabetes dataset under several aspects. The simulation results show that the presented MRODC model consistently outperforms the compared methods, with a maximum precision of 99.23, recall of 97.48, accuracy of 97.79, F-score of 98.34, and κ value of 95.02. |
|---|---|
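
The summary describes an improved K-means step that sets the seed value based on the minimal clustering error (CE) instead of relying on a single random seed. The record does not give the paper's exact procedure, so the following is only a minimal sketch under stated assumptions: CE is taken to be the within-cluster sum of squared distances, and the seed is chosen by trying a list of candidate seeds. The function names `kmeans` and `best_seed_kmeans` are hypothetical, and the GBT classification and MapReduce stages are not covered.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, seed, iters=50):
    """Plain K-means whose initial centers are drawn using `seed`.

    Returns (centers, error), where error is the within-cluster sum of
    squared distances. Using this as the clustering error is an assumption.
    """
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        # Assign every point to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            idx = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            clusters[idx].append(p)
        # Recompute each center as the mean of its cluster;
        # an empty cluster keeps its previous center.
        new_centers = [
            tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    error = sum(min(sq_dist(p, c) for c in centers) for p in points)
    return centers, error


def best_seed_kmeans(points, k, candidate_seeds):
    """Run K-means once per candidate seed and keep the lowest-error run.

    Returns (seed, centers, error) for the best run.
    """
    best = None
    for seed in candidate_seeds:
        centers, error = kmeans(points, k, seed)
        if best is None or error < best[2]:
            best = (seed, centers, error)
    return best
```

Calling `best_seed_kmeans(points, 2, range(10))` on a list of feature tuples returns the seed, centers, and error of the lowest-error run among the ten candidates. Trying a fixed list of seeds is one simple way to realize "minimal CE" selection; the published method may choose the seed differently.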