CKNNI: An Improved KNN-Based Missing Value Handling Technique
| Published in | Advanced Intelligent Computing Theories and Applications Vol. 9227; pp. 441 - 452 |
|---|---|
| Main Authors | , |
| Format | Book Chapter |
| Language | English |
| Published | Switzerland: Springer International Publishing AG, 2015 |
| Series | Lecture Notes in Computer Science |
| Subjects | |
| ISBN | 9783319220529 3319220527 |
| ISSN | 0302-9743 1611-3349 |
| DOI | 10.1007/978-3-319-22053-6_47 |
| Summary: | In the data mining field, experimental data sets are often incomplete because of the imperfect nature of real-world situations, and incomplete data sets generally lead to biased outcomes. Data completeness is therefore one of the most essential challenges in data mining tasks. To achieve better outcomes, many researchers have explored various techniques to reduce data incompleteness, and some existing methods are widely used in real-world applications. This paper first discusses some representative existing missing data handling techniques, with their advantages and drawbacks. It then proposes a new improved KNN-based algorithm, Class-Based K-clusters Nearest Neighbor Imputation (CKNNI), which integrates the K-means clustering algorithm and the conventional KNN algorithm to impute missing values in data sets. CKNNI first clusters the instances of the same class with the K-means algorithm, then applies the KNN algorithm to select the closest neighbor from the set of centroids of the resulting clusters, and imputes each missing value with the value of the corresponding variable in the selected neighbor. Finally, a comparison on multiple data sets indicates that CKNNI significantly improves the performance of KNN imputation on large data sets while remaining comparable to other superior missing value handling algorithms. |
|---|---|
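
The imputation procedure outlined in the summary can be illustrated with a minimal sketch. It is not the authors' implementation: the summary does not specify the distance measure, how K is chosen, or how K-means is initialized, so the sketch assumes Euclidean distance over the observed features, a fixed `k`, random seeding, and known class labels for incomplete rows. The names `kmeans`, `partial_distance`, and `cknni_impute` are hypothetical.

```python
import math
import random


def kmeans(rows, k, iters=20, seed=0):
    """Plain Lloyd's k-means on complete rows; returns a list of centroids."""
    rng = random.Random(seed)
    k = min(k, len(rows))  # cannot have more clusters than rows
    centroids = [list(r) for r in rng.sample(rows, k)]
    for _ in range(iters):
        groups = [[] for _ in centroids]
        for r in rows:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(r, centroids[i]))
            groups[nearest].append(r)
        # New centroid = mean of its group; keep the old one if the group is empty.
        centroids = [
            [sum(col) / len(g) for col in zip(*g)] if g else c
            for g, c in zip(groups, centroids)
        ]
    return centroids


def partial_distance(row, centroid):
    """Euclidean distance over the features observed in `row` (None = missing)."""
    return math.sqrt(sum((x - c) ** 2
                         for x, c in zip(row, centroid) if x is not None))


def cknni_impute(rows, labels, k=2):
    """Fill each None with the value from the nearest same-class centroid."""
    # Step 1: collect the complete rows of each class.
    complete_by_class = {}
    for r, y in zip(rows, labels):
        if None not in r:
            complete_by_class.setdefault(y, []).append(r)
    # Step 2: cluster each class separately with k-means.
    centroids_by_class = {y: kmeans(rs, k) for y, rs in complete_by_class.items()}
    # Step 3: for each incomplete row, pick the closest centroid (a 1-NN search
    # over centroids) and copy its values into the missing positions.
    imputed = []
    for r, y in zip(rows, labels):
        if None in r and y in centroids_by_class:
            best = min(centroids_by_class[y], key=lambda c: partial_distance(r, c))
            r = [c if x is None else x for x, c in zip(r, best)]
        imputed.append(list(r))
    return imputed


rows = [[1.0, 1.0], [1.0, 1.2], [9.0, 9.0], [9.0, 9.2], [9.0, None]]
labels = ["a", "a", "a", "a", "a"]
filled = cknni_impute(rows, labels, k=2)
```

In this toy data the incomplete row `[9.0, None]` lies closest to the centroid of the cluster around `(9.0, 9.1)`, so its missing value is filled from that centroid. Searching a handful of centroids instead of every complete instance is presumably where the speed-up on large data sets reported in the summary comes from.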