CKNNI: An Improved KNN-Based Missing Value Handling Technique

Bibliographic Details
Published in: Advanced Intelligent Computing Theories and Applications, Vol. 9227, pp. 441-452
Main Authors: Jiang, Chao; Yang, Zijiang
Format: Book Chapter
Language: English
Published: Switzerland, Springer International Publishing AG, 2015
Springer International Publishing
Series: Lecture Notes in Computer Science
ISBN: 9783319220529, 3319220527
ISSN: 0302-9743, 1611-3349
DOI: 10.1007/978-3-319-22053-6_47

More Information
Summary: In the data mining field, experimental data sets are often incomplete due to the imperfect nature of real-world situations. However, the incompleteness of data sets generally leads to biased outcomes. Thus, data completeness is one of the most essential challenges among data mining tasks. To achieve better outcomes, many researchers have explored various techniques to reduce data incompleteness, and some existing methods have been widely used in real-world applications. This paper first discusses some representative existing missing data handling techniques with their advantages and drawbacks. Then a new improved KNN-based algorithm, Class-Based K-clusters Nearest Neighbor Imputation (CKNNI), is proposed, which integrates the K-means clustering algorithm and the conventional KNN algorithm to impute missing values in data sets. After clustering instances in the same class with the K-means algorithm, the CKNNI method applies the KNN algorithm to select the closest neighbor from the set of centroids of the resulting clusters, and missing values are imputed with the values of the corresponding variables in the selected neighbor. Finally, a comparison based on multiple data sets indicates that CKNNI significantly improves the performance of KNN imputation on large data sets while remaining comparable to other superior missing value handling algorithms.
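
The procedure outlined in the summary can be illustrated with a minimal sketch. It is not the authors' implementation, since this record gives only the outline. The function names (`kmeans`, `partial_distance`, `cknni_impute`) are hypothetical. The sketch also assumes details the summary does not state: clusters are fitted on complete instances only, distances to centroids use only the observed features, and K-means starts from a deterministic farthest-point initialization.

```python
import math


def kmeans(points, k, iters=50):
    """Plain Lloyd's k-means with farthest-point initialization (an assumption)."""
    centroids = [list(points[0])]
    while len(centroids) < min(k, len(points)):
        farthest = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(farthest))
    for _ in range(iters):
        groups = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))
            groups[nearest].append(p)
        # An empty cluster keeps its previous centroid.
        updated = [
            [sum(col) / len(g) for col in zip(*g)] if g else centroids[i]
            for i, g in enumerate(groups)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids


def partial_distance(row, centroid):
    """Euclidean distance over the features observed (not None) in `row`."""
    return math.sqrt(sum((v - c) ** 2 for v, c in zip(row, centroid) if v is not None))


def cknni_impute(rows, labels, k=2):
    """Fill None entries from the nearest centroid of the row's own class."""
    centroids_by_class = {}
    for label in set(labels):
        # Assumption: only complete instances of the class are clustered.
        complete = [r for r, y in zip(rows, labels) if y == label and None not in r]
        if complete:
            centroids_by_class[label] = kmeans(complete, k)
    imputed = []
    for row, label in zip(rows, labels):
        centroids = centroids_by_class.get(label)
        if None not in row or not centroids:
            imputed.append(list(row))  # nothing to fill, or no centroids available
            continue
        nearest = min(centroids, key=lambda c: partial_distance(row, c))
        imputed.append([c if v is None else v for v, c in zip(row, nearest)])
    return imputed


# Toy data: class "a" has two well-separated groups, class "b" has one.
rows = [
    [1.0, 1.0], [1.2, 0.8], [9.0, 9.0], [9.2, 8.8], [1.1, None],
    [50.0, 50.0], [50.0, 52.0], [None, 51.0],
]
labels = ["a"] * 5 + ["b"] * 3
filled = cknni_impute(rows, labels, k=2)
```

Because the neighbor search runs over a handful of centroids per class rather than over every instance, the lookup cost per incomplete row depends on the number of clusters, not on the size of the data set.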