Missing value imputation using decision trees and decision forests by splitting and merging records: Two novel techniques

Bibliographic Details
Published in: Knowledge-Based Systems, Vol. 53, pp. 51-65
Main Authors: Rahman, Md. Geaur; Islam, Md Zahidul
Format: Journal Article
Language: English
Published: Elsevier B.V., 01.11.2013
ISSN: 0950-7051, 1872-7409
DOI: 10.1016/j.knosys.2013.08.023

Summary:
• Two novel missing value imputation techniques.
• Justification of the basic concepts of the techniques through some empirical analyses.
• Experimentation on nine data sets, four evaluation criteria.
• Comparison with two existing techniques.
• A complexity analysis of all techniques.

We present two novel techniques for the imputation of both categorical and numerical missing values. The techniques use decision trees and forests to identify horizontal segments of a data set where the records belonging to a segment have higher similarity and attribute correlations. Using the similarity and correlations, missing values are then imputed. To achieve a higher quality of imputation some segments are merged together using a novel approach. We use nine publicly available data sets to experimentally compare our techniques with a few existing ones in terms of four commonly used evaluation criteria. The experimental results indicate a clear superiority of our techniques based on statistical analyses such as confidence interval.
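
The general idea in the summary (use a tree to find horizontal segments of similar records, then impute inside each segment) can be illustrated with a minimal sketch. This is not the authors' algorithm: it assumes a single numerical attribute with missing values, grows one small regression tree with variance-reducing splits, and fills each gap with the mean of its leaf. It does not cover forests, segment merging, categorical attributes, or the correlation-based imputation the summary mentions. All function names and the toy data are hypothetical.

```python
from statistics import mean

def spread(values):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    m = mean(values)
    return sum((v - m) ** 2 for v in values)

def best_split(rows, target, features):
    """Return the (feature, threshold) that most reduces the spread of `target`, or None."""
    best, best_cost = None, spread([r[target] for r in rows])
    for f in features:
        for t in sorted({r[f] for r in rows}):
            left = [r[target] for r in rows if r[f] <= t]
            right = [r[target] for r in rows if r[f] > t]
            if not left or not right:
                continue
            cost = spread(left) + spread(right)
            if cost < best_cost:
                best, best_cost = (f, t), cost
    return best

def build_tree(rows, target, features, min_size=2):
    """Recursively split the records; each leaf stands for one horizontal segment."""
    split = best_split(rows, target, features) if len(rows) >= 2 * min_size else None
    if split is None:
        return {"leaf": mean(r[target] for r in rows)}
    f, t = split
    return {
        "feature": f,
        "threshold": t,
        "left": build_tree([r for r in rows if r[f] <= t], target, features, min_size),
        "right": build_tree([r for r in rows if r[f] > t], target, features, min_size),
    }

def segment_mean(tree, row):
    """Route a record to its leaf (segment) and return that segment's mean."""
    while "leaf" not in tree:
        go_left = row[tree["feature"]] <= tree["threshold"]
        tree = tree["left"] if go_left else tree["right"]
    return tree["leaf"]

def impute(rows, target, features):
    """Fill missing `target` values (None) with the mean of the record's segment."""
    complete = [r for r in rows if r[target] is not None]
    tree = build_tree(complete, target, features)
    return [
        dict(r, **{target: segment_mean(tree, r)}) if r[target] is None else dict(r)
        for r in rows
    ]

# Toy data (made up for illustration): two clearly separated groups of records.
rows = [
    {"age": 25, "income": 30.0},
    {"age": 27, "income": 32.0},
    {"age": 52, "income": 80.0},
    {"age": 55, "income": 84.0},
    {"age": 26, "income": None},
    {"age": 54, "income": None},
]
filled = impute(rows, "income", ["age"])
```

On this toy data the tree separates the younger and older records, so each missing income is filled from its own segment rather than from the mean of the whole data set, which is the intuition behind segment-based imputation.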