RACEkNN: A hybrid approach for improving the effectiveness of the k-nearest neighbor algorithm


Bibliographic Details
Published in: Knowledge-Based Systems, Vol. 301, p. 112357
Main Authors: Ebrahimi, Mahdiyeh; Basiri, Alireza
Format: Journal Article
Language: English
Published: Elsevier B.V., 09.10.2024
ISSN: 0950-7051
DOI: 10.1016/j.knosys.2024.112357


More Information
Summary: Classification is a fundamental task in data mining, involving the prediction of class labels for new data. k-Nearest Neighbors (kNN), a lazy learning algorithm, is sensitive to data distribution and suffers from high computational costs because the closest neighbors must be found across the entire training set. Recent advancements in classification techniques have led to hybrid algorithms that combine the strengths of multiple methods to address specific limitations. In response to kNN's inherent execution-time constraint and the impact of data distribution on its performance, we propose RACEkNN (Rule Aggregating ClassifiEr kNN), a hybrid solution that integrates kNN with RACER, a newly devised rule-based classifier. RACER improves predictive capability and decreases kNN's runtime by creating more generalized rules, each encompassing a subset of training instances with similar characteristics. During prediction, a test instance is compared to these rules based on its features; selecting the closest-matching rule identifies the most relevant subset of training data for kNN. This significantly reduces the data kNN needs to consider, leading to faster execution times and enhanced prediction accuracy. Empirical findings demonstrate that RACEkNN outperforms kNN in terms of both runtime and accuracy. Additionally, it surpasses RACER, four well-known classifiers, and certain kNN-based methods in terms of accuracy. The datasets used in our experiments, along with the code, can be accessed at https://github.com/mahdiyehebrahimi/RACEkNN.

Highlights:
• RACEkNN integrates the kNN and RACER algorithms to boost kNN's accuracy and efficiency.
• RACEkNN reduces kNN's runtime with precise rule-based subsets for faster prediction.
• The precise subsets induced by general rules decrease the impact of data distribution on kNN.
• RACEkNN surpasses RACER and four well-known classifiers in terms of accuracy.
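For orientation, the following is a minimal Python sketch of the two-stage prediction scheme the summary describes: a test instance is first matched against a set of induced rules, and plain kNN is then run only on the training instances covered by the best-matching rule. The Rule class, the feature-equality match score, and the name raceknn_predict are illustrative assumptions, not the authors' RACER/RACEkNN implementation; the actual code is available at the GitHub link above.

```python
import numpy as np
from collections import Counter

# Hypothetical rule representation (assumption): each rule maps feature
# conditions to the indices of the training instances it covers. The
# RACER rule-induction step itself is not reproduced here.
class Rule:
    def __init__(self, conditions, covered_indices):
        self.conditions = conditions            # dict: feature index -> required value
        self.covered = np.asarray(covered_indices)

    def match_score(self, x):
        # Count how many of the rule's conditions the test instance satisfies.
        return sum(1 for f, v in self.conditions.items() if x[f] == v)

def raceknn_predict(x, rules, X_train, y_train, k=3):
    """Two-stage prediction sketch: pick the closest-matching rule, then
    run ordinary kNN only on the training subset that rule covers."""
    best_rule = max(rules, key=lambda r: r.match_score(x))
    X_sub = X_train[best_rule.covered]
    y_sub = y_train[best_rule.covered]
    # Euclidean distances within the reduced candidate set only.
    dists = np.linalg.norm(X_sub - x, axis=1)
    nearest = np.argsort(dists)[:k]
    # Majority vote among the k nearest neighbors in the subset.
    return Counter(y_sub[nearest].tolist()).most_common(1)[0][0]
```

The intended efficiency gain is visible in the last few lines: distances are computed against the covered subset rather than the full training set, which is how the paper's abstract explains the reduction in kNN's runtime.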