A Fast KNN Algorithm for Text Categorization

Bibliographic Details
Published in: 2007 International Conference on Machine Learning and Cybernetics, Vol. 6, pp. 3436-3441
Main Authors: Yu Wang, Zheng-Ou Wang
Format: Conference Proceeding
Language: English
Published: IEEE, 01.08.2007
ISBN: 1424409721, 9781424409723
ISSN: 2160-133X
DOI: 10.1109/ICMLC.2007.4370742

Summary: KNN is a simple, effective, nonparametric method for text categorization. Its main weakness is the high cost of similarity computation, which makes the traditional algorithm impractical for text categorization with high-dimensional data and very large sample sets. This paper presents a method called TFKNN (Tree-Fast-K-Nearest-Neighbor) that finds the exact k nearest neighbors quickly. The method builds an SSR tree for k-nearest-neighbor search in which the child nodes of each non-leaf node are ranked by the distance between their central points and the central point of their parent. The tree is then used to reduce the search scope, which substantially reduces the time spent on similarity computation.
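
The general idea of reducing the search scope while still returning the exact k nearest neighbors can be illustrated with a minimal sketch. This is not the paper's SSR tree or its child-ranking rule, which the summary does not specify in detail. It assumes a single level of clusters, each with a center and a radius, visits them nearest-first, and skips any cluster that the triangle inequality shows cannot contain a closer neighbor. All function names (`brute_force_knn`, `build_clusters`, `pruned_knn`) are hypothetical, and Euclidean distance stands in for the similarity measure (for unit-length term vectors it ranks neighbors the same way as cosine similarity).

```python
import heapq
import math
import random


def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def brute_force_knn(points, query, k):
    """Baseline KNN: one distance computation per stored sample."""
    return sorted(points, key=lambda p: dist(p, query))[:k]


def build_clusters(points, n_clusters, seed=0):
    """Assumed one-level 'tree': each point joins its nearest sampled center.

    Returns a list of (center, radius, members) with empty clusters dropped.
    """
    rng = random.Random(seed)
    centers = rng.sample(points, n_clusters)
    members = [[] for _ in centers]
    for p in points:
        nearest = min(range(n_clusters), key=lambda j: dist(p, centers[j]))
        members[nearest].append(p)
    return [
        (c, max(dist(c, p) for p in m), m)
        for c, m in zip(centers, members)
        if m
    ]


def pruned_knn(clusters, query, k):
    """Exact k nearest neighbors with cluster-level pruning.

    Returns (neighbors sorted by distance, number of distance computations).
    """
    scored = sorted(
        ((dist(c, query), r, m) for c, r, m in clusters),
        key=lambda t: t[0],
    )
    computed = len(scored)
    best = []  # max-heap on distance, stored as (-distance, point)
    for d_center, radius, members in scored:
        # Triangle inequality: every member is at least d_center - radius away.
        if len(best) == k and d_center - radius >= -best[0][0]:
            continue  # no member can beat the current k-th neighbor
        for p in members:
            d = dist(p, query)
            computed += 1
            if len(best) < k:
                heapq.heappush(best, (-d, p))
            elif d < -best[0][0]:
                heapq.heapreplace(best, (-d, p))
    neighbors = [p for _, p in sorted(best, key=lambda t: -t[0])]
    return neighbors, computed
```

The second return value of `pruned_knn` counts distance computations, which can be compared with the one-per-sample cost of `brute_force_knn`. How much is saved depends on how well the data clusters, so the sketch makes no claim about the speedups reported in the paper.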