The influence of word normalization in English document clustering

Bibliographic Details
Published in: 2012 IEEE International Conference on Computer Science and Automation Engineering, Vol. 2, pp. 116-120
Main Authors: Pu Han, Si Shen, Dongbo Wang, Yanyun Liu
Format: Conference Proceeding
Language: English
Published: IEEE, 01.05.2012
ISBN: 1467300888, 9781467300889
DOI: 10.1109/CSAE.2012.6272740

Summary: Stemming or lemmatization is a key step in English document processing. Using three clustering algorithms and two evaluation functions, the paper presents a comprehensive study of two stemming algorithms and one lemmatization algorithm. According to the experimental results, the effect on performance is not remarkable; compared with the Snowball stemmer and Stanford lemmatization, the Porter stemmer achieves better performance in entropy and purity.
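
The two evaluation functions named in the summary, entropy and purity, can be illustrated with a minimal sketch. This record does not give the paper's exact formulas, so the code below assumes the common textbook definitions: purity as the fraction of documents that carry the majority label of their cluster, and entropy as the size-weighted average of per-cluster label entropy. The function names and the toy data are hypothetical and are not taken from the paper.

```python
from collections import Counter
from math import log2


def purity(clusters, labels):
    """Fraction of documents that share the majority label of their cluster.

    Assumed textbook definition; higher is better.
    """
    total = len(labels)
    majority_sum = 0
    for c in set(clusters):
        members = [lab for k, lab in zip(clusters, labels) if k == c]
        # Count of the most frequent true label inside this cluster.
        majority_sum += Counter(members).most_common(1)[0][1]
    return majority_sum / total


def entropy(clusters, labels):
    """Size-weighted average of per-cluster label entropy, in bits.

    Assumed textbook definition; lower is better.
    """
    total = len(labels)
    result = 0.0
    for c in set(clusters):
        members = [lab for k, lab in zip(clusters, labels) if k == c]
        size = len(members)
        h = -sum((n / size) * log2(n / size) for n in Counter(members).values())
        result += (size / total) * h
    return result


# Toy data (hypothetical): cluster ids assigned by some clustering algorithm,
# and the true category of each document.
toy_clusters = [0, 0, 0, 1, 1, 1]
toy_labels = ["a", "a", "b", "b", "b", "b"]

if __name__ == "__main__":
    print("purity :", purity(toy_clusters, toy_labels))
    print("entropy:", entropy(toy_clusters, toy_labels))
```

Under these definitions, a normalization method that yields higher purity and lower entropy produces clusters that agree more closely with the reference categories.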