The influence of word normalization in English document clustering
| Published in | 2012 IEEE International Conference on Computer Science and Automation Engineering Vol. 2; pp. 116 - 120 |
|---|---|
| Main Authors | , , , |
| Format | Conference Proceeding |
| Language | English |
| Published | IEEE 01.05.2012 |
| Subjects | |
| ISBN | 1467300888 9781467300889 |
| DOI | 10.1109/CSAE.2012.6272740 |
| Summary: | Stemming or lemmatization is a key step in English document processing. Using three clustering algorithms and two evaluation functions, the paper presents a comprehensive study of two stemming algorithms and one lemmatization algorithm. The experimental results show that the performance differences are not remarkable; compared with the Snowball stemmer and Stanford lemmatization, the Porter stemmer achieves better entropy and purity. |
|---|---|
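
The summary names entropy and purity as the two evaluation functions but does not give their definitions. The following is a minimal sketch of the commonly used forms, assuming gold class labels are available for every document and that entropy is the size-weighted, unnormalized class entropy of each cluster. The function names and the toy data are illustrative assumptions, not taken from the paper.

```python
import math
from collections import Counter, defaultdict


def _group_by_cluster(labels, clusters):
    """Collect the gold labels of the documents assigned to each cluster."""
    groups = defaultdict(list)
    for label, cluster in zip(labels, clusters):
        groups[cluster].append(label)
    return groups


def purity(labels, clusters):
    """Fraction of documents that belong to the majority class of their cluster.

    Higher is better; 1.0 means every cluster contains a single class.
    """
    groups = _group_by_cluster(labels, clusters)
    majority_total = sum(
        Counter(members).most_common(1)[0][1] for members in groups.values()
    )
    return majority_total / len(labels)


def entropy(labels, clusters):
    """Size-weighted average of the class entropy (base 2) inside each cluster.

    Lower is better; 0.0 means every cluster contains a single class.
    Assumption: no normalization by the log of the number of classes.
    """
    groups = _group_by_cluster(labels, clusters)
    n = len(labels)
    total = 0.0
    for members in groups.values():
        size = len(members)
        cluster_entropy = -sum(
            (count / size) * math.log2(count / size)
            for count in Counter(members).values()
        )
        total += (size / n) * cluster_entropy
    return total


# Toy example (hypothetical data): six documents, three classes, two clusters.
gold = ["a", "a", "b", "b", "b", "c"]
assigned = [0, 0, 0, 1, 1, 1]
print(purity(gold, assigned))
print(entropy(gold, assigned))
```

Under these definitions a normalization method that helps clustering would raise purity and lower entropy, which is the sense in which the summary compares the Porter stemmer with the Snowball stemmer and Stanford lemmatization.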