Blogosonomy: Autotagging Any Text Using Bloggers' Knowledge
| Published in | Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence pp. 205 - 212 |
|---|---|
| Main Authors | , , |
| Format | Conference Proceeding |
| Language | English |
| Published | Washington, DC, USA: IEEE Computer Society, 02.11.2007 |
| Series | ACM Conferences |
| Subjects | Computing methodologies > Machine learning > Learning paradigms > Supervised learning > Supervised learning by classification; Computing methodologies > Machine learning > Learning paradigms > Unsupervised learning > Cluster analysis |
| ISBN | 0769530265 9780769530260 |
| DOI | 10.1109/WI.2007.31 |
| Summary | There are at least three barriers to utilizing blog tags in classification or navigation: by our observations, 40% of entries are not tagged; there are many orthographic or synonymous tag variations; and not all tags are informative. We propose a multi-autotagging method based on k-NN, a case-based classification method. Our method also merges tags with the same meaning and identifies informative tags. To realize these functions, we propose a term weighting method named residual document frequency (RDF), which can score the similarity between tags. Experiments show the effectiveness of our methods. Our autotagging system is generic and can assign tags to any text, not only blog entries, although the training data is collected from the blogosphere. |
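
The k-NN autotagging idea named in the summary can be illustrated with a minimal sketch. This is not the authors' implementation: the record does not define RDF, so plain term-frequency cosine similarity stands in for the term weighting, and the names `tokenize`, `cosine`, and `knn_autotag`, as well as the parameters `k` and `max_tags`, are hypothetical. The sketch finds the `k` tagged entries most similar to the input text and lets their tags vote, weighted by similarity.

```python
import math
from collections import Counter


def tokenize(text):
    # Naive whitespace tokenizer; an assumption made for brevity.
    return text.lower().split()


def cosine(a, b):
    # Cosine similarity between two sparse term-frequency vectors.
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_autotag(text, tagged_entries, k=3, max_tags=2):
    # tagged_entries: list of (entry_text, [tags]) pairs used as training data.
    query = Counter(tokenize(text))
    scored = [
        (cosine(query, Counter(tokenize(body))), tags)
        for body, tags in tagged_entries
    ]
    # Keep the k most similar training entries.
    neighbours = sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]
    # Each neighbour votes for its tags, weighted by its similarity.
    votes = Counter()
    for similarity, tags in neighbours:
        for tag in tags:
            votes[tag] += similarity
    # Return up to max_tags tags, dropping tags with no supporting similarity.
    return [tag for tag, score in votes.most_common(max_tags) if score > 0]
```

Because several tags can survive the vote, a single text can receive more than one tag, which is the sense of "multi-autotagging" assumed here. Merging synonymous tags and filtering uninformative ones, which the summary attributes to RDF, are outside the scope of this sketch.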