A Comparison of Dimensionality Reduction Techniques for Web Structure Mining
| Published in | Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence, pp. 116-119 |
|---|---|
| Main Authors | , , |
| Format | Conference Proceeding |
| Language | English |
| Published | Washington, DC, USA; IEEE Computer Society; 02.11.2007 |
| Series | ACM Conferences |
| Subjects | |
| ISBN | 0769530265 9780769530260 |
| DOI | 10.1109/WI.2007.6 |
| Summary: | In many domains, dimensionality reduction techniques have been shown to be very effective for elucidating the underlying semantics of data. Thus, in this paper we investigate the use of various dimensionality reduction techniques (DRTs) to extract the implicit structures hidden in the web hyperlink connectivity. We apply and compare four DRTs, namely Principal Component Analysis (PCA), Non-negative Matrix Factorization (NMF), Independent Component Analysis (ICA) and Random Projection (RP). Experiments conducted on three datasets allow us to assert the following: NMF outperforms PCA and ICA in terms of stability and interpretability of the discovered structures; and the well-known WebKB dataset, used in a large number of works on hyperlink connectivity analysis, seems poorly suited to this task, so we suggest using the recent Wikipedia dataset instead. |
|---|---|
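
The summary names NMF as the most stable and interpretable of the four techniques on hyperlink data, but this record does not describe how the authors implemented it. As a rough illustration only, the sketch below factorizes a small link adjacency matrix with the standard multiplicative-update form of NMF. The toy graph, the function names (`nmf`, `frobenius_error`), the rank and the iteration count are all assumptions made for this example and are not taken from the paper.

```python
import random

def matmul(X, Y):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]

def transpose(X):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*X)]

def frobenius_error(A, W, H):
    """Frobenius norm of A - WH, used here to track reconstruction quality."""
    WH = matmul(W, H)
    return sum((a - b) ** 2 for ra, rb in zip(A, WH) for a, b in zip(ra, rb)) ** 0.5

def nmf(A, rank, iterations=200, seed=0, eps=1e-9):
    """Approximate A by W @ H with non-negative factors (multiplicative updates)."""
    rng = random.Random(seed)
    n, m = len(A), len(A[0])
    # Strictly positive random start, so no entry is stuck at zero from the outset.
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    errors = [frobenius_error(A, W, H)]
    for _ in range(iterations):
        # H <- H * (W^T A) / (W^T W H)
        Wt = transpose(W)
        num, den = matmul(Wt, A), matmul(matmul(Wt, W), H)
        H = [[h * p / (q + eps) for h, p, q in zip(rh, rp, rq)]
             for rh, rp, rq in zip(H, num, den)]
        # W <- W * (A H^T) / (W H H^T)
        Ht = transpose(H)
        num, den = matmul(A, Ht), matmul(W, matmul(H, Ht))
        W = [[w * p / (q + eps) for w, p, q in zip(rw, rp, rq)]
             for rw, rp, rq in zip(W, num, den)]
        errors.append(frobenius_error(A, W, H))
    return W, H, errors

# Hypothetical toy web graph: A[i][j] = 1 if page i links to page j.
# Pages 0-2 link only to each other, and so do pages 3-5.
A = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]

W, H, errors = nmf(A, rank=2)
for page, weights in enumerate(W):
    print(page, [round(w, 3) for w in weights])
```

Each row of `W` gives one page's non-negative weight on each latent component, which is what makes the factors comparatively easy to read as groups of related pages. On real link matrices a sparse, library-based implementation would be the more practical choice; the pure-Python version is used here only to keep the example self-contained.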