A Comparison of Dimensionality Reduction Techniques for Web Structure Mining

Bibliographic Details
Published in	Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence, pp. 116-119
Main Authors	Chikhi, Nacim Fateh; Rothenburger, Bernard; Aussenac-Gilles, Nathalie
Format	Conference Proceeding
Language	English
Published	Washington, DC, USA: IEEE Computer Society, 02.11.2007
Series	ACM Conferences
Subjects	Information systems > Information retrieval; Information systems > Information retrieval > Evaluation of retrieval results; Information systems > Information systems applications > Data mining; Information systems > World Wide Web > Web applications; Information systems > World Wide Web > Web services; Mathematics of computing > Mathematical analysis > Numerical analysis > Computations on matrices; Mathematics of computing > Probability and statistics > Statistical paradigms; Mathematics of computing > Probability and statistics > Statistical paradigms > Statistical graphics; Theory of computation > Computational complexity and cryptography > Problems, reductions and completeness
ISBN	0769530265 9780769530260
DOI	10.1109/WI.2007.6

More Information
Summary:	In many domains, dimensionality reduction techniques have been shown to be very effective for elucidating the underlying semantics of data. In this paper, we therefore investigate the use of various dimensionality reduction techniques (DRTs) to extract the implicit structures hidden in web hyperlink connectivity. We apply and compare four DRTs: Principal Component Analysis (PCA), Non-negative Matrix Factorization (NMF), Independent Component Analysis (ICA), and Random Projection (RP). Experiments conducted on three datasets support two conclusions. First, NMF outperforms PCA and ICA in terms of the stability and interpretability of the discovered structures. Second, the well-known WebKB dataset, used in a large number of studies of hyperlink connectivity, appears ill-suited to this task; we suggest using the more recent Wikipedia dataset instead.