A Comparison of Dimensionality Reduction Techniques for Web Structure Mining

Bibliographic Details
Published in: Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence, pp. 116-119
Main Authors: Chikhi, Nacim Fateh; Rothenburger, Bernard; Aussenac-Gilles, Nathalie
Format: Conference Proceeding
Language: English
Published: Washington, DC, USA: IEEE Computer Society, 02.11.2007
Series: ACM Conferences
ISBN: 0769530265, 9780769530260
DOI: 10.1109/WI.2007.6

Summary: In many domains, dimensionality reduction techniques have been shown to be very effective for elucidating the underlying semantics of data. Thus, in this paper we investigate the use of various dimensionality reduction techniques (DRTs) to extract the implicit structures hidden in the web hyperlink connectivity. We apply and compare four DRTs, namely Principal Component Analysis (PCA), Non-negative Matrix Factorization (NMF), Independent Component Analysis (ICA) and Random Projection (RP). Experiments conducted on three datasets allow us to assert the following: NMF outperforms PCA and ICA in terms of stability and interpretability of the discovered structures; the well-known WebKb dataset, used in a large number of works on the analysis of hyperlink connectivity, seems poorly suited to this task, and we suggest using the more recent Wikipedia dataset, which is better suited.
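
As an illustration of what applying a DRT to hyperlink connectivity can look like, the sketch below factorizes a small page-to-page adjacency matrix with NMF. It is not the authors' implementation: the summary does not say which NMF algorithm was used, so the standard multiplicative update rule for the Frobenius objective is assumed here, and the six-page link matrix `A`, the helper names, and the choice of `k=2` are all hypothetical.

```python
import random

def matmul(A, B):
    """Plain-Python matrix product of two lists of lists."""
    cols_b = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols_b] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def frobenius_error(A, W, H):
    """Frobenius norm of A - WH, the reconstruction error."""
    WH = matmul(W, H)
    return sum((a - b) ** 2 for ra, rb in zip(A, WH) for a, b in zip(ra, rb)) ** 0.5

def nmf(A, k, iters=200, seed=0, eps=1e-9):
    """Approximate A (n x m, non-negative) as W (n x k) times H (k x m).

    Assumed algorithm: multiplicative updates, which keep W and H
    non-negative as long as they start positive.
    """
    rng = random.Random(seed)
    n, m = len(A), len(A[0])
    W = [[rng.random() + eps for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + eps for _ in range(m)] for _ in range(k)]
    for _ in range(iters):
        # H <- H * (W^T A) / (W^T W H), element-wise
        Wt = transpose(W)
        num = matmul(Wt, A)
        den = matmul(matmul(Wt, W), H)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(k)]
        # W <- W * (A H^T) / (W H H^T), element-wise
        Ht = transpose(H)
        num = matmul(A, Ht)
        den = matmul(W, matmul(H, Ht))
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)]
             for i in range(n)]
    return W, H

# Hypothetical toy web graph: A[i][j] = 1 if page i links to page j.
# Pages 0-2 link only among themselves, as do pages 3-5.
A = [
    [0, 1, 1, 0, 0, 0],
    [1, 0, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 1, 1, 0],
]

W, H = nmf(A, k=2)
for page, weights in enumerate(W):
    # Each row of W gives the page's non-negative weight on each factor.
    print(page, [round(w, 3) for w in weights])
```

Because every entry of `W` and `H` is non-negative, each page is described as an additive mix of factors, which is one common reading of why NMF results tend to be easier to interpret than signed PCA or ICA components. Whether the factors line up with the two link groups depends on the initialization, so the sketch prints the weights rather than asserting a particular grouping.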