Data Sparsity Issues in the Collaborative Filtering Framework

Bibliographic Details
Published in Advances in Web Mining and Web Usage Analysis pp. 58 - 76
Main Authors Grčar, Miha; Mladenič, Dunja; Fortuna, Blaž; Grobelnik, Marko
Format Book Chapter
Language English
Published Berlin, Heidelberg: Springer Berlin Heidelberg, 2006
Series Lecture Notes in Computer Science
ISBN 3540463461
9783540463467
ISSN 0302-9743
1611-3349
DOI 10.1007/11891321_4

Abstract With the amount of information available on the Web growing rapidly each day, the need has emerged to filter that information automatically so that users can work more efficiently. Within the fields of user profiling and Web personalization, several popular content filtering techniques have been developed. In this chapter we present one such technique: collaborative filtering. Apart from giving an overview of collaborative filtering approaches, we present the experimental results of comparing the k-Nearest Neighbor (kNN) algorithm with the Support Vector Machine (SVM) in the collaborative filtering framework, using datasets with different properties. While kNN is the algorithm usually used for collaborative filtering tasks, SVM is considered a state-of-the-art classification algorithm. Since collaborative filtering can also be interpreted as a classification/regression task, virtually any supervised learning algorithm (such as SVM) can be applied. Experiments were performed on two standard, publicly available datasets and on a real-life corporate dataset that does not fit the profile of ideal data for collaborative filtering. We conclude that the quality of collaborative filtering recommendations is highly dependent on the sparsity of the available data. Furthermore, we show that kNN is dominant on datasets with relatively low sparsity, while SVM-based approaches may perform better on highly sparse data.
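As a rough illustration of the two notions the abstract relies on, data sparsity and kNN-based prediction, the following minimal sketch computes the sparsity of a tiny ratings table and predicts a missing rating from the most similar users. The record does not describe the chapter's actual kNN variant, similarity measure, or datasets, so the toy data, the function names (sparsity, cosine, knn_predict), the choice of cosine similarity, and the default k=2 are all assumptions made for illustration only.

```python
from math import sqrt

# Toy user-item ratings (hypothetical data); a missing rating is simply absent.
ratings = {
    "u1": {"a": 5, "b": 3, "c": 4},
    "u2": {"a": 4, "b": 2, "c": 5, "d": 4},
    "u3": {"a": 1, "b": 5, "d": 2},
    "u4": {"b": 4, "c": 2, "d": 1},
}

def sparsity(ratings):
    """Fraction of user-item cells that hold no rating."""
    items = {item for user_ratings in ratings.values() for item in user_ratings}
    filled = sum(len(user_ratings) for user_ratings in ratings.values())
    return 1.0 - filled / (len(ratings) * len(items))

def cosine(r1, r2):
    """Cosine similarity over the items both users rated; 0.0 if no overlap."""
    common = set(r1) & set(r2)
    if not common:
        return 0.0
    dot = sum(r1[i] * r2[i] for i in common)
    norm1 = sqrt(sum(r1[i] ** 2 for i in common))
    norm2 = sqrt(sum(r2[i] ** 2 for i in common))
    return dot / (norm1 * norm2)

def knn_predict(ratings, user, item, k=2):
    """Similarity-weighted mean of the k most similar users' ratings of item.

    Returns None when no other user with positive similarity rated the item.
    """
    candidates = [
        (cosine(ratings[user], other_ratings), other_ratings[item])
        for other, other_ratings in ratings.items()
        if other != user and item in other_ratings
    ]
    candidates = [(sim, value) for sim, value in candidates if sim > 0]
    candidates.sort(key=lambda pair: pair[0], reverse=True)
    top = candidates[:k]
    if not top:
        return None
    return sum(sim * value for sim, value in top) / sum(sim for sim, _ in top)
```

Calling sparsity(ratings) gives the share of empty cells in the table, and knn_predict(ratings, "u1", "d") estimates the rating user u1 would give item d. As the table gets sparser, fewer users share rated items, so the similarity estimates rest on less evidence; this is one plausible reading of why the abstract ties recommendation quality to sparsity.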
Author Mladenič, Dunja
Grčar, Miha
Fortuna, Blaž
Grobelnik, Marko
Author_xml – sequence: 1
  givenname: Miha
  surname: Grčar
  fullname: Grčar, Miha
  email: miha.grcar@ijs.si
  organization: Jožef Stefan Institute, Ljubljana, Slovenia
– sequence: 2
  givenname: Dunja
  surname: Mladenič
  fullname: Mladenič, Dunja
  organization: Jožef Stefan Institute, Ljubljana, Slovenia
– sequence: 3
  givenname: Blaž
  surname: Fortuna
  fullname: Fortuna, Blaž
  organization: Jožef Stefan Institute, Ljubljana, Slovenia
– sequence: 4
  givenname: Marko
  surname: Grobelnik
  fullname: Grobelnik, Marko
  organization: Jožef Stefan Institute, Ljubljana, Slovenia
ContentType Book Chapter
Copyright Springer-Verlag Berlin Heidelberg 2006
Copyright_xml – notice: Springer-Verlag Berlin Heidelberg 2006
DOI 10.1007/11891321_4
DatabaseTitleList
DeliveryMethod fulltext_linktorsrc
Discipline Engineering
Library & Information Science
Computer Science
EISBN 3540463488
9783540463481
EISSN 1611-3349
Editor Spiliopoulou, Myra
Nasraoui, Olfa
Mobasher, Bamshad
Yu, Philip S.
Zaïane, Osmar
Masand, Brij
Editor_xml – sequence: 1
  givenname: Olfa
  surname: Nasraoui
  fullname: Nasraoui, Olfa
  email: olfa.nasraoui@louisville.edu
– sequence: 2
  givenname: Osmar
  surname: Zaïane
  fullname: Zaïane, Osmar
  email: zaiane@cs.ualberta.ca
– sequence: 3
  givenname: Myra
  surname: Spiliopoulou
  fullname: Spiliopoulou, Myra
  email: myra@iti.cs.uni-magdeburg.de
– sequence: 4
  givenname: Bamshad
  surname: Mobasher
  fullname: Mobasher, Bamshad
  email: mobasher@cti.depaul.edu
– sequence: 5
  givenname: Brij
  surname: Masand
  fullname: Masand, Brij
  email: brij@data-miners.com
– sequence: 6
  givenname: Philip S.
  surname: Yu
  fullname: Yu, Philip S.
  email: psyu@cs.uic.com
EndPage 76
ISBN 3540463461
9783540463467
ISSN 0302-9743
IngestDate Wed Sep 17 03:30:20 EDT 2025
IsPeerReviewed false
IsScholarly false
Language English
LinkModel OpenURL
PageCount 19
ParticipantIDs springer_books_10_1007_11891321_4
PublicationCentury 2000
PublicationDate 2006
PublicationDateYYYYMMDD 2006-01-01
PublicationDate_xml – year: 2006
  text: 2006
PublicationDecade 2000
PublicationPlace Berlin, Heidelberg
PublicationPlace_xml – name: Berlin, Heidelberg
PublicationSeriesSubtitle Lecture Notes in Artificial Intelligence
PublicationSeriesTitle Lecture Notes in Computer Science
PublicationSubtitle 7th International Workshop on Knowledge Discovery on the Web, WebKDD 2005, Chicago, IL, USA, August 21, 2005. Revised Papers
PublicationTitle Advances in Web Mining and Web Usage Analysis
PublicationYear 2006
Publisher Springer Berlin Heidelberg
Publisher_xml – name: Springer Berlin Heidelberg
RelatedPersons Kleinberg, Jon M.
Mattern, Friedemann
Nierstrasz, Oscar
Tygar, Doug
Steffen, Bernhard
Kittler, Josef
Vardi, Moshe Y.
Weikum, Gerhard
Sudan, Madhu
Naor, Moni
Mitchell, John C.
Terzopoulos, Demetri
Pandu Rangan, C.
Kanade, Takeo
Hutchison, David
RelatedPersons_xml – sequence: 1
  givenname: David
  surname: Hutchison
  fullname: Hutchison, David
  organization: Lancaster University, UK
– sequence: 2
  givenname: Takeo
  surname: Kanade
  fullname: Kanade, Takeo
  organization: Carnegie Mellon University, Pittsburgh, USA
– sequence: 3
  givenname: Josef
  surname: Kittler
  fullname: Kittler, Josef
  organization: University of Surrey, Guildford, UK
– sequence: 4
  givenname: Jon M.
  surname: Kleinberg
  fullname: Kleinberg, Jon M.
  organization: Cornell University, Ithaca, USA
– sequence: 5
  givenname: Friedemann
  surname: Mattern
  fullname: Mattern, Friedemann
  organization: ETH Zurich, Switzerland
– sequence: 6
  givenname: John C.
  surname: Mitchell
  fullname: Mitchell, John C.
  organization: Stanford University, CA, USA
– sequence: 7
  givenname: Moni
  surname: Naor
  fullname: Naor, Moni
  organization: Weizmann Institute of Science, Rehovot, Israel
– sequence: 8
  givenname: Oscar
  surname: Nierstrasz
  fullname: Nierstrasz, Oscar
  organization: University of Bern, Switzerland
– sequence: 9
  givenname: C.
  surname: Pandu Rangan
  fullname: Pandu Rangan, C.
  organization: Indian Institute of Technology, Madras, India
– sequence: 10
  givenname: Bernhard
  surname: Steffen
  fullname: Steffen, Bernhard
  organization: University of Dortmund, Germany
– sequence: 11
  givenname: Madhu
  surname: Sudan
  fullname: Sudan, Madhu
  organization: Massachusetts Institute of Technology, MA, USA
– sequence: 12
  givenname: Demetri
  surname: Terzopoulos
  fullname: Terzopoulos, Demetri
  organization: University of California, Los Angeles, USA
– sequence: 13
  givenname: Doug
  surname: Tygar
  fullname: Tygar, Doug
  organization: University of California, Berkeley, USA
– sequence: 14
  givenname: Moshe Y.
  surname: Vardi
  fullname: Vardi, Moshe Y.
  organization: Rice University, Houston, USA
– sequence: 15
  givenname: Gerhard
  surname: Weikum
  fullname: Weikum, Gerhard
  organization: Max-Planck Institute of Computer Science, Saarbruecken, Germany
SourceID springer
SourceType Publisher
StartPage 58
Title Data Sparsity Issues in the Collaborative Filtering Framework
URI http://link.springer.com/10.1007/11891321_4