Data Sparsity Issues in the Collaborative Filtering Framework
With the amount of available information on the Web growing rapidly with each day, the need to automatically filter the information in order to ensure greater user efficiency has emerged. Within the fields of user profiling and Web personalization several popular content filtering techniques have be...
| Published in | Advances in Web Mining and Web Usage Analysis, pp. 58-76 |
|---|---|
| Main Authors | Grčar, Miha; Mladenič, Dunja; Fortuna, Blaž; Grobelnik, Marko |
| Format | Book Chapter |
| Language | English |
| Published | Berlin, Heidelberg: Springer Berlin Heidelberg, 2006 |
| Series | Lecture Notes in Computer Science |
| ISBN | 3540463461 9783540463467 |
| ISSN | 0302-9743 1611-3349 |
| DOI | 10.1007/11891321_4 |
| Abstract | With the amount of information available on the Web growing rapidly each day, a need has emerged to filter that information automatically so that users can work more efficiently. Within the fields of user profiling and Web personalization, several popular content filtering techniques have been developed. In this chapter we present one such technique: collaborative filtering. Apart from giving an overview of collaborative filtering approaches, we present experimental results comparing the k-Nearest Neighbor (kNN) algorithm with the Support Vector Machine (SVM) in the collaborative filtering framework, using datasets with different properties. While the k-Nearest Neighbor algorithm is commonly used for collaborative filtering tasks, the Support Vector Machine is considered a state-of-the-art classification algorithm. Since collaborative filtering can also be interpreted as a classification/regression task, virtually any supervised learning algorithm (such as SVM) can be applied. Experiments were performed on two standard, publicly available datasets and on a real-life corporate dataset that does not fit the profile of ideal data for collaborative filtering. We conclude that the quality of collaborative filtering recommendations is highly dependent on the sparsity of the available data. Furthermore, we show that kNN is dominant on datasets with relatively low sparsity, while SVM-based approaches may perform better on highly sparse data. |
| Author | Mladenič, Dunja; Grčar, Miha; Fortuna, Blaž; Grobelnik, Marko |
| Author_xml | – sequence: 1 givenname: Miha surname: Grčar fullname: Grčar, Miha email: miha.grcar@ijs.si organization: Jožef Stefan Institute, Ljubljana, Slovenia – sequence: 2 givenname: Dunja surname: Mladenič fullname: Mladenič, Dunja organization: Jožef Stefan Institute, Ljubljana, Slovenia – sequence: 3 givenname: Blaž surname: Fortuna fullname: Fortuna, Blaž organization: Jožef Stefan Institute, Ljubljana, Slovenia – sequence: 4 givenname: Marko surname: Grobelnik fullname: Grobelnik, Marko organization: Jožef Stefan Institute, Ljubljana, Slovenia |
| ContentType | Book Chapter |
| Copyright | Springer-Verlag Berlin Heidelberg 2006 |
| Copyright_xml | – notice: Springer-Verlag Berlin Heidelberg 2006 |
| DOI | 10.1007/11891321_4 |
| DatabaseTitleList | |
| DeliveryMethod | fulltext_linktorsrc |
| Discipline | Engineering; Library & Information Science; Computer Science |
| EISBN | 3540463488 9783540463481 |
| EISSN | 1611-3349 |
| Editor | Spiliopoulou, Myra; Nasraoui, Olfa; Mobasher, Bamshad; Yu, Philip S.; Zaïane, Osmar; Masand, Brij |
| Editor_xml | – sequence: 1 givenname: Olfa surname: Nasraoui fullname: Nasraoui, Olfa email: olfa.nasraoui@louisville.edu – sequence: 2 givenname: Osmar surname: Zaïane fullname: Zaïane, Osmar email: zaiane@cs.ualberta.ca – sequence: 3 givenname: Myra surname: Spiliopoulou fullname: Spiliopoulou, Myra email: myra@iti.cs.uni-magdeburg.de – sequence: 4 givenname: Bamshad surname: Mobasher fullname: Mobasher, Bamshad email: mobasher@cti.depaul.edu – sequence: 5 givenname: Brij surname: Masand fullname: Masand, Brij email: brij@data-miners.com – sequence: 6 givenname: Philip S. surname: Yu fullname: Yu, Philip S. email: psyu@cs.uic.com |
| EndPage | 76 |
| ISBN | 3540463461 9783540463467 |
| ISSN | 0302-9743 |
| IngestDate | Wed Sep 17 03:30:20 EDT 2025 |
| IsPeerReviewed | false |
| IsScholarly | false |
| Language | English |
| LinkModel | OpenURL |
| PageCount | 19 |
| ParticipantIDs | springer_books_10_1007_11891321_4 |
| PublicationCentury | 2000 |
| PublicationDate | 2006 |
| PublicationDateYYYYMMDD | 2006-01-01 |
| PublicationDate_xml | – year: 2006 text: 2006 |
| PublicationDecade | 2000 |
| PublicationPlace | Berlin, Heidelberg |
| PublicationPlace_xml | – name: Berlin, Heidelberg |
| PublicationSeriesSubtitle | Lecture Notes in Artificial Intelligence |
| PublicationSeriesTitle | Lecture Notes in Computer Science |
| PublicationSubtitle | 7th International Workshop on Knowledge Discovery on the Web, WebKDD 2005, Chicago, IL, USA, August 21, 2005. Revised Papers |
| PublicationTitle | Advances in Web Mining and Web Usage Analysis |
| PublicationYear | 2006 |
| Publisher | Springer Berlin Heidelberg |
| Publisher_xml | – name: Springer Berlin Heidelberg |
| RelatedPersons | Kleinberg, Jon M.; Mattern, Friedemann; Nierstrasz, Oscar; Tygar, Dough; Steffen, Bernhard; Kittler, Josef; Vardi, Moshe Y.; Weikum, Gerhard; Sudan, Madhu; Naor, Moni; Mitchell, John C.; Terzopoulos, Demetri; Pandu Rangan, C.; Kanade, Takeo; Hutchison, David |
| RelatedPersons_xml | – sequence: 1 givenname: David surname: Hutchison fullname: Hutchison, David organization: Lancaster University, UK – sequence: 2 givenname: Takeo surname: Kanade fullname: Kanade, Takeo organization: Carnegie Mellon University, Pittsburgh, USA – sequence: 3 givenname: Josef surname: Kittler fullname: Kittler, Josef organization: University of Surrey, Guildford, UK – sequence: 4 givenname: Jon M. surname: Kleinberg fullname: Kleinberg, Jon M. organization: Cornell University, Ithaca, USA – sequence: 5 givenname: Friedemann surname: Mattern fullname: Mattern, Friedemann organization: ETH Zurich, Switzerland – sequence: 6 givenname: John C. surname: Mitchell fullname: Mitchell, John C. organization: Stanford University, CA, USA – sequence: 7 givenname: Moni surname: Naor fullname: Naor, Moni organization: Weizmann Institute of Science, Rehovot, Israel – sequence: 8 givenname: Oscar surname: Nierstrasz fullname: Nierstrasz, Oscar organization: University of Bern, Switzerland – sequence: 9 givenname: C. surname: Pandu Rangan fullname: Pandu Rangan, C. organization: Indian Institute of Technology, Madras, India – sequence: 10 givenname: Bernhard surname: Steffen fullname: Steffen, Bernhard organization: University of Dortmund, Germany – sequence: 11 givenname: Madhu surname: Sudan fullname: Sudan, Madhu organization: Massachusetts Institute of Technology, MA, USA – sequence: 12 givenname: Demetri surname: Terzopoulos fullname: Terzopoulos, Demetri organization: University of California, Los Angeles, USA – sequence: 13 givenname: Dough surname: Tygar fullname: Tygar, Dough organization: University of California, Berkeley, USA – sequence: 14 givenname: Moshe Y. surname: Vardi fullname: Vardi, Moshe Y. organization: Rice University, Houston, USA – sequence: 15 givenname: Gerhard surname: Weikum fullname: Weikum, Gerhard organization: Max-Planck Institute of Computer Science, Saarbruecken, Germany |
| SSID | ssj0002792 ssj0000316079 |
| Score | 1.3856463 |
| Snippet | With the amount of available information on the Web growing rapidly with each day, the need to automatically filter the information in order to ensure greater... |
| SourceID | springer |
| SourceType | Publisher |
| StartPage | 58 |
| Title | Data Sparsity Issues in the Collaborative Filtering Framework |
| URI | http://link.springer.com/10.1007/11891321_4 |
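
The abstract ties recommendation quality to data sparsity and names kNN as the usual collaborative filtering method. The sketch below illustrates both ideas on a toy example. It is not the chapter's implementation: the ratings, the function names, the choice of cosine similarity over co-rated items, and the value k=2 are all assumptions made for illustration.

```python
from math import sqrt

# Toy ratings: user -> {item: rating}. All values are made up for illustration.
ratings = {
    "u1": {"a": 5, "b": 3, "c": 4},
    "u2": {"a": 4, "b": 2, "c": 5, "d": 4},
    "u3": {"a": 1, "b": 5, "d": 2},
    "u4": {"b": 4, "c": 1},
}


def sparsity(ratings, n_items):
    """Fraction of the user-item matrix that holds no rating."""
    filled = sum(len(r) for r in ratings.values())
    return 1.0 - filled / (len(ratings) * n_items)


def cosine(r1, r2):
    """Cosine similarity computed over the items both users rated."""
    common = set(r1) & set(r2)
    if not common:
        return 0.0
    dot = sum(r1[i] * r2[i] for i in common)
    n1 = sqrt(sum(r1[i] ** 2 for i in common))
    n2 = sqrt(sum(r2[i] ** 2 for i in common))
    return dot / (n1 * n2)


def knn_predict(ratings, user, item, k=2):
    """Similarity-weighted mean of the k most similar users' ratings for item."""
    neighbours = [
        (cosine(ratings[user], r), r[item])
        for other, r in ratings.items()
        if other != user and item in r
    ]
    neighbours = sorted(neighbours, reverse=True)[:k]
    total = sum(s for s, _ in neighbours)
    if total == 0:
        return None  # no usable neighbour rated this item
    return sum(s * v for s, v in neighbours) / total


print("sparsity:", sparsity(ratings, n_items=4))
print("predicted rating of 'd' for u1:", knn_predict(ratings, "u1", "d"))
```

As the matrix grows sparser, fewer users share co-rated items, so the neighbour list shrinks and `knn_predict` returns `None` more often. This is one plausible reading of why a neighbourhood method might degrade on highly sparse data, not a result taken from the chapter.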