Comparing State-of-the-Art Collaborative Filtering Systems

Laurent Candillier, Frank Meyer, Marc Boullé
France Telecom R&D Lannion
MLDM 2007

1 Introduction
2 Collaborative approaches
3 Experiments
4 Conclusions
Recommender systems

Help users find the items they are likely to appreciate in huge catalogues [Adomavicius and Tuzhilin, 2005]

⇒ Collaborative filtering: based on a user-to-item rating matrix (? marks the rating to predict)

|    | i1 | i2 | i3 | i4 | i5 |
|----|----|----|----|----|----|
| u1 | 4  | 4  |    |    | 1  |
| u2 | 4  | 3  |    |    |    |
| u3 | 5  |    |    | 2  | 1  |
| u4 |    |    |    | 4  | 5  |
| u5 |    |    | 5  | 4  |    |
| u6 |    | 5  |    | 3  |    |
| u7 | 4  | ?  |    |    | 1  |
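A minimal sketch of how this sparse matrix might be stored, assuming Python dictionaries keyed by user and then by item, where an absent key means "not rated". `RATINGS`, `rated_items`, `raters_of` and `density` are hypothetical names used only for illustration.

```python
# Sparse storage of the toy rating matrix: user -> {item: rating}.
# A missing key means the user has not rated that item.
RATINGS = {
    "u1": {"i1": 4, "i2": 4, "i5": 1},
    "u2": {"i1": 4, "i2": 3},
    "u3": {"i1": 5, "i4": 2, "i5": 1},
    "u4": {"i4": 4, "i5": 5},
    "u5": {"i3": 5, "i4": 4},
    "u6": {"i2": 5, "i4": 3},
    "u7": {"i1": 4, "i5": 1},  # u7's rating on i2 is the "?" to predict
}


def rated_items(ratings, user):
    """S_u: the set of items rated by `user`."""
    return set(ratings[user])


def raters_of(ratings, item):
    """{u | i in S_u}: the users who rated `item`."""
    return {u for u, row in ratings.items() if item in row}


def density(ratings, n_items):
    """Fraction of the users x items matrix that holds an observed rating."""
    n_observed = sum(len(row) for row in ratings.values())
    return n_observed / (len(ratings) * n_items)


if __name__ == "__main__":
    print(sorted(raters_of(RATINGS, "i2")))   # ['u1', 'u2', 'u6']
    print(sorted(rated_items(RATINGS, "u7")))  # ['i1', 'i5']
    print(round(density(RATINGS, 5), 3))       # 0.457
```

Only the observed cells are stored, which is what makes this layout practical when most of the matrix is empty.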
User-based approaches

Recommend items appreciated by users whose tastes are similar to those of the given user [Resnick et al., 1994]

⇒ needs a similarity measure between users, e.g. Pearson similarity: the cosine of the deviations from the mean

$$ w(a,u) = \frac{\sum_{i \in S_a \cap S_u} (v_{ai} - \bar{v}_a)(v_{ui} - \bar{v}_u)}{\sqrt{\sum_{i \in S_a \cap S_u} (v_{ai} - \bar{v}_a)^2 \; \sum_{i \in S_a \cap S_u} (v_{ui} - \bar{v}_u)^2}} $$

- $v_{ui}$: rating of user u on item i
- $S_u$: set of items rated by user u
- $\bar{v}_u$: mean rating of user u

$$ \bar{v}_u = \frac{\sum_{i \in S_u} v_{ui}}{|S_u|} $$
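The Pearson similarity can be sketched as below on the toy matrix from the previous slide. Following the slide, each user's mean is taken over all of $S_u$ while the sums run over $S_a \cap S_u$; returning 0 when the users share no items, or when a denominator is zero, is our own assumption, since the slide does not cover those cases. Function names are hypothetical.

```python
import math

# Toy matrix from the "Recommender systems" slide (user -> {item: rating}).
RATINGS = {
    "u1": {"i1": 4, "i2": 4, "i5": 1},
    "u2": {"i1": 4, "i2": 3},
    "u3": {"i1": 5, "i4": 2, "i5": 1},
    "u4": {"i4": 4, "i5": 5},
    "u5": {"i3": 5, "i4": 4},
    "u6": {"i2": 5, "i4": 3},
    "u7": {"i1": 4, "i5": 1},
}


def mean_rating(ratings, user):
    """v_bar_u: mean of all ratings given by `user` (over S_u)."""
    row = ratings[user]
    return sum(row.values()) / len(row)


def pearson(ratings, a, u):
    """w(a, u): cosine of the deviations from the mean over co-rated items."""
    common = ratings[a].keys() & ratings[u].keys()  # S_a intersected with S_u
    if not common:
        return 0.0  # assumption: no overlap means no evidence of similarity
    mean_a, mean_u = mean_rating(ratings, a), mean_rating(ratings, u)
    num = sum((ratings[a][i] - mean_a) * (ratings[u][i] - mean_u) for i in common)
    sq_a = sum((ratings[a][i] - mean_a) ** 2 for i in common)
    sq_u = sum((ratings[u][i] - mean_u) ** 2 for i in common)
    if sq_a == 0 or sq_u == 0:
        return 0.0  # assumption: an undefined correlation is treated as 0
    return num / math.sqrt(sq_a * sq_u)


if __name__ == "__main__":
    for u in ("u1", "u2", "u6"):
        print(u, round(pearson(RATINGS, "u7", u), 4))  # u1 0.9487 / u2 1.0 / u6 0.0
```

Note that u2 gets a perfect similarity from a single co-rated item; this fragility on small overlaps is one reason variants such as the constrained and adjusted measures are compared in the experiments.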
User-based approaches

Which rating would user a (the active user) give to item i?

Prediction using a weighted sum:

$$ p_{ai} = \frac{\sum_{\{u \mid i \in S_u\}} w(a,u) \times v_{ui}}{\sum_{\{u \mid i \in S_u\}} |w(a,u)|} $$

Prediction using a weighted sum of deviations from the mean:

$$ p_{ai} = \bar{v}_a + \frac{\sum_{\{u \mid i \in S_u\}} w(a,u) \times (v_{ui} - \bar{v}_u)}{\sum_{\{u \mid i \in S_u\}} |w(a,u)|} $$

How many neighbors should be considered?
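Both prediction schemes are sketched below for the "?" in the toy matrix (user u7, item i2). Assumptions not taken from the slides: the K neighbours are the raters of i with the largest $w(a,u)$, and the prediction falls back to $\bar{v}_a$ when every weight is zero. The helper names are hypothetical; the Pearson function repeats the previous sketch so that this block runs on its own.

```python
import math

RATINGS = {
    "u1": {"i1": 4, "i2": 4, "i5": 1},
    "u2": {"i1": 4, "i2": 3},
    "u3": {"i1": 5, "i4": 2, "i5": 1},
    "u4": {"i4": 4, "i5": 5},
    "u5": {"i3": 5, "i4": 4},
    "u6": {"i2": 5, "i4": 3},
    "u7": {"i1": 4, "i5": 1},
}


def mean_rating(ratings, user):
    row = ratings[user]
    return sum(row.values()) / len(row)


def pearson(ratings, a, u):
    common = ratings[a].keys() & ratings[u].keys()
    if not common:
        return 0.0
    ma, mu = mean_rating(ratings, a), mean_rating(ratings, u)
    num = sum((ratings[a][i] - ma) * (ratings[u][i] - mu) for i in common)
    sq_a = sum((ratings[a][i] - ma) ** 2 for i in common)
    sq_u = sum((ratings[u][i] - mu) ** 2 for i in common)
    return num / math.sqrt(sq_a * sq_u) if sq_a and sq_u else 0.0


def neighbours(ratings, a, item, k):
    """The k users who rated `item`, ranked by w(a, u) (ranking rule assumed)."""
    scored = [(pearson(ratings, a, u), u) for u in ratings if u != a and item in ratings[u]]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def predict_weighted(ratings, a, item, k):
    """p_ai = sum w(a,u) * v_ui / sum |w(a,u)| over the k neighbours."""
    nbrs = neighbours(ratings, a, item, k)
    norm = sum(abs(w) for w, _ in nbrs)
    if norm == 0:
        return mean_rating(ratings, a)  # assumption: fall back to the user's mean
    return sum(w * ratings[u][item] for w, u in nbrs) / norm


def predict_deviation(ratings, a, item, k):
    """p_ai = v_bar_a + sum w(a,u) * (v_ui - v_bar_u) / sum |w(a,u)|."""
    nbrs = neighbours(ratings, a, item, k)
    mean_a = mean_rating(ratings, a)
    norm = sum(abs(w) for w, _ in nbrs)
    if norm == 0:
        return mean_a
    dev = sum(w * (ratings[u][item] - mean_rating(ratings, u)) for w, u in nbrs)
    return mean_a + dev / norm


if __name__ == "__main__":
    print(round(predict_weighted(RATINGS, "u7", "i2", k=2), 3))   # 3.487
    print(round(predict_deviation(RATINGS, "u7", "i2", k=2), 3))  # 2.73
```

On this toy data the two schemes disagree (about 3.49 versus 2.73): u7 rates lower on average (2.5) than its neighbours u1 (3.0) and u2 (3.5), and only the deviation scheme corrects for that offset.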
Cluster-based approaches

Recommend items appreciated by users that belong to the same group as the given user [Breese et al., 1998]

⇒ needs
- a clustering method, e.g. K-means
- a distance measure, e.g. Euclidean distance

The rating of a user on an item is then the mean rating given to that item by the users in the same cluster

How many clusters should be considered?
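A minimal cluster-based sketch, assuming plain K-means with Euclidean distance on dense user profiles. The slides do not say how unrated items are handled before clustering; here each missing rating is filled with the user's own mean, and the first K users seed the centroids so that the run is deterministic. Both choices, and all names, are hypothetical.

```python
import math

ITEMS = ["i1", "i2", "i3", "i4", "i5"]
RATINGS = {
    "u1": {"i1": 4, "i2": 4, "i5": 1},
    "u2": {"i1": 4, "i2": 3},
    "u3": {"i1": 5, "i4": 2, "i5": 1},
    "u4": {"i4": 4, "i5": 5},
    "u5": {"i3": 5, "i4": 4},
    "u6": {"i2": 5, "i4": 3},
    "u7": {"i1": 4, "i5": 1},
}


def mean_rating(ratings, user):
    row = ratings[user]
    return sum(row.values()) / len(row)


def to_vector(ratings, user, items):
    """Dense profile; unrated items are filled with the user's own mean (assumption)."""
    fill = mean_rating(ratings, user)
    return [ratings[user].get(i, fill) for i in items]


def euclidean(x, y):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def kmeans(vectors, k, n_iter=20):
    """Lloyd's K-means; the first k users seed the centroids (assumption)."""
    names = list(vectors)
    centroids = [list(vectors[n]) for n in names[:k]]
    assign = {}
    for _ in range(n_iter):
        new_assign = {
            n: min(range(k), key=lambda c: euclidean(vectors[n], centroids[c]))
            for n in names
        }
        if new_assign == assign:
            break  # converged: no user changed cluster
        assign = new_assign
        for c in range(k):
            members = [vectors[n] for n in names if assign[n] == c]
            if members:  # keep the previous centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def predict_cluster(ratings, assign, user, item):
    """Mean rating on `item` given by the other members of `user`'s cluster."""
    peers = [
        ratings[u][item]
        for u in ratings
        if u != user and assign[u] == assign[user] and item in ratings[u]
    ]
    if not peers:
        return mean_rating(ratings, user)  # assumption: fall back to the user's mean
    return sum(peers) / len(peers)


if __name__ == "__main__":
    vectors = {u: to_vector(RATINGS, u, ITEMS) for u in RATINGS}
    assign = kmeans(vectors, k=2)
    print(assign)
    print(predict_cluster(RATINGS, assign, "u7", "i2"))  # 4.0
```

With K = 2 the toy users split into {u1, u3, u7} and {u2, u4, u5, u6}, so u7's rating on i2 is predicted from u1 alone, giving 4.0. Once clusters are built, prediction only scans one cluster, which fits the low prediction time reported for BestCluster.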
Item-based approaches

Recommend items similar to those appreciated by the given user [Karypis, 2001]

⇒ dual of the user-based approach

$$ p_{ai} = \bar{v}_i + \frac{\sum_{\{j \in S_a \mid j \neq i\}} \mathrm{sim}(i,j) \times (v_{aj} - \bar{v}_j)}{\sum_{\{j \in S_a \mid j \neq i\}} |\mathrm{sim}(i,j)|} $$

- $\mathrm{sim}(i,j)$: similarity measure between items i and j
- $S_a$: set of items rated by user a
- $\bar{v}_i$: mean rating on item i

How many neighbors should be considered?
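The item-based formula can be sketched as follows. The best item-based variant in these experiments uses a probabilistic similarity that is not defined on these slides, so this sketch swaps in a plain cosine between item rating columns, computed over users who rated both items; it stands in for $\mathrm{sim}(i,j)$ only as an assumption. Keeping the K items of $S_a$ most similar to $i$ is likewise assumed, and all names are hypothetical.

```python
import math

RATINGS = {
    "u1": {"i1": 4, "i2": 4, "i5": 1},
    "u2": {"i1": 4, "i2": 3},
    "u3": {"i1": 5, "i4": 2, "i5": 1},
    "u4": {"i4": 4, "i5": 5},
    "u5": {"i3": 5, "i4": 4},
    "u6": {"i2": 5, "i4": 3},
    "u7": {"i1": 4, "i5": 1},
}


def item_mean(ratings, item):
    """v_bar_i: mean rating received by `item`."""
    vals = [row[item] for row in ratings.values() if item in row]
    return sum(vals) / len(vals)


def cosine_items(ratings, i, j):
    """Stand-in sim(i, j): cosine over the users who rated both items."""
    common = [row for row in ratings.values() if i in row and j in row]
    if not common:
        return 0.0
    dot = sum(row[i] * row[j] for row in common)
    norm_i = math.sqrt(sum(row[i] ** 2 for row in common))
    norm_j = math.sqrt(sum(row[j] ** 2 for row in common))
    return dot / (norm_i * norm_j)


def predict_item_based(ratings, a, i, k):
    """p_ai = v_bar_i + sum sim(i,j) * (v_aj - v_bar_j) / sum |sim(i,j)|, j in S_a, j != i."""
    scored = sorted(
        ((cosine_items(ratings, i, j), j) for j in ratings[a] if j != i),
        reverse=True,
    )[:k]  # the k items of S_a most similar to i
    mean_i = item_mean(ratings, i)
    norm = sum(abs(s) for s, _ in scored)
    if norm == 0:
        return mean_i  # assumption: fall back to the item's mean
    dev = sum(s * (ratings[a][j] - item_mean(ratings, j)) for s, j in scored)
    return mean_i + dev / norm


if __name__ == "__main__":
    print(round(predict_item_based(RATINGS, "u7", "i2", k=2), 3))  # 3.373
```

Item means and item-item similarities do not depend on the active user, so in a real system they could be precomputed once, which is consistent with the short prediction time reported for BestItem.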
Experiments

For user- and item-based approaches, choose:
- the similarity measure
- the prediction scheme
- the neighborhood size K

For cluster-based approaches, choose:
- the distance measure
- the prediction scheme
- the number of clusters

Evaluation protocol [Herlocker et al., 2004]
- movie rating dataset: MovieLens (6040 users × 3706 movies)
- 10-fold cross-validation (10 runs, each learning on 9/10 of the ratings)
- Mean Absolute Error (MAE) on the test set $T = \{(u, i, r)\}$:

$$ \mathrm{MAE} = \frac{1}{|T|} \sum_{(u,i,r) \in T} |p_{ui} - r| $$
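The evaluation protocol can be sketched as a generic k-fold loop. The toy triples, the user-mean baseline passed in as the model, the fixed shuffling seed and the choice to average the per-fold MAEs are all illustrative assumptions; any of the predictors above could be wrapped as the `fit` argument instead.

```python
import random

# Toy (user, item, rating) triples flattened from the example matrix.
TRIPLES = [
    ("u1", "i1", 4), ("u1", "i2", 4), ("u1", "i5", 1),
    ("u2", "i1", 4), ("u2", "i2", 3),
    ("u3", "i1", 5), ("u3", "i4", 2), ("u3", "i5", 1),
    ("u4", "i4", 4), ("u4", "i5", 5),
    ("u5", "i3", 5), ("u5", "i4", 4),
    ("u6", "i2", 5), ("u6", "i4", 3),
    ("u7", "i1", 4), ("u7", "i5", 1),
]


def mae(predictions, test):
    """MAE = (1/|T|) * sum over (u, i, r) in T of |p_ui - r|."""
    return sum(abs(p - r) for p, (_, _, r) in zip(predictions, test)) / len(test)


def k_folds(triples, k=10, seed=0):
    """Shuffle once, then deal the triples into k disjoint folds."""
    data = list(triples)
    random.Random(seed).shuffle(data)
    return [data[f::k] for f in range(k)]


def fit_user_mean(train):
    """Hypothetical baseline model: the user's mean training rating (global mean if unseen)."""
    sums, counts = {}, {}
    for u, _, r in train:
        sums[u] = sums.get(u, 0) + r
        counts[u] = counts.get(u, 0) + 1
    global_mean = sum(r for _, _, r in train) / len(train)

    def predict(u, i):
        return sums[u] / counts[u] if u in counts else global_mean

    return predict


def cross_validate(triples, fit, k=10, seed=0):
    """k-fold CV: learn on k-1 folds (9/10 of the data for k=10), test on the last one."""
    folds = k_folds(triples, k, seed)
    scores = []
    for f in range(k):
        test = folds[f]
        train = [t for g in range(k) if g != f for t in folds[g]]
        predict = fit(train)
        scores.append(mae([predict(u, i) for u, i, _ in test], test))
    return sum(scores) / k  # assumption: average of the per-fold MAEs


if __name__ == "__main__":
    print(round(cross_validate(TRIPLES, fit_user_mean, k=10), 4))
```

Every rating lands in exactly one test fold, so each approach is scored on the same held-out ratings, which is what "tested under exactly the same conditions" requires.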
User-based approaches, similarity measures

[Figure: MAE versus neighborhood size K; one curve per similarity measure: Pearson, Constraint, Cosine, Adjusted, Proba]
User-based approaches, prediction schemes

[Figure: MAE versus neighborhood size K; curves: PearsonWeighted, PearsonDeviation, ProbaWeighted, ProbaDeviation]
Item-based approaches, similarity measures

[Figure: MAE versus neighborhood size K; one curve per similarity measure: Pearson, Constraint, Cosine, Adjusted, Proba]
Summary of experiments

|                             | BestDefault | BestUser | BestItem | BestCluster |
|-----------------------------|-------------|----------|----------|-------------|
| Model construction time (s) | 1           | 730      | 170      | 254         |
| Prediction time (s)         | 1           | 31       | 3        | 1           |
| MAE                         | 0.6829      | 0.6688   | 0.6382   | 0.6736      |

- BestDefault: Bayes minimizing MAE
- BestUser: Pearson similarity, 1500 neighbors, prediction using deviation from the mean
- BestItem: probabilistic similarity, 400 neighbors, prediction using deviation from the mean
- BestCluster: K-means, Euclidean distance, 4 clusters, prediction using Bayes minimizing MAE
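The slides do not detail the "Bayes minimizing MAE" predictor. One general fact behind the name: for any distribution over the rating scale, the point prediction that minimizes the expected absolute error is a median of that distribution, not its mean. A minimal sketch of that decision rule follows, assuming a hypothetical empirical distribution built from a list of observed ratings; how the authors actually estimate the distribution is not stated here, and all names are hypothetical.

```python
from collections import Counter


def empirical_dist(observed):
    """Hypothetical estimate of P(r): relative frequency of each rating value."""
    counts = Counter(observed)
    n = len(observed)
    return {r: c / n for r, c in counts.items()}


def expected_abs_error(dist, x):
    """E|x - R| when the rating R follows `dist`."""
    return sum(p * abs(x - r) for r, p in dist.items())


def bayes_mae_prediction(dist, scale=(1, 2, 3, 4, 5)):
    """Rating on the scale with the lowest expected absolute error (a median of dist)."""
    return min(scale, key=lambda x: expected_abs_error(dist, x))


if __name__ == "__main__":
    dist = empirical_dist([1, 1, 5, 1])  # e.g. the ratings of item i5 in the toy matrix
    print(bayes_mae_prediction(dist))  # 1
    print(expected_abs_error(dist, 1), expected_abs_error(dist, 2))  # 1.0 1.5
```

Predicting the mean (2) would give an expected absolute error of 1.5 here, versus 1.0 for the median, which is why a rule tuned for MAE favours the median.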
Conclusions

- All approaches, with all their possible options, are tested under exactly the same conditions
- Bayes is a good compromise: low error rate, low execution time, incremental
- Deviation from the mean: better results; new for item-based approaches
- Similarity measures: Pearson for user-based, probabilistic for item-based
Conclusions

The item-based approach:
- gets the best performance in the experiments
- seems to need fewer neighbors than the user-based approach
- is also suitable for navigating item catalogues, even with no user information
- can naturally use content data about items to improve its results (likewise the user-based approach with demographic data)
- do its results depend on the number of items relative to the number of users?
Next

Need to scale well even on huge datasets, e.g. the Netflix Prize: 100,480,507 ratings from 480,189 users on 17,770 movies
- select the most relevant users [Yu et al., 2002]
- reduce dimensionality with PCA or SVD [Goldberg et al., 2001; Vozalis and Margaritis, 2005]
- create a set of super-users [Rashid et al., 2006]
- sampling? stochastic? bagging?

Combine approaches ⇒ ensemble methods [Polikar, 2006]
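Combining approaches can be illustrated with the simplest possible ensemble: a weighted average of the ratings predicted by two models, with the mixing weight picked on validation data. This is a minimal sketch under our own assumptions (a 1 to 5 rating scale, a coarse grid of weights, MAE as the selection criterion); the slides only name ensemble methods as a direction and do not specify a combiner. `blend` and `fit_pair_weight` are hypothetical names.

```python
def blend(predictions, weights=None, lo=1.0, hi=5.0):
    """Weighted average of several predictors' outputs, clipped to [lo, hi]."""
    if weights is None:
        weights = [1.0] * len(predictions)
    value = sum(w * p for w, p in zip(weights, predictions)) / sum(weights)
    return min(hi, max(lo, value))


def fit_pair_weight(preds_a, preds_b, truth, grid=None):
    """Pick alpha in `grid` minimising the validation MAE of alpha*a + (1-alpha)*b."""
    if grid is None:
        grid = [step / 10 for step in range(11)]

    def val_mae(alpha):
        errs = [
            abs(blend([a, b], [alpha, 1 - alpha]) - t)
            for a, b, t in zip(preds_a, preds_b, truth)
        ]
        return sum(errs) / len(errs)

    return min(grid, key=val_mae)


if __name__ == "__main__":
    truth = [4.0, 3.0, 2.0]       # hypothetical held-out ratings
    model_a = [4.0, 3.0, 2.0]     # a model that happens to be exact
    model_b = [5.0, 4.0, 3.0]     # a model that is always one point too high
    print(fit_pair_weight(model_a, model_b, truth))  # 1.0
```

The weight should be chosen on ratings held out from training and never on the test fold, otherwise the reported MAE would be optimistic.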
P. Resnick, N. Iacovou, M. Suchak, P. Bergstrom and J. Riedl (1994). GroupLens: an open architecture for collaborative filtering of netnews. In Conference on Computer Supported Cooperative Work, pages 175–186. ACM.

J. Breese, D. Heckerman and C. Kadie (1998). Empirical analysis of predictive algorithms for collaborative filtering. In 14th Conference on Uncertainty in Artificial Intelligence, pages 43–52. Morgan Kaufmann.

G. Karypis (2001). Evaluation of item-based top-N recommendation algorithms. In 10th International Conference on Information and Knowledge Management, pages 247–254.

K. Goldberg, T. Roeder, D. Gupta and C. Perkins (2001). Eigentaste: a constant time collaborative filtering algorithm. Information Retrieval, 4(2):133–151.

K. Yu, X. Xu, J. Tao, M. Ester and H. Kriegel (2002). Instance selection techniques for memory-based collaborative filtering. In SIAM Data Mining.

J. Herlocker, J. Konstan, L. Terveen and J. Riedl (2004). Evaluating collaborative filtering recommender systems. ACM Transactions on Information Systems, 22(1):5–53.

G. Adomavicius and A. Tuzhilin (2005). Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6):734–749.

M. Vozalis and K. Margaritis (2005). Applying SVD on item-based filtering. In 5th International Conference on Intelligent Systems Design and Applications, pages 464–469.

A.M. Rashid, S.K. Lam, G. Karypis and J. Riedl (2006). ClustKNN: a highly scalable hybrid model- & memory-based CF algorithm. In KDD Workshop on Web Mining and Web Usage Analysis.

R. Polikar (2006). Ensemble based systems in decision making. IEEE Circuits & Systems Magazine, 6(3):21–45.


Similar to Comparing State-of-the-Art Collaborative Filtering Systems

- Additive Smoothing for Relevance-Based Language Modelling of Recommender Syst...
- Language Models for Collaborative Filtering Neighbourhoods [ECIR '16 Slides]
- Matrix Factorization Technique for Recommender Systems
- Computing Neighbourhoods with Language Models in a Collaborative Filtering Sc...
- Harnessing Ratings and Aspect-Sentiment to Estimate Contradiction Intensity i...
- Experimental research design.revised
- Replicable Evaluation of Recommender Systems
- Metabolomic Data Analysis Workshop and Tutorials (2014)
- Calibration approach for parameter estimation.pptx
- Probabilistic Collaborative Filtering with Negative Cross Entropy
- The Impact of Formative Assessment on EFL Learners’ Vocabulary Enhancement by...
- Introduzione ai differenti approcci alla stima dell'incertezza di misura Nari...
- BEARS: Towards an Evaluation Framework for Bandit-based Interactive Recommend...
- Your Classifier is Secretly an Energy based model and you should treat it lik...
- IRJET- Effectiveness of Constructivist Instructional Approach on Achievem...
- Machine Learning and Artificial Neural Networks.ppt
- Crystallization classification semisupervised
- Finding and Quantifying Temporal-Aware Contradiction in Reviews
- A Novel Nonadditive Collaborative-Filtering Approach Using Multicriteria Ratings
- A new similarity measurement based on hellinger distance for collaborating fi...

Más de nextlib

Hadoop Map Reduce Arch
Hadoop Map Reduce ArchHadoop Map Reduce Arch
Hadoop Map Reduce Arch
nextlib
 
D Rb Silicon Valley Ruby Conference
D Rb   Silicon Valley Ruby ConferenceD Rb   Silicon Valley Ruby Conference
D Rb Silicon Valley Ruby Conference
nextlib
 
Multi-core architectures
Multi-core architecturesMulti-core architectures
Multi-core architectures
nextlib
 
Aldous Huxley Brave New World
Aldous Huxley Brave New WorldAldous Huxley Brave New World
Aldous Huxley Brave New World
nextlib
 
Social Graph
Social GraphSocial Graph
Social Graph
nextlib
 
Ajax Prediction
Ajax PredictionAjax Prediction
Ajax Prediction
nextlib
 
SVD review
SVD reviewSVD review
SVD review
nextlib
 
Mongrel Handlers
Mongrel HandlersMongrel Handlers
Mongrel Handlers
nextlib
 
Blue Ocean Strategy
Blue Ocean StrategyBlue Ocean Strategy
Blue Ocean Strategy
nextlib
 
日本7-ELEVEN消費心理學
日本7-ELEVEN消費心理學日本7-ELEVEN消費心理學
日本7-ELEVEN消費心理學
nextlib
 
Item Based Collaborative Filtering Recommendation Algorithms
Item Based Collaborative Filtering Recommendation AlgorithmsItem Based Collaborative Filtering Recommendation Algorithms
Item Based Collaborative Filtering Recommendation Algorithms
nextlib
 
Agile Adoption2007
Agile Adoption2007Agile Adoption2007
Agile Adoption2007
nextlib
 
Modern Compiler Design
Modern Compiler DesignModern Compiler Design
Modern Compiler Design
nextlib
 
透过众神的眼睛--鸟瞰非洲
透过众神的眼睛--鸟瞰非洲透过众神的眼睛--鸟瞰非洲
透过众神的眼睛--鸟瞰非洲
nextlib
 
Improving Quality of Search Results Clustering with Approximate Matrix Factor...
Improving Quality of Search Results Clustering with Approximate Matrix Factor...Improving Quality of Search Results Clustering with Approximate Matrix Factor...
Improving Quality of Search Results Clustering with Approximate Matrix Factor...
nextlib
 
Support Vector Machines
Support Vector MachinesSupport Vector Machines
Support Vector Machines
nextlib
 
Bigtable
BigtableBigtable
Bigtable
nextlib
 

Más de nextlib (20)

Nio
NioNio
Nio
 
Hadoop Map Reduce Arch
Hadoop Map Reduce ArchHadoop Map Reduce Arch
Hadoop Map Reduce Arch
 
D Rb Silicon Valley Ruby Conference
D Rb   Silicon Valley Ruby ConferenceD Rb   Silicon Valley Ruby Conference
D Rb Silicon Valley Ruby Conference
 
Multi-core architectures
Multi-core architecturesMulti-core architectures
Multi-core architectures
 
Aldous Huxley Brave New World
Aldous Huxley Brave New WorldAldous Huxley Brave New World
Aldous Huxley Brave New World
 
Social Graph
Social GraphSocial Graph
Social Graph
 
Ajax Prediction
Ajax PredictionAjax Prediction
Ajax Prediction
 
Closures for Java
Closures for JavaClosures for Java
Closures for Java
 
A Content-Driven Reputation System for the Wikipedia
A Content-Driven Reputation System for the WikipediaA Content-Driven Reputation System for the Wikipedia
A Content-Driven Reputation System for the Wikipedia
 
SVD review
SVD reviewSVD review
SVD review
 
Mongrel Handlers
Mongrel HandlersMongrel Handlers
Mongrel Handlers
 
Blue Ocean Strategy
Blue Ocean StrategyBlue Ocean Strategy
Blue Ocean Strategy
 
日本7-ELEVEN消費心理學
日本7-ELEVEN消費心理學日本7-ELEVEN消費心理學
日本7-ELEVEN消費心理學
 
Item Based Collaborative Filtering Recommendation Algorithms
Item Based Collaborative Filtering Recommendation AlgorithmsItem Based Collaborative Filtering Recommendation Algorithms
Item Based Collaborative Filtering Recommendation Algorithms
 
Agile Adoption2007
Agile Adoption2007Agile Adoption2007
Agile Adoption2007
 
Modern Compiler Design
Modern Compiler DesignModern Compiler Design
Modern Compiler Design
 
透过众神的眼睛--鸟瞰非洲
透过众神的眼睛--鸟瞰非洲透过众神的眼睛--鸟瞰非洲
透过众神的眼睛--鸟瞰非洲
 
Improving Quality of Search Results Clustering with Approximate Matrix Factor...
Improving Quality of Search Results Clustering with Approximate Matrix Factor...Improving Quality of Search Results Clustering with Approximate Matrix Factor...
Improving Quality of Search Results Clustering with Approximate Matrix Factor...
 
Support Vector Machines
Support Vector MachinesSupport Vector Machines
Support Vector Machines
 
Bigtable
BigtableBigtable
Bigtable
 

Último

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Victor Rentea
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
WSO2
 

Último (20)

Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 
Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)Introduction to Multilingual Retrieval Augmented Generation (RAG)
Introduction to Multilingual Retrieval Augmented Generation (RAG)
 
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdfRising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
Rising Above_ Dubai Floods and the Fortitude of Dubai International Airport.pdf
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot ModelMcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Mcleodganj Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
Navigating the Deluge_ Dubai Floods and the Resilience of Dubai International...
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Vector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptxVector Search -An Introduction in Oracle Database 23ai.pptx
Vector Search -An Introduction in Oracle Database 23ai.pptx
 
Six Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal OntologySix Myths about Ontologies: The Basics of Formal Ontology
Six Myths about Ontologies: The Basics of Formal Ontology
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
Corporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptxCorporate and higher education May webinar.pptx
Corporate and higher education May webinar.pptx
 

Comparing State-of-the-Art Collaborative Filtering Systems

  • 1. Comparing State-of-the-Art Collaborative Filtering Systems Laurent Candillier, Frank Meyer, Marc Boull´ e Introduction France Telecom R&D Lannion Collaborative approaches MLDM 2007 Experiments Conclusions 1 Introduction 2 Collaborative approaches 3 Experiments 4 Conclusions
  • 2. Recommender systems Help users find items they should appreciate from huge catalogues [Adomavicius and Tuzhilin, 2005] Introduction Collaborative approaches ⇒ Collaborative filtering : based on user to item rating matrix Experiments Conclusions i1 i2 i3 i4 i5 4 4 1 u1 4 3 u2 5 2 1 u3 4 5 u4 5 4 u5 5 3 u6 4 ? 1 u7
  • 3. User-based approaches Recommend items appreciated by users whose tastes are similar to the ones of the given user [Resnick et al., 1994] Introduction ⇒ need a similarity measure between users Collaborative approaches ex : pearson similarity : cosine of deviation from the mean Experiments Conclusions i ∈Sa ∩Su (vai − va )(vui − vu ) w (a, u) = − va )2 − vu )2 i ∈Sa ∩Su (vai i ∈Sa ∩Su (vui vui : rating of user u on item i Su : set of items rated by user u vu : mean rating of user u vui i ∈Su vu = |Su |
  • 4. User-based approaches Which rating for user a (active) on item i ? Introduction Collaborative approaches Prediction using weighted sum Experiments {u|i ∈Su } w (a, u) × vui Conclusions pai = {u|i ∈Su } |w (a, u)| Prediction using weighted sum of deviations from the mean {u|i ∈Su } w (a, u) × (vui − vu ) pai = va + {u|i ∈Su } |w (a, u)| How many neighbors considered ?
  • 5. Cluster-based approaches Recommend items appreciated by users that belong to the Introduction same group as the given user [Breese et al., 1998] Collaborative approaches Experiments ⇒ need Conclusions a clustering method : ex : K-means a distance measure : ex : euclidian distance Then the rating of a user on an item is the mean rating given by the users that belong to the same cluster How many clusters considered ?
  • 6. Item-based approaches Recommend items similar to those appreciated by the given user [Karypis, 2001] Introduction Collaborative approaches ⇒ dual of user-based approach Experiments Conclusions × (vaj − vj ) {j∈Sa |j=i } sim(i , j) pai = vi + {j∈Sa |j=i } |sim(i , j)| sim(i , j) : similarity measure between items i and j Sa : set of items rated by user a vi : mean rating on item i How many neighbors considered ?
  • 7. Experiments For user- and item-based approaches, choose similarity measure prediction scheme Introduction Collaborative neighborhood size K approaches For cluster-based approaches, choose Experiments distance measure Conclusions prediction scheme number of clusters Evaluation protocol [Herlocker et al., 2004] movie rating dataset : MovieLens (6040 × 3706) 10-fold cross validation (10 × 9/10th for learning) Mean Absolute Error Rate on test set T = {(u, i , r )} 1 MAE = |pui − r | |T | (u,i ,r )∈T
  • 8. User-based approaches, similarity measures
[Figure: MAE as a function of the neighborhood size K for user-based approaches, comparing the Pearson, Constraint, Cosine, Adjusted, and Proba similarity measures.]
  • 9. User-based approaches, prediction schemes
[Figure: MAE as a function of the neighborhood size K for user-based approaches, comparing the PearsonWeighted, PearsonDeviation, ProbaWeighted, and ProbaDeviation prediction schemes.]
  • 10. Item-based approaches, similarity measures
[Figure: MAE as a function of the neighborhood size K for item-based approaches, comparing the Pearson, Constraint, Cosine, Adjusted, and Proba similarity measures.]
  • 11. Summary of experiments

                              BestDefault  BestUser  BestItem  BestCluster
Model construction time (s)             1       730       170          254
Prediction time (s)                     1        31         3            1
MAE                                0.6829    0.6688    0.6382       0.6736

BestDefault: Bayes minimizing MAE
BestUser: Pearson similarity, 1500 neighbors, prediction using deviation from the mean
BestItem: probabilistic similarity, 400 neighbors, prediction using deviation from the mean
BestCluster: K-means, Euclidean distance, 4 clusters, prediction using Bayes minimizing MAE
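"Bayes minimizing MAE" can be read as a decision rule: given a probability distribution over the rating values, predict the value with the lowest expected absolute error, which is a median of that distribution rather than its mean. The sketch below shows that rule. How the slides estimate the distribution is not stated, so `naive_bayes_dist` is a hypothetical stand-in: it combines the user's and the item's rating histograms as P(r | u) P(r | i) / P(r), which follows from assuming u and i independent given r, with Laplace smoothing `alpha`. Both the 1 to 5 scale constant and the function names are assumptions.

```python
from collections import Counter

RATINGS = (1, 2, 3, 4, 5)  # assumed rating scale (MovieLens uses 1 to 5 stars)


def mae_optimal(p):
    """Rating value minimizing the expected absolute error under p = {rating: prob}."""
    return min(RATINGS, key=lambda r: sum(q * abs(r - s) for s, q in p.items()))


def naive_bayes_dist(user_ratings, item_ratings, all_ratings, alpha=1.0):
    """Hypothetical estimate of P(r | u, i), proportional to P(r | u) * P(r | i) / P(r)."""
    def hist(vals):
        c = Counter(vals)
        total = len(vals) + alpha * len(RATINGS)
        return {r: (c[r] + alpha) / total for r in RATINGS}

    pu, pi, prior = hist(user_ratings), hist(item_ratings), hist(all_ratings)
    scores = {r: pu[r] * pi[r] / prior[r] for r in RATINGS}
    z = sum(scores.values())
    return {r: s / z for r, s in scores.items()}
```

For instance, under {1: 0.1, 2: 0.1, 3: 0.2, 4: 0.4, 5: 0.2} the mean is 3.5, but `mae_optimal` returns 4, the median. Since such a model only keeps rating counts, it can be updated incrementally as new ratings arrive.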
  • 12. Conclusions
All approaches, with all their possible options, are tested under exactly the same conditions.
Bayes is a good compromise: low error rate, low execution time, and incremental.
Deviation from the mean gives better results, and is new for item-based approaches.
Similarity measures: Pearson for user-based, probabilistic for item-based.
  • 13. Conclusions
The item-based approach:
gets the best performance in the experiments;
seems to need fewer neighbors than the user-based approach;
is also appropriate for navigating item catalogues, even with no user information;
may naturally use content data about items to improve its results (likewise for the user-based approach with demographic data).
Do the results depend on the number of items compared to the number of users?
  • 14. Next
Need to scale well even when faced with huge datasets, e.g. the Netflix Prize: 100,480,507 ratings from 480,189 users on 17,770 movies.
Select the most relevant users [Yu et al., 2002]; reduce dimensionality with PCA or SVD [Goldberg et al., 2001; Vozalis and Margaritis, 2005]; create a set of super-users [Rashid et al., 2006]; sampling? stochastic methods? bagging?
Combine approaches ⇒ ensemble methods [Polikar, 2006]
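The simplest way to combine approaches is a weighted average of their predictions. The sketch below assumes each base model is exposed as a callable `predict(u, i)` and that the weights would be tuned on held-out data; the function name, the weighting scheme, and the clipping to a 1 to 5 scale are assumptions, and Polikar's survey covers many richer ensemble schemes.

```python
def ensemble_predict(predictors, weights, u, i, lo=1.0, hi=5.0):
    """Weighted average of base-model predictions, clipped to the rating scale [lo, hi]."""
    total = sum(weights)
    p = sum(w * f(u, i) for f, w in zip(predictors, weights)) / total
    return min(hi, max(lo, p))
```

For example, averaging a user-based and an item-based predictor with equal weights turns predictions of 4.0 and 3.0 into 3.5.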
  • 15. References
P. Resnick, N. Iacovou, M. Suchak, P. Bergstrom and J. Riedl (1994). GroupLens: an open architecture for collaborative filtering of netnews. In Conference on Computer Supported Cooperative Work, pages 175–186. ACM.
J. Breese, D. Heckerman and C. Kadie (1998). Empirical analysis of predictive algorithms for collaborative filtering. In 14th Conference on Uncertainty in Artificial Intelligence, pages 43–52. Morgan Kaufmann.
G. Karypis (2001). Evaluation of item-based top-N recommendation algorithms. In 10th International Conference on Information and Knowledge Management, pages 247–254.
  • 16. References (continued)
K. Goldberg, T. Roeder, D. Gupta and C. Perkins (2001). Eigentaste: a constant time collaborative filtering algorithm. Information Retrieval, 4(2):133–151.
K. Yu, X. Xu, J. Tao, M. Ester and H. Kriegel (2002). Instance selection techniques for memory-based collaborative filtering. In SIAM Data Mining.
J. Herlocker, J. Konstan, L. Terveen and J. Riedl (2004). Evaluating collaborative filtering recommender systems. ACM Transactions on Information Systems, 22(1):5–53.
G. Adomavicius and A. Tuzhilin (2005). Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions. IEEE Transactions on Knowledge and Data Engineering, 17(6):734–749.
  • 17. References (continued)
M. Vozalis and K. Margaritis (2005). Applying SVD on item-based filtering. In 5th International Conference on Intelligent Systems Design and Applications, pages 464–469.
A.M. Rashid, S.K. Lam, G. Karypis and J. Riedl (2006). ClustKNN: a highly scalable hybrid model- & memory-based CF algorithm. In KDD Workshop on Web Mining and Web Usage Analysis.
R. Polikar (2006). Ensemble systems in decision making. IEEE Circuits & Systems Magazine, 6(3):21–45.