Coping with the Persistent Coldstart Problem

Siarhei Bykau, Georgia Koutrika, Yannis Velegrakis
PersDB, 30.08.2013

Siarhei Bykau, U of Trento
Recommendation Systems
● Amazon (products)
● Netflix (movies)
● Facebook (friends)
● Google (news)
● Twitter (who to follow)

Recommendation Approaches
● Content-based filtering (CB)
– build user's profile & look for similar items
● Collaborative filtering (CF)
– find users with similar tastes
● Hybrid
– combine previous two

Course Evaluations
cid    year  area  instructor  trimester  exam     student  rating
cs343  2011  DB    Fox         1          written  s5       avg
cs343  2010  DB    Fox         1          written  s6       low
cs241  2010  PL    Smith       2          oral     s9       avg
cs241  2011  PL    Smith       2          oral     s5       low
cs241  2011  PL    Smith       2          oral     s1       high
cs120  2008  OS    Fox         1          oral     s4       low
cs120  2009  OS    Fox         1          oral     s4       high
cs400  2010  DB    Newton      3          oral     s20      high

Course Evaluations
cs241 2012 PL Smith 2 oral s19 ?

Course Evaluations
cs421 2012 DB Fox 3 oral s19 ?

Cold-Start Problem
● Existing users, existing items: collaborative filtering, content-based filtering, hybrid approaches, SVD, ...
● New users, existing items: recommend highly-rated items to new users
● Existing users, new items: recommend new items to existing users based on the users' historical ratings and features of items
● New users, new items: we are here

Cold-Start: Existing Approaches
● Random recommendations
● External knowledge
– social network [Guy et al. 2009]
– trust network [Jamali et al. 2010]
– ontologies [Middleton et al. 2002]
● Interviews [Rashid et al. 2002]
● Pairwise regression [Park et al. 2009]

Similarity Based Predictions
● Similar items have similar ratings:
● Similarity between two items:
● Pick only topK similar items
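The formulas for this slide did not survive in the text, so the following is only a minimal sketch of the idea, not the authors' definition. It assumes Jaccard overlap of feature sets as the similarity measure, a similarity-weighted mean over the top-K neighbours as the prediction, and a numeric scale of low=1, avg=2, high=3. The names `jaccard` and `predict_by_similarity` are hypothetical.

```python
def jaccard(a, b):
    """Similarity of two items as overlap of their feature sets (assumed measure)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def predict_by_similarity(new_item, rated_items, k=3):
    """Similarity-weighted mean rating of the k most similar rated items.

    rated_items is a list of (features, numeric_rating) pairs.
    Returns None when no rated item shares a feature with new_item.
    """
    scored = [(jaccard(new_item, feats), rating) for feats, rating in rated_items]
    top = sorted(scored, key=lambda sr: sr[0], reverse=True)[:k]
    total = sum(sim for sim, _ in top)
    if total == 0:
        return None
    return sum(sim * rating for sim, rating in top) / total


# A few rows of the course table as (area, instructor, exam) feature sets,
# with ratings mapped low=1, avg=2, high=3 (assumed scale).
rated = [
    ({"DB", "Fox", "written"}, 2),
    ({"DB", "Fox", "written"}, 1),
    ({"PL", "Smith", "oral"}, 2),
    ({"OS", "Fox", "oral"}, 3),
    ({"DB", "Newton", "oral"}, 3),
]
prediction = predict_by_similarity({"DB", "Fox", "oral"}, rated, k=3)
```

Returning `None` when nothing is similar keeps "no prediction possible" distinct from a low rating, which matters when coverage is measured separately from accuracy.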

Feature Based Prediction
● An item's rating transfers equally to each of its features
● Rating of a feature:
● Prediction is the average of feature ratings:
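Since the slide's formulas are missing, here is a hedged sketch of how the three bullets could fit together. It assumes that a feature's rating is the plain mean of the ratings of all items carrying it, and that features never seen before are skipped. The function names and the low=1, avg=2, high=3 scale are assumptions for illustration.

```python
from collections import defaultdict


def feature_ratings(rated_items):
    """Average, per feature, the ratings of all items carrying that feature."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for feats, rating in rated_items:
        for f in feats:
            sums[f] += rating
            counts[f] += 1
    return {f: sums[f] / counts[f] for f in sums}


def predict_by_features(new_item, per_feature):
    """Mean of the known feature ratings; None if no feature was seen before."""
    known = [per_feature[f] for f in new_item if f in per_feature]
    return sum(known) / len(known) if known else None


# A few rows of the course table; ratings mapped low=1, avg=2, high=3 (assumed).
rated = [
    ({"DB", "Fox", "written"}, 2),
    ({"DB", "Fox", "written"}, 1),
    ({"PL", "Smith", "oral"}, 2),
    ({"OS", "Fox", "oral"}, 3),
    ({"DB", "Newton", "oral"}, 3),
]
per_feature = feature_ratings(rated)
prediction = predict_by_features({"DB", "Fox", "oral"}, per_feature)
```

With this data, DB and Fox each average 2.0 and oral averages 8/3, so the new oral DB course taught by Fox is predicted at their mean.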

Preference Pattern
<<DB,Fox>,avg> pattern frequency is 2/11
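A pattern's frequency can be read as the fraction of evaluations that match all of its feature values and its rating. The sketch below counts this over the eight-row course table shown earlier in the talk. The 2/11 figure on the slide was computed on a table that is not preserved in this text, so the numbers here differ. `pattern_frequency` is a hypothetical helper name.

```python
def pattern_frequency(rows, pattern):
    """Fraction of rows whose fields match every (field, value) in pattern."""
    hits = sum(all(row.get(k) == v for k, v in pattern.items()) for row in rows)
    return hits / len(rows)


# (area, instructor, rating) columns of the eight-row course table.
rows = [
    {"area": a, "instructor": i, "rating": r}
    for a, i, r in [
        ("DB", "Fox", "avg"), ("DB", "Fox", "low"),
        ("PL", "Smith", "avg"), ("PL", "Smith", "low"), ("PL", "Smith", "high"),
        ("OS", "Fox", "low"), ("OS", "Fox", "high"),
        ("DB", "Newton", "high"),
    ]
]

db_fox_avg = pattern_frequency(rows, {"area": "DB", "instructor": "Fox", "rating": "avg"})
fox_low = pattern_frequency(rows, {"instructor": "Fox", "rating": "low"})
```

A pattern may leave some features unspecified, as `fox_low` does, in which case it matches any value of the omitted fields.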

Entropy Based Prediction
1. model features and ratings as random variables
2. introduce a joint distribution over features and ratings to model the observations
3. use Generalized Iterative Scaling (GIS) to find the maximum-entropy distribution that satisfies the frequent preference patterns
4. use that distribution to predict missing ratings
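The four steps can be sketched on a toy scale. The symbols from the slide are lost, so this is a generic GIS fit rather than the authors' exact formulation. It assumes a joint space over (area, instructor, rating) only, four hand-picked patterns whose target frequencies are counted from the eight-row course table, and a slack feature to meet GIS's constant-sum requirement. All function and variable names are hypothetical.

```python
import itertools
import math


def matches(x, pattern):
    """True when x agrees with every (field, value) pair of the pattern."""
    return all(x[k] == v for k, v in pattern.items())


def fit_gis(space, patterns, targets, iterations=1000):
    """Fit p(x) proportional to exp(sum_i w_i * f_i(x)) with E_p[f_i] = targets[i]."""
    rows = [[1.0 if matches(x, p) else 0.0 for p in patterns] for x in space]
    # GIS needs the features of every x to sum to one constant c, so a slack
    # feature is appended; the +1 keeps the slack strictly positive.
    c = max(sum(r) for r in rows) + 1.0
    rows = [r + [c - sum(r)] for r in rows]
    goal = list(targets) + [c - sum(targets)]
    w = [0.0] * len(goal)

    def distribution():
        scores = [math.exp(sum(wi * fi for wi, fi in zip(w, r))) for r in rows]
        z = sum(scores)
        return [s / z for s in scores]

    for _ in range(iterations):
        p = distribution()
        expected = [sum(pj * r[i] for pj, r in zip(p, rows)) for i in range(len(goal))]
        # Multiplicative GIS update, written in log space.
        w = [wi + math.log(g / e) / c for wi, g, e in zip(w, goal, expected)]
    return distribution()


def predict(space, p, area, instructor):
    """Conditional distribution over ratings for one (area, instructor) pair."""
    mass = {x["rating"]: px for x, px in zip(space, p)
            if x["area"] == area and x["instructor"] == instructor}
    total = sum(mass.values())
    return {rating: m / total for rating, m in mass.items()}


keys = ("area", "instructor", "rating")
data = [dict(zip(keys, t)) for t in [
    ("DB", "Fox", "avg"), ("DB", "Fox", "low"), ("PL", "Smith", "avg"),
    ("PL", "Smith", "low"), ("PL", "Smith", "high"), ("OS", "Fox", "low"),
    ("OS", "Fox", "high"), ("DB", "Newton", "high")]]
space = [dict(zip(keys, t)) for t in itertools.product(
    ("DB", "PL", "OS"), ("Fox", "Smith", "Newton"), ("low", "avg", "high"))]
patterns = [  # hand-picked for illustration
    {"instructor": "Fox", "rating": "low"},
    {"area": "DB", "rating": "high"},
    {"area": "DB", "instructor": "Fox", "rating": "avg"},
    {"instructor": "Fox", "rating": "high"},
]
targets = [sum(matches(d, pat) for d in data) / len(data) for pat in patterns]
p = fit_gis(space, patterns, targets)
rating_dist = predict(space, p, "DB", "Fox")  # e.g. the new course cs421
```

Because the targets are counted from observed data, they are mutually consistent and nonzero, which keeps the logarithm in the update well defined. A real system would first mine the frequent patterns instead of listing them by hand.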

Max Entropy Intuition

Metrics
● Predictability
– Root Mean Square Error (RMSE): accuracy of individual ratings
– Normalized Discounted Cumulative Gain (NDCG): accuracy of the ranking order
● Coverage
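For reference, the three metrics can be sketched as follows. RMSE is standard. For NDCG this sketch assumes the linear-gain variant with a log2 position discount, and for coverage it assumes the fraction of requested predictions that an algorithm can actually produce. The talk does not say which variants were used, so treat both as assumptions.

```python
import math


def rmse(predicted, actual):
    """Root mean square error over paired numeric ratings."""
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


def dcg(relevances):
    """Discounted cumulative gain with a log2 position discount."""
    return sum(rel / math.log2(pos + 2) for pos, rel in enumerate(relevances))


def ndcg(predicted, actual):
    """DCG of the true ratings ordered by predicted score, over the ideal DCG."""
    ranked = sorted(zip(predicted, actual), key=lambda pa: pa[0], reverse=True)
    by_prediction = [a for _, a in ranked]
    ideal = dcg(sorted(actual, reverse=True))
    return dcg(by_prediction) / ideal if ideal else 0.0


def coverage(predictions):
    """Fraction of requested predictions that are not None (assumed definition)."""
    return sum(p is not None for p in predictions) / len(predictions)
```

RMSE penalises each rating error individually, while NDCG only cares whether higher-rated items are ranked ahead of lower-rated ones, so an algorithm can do well on one and poorly on the other.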

Datasets
● Stanford Courses
– from 1997 to 2008
– 9799 ratings
– 675 courses
– 193 instructors
– features: title, description, department
● MovieLens
– 100K ratings
– 1000 users
– 1700 movies
– 42000 unique features (39 features per movie on average)

Algorithms
● Similarity-based
● Feature-based
● Max entropy
● Linear regression [Park et al. 2009]

Accuracy/Coverage for Varying Training Data Size (Stanford)

Average/Coverage for Varying Density of Features (MovieLens)

Conclusions
● Addressed the new-user, new-item cold-start problem
● Proposed a number of algorithms:
– Similarity-based
– Feature-based
– Max entropy
● Experimental evaluation showed the algorithms to be highly effective (max entropy performed best)
