This presentation gives a high-level overview of recommender systems (RS) and active learning (AL), covering the perspectives of startups vs. established companies, the cold-start problem, and related topics.
6. Value of RS
• Amazon: 35% of sales come from recommendations
• Netflix: 2/3 of the movies watched are recommended
• Choicestream: 28% of people would buy more music if they could find what they liked
• Google News: recommendations generate 38% more click-throughs
www.slideshare.net/kerveros99/machine-learning-for-recommender-systems-mlss-2015-sydney
9. Common Approach
• Assumption: preferences of “similar” items/users stay similar
• Similarity: can be defined in a variety of ways
10. Collaborative Filtering (CF)
• Use ratings to estimate “similarity”
[Figure: users–items ratings matrix, with ratings on a Love / Like / Okay / Dislike / Hate scale]
https://buildingrecommenders.wordpress.com/2015/11/23/overview-of-recommender-algorithms-part-5/
11. User-based CF: users with similar dis/likes are similar, e.g. if Sarah and you have similar tastes, then anything that Sarah likes, you will like too (and vice versa).
Item-based CF: similar items receive similar ratings, e.g. if you liked book A, you will also like book B, which is rated similarly by the same users.
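The user-based variant can be sketched in a few lines: predict a user's rating for an item as a similarity-weighted average of other users' ratings. The toy ratings matrix and the choice of cosine similarity below are illustrative assumptions, not the only option:

```python
import numpy as np

# Toy ratings matrix (users x items); 0 = unrated. Values are made up.
R = np.array([
    [5, 4, 0, 1],   # Sarah
    [4, 5, 1, 0],   # you
    [1, 0, 5, 4],   # another user
], dtype=float)

def cosine_sim(a, b):
    """Cosine similarity over the items both users have rated."""
    mask = (a > 0) & (b > 0)
    if not mask.any():
        return 0.0
    a, b = a[mask], b[mask]
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def predict(R, user, item):
    """User-based CF: similarity-weighted average of other users' ratings."""
    sims, ratings = [], []
    for other in range(R.shape[0]):
        if other != user and R[other, item] > 0:
            sims.append(cosine_sim(R[user], R[other]))
            ratings.append(R[other, item])
    if not sims:
        return None  # cold start: no one has rated this item yet
    sims, ratings = np.array(sims), np.array(ratings)
    return float(sims @ ratings / sims.sum())

print(predict(R, user=1, item=3))  # "you" on the last item
```

Item-based CF is the transpose of the same idea: compute similarities between item columns instead of user rows.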
15. Established Companies vs. Startups
Established companies (“cruise mode”):
• Many existing loyal users
• RS used to increase per-user metrics, e.g. revenue, profit, etc.
Startups (“launch mode”):
• Still building a user base
• RS used to attract/retain new users
16. Startups = Growth
“The only essential thing is
growth. Everything else we
associate with startups follows
from growth.”
(Paul Graham, Y Combinator)
18. “Cold Start” Problem
• RS needs user/item data to make recommendations with CF
• For new users/new items, no data is available yet:
• New item problem
• New user problem
[Figure: ratings matrix with an all-“?” column for a new item and an all-“?” row for a new user]
19. New Item Problem
• Problem: no reviews are available yet (to base recommendations on)
• Solution: use content-based item similarity (to bootstrap recommendations)
[Figure: visually similar items — Jordan Jumpman Team II, Air Jordan 1 Retro High Nouveau, Hurley One And Only Printed, Air Jordan 1 Retro High OG]
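The content-based fallback can be sketched by comparing items on their attributes rather than their (nonexistent) ratings. The tags below are invented for the slide's sneaker examples, and Jaccard overlap stands in for whatever attribute similarity a real system would use:

```python
# Content-based item similarity for the new-item problem: with no ratings
# yet, compare items by attribute tags. Tags are made up for illustration.
items = {
    "Jordan Jumpman Team II":          {"jordan", "basketball", "high-top"},
    "Air Jordan 1 Retro High Nouveau": {"jordan", "retro", "high-top"},
    "Hurley One And Only Printed":     {"hurley", "t-shirt", "printed"},
    "Air Jordan 1 Retro High OG":      {"jordan", "retro", "high-top"},
}

def jaccard(a, b):
    """Jaccard similarity between two tag sets."""
    return len(a & b) / len(a | b)

def most_similar(name, items):
    """Rank existing items by attribute overlap with a (possibly new) item."""
    return sorted(
        ((other, jaccard(items[name], tags))
         for other, tags in items.items() if other != name),
        key=lambda pair: pair[1], reverse=True)

for other, sim in most_similar("Air Jordan 1 Retro High OG", items):
    print(f"{sim:.2f}  {other}")
```

A new item with no ratings can be recommended to users who liked its attribute-nearest neighbours, which bootstraps the CF loop.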
23. Indirect Data
• Contacts: friends may already be users of the app (and are likely to have similar interests)
• Location
• Device type
• Social profile
NOTE: collection of such data should not be intrusive
26. Item Selection
• RS presents items for two primary purposes:
• Recommend an item that a user will like: popular items, i.e. items that everyone likes (but these provide little information about the user’s preferences)
• Present an item to learn about the user’s preferences (Active Learning, AL): contentious items, i.e. items that many people either like or dislike (informative about the user’s preferences)
• In practice, multiple items are shown, serving different objectives
27. AL Categories
• Item-based AL: analyse items and select those that seem most informative
• Model-based AL: analyse the model and select items that seem most informative
28. Item Categories
• Popular: rated by many users [Rashid 2002]
• High variance in ratings: items that people either like or hate [Rashid 2002]
• Best/Worst: ask the user which items s/he likes most/least [Leino & Raiha 2007]
• Influential: items on which the ratings of many other items depend (representative + not yet represented) [Rubens & Sugiyama 2007]
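The popularity and variance criteria above (and the log(popularity)*entropy combination that appears later among the survey's strategies) can be computed directly from a ratings matrix. A minimal sketch with a made-up matrix, where 0 marks an unrated cell:

```python
import numpy as np

# Toy ratings matrix (users x items) on a 1..5 scale; 0 = unrated.
R = np.array([
    [5, 1, 3, 0],
    [5, 5, 3, 0],
    [4, 1, 3, 5],
    [5, 5, 3, 0],
], dtype=float)

def scores(R):
    """Per-item AL scores: popularity, rating variance, log(pop)*entropy."""
    out = []
    for j in range(R.shape[1]):
        r = R[:, j][R[:, j] > 0]          # observed ratings for item j
        pop = len(r)                       # popularity: number of ratings
        var = r.var() if pop else 0.0      # contentiousness: rating variance
        # entropy of the empirical rating distribution over 1..5 stars
        p = np.bincount(r.astype(int), minlength=6)[1:] / max(pop, 1)
        ent = -np.sum(p[p > 0] * np.log2(p[p > 0]))
        out.append({"item": j, "pop": pop, "var": var,
                    "log_pop_x_entropy": np.log(pop + 1) * ent})
    return out

for s in scores(R):
    print(s)
```

In this toy matrix, item 1 is the contentious one (everyone either loves or hates it), item 2 is popular but uninformative (everyone rates it the same), and item 3 is near-cold.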
29. Item-based AL
• 3R Properties:
• Represented by the existing training set? E.g., (b) is already represented
• Representative of others? E.g., (a) is not
• Results in achieving the objective? E.g., (d) → max coverage
[Figure: points (a)–(d) in a 2-D input space (input1 × input2) illustrating the 3R properties]
[Rubens & Kaplan, 2010]
36. AL Model Error
$g$: optimal function (in the solution space)
$\hat{f}$: learned function
$\hat{f}_i$'s: learned functions from slightly different training sets
$$EG = B + V + C$$
$$B = \left(\mathbb{E}\hat{f}(x) - g(x)\right)^2$$
$$V = \left(\hat{f}(x) - \mathbb{E}\hat{f}(x)\right)^2$$
$$C = \left(g(x) - f(x)\right)^2$$
• Model Error ($C$): constant, and is ignored
• Bias ($B$): hard to estimate, but is assumed to vanish (asymptotically)
• Variance ($V$): estimate and minimize
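Since C is constant and B is assumed to vanish, model-based AL reduces to estimating V. A minimal sketch, assuming a toy linear-regression setting (the data, the model, and all numbers are made up): train an ensemble of learned functions on bootstrap resamples of the training set, and query the candidate whose predictions vary most across the ensemble.

```python
import numpy as np

# Variance-based active learning: estimate V, the variance of predictions
# across models trained on slightly different training sets, and query the
# unlabeled point where V is largest.
rng = np.random.default_rng(0)
X_train = rng.uniform(0, 1, size=(20, 2))
y_train = X_train @ np.array([2.0, -1.0]) + rng.normal(0, 0.3, size=20)
X_pool = rng.uniform(0, 1, size=(50, 2))     # unlabeled candidates

def fit(X, y):
    """Least-squares fit with a bias term, standing in for a learned model."""
    Xb = np.hstack([X, np.ones((len(X), 1))])
    w, *_ = np.linalg.lstsq(Xb, y, rcond=None)
    return w

# Ensemble of models, each trained on a bootstrap resample.
preds = []
for _ in range(30):
    idx = rng.integers(0, len(X_train), len(X_train))
    w = fit(X_train[idx], y_train[idx])
    preds.append(np.hstack([X_pool, np.ones((len(X_pool), 1))]) @ w)

V = np.var(preds, axis=0)        # per-candidate prediction variance
query = int(np.argmax(V))        # most informative point to label next
print(query, V[query])
```

In an RS setting the "pool" would be candidate items to show the user, and the ensemble would be recommendation models refit on perturbed rating data.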
37. Table 1: Performance comparison of active learning strategies (“XX” Very Good, “X” Good, blank Poor, “-” Not Available)
ML: Movielens, NF: Netflix, EM: EachMovie, AWM: Active Web Museum, MP: MyPersonality, STS: South Tyrol Suggests, LF: Last.fm

| Type | Strategy | MAE/RMSE | NDCG/MAP | Precision | #Rating | Online | Offline | Compared Strategies | Datasets |
|---|---|---|---|---|---|---|---|---|---|
| Non-personalized, single, uncertainty-based | 1. variance [59, 61] | X | - | - | - | - | y | 2, 4, 6, 9, 24 | AWM, EM |
| | 2. entropy [20, 67] | | - | - | - | - | y | 3, 6, 8, 9, 11, 13, 22 | EM |
| | 3. entropy0 [67] | XX | - | - | XX | y | y | 2, 6, 8, 11, 13, 22 | ML |
| Non-personalized, single, error reduction | 4. greedy extend [68] | X | - | - | - | - | y | 2, 3, 6, 7, 10, 11 | NF |
| | 5. representative [69] | - | XX | XX | - | - | y | 6 | NF, ML, LF |
| Non-personalized, single, attention-based | 6. popularity [20, 67] | X | - | - | XX | y | y | 2, 8, 9, 11, 13, 22 | ML |
| | 7. co-coverage [68] | | - | - | - | - | y | 2, 3, 4, 6, 10, 11 | NF |
| Non-personalized, combined, static | 8. rand-pop [20, 67] | | - | - | | y | y | 2, 3, 6, 11, 13, 22 | ML |
| | 9. log(pop)*entropy [20] | XX | - | - | X | y | y | 3, 6, 8, 13 | ML |
| | 10. sqrt(pop)*var [68] | X | - | - | - | - | y | 2, 3, 4, 6, 7, 11 | NF |
| | 11. HELF [67] | XX | - | - | | y | y | 2, 3, 6, 8, 13, 22 | ML |
| | 12. non-pers. partially rand. [11] | X | XX | X | | - | y | 1, 6, 9, 12, 14, 20, 21, 28, 29 | ML, NF |
| Personalized, single, acquisition probability | 13. item-item [20, 67] | | - | - | XX | y | y | 2, 3, 6, 8, 9, 11, 22 | ML |
| | 14. binary-pred [11, 12] | X | XX | X | | - | y | 1, 6, 9, 12, 20, 21, 28, 29 | ML, NF |
| | 15. personality-based [70, 97] | XX | XX | - | XX | y | y | 3, 9, 14 | STS, MP |
| | 16. impact analysis [71] | XX | - | - | - | - | y | 9 | ML |
| Personalized, single, prediction-based | 17. aspect model [72, 73] | X | - | - | - | - | y | 2 | EM, ML |
| | 18. min rating [74] | X | - | - | - | - | y | 19, 25 | ML |
| | 19. min norm [74] | | - | - | - | - | y | 18, 25 | ML |
| | 20. highest-pred [11, 12] | X | XX | X | | - | y | 1, 6, 9, 12, 14, 21, 28, 29 | ML, NF |
| | 21. lowest-pred [11, 12] | X | X | | | - | y | 1, 6, 9, 12, 14, 20, 28, 29 | ML, NF |
| Personalized, single, user partitioning | 22. IGCN [67] | XX | - | - | X | y | y | 2, 3, 6, 8, 11, 13 | ML |
| | 23. decision tree [64] | XX | - | - | - | - | y | 3, 4, 10, 11 | NF |
| Personalized, combined, static | 24. influence based [61] | XX | - | - | - | - | y | 1, 4, 6, 9 | ML |
| | 25. non-myopic [74] | X | - | - | - | - | y | 18, 19 | ML |
| | 26. treeU [75] | X | - | - | - | - | y | 23, 27 | ML, EM, NF |
| | 27. fMF [75] | XX | - | - | - | - | y | 23, 26 | ML, EM, NF |
| | 28. pers. partially rand. [11] | X | XX | X | | - | y | 1, 6, 9, 12, 14, 20, 21, 28, 29 | ML, NF |
| | 29. voting [11, 12] | XX | XX | | | - | y | 1, 6, 9, 12, 14, 20, 21, 28 | ML, NF |
| Personalized, combined, adaptive | 30. switching [76] | XX | XX | - | XX | - | y | 9, 20, 29 | ML |

Mehdi Elahi, Francesco Ricci, Neil Rubens, “A survey of active learning in collaborative filtering recommender systems”, Computer Science Review, Elsevier, 2016.
The table clearly shows that different strategies improve different aspects of recommendation quality. In terms of rating prediction accuracy (MAE/RMSE), various strategies have shown excellent performance. While some of these strategies are easy to implement (e.g., Entropy0 and Log(popularity)*Entropy), others are more complex and rely on more sophisticated machine-learning algorithms (e.g., Decision Tree and Personality-based FM). The strategies that have shown excellent performance in terms of ranking quality (NDCG/MAP) are the Representative-based and Voting strategies. In terms of precision, prediction-based strategies (Highest-predicted and Binary-predicted) have shown excellent performance. In terms of the number of ratings acquired (#Rating), as expected, strategies that consider the popularity of items (Popularity and Entropy0) acquire the largest number of ratings. However, other strategies that maximize the chance that the selected items are familiar to the user (Item-item and Personality-based) can also elicit a considerable number of ratings. For these strategies, the success ratio (#acquired_ratings/#requested_items) is the largest. This is an important factor, since strategies that focus only on the informativeness of items may fail to actually acquire ratings, by selecting obscure items that users do not know and cannot rate.
MANY AL-RS APPROACHES
• Tailored to different objectives, and to different data & settings
39. Take-home Messages
• RS shows users items they want
• RS accounts for a large portion of purchases
• RS methods: user-based / item-based
• RS is crucial for user growth, which requires:
• addressing new items/users (“cold start”) with:
• indirect data acquisition
• content-based item similarity
• informative item selection with AL
• Many RS components can be tuned to achieve high performance