(Presented at the RE·WORK Deep Learning Summit, San Francisco, on 01/25/2018)
In this talk, we go through the traditional recommendation systems setup and show that deep learning approaches don't bring a lot of extra value in that setting. We then focus on different ways to leverage these techniques, most of which rely on breaking away from that traditional setup: providing additional data to your recommendation algorithm, modeling different facets of user/item interactions, and, most importantly, re-framing the recommendation problem itself. In particular, we show a few results obtained by casting the problem as a contextual sequence prediction task and using it to model time (a very important dimension in most recommendation systems).
1. Deep Learning for
Recommender Systems
Yves Raimond & Justin Basilico
January 25, 2018
Re·Work Deep Learning Summit San Francisco
@moustaki @JustinBasilico
2. The value of recommendations
● A few seconds to find something
great to watch…
● Can only show a few titles
● Enjoyment directly impacts
customer satisfaction
● Generates over $1B per year of
Netflix revenue
● How? Personalize everything
8. A quick & dirty experiment
● MovieLens-20M
○ Binarized ratings
○ Two weeks validation, two weeks test
● Comparing two models
○ ‘Standard’ MF, with hyperparameters:
■ L2 regularization
■ Rank
○ Feed-forward net, with hyperparameters:
■ L2 regularization (for all layers + embeddings)
■ Embeddings dimensionality
■ Number of hidden layers
■ Hidden layer dimensionalities
■ Activations
● After hyperparameter search for both models, what do we get?
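For concreteness, here is a minimal sketch of the two model families being compared, written in PyTorch with placeholder dimensions (the actual values came out of the hyperparameter search):

    import torch
    import torch.nn as nn

    class MF(nn.Module):
        # 'Standard' matrix factorization: score = dot(user_emb, item_emb).
        def __init__(self, n_users, n_items, rank=32):
            super().__init__()
            self.user_emb = nn.Embedding(n_users, rank)
            self.item_emb = nn.Embedding(n_items, rank)

        def forward(self, users, items):
            return (self.user_emb(users) * self.item_emb(items)).sum(-1)

    class FeedForwardRec(nn.Module):
        # Feed-forward net: learns how to combine the two embeddings
        # instead of fixing the combination to a dot product.
        def __init__(self, n_users, n_items, dim=32, hidden=(64, 32)):
            super().__init__()
            self.user_emb = nn.Embedding(n_users, dim)
            self.item_emb = nn.Embedding(n_items, dim)
            layers, d = [], 2 * dim
            for h in hidden:
                layers += [nn.Linear(d, h), nn.ReLU()]
                d = h
            layers.append(nn.Linear(d, 1))
            self.mlp = nn.Sequential(*layers)

        def forward(self, users, items):
            x = torch.cat([self.user_emb(users), self.item_emb(items)], dim=-1)
            return self.mlp(x).squeeze(-1)

Both are trained with an MSE loss on the binarized ratings, with L2 regularization applied via the optimizer's weight_decay, e.g. torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-5).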
9. [Results chart]
10. [Results chart]
11. What’s going on?
● Very similar models: representation learning
through embeddings, MSE loss, gradient-based
optimization
● Main difference is that we can learn a different
embedding combination than a dot product
● … but embeddings are arbitrary representations
● … and capturing pairwise interactions through a
feed-forward net requires a very large model
12. Conclusion?
● In the 'traditional' recommendation setup, a deep
model brings little benefit over a properly tuned
MF model
● … Is this talk over?
13. Breaking the ‘traditional’ recsys setup
● Adding extra data / inputs
● Modeling different facets of users and items
● Alternative framings of the problem
15. Content-based side information
● VBPR: helping cold-start by augmenting item
factors with visual factors from CNNs [He et al.,
2015]
● Content2Vec [Nedelec et al., 2017]
● Learning to approximate MF item embeddings
from content [Dieleman, 2014]
16. Metadata-based side information
● Factorization Machines [Rendle, 2010] with
side-information
○ Extending the factorization framework to an arbitrary
number of inputs
● Meta-Prod2Vec [Vasile et al., 2016]
○ Regularize item embeddings using side-information
● DCF [Li et al., 2016]
● Using associated textual information for
recommendations [Bansal et al., 2016]
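As a reference point for the factorization framework, a factorization machine's score over an arbitrary feature vector (user one-hot + item one-hot + side information) can be computed in linear time via Rendle's identity; a minimal NumPy sketch with illustrative shapes:

    import numpy as np

    def fm_score(x, w0, w, V):
        # x : (n,)   input features (user/item one-hots + side information)
        # w0: scalar global bias
        # w : (n,)   linear weights
        # V : (n, k) factors; <V[i], V[j]> weights the interaction x_i * x_j
        linear = w0 + w @ x
        # Rendle's O(n*k) identity for all pairwise interactions:
        # sum_{i<j} <V_i, V_j> x_i x_j
        #   = 0.5 * sum_f [ (sum_i V_if x_i)^2 - sum_i V_if^2 x_i^2 ]
        s = V.T @ x                 # (k,)
        s2 = (V ** 2).T @ (x ** 2)  # (k,)
        return linear + 0.5 * np.sum(s ** 2 - s2)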
17. YouTube Recommendations
● Two stage ranker:
candidate generation
(shrinking set of items to
rank) and ranking
(classifying actual
impressions)
● Two feed-forward, fully
connected, networks with
hundreds of features
[Covington et al., 2016]
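A toy illustration of the two-stage structure, with hypothetical ranker and features_fn callables standing in for the two networks (the real system uses a learned softmax model plus approximate nearest-neighbor retrieval for stage one, and hundreds of features in stage two):

    import numpy as np

    def recommend(user_vec, item_vecs, ranker, features_fn,
                  n_candidates=100, k=10):
        # Stage 1, candidate generation: shrink the full catalog to a
        # few hundred items via nearest neighbors in embedding space.
        scores = item_vecs @ user_vec
        candidates = np.argsort(-scores)[:n_candidates]
        # Stage 2, ranking: a richer model re-scores only the candidates,
        # using impression-level features.
        ranked = sorted(candidates,
                        key=lambda i: -ranker(features_fn(user_vec, i)))
        return ranked[:k]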
19. Restricted Boltzmann Machines
● RBMs for Collaborative
Filtering [Salakhutdinov,
Mnih & Hinton, 2007]
● Part of the ensemble that
won the $1M Netflix Prize
● Used in our rating
prediction system for
several years
20. Auto-encoders
● RBMs are hard to train
● CF-NADE [Zheng et al., 2016]
○ Define (random) orderings over conditionals
and model with a neural network
● Denoising auto-encoders: CDL [Wang et
al., 2015], CDAE [Wu et al., 2016]
● Variational auto-encoders [Liang et al.,
2017]
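A minimal CDAE-style denoising auto-encoder sketch in PyTorch (dimensions and corruption level are illustrative; the actual CDAE also adds a per-user embedding to the hidden layer):

    import torch
    import torch.nn as nn

    class DenoisingAutoencoderCF(nn.Module):
        def __init__(self, n_items, dim=200, corruption=0.5):
            super().__init__()
            self.corrupt = nn.Dropout(corruption)  # randomly drop observed interactions
            self.encoder = nn.Linear(n_items, dim)
            self.decoder = nn.Linear(dim, n_items)

        def forward(self, x):            # x: (batch, n_items) in {0, 1}
            h = torch.tanh(self.encoder(self.corrupt(x)))
            return self.decoder(h)       # logits; train with BCE against the clean x

To recommend, rank the reconstructed scores of the items each user has not yet interacted with.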
21. (*)2Vec
● Prod2Vec [Grbovic et al., 2015],
Item2Vec [Barkan & Koenigstein,
2016], Pin2Vec [Ma, 2017]
● Item-item co-occurrence
factorization (instead of user-item
factorization)
● The two approaches can be
blended [Liang et al., 2016]
[Diagram: prod2vec uses a skip-gram architecture, while user2vec uses a continuous bag-of-words architecture]
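A minimal item2vec-style sketch using gensim's Word2Vec, treating each session as a 'sentence' of item ids (the sessions and hyperparameters here are illustrative):

    from gensim.models import Word2Vec

    # Each 'sentence' is one ordered user session.
    sessions = [
        ["item_12", "item_7", "item_31"],
        ["item_7", "item_31", "item_2", "item_12"],
    ]

    # sg=1 selects the skip-gram objective; the learned vectors
    # factorize item-item co-occurrence within the context window.
    model = Word2Vec(sentences=sessions, vector_size=32, window=5,
                     min_count=1, sg=1)

    similar = model.wv.most_similar("item_12")  # item-item neighbors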
22. Wide + Deep models
● Wide model:
memorize
sparse, specific
rules
● Deep model:
generalize to
similar items via
embeddings
[Cheng et al., 2016]
[Diagram: wide component (many parameters due to cross products) alongside the deep component]
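A minimal PyTorch sketch of the structure (vocabulary sizes and dimensions are illustrative; the published model jointly trains both parts and sums their logits):

    import torch
    import torch.nn as nn

    class WideAndDeep(nn.Module):
        def __init__(self, n_cross_features, n_ids, emb_dim=16, hidden=64):
            super().__init__()
            # Wide: a linear model over sparse crossed features (memorization;
            # many parameters because of the cross products).
            self.wide = nn.EmbeddingBag(n_cross_features, 1, mode="sum")
            # Deep: an MLP over dense embeddings (generalization).
            self.emb = nn.EmbeddingBag(n_ids, emb_dim, mode="mean")
            self.deep = nn.Sequential(
                nn.Linear(emb_dim, hidden), nn.ReLU(), nn.Linear(hidden, 1))

        def forward(self, cross_idx, id_idx):
            # cross_idx, id_idx: (batch, n) index tensors; logits are summed.
            return self.wide(cross_idx) + self.deep(self.emb(id_idx))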
24. Sequence prediction
● Treat recommendations as a
sequence classification problem
○ Input: sequence of user actions
○ Output: next action
● E.g. GRU4Rec [Hidasi et al., 2016]
○ Input: sequence of items in a session
○ Output: next item in the session
● Also co-evolution: [Wu et al., 2017],
[Dai et al., 2017]
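A minimal GRU4Rec-style sketch in PyTorch (dimensions are illustrative; the published model uses ranking losses such as BPR/TOP1 and session-parallel mini-batches rather than plain cross-entropy):

    import torch
    import torch.nn as nn

    class SessionGRU(nn.Module):
        def __init__(self, n_items, emb_dim=64, hidden=100):
            super().__init__()
            self.emb = nn.Embedding(n_items, emb_dim)
            self.gru = nn.GRU(emb_dim, hidden, batch_first=True)
            self.out = nn.Linear(hidden, n_items)

        def forward(self, item_seq):       # (batch, seq_len) item ids
            h, _ = self.gru(self.emb(item_seq))
            return self.out(h[:, -1])      # logits over the next item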
25. Contextual sequence prediction
● Input: sequence of contextual user actions, plus
current context
● Output: probability of next action
● E.g. “Given all the actions a user has taken so far,
what’s the most likely video they’re going to play right
now?”
● E.g. [Smirnova & Vasile, 2017], [Beutel et al., 2018]
26. Contextual sequence data
[Diagram: one timestamped sequence of (context, action) pairs per user, spanning events from 2017-12-10 15:40:22 to 2017-12-30 20:42:13 along a time axis, with the next action to predict shown as '?']
27. Time-sensitive sequence prediction
● Recommendations are actions at a moment in time
○ Proper modeling of time and system dynamics is critical
● Experiment on a Netflix internal dataset
○ Context:
■ Discrete time
● Day-of-week: Sunday, Monday, …
● Hour-of-day
■ Continuous time (Timestamp)
○ Predict next play (temporal split data)
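A sketch of how such context features might be assembled per event (this encoding is hypothetical, not Netflix's actual feature set):

    from datetime import datetime

    def time_context(ts: datetime):
        day = [0] * 7
        day[ts.weekday()] = 1        # day-of-week one-hot
        hour = [0] * 24
        hour[ts.hour] = 1            # hour-of-day one-hot
        # Continuous time: the raw timestamp, typically normalized
        # before being fed to the model.
        return day + hour + [ts.timestamp()]

Each step of the input sequence then pairs the action (e.g. an item embedding) with its context vector, and evaluation uses a temporal split: train on earlier events, predict the next play among later ones.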
29. Other framings
● Causality in recommendations
○ Explicitly modeling the consequence of a recommender system's intervention
[Schnabel et al., 2016]
● Recommendation as question answering
○ E.g. “I loved Billy Madison, My Neighbor Totoro, Blades of Glory, Bio-Dome,
Clue, and Happy Gilmore. I’m looking for a Music movie.” [Dodge et al., 2016]
● Deep Reinforcement Learning for
recommendations [Zhao et al, 2017]
31. Takeaways
● Deep Learning can work well for Recommendations... when you go
beyond the classic problem definition
● Similarities between DL and MF are a good thing: Lots of MF work
can be translated to DL
● Lots of open areas to improve recommendations using deep
learning
● Think beyond solving existing problems with new tools; instead,
consider what new problems they can solve
32. More Resources
● RecSys 2017 tutorial by Karatzoglou and Hidasi
● RecSys Summer School slides by Hidasi
● DLRS Workshop 2016, 2017
● Recommenders Shallow/Deep by Sudeep Das
● Survey paper by Zhang, Yao & Sun
● GitHub repo of papers by Nandi