Andrew Clegg, Data Scientician & Machine Learning Engine-Driver: "Deep product recommendations with Keras and TensorFlow"

DEEP RECOMMENDATIONS
ANDREW CLEGG

IN A NUTSHELL
ABOUT ME
• Etsy, Pearson, Last.fm, AstraZeneca, consulting
• Academic background: bioinformatics, information retrieval, natural language processing (UCL/Birkbeck)
• Main interests: search, recommendations, personalization
• @andrew_clegg
• http://andrewclegg.org/

DEEP PRODUCT RECOMMENDATIONS
ABOUT THIS TALK
• Framing recommendation tasks in terms of neural networks and
deep learning
• Some network architectures from the literature
• Why use neural methods at all?

This is a broad overview, and I’m not a specialist (yet)
“BEAR WITH ME, I’M A DEEP LEARNING NOOB.”

COMMON RECOMMENDATION TASKS
AND HOW TO FRAME THEM IN DEEP LEARNING TERMS

NON-PERSONALIZED, CONTEXT-BASED
SIMILAR ITEMS BASED ON CO-OCCURRENCE
• “Context” = liked by same user, bought in same order, etc.
• Item-based collaborative filtering
• Cold-start recommendations for new users
• “Related items” / “Buy these items together”
• Also useful for nearest-neighbours classification (e.g. tag prediction)
• Classically based on item-item matrix factorization
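
To make the co-occurrence idea concrete, here is a minimal sketch that counts how often items share a context and ranks neighbours by cosine similarity. The baskets and item names are invented for illustration, and it works on raw counts rather than a factorized matrix.

```python
import numpy as np

# Hypothetical contexts: each inner list is one order, or one user's likes.
baskets = [
    ["mug", "teapot", "coaster"],
    ["mug", "teapot"],
    ["scarf", "hat"],
    ["scarf", "hat", "mug"],
]

items = sorted({item for basket in baskets for item in basket})
index = {item: i for i, item in enumerate(items)}

# Binary context-by-item matrix: rows are baskets, columns are items.
X = np.zeros((len(baskets), len(items)))
for row, basket in enumerate(baskets):
    for item in basket:
        X[row, index[item]] = 1.0

# Item-item co-occurrence counts; the diagonal (self co-occurrence) is zeroed.
cooc = X.T @ X
np.fill_diagonal(cooc, 0.0)


def similar_items(item, top_n=2):
    """Rank the other items by cosine similarity of their basket columns."""
    cols = X / np.linalg.norm(X, axis=0, keepdims=True)
    sims = cols.T @ cols[:, index[item]]
    sims[index[item]] = -np.inf  # never recommend the item itself
    best = np.argsort(-sims)[:top_n]
    return [items[i] for i in best]


print(similar_items("teapot"))
```

On real data the count matrix is large and sparse, which is where factorization (or the embeddings discussed next) comes in.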

NEURAL INTERPRETATION
SIMILAR ITEMS BASED ON CO-OCCURRENCE
• Items that co-occur should have embeddings that are close in space
• Key point: similar goal to word2vec
• “Item2Vec: Neural Item Embedding for Collaborative Filtering”
• Barkan & Koenigstein (Microsoft)
• Train network to differentiate between items that did occur in same
context as training item, and items that didn’t occur (skip-gram)
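
A minimal NumPy sketch of skip-gram with negative sampling applied to items, assuming each basket is an unordered "sentence". This is not the Item2Vec reference implementation: negatives are drawn uniformly from items outside the basket (a simplification), and the function name and hyperparameters are made up for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def train_item_sgns(baskets, n_items, dim=8, epochs=200, lr=0.05, n_neg=2, seed=0):
    """Learn item embeddings; assumes every basket leaves at least one item out."""
    rng = np.random.default_rng(seed)
    target = rng.normal(scale=0.1, size=(n_items, dim))   # item embeddings
    context = rng.normal(scale=0.1, size=(n_items, dim))  # context embeddings
    for _ in range(epochs):
        for basket in baskets:
            # Simplification: negatives are items that did not occur in this basket.
            candidates = [n for n in range(n_items) if n not in basket]
            for i in basket:
                for j in basket:
                    if i == j:
                        continue
                    negatives = rng.choice(candidates, size=n_neg)
                    # One co-occurring pair (label 1) plus sampled negatives (label 0).
                    pairs = [(j, 1.0)] + [(int(n), 0.0) for n in negatives]
                    for c, label in pairs:
                        grad = sigmoid(target[i] @ context[c]) - label
                        target_old = target[i].copy()
                        target[i] -= lr * grad * context[c]
                        context[c] -= lr * grad * target_old
    return target


# Two toy "clusters" of items that only ever co-occur with each other.
embeddings = train_item_sgns([[0, 1, 2], [0, 1, 2], [3, 4, 5], [3, 4, 5]], n_items=6)
```

After training, items from the same cluster should sit closer together (by cosine similarity) than items from different clusters.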

SESSION-BASED, ORDER-AWARE
PREDICTING NEXT-VIEWED ITEMS
• Contextual recommendations based on recent/current activity
• Given a user’s session so far, what are they most likely to want next?
• Classical methods include:
  • Markov decision processes
  • Bayesian scoring with Thompson sampling
  • Models based on sum or average of events so far (not order-aware)
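
As a simple classical baseline, the sketch below uses a first-order Markov chain over item transitions. This is a much simpler relative of the Markov decision process approach, not an MDP itself, and the sessions are invented.

```python
from collections import Counter, defaultdict


def fit_transitions(sessions):
    """Count item -> next-item transitions across all sessions."""
    counts = defaultdict(Counter)
    for session in sessions:
        for current, nxt in zip(session, session[1:]):
            counts[current][nxt] += 1
    return counts


def predict_next(counts, session, top_n=3):
    """Rank candidates by P(next | last item); only the final event is used."""
    last = session[-1]
    total = sum(counts[last].values())
    if total == 0:
        return []  # item never seen with a successor
    return [(item, n / total) for item, n in counts[last].most_common(top_n)]


sessions = [
    ["phone", "case", "charger"],
    ["phone", "case", "screen protector"],
    ["phone", "charger"],
    ["laptop", "sleeve"],
]
counts = fit_transitions(sessions)
print(predict_next(counts, ["laptop", "phone"]))
```

Because only the last item matters, everything earlier in the session is ignored, which is the limitation the recurrent models address.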

NEURAL INTERPRETATION
PREDICTING NEXT-VIEWED ITEMS
• Recurrent Neural Networks and their relatives (LSTMs, GRUs etc.)
• Key point: similar to sampling from a language model
• Session is “sentence”, items are “words”, catalogue is “vocabulary”
• “Session-based Recommendations with Recurrent Neural Networks”
• Hidasi et al (Gravity R&D / Telefonica Research) — GRU4Rec
• “Improved Recurrent Neural Networks for Session-based
Recommendations”
• Tan et al (A*STAR)
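
To illustrate the language-model analogy, here is a forward pass of a tiny, untrained GRU in NumPy: the session is fed in item by item and the final hidden state is projected onto the catalogue with a softmax. This is a sketch only, not GRU4Rec; the class name and sizes are hypothetical, biases are omitted, and no training loop is shown.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class TinyGRURecommender:
    """Untrained GRU mapping a session (list of item ids) to next-item probabilities."""

    def __init__(self, n_items, dim=16, seed=0):
        rng = np.random.default_rng(seed)
        self.dim = dim
        self.item_emb = rng.normal(scale=0.1, size=(n_items, dim))  # "word" vectors
        self.W = rng.normal(scale=0.1, size=(3, dim, dim))  # input weights: update, reset, candidate
        self.U = rng.normal(scale=0.1, size=(3, dim, dim))  # recurrent weights, same order
        self.out = rng.normal(scale=0.1, size=(dim, n_items))  # projection onto the "vocabulary"

    def step(self, h, x):
        z = sigmoid(x @ self.W[0] + h @ self.U[0])           # update gate
        r = sigmoid(x @ self.W[1] + h @ self.U[1])           # reset gate
        cand = np.tanh(x @ self.W[2] + (r * h) @ self.U[2])  # candidate state
        return (1.0 - z) * h + z * cand

    def next_item_probs(self, session):
        h = np.zeros(self.dim)
        for item_id in session:
            h = self.step(h, self.item_emb[item_id])
        logits = h @ self.out
        exp = np.exp(logits - logits.max())  # numerically stable softmax
        return exp / exp.sum()


model = TinyGRURecommender(n_items=50)
probs = model.next_item_probs([3, 7, 7, 12])
```

Unlike the sum-or-average baselines, reversing the session changes the hidden state and therefore the predicted distribution.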

PERSONALIZED, HISTORY-BASED
BEHAVIOURAL ITEM RECOMMENDATIONS
• User-based collaborative filtering
• “Users like you also liked…”
• Explicit or implicit feedback (likes/ratings vs. viewed/not-viewed)
• Typical approach: user-item matrix factorization
• Map users and items into same lower-dimensional space
• Rating prediction or nearest-neighbour search
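
A minimal sketch of the typical approach: factorize a handful of explicit ratings with plain SGD on squared error. The ratings, learning rate and regularization strength are made-up values for illustration, and there are no bias terms.

```python
import numpy as np


def factorize(ratings, n_users, n_items, k=2, epochs=1000, lr=0.02, reg=0.01, seed=0):
    """Fit user factors P and item factors Q to (user, item, rating) triples."""
    rng = np.random.default_rng(seed)
    P = rng.normal(scale=0.1, size=(n_users, k))  # user factors
    Q = rng.normal(scale=0.1, size=(n_items, k))  # item factors
    for _ in range(epochs):
        for u, i, r in ratings:
            err = r - P[u] @ Q[i]
            p_old = P[u].copy()
            P[u] += lr * (err * Q[i] - reg * P[u])
            Q[i] += lr * (err * p_old - reg * Q[i])
    return P, Q


# Hypothetical explicit feedback: (user id, item id, rating).
ratings = [
    (0, 0, 5.0), (0, 1, 4.0),
    (1, 0, 5.0), (1, 1, 4.0), (1, 2, 1.0),
    (2, 2, 5.0), (2, 3, 4.0),
    (3, 2, 5.0), (3, 3, 4.0),
]
P, Q = factorize(ratings, n_users=4, n_items=4)

# Rating prediction for every (user, item) pair, including unseen ones.
predicted = P @ Q.T
```

Users and items now live in the same k-dimensional space, so the same factors support both rating prediction and nearest-neighbour lookups.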

NEURAL INTERPRETATION
BEHAVIOURAL ITEM RECOMMENDATIONS
• Learn embeddings for users and items such that:
  • user · item approximates rating, or…
  • user · viewed item > user · non-viewed item
• Or more generally:
  • f(user, item) approximates rating, or…
  • f(user, viewed item) > f(user, non-viewed item)
• Key point: easy to replicate factor models, then add more layers
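
The two scoring styles can be sketched side by side: a plain dot product, and a more general f(user, item) that feeds both embeddings through an extra layer. All weights here are random and untrained, and the margin-based hinge is just one possible way to express "viewed scores higher than non-viewed"; it is an assumption, not something from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, n_items, k, hidden = 5, 7, 4, 8
user_emb = rng.normal(size=(n_users, k))
item_emb = rng.normal(size=(n_items, k))
# Hypothetical extra-layer weights for the "deeper" scoring function.
W1 = rng.normal(scale=0.5, size=(2 * k, hidden))
w2 = rng.normal(scale=0.5, size=hidden)


def score_dot(user, item):
    """Factor-model score: user . item."""
    return float(user_emb[user] @ item_emb[item])


def score_mlp(user, item):
    """General f(user, item): concatenate both embeddings, then one ReLU layer."""
    x = np.concatenate([user_emb[user], item_emb[item]])
    return float(np.maximum(x @ W1, 0.0) @ w2)


def pairwise_hinge(score, user, viewed, non_viewed, margin=1.0):
    """Zero once f(user, viewed) beats f(user, non_viewed) by the margin."""
    return max(0.0, margin - score(user, viewed) + score(user, non_viewed))


print(pairwise_hinge(score_dot, 0, 1, 2), pairwise_hinge(score_mlp, 0, 1, 2))
```

Swapping `score_dot` for `score_mlp` leaves the rest of the pipeline unchanged, which is the sense in which extra layers are easy to add.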

NEURAL INTERPRETATION: SIMPLE APPROACH
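# Written against the Keras 1.x API; the Merge layer was removed in later Keras versions
from keras.layers import Embedding, Merge, Reshape
from keras.models import Sequential

class CFModel(Sequential):
    def __init__(self, n_users, m_items, k_factors, **kwargs):
        # User tower: one k-dimensional embedding per user id
        P = Sequential()
        P.add(Embedding(n_users, k_factors, input_length=1))
        P.add(Reshape((k_factors,)))
        # Item tower: one k-dimensional embedding per item id
        Q = Sequential()
        Q.add(Embedding(m_items, k_factors, input_length=1))
        Q.add(Reshape((k_factors,)))
        super(CFModel, self).__init__(**kwargs)
        # Predicted rating is the dot product of the two embeddings
        self.add(Merge([P, Q], mode='dot', dot_axes=1))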
• github: bradleypallen/keras-movielens-cf based on example from
fenris.org blog post: “Collaborative Filtering in Keras”
• Deeper version: “Recommending Movies with Deep Learning”
• Richard Weiss (blog post)

NEURAL INTERPRETATION: TRIPLET LEARNING
# Written against the Keras 1.x API; merge() was removed in later Keras versions
from keras.layers import Embedding, Flatten, Input, merge
from keras.models import Model
from keras.optimizers import Adam

def build_model(num_users, num_items, latent_dim):
    positive_item_input = Input((1, ), name='positive_item_input')
    negative_item_input = Input((1, ), name='negative_item_input')
    # Shared embedding layer for positive and negative items
    item_embedding_layer = Embedding(
        num_items, latent_dim, name='item_embedding', input_length=1)
    user_input = Input((1, ), name='user_input')
    positive_item_embedding = Flatten()(item_embedding_layer(positive_item_input))
    negative_item_embedding = Flatten()(item_embedding_layer(negative_item_input))
    user_embedding = Flatten()(Embedding(
        num_users, latent_dim, name='user_embedding', input_length=1)(
            user_input))
    # The network's output is the loss itself (bpr_triplet_loss, next slide)
    loss = merge(
        [positive_item_embedding, negative_item_embedding, user_embedding],
        mode=bpr_triplet_loss,
        name='loss',
        output_shape=(1, ))
    model = Model(
        input=[positive_item_input, negative_item_input, user_input], output=loss)
    # identity_loss is not shown on the slides; see the repository cited on the next slide
    model.compile(loss=identity_loss, optimizer=Adam())
    return model

NEURAL INTERPRETATION: TRIPLET LEARNING
from keras import backend as K

def bpr_triplet_loss(X):
    positive_item_latent, negative_item_latent, user_latent = X
    # Bayesian Personalized Ranking loss:
    # 1 - sigmoid(score of positive item - score of negative item)
    loss = 1.0 - K.sigmoid(
        K.sum(user_latent * positive_item_latent, axis=-1, keepdims=True) -
        K.sum(user_latent * negative_item_latent, axis=-1, keepdims=True))
    return loss
• github: maciejkula/triplet_recommendations_keras
• For each (user, item_pos, item_neg) triplet:
• Learn to score (user, item_pos) higher than (user, item_neg)
• Loss function from “BPR: Bayesian Personalized Ranking from
Implicit Feedback” (Rendle et al., Univ. of Hildesheim)
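
For comparison, a NumPy sketch of the triplet loss as written on the slide, 1 - sigmoid(x), next to the log form used in the BPR paper's criterion, -ln sigmoid(x), where x is the positive score minus the negative score. The function names are made up for illustration and regularization is left out.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def score_difference(user, pos_item, neg_item):
    """x = user . pos_item - user . neg_item, one value per row (triplet)."""
    return np.sum(user * pos_item, axis=-1) - np.sum(user * neg_item, axis=-1)


def slide_loss(user, pos_item, neg_item):
    """Loss as written on the slide: 1 - sigmoid(x)."""
    return 1.0 - sigmoid(score_difference(user, pos_item, neg_item))


def bpr_log_loss(user, pos_item, neg_item):
    """Per-triplet -ln sigmoid(x), without the regularization term."""
    x = score_difference(user, pos_item, neg_item)
    return np.logaddexp(0.0, -x)  # numerically stable -log(sigmoid(x))


user = np.array([[1.0, 0.0]])
liked = np.array([[2.0, 0.0]])
other = np.array([[0.0, 0.0]])
print(slide_loss(user, liked, other), bpr_log_loss(user, liked, other))
```

Both losses shrink as the positive item outscores the negative one; the log form keeps penalizing badly mis-ordered pairs more strongly, whereas 1 - sigmoid(x) saturates at 1.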

PERSONALIZED, HISTORY & CONTENT-BASED
HYBRID RECOMMENDATIONS
• Blend of collaborative filtering based on behavioural data, and
supervised learning based on features (of item, of user, of context)
• State-of-the-art for many recommendation tasks
• “Factorization Machines”
• Steffen Rendle (Osaka University) — LibFM
• “Metadata Embeddings for User and Item Cold-start
Recommendations”
• Maciej Kula (Lyst) — LightFM
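
A minimal sketch of second-order factorization machine scoring, where one feature vector holds user, item and side features together. It shows the linear-time reformulation of the pairwise term alongside a naive double loop for checking. The feature layout and random weights are invented; this is not LibFM or LightFM code.

```python
import numpy as np


def fm_predict(x, w0, w, V):
    """w0 + w.x + sum_{i<j} <V_i, V_j> x_i x_j, computed in O(k * n)."""
    linear = w0 + w @ x
    xv = x @ V  # shape (k,): sum_i V[i] * x[i]
    pairwise = 0.5 * np.sum(xv ** 2 - (x ** 2) @ (V ** 2))
    return float(linear + pairwise)


def fm_predict_naive(x, w0, w, V):
    """Same model with the pairwise interactions written out explicitly."""
    total = w0 + w @ x
    n = len(x)
    for i in range(n):
        for j in range(i + 1, n):
            total += (V[i] @ V[j]) * x[i] * x[j]
    return float(total)


rng = np.random.default_rng(0)
# Hypothetical layout: [user one-hot (3) | item one-hot (2) | item feature (1)].
x = np.array([1.0, 0.0, 0.0, 1.0, 0.0, 0.5])
w0, w, V = 0.1, rng.normal(size=6), rng.normal(size=(6, 3))
print(fm_predict(x, w0, w, V), fm_predict_naive(x, w0, w, V))
```

With only the user and item one-hot blocks active, the pairwise term reduces to a user-item dot product, which is how the model blends collaborative filtering with feature-based learning.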

NEURAL INTERPRETATION: WIDE & DEEP MODELS
• Deep neural networks can learn to generalize to both similar users
and similar items
• But shallow, sparse models can learn specific effects easily
• So, combine both into one model
• “Wide & Deep Learning for Recommender Systems”
• Cheng et al (Google) — see also tutorial on TensorFlow website
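
The combination can be sketched as a single forward pass: a linear "wide" logit over sparse features plus a "deep" logit from embedded categorical ids, summed and passed through one sigmoid. The parameters are random and untrained, and the helper names, feature names and layer sizes are assumptions made for illustration.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def make_params(n_wide, vocab_sizes, emb_dim=4, hidden=8, seed=0):
    """Random, untrained parameters; vocab_sizes maps feature name -> cardinality."""
    rng = np.random.default_rng(seed)
    return {
        "wide_w": rng.normal(scale=0.1, size=n_wide),
        "emb": {name: rng.normal(scale=0.1, size=(size, emb_dim))
                for name, size in vocab_sizes.items()},
        "W1": rng.normal(scale=0.1, size=(emb_dim * len(vocab_sizes), hidden)),
        "b1": np.zeros(hidden),
        "w_out": rng.normal(scale=0.1, size=hidden),
        "bias": 0.0,
    }


def wide_and_deep_predict(wide_x, deep_ids, params):
    """P(click) from one sigmoid over the summed wide and deep logits."""
    # Wide part: a linear model over sparse features (e.g. hand-made crosses).
    wide_logit = wide_x @ params["wide_w"]
    # Deep part: look up one embedding per categorical feature, concatenate,
    # then apply a single ReLU hidden layer.
    embedded = np.concatenate(
        [params["emb"][name][deep_ids[name]] for name in params["emb"]])
    hidden = np.maximum(embedded @ params["W1"] + params["b1"], 0.0)
    deep_logit = hidden @ params["w_out"]
    return float(sigmoid(wide_logit + deep_logit + params["bias"]))


params = make_params(n_wide=6, vocab_sizes={"user": 10, "item": 20})
wide_x = np.array([1.0, 0.0, 0.0, 1.0, 0.0, 0.0])
p_click = wide_and_deep_predict(wide_x, {"user": 3, "item": 7}, params)
```

In training, both parts would be optimized jointly against the same loss, so the wide weights can memorize specific feature combinations while the embeddings generalize.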

NEURAL INTERPRETATION: WIDE & DEEP MODELS
Image from Wide & Deep Learning tutorial on tensorflow.org

NEURAL INTERPRETATION: TWO-STAGE RECOMMENDERS
• Alternative approach — split task into two steps:
1. Classification model to select shortlist of items likely to be clicked
2. Ranking model to determine correct display order for these items
• Both steps can use personalized features about the user
• Each one is an easier task than attempting to rank entire catalogue
• “Deep Neural Networks for YouTube Recommendations”
• Covington et al (Google)
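
A rough sketch of the two-step flow, with a cheap dot-product shortlist standing in for the candidate-generation model and a slightly richer linear score standing in for the ranking model. Both scoring functions, the feature weights and the random data are placeholders, not the architecture from the paper.

```python
import numpy as np


def generate_candidates(user_vec, item_vecs, shortlist_size):
    """Stage 1: cheap scores over the whole catalogue, keep the top few (unordered)."""
    scores = item_vecs @ user_vec
    return np.argpartition(-scores, shortlist_size - 1)[:shortlist_size]


def rank_candidates(user_vec, item_vecs, item_features, shortlist, feature_weights):
    """Stage 2: richer score computed only for the shortlisted items."""
    scores = (item_vecs[shortlist] @ user_vec
              + item_features[shortlist] @ feature_weights)
    return shortlist[np.argsort(-scores)]


rng = np.random.default_rng(0)
item_vecs = rng.normal(size=(1000, 8))      # hypothetical item embeddings
item_features = rng.normal(size=(1000, 3))  # hypothetical extra ranking features
user_vec = rng.normal(size=8)
feature_weights = np.array([0.5, -0.2, 0.1])

shortlist = generate_candidates(user_vec, item_vecs, shortlist_size=20)
ranked = rank_candidates(user_vec, item_vecs, item_features, shortlist, feature_weights)
```

The expensive ranking score is evaluated for 20 items instead of 1,000, which is the practical appeal of splitting the task.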

THE BIGGER PICTURE
WHY USE DEEP LEARNING AT ALL?

DEEP LEARNING ISN’T AN AUTOMATIC WIN
A WORD OF WARNING
• Deep learning won’t necessarily beat best classical models!
• DL models can be harder to train
• They often require a lot more tedious hyperparameter tuning
  • How many layers? How wide should each be?
  • What kind of activation function(s)?
  • How much regularization, and where?
  • What learning algorithm to use? With what settings? Etc etc etc…

WHY SHOULD I CONSIDER DEEP LEARNING?
SO WHAT ARE THE ADVANTAGES?
• DL models are flexible and highly composable
• Easy to use smaller models as components in larger ones, e.g.:
  • Plug output of LSTM model of session activity into input of deep
    collaborative filtering model
  • Plug output of image feature extraction model into input of
    personalized reranking models
• Take weights learnt in one model and use them in another, e.g.:
  • Embeddings from word2vec in a content-based recommender

WHY SHOULD I CONSIDER DEEP LEARNING?
SO WHAT ARE THE ADVANTAGES?
• DL toolkits keep getting better and better
• Take advantage of GPUs (single/multiple/whole cluster…)
• Easy to take existing models and slightly extend/modify them
• Easy to try new loss functions, network architectures etc.
• No need to get bogged down in C code or maths
• Models and whole approaches can be transferred between domains
• So much example code out there to follow

ANY QUESTIONS?
THANKS!
• Feel free to grab me afterwards to chat about anything
• Or ping me on Twitter: @andrew_clegg
• Special thanks to Maciej Kula for his suggestions
