2. Spotify in Numbers
• Started in 2006, now available in 59 markets
• 100+ Million active users
• 30+ Million tracks
• 20,000 new songs added per day
• 2+ Billion user-generated playlists
7. Today, we’ll talk about 3 types of models
‣ Latent Factor Models
‣ Deep Learning Audio models
‣ NLP models (which are also latent factor models …)
8. Let's start off with Latent Factor Models
“Compact” representation for each user and item (song): f-dimensional vectors
[Figure: users (e.g. Rohan) and songs (e.g. Track a) each map to a row of a latent factor matrix]
User Vector Matrix X: (m x f), for m users · Song Vector Matrix Y: (n x f), for n songs
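As a sketch of how these matrices are used (toy numbers and hypothetical track names, not real learned vectors): a user's predicted affinity for a song is the dot product of the two f-dimensional vectors.

```python
# Toy f = 3 latent vectors (illustrative values, not learned from data).
user_vectors = {"rohan": [0.9, 0.1, 0.4]}        # rows of X (m x f)
song_vectors = {"track_a": [0.8, 0.0, 0.5],      # rows of Y (n x f)
                "track_b": [0.1, 0.9, 0.2]}

def score(user, track):
    """Predicted affinity: dot product of the user and song latent vectors."""
    return sum(u * t for u, t in zip(user_vectors[user], song_vectors[track]))

print(score("rohan", "track_a"))  # higher score -> better candidate to recommend
print(score("rohan", "track_b"))
```

Ranking songs by this score for a given user is the basic recommendation step once X and Y have been learned.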
9. If we were to visualize a few Artist Latent Factors
10. Implicit Feedback (Hu et al. 2008)
‣ If a user u listens to an item i, the dot product of the user vector and item vector should be as close to 1 as possible.
‣ Also takes into account the confidence that user u likes item i
‣ Solve with Alternating Gradient Descent or Alternating Least Squares (ALS)
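The model above can be sketched as a toy weighted-ALS loop, assuming NumPy; the play counts, `alpha`, and `lam` are illustrative values, not the paper's or Spotify's settings.

```python
import numpy as np

# Toy implicit-feedback play counts R (3 users x 4 songs); illustrative only.
R = np.array([[5, 0, 1, 0],
              [0, 3, 0, 2],
              [4, 0, 0, 1]], dtype=float)

alpha, lam, f = 40.0, 0.1, 2
P = (R > 0).astype(float)   # binary preference p_ui: 1 if listened, else 0
C = 1.0 + alpha * R         # confidence c_ui = 1 + alpha * r_ui

rng = np.random.default_rng(0)
X = rng.normal(scale=0.1, size=(R.shape[0], f))   # user factors
Y = rng.normal(scale=0.1, size=(R.shape[1], f))   # song factors

def als_step(fixed, P, C, lam):
    """Closed-form update for one side's factors, other side held fixed."""
    out = np.empty((P.shape[0], fixed.shape[1]))
    for u in range(P.shape[0]):
        Cu = np.diag(C[u])
        A = fixed.T @ Cu @ fixed + lam * np.eye(fixed.shape[1])
        out[u] = np.linalg.solve(A, fixed.T @ Cu @ P[u])
    return out

for _ in range(10):          # alternate: solve for users, then for songs
    X = als_step(Y, P, C, lam)
    Y = als_step(X, P.T, C.T, lam)

pred = X @ Y.T               # approaches 1 where confidence is high
```

Each half-step is an ordinary ridge regression, which is why the alternation converges quickly in practice.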
11. Logistic Matrix Factorization (Johnson 2014)
‣ Model the probability of a user clicking on an item with the logistic function.
‣ Maximize the likelihood of the observations R, given ….
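The per-interaction probability in Logistic MF can be sketched in plain Python; the bias terms `b_u` and `b_i` follow Johnson 2014, but the numbers here are made up.

```python
import math

def p_interaction(x_u, y_i, b_u=0.0, b_i=0.0):
    """Logistic MF: P(user u interacts with item i) is the logistic
    (sigmoid) of the dot product plus per-user / per-item biases."""
    z = sum(a * b for a, b in zip(x_u, y_i)) + b_u + b_i
    return 1.0 / (1.0 + math.exp(-z))

p = p_interaction([0.9, 0.1], [0.8, 0.0], b_u=0.1)  # a value in (0, 1)
```

Training then maximizes the likelihood of the observed interaction matrix under these probabilities, with regularization on the vectors.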
12. Recent Advances in MF
‣ Different loss functions (rank loss)
‣ Use of side information (demographics, metadata)
‣ Use of context (where, when)
‣ Deep Learning CF models
13. Deep Learning on Audio
http://benanne.github.io/2014/08/05/spotify-cnns.html
14. NLP Models For Recommendations
Document : User Session
Word : Song
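Under this session-as-document analogy, skip-gram training pairs can be generated from a listening session like this (toy track names, hypothetical helper):

```python
def skipgram_pairs(session, window=2):
    """Each track is a 'word'; its context is the neighbouring tracks
    in the same listening session, within the given window."""
    pairs = []
    for i, center in enumerate(session):
        lo, hi = max(0, i - window), min(len(session), i + window + 1)
        pairs.extend((center, session[j]) for j in range(lo, hi) if j != i)
    return pairs

session = ["track_a", "track_b", "track_c"]
print(skipgram_pairs(session, window=1))
# [('track_a', 'track_b'), ('track_b', 'track_a'),
#  ('track_b', 'track_c'), ('track_c', 'track_b')]
```

These (center, context) pairs are exactly what a word2vec-style model consumes; no text is involved, only co-listened tracks.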
15. Word2Vec (Mikolov et al. 2013)
‣ Each word / track has an input and an output vector representation.
‣ Output is a vector space with similar items living close to each other in cosine distance (and awesome vector algebra properties).
[Figure: skip-gram architecture with a softmax output layer]
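Similarity in the learned space is just cosine distance; a minimal sketch:

```python
import math

def cosine_similarity(u, v):
    """Tracks played in similar contexts end up close in cosine distance."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))  # 1.0
```

Nearest neighbours under this measure give "tracks similar to X", and the vector-algebra property lets you combine vectors (e.g. sums of a user's recent tracks) before searching.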
16. Sequential Data? RNNs?
‣ Output layer is the same as word2vec: a softmax. Predict the next item from the hidden state.
‣ Recurrent connection carries the session history.
‣ Learn output weights and biases for each item.
https://erikbern.com/2014/06/28/recurrent-neural-networks-for-collaborative-filtering/
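A minimal sketch of the recurrent update and softmax output described above, in plain Python with toy weights `W` and `U` (hypothetical dimensions, not the blog post's model):

```python
import math

def matvec(M, v):
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]

def rnn_step(h_prev, x, W, U):
    """Recurrent connection: h_t = tanh(W h_{t-1} + U x_t)."""
    z = [a + b for a, b in zip(matvec(W, h_prev), matvec(U, x))]
    return [math.tanh(v) for v in z]

def softmax(scores):
    """Same output layer as word2vec: a distribution over candidate next tracks."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Toy 2-dim hidden state; x is the current track's input representation.
W = [[0.1, 0.0], [0.0, 0.1]]
U = [[0.5, 0.0], [0.0, 0.5]]
h = rnn_step([0.0, 0.0], [1.0, 0.0], W, U)
probs = softmax([1.0, 2.0, 3.0])   # per-item scores -> next-track probabilities
```

The hidden state `h` is updated once per played track, so the prediction at each step conditions on the whole session so far.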
17. User Representations?
‣ Word2vec can output word / track representations, but what about user representations?
‣ Simple aggregation (bag of words)?
Averaging problems
‣ Doc2Vec?
Retrain every time there is new user activity
‣ Clustering?
Loses vector-addition information
‣ Learn the user vector through an RNN?
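The bag-of-words aggregation above is just a component-wise mean of the user's track vectors (toy data; hypothetical helper). It ignores listening order entirely, which is the "averaging problem" that motivates the RNN alternatives.

```python
def average_user_vector(track_vectors):
    """Bag-of-words user representation: mean of the user's track vectors.
    Discards all sequence information about how the tracks were played."""
    f = len(track_vectors[0])
    n = len(track_vectors)
    return [sum(v[k] for v in track_vectors) / n for k in range(f)]

print(average_user_vector([[1.0, 0.0], [0.0, 1.0]]))  # [0.5, 0.5]
```

Note that two users with very different tastes can average to the same mid-point vector, another face of the same problem.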
18. Another RNN approach
‣ Assume item vectors are fixed
‣ Try to learn the next item vector in the sequence
‣ For long-term intents, train the RNN to predict further ahead in the future
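Since the item vectors are held fixed, the training objective can be sketched as regressing the next track's vector (hypothetical helper, a sketch of the objective only):

```python
def next_vector_loss(predicted, target):
    """Squared distance between the RNN's predicted vector and the
    (fixed) vector of the track that actually came next."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target))

print(next_vector_loss([0.5, 0.5], [1.0, 0.0]))  # 0.5
```

Predicting k steps ahead with the same loss, rather than just the immediate next track, is one way to push the model toward longer-term intent.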
19. Challenges, what lies ahead
Side information in embedding models: removing regional biases, external genre information, lyrics, Facebook / Twitter account data, [cover art, who knows :)]
Deep Learning
Transfer Learning
Outlier Detection
20. Thank You!
You can reach me @
Email: rohanag@spotify.com
Twitter: @rohanag