2. Spotify’s Big Data
‣ Started in 2006, now available in 58 countries
‣ 100+ million active users, 35+ million paid subscribers
‣ 30+ million songs in our catalog, ~20K added every day
‣ 2+ billion playlists
‣ 1 TB of log data every day
‣ Hadoop cluster with ~2500 nodes
22. Why Vectors?
Encode higher-order dependencies
Users and Items in the same latent space
User - Item recommendations
Item - Item similarities
Easy to scale up
Complexity is linear in the number of latent factors
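A minimal sketch of why one shared latent space is convenient: the same vectors serve both user-item recommendations (dot product) and item-item similarities (cosine). The vectors, names (`user_vectors`, `item_vectors`, `recommend`, `similar_items`) and the choice of three factors are illustrative assumptions, not values from the talk. Each score costs one pass over the factors, so work grows linearly with their number.

```python
import math

# Hypothetical 3-factor latent vectors; a real model would learn these
# from listening data rather than hard-code them.
user_vectors = {"alice": [0.9, 0.1, 0.0]}
item_vectors = {
    "track_rock": [1.0, 0.0, 0.0],
    "track_indie": [0.8, 0.2, 0.1],
    "track_jazz": [0.0, 0.1, 1.0],
}


def dot(a, b):
    # One multiply-add per latent factor: linear in the number of factors.
    return sum(x * y for x, y in zip(a, b))


def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def recommend(user, k=2):
    """User-item recommendations: rank items by dot product with the user."""
    u = user_vectors[user]
    ranked = sorted(item_vectors, key=lambda i: dot(u, item_vectors[i]), reverse=True)
    return ranked[:k]


def similar_items(item, k=1):
    """Item-item similarities: rank the other items by cosine similarity."""
    v = item_vectors[item]
    others = [i for i in item_vectors if i != item]
    return sorted(others, key=lambda i: cosine(v, item_vectors[i]), reverse=True)[:k]


print(recommend("alice"))
print(similar_items("track_rock"))
```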
27. Ranking
Similarity score can be used for ranking
Balance relevance, diversity, popularity, freshness
Heuristic based
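One way a heuristic ranker of this kind could look, as a sketch only: a weighted blend of similarity (relevance), popularity and freshness, followed by a greedy pass that penalizes repeating an artist to keep the list diverse. The weights, the penalty, and the `Candidate` fields are hypothetical choices for illustration; the talk does not specify the actual heuristics.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    track: str
    artist: str
    similarity: float  # relevance, e.g. a latent-vector similarity in [0, 1]
    popularity: float  # normalized to [0, 1]
    freshness: float   # 1.0 = brand new, decaying toward 0


# Hypothetical weights and penalty; in practice these would be tuned.
WEIGHTS = {"similarity": 0.6, "popularity": 0.2, "freshness": 0.2}
ARTIST_REPEAT_PENALTY = 0.3


def base_score(c):
    return (WEIGHTS["similarity"] * c.similarity
            + WEIGHTS["popularity"] * c.popularity
            + WEIGHTS["freshness"] * c.freshness)


def rank(candidates, k):
    """Greedy re-ranking: pick the best remaining candidate, discounting
    artists that already appear in the ranked list (diversity)."""
    remaining = list(candidates)
    ranked = []
    artist_counts = {}
    while remaining and len(ranked) < k:
        best = max(
            remaining,
            key=lambda c: base_score(c)
            - ARTIST_REPEAT_PENALTY * artist_counts.get(c.artist, 0),
        )
        ranked.append(best)
        remaining.remove(best)
        artist_counts[best.artist] = artist_counts.get(best.artist, 0) + 1
    return ranked
```

With two strong candidates from the same artist and a weaker one from another artist, the penalty moves the second artist up to position two.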
28. Ranking
Multi-armed bandit (MAB)
Interactions
Impressions
Clicks
Streams
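A sketch of how impressions and downstream interactions could feed a bandit, assuming Thompson sampling with a Beta prior per arm. The slide names only "MAB", so the specific algorithm, the class name `ThompsonBandit`, and the rule that a click or stream counts as a success are assumptions made for illustration.

```python
import random


class ThompsonBandit:
    """Beta-Bernoulli Thompson sampling over a fixed set of arms.

    Each impression is a trial; a click or stream on it is a success
    (an assumed reward definition, not one stated in the talk).
    """

    def __init__(self, arms, seed=None):
        self.impressions = {arm: 0 for arm in arms}
        self.successes = {arm: 0 for arm in arms}
        self.rng = random.Random(seed)

    def choose(self):
        """Sample a plausible success rate per arm and show the best one."""
        def sample(arm):
            wins = self.successes[arm]
            losses = self.impressions[arm] - wins
            # Beta(1, 1) prior, i.e. uniform before any data is seen.
            return self.rng.betavariate(1 + wins, 1 + losses)

        return max(self.impressions, key=sample)

    def record(self, arm, success):
        """Log one impression and whether it led to a click or stream."""
        self.impressions[arm] += 1
        if success:
            self.successes[arm] += 1
```

Arms with few impressions have wide posteriors and still get shown occasionally, which is how the bandit keeps exploring new or rarely surfaced content.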
31. Challenges Unique to Spotify
Scale of catalog
Music is “niche”
Music consumption has heavy correlation to users’ context
Repeated consumption of music is NOT so uncommon.
32. Challenge Accepted!
Cold start problem for both users and new music/upcoming artists:
Content Based Signals
Real Time Recommendations
Measuring Quality:
Implicit: A/B Test Metrics
Explicit: Feedback from social forums
Scam Attacks:
Rule based model to detect scammers
Human choices are not always predictable:
Faith in humanity
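The rule-based scammer detection mentioned above could be sketched along these lines. Every rule, threshold, and field name here is hypothetical and chosen only to illustrate the shape of such a model; the talk does not describe the actual rules. Requiring at least two rules to fire is a deliberate choice in this sketch, since repeated listening to a single track is normal behavior and should not flag an account on its own.

```python
from dataclasses import dataclass


@dataclass
class AccountDay:
    account_id: str
    streams: int                 # streams logged for the account that day
    distinct_tracks: int         # number of different tracks among them
    median_play_seconds: float   # median listening time per stream


# Hypothetical thresholds, for illustration only.
MAX_STREAMS_PER_DAY = 2000
MIN_DISTINCT_RATIO = 0.02
MIN_MEDIAN_PLAY_SECONDS = 35.0


def triggered_rules(day):
    """Return the names of the rules this account-day violates."""
    rules = []
    if day.streams > MAX_STREAMS_PER_DAY:
        rules.append("too_many_streams")
    if day.streams > 0 and day.distinct_tracks / day.streams < MIN_DISTINCT_RATIO:
        rules.append("low_track_variety")
    if day.streams > 0 and day.median_play_seconds < MIN_MEDIAN_PLAY_SECONDS:
        rules.append("short_plays")
    return rules


def is_suspicious(day, min_rules=2):
    """Flag the account only when several independent rules fire."""
    return len(triggered_rules(day)) >= min_rules
```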