Oscar Carlsson
Data Engineer
lad@spotify.com
Big Data and Machine Learning @ Spotify
Friday 6/3 2015
● D-student starting 2009
● Graduated last year from CSALL
(Student in this class 2013)
● Master thesis at Spotify
● Data Engineer at Spotify in Gothenburg
Me
● What is data at Spotify?
● Big data and processing it
● Using data at Spotify
● Machine Learning
Outline
Supervised learning:
data (X), labels (Y)
Unsupervised learning:
data (X)
In the Machine Learning class:
What is data at Spotify?
● Songs
○ Track metadata
○ Cover arts
○ Album
○ Genres, Mood etc
● User generated
○ Listens
○ Clicks
○ Page views
● Users
○ Country, email etc
● Playlists
○ Tracks of playlist
○ Add/Removes
30 Million songs
60 Million Monthly Active Users
58 Markets
15 Million subscribers
1.5 Billion Playlists
● What is data at Spotify?
● Big data and processing it
● Using data at Spotify
● Machine Learning
Outline
Big Data and processing it
● 20 TB compressed data / DAY
○ 200 TB generated and stored / day (replication)
● Our business is highly dependent on these logs
○ We pay artists depending on plays, plays = logs
Too much to store on a single computer. We need a
cluster to process it!
.. this is typically what is called “Big Data”
Big Data and processing it
● Distributed computing and storage
○ Hadoop
■ MapReduce
○ Cassandra
● Hadoop cluster
○ 1100 nodes
○ ~8000 jobs/day
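The MapReduce model named on this slide can be sketched in plain Python. This is a single-process illustration of the map, shuffle and reduce steps, not Hadoop code; the log format (`user_id<TAB>track_id`) and all function names are assumptions made for the example.

```python
from collections import defaultdict

def map_phase(log_lines):
    """Map: emit (track_id, 1) for every play log line 'user_id<TAB>track_id'."""
    for line in log_lines:
        _user_id, track_id = line.rstrip("\n").split("\t")
        yield track_id, 1

def shuffle(pairs):
    """Shuffle: group all values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Reduce: sum the emitted counts per track."""
    return {track_id: sum(counts) for track_id, counts in grouped.items()}

def count_plays(log_lines):
    """Plays per track, computed as map -> shuffle -> reduce."""
    return reduce_phase(shuffle(map_phase(log_lines)))

if __name__ == "__main__":
    print(count_plays(["u1\tt1", "u2\tt1", "u1\tt2"]))
```

On a real cluster the map and reduce functions run in parallel on many nodes and the framework performs the shuffle over the network.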
● What is data at Spotify?
● Big data and processing it
● Using data at Spotify
● Machine Learning
Outline
Using data at Spotify
Every part of the company is interested in our data
● Product
○ Are people using X? Should we focus on features such as Y?
● Insights
○ What music is trending? Which artists are popular where?
● Performance
○ How is latency in country Y? Did this reduce stutter in country X?
Using data at Spotify
● Data-driven decision making
○ Like.. every decision.
○ Analysts / Data scientists
● A/B test everything!
● A/B testing:
○ Statistical hypothesis testing
○ Simple randomized experiment with >= 2
variants (A, B)
Using data at Spotify: A/B testing
Objective: Decrease time from loading playlist to first play
Hypothesis: The bigger the button, the faster users find it
Test set up:
● A - variant 1
○ 2% US and SE MAU users
● B - variant 2
○ 2% US and SE MAU users
● Control - normal
○ Rest of users in US SE
“The shuffle button”
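One common way to split users into such groups is to hash the user id, so that a user always lands in the same group. The sketch below assumes this approach; the slides do not say how the assignment was actually done. The experiment name, the function name and the use of SHA-256 are illustrative choices, and the restriction to US and SE users is left out.

```python
import hashlib

def assign_variant(user_id, experiment="shuffle-button", share_a=0.02, share_b=0.02):
    """Deterministically map a user to 'A', 'B' or 'control'."""
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    # First 8 hex digits give an integer in [0, 2**32); scale it to [0, 1).
    point = int(digest[:8], 16) / 2**32
    if point < share_a:
        return "A"
    if point < share_a + share_b:
        return "B"
    return "control"

if __name__ == "__main__":
    for uid in ["user-1", "user-2", "user-3"]:
        print(uid, assign_variant(uid))
```

Including the experiment name in the hashed string means a different experiment gets an independent split of the same users.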
Using data at Spotify: A/B testing
CONTROL A B
Analytics: A/B testing
Metric:
Share of users with time to first play > 500 ms
(500ms is made up)
Let's roll out A to all users and throw away B!
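Since A/B testing is statistical hypothesis testing, a difference in such a share would normally be checked for significance before a rollout. A minimal sketch, assuming a two-proportion z-test is an acceptable test for this metric (the slides do not name the test used); the counts in the example are made up.

```python
import math

def two_proportion_z_test(hits_a, n_a, hits_b, n_b):
    """Return (z, two-sided p-value) for H0: both groups have the same share."""
    p_a = hits_a / n_a
    p_b = hits_b / n_b
    pooled = (hits_a + hits_b) / (n_a + n_b)
    # Standard error under H0; zero if the pooled share is exactly 0 or 1,
    # a case this sketch does not handle.
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value of the standard normal: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value

if __name__ == "__main__":
    # Hypothetical counts: users above the threshold in variant A vs. control.
    z, p = two_proportion_z_test(300, 1000, 400, 1000)
    print(f"z = {z:.2f}, p = {p:.2g}")
```

A small p-value suggests the difference between the groups is unlikely to be chance alone.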
● What is data at Spotify?
● Big data and processing it
● Using data at Spotify
● Machine Learning
Outline
● Machine Learning
○ User analysis
○ Artist disambiguation
○ Recommender systems
Outline
“A music session somehow represents a moment for the user. Can we find these moments and describe them?”
● Take a subset of user listening data with new genre
data
○ Combine listens in sessions
■ Consecutive plays, with no pause of 15 min or more
○ Session = [genres]
● Clustering algorithms to find similar sessions
○ K-means / Hierarchical clustering
● Describe the clusters using logistic regression
Machine Learning: Cluster user music sessions
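The session step can be sketched as follows. The input format (a list of `(timestamp_seconds, genre)` tuples per user) and the function names are assumptions, and the pause is measured between play start times for simplicity. Representing a session by the share of plays per genre is one possible reading of "Session = [genres]".

```python
def sessionize(plays, max_gap_seconds=15 * 60):
    """Split one user's plays into sessions.

    plays: list of (timestamp_seconds, genre) tuples.
    Returns a list of sessions, each a list of genres in play order.
    """
    sessions = []
    last_ts = None
    for ts, genre in sorted(plays):
        if last_ts is None or ts - last_ts >= max_gap_seconds:
            sessions.append([])  # a pause of 15 minutes or more starts a new session
        sessions[-1].append(genre)
        last_ts = ts
    return sessions

def genre_vector(session, vocabulary):
    """Represent a session as the share of its plays per genre in `vocabulary`."""
    return [session.count(genre) / len(session) for genre in vocabulary]

if __name__ == "__main__":
    plays = [(0, "rock"), (200, "rock"), (1100, "jazz"), (1300, "jazz")]
    for session in sessionize(plays):
        print(session, genre_vector(session, ["jazz", "rock"]))
```

The resulting vectors are what a clustering algorithm would take as input.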
Machine Learning: Cluster user music sessions
K-Means Per cluster classification
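For reference, a minimal K-means (Lloyd's algorithm) in plain Python. This is a teaching sketch, not the implementation used at Spotify; the initialisation by random sampling and the fixed iteration count are simplifications, and the example vectors are made up.

```python
import random

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=20, seed=0):
    """Lloyd's algorithm. Returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(k), key=lambda c: squared_distance(p, centroids[c]))
                  for p in points]
        # Update step: each centroid moves to the mean of its points.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # keep the old centroid if a cluster went empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels

if __name__ == "__main__":
    # Hypothetical session vectors: (share of rock, share of jazz).
    sessions = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    centroids, labels = kmeans(sessions, k=2)
    print(labels)
```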
Machine Learning: Cluster user music sessions
Per cluster logistic regression
w: weight vector
Each w_i can be interpreted as the effect of the x_i variable
x_i = genres
Machine Learning: Cluster user music sessions
Clusters described by logistic regression
name of x_i at largest w_i
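The idea of naming a cluster after the genre with the largest weight can be sketched like this. It fits one "in this cluster vs. not" logistic regression by plain batch gradient descent; the learning rate, epoch count, function names and the tiny data set are all assumptions for illustration.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.5, epochs=500):
    """Batch gradient descent for logistic regression. Returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def describe_cluster(X, labels, cluster, genre_names):
    """Fit 'cluster vs. rest' and return the genre whose weight w_i is largest."""
    y = [1 if label == cluster else 0 for label in labels]
    w, _ = fit_logistic(X, y)
    return genre_names[max(range(len(w)), key=lambda j: w[j])]

if __name__ == "__main__":
    X = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]  # genre shares per session
    labels = [0, 0, 1, 1]                                  # e.g. from K-means
    print(describe_cluster(X, labels, 0, ["rock", "jazz"]))
```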
Machine Learning: Cluster user music sessions
Machine Learning: Cluster user music sessions
Machine Learning
Artist disambiguation
Cleaning up the artists pages
Machine Learning: Artist disambiguation
Machine Learning: Artist disambiguation
Let's listen to those tracks!
Is it really the same Fredrik?
Machine Learning: Artist disambiguation
Machine Learning: Artist disambiguation
● Rank artists with probability of being ambiguous
● Apply clustering on each “ambiguous” artist's
albums/tracks
○ Using features such as country, release year,
label/licensor etc.
○ Distinct cluster could be different artists
● Nicely present this for manual curation
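A rough sketch of the clustering step for one ambiguous artist. The distance function, its weights and the threshold are hypothetical, and single-linkage clustering is just one simple choice; the slides do not say which algorithm or feature encoding was used. The releases in the example are made up.

```python
def release_distance(a, b):
    """Hypothetical distance between two releases (dicts with country, year, label)."""
    d = 0.0
    d += 0.0 if a["country"] == b["country"] else 1.0
    d += 0.0 if a["label"] == b["label"] else 1.0
    d += min(abs(a["year"] - b["year"]) / 10.0, 1.0)  # capped at one decade
    return d

def cluster_releases(releases, threshold=1.0):
    """Single-linkage clustering: releases closer than `threshold` are linked."""
    parent = list(range(len(releases)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(len(releases)):
        for j in range(i + 1, len(releases)):
            if release_distance(releases[i], releases[j]) < threshold:
                parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(releases)):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())

if __name__ == "__main__":
    releases = [
        {"country": "SE", "year": 1998, "label": "Label A"},
        {"country": "SE", "year": 2001, "label": "Label A"},
        {"country": "US", "year": 2012, "label": "Label B"},
        {"country": "US", "year": 2014, "label": "Label B"},
    ]
    print(cluster_releases(releases))
```

Each resulting group of release indices is a candidate for being a separate artist, to be confirmed by a curator.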
Machine Learning: Recommender system
The discover page
Machine Learning: Recommender system
Collaborative filtering
Machine Learning: Recommender system
Collaborative filtering
● Build a matrix of user plays
● Compute similarity between items
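The two steps above can be sketched with sparse vectors in plain Python. The `(user, track, count)` input format and the function names are assumptions, and cosine similarity is used here as one common similarity measure.

```python
import math
from collections import defaultdict

def item_vectors(plays):
    """plays: iterable of (user, track, count). Returns {track: {user: count}}."""
    vectors = defaultdict(dict)
    for user, track, count in plays:
        vectors[track][user] = vectors[track].get(user, 0) + count
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(value * v[key] for key, value in u.items() if key in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

if __name__ == "__main__":
    plays = [("u1", "a", 1), ("u1", "b", 1), ("u2", "a", 1),
             ("u2", "b", 1), ("u3", "c", 1)]
    vectors = item_vectors(plays)
    print(cosine(vectors["a"], vectors["b"]), cosine(vectors["a"], vectors["c"]))
```

Tracks played by the same users get a similarity near 1; tracks with no listeners in common get 0.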
Machine Learning: Recommender system
4 Million tracks x 60 Million users
→ Pairwise similarity infeasible
Approximate the matrix with NMF
Machine Learning: Recommender system
Matrix factorization (latent factor models)
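A small sketch of non-negative matrix factorization using the standard multiplicative update rules (Lee and Seung), written in plain Python for readability. It is not how a 60-million-user matrix would be factorized in practice; the matrix, the rank `k` and the iteration count are made up for the example.

```python
import random

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def nmf(V, k, iterations=200, seed=0, eps=1e-9):
    """Factor a non-negative matrix V (users x tracks) as W @ H with W, H >= 0."""
    rng = random.Random(seed)
    n, m = len(V), len(V[0])
    W = [[rng.random() + 0.1 for _ in range(k)] for _ in range(n)]
    H = [[rng.random() + 0.1 for _ in range(m)] for _ in range(k)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H)
        Wt = transpose(W)
        num, den = matmul(Wt, V), matmul(matmul(Wt, W), H)
        H = [[H[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)] for i in range(k)]
        # W <- W * (V H^T) / (W H H^T)
        Ht = transpose(H)
        num, den = matmul(V, Ht), matmul(W, matmul(H, Ht))
        W = [[W[i][j] * num[i][j] / (den[i][j] + eps) for j in range(k)] for i in range(n)]
    return W, H

def squared_error(V, W, H):
    approx = matmul(W, H)
    return sum((v - a) ** 2 for rv, ra in zip(V, approx) for v, a in zip(rv, ra))

if __name__ == "__main__":
    V = [[5, 4, 0, 0], [4, 5, 0, 0], [0, 0, 3, 4], [0, 0, 4, 3]]  # play counts
    W, H = nmf(V, k=2)
    print(round(squared_error(V, W, H), 3))
```

Each row of `W` is a small vector for a user and each column of `H` a small vector for a track.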
Machine Learning: Recommender system
Small vectors
Cosine similarity and dot product efficient
Machine Learning: Recommender system
Finding recommendations:
Approximate nearest neighbour (ANN)
code: https://github.com/spotify/annoy
Related artists & Radio:
Similar to user recommendations, more models and not
all CF-based
Multiple models:
Score candidates from all models, combine and rank!
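The "combine and rank" step could look like the sketch below. A weighted sum of per-model scores is assumed here purely for illustration; the slides do not say how the scores are combined. Model names, weights and scores are made up.

```python
def combine_and_rank(model_scores, weights, top_n=10):
    """Blend candidate scores from several models and return the top candidates.

    model_scores: {model_name: {candidate: score}}
    weights:      {model_name: weight}; models without a weight count as 0.
    """
    combined = {}
    for model, scores in model_scores.items():
        for candidate, score in scores.items():
            combined[candidate] = combined.get(candidate, 0.0) + weights.get(model, 0.0) * score
    # Highest combined score first; ties are broken by candidate name.
    ranked = sorted(combined.items(), key=lambda item: (-item[1], item[0]))
    return [candidate for candidate, _ in ranked[:top_n]]

if __name__ == "__main__":
    model_scores = {
        "cf": {"track_a": 0.9, "track_b": 0.2},
        "content": {"track_b": 0.8, "track_c": 0.5},
    }
    print(combine_and_rank(model_scores, {"cf": 1.0, "content": 0.5}))
```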
Machine Learning: Recommender system
I went through this quickly; read more about the
Spotify Rec sys here:
Doing this on MapReduce
Comparing with Netflix
Music Rec @ MLConf 2014
● More content-based ML
○ Fingerprinting: The Echo Nest
○ Content-based music recommendation using
convolutional neural networks
● Personalize everything
○ Emails
○ Ads
○ User profiling
● ML in parts of the product other than Rec Sys
.. final last words on the Future of ML at Spotify
Summary
● Multiple data sources -> multiple angles
● Data drives decisions with A/B testing
● User analysis
○ Cluster and describe with classifier
● Artist disambiguation
○ Cluster and give to manual curators
● Recommender systems
○ Collaborative filtering
● We supervise thesis workers
○ Artist disambiguation/deduplication
○ Cluster user music sessions
○ Context-based recommender systems
○ Personalized ads / Personalized emails
● We have internships!
www.spotify.com/jobs
.. and potentially you could help us?
Oscar Carlsson
lad@spotify.com
Linkedin
Thank you for
listening!
