The Evolution of Hadoop at Spotify
Through Failures and Pain
Josh Baer (jbx@spotify.com)
Rafal Wojdyla (rav@spotify.com)
Note: Our views are our own and don't necessarily represent those of Spotify.
Overview
• Growing Pains (2009-2012)
• Gaining Focus (2013-2014)
• The Future (2015+)
Our First Major Hadoop Bug
Cluster 1.0
What is Spotify?
• Music Streaming Service
• Browse and Discover Millions of
Songs, Artists and Albums
• Launched in October 2008
• December 2014:
  • 60 Million Monthly Users
  • 15 Million Paid Subscribers
What is Spotify?
• Data Infrastructure:
  • 1300 Hadoop Nodes
  • 42 PB Storage
  • 20 TB data ingested via Kafka/day
  • 200 TB generated by Hadoop/day
select artist_id, count(1)
from user_activities
where play_seconds > 30
group by artist_id;
0    *  * * *  spotify-core       hadoop jar hourly_import.jar
15   *  * * *  spotify-core       hadoop jar hourly_listeners.jar
30   *  * * *  spotify-analytics  hadoop jar user_funnel_hourly.jar
*    1  * * *  spotify-core       hadoop jar daily_aggregate.jar
*    2  * * *  spotify-core       hadoop jar calculate_royalties.jar
*/2  22 * * *  spotify-radio      hadoop jar generate_radio.jar
Handles the ‘plumbing’ for Hadoop jobs
https://github.com/spotify/luigi
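The crontab above starts each job at a fixed time, whether or not its input is ready. A workflow tool such as Luigi instead runs a task only after the tasks it depends on have finished. The sketch below illustrates that idea with the Python standard library only. It is not Luigi's API, and the dependency edges between the job names are assumptions made up for illustration.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency graph: task -> tasks that must finish first.
# The job names come from the crontab; the edges are assumed for illustration.
DEPENDENCIES = {
    "hourly_import": set(),
    "hourly_listeners": {"hourly_import"},
    "daily_aggregate": {"hourly_listeners"},
    "calculate_royalties": {"daily_aggregate"},
}


def run_order(dependencies):
    """Return an order in which every task comes after its dependencies."""
    return list(TopologicalSorter(dependencies).static_order())


def run_all(dependencies, completed):
    """Run tasks in dependency order, skipping those already completed."""
    executed = []
    for task in run_order(dependencies):
        if task in completed:
            continue  # output already exists, nothing to do
        executed.append(task)  # a real runner would launch the Hadoop job here
        completed.add(task)
    return executed


order = run_order(DEPENDENCIES)
print(order)
```

Because completed tasks are skipped, re-running after a failure picks up where the previous run stopped instead of repeating every job.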
To the Cloud!
# sudo addgroup hadoop
# sudo adduser --ingroup hadoop hdfs
# sudo adduser --ingroup hadoop yarn
# cp /tmp/configs/*.xml /etc/hadoop/conf/
# apt-get update
…
[hdfs@sj-hadoop-b20 ~] $ apt-get install hadoop-hdfs-datanode
…
[yarn@sj-hadoop-b20 ~] $ apt-get install hadoop-yarn-nodemanager
Automated Config Management (via Puppet)
[data-sci@sj-edge-a1 ~] $ hdfs dfs -ls /data
Found 3 items
drwxr-xr-x   - hdfs hadoop          0 2015-01-01 12:00 lake
drwxr-xr-x   - hdfs hadoop          0 2015-01-01 12:00 pond
drwxr-xr-x   - hdfs hadoop          0 2015-01-01 12:00 ocean
[data-sci@sj-edge-a1 ~] $ hdfs dfs -ls /data/lake
Found 1 items
drwxr-xr-x   - hdfs hadoop    1321451 2015-01-01 12:00 boats.txt
[data-sci@sj-edge-a1 ~] $ hdfs dfs -cat /data/lake/boats.txt
…
$ time for i in {1..100}; do hadoop fs -ls / > /dev/null; done
real  3m32.014s
user  6m15.891s
sys   0m18.821s
$ time for i in {1..100}; do snakebite ls / > /dev/null; done
real  0m34.760s
user  0m29.962s
sys   0m4.512s
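To make the comparison concrete, the two `real` timings above can be turned into a per-call cost and a speedup factor. The sketch below only redoes that arithmetic from the numbers on the slide. The helper name `to_seconds` is introduced here for illustration.

```python
def to_seconds(stamp):
    """Convert a `time` stamp such as '3m32.014s' to seconds."""
    minutes, seconds = stamp.rstrip("s").split("m")
    return int(minutes) * 60 + float(seconds)


calls = 100  # both loops ran 100 listings
hadoop_real = to_seconds("3m32.014s")
snakebite_real = to_seconds("0m34.760s")

per_call_hadoop = hadoop_real / calls
per_call_snakebite = snakebite_real / calls
speedup = hadoop_real / snakebite_real

print(f"hadoop fs: {per_call_hadoop:.2f} s per call")
print(f"snakebite: {per_call_snakebite:.2f} s per call")
print(f"speedup:   {speedup:.1f}x")
```

On these figures a listing costs roughly 2.1 seconds with `hadoop fs` and about 0.35 seconds with snakebite, a speedup of about 6x in wall-clock time.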
  
Gaining Focus
(2013-2014)
Forming a team
• In 2013, expanded to 200 nodes
• Hadoop had become critical
• Needed a team totally focused on it
• Created a ‘squad’ with two missions:
  • Migrate to a new distribution with YARN
  • Make Hadoop reliable
Hadoop ownerless Upgrades Getting there
Squad
Alerting
• Alert on service-level problems (e.g. no jobs running)
• Keep your alarm channel clean. Beware of alert fatigue.
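One way to read both bullets together is to alert on a service-level symptom only once it persists, so a brief lull does not page anyone. The sketch below is an assumption-laden illustration, not the monitoring Spotify ran. The function name, the sampling of running-job counts, and the `quiet_samples` threshold are all hypothetical.

```python
def should_alert(running_job_counts, quiet_samples=3):
    """Alert only when the last `quiet_samples` samples all show zero running jobs.

    `running_job_counts` is a chronological list of sampled job counts;
    how the samples are collected is outside the scope of this sketch.
    """
    if len(running_job_counts) < quiet_samples:
        return False  # not enough history to judge
    return all(count == 0 for count in running_job_counts[-quiet_samples:])


print(should_alert([12, 0, 7, 0]))  # one isolated empty sample: stay quiet
print(should_alert([12, 0, 0, 0]))  # three empty samples in a row: alert
```

Requiring several consecutive empty samples trades a little detection delay for a quieter alarm channel.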
Uh oh... I think I made a mistake
[data-sci@sj-edge-a1 ~] $ snakebite rm -R /team/disco/ CF/test-10/
OK: Deleted /team/disco
Goodbye Data (1 PB)
Lessons Learned
• “Sit on your hands before you type” - Wouter de Bie
• Users will always want to retain data!
• Remove superusers from edge nodes
• Moving to trash = client-side implementation
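In the deletion shown earlier, a stray space split the path into two arguments, so the first one, `/team/disco/`, was removed outright. As the last bullet notes, moving data to trash is something the client has to do, so a client that sends a plain delete gets no safety net. The sketch below shows the idea of a client-side guard, using the local filesystem as a stand-in for HDFS. The function `safe_remove`, the `MIN_DEPTH` limit, and the trash layout are all hypothetical and do not reflect snakebite's or Hadoop's actual behaviour.

```python
import shutil
import tempfile
from pathlib import Path

MIN_DEPTH = 3  # hypothetical guard: refuse to remove paths shallower than this


def safe_remove(path, root, trash):
    """Move `path` (which must live under `root`) into `trash` instead of deleting it."""
    relative = Path(path).resolve().relative_to(Path(root).resolve())
    if len(relative.parts) < MIN_DEPTH:
        raise ValueError(f"refusing to remove shallow path: {relative.as_posix()}")
    target = Path(trash) / relative
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(Path(root) / relative), str(target))
    return target


with tempfile.TemporaryDirectory() as tmp:
    root, trash = Path(tmp) / "fs", Path(tmp) / "trash"
    victim = root / "team" / "disco" / "CF" / "test-10"
    victim.mkdir(parents=True)
    (victim / "part-00000").write_text("data")

    # The intended target is deep enough, so it is moved to trash.
    moved = safe_remove(victim, root, trash)
    print(moved.relative_to(trash).as_posix())

    # The accidental target is too shallow, so the guard refuses.
    try:
        safe_remove(root / "team" / "disco", root, trash)
    except ValueError as error:
        print(error)
```

A depth check is a blunt rule, but it would have rejected the accidental `/team/disco/` argument while still allowing the intended `CF/test-10/` cleanup.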
The Wild Wild West
Pre-Production Cluster
• Same hardware profile as production cluster
• Similar configuration
• Staging environment
• Reliable
Automated Testing
Moving Data
Apache Falcon
• Features:
  • Data discovery
  • Lineage
  • Lifecycle management
  • More
• We use it for data movement
• Uses Oozie behind the scenes
Improving Performance
• Most of our jobs were Hadoop Streaming (Python)
• Lots of failures, slow performance
• Had to find a better way...
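For readers unfamiliar with the style, a Hadoop Streaming job is a pair of scripts that read and write plain text lines. The sketch below expresses the earlier `user_activities` query (plays longer than 30 seconds, counted per artist) as a mapper and reducer, run in-process so it is self-contained. The tab-separated `artist_id<TAB>play_seconds` layout and the sample records are assumptions for illustration. In a real job the two functions would be separate scripts reading stdin.

```python
from itertools import groupby


def mapper(lines):
    """Emit (artist_id, 1) for each play longer than 30 seconds."""
    for line in lines:
        artist_id, play_seconds = line.rstrip("\n").split("\t")
        if int(play_seconds) > 30:
            yield artist_id, 1


def reducer(pairs):
    """Sum counts per artist; pairs must arrive sorted by artist_id."""
    for artist_id, group in groupby(pairs, key=lambda pair: pair[0]):
        yield artist_id, sum(count for _, count in group)


# Hypothetical input records: artist_id <TAB> play_seconds
records = ["a1\t45", "a2\t10", "a1\t31", "a2\t200", "a3\t30"]

# sorted() stands in for the shuffle/sort phase between map and reduce.
counts = dict(reducer(sorted(mapper(records))))
print(counts)
```

Every field arrives as an untyped string that each script must parse, one reason a typed, higher-level API can be easier to test.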
Improving Performance
• Investigated several frameworks
• Selected Crunch:
  • Real types!
  • Higher-level API
  • Easier to test
  • Better performance #JVM_FTW
*Dave Whiting’s analysis of systems: http://thewit.ch/scalding_crunchy_pig
Let’s Review
The Future
(2015+)
Note: Spotify user numbers are based on publicly released figures only
Explosive Growth
• Increased Spotify Users
• Increased use cases
• Increased Engineers
http://everynoise.com/engenremap.html
Scaling machines: easy
Scaling people: hard
User Feedback:
Automate IT!
Data Management
Raynor
• Data-discovery tool
• Luigi integration
• Find and browse datasets
• View schemas
• Trace lineage
• Open-source plans? :-(
Two Takeaways
• Automate Everything
  • More time to build cool tools (rather than play FIFA)
• Listen to your users
  • Fail fast, don’t be afraid to scrap work
Join the Band!
Engineers wanted in NYC & Stockholm
http://spotify.com/jobs
