Distributed “Web Scale” Systems



                      Ricardo Vice Santos
                            @ricardovice
Who am I?
•  I’m Ricardo!
•  Lead Engineer at Spotify
•  ricardovice on twitter, spotify, about.me, kiva, slideshare, github,
   bitbucket, delicious…
•  Portuguese
•  Previously working in the video streaming industry
•  Discovered Spotify (only) in late 2009
•  Joined in 2010
spotifiera (spo·ti·fie·ra), verb:
   to use Spotify;
   to provide a service free of cost;
What’s Spotify all about?
•  A big catalogue, tons of music
•  Available everywhere
•  Great user experience
•  More convenient than piracy
•  Reliable, high availability
•  Scalable for many, many users
But what really got me hooked:
•  Free, legal ad-supported service
•  Very fast
The importance of being fast
•  High latency is a problem, and not only in first-person
   shooters
•  Slow performance is a major user experience killer
•  At Velocity 2009, Eric Schurman (Bing) and Jake
   Brutlag (Google Search) showed that increased
   latency directly hurt usage and revenue per user[1].
•  Latency drives users away; many won't ever
   come back
•  Users will share their experience with friends


          [1] http://radar.oreilly.com/2009/07/velocity-making-your-site-fast.html
So how fast is Spotify?
•  We monitor playback latency on the client side
•  Current median latency to play any track is 265ms
•  On average, the human notion of “instant” is
   anything under 200ms
•  Due to disk lookup times, it is sometimes faster to
   start playing a track from the network than from disk
•  Fewer than 1% of playbacks experience stutter
“Spotify is fast due to P2P”
•  This is something I read a lot around the web
•  P2P does play a crucial role in the picture, but…
•  Experience at Spotify showed me that most latency issues are
   directly linked to backend problems
•  It’s a mistake to think that we could be this fast without a smart and
   scalable backend architecture

So let’s give credit where credit is due.
Going web scale!!1




“Scaling Twitter”
Blaine Cook, 2007
http://www.slideshare.net/Blaine/scaling-twitter
Handling growth
Things to keep in mind:
•  Scaling is not an exact science
•  There is no such thing as a magic formula
•  Usage patterns differ
•  There is always a limit to what you can handle
•  Fail gracefully
•  Continuous evolution process
Scaling horizontally
•    You can always add more machines!
•    Stateless services
•    Several processes can share memcached
•    Possible to run in “the cloud” (EC2, Rackspace)
•    Need some kind of load balancer
•    Data sharing/synchronization can be hard
•    Complexity: many pieces, maybe hidden SPOFs
•    Fundamental to the application’s design
Usage patterns
Typically, some services are more demanding than
others. This can be due to:
•  Higher popularity
•  Higher complexity
•  Low latency expectation
•  All combined
Decoupling
•    Divide and conquer!
•    The Unix way
•    Resources assigned individually
•    Using the right tools to address each problem
•    Organization and delegation
•    Problems are isolated
•    Easier to handle growth
Read only services
•    The easiest to scale
•    Stateless
•    Use indices, large read-optimized data containers
•    Each node has its local copy
•    Data structured according to service
•    Updated periodically, during off-peak hours
•    Take advantage of OS page cache
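
A minimal sketch of such a read-optimized container, assuming a made-up file format (sorted fixed-width records of a 16-byte key and an 8-byte value) that is not taken from the talk. The file is memory-mapped read-only, so repeated lookups are served from the OS page cache, and a periodic update can simply ship a new file to each node.

```python
import mmap
import struct

# Hypothetical record layout: 16-byte key, 8-byte unsigned value.
RECORD = struct.Struct(">16sQ")


def build_index(path, items):
    """Write (key, value) pairs sorted by key as fixed-width records.

    Keys must be exactly 16 bytes long.
    """
    with open(path, "wb") as f:
        for key, value in sorted(items):
            f.write(RECORD.pack(key, value))


class ReadOnlyIndex:
    """Binary search over a memory-mapped, non-empty index file."""

    def __init__(self, path):
        self._file = open(path, "rb")
        # Read-only mapping: pages come from the OS page cache.
        self._map = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)
        self._count = len(self._map) // RECORD.size

    def get(self, key):
        lo, hi = 0, self._count
        while lo < hi:
            mid = (lo + hi) // 2
            found, value = RECORD.unpack_from(self._map, mid * RECORD.size)
            if found == key:
                return value
            if found < key:
                lo = mid + 1
            else:
                hi = mid
        return None  # key not present

    def close(self):
        self._map.close()
        self._file.close()
```

Because the service never writes to the file, any number of processes on the same node can map it without coordination.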
Read-write services
•  User generated content, e.g. playlists
•  Hard to ensure consistency of data across instances

Solutions:
•  Eventual consistency:
   •  Reads of just-written data are not guaranteed to be up to date
•  Locking, atomic operations
    •  Creating globally unique keys, e.g. usernames
    •  Transactions, e.g. billing
Decoupling at Spotify
Finding a service via DNS
Each service has an SRV DNS record:
•  One record with the same name for each service instance
•  Clients (AP) resolve to find servers providing that service
•  Records with the lowest priority value are tried first, picked by weighted shuffle
•  Clients retry other instances in case of failures

Example SRV record
_frobnicator._http.example.com. 3600 SRV 10     50   8081 frob1.example.com.
       name                     TTL type prio weight port      host
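
The selection and retry behaviour described above can be sketched roughly as follows. This is an illustration, not Spotify's client code: `SrvRecord`, `order_candidates`, `call_with_retry` and the `send` callback are hypothetical names, and the records are assumed to have been resolved already.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class SrvRecord:
    priority: int
    weight: int
    port: int
    host: str


def order_candidates(records, rng=random):
    """Return records in the order a client might try them.

    Lower priority values come first; within one priority level the
    order is a weighted shuffle (heavier records tend to come earlier).
    """
    ordered = []
    for priority in sorted({r.priority for r in records}):
        group = [r for r in records if r.priority == priority]
        while group:
            # +1 keeps zero-weight records selectable (a simplification).
            weights = [r.weight + 1 for r in group]
            pick = rng.choices(group, weights=weights, k=1)[0]
            ordered.append(pick)
            group.remove(pick)
    return ordered


def call_with_retry(records, send, rng=random):
    """Try each candidate in turn until one call succeeds."""
    last_error = None
    for record in order_candidates(records, rng):
        try:
            return send(record.host, record.port)
        except OSError as exc:  # connection refused, timeout, ...
            last_error = exc
    raise ConnectionError("no instance reachable") from last_error
```

Adding capacity then only means publishing one more SRV record; clients pick it up once the TTL expires.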
Request assignment
•    Hardware load balancers
•    Round-robin DNS
•    Proxy servers
•    Sharding:
      •  Each server/instance responsible for subset of data
      •  Directs client to instance that has its data
      •  Easy if nothing is shared
      •  Hard if you require replication
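
As a rough illustration of the simplest case, where nothing is shared, a request can be routed by hashing its key onto a fixed list of instances. The function name and the instance names in the usage note are made up for this sketch.

```python
import hashlib


def shard_for(key: str, shards: list[str]) -> str:
    """Map a key to one shard using a stable hash.

    Python's built-in hash() is avoided because it differs between
    processes, and every client must agree on the mapping.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]
```

For example, `shard_for("user:ricardovice", ["frob1", "frob2", "frob3"])` always returns the same instance. The weakness of plain modulo sharding is that changing the number of shards remaps most keys.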
Sharding using a DHT
Some Spotify services use Dynamo-inspired DHTs[1]:
•  Each request has a key
•  Each service node is responsible for a range of hash keys
•  Data is distributed among service nodes
•  Redundancy is ensured by re-hashing and writing to a replica node
•  Data must be migrated when the ring changes




         [1] http://dl.acm.org/citation.cfm?id=1294281
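
The bullets above can be sketched as a small hash ring. This is a simplified illustration under stated assumptions, not the actual implementation: keys are hashed to 128-bit positions (SHA-256 truncated to 16 bytes), each node owns the segment ending at its token, and replicas are found by re-hashing the position. The `Ring` class, the `max_rehash` cap and the ring-order fallback are inventions of this sketch.

```python
import bisect
import hashlib


def hash_key(data: bytes) -> int:
    """128-bit ring position for a key (truncated SHA-256)."""
    return int.from_bytes(hashlib.sha256(data).digest()[:16], "big")


class Ring:
    def __init__(self, tokens: dict[str, int]):
        # tokens maps node name -> last key of the segment that node owns.
        self._points = sorted((token, node) for node, token in tokens.items())
        self._keys = [token for token, _ in self._points]

    def owner(self, position: int) -> str:
        """Node whose segment contains this ring position."""
        i = bisect.bisect_left(self._keys, position)
        # Past the highest token: wrap around to the first node.
        return self._points[i % len(self._points)][1]

    def nodes_for(self, key: bytes, replicas: int = 0, max_rehash: int = 32):
        """Primary node plus up to `replicas` distinct replica nodes."""
        want = min(replicas + 1, len(self._points))
        position = hash_key(key)
        chosen = [self.owner(position)]
        for _ in range(max_rehash):
            if len(chosen) == want:
                return chosen
            # Re-hash the position to find the next candidate node.
            position = hash_key(position.to_bytes(16, "big"))
            node = self.owner(position)
            if node not in chosen:
                chosen.append(node)
        # Fallback (an assumption of this sketch): fill up in ring order.
        for _, node in self._points:
            if len(chosen) == want:
                break
            if node not in chosen:
                chosen.append(node)
        return chosen
```

When a node joins or leaves, only the keys in the affected segments change owner, which is why data has to be moved whenever the ring changes.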
DHT example
Spotify’s DNS powered DHT
Configuration of DHT
config._frobnicator._http.example.com.     3600    TXT    "slaves=0"
      config.srv_name.                     TTL     type   no replication

config._frobnicator._http.example.com.     3600    TXT    "slaves=2 redundancy=host"
      config.srv_name.                     TTL     type   three replicas on separate hosts

Ring segment, one per node
tokens.8081.frob1.example.com.             3600    TXT    "00112233445566778899aabbccddeeff"
      tokens.port.host.                    TTL     type   last key
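
A small sketch of how a client might interpret these TXT records once they have been resolved. It parses plain strings and performs no DNS queries; `parse_config` and `parse_token` are hypothetical helper names, and the meaning of the fields beyond the annotations above is not specified in the slides.

```python
def parse_config(txt: str) -> dict[str, str]:
    """Parse a config TXT payload such as 'slaves=2 redundancy=host'."""
    return dict(item.split("=", 1) for item in txt.split())


def parse_token(record_name: str, txt: str) -> tuple[str, int, int]:
    """Split 'tokens.<port>.<host>' and read the TXT value as a hex key.

    Returns (host, port, last_key), where last_key is the final key of
    the ring segment owned by that service instance.
    """
    prefix, port, host = record_name.split(".", 2)
    if prefix != "tokens":
        raise ValueError(f"not a tokens record: {record_name}")
    return host, int(port), int(txt, 16)
```

Collecting the `(host, port, last_key)` triples for every instance yields the full ring, so changing the ring layout becomes a DNS update rather than a client release.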
And if none of this works for you
Remember
/dev/null is
web scale!!




          http://www.xtranormal.com/watch/6995033/
Questions?
                     get in touch!
                    @ricardovice
             ricardo@spotify.com
Thank you.

                    @ricardovice
             ricardo@spotify.com

Handwritten Text Recognition for manuscripts and early printed texts
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Evaluating the top large language models.pdf
Evaluating the top large language models.pdfEvaluating the top large language models.pdf
Evaluating the top large language models.pdf
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 

Distributed "Web Scale" Systems

  • 1. Distributed “Web Scale” Systems Ricardo Vice Santos @ricardovice
  • 2. Who am I? •  I’m Ricardo! •  Lead Engineer at Spotify •  ricardovice on twitter, spotify, about.me, kiva, slideshare, github, bitbucket, delicious… •  Portuguese •  Previously working in the video streaming industry •  (only) Discovered Spotify late 2009 •  Joined in 2010
  • 3. spotifiera: to use Spotify; spo·ti·fie·ra Verb to provide a service free of cost;
  • 4. What’s Spotify all about? •  A big catalogue, tons of music •  Available everywhere •  Great user experience •  More convenient than piracy •  Reliable, high availability •  Scalable for many, many users But what really got me hooked: •  Free, legal ad-supported service •  Very fast
  • 5. The importance of being fast •  High latency can be a problem, not only in First Person Shooters •  Slow performance is a major user experience killer •  At Velocity 2009, Eric Schurman (Bing) and Jake Brutlag (Google Search) showed that increased latency directly hurt usage and revenue per user[1]. •  Latency leads to users leaving, and many won't ever come back •  Users will share their experience with friends [1] http://radar.oreilly.com/2009/07/velocity-making-your-site-fast.html
  • 6. So how fast is Spotify? •  We monitor playback latency on the client side •  Current median latency to play any track is 265ms •  On average, the human notion of “instant” is anything under 200ms •  Due to disk lookup, at times it's actually faster to start playing a track from network than from disk •  Below 1% of playbacks experienced stutter
  • 7. “Spotify is fast due to P2P” •  This is something I read a lot around the web •  P2P does play a crucial role in the picture, but… •  Experience at Spotify showed me that most latency issues are directly linked to backend problems •  It’s a mistake to think that we could be this fast without a smart and scalable backend architecture So let’s give credit where credit is due.
  • 8. Going web scale!!1 “Scaling Twitter” Blaine Cook, 2007 http://www.slideshare.net/Blaine/scaling-twitter
  • 9. Handling growth Things to keep in mind: •  Scaling is not an exact science •  There is no such thing as a magic formula •  Usage patterns differ •  There is always a limit to what you can handle •  Fail gracefully •  Continuous evolution process
  • 10. Scaling horizontally •  You can always add more machines! •  Stateless services •  Several processes can share memcached •  Possible to run in “the cloud” (EC2, Rackspace) •  Need some kind of load balancer •  Data sharing/synchronization can be hard •  Complexity: many pieces, maybe hidden SPOFs •  Fundamental to the application’s design
  • 11. Usage patterns Typically, some services are more demanding than others. This can be due to: •  Higher popularity •  Higher complexity •  Low latency expectation •  All combined
  • 12. Decoupling •  Divide and conquer! •  The Unix way •  Resources assigned individually •  Using the right tools to address each problem •  Organization and delegation •  Problems are isolated •  Easier to handle growth
  • 13. Read only services •  The easiest to scale •  Stateless •  Use indices, large read-optimized data containers •  Each node has its local copy •  Data structured according to service •  Updated periodically, during off-peak hours •  Take advantage of OS page cache
  • 14. Read-write services •  User generated content, e.g. playlists •  Hard to ensure consistency of data across instances Solutions: •  Eventual consistency: •  Reads of just-written data are not guaranteed to be up-to-date •  Locking, atomic operations •  Creating globally unique keys, e.g. usernames •  Transactions, e.g. billing
  • 16. Finding a service via DNS Each service has an SRV DNS record: •  One record with the same name for each service instance •  Clients (AP) resolve the name to find servers providing that service •  The record with the lowest priority value is chosen, with a weighted shuffle among records of equal priority •  Clients retry other instances in case of failure
    Example SRV record:
      _frobnicator._http.example.com.  3600  SRV   10    50      8081  frob1.example.com.
      name                             TTL   type  prio  weight  port  host
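
The selection rule on this slide can be sketched in a few lines of Python. This is a minimal illustration, not Spotify's client code: it assumes RFC 2782 style semantics (lower priority value first, weighted shuffle within one priority, higher priority values kept as fallbacks), and the names `SrvRecord`, `weighted_shuffle` and `candidates` are hypothetical. Resolving the records from DNS is left out.

```python
import random
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class SrvRecord:
    # Fields mirror the SRV record columns: prio, weight, port, host.
    priority: int
    weight: int
    port: int
    host: str


def weighted_shuffle(records, rng):
    """Order records randomly, favouring higher weights for earlier slots."""
    pool = list(records)
    ordered = []
    while pool:
        weights = [r.weight for r in pool]
        if sum(weights) > 0:
            index = rng.choices(range(len(pool)), weights=weights)[0]
        else:
            # Only zero-weight records remain: pick uniformly.
            index = rng.randrange(len(pool))
        ordered.append(pool.pop(index))
    return ordered


def candidates(records, rng=None):
    """Return the instances in the order a client would try them."""
    rng = rng or random.Random()
    by_priority = sorted(records, key=lambda r: r.priority)
    ordered = []
    for _, group in groupby(by_priority, key=lambda r: r.priority):
        ordered.extend(weighted_shuffle(group, rng))
    return ordered
```

A client would connect to the first entry of `candidates(...)` and move down the list on failure, which matches the retry behaviour described on the slide.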
  • 17. Request assignment •  Hardware load balancers •  Round-robin DNS •  Proxy servers •  Sharding: •  Each server/instance responsible for subset of data •  Directs client to instance that has its data •  Easy if nothing is shared •  Hard if you require replication
  • 18. Sharding using a DHT Some Spotify services use Dynamo-inspired DHTs[1]: •  Each request has a key •  Each service node is responsible for a range of hash keys •  Data is distributed among service nodes •  Redundancy is ensured by re-hashing and writing to a replica node •  Data must be transitioned when the ring changes [1] http://dl.acm.org/citation.cfm?id=1294281
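
As a rough sketch of how such a ring lookup might work, the Python below maps a key to the node owning its hash range and finds replica nodes by re-hashing. Several details are assumptions made for illustration only: the `Ring` class and its method names are hypothetical, the hash is SHA-256 truncated to 128 bits, each node is identified by the last key of its segment, and the replica search simply re-hashes until it lands on a different node.

```python
import bisect
import hashlib


def _hash128(data):
    """Hash bytes to a 128-bit integer (SHA-256 truncated; an assumption)."""
    digest = hashlib.sha256(data).digest()[:16]
    return digest, int.from_bytes(digest, "big")


class Ring:
    def __init__(self, tokens):
        # tokens: mapping of node name -> last key (int) of that node's segment
        self._entries = sorted((token, node) for node, token in tokens.items())
        self._tokens = [token for token, _ in self._entries]

    def owner_of_hash(self, value):
        """Return the node whose segment contains the hash value."""
        index = bisect.bisect_left(self._tokens, value)
        # Values past the highest token wrap around to the first node.
        return self._entries[index % len(self._entries)][1]

    def nodes_for(self, key, replicas=0, max_rehashes=1000):
        """Return the primary node followed by up to `replicas` replica nodes."""
        digest, value = _hash128(key.encode("utf-8"))
        nodes = [self.owner_of_hash(value)]
        wanted = min(replicas + 1, len(self._entries))
        for _ in range(max_rehashes):
            if len(nodes) >= wanted:
                break
            # Re-hash the previous digest to pick a candidate replica.
            digest, value = _hash128(digest)
            node = self.owner_of_hash(value)
            if node not in nodes:
                nodes.append(node)
        return nodes
```

When a node joins or leaves, the token list changes and some hash ranges move to a different owner, which is why data has to be transitioned when the ring changes.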
  • 20. Spotify’s DNS powered DHT
    Configuration of DHT:
      config._frobnicator._http.example.com.  3600  TXT   "slaves=0"
      config.srv_name.                        TTL   type  no replication

      config._frobnicator._http.example.com.  3600  TXT   "slaves=2 redundancy=host"
      config.srv_name.                        TTL   type  three replicas on separate hosts
    Ring segment, one per node:
      tokens.8081.frob1.example.com.  3600  TXT   "00112233445566778899aabbccddeeff"
      tokens.port.host.               TTL   type  last key
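
To make the record layout concrete, here is a small Python sketch that parses the two kinds of TXT payloads shown on this slide. It is only an illustration under stated assumptions: the function names are hypothetical, the defaults for missing options (`slaves=0`, no `redundancy`) are guesses, and fetching the TXT records from DNS is not shown.

```python
def parse_dht_config(txt):
    """Parse a config TXT payload such as 'slaves=2 redundancy=host'."""
    options = dict(pair.split("=", 1) for pair in txt.split())
    return {
        # Defaults are assumptions, not documented behaviour.
        "slaves": int(options.get("slaves", 0)),
        "redundancy": options.get("redundancy"),
    }


def parse_token_record(name, txt):
    """Parse a ring segment record named tokens.<port>.<host>.

    Returns (host, port, last_key), with the last key of the segment
    decoded from its hexadecimal form into an integer.
    """
    label, port, host = name.rstrip(".").split(".", 2)
    if label != "tokens":
        raise ValueError(f"not a tokens record: {name!r}")
    return host, int(port), int(txt, 16)
```

For the records on the slide, `parse_dht_config("slaves=2 redundancy=host")` gives two replicas with host-level redundancy, and the tokens record yields the host `frob1.example.com`, port 8081 and the last key of that node's segment.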
  • 21. And if none of this works for you Remember /dev/null is web scale!! http://www.xtranormal.com/watch/6995033/
  • 22. Questions? get in touch! @ricardovice ricardo@spotify.com
  • 23. Thank you. @ricardovice ricardo@spotify.com