The 5 Stages of Scale
   Christopher Smith
Who am I?
Two decades' experience

Half of that in online advertising

Internet systems engineering

Scaling web serving, data collection
& analysis

Places big & small.
Scalability


scale: v.tr.
  1. To clear or strip of scale or scales.

  2. To weigh a specified amount.

  3. To climb up or over (something steep).
[Charts: "Your App" — Throughput vs. Usage and Responsiveness vs. Usage, usage 0–8]

[Charts: "Your App" with Trend — the same two charts with a trend line fitted]

[Charts: "Undeniable Extrapolation" — the same charts with the trend extrapolated out to 1,000,000 usage]

[Charts: "Scalability: Saving This Guy's Job" — the same extrapolated charts, captioned with what scalability is really for]
Scalability Envelopes

There is always a “next” bottleneck.

In case of a scalability problem...

6 envelopes
Envelope 0

Session partitioning

Commodity: load balancer, multi-*

Linear scale for CPU

Limit: C10K?
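
A minimal sketch of the session-partitioning idea above, assuming a hash of the session id picks the backend; the backend names and the Python form are illustrative, not anything from the talk:

import hashlib

BACKENDS = ["app-01:8080", "app-02:8080", "app-03:8080"]   # hypothetical pool behind the load balancer

def backend_for(session_id):
    # Stable hash of the session id keeps every request for a session on one box.
    h = int.from_bytes(hashlib.sha1(session_id.encode()).digest()[:4], "big")
    return BACKENDS[h % len(BACKENDS)]

print(backend_for("session-42"))   # always the same backend for this session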
Envelope 1
Read Caching

  Reverse-proxy

  memcached

  CDN

log(n) scale: thank you Zipf

Limit: ~200 w/sec
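
A rough read-through-cache sketch of Envelope 1, with a plain dict standing in for memcached and a hypothetical db_lookup callable as the backing store:

import time

CACHE = {}            # key -> (value, expires_at); stand-in for memcached
TTL_SECONDS = 60

def cached_get(key, db_lookup):
    entry = CACHE.get(key)
    if entry and entry[1] > time.time():
        return entry[0]                              # hot key: served from cache
    value = db_lookup(key)                           # miss: one trip to the backing store
    CACHE[key] = (value, time.time() + TTL_SECONDS)  # Zipf says this key will likely be asked for again
    return value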
Envelope 2
Get a real persistence framework

  Data structures FTW!

  DB: concurrent read/write

  MOM: queuing/event IO/TP monitors

  Cheat on ACID (particularly C & D)

log(n) scale?

1000-10000 w/sec
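
One way to read "cheat on ACID": acknowledge writes once they are queued in memory and let a background thread batch them into the store. A sketch under that assumption (flush_batch is a hypothetical bulk-write hook, not part of the talk):

import queue
import threading

write_queue = queue.Queue()

def accept_write(record):
    write_queue.put(record)          # ack to the caller here, before anything is durable
    return "ok"

def writer_loop(flush_batch, batch_size=100):
    while True:
        batch = [write_queue.get()]                  # block for the first record
        while len(batch) < batch_size:
            try:
                batch.append(write_queue.get_nowait())
            except queue.Empty:
                break
        flush_batch(batch)                           # one bulk write instead of many small ones

# threading.Thread(target=writer_loop, args=(my_flush_batch,), daemon=True).start()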
Tipping over
Scaling Catamarans

RAM caching I/O

RAID

Threads (sometimes)

Packet loss (UR DUING IT WRONG)

SSD’s?
Jeff Dean’s Numbers
Latency Comparison Numbers

--------------------------

L1 cache reference                            0.5 ns

Branch mispredict                             5   ns

L2 cache reference                            7   ns               14x L1 cache

Mutex lock/unlock                            25   ns

Main memory reference                       100   ns               20x L2 cache, 200x L1 cache

Compress 1K bytes with Zippy              3,000   ns

Send 1K bytes over 1 Gbps network        10,000   ns    0.01 ms

Read 4K randomly from SSD*              150,000   ns    0.15 ms

Read 1 MB sequentially from memory      250,000   ns    0.25 ms

Round trip within same datacenter       500,000   ns    0.5   ms

Read 1 MB sequentially from SSD*      1,000,000   ns    1     ms   4X memory

Disk seek                            10,000,000   ns   10     ms   20x datacenter roundtrip
Problem: IO latency
Throughput: 2x every 18 months

Latency:
  CPU: <2x every 18 months

  LAN network: 2x every 2-3 years

  Memory: 2x every 3-5 years

  Disk: 2x every decade? (SSD?)

  WAN Network: 1x every...
Problem: IO Latency
Traditional indexes are on the wrong side of this trend

  They turn a scan into a seek

  Index lookup: scan 0.1% of records + 1
  random seek

  Scan: scan 100% of records, 0 random seek

  Seek is 10ms & Scan is 100Hz -> 10x win

  Seek is 1ms & Scan is 1GHz -> 1000x loss
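
A back-of-the-envelope cost model for the bullets above. The parameters are illustrative and won't reproduce the slide's exact 10x/1000x figures, but they show the index's advantage collapsing as sequential scan rates climb while seek times barely improve:

def index_lookup_cost(n_records, scan_hz, seek_s, selectivity=0.001):
    # scan 0.1% of records plus one random seek, as in the bullet above
    return seek_s + (selectivity * n_records) / scan_hz

def full_scan_cost(n_records, scan_hz):
    return n_records / scan_hz                        # no seeks at all

n = 1_000_000
slow = full_scan_cost(n, 100) / index_lookup_cost(n, 100, 0.010)   # 100 Hz scan, 10 ms seek
fast = full_scan_cost(n, 1e9) / index_lookup_cost(n, 1e9, 0.001)   # 1 GHz scan, 1 ms seek
print(slow, fast)   # huge win for the index when scans are slow, roughly a wash when they are fast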
Envelope 3

Real partitioning of IO

Move code, not data

Commodities: Map/Reduce (Hadoop), DHT
(Cassandra, HBase, Riak)

CAP theorem limits sync'ing
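
A sketch of "move code, not data" in the spirit of Envelope 3: hash the key to find the node that owns its partition and ship the work there instead of pulling the rows back (node names and the dispatch mechanism are purely illustrative):

import hashlib

NODES = ["data-01", "data-02", "data-03", "data-04"]   # hypothetical storage nodes

def owner_of(key):
    h = int.from_bytes(hashlib.sha1(key.encode()).digest()[:8], "big")
    return NODES[h % len(NODES)]

def run_on_owner(key, job):
    node = owner_of(key)
    # A real system would serialize `job` and execute it on `node` (a Map/Reduce
    # task, a DHT-side filter, etc.); here we only record where it would run.
    return node, job(key)

print(run_on_owner("user-42", lambda k: f"local aggregate for {k}"))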
Envelope 4

Route new data through data
partitions

  Using MOM/EventIO “the right way”

  ESP/CEP: Eigen, Storm, Esper,
  StreamBase, 0mq, etc.
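
A sketch of routing new data through the data partitions: each incoming event goes to the partition that already owns its key, and a bit of per-partition state is updated in place, stream-processor style (partition count and event shape are illustrative):

import zlib
from collections import defaultdict

N_PARTITIONS = 4
state = [defaultdict(int) for _ in range(N_PARTITIONS)]    # running totals, one table per partition

def partition_of(key):
    return zlib.crc32(key.encode()) % N_PARTITIONS          # stable routing, same home as the data

def handle_event(event):                                    # event: {"key": ..., "amount": ...}
    p = partition_of(event["key"])
    state[p][event["key"]] += event["amount"]                # the update happens next to the data

for e in ({"key": "user-1", "amount": 3}, {"key": "user-1", "amount": 2}, {"key": "user-2", "amount": 5}):
    handle_event(e)
print(state[partition_of("user-1")]["user-1"])               # 5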
Envelope 5

Cheat more on reliability.
 UDP w/o reliability > TCP

 Measure loss vs. prevent loss

 Horseshoes, hand grenades,
 features...?
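
A sketch of "measure loss vs. prevent loss": senders stamp a sequence number and the receiver just counts gaps instead of asking for retransmits (reordering is ignored to keep the sketch small):

last_seq = {}      # sender id -> highest sequence number seen
lost = 0
received = 0

def on_packet(sender_id, seq):
    global lost, received
    received += 1
    prev = last_seq.get(sender_id)
    if prev is not None and seq > prev + 1:
        lost += seq - prev - 1                 # the gap is data we will never see again
    if prev is None or seq > prev:
        last_seq[sender_id] = seq

def loss_rate():
    total = lost + received
    return lost / total if total else 0.0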
Integrated Systems
Combined IO management solutions:

  real-time in-memory key/value lookup

  LSM + bitmap indexes + etc.

  eventual consistency

  mobile code for batch processing

Cassandra, HBase, etc.
Efficient Logging

Events in an efficient, machine-parseable
form (protobuf, thrift, etc.)

Event source writes only to NIC

UDP Multicast

Redundant listeners
message LogEvent {
    required uint64 pid = 1;
    optional uint64 tid = 2;
    optional uint64 sid = 4;
    required uint64 sequence = 5;
    required uint64 timestamp = 6;

    enum Level {
        PANIC = 0;
        ERROR = 1;
        // ... remaining levels elided on the original slide
    }
    required Level level = 7;

    required bytes payload = 8;
}
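
A sketch of the "event source writes only to NIC" path, assuming the message above has been compiled with protoc into a log_event_pb2 module; the multicast group, port and TTL are illustrative:

import os, socket, time
import log_event_pb2                      # generated by: protoc --python_out=. log_event.proto

GROUP, PORT = "239.1.2.3", 5005           # hypothetical multicast channel

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 2)

seq = 0
def emit(level, payload):
    global seq
    seq += 1
    ev = log_event_pb2.LogEvent(
        pid=os.getpid(),
        sequence=seq,
        timestamp=int(time.time() * 1_000_000),
        level=level,
        payload=payload)
    sock.sendto(ev.SerializeToString(), (GROUP, PORT))    # fire and forget: no disk, no broker

emit(log_event_pb2.LogEvent.ERROR, b"something broke")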
Announcements

Dedicated channel.

Payload: channel IP, channel port,
last seq, pid, tid, sid + stats

All announcers listen and self-
throttle.

Directory service accumulates the announcements
Consolidation


Redundant journalers (RAID)

ESP: detect loss within a real-time window

If necessary, Map/Reduce processing
to try to resolve partial loss.
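
A sketch of the consolidation step: union the redundant journals, collapse duplicates by (pid, sequence), and report any ranges neither journaler captured, which is the part a Map/Reduce pass would then try to recover (journal_a / journal_b are hypothetical lists of decoded LogEvents):

def consolidate(journal_a, journal_b):
    seen = {}
    for ev in list(journal_a) + list(journal_b):
        seen[(ev["pid"], ev["sequence"])] = ev          # duplicates from redundant journalers collapse here
    per_pid = {}
    for pid, seq in sorted(seen):
        per_pid.setdefault(pid, []).append(seq)
    gaps = []
    for pid, seqs in per_pid.items():
        for prev, cur in zip(seqs, seqs[1:]):
            if cur != prev + 1:
                gaps.append((pid, prev + 1, cur - 1))   # a range lost by both journalers
    return list(seen.values()), gaps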
Efficiency
Hundreds of nodes

>50MB/sec

>50,000 pps

3-4 “journalers” resolving data

>5TB of reconciled data a day

<0.1% data loss
Envelope 6



Take out 6 envelopes...
