SlideShare una empresa de Scribd logo
1 de 41
Descargar para leer sin conexión
Petabytes and Nanoseconds 
Distributed Data Storage andthe CAP Theorem 
FIN talk 
Robert Greiner 
Nathan Murray 
August 21,2014
CHAPTER 
The Problems 
Your phone can add two numbers in the same time it takes light to travel one foot 
All high frequency trading servers are connected to the NASDAQ network with the same length of cable, so that no party has a speed advantage
A Common Scenario 
Web 
Application 
RDBMS 
+ =
The Solution: Scale All the Things!!1
Why shouldwe scale? 
Throughput 
Latency 
Storage 
Reliability
The Solution? 
Add a load balancer 
Add more web servers 
Tune the DB. Indexes,SPs, etc.
There’sa new bottleneck 
Generally an RDBMS can becomea bottleneck around 10K transactions per second
Next Step… Distribute Your Data 
Each web server can talk to any data storage node 
Nodes distribute queries and replicate data – lots more complexity!
Cluster = Additional Complexity
Enter the CAP Theorem! 
This guy created the CAP Theorem 
This guy’s 
VP Invented the internet
CAP Theorem: Defined 
Within a distributed system, you can only make two of the following three guarantees across a write/read pair
Guarantee 1: Consistency 
If a value is written, and then fetched, I will alwaysget back the new value 
Note: not the same as the C in ACID! 
_
Guarantee 2: Availability 
If a value is written, a success message should always be returned. If a subsequent read returns a stale value, or something reasonable, it’s OK. 
_ 
Note: not the same as the A in HA!
Guarantee 3: Partition Tolerance 
The system will continue to function when network partitions occur –OOP != NP. 
_ 
Note: nothing to do with BAC!
CAP Triangle 
The CAP Theorem is explained as a triangle 
C, A or P: Pick two 
This is true in practice, except…
When choosing a distributed system… 
vs.
… You Can’t Sacrifice Partition Tolerance! 
NOTDistributed 
(a.k.a. NOTPartition Tolerant) 
Available 
AND 
Consistent 
Distributed 
(a.k.a. Partition Tolerant) 
Available 
OR 
Consistent 
_ 
_
CPvs. AP 
Synchronous. 
Waits until partition heals or times out. 
Asynchronous. 
Returns a reasonable response always.
CPvs. AP 
Synchronous. 
Waits until partition heals or times out. 
Asynchronous. 
Returns a reasonable response always. 
At a bank, you get a deposit receipt afterthe work is complete 
At a coffee shop, you get a receipt beforethe work is complete
CHAPTER 
Whendo companies care?
Companies care about internetscale
Distributed Storage Past 
2004 
Google’s Map Reduce paper published 
2006 
Google’s Big Table paper published 
2007 
Amazon’s Dynamo paper published 
2008 
Yahoo runs search on Hadoop 
2008 
Facebook open sources Cassandra 
2008 
Bitcoin paper published 
2009 
Yahoo open sources Hadoop 
2010 
Azure Table Storage released 
2012 
Google’s Spanner and F1 papers 
2013 
Amazon releases DynamoDB inside AWS 
2014 
Google’s Mesa paper published 
2015 
????
Looking forward 
•Open source implementations of more sophisticated storage systems 
•Managed services with more advanced capabilities 
•Google Cloud versions of F1, Spanner, or Mesa? 
•NoSQL + SQL 
•Distributed data storage in untrusted environments
CHAPTER 
How does this affect me
Even our most “legacy” clients are already starting to care about internet scale: 
_
Scenario 
Client = Energy Retailer (Independent Sales Force) 
Sales Agent captures info about potential customer 
Price generated on-demand based on daily rate curve 
Quote no longer valid at midnight 
Each night, rates are updated based on new rate-curve 
Used to take 4hours 
Now takes > 24hours (Due to increased demand)
Current State
Solution Strategy 
Assess 
•Analyze business performance needs 
•Select non-performing work streams 
•Filter –(Could/Should) 
•Prioritize 
•Performance Baseline / Load Test 
Strategize 
•Identify Bottlenecks (CPU/RAM/Network) 
•Optimization strategy 
•Technology Selection 
Implement 
•POC 
•Load Test 
•Optimize 
•Build
Optimize Code 
Scale Up 
Scale Out 
Managed Service
Optimize CodeLevel 1 
Least organizational impact 
No architecture changes required 
Use existing development processes 
Risky –Code may be fine 
Expensive –Dev Resources 
Time Consuming –Dev + Deploy
Scale UpLevel 2 
Easiest solution 
Utilize existing infrastructure 
Little/no architecture changes 
Low probability of network partitions 
May not solve the problem long-term 
Hardware limitations 
Non-linear improvement (2x RAM != 2x Performance) 
C/A
Scale OutLevel 3 
Highest throughput 
Improved system up-time 
No single point of failure 
Linear performance increases 
Use commodity hardware –Hard to scale-up CPU 
Increased infrastructure / system complexity 
Increased probability of network partitions 
Automation complexity 
A/C
Managed ServiceLevel 4 
Low barrier to entry 
No additional hardware investment required 
Treat as extension of existing data center 
Appliance configuration 
Globally redundant (cloud) 
Most organizational change 
Less control and customization 
Built-in redundancy and innovation 
C/A 
A/C
Optimize Code(Level 1) 
•Least organizational impact 
•No architecture changes required 
•Use existing development processes 
•Risky –Code may be fine 
•Expensive –Dev Resources 
•Time Consuming –Dev + Deploy 
Scale Up(Level 2) 
•Easiest solution 
•Utilize existing infrastructure 
•Little/no architecture changes 
•Reduce probability of network partitions 
•May not solve the problem long-term 
•Hardware limitations 
•Non-linear improvement 
Scale Out(Level 3) 
•Highest throughput 
•Improved system up-time 
•No single point of failure 
•Linear performance inc. 
•Use commodity hardware 
•Increased infrastructure / system complexity 
•Increased probability of network partitions 
•Automation complexity 
Managed Service(Level 4) 
•Low barrier to entry 
•No additional hardware investment required 
•Treat as extension of existing data center 
•Appliance configuration 
•Globally redundant (cloud) 
•Most organizational change 
•Less control and customization 
•High innovation 
Pick One (Or More!)
First Attempt
Good Enough?
Taking It to the Next Level
The Best Solution?
What Would YOUDo?
Fin’ 
robert.greiner@parivedasolutions.com 
nathan.murray@parivedasolutions.com

Más contenido relacionado

La actualidad más candente

How to Get Cloud Architecture and Design Right the First Time
How to Get Cloud Architecture and Design Right the First TimeHow to Get Cloud Architecture and Design Right the First Time
How to Get Cloud Architecture and Design Right the First Time
David Linthicum
 
Migrating enterprise workloads to AWS
Migrating enterprise workloads to AWS Migrating enterprise workloads to AWS
Migrating enterprise workloads to AWS
Tom Laszewski
 
Microsoft Cloud Services Architecture
Microsoft Cloud Services ArchitectureMicrosoft Cloud Services Architecture
Microsoft Cloud Services Architecture
David Chou
 

La actualidad más candente (20)

Building enterprise class disaster recovery as a service to aws - session spo...
Building enterprise class disaster recovery as a service to aws - session spo...Building enterprise class disaster recovery as a service to aws - session spo...
Building enterprise class disaster recovery as a service to aws - session spo...
 
Keeping Security In-Step with your Application Demand Curve
Keeping Security In-Step with your Application Demand CurveKeeping Security In-Step with your Application Demand Curve
Keeping Security In-Step with your Application Demand Curve
 
Azure intelligent edge solutions overview
Azure intelligent edge solutions overviewAzure intelligent edge solutions overview
Azure intelligent edge solutions overview
 
Introduction to RightScale
Introduction to RightScaleIntroduction to RightScale
Introduction to RightScale
 
RightScale Webinar: Hybrid-IT: Connecting Your On-Premises Infrastructure Wit...
RightScale Webinar: Hybrid-IT: Connecting Your On-Premises Infrastructure Wit...RightScale Webinar: Hybrid-IT: Connecting Your On-Premises Infrastructure Wit...
RightScale Webinar: Hybrid-IT: Connecting Your On-Premises Infrastructure Wit...
 
Trends in Cloud and Mobile Computing - Alain Azagury, IBM
Trends in Cloud and Mobile Computing - Alain Azagury, IBMTrends in Cloud and Mobile Computing - Alain Azagury, IBM
Trends in Cloud and Mobile Computing - Alain Azagury, IBM
 
How to Get Cloud Architecture and Design Right the First Time
How to Get Cloud Architecture and Design Right the First TimeHow to Get Cloud Architecture and Design Right the First Time
How to Get Cloud Architecture and Design Right the First Time
 
How We end the Walking Dead in the Enterprise - Session Sponsored by Versent
How We end the Walking Dead in the Enterprise - Session Sponsored by VersentHow We end the Walking Dead in the Enterprise - Session Sponsored by Versent
How We end the Walking Dead in the Enterprise - Session Sponsored by Versent
 
Migrating enterprise workloads to AWS
Migrating enterprise workloads to AWS Migrating enterprise workloads to AWS
Migrating enterprise workloads to AWS
 
Best Practices for Architecting VDI with Flash Storage
Best Practices for Architecting VDI with Flash StorageBest Practices for Architecting VDI with Flash Storage
Best Practices for Architecting VDI with Flash Storage
 
AWS Summit Stockholm 2014 – B3 – Integrating on-premises workloads with AWS
AWS Summit Stockholm 2014 – B3 – Integrating on-premises workloads with AWSAWS Summit Stockholm 2014 – B3 – Integrating on-premises workloads with AWS
AWS Summit Stockholm 2014 – B3 – Integrating on-premises workloads with AWS
 
3 Secrets to Becoming a Cloud Security Superhero
3 Secrets to Becoming a Cloud Security Superhero 3 Secrets to Becoming a Cloud Security Superhero
3 Secrets to Becoming a Cloud Security Superhero
 
DevOps at Scale: How Datadog is using AWS and PagerDuty to Keep Pace with Gr...
DevOps at Scale:  How Datadog is using AWS and PagerDuty to Keep Pace with Gr...DevOps at Scale:  How Datadog is using AWS and PagerDuty to Keep Pace with Gr...
DevOps at Scale: How Datadog is using AWS and PagerDuty to Keep Pace with Gr...
 
Get Started Today with Cloud-Ready Contracts | AWS Public Sector Summit 2016
Get Started Today with Cloud-Ready Contracts | AWS Public Sector Summit 2016Get Started Today with Cloud-Ready Contracts | AWS Public Sector Summit 2016
Get Started Today with Cloud-Ready Contracts | AWS Public Sector Summit 2016
 
What Organizational and Governance Changes Do I Need to Make Prior to Migrati...
What Organizational and Governance Changes Do I Need to Make Prior to Migrati...What Organizational and Governance Changes Do I Need to Make Prior to Migrati...
What Organizational and Governance Changes Do I Need to Make Prior to Migrati...
 
Scaling on AWS for the First 10 Million Users
Scaling on AWS for the First 10 Million UsersScaling on AWS for the First 10 Million Users
Scaling on AWS for the First 10 Million Users
 
Event-Driven Serverless Architecture - the next big thing in the cloud (Cleme...
Event-Driven Serverless Architecture - the next big thing in the cloud (Cleme...Event-Driven Serverless Architecture - the next big thing in the cloud (Cleme...
Event-Driven Serverless Architecture - the next big thing in the cloud (Cleme...
 
FSI202 Machine Learning in Capital Markets
FSI202 Machine Learning in Capital MarketsFSI202 Machine Learning in Capital Markets
FSI202 Machine Learning in Capital Markets
 
AWS Architecting In The Cloud
AWS Architecting In The CloudAWS Architecting In The Cloud
AWS Architecting In The Cloud
 
Microsoft Cloud Services Architecture
Microsoft Cloud Services ArchitectureMicrosoft Cloud Services Architecture
Microsoft Cloud Services Architecture
 

Destacado

Destacado (6)

Fin fest 2014 - Internet of Things and APIs
Fin fest 2014 - Internet of Things and APIsFin fest 2014 - Internet of Things and APIs
Fin fest 2014 - Internet of Things and APIs
 
Testing javascript
Testing javascriptTesting javascript
Testing javascript
 
Test Driven Development at 10,000 Feet
Test Driven Development at 10,000 FeetTest Driven Development at 10,000 Feet
Test Driven Development at 10,000 Feet
 
Code Quality and Tipster
Code Quality and TipsterCode Quality and Tipster
Code Quality and Tipster
 
Automated Testing for Websites With Selenium IDE
Automated Testing for Websites With Selenium IDEAutomated Testing for Websites With Selenium IDE
Automated Testing for Websites With Selenium IDE
 
Introduction to Amazon Web Services
Introduction to Amazon Web ServicesIntroduction to Amazon Web Services
Introduction to Amazon Web Services
 

Similar a Petabytes and Nanoseconds

ScalabilityAvailability
ScalabilityAvailabilityScalabilityAvailability
ScalabilityAvailability
webuploader
 
http://www.hfadeel.com/Blog/?p=151
http://www.hfadeel.com/Blog/?p=151http://www.hfadeel.com/Blog/?p=151
http://www.hfadeel.com/Blog/?p=151
xlight
 

Similar a Petabytes and Nanoseconds (20)

Building Big Data Streaming Architectures
Building Big Data Streaming ArchitecturesBuilding Big Data Streaming Architectures
Building Big Data Streaming Architectures
 
Migration to the cloud
Migration to the cloudMigration to the cloud
Migration to the cloud
 
Hard Truths About Streaming and Eventing (Dan Rosanova, Microsoft) Kafka Summ...
Hard Truths About Streaming and Eventing (Dan Rosanova, Microsoft) Kafka Summ...Hard Truths About Streaming and Eventing (Dan Rosanova, Microsoft) Kafka Summ...
Hard Truths About Streaming and Eventing (Dan Rosanova, Microsoft) Kafka Summ...
 
Best Practices for Large-Scale Websites -- Lessons from eBay
Best Practices for Large-Scale Websites -- Lessons from eBayBest Practices for Large-Scale Websites -- Lessons from eBay
Best Practices for Large-Scale Websites -- Lessons from eBay
 
ScalabilityAvailability
ScalabilityAvailabilityScalabilityAvailability
ScalabilityAvailability
 
NoSQL
NoSQLNoSQL
NoSQL
 
Hard Truths About Streaming and Eventing (Dan Rosanova, Microsoft) Kafka Summ...
Hard Truths About Streaming and Eventing (Dan Rosanova, Microsoft) Kafka Summ...Hard Truths About Streaming and Eventing (Dan Rosanova, Microsoft) Kafka Summ...
Hard Truths About Streaming and Eventing (Dan Rosanova, Microsoft) Kafka Summ...
 
Adding Value in the Cloud with Performance Test
Adding Value in the Cloud with Performance TestAdding Value in the Cloud with Performance Test
Adding Value in the Cloud with Performance Test
 
Tales From The Front: An Architecture For Multi-Data Center Scalable Applicat...
Tales From The Front: An Architecture For Multi-Data Center Scalable Applicat...Tales From The Front: An Architecture For Multi-Data Center Scalable Applicat...
Tales From The Front: An Architecture For Multi-Data Center Scalable Applicat...
 
Master.pptx
Master.pptxMaster.pptx
Master.pptx
 
Scaling Streaming - Concepts, Research, Goals
Scaling Streaming - Concepts, Research, GoalsScaling Streaming - Concepts, Research, Goals
Scaling Streaming - Concepts, Research, Goals
 
Waters Grid & HPC Course
Waters Grid & HPC CourseWaters Grid & HPC Course
Waters Grid & HPC Course
 
Web Speed And Scalability
Web Speed And ScalabilityWeb Speed And Scalability
Web Speed And Scalability
 
Telehouse Enhanced Connect slide share
Telehouse Enhanced Connect  slide shareTelehouse Enhanced Connect  slide share
Telehouse Enhanced Connect slide share
 
Introduction
IntroductionIntroduction
Introduction
 
Training - What is Performance ?
Training  - What is Performance ?Training  - What is Performance ?
Training - What is Performance ?
 
5 Quick Wins for the Cloud
5 Quick Wins for the Cloud5 Quick Wins for the Cloud
5 Quick Wins for the Cloud
 
Performance Optimization of Cloud Based Applications by Peter Smith, ACL
Performance Optimization of Cloud Based Applications by Peter Smith, ACLPerformance Optimization of Cloud Based Applications by Peter Smith, ACL
Performance Optimization of Cloud Based Applications by Peter Smith, ACL
 
http://www.hfadeel.com/Blog/?p=151
http://www.hfadeel.com/Blog/?p=151http://www.hfadeel.com/Blog/?p=151
http://www.hfadeel.com/Blog/?p=151
 
Why Distributed Databases?
Why Distributed Databases?Why Distributed Databases?
Why Distributed Databases?
 

Más de Robert Greiner

Más de Robert Greiner (9)

Portfolio Rationalization - Making Sound Financial and Strategic Decisions in...
Portfolio Rationalization - Making Sound Financial and Strategic Decisions in...Portfolio Rationalization - Making Sound Financial and Strategic Decisions in...
Portfolio Rationalization - Making Sound Financial and Strategic Decisions in...
 
Virtual Team Best Practices
Virtual Team Best PracticesVirtual Team Best Practices
Virtual Team Best Practices
 
Becoming the Ideal Team Player
Becoming the Ideal Team PlayerBecoming the Ideal Team Player
Becoming the Ideal Team Player
 
POV - Practical Containerization
POV - Practical ContainerizationPOV - Practical Containerization
POV - Practical Containerization
 
POV - Enterprise Security Canvas
POV - Enterprise Security CanvasPOV - Enterprise Security Canvas
POV - Enterprise Security Canvas
 
Foundations of financial independence
Foundations of financial independenceFoundations of financial independence
Foundations of financial independence
 
Why feedback is important
Why feedback is importantWhy feedback is important
Why feedback is important
 
Infrastructure as Code
Infrastructure as CodeInfrastructure as Code
Infrastructure as Code
 
Introduction to Windows Azure Data Services
Introduction to Windows Azure Data ServicesIntroduction to Windows Azure Data Services
Introduction to Windows Azure Data Services
 

Último

EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
Earley Information Science
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
giselly40
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 

Último (20)

Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt Robison
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 

Petabytes and Nanoseconds

  • 1. Petabytes and Nanoseconds Distributed Data Storage andthe CAP Theorem FIN talk Robert Greiner Nathan Murray August 21,2014
  • 2. CHAPTER The Problems Your phone can add two numbers in the same time it takes light to travel one foot All high frequency trading servers are connected to the NASDAQ network with the same length of cable, so that no party has a speed advantage
  • 3. A Common Scenario Web Application RDBMS + =
  • 4. The Solution: Scale All the Things!!1
  • 5. Why shouldwe scale? Throughput Latency Storage Reliability
  • 6. The Solution? Add a load balancer Add more web servers Tune the DB. Indexes,SPs, etc.
  • 7. There’sa new bottleneck Generally an RDBMS can becomea bottleneck around 10K transactions per second
  • 8. Next Step… Distribute Your Data Each web server can talk to any data storage node Nodes distribute queries and replicate data – lots more complexity!
  • 9. Cluster = Additional Complexity
  • 10. Enter the CAP Theorem! This guy created the CAP Theorem This guy’s VP Invented the internet
  • 11. CAP Theorem: Defined Within a distributed system, you can only make two of the following three guarantees across a write/read pair
  • 12. Guarantee 1: Consistency If a value is written, and then fetched, I will alwaysget back the new value Note: not the same as the C in ACID! _
  • 13. Guarantee 2: Availability If a value is written, a success message should always be returned. If a subsequent read returns a stale value, or something reasonable, it’s OK. _ Note: not the same as the A in HA!
  • 14. Guarantee 3: Partition Tolerance The system will continue to function when network partitions occur –OOP != NP. _ Note: nothing to do with BAC!
  • 15. CAP Triangle The CAP Theorem is explained as a triangle C, A or P: Pick two This is true in practice, except…
  • 16. When choosing a distributed system… vs.
  • 17. … You Can’t Sacrifice Partition Tolerance! NOTDistributed (a.k.a. NOTPartition Tolerant) Available AND Consistent Distributed (a.k.a. Partition Tolerant) Available OR Consistent _ _
  • 18. CPvs. AP Synchronous. Waits until partition heals or times out. Asynchronous. Returns a reasonable response always.
  • 19. CPvs. AP Synchronous. Waits until partition heals or times out. Asynchronous. Returns a reasonable response always. At a bank, you get a deposit receipt afterthe work is complete At a coffee shop, you get a receipt beforethe work is complete
  • 21. Companies care about internetscale
  • 22. Distributed Storage Past 2004 Google’s Map Reduce paper published 2006 Google’s Big Table paper published 2007 Amazon’s Dynamo paper published 2008 Yahoo runs search on Hadoop 2008 Facebook open sources Cassandra 2008 Bitcoin paper published 2009 Yahoo open sources Hadoop 2010 Azure Table Storage released 2012 Google’s Spanner and F1 papers 2013 Amazon releases DynamoDB inside AWS 2014 Google’s Mesa paper published 2015 ????
  • 23. Looking forward •Open source implementations of more sophisticated storage systems •Managed services with more advanced capabilities •Google Cloud versions of F1, Spanner, or Mesa? •NoSQL + SQL •Distributed data storage in untrusted environments
  • 24. CHAPTER How does this affect me
  • 25. Even our most “legacy” clients are already starting to care about internet scale: _
  • 26.
  • 27. Scenario Client = Energy Retailer (Independent Sales Force) Sales Agent captures info about potential customer Price generated on-demand based on daily rate curve Quote no longer valid at midnight Each night, rates are updated based on new rate-curve Used to take 4hours Now takes > 24hours (Due to increased demand)
  • 29. Solution Strategy Assess •Analyze business performance needs •Select non-performing work streams •Filter –(Could/Should) •Prioritize •Performance Baseline / Load Test Strategize •Identify Bottlenecks (CPU/RAM/Network) •Optimization strategy •Technology Selection Implement •POC •Load Test •Optimize •Build
  • 30. Optimize Code Scale Up Scale Out Managed Service
  • 31. Optimize CodeLevel 1 Least organizational impact No architecture changes required Use existing development processes Risky –Code may be fine Expensive –Dev Resources Time Consuming –Dev + Deploy
  • 32. Scale UpLevel 2 Easiest solution Utilize existing infrastructure Little/no architecture changes Low probability of network partitions May not solve the problem long-term Hardware limitations Non-linear improvement (2x RAM != 2x Performance) C/A
  • 33. Scale OutLevel 3 Highest throughput Improved system up-time No single point of failure Linear performance increases Use commodity hardware –Hard to scale-up CPU Increased infrastructure / system complexity Increased probability of network partitions Automation complexity A/C
  • 34. Managed ServiceLevel 4 Low barrier to entry No additional hardware investment required Treat as extension of existing data center Appliance configuration Globally redundant (cloud) Most organizational change Less control and customization Built-in redundancy and innovation C/A A/C
  • 35. Optimize Code(Level 1) •Least organizational impact •No architecture changes required •Use existing development processes •Risky –Code may be fine •Expensive –Dev Resources •Time Consuming –Dev + Deploy Scale Up(Level 2) •Easiest solution •Utilize existing infrastructure •Little/no architecture changes •Reduce probability of network partitions •May not solve the problem long-term •Hardware limitations •Non-linear improvement Scale Out(Level 3) •Highest throughput •Improved system up-time •No single point of failure •Linear performance inc. •Use commodity hardware •Increased infrastructure / system complexity •Increased probability of network partitions •Automation complexity Managed Service(Level 4) •Low barrier to entry •No additional hardware investment required •Treat as extension of existing data center •Appliance configuration •Globally redundant (cloud) •Most organizational change •Less control and customization •High innovation Pick One (Or More!)
  • 38. Taking It to the Next Level