Netflix Development Patterns for
Rapid Iteration, Scale, Performance, & Availability
Neil Hunt, Netflix
November 13, 2013

© 2013 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.
Are You Designing Systems That Are:
• Web-scale
• Global
• Highly-available
• Consumer-facing
• Cloud Native
Cloud Native
• Service oriented architecture
• Redundancy
• Statelessness
• NoSQL
• Eventual consistency
Assumptions
[Figure: 2x2 chart of Scale (vertical axis) against Speed (horizontal axis)]
• Telcos: slowly changing, large scale. Hardware will fail
• Web-Scale: rapid change, large scale. Everything is broken
• Enterprise IT: slowly changing, small scale. Everything works
• Startups: rapid change, small scale. Software will fail
Netflix Cloud Goals:
Availability, Scale, Performance
Performance
• Reduce session start by 1s
– Save 1 human lifetime per day!
– Win more moments of truth
• Suggest choices 1% better
– 500k hours/day additional value delivered
Scale
• 50% y/y traffic growth
• 50 countries, 3 continents
• Tens of thousands of instances at peak
• 4 AWS regions, 12 datacenters
• ~$0.001 per start
Availability
• Aspire to 4 x nines (99.99% of starts successful)
• Per Quarter:
– Downtime: < 3 mins (peak time)
– Successful starts: 9.999B
– Failures: 1M → frustration, calls, lost business
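The per-quarter figures follow from applying 99.99% to the total number of starts. A small sketch of that arithmetic, assuming 10 billion starts per quarter (the sum of the slide's 9.999B successes and 1M failures); the function name is hypothetical:

```python
from fractions import Fraction

def quarterly_start_budget(total_starts, availability):
    """Split a quarter's session starts into successes and allowed failures."""
    failures = int(total_starts * (1 - availability))
    return total_starts - failures, failures

# Assumption: 10 billion starts per quarter, implied by 9.999B + 1M on the slide.
successes, failures = quarterly_start_budget(10_000_000_000, Fraction(9999, 10000))
print(successes, failures)
```

Exact fractions are used instead of floats so the failure count comes out as a whole number.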
Availabilities Compound
N service dependencies    Availability of each    Availability of all N (99.99%^N)
2                         99.99%                  .9998
10                        99.99%                  .999
100                       99.99%                  .99
1000                      99.99%                  .9
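The compounding can be reproduced in a few lines. This is a minimal sketch that assumes the dependencies fail independently and that every one of them must succeed for the request to succeed; `compound_availability` is a name chosen here for illustration:

```python
def compound_availability(per_dependency, n):
    """Availability of a request that needs all n independent dependencies."""
    return per_dependency ** n

for n in (2, 10, 100, 1000):
    print(n, round(compound_availability(0.9999, n), 4))
```

Under these assumptions the result drops to roughly 90% by the time a request touches 1000 dependencies, which is the point of the slide.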
Availabilities Compound
To achieve 99.99% availability with 1000 components requires:
• 99.99999% availability for each dependency (component failure leads to system failure)
or
• Isolation for independence (component failure leads to degradation rather than system failure)
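The per-dependency requirement is the inverse of the compounding calculation. A minimal sketch, again assuming independent dependencies that must all succeed; the function name is hypothetical:

```python
def required_per_dependency(target, n):
    """Availability each of n independent, chained dependencies must reach
    so that the whole chain meets the target availability."""
    return target ** (1.0 / n)

needed = required_per_dependency(0.9999, 1000)
print(f"{needed:.7%}")

# Sanity check: compounding the result over 1000 dependencies returns the target.
print(f"{needed ** 1000:.4%}")
```

Isolation sidesteps this requirement by removing the "must all succeed" assumption: a failed dependency degrades the response instead of failing it.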
Availability, Scale, Performance
Are Not Enough!
Rapid Iteration – Rate of Change
• Running tests
• Rolling out tests
– Engineering the winning test experience for scale

• Adding features
• Scaling up
• Removing features, simplifying, minimizing
Testing
• Up to 1,000 changes per day!
Rate of Change
• Change leads to bugs
– New features
– New configurations
– New types of inputs
– Scaling up
• Availability is in tension with rate of change
Availability / Rate of Change Tradeoff
[Figure: availability (99% to 99.999%) plotted against rate of change (1 to 1000), with a curve labeled "Frontier of availability/change"]
Shifting the Curve…
[Figure: availability (99% to 99.999%) plotted against rate of change (1 to 1000), with the frontier curve shifted]
Shifting the Curve
• Must break the chained dependencies that compound in cascading system failure
• Subsystem isolation:
– Failure in one component should never result in cascading system failure
Isolating Subsystems
Redundant systems with timeout & failover
• Failure of instance
• Failure of network
• Latency Monkey to test
[Diagram: Dependent System calls its Dependence with a timeout]
Isolating Subsystems
Redundant systems with timeout & failover
• Failure of instance
• Failure of network
• Latency Monkey to test
[Diagram: Higher Tier System (longer timeout) calls Dependent System (short timeout), which calls its Dependence]
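Timeout plus failover across redundant instances can be sketched as follows. This is an illustration only, not Netflix's implementation: the replicas are hypothetical callables, the thread pool is an assumed way to bound how long the caller waits, and the sleep stands in for the kind of delay Latency Monkey injects.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Shared worker pool so a hung call does not block the caller (assumed setup).
POOL = ThreadPoolExecutor(max_workers=4)

def call_with_failover(replicas, timeout_s):
    """Try redundant replicas in order, abandoning each after timeout_s seconds."""
    for replica in replicas:
        future = POOL.submit(replica)
        try:
            return future.result(timeout=timeout_s)
        except Exception:
            # Covers both a timeout (slow network) and a raised error
            # (failed instance); move on to the next replica.
            continue
    raise RuntimeError("all replicas failed or timed out")

def hung_replica():          # hypothetical instance with injected latency
    time.sleep(0.2)
    return "late"

def dead_replica():          # hypothetical failed instance
    raise ConnectionError("instance down")

def healthy_replica():       # hypothetical healthy instance
    return "ok"

result = call_with_failover([hung_replica, dead_replica, healthy_replica], timeout_s=0.05)
print(result)
```

One reading of the tiered diagram is that a higher-tier caller needs a timeout long enough to cover the lower tier's short timeouts and its failover attempts; otherwise the caller gives up before the failover can complete.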
Isolating Subsystems
Timeout with fallback default response
• Network failure
• Software bug
[Diagram: Dependent System calls its Dependence with a timeout and falls back to a default response: { status=mem, plan=4, device=true }]
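A fallback default response can be sketched like this. The default values are the ones shown on the slide; the dependency functions, the timeout value, and the thread-pool mechanism are assumptions made for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor

POOL = ThreadPoolExecutor(max_workers=4)  # assumed mechanism for bounding waits

# Default response taken from the slide: { status=mem, plan=4, device=true }
DEFAULT_RESPONSE = {"status": "mem", "plan": 4, "device": True}

def call_with_default(dependency, timeout_s, default=DEFAULT_RESPONSE):
    """Return the dependency's answer, or a copy of the default if it is
    too slow (network failure) or raises (software bug)."""
    future = POOL.submit(dependency)
    try:
        return future.result(timeout=timeout_s)
    except Exception:
        return dict(default)

def hung_dependency():       # hypothetical: network failure, never answers in time
    time.sleep(0.2)
    return {"status": "mem", "plan": 2, "device": True}

def buggy_dependency():      # hypothetical: software bug
    raise ValueError("unexpected input")

def healthy_dependency():    # hypothetical: normal answer
    return {"status": "mem", "plan": 2, "device": True}

print(call_with_default(hung_dependency, 0.05))
print(call_with_default(buggy_dependency, 0.05))
print(call_with_default(healthy_dependency, 0.05))
```

The caller always gets a usable answer, so a failing dependency degrades the experience instead of failing the request.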
Isolating Subsystems
Canary Push
• Network failure
• Software bug
[Diagram: Dependent System calls its Dependence with a timeout; one canary instance runs the new code]
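A canary push sends a small share of traffic to one instance running the new code and compares it with the existing fleet before going further. The sketch below is hypothetical: the routing rule, the 1% share, and the error-rate tolerance are assumptions chosen for illustration, not values from the talk.

```python
def route(request_id, canary_percent):
    """Send roughly canary_percent of requests to the canary instance."""
    return "canary" if request_id % 100 < canary_percent else "baseline"

def canary_verdict(baseline_errors, baseline_requests,
                   canary_errors, canary_requests, tolerance=0.001):
    """Promote the new code only if its error rate stays within tolerance
    of the baseline fleet's error rate."""
    baseline_rate = baseline_errors / baseline_requests
    canary_rate = canary_errors / canary_requests
    return "promote" if canary_rate <= baseline_rate + tolerance else "rollback"

# Route 1% of 1000 hypothetical requests to the canary.
canary_share = sum(route(i, 1) == "canary" for i in range(1000))
print(canary_share)

print(canary_verdict(10, 100_000, 1, 1_000))    # canary close to baseline
print(canary_verdict(10, 100_000, 50, 1_000))   # canary clearly worse
```

Because only the canary's share of traffic sees the new code, a bad push affects a small fraction of requests before it is caught.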
Isolating Subsystems
Red/Black deployment
• Software bugs
[Diagram: bad code pushed as Dependence V2.3; Dependent System fails back to the old code, Dependence V2.2]
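Red/black deployment keeps the old version running while the new one takes traffic, so failing back is a routing change and not a redeploy. A minimal sketch, with a hypothetical router class and handlers; failing back on the first error is a simplification made here, not something the slide specifies.

```python
class RedBlackRouter:
    """Routes traffic to one live version while keeping the previous one warm."""

    def __init__(self, live_version, handlers):
        self.handlers = handlers      # version name -> request handler
        self.live = live_version
        self.previous = None

    def push(self, new_version):
        """Send traffic to new_version; the old version stays up as standby."""
        self.previous, self.live = self.live, new_version

    def fail_back(self):
        """Return traffic to the previous version after a bad push."""
        if self.previous is None:
            raise RuntimeError("no previous version to fail back to")
        self.live, self.previous = self.previous, None

    def handle(self, request):
        try:
            return self.handlers[self.live](request)
        except Exception:
            if self.previous is None:
                raise
            self.fail_back()          # simplification: fail back on first error
            return self.handlers[self.live](request)

def v2_2(request):                    # hypothetical old code
    return f"V2.2 handled {request}"

def v2_3(request):                    # hypothetical bad code
    raise ValueError("bug in new code")

router = RedBlackRouter("V2.2", {"V2.2": v2_2, "V2.3": v2_3})
router.push("V2.3")
response = router.handle("play")
print(response, router.live)
```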
Isolating Subsystems
Standby Blue system
• Independent implementation
• Simplified logic
[Diagram: Dependent System fails over from Dependence V2.3 to a static reference implementation]
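A standby blue system can be thought of as a second, much simpler implementation of the same interface that is used when the primary fails. The sketch below is illustrative only; the two row-producing functions are hypothetical stand-ins for a full-featured dependence and its static reference implementation.

```python
def with_static_fallback(primary, static_reference):
    """Wrap primary so that any failure is served by the static reference."""
    def call(*args, **kwargs):
        try:
            return primary(*args, **kwargs)
        except Exception:
            return static_reference(*args, **kwargs)
    return call

def personalized_rows(member_id):     # hypothetical full-logic dependence
    raise RuntimeError("ranking service unavailable")

def static_rows(member_id):           # hypothetical simplified, independent version
    return ["Popular", "New Releases"]

get_rows = with_static_fallback(personalized_rows, static_rows)
rows = get_rows(42)
print(rows)
```

Keeping the fallback independent and simple matters: if it shared code with the primary, the same bug could take down both.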
Isolating Subsystems
Zone isolation
• Infrastructure failure (e.g. power outage)
• Chaos Gorilla
[Diagram: a Load Balancer in front of Zone A and Zone B, each with its own Dependent System and Dependence]
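Zone isolation relies on the load balancer sending traffic only to zones that are still healthy. A minimal sketch with a hypothetical balancer class; marking a zone unhealthy by hand stands in for the kind of zone outage Chaos Gorilla simulates.

```python
import itertools

class ZoneAwareBalancer:
    """Round-robins requests across zones, skipping any marked unhealthy."""

    def __init__(self, zones):
        self.healthy = {zone: True for zone in zones}
        self._cycle = itertools.cycle(zones)

    def mark(self, zone, healthy):
        self.healthy[zone] = healthy

    def pick(self):
        # Look at each zone at most once per request.
        for _ in range(len(self.healthy)):
            zone = next(self._cycle)
            if self.healthy[zone]:
                return zone
        raise RuntimeError("no healthy zone available")

lb = ZoneAwareBalancer(["A", "B"])
before = [lb.pick() for _ in range(4)]
lb.mark("A", False)                   # simulated outage of Zone A
after = [lb.pick() for _ in range(4)]
print(before, after)
```

For this to work, each zone needs its own copy of the dependent system and its dependence, with enough capacity to absorb the other zone's traffic.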
Isolating Subsystems
Region isolation
• Infrastructure software bugs (e.g. load balancer failure)
• Chaos Kong
[Diagram: DNS in front of Region E and Region W; each region has a Load Balancer over Zone A and Zone B, and each zone has its own Dependent System and Dependence]
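Region isolation moves the same idea up a level: DNS shifts traffic away from a failed region. The sketch below models that as a change to per-region traffic weights; the weights and the `evacuate` function are assumptions for illustration, and real DNS failover also has to account for record TTLs and client caching.

```python
def evacuate(weights, failed_region):
    """Redistribute a failed region's traffic share across the remaining
    regions in proportion to their existing weights."""
    remaining = {r: w for r, w in weights.items() if r != failed_region and w > 0}
    total_remaining = sum(remaining.values())
    if total_remaining == 0:
        raise RuntimeError("no region left to take traffic")
    scale = sum(weights.values()) / total_remaining
    return {r: remaining.get(r, 0) * scale for r in weights}

# Hypothetical even split between the two regions in the diagram.
weights = {"E": 50, "W": 50}
after_failure = evacuate(weights, "E")   # simulated loss of Region E
print(after_failure)
```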
Isolating Subsystems
Dependency Mode                      Isolating Technique
Instance failure, network failure    Redundant systems with failover and timeout; timeout with default response
Network failure, software bug        Canary push; red-black deployment; blue systems
Infrastructure failure               Zone isolation
Cross-zone software bugs             Region isolation
Trying Harder Won’t Cut It
• Trying harder gets a linear return on an exponential problem
• Need to be great at execution AND have the right architecture
• What architectural features are you using to ensure availability, scale, performance, & rapid rate of change?
Please give us your feedback on this
presentation

DMG206
As a thank you, we will select prize
winners daily for completed surveys!

Netflix Development Patterns for Scale, Performance & Availability (DMG206) | AWS re:Invent 2013
