Retrospective from a startup built in the cloud: top 3 big lessons from the AWS outage on 04.21.2011, plus 4,369 other smaller ones (5/27/2011)
What a country: entrepreneurial resiliency
(true story) “robust systems: highly fault-tolerant, on or off grid. e.g. our culture wrt entrepreneurs, AWS, the BD API”
Boom
good to be home! Go Buffs
me, previous startup: teams in 3 countries; highly transactional system; MS tech (IIS/MS SQL Server); co-located, leased/owned hardware; 0% in cloud; $75M/yearly rev
me, current startup: systems 100% on AWS; 99% free/open-source software; standing on the shoulders of giants
fault tolerance: 3 to 47 important failearnings and 4,369 less important ones
in the context of our startup, of course; YMMV depending on velocity
Ruger
The Ruger Fault Equivalency: time = money; fault tolerance = time² - risk tolerance. Also known as: 'Fast, good and cheap: pick two'
system design philosophy: leverage proven, open-source tech in the cloud to build a scalable, reliable, secure operational foundation quickly
So how do you achieve the right level of fault tolerance in the cloud? 3 tenets
Tenet #1: Scripted Repeatability; Tenet #2: SPOF Elimination; Tenet #3: Clear-Cut Communication
who here has used AWS?
Tenet #1: prepare a fault-tolerant foundation with scripted repeatability, aka automation
from the start: script the non-interactive install of your tools and OS; custom AMI; Debian: great package management; based on Eric Hammond’s work, http://alestic.com/
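A minimal sketch of what "non-interactive install" can look like on Debian, assuming apt-get is the package tool. The function names and the package names in the usage note are illustrative, not from the talk. The sketch only builds the command and environment by default, so it can be checked without touching a real machine.

```python
import os
import subprocess


def build_install_command(packages):
    """Return (argv, env) for an unattended apt-get install of `packages`."""
    if not packages:
        raise ValueError("at least one package is required")
    env = dict(os.environ)
    # debconf reads this variable; "noninteractive" suppresses install prompts.
    env["DEBIAN_FRONTEND"] = "noninteractive"
    # -y answers "yes" to apt-get's own confirmation question.
    # Sorting and de-duplicating keeps the command identical from run to run.
    argv = ["apt-get", "install", "-y", "--no-install-recommends"]
    argv += sorted(set(packages))
    return argv, env


def install(packages, dry_run=True):
    """Run the install, or just return the command line when dry_run is True."""
    argv, env = build_install_command(packages)
    if not dry_run:
        subprocess.run(argv, env=env, check=True)
    return " ".join(argv)
```

Calling `install(["curl", "nginx"])` returns the command line without running it; passing `dry_run=False` on a Debian host (as root) would execute it. Baking the result into a custom AMI is a separate step that this sketch does not cover.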
which will allow you to script the setup/tear-down of your stack
which will allow you to script system tests: integrity (3-4K tests); performance (30-40K tests); load, capacity (2-4M requests)
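One way to picture the chain from scripted setup/tear-down to scripted system tests is a wrapper that provisions a stack, runs the tests, and always tears the stack down afterwards. This is a sketch under assumptions: `provision` and `destroy` are hypothetical callbacks standing in for whatever cloud tooling is in use, and the talk does not describe the actual scripts.

```python
from contextlib import contextmanager


@contextmanager
def stack(provision, destroy, size):
    """Provision `size` instances, yield them, and always tear them down."""
    instances = []
    try:
        for index in range(size):
            instances.append(provision(index))
        yield instances
    finally:
        # Tear down in reverse order, even if provisioning or a test failed.
        for instance in reversed(instances):
            destroy(instance)


def run_system_tests(instances, tests):
    """Run every test against the stack; return {test name: passed}."""
    results = {}
    for test in tests:
        try:
            results[test.__name__] = bool(test(instances))
        except Exception:
            # A crashing test counts as a failure instead of aborting the run.
            results[test.__name__] = False
    return results
```

Used as `with stack(provision, destroy, 3) as instances: run_system_tests(instances, tests)`, the same script can bring up a throwaway copy of the stack for an integrity, performance, or load run and leave nothing behind.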
A/B system test results: MySQL Percona Upgrade
That’s how 1 person set up and managed a network comprised of 90+/- server instances for 1.5 years, while serving various other roles, without having to leave their chair. try that with real hardware
Tenet #2: SPOF Elimination. We don’t need no stinkin single points of failure.
SPOF Examples: Cloud Provider; Region; Zone; Load Balancer; App Server; Database; Fred
Cloud Provider fail-over? e.g. AWS -> Rackspace
Region fail-over? e.g. us-east -> us-west within AWS. Nah.
Zone fail-over? Yes. (diagram labels: US-WEST, US-EAST)
Zone fail-over best practices: are you using auto-scaling? no: distribute server instances evenly between 2 or more zones; yes: trigger scaling on network I/O or custom metrics
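The two branches above can be sketched in a few lines. This is an illustration only: the zone names, thresholds, and function names are assumptions, and a real setup would lean on the provider's auto-scaling service instead of hand-rolled logic.

```python
def spread_across_zones(instance_count, zones):
    """Assign `instance_count` instances to `zones` as evenly as possible."""
    if len(zones) < 2:
        raise ValueError("use at least 2 zones to avoid a zone-level SPOF")
    if instance_count < 0:
        raise ValueError("instance_count must be non-negative")
    base, extra = divmod(instance_count, len(zones))
    # The first `extra` zones take one additional instance each.
    return {zone: base + (1 if i < extra else 0) for i, zone in enumerate(zones)}


def survives_zone_loss(placement, required):
    """True if losing the fullest zone still leaves `required` instances."""
    remaining = sum(placement.values()) - max(placement.values())
    return remaining >= required


def should_scale_out(network_bytes_per_sec, threshold_bytes_per_sec):
    """Scale on network I/O: True once traffic exceeds the chosen threshold."""
    return network_bytes_per_sec > threshold_bytes_per_sec
```

For example, `spread_across_zones(7, ["us-east-1a", "us-east-1b", "us-east-1c"])` places 3, 2, and 2 instances, and `survives_zone_loss` then shows whether the remaining zones can carry the load if the fullest one goes dark.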
Load-balancer (ELB), app server, database fail-over? Yes.
So it’s actually all about reduction of the right SPOFs for your business context. Just adding the ability to fail-over and have backups within a region is huge! Probably enough for most. What about Fred?
Tenet #3: Clear-Cut Communication. transparency is soooo 2010
During an outage, communicating the right things at the right time: hard. But not that hard.
Three Tenets Revisited: Tenet #1: Scripted Repeatability; Tenet #2: SPOF Elimination; Tenet #3: Clear-Cut Communication
Notes

GlueCon 2011: Jeff Malek from BigDoor
