Michael Richardson
Twitter: @Mr_SPB
© 2011 Energized Work - www.energizedwork.com
Availability and Recoverability
So what is High Availability?
• Five 9s?
• No Single point of failure?
• Multiple data centres?
• Fault Tolerance?
• Load Balancing?
• Uptime?
The 9s of Availability
Availability             Downtime per year
One nine (90%)           36.5 days
Two nines (99%)          3.65 days
Three nines (99.9%)      8.76 hours
Four nines (99.99%)      52.56 minutes
Five nines (99.999%)     5.26 minutes
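The downtime figures follow directly from the availability percentage. A minimal sketch of the arithmetic, assuming a 365-day year (which is what the figures in the table correspond to); the function name is illustrative only:

```python
# Downtime per year implied by an availability percentage.
# Assumption: a 365-day year, matching the figures in the table.
HOURS_PER_YEAR = 365 * 24


def downtime_hours_per_year(availability_percent: float) -> float:
    """Return the hours of downtime per year allowed at this availability."""
    return (1 - availability_percent / 100) * HOURS_PER_YEAR


levels = [
    ("One nine", 90.0),
    ("Two nines", 99.0),
    ("Three nines", 99.9),
    ("Four nines", 99.99),
    ("Five nines", 99.999),
]

for label, percent in levels:
    hours = downtime_hours_per_year(percent)
    print(f"{label} ({percent}%): {hours / 24:.3f} days, "
          f"{hours:.2f} hours, {hours * 60:.2f} minutes")
```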
Problems with the 9s
• What do they mean?
• Guaranteed, or just an SLA?
• Multiplicity: the availabilities of components that depend on each other multiply
(99.9% * 99.9% * 99.9% = 99.7%)
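The multiplication can be checked with a few lines. This is only a sketch: it assumes the components sit in series (the service needs all of them) and that their failures are independent.

```python
from math import prod


def serial_availability(availabilities):
    """Availability of a chain of components that must all be up.

    Assumes the components fail independently of each other.
    """
    return prod(availabilities)


# Three components, each at three nines, as on the slide.
combined = serial_availability([0.999, 0.999, 0.999])
print(f"Combined availability: {combined:.1%}")
```

Every extra component in the chain pulls the combined figure further below the weakest individual number.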
SLA availability numbers just aim
to provide a level of confidence
in a website’s service
No Single Point of Failure (SPOF)
Two of everything?
Start with this
[Diagram: Users and a single index.html]
End with this
[Diagram: Users; Firewall 1 and Firewall 2; switch 1 and switch 2; WEB1 and WEB2; APP1 and APP2; DB1 and DB2]
Problems with eliminating SPOF
• It’s expensive (££)
• Where do you draw the line?
• Are failures independent?
• Can you guarantee no SPOF?
• Increased complexity
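Whether failures are independent matters a great deal to the "two of everything" arithmetic. The sketch below assumes independent failures for the redundant pair, then shows what happens when both halves share one dependency. All figures are hypothetical.

```python
from math import prod


def parallel_availability(availabilities):
    """Availability of redundant components where any one is sufficient.

    Only valid if the components fail independently of each other.
    """
    return 1 - prod(1 - a for a in availabilities)


# Two redundant hosts at 99% each (hypothetical figures).
pair = parallel_availability([0.99, 0.99])

# The same pair behind one shared component at 99.9% (hypothetical),
# for example a single switch or a shared storage array.
pair_with_shared_dependency = pair * 0.999

print(f"Redundant pair:         {pair:.4%}")
print(f"With shared dependency: {pair_with_shared_dependency:.4%}")
```

The shared component caps the result below its own availability, so the redundancy paid for above it buys less than the independent-failure figure suggests.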
Problem: Data centres fail
Solution: Get a 2nd data centre
Hot/Hot Multisite
• Full range of services available in
multiple locations.
• Easy to automate failover of sites
• Data Consistency is hard.
• Capacity Planning concerns
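One way to look at the capacity planning concern: if the surviving sites must absorb the load of a failed one, no site can run at full utilisation in normal operation. A rough sketch, assuming equally sized sites with load spread evenly across them; the function name is illustrative only:

```python
def max_safe_utilisation(sites: int, sites_lost: int = 1) -> float:
    """Highest normal-operation utilisation per site that still lets the
    remaining sites carry the full load after `sites_lost` sites fail.

    Assumes equally sized sites and evenly distributed load.
    """
    if not 0 <= sites_lost < sites:
        raise ValueError("sites_lost must be at least 0 and less than sites")
    return (sites - sites_lost) / sites


for n in (2, 3, 4):
    print(f"{n} sites: keep each below {max_safe_utilisation(n):.0%} utilisation")
```

Under these assumptions, a two-site Hot/Hot setup has to keep each site at or below half of its capacity.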
Hot/Warm Multisite
• Simpler than Hot/Hot
• Depends on the read/write ratio
• Replicate data synchronously or
asynchronously?
Hot/Cold Multisite
• Easy to set up
• Will it work?
• Can it be trusted?
• Cold sites rapidly become stale
• Is it actually valuable?
DR Multisite
• Fingers crossed you never need it.
• How can/should you test it?
• Cloud?
Problems with Multiple sites
• ££ - it’s expensive
• Managing more systems
• Managing consistency of Data
• Managing Capacity
• Is it still fail proof?
• Unless you test it, it’s just a plan
We now have a Complex System
Complex Systems
• More redundancy and automation lead
to more complexity.
• More complexity often adds more
points of failure.
“How Complex Systems Fail”
Author: Dr. Richard Cook
• Catastrophe is always just around the
corner.
• Human operators have dual roles.
• Change introduces new forms of failure.
Failure and Recovery
Questions for the Customer
• What is the cost of downtime?
• What are the RTO and RPO?
RTO = Recovery Time Objective
RPO = Recovery Point Objective
Aggressive RTO and RPO targets are
expensive and have a
performance impact.
RTO / RPO example
Problem
• Simple DB
• Business can tolerate up to 15 minutes of
downtime
• Business can tolerate up to a 10-minute window of data loss
RTO / RPO example
Possible solution
1. Continuously replicate data to a 2nd
host.
2. Continue with nightly backups and also
copy DB transaction logs from the primary
host to another system.
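A quick way to sanity-check a recovery option against the objectives in the example (15 minutes RTO, 10 minutes RPO). This is a sketch only: the data-loss and recovery-time figures for each option are hypothetical and would need to be measured in a real environment.

```python
from dataclasses import dataclass

RTO_MINUTES = 15  # tolerated downtime, from the example
RPO_MINUTES = 10  # tolerated window of data loss, from the example


@dataclass
class RecoveryOption:
    name: str
    worst_case_data_loss_minutes: float  # age of the newest copy held off the primary
    estimated_recovery_minutes: float    # hypothetical time to restore service


def meets_rpo(option: RecoveryOption) -> bool:
    return option.worst_case_data_loss_minutes <= RPO_MINUTES


def meets_rto(option: RecoveryOption) -> bool:
    return option.estimated_recovery_minutes <= RTO_MINUTES


# All figures below are hypothetical, for illustration only.
options = [
    RecoveryOption("Nightly backup only", 24 * 60, 120),
    RecoveryOption("Continuous replication to 2nd host", 1, 5),
    RecoveryOption("Nightly backup + logs copied every 5 minutes", 5, 120),
]

for option in options:
    print(f"{option.name}: RPO met={meets_rpo(option)}, RTO met={meets_rto(option)}")
```

Checking the two objectives separately makes it visible when an option protects the data but cannot bring the service back in time, or the reverse.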
So what’s more important?
Increasing Availability
Or
Reducing Recovery Time
MTBF (mean time between failures)
Or
MTTR (mean time to repair)
What about MTTD (mean time to detect)?
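The trade-off can be made concrete with the usual steady-state formula, availability = MTBF / (MTBF + MTTR). The sketch below adds detection time to the outage as a separate term; that is a modelling assumption, since some definitions of MTTR already include detection. All numbers are hypothetical.

```python
def availability(mtbf_hours: float, mttr_hours: float, mttd_hours: float = 0.0) -> float:
    """Steady-state availability: uptime / (uptime + downtime).

    Assumption: each outage lasts detection time plus repair time.
    """
    downtime_hours = mttd_hours + mttr_hours
    return mtbf_hours / (mtbf_hours + downtime_hours)


baseline = availability(mtbf_hours=1000, mttr_hours=4)
doubled_mtbf = availability(mtbf_hours=2000, mttr_hours=4)
halved_mttr = availability(mtbf_hours=1000, mttr_hours=2)
slow_detection = availability(mtbf_hours=1000, mttr_hours=4, mttd_hours=1)

print(f"Baseline:          {baseline:.4%}")
print(f"Double the MTBF:   {doubled_mtbf:.4%}")
print(f"Halve the MTTR:    {halved_mttr:.4%}")
print(f"1 hour to detect:  {slow_detection:.4%}")
```

In this model, failing half as often and recovering twice as fast give the same availability, and any time spent before the failure is noticed counts fully against it.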
Answer?
It Depends
Failure is inevitable
Ask anyone
Thank you
The End
Twitter - @Mr_SPB

System Availability Talk


Editor's notes

  1. 28/10/10 © Energized Work Limited 2010 Agile Evangelists - LEANING
  3. Ask any business how much downtime is acceptable and you will get a consistent answer.
  5. Found more in marketing literature than in technical literature.
  6. An SLA is just an instrument that makes business people comfortable (just like insurance).
  11. Points 1 & 2: diminishing returns. Paradoxically, adding more components to an overall system design can undermine efforts to achieve high availability. Cascading failures.
  14. Read & write anywhere. Global Server Load Balancing with DNS.
  15. Read-intensive apps are well suited to this: reads can be served Hot/Hot.
  16. Cold site is so untrusted that perhaps spending hours restoring the primary DC is a better and safer bet.
  17. Cold site is so untrusted that perhaps spending hours restoring the primary DC is a better and safer bet.
  18. Talk about capacity planning. Hot/Hot: config switches. Most companies don't thoroughly test DC failover. When failure occurs, many companies will focus on restoring the failure in the primary DC rather than attempt a failover. So why bother having a 2nd DC anyway? If you plan on having multiple DCs or DR, then test your procedures when you're not in an emergency situation. Game Day events.
  21. Mention John Allspaw's QCon talk. 2. Dual roles of humans: defenders against failure and producers of failure. 3. Introducing a technology change to prevent low-consequence but high-frequency failures may introduce low-frequency but high-consequence failures, and may introduce new pathways to large-scale, catastrophic failures. The focus of humans is on the beneficial characteristics of the change. New failures may be difficult to foresee. Give config management example: Knife, resolv.conf. 3. Also covers maintenance and why many find it difficult. Build-and-forget mentality.
  23. Cost of downtime: easy or difficult to measure? Can downtime actually be equated to lost revenue? Give online shopping example.
  24. RTO and RPO are often in competition. Give an example of replication lag between 2 sites. Zero RPO example: if replication lags between systems and you have an aggressive RPO, you may be better off taking a few hours' outage and focusing on restoring your primary site. Zero RTO example: if replication lags between DCs, you may decide to fail over immediately and take the data loss for some in-flight transactions. Aggressive RTO & RPO is expensive and has a performance impact.
  26. Typical nightly backups aren't going to cut it. Common practice is to back up systems nightly. Is your business happy to lose up to 24 hours of data? Probably not.
  27. 1. Covers you for any catastrophic hardware failure. The 2nd host has independent storage infrastructure. Data corruption would, however, result in 2 copies of crap. 2. Covers you for data corruption. Playing back transaction logs will also allow you to identify the place where corruption occurred.
  29. What about MTTD?
  30. My experience tells me most companies focus on availability. How many companies take nightly tape backups but have never bothered trying to restore or test them? If you think you can build a completely fail-proof system, you are kidding yourself. How many companies have game days?