Availability Analysis for Deployment
of In-Cloud Applications
Xiwei Xu, Qinghua Lu, Liming Zhu, Jim (Zhanwen) Li
Sherif Sakr, Hiroshi Wada, Ingo Weber
Software Systems Research Group, NICTA
ISARCS13, Vancouver
Slides at: http://www.slideshare.net/LimingZhu/
NICTA Copyright 2010 From imagination to impact 2
Motivation
• Uncertainties in Cloud are challenging for architecting
critical applications and understanding availability
– Shared resources, weak SLA guarantees and limited visibility
– Rare but high consequence events
– Sporadic activities: upgrade, backup, recovery…
– Subjective uncertainties: impact of configuration choices
• We want to explicitly model the above uncertainties in
application availability analysis of cloud deployment.
– from a cloud consumer perspective
– focusing on mechanisms most relevant to critical
applications: auto-scaling, over-provisioning, backup,
recovery and maintenance.
Contributions
• SRN(Stochastic Reward Net)-based availability models
• which allow you to specify:
– Deployment architecture (application placements in VM)
– Node/Aggregation level SLAs from infrastructure providers
– Auto-scaling policies and recovery strategies
– Rare events: availability zone or region down
• which give you application availability levels of different options
under different scenarios
• Model evaluation by analysing existing industry best
practices in cloud application deployment
– Quantifying the rule-of-thumb best practices
– Comparing different (best) practices
Deployment Architecture Assumption
– Stateless VMs: auto-scaling groups
– Stateful VMs: hot standbys
– Backup at separate region for recovery
Availability Analysis Overview
• SRN-based Models
• Architecture model and recovery model in this paper
• One SRN architecture model per availability zone
Availability Analysis Overview
• Deployment decisions and patterns
– stateless/stateful application placement within VMs
– auto-scaling policies
– multi-zone configurations
Availability Analysis Overview
• SLA from the cloud providers
• Node level (Rackspace) or zone level (Amazon)
Availability Analysis Overview
• Recovery strategy
• Auto-regeneration of stateless VMs and different
recovery mechanisms for stateful VMs
• Different Recovery-Time/Point-Objective (RTO/RPO)
Availability Analysis Overview
• Application-specific data
– Stateless VM start-up time…
– Stateful VM replication…
Stochastic Reward Net
• Stochastic Reward Net (SRN)
– Stochastic Petri Net variant
– Firing delays
– Reward function
• Constructs
• Places: VM states
(Full, Running, Stopped, Failed)
• Token: VMs
• Transition
• Guard function
• Transition rate: 1) frequency of
events, 2) delay before the
transition fires
• Reward Function:
if (#Running1>0) 1 else 0
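As a rough illustration of how the reward function above turns into an availability figure, the following Python sketch evaluates it over a small, hand-written set of markings. The place names other than Running1, the helper names and the probabilities are assumptions made for illustration; in an actual SRN the steady-state probabilities would come from solving the underlying Markov chain.

```python
# Hypothetical sketch: a marking maps place names to token counts (VMs).

def reward(marking):
    """Reward function from the slide: 1 if any VM in zone 1 is running, else 0."""
    return 1 if marking["Running1"] > 0 else 0


def expected_availability(distribution):
    """Expected reward: sum over markings of P(marking) * reward(marking)."""
    return sum(p * reward(m) for m, p in distribution)


# Made-up steady-state probabilities, for illustration only.
distribution = [
    ({"Running1": 2, "Stopped1": 0, "Failed1": 0}, 0.90),
    ({"Running1": 1, "Stopped1": 0, "Failed1": 1}, 0.09),
    ({"Running1": 0, "Stopped1": 0, "Failed1": 2}, 0.01),
]

print(f"availability = {expected_availability(distribution):.4f}")
```

Only the markings with at least one running VM contribute to the sum, so the result is the probability mass of the "up" markings.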
SRN-based Availability Models
Availability Models: Auto-scaling
Availability Models: Auto-scaling
gScaleSelf1:
if(#Running1<=#Running2 && #Stopped1>0) 1 else 0
gScaleOther1:
if(#Running1>#Running2 && #Stopped2>0) 1 else 0
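The two guard functions above can be written as plain Python predicates over a marking. This is only a sketch: the dictionary representation and the function names are assumptions. One possible reading is that a scale-out goes to zone 1 itself when it runs no more VMs than zone 2 and has a stopped VM to start, and otherwise goes to zone 2 if zone 2 has a stopped VM.

```python
# Hypothetical sketch: a marking maps place names to token counts (VMs).

def g_scale_self1(m):
    """gScaleSelf1: enabled when zone 1 runs no more VMs than zone 2
    and zone 1 still has a stopped VM available."""
    return 1 if m["Running1"] <= m["Running2"] and m["Stopped1"] > 0 else 0


def g_scale_other1(m):
    """gScaleOther1: enabled when zone 1 runs more VMs than zone 2
    and zone 2 still has a stopped VM available."""
    return 1 if m["Running1"] > m["Running2"] and m["Stopped2"] > 0 else 0


marking = {"Running1": 1, "Running2": 2, "Stopped1": 1, "Stopped2": 1}
print(g_scale_self1(marking), g_scale_other1(marking))
```

The conditions on #Running1 are complementary, so at most one of the two guards is enabled in any marking.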
Availability Models: Stateful VM
Availability Models—Disaster Recovery
• Availability zone life cycle
– Interact with the big
architecture model
• Stateless VM recovery
– Backup/AMI
• Stateful VM recovery
– Backup
– Replica
– Hot standby
Case 1: Multi-zone Deployment
• Parameters
– Amazon EC2 SLA of 99.95% availability
– Zone fail rate: 0.00011, MTTR: 4.38 hours per year
– Application specific measurement of transitions
0.01% = 52.56 mins downtime per year
0.4% diff = 35 hours
0.76% diff = 66 hours
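The downtime figures above follow from multiplying an unavailability percentage by the hours in a year. The sketch below reproduces that arithmetic, assuming a 365-day year; the function name is illustrative. If the zone fail rate of 0.00011 is read as a per-hour rate (an assumption, since the slide gives no unit), it corresponds to roughly one zone failure per year.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def downtime_hours(unavailability_percent):
    """Yearly downtime in hours for a given unavailability percentage."""
    return unavailability_percent / 100 * HOURS_PER_YEAR


# A 99.95% SLA leaves 0.05% unavailability.
print(f"99.95% SLA: {downtime_hours(0.05):.2f} hours per year")
print(f"0.01%: {downtime_hours(0.01) * 60:.2f} minutes per year")
print(f"0.4% diff: {downtime_hours(0.4):.1f} hours per year")
print(f"0.76% diff: {downtime_hours(0.76):.1f} hours per year")

# Assumed per-hour failure rate, expressed as failures per year.
print(f"zone failures per year: {0.00011 * HOURS_PER_YEAR:.2f}")
```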
Case 2: Recovery across Availability Zone
• Industry rule of thumb: "Target auto-scale 30-60% until you have
50% headroom for load spikes. Lose an AZ leads to 90% utilisation."
• Impact on overall availability?
• 30-60% vs. traditional 70-90%?
• over-provisioning vs. auto-scaling?
0.29% diff = 25 hours
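The 90% figure in the rule of thumb is consistent with simple load redistribution. The sketch below assumes the load is spread evenly over three availability zones and is shared evenly among the survivors when one zone is lost; the zone count and the function name are assumptions, not taken from the slides.

```python
def utilisation_after_zone_loss(utilisation, zones):
    """Per-zone utilisation after one of `zones` equally loaded zones is lost
    and its load is shared evenly among the remaining zones."""
    if zones < 2:
        raise ValueError("need at least two zones to survive losing one")
    return utilisation * zones / (zones - 1)


# Upper end of the 30-60% target, three zones assumed.
print(f"{utilisation_after_zone_loss(0.60, 3):.0%}")

# Upper end of the traditional 70-90% target, three zones assumed.
print(f"{utilisation_after_zone_loss(0.90, 3):.0%}")
```

Under these assumptions a 60% target leaves the surviving zones at about 90%, whereas a 90% target would push them past full capacity.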
Case 3: Disaster Recovery across Regions
• Trade-off between RPO and RTO
• RPO: Recovery Point Objective
• RTO: Recovery Time Objective
Yuruware — http://www.yuruware.com/
0.2% diff = 17 hours
Conclusion and Future Work
• SRN-based availability models
– Application-level availability
– Highly configurable for different deployment architectures
– Model different uncertainties and scenarios for critical systems
– Quantify and compare choices and enable what-if analysis
– Evaluated using industry best practices
• Future work
– Better evaluation!
– Integrated models on impact of upgrade, live migration, backup and
subjective uncertainties (in IEEE Cloud 13)
Q. Lu, X. Xu, L. Zhu, L. Bass, et al., "Incorporating Uncertainty into in-Cloud Application
Deployment Decisions for Availability," in IEEE Cloud 2013
Liming.Zhu@nicta.com.au
Slides available at http://www.slideshare.net/LimingZhu/

Editor's Notes

  1. In this paper, we only show the architecture model and the recovery model due to space limitations.