99.999% Available OpenStack Cloud
-
A Builder's Guide
Danny Al-Gaaf (Deutsche Telekom)
OpenStack Summit 2015 - Tokyo
● Motivation
● Availability and SLAs
● Data centers
○ Setup and failure scenarios
● OpenStack and Ceph
○ Architecture and Critical Components
○ HA setup
○ Quorum?
● OpenStack and Ceph == HA?
○ Failure scenarios
○ Mitigation
● Conclusions
Overview
2
Motivation
NFV Cloud @ Deutsche Telekom
● Datacenter design
○ Backend DCs
■ Few but classic DCs
■ High SLAs for infrastructure and services
■ For private/customer data and services
○ Frontend DCs
■ Small but many
■ Near to the customer
■ Lower SLAs, can fail at any time
■ NFVs:
● Spread over many FDCs
● Failures are handled by services and not the infrastructure
● Run telco core services @OpenStack/KVM/Ceph
4
Availability
Availability
● Measured relative to “100 % operational”
6
availability downtime classification
99.9% 8.76 hours/year high availability
99.99% 52.6 minutes/year very high availability
99.999% 5.26 minutes/year highest availability
99.9999% 0.526 minutes/year disaster tolerant
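
The downtime column follows directly from the availability percentage. A minimal Python sketch reproducing the table's numbers (8,760 hours per year, leap years ignored):

```python
# Allowed downtime per year for a given availability level.
HOURS_PER_YEAR = 24 * 365  # 8760 h, ignoring leap years

for availability in (0.999, 0.9999, 0.99999, 0.999999):
    downtime_h = (1 - availability) * HOURS_PER_YEAR
    print(f"{availability:.4%}: {downtime_h * 60:8.2f} min/year "
          f"({downtime_h:.3f} h/year)")
```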
High Availability
● Continuous system availability in case of component
failures
● Which availability?
○ Server
○ Network
○ Datacenter
○ Cloud
○ Application/Service
● End-to-end availability matters most
7
High Availability
● Calculation
○ Each component contributes to the service availability
■ Infrastructure
■ Hardware
■ Software
■ Processes
○ Likelihood of disaster and failure scenarios
○ Model can get very complex
● SLAs
○ ITIL (IT Infrastructure Library)
○ Planned maintenance depending on SLA may be excluded
8
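
To make the composition concrete: availabilities of components in series multiply, while redundant (parallel) paths multiply their failure probabilities. A small sketch with purely illustrative numbers, not figures from the slides:

```python
from math import prod

def serial(*avail):
    """All components must work: availabilities multiply."""
    return prod(avail)

def parallel(*avail):
    """Service survives if any path works: failure probabilities multiply."""
    return 1 - prod(1 - a for a in avail)

# Hypothetical per-component numbers, for illustration only.
single_chain = serial(0.999, 0.9995, 0.999)  # network * server * storage
print(f"single chain:         {single_chain:.5f}")   # ~0.99750 -- far from 5 nines
print(f"two redundant chains: {parallel(single_chain, single_chain):.7f}")
```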
Data centers
Failure scenarios
● Power outage
○ External
○ Internal
○ Backup UPS/Generator
● Network outage
○ External connectivity
○ Internal
■ Cables
■ Switches, routers
● Failure of a server or a component
● Failure of a software service
10
Failure scenarios
● Human error is still a leading cause of outages
○ Misconfiguration
○ Accidents
○ Emergency power-off
● Disaster
○ Fire
○ Flood
○ Earthquake
○ Plane crash
○ Nuclear accident
11
Data Center Tiers
12
Mitigation
● Identify potential SPoFs
● Use redundant components
● Careful planning
○ Network design (external/internal)
○ Power management (external/internal)
○ Fire suppression
○ Disaster management
○ Monitoring
● 5-nines on DC/HW level hard to achieve
○ Tier IV usually too expensive (compared with Tier III or III+)
○ Requires HA concept on cloud and application level
13
Example: Network
● Spine/leaf arch
● Redundant
○ DC-R
○ Spine switches
○ Leaf switches (ToR)
○ OAM switches
○ Firewall
● Server
○ Redundant NICs
○ Redundant power lines
and supplies
14
Ceph and OpenStack
Architecture: Ceph
16
Architecture: Ceph Components
● OSDs
○ 10s - 1000s per cluster
○ One per device (HDD/SSD/RAID group, SAN …)
○ Store objects
○ Handle replication and recovery
● MONs:
○ Maintain cluster membership and states
○ Use the Paxos protocol to establish quorum (consensus)
○ Small, lightweight
○ Odd number
17
Architecture: Ceph and OpenStack
18
HA - Critical Components
Which services need to be HA?
● Control plane
○ Provisioning, management
○ API endpoints and services
○ Admin nodes
○ Control nodes
● Data plane
○ Steady states
○ Storage
○ Network
19
HA Setup
● Stateless services
○ No dependency between requests
○ After reply no further attention required
○ API endpoints (e.g. nova-api, glance-api,...) or nova-scheduler
● Stateful service
○ An action typically comprises multiple requests
○ Subsequent requests depend on the results of a former request
○ Databases, RabbitMQ
20
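
Because any stateless replica can answer any request, failover reduces to trying the next endpoint. A minimal client-side sketch; the endpoint URLs are placeholders, not a real deployment:

```python
from urllib.request import urlopen
from urllib.error import URLError

# Hypothetical redundant, stateless API endpoints (placeholders).
ENDPOINTS = [
    "http://api-1.example.net:8774/v2.1",  # nova-api replica 1
    "http://api-2.example.net:8774/v2.1",  # nova-api replica 2
]

def get_with_failover(path):
    """Try each stateless replica in turn; any one can serve the request."""
    for base in ENDPOINTS:
        try:
            with urlopen(base + path, timeout=2) as resp:
                return resp.read()
        except URLError:
            continue  # this replica is down -- try the next one
    raise RuntimeError("all API endpoints unreachable")
```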
HA Setup
21
● stateless
○ active/passive: load balance redundant services
○ active/active: load balance redundant services
● stateful
○ active/passive: bring replacement resource online
○ active/active: redundant services, all with the same state; state changes are passed to all instances
OpenStack HA
22
Quorum?
● Required to decide which cluster partition/member is
primary to prevent data/service corruption
● Examples:
○ Databases
■ MariaDB / Galera, MongoDB, Cassandra
○ Pacemaker/corosync
○ Ceph Monitors
■ Paxos
■ Odd number of MONs required
■ At least 3 MONs for HA, simple majority (2:3, 3:5, 4:7, …)
■ Without quorum:
● No changes to cluster membership (e.g. adding new MONs/OSDs)
● Clients can’t connect to cluster
23
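
The 2:3, 3:5, 4:7 ratios are plain majority arithmetic: n monitors need floor(n/2)+1 members in quorum and therefore tolerate floor((n-1)/2) failures, which is why an even monitor count adds cost but no extra failure tolerance. A short sketch:

```python
def quorum_size(n_mons):
    """Simple majority: more than half of the monitors."""
    return n_mons // 2 + 1

for n in range(1, 8):
    print(f"{n} MONs: quorum {quorum_size(n)}:{n}, "
          f"tolerates {(n - 1) // 2} failure(s)")
# 3 MONs tolerate 1 failure, 5 tolerate 2, 7 tolerate 3;
# 4 MONs still tolerate only 1 -- no gain over 3.
```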
OpenStack and Ceph == HA ?
SPoF
● OpenStack HA
○ No SPoF assumed
● Ceph
○ No SPoF assumed
○ Availability of RBDs is critical to VMs
○ Availability of RadosGW can be easily managed via HAProxy
● What in case of failures on higher level?
○ Data center cores or fire compartments
○ Network
■ Physical
■ Misconfiguration
○ Power
25
Setup - Two Rooms
26
Failure scenarios - FC fails
27
Failure scenarios - FC fails
28
Failure scenarios - Split brain
29
● Ceph
● Quorum selects B
● Storage in A stops
● OpenStack HA:
● Selects B
● VMs in B still running
● Best-case scenario
Failure scenarios - Split brain
30
● Ceph
● Quorum selects B
● Storage in A stops
● OpenStack HA:
● Selects A
● VMs in A and B stop
working
● Worst-case scenario
Other issues
● Replica distribution
○ Two room setup:
■ 2 or 3 replicas carry the risk of having only one replica left
■ Would require 4 replicas (2:2)
● Reduced performance
● Increased traffic and costs
○ Alternative: erasure coding
■ Reduced performance, less space required
● Spare capacity
○ Remaining room requires spare capacity to restore
○ Depends on
■ Failure/restore scenario
■ Replication vs. erasure coding
○ Costs
31
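
The capacity trade-off behind these bullets can be made explicit: replication with size n stores n full copies, while erasure coding stores (k+m)/k of the data. Illustrative numbers only:

```python
def usable_fraction_replica(size):
    """Replication: 1 usable byte costs `size` raw bytes."""
    return 1 / size

def usable_fraction_ec(k, m):
    """Erasure coding: k data chunks plus m coding chunks."""
    return k / (k + m)

print(f"3x replication:             {usable_fraction_replica(3):.0%} usable")  # 33%
print(f"4x replication (2:2 rooms): {usable_fraction_replica(4):.0%} usable")  # 25%
print(f"EC k=4, m=2:                {usable_fraction_ec(4, 2):.0%} usable")    # 67%
```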
Mitigation - Three FCs
32
● Third FC/failure
zone hosting all
services
● Usually higher
costs
● More resistant
against failures
● Better replica
distribution
● More east/west
traffic
Mitigation - Quorum Room
33
● Most DCs have
backup rooms
● Only a few servers
to host quorum
related services
● Less cost
intensive
● Can mitigate split
brain between FCs
(depending on
network layout)
Mitigation - Pets vs Cattle
34
● NO pets allowed !!!
● Only cloud-ready applications
Mitigation - Failure tolerant applications
35
● The DC tier level is not the most relevant layer
● Applications must build their own cluster
mechanisms on top of the DC
→ increases availability significantly
● Data replication must be done across
multiple regions
● In case of a disaster, route traffic to a
different DC
● Many VNFs (virtual network functions)
already support such setups
Mitigation - Federated Object Stores
36
● The best way to synchronize and replicate
data across multiple DCs is object storage
● Sync is done asynchronously
Open issues:
● Doesn’t solve replication of databases
● Many applications don’t support object
storage and need to be adapted
● Applications also need to support
regions/zones
Mitigation - Outlook
● “OpenStack follows Storage”
○ Use RBDs as fencing devices
○ Extend Ceph MONs
■ Include information about physical placement similar to CRUSH map
■ Enable HA setup to query quorum decisions and map quorum to physical layout
● Passive standby Ceph MONs to ease deployment of
MONs if quorum fails
○ http://tracker.ceph.com/projects/ceph/wiki/Passive_monitors
● Generic quorum service/library?
37
Conclusions
Conclusions
● OpenStack and Ceph provide HA if carefully planned
○ Be aware of potential failure scenarios!
○ All quorums must be in sync
○ A third room must be used
○ Replica distribution and spare capacity must be considered
○ Ceph needs more extended quorum information
● The target for five 9s is E2E
○ Five 9s at the data center level is very expensive
○ No pets !!!
○ Distribute applications or services over multiple DCs
39
Get involved !
● Ceph
○ https://ceph.com/community/contribute/
○ ceph-devel@vger.kernel.org
○ IRC: OFTC
■ #ceph,
■ #ceph-devel
○ Ceph Developer Summit
● OpenStack
○ Cinder, Glance, Manila, ...
40
danny.al-gaaf@telekom.de
dalgaaf
linkedin.com/in/dalgaaf
Danny Al-Gaaf
Senior Cloud Technologist
IRC
Q&A - THANK YOU!
