High Availability
on Linux
Roger Zhou
zzhou@suse.com
openSUSE.Asia
Summit 2015
SUSE
way
© SUSE, All rights reserved.
CURIOSITY in the land
of Linux High Availability
Agenda
• HA architectural components
• Use case examples
• Future outlook & Demo
What is a Cluster?
• HPC (super computing)
• Load Balancer (Very high capacity)
• High Availability
‒ 99.999% availability ≈ 5 minutes of downtime per year
‒ SPOF (single point of failure)
‒ Murphy's Law
"Everything that can go wrong will go wrong"
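The "five nines" figure can be checked with a little arithmetic. The following is a minimal sketch, assuming an average year of 365.25 days; the function name is illustrative and not part of any HA tool.

```python
MINUTES_PER_YEAR = 365.25 * 24 * 60  # average year, leap years included


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Return the yearly downtime budget implied by an availability percentage."""
    unavailable_fraction = 1.0 - availability_percent / 100.0
    return unavailable_fraction * MINUTES_PER_YEAR


if __name__ == "__main__":
    # Print the downtime budget for two, three, four and five nines.
    for percent in (99.0, 99.9, 99.99, 99.999):
        minutes = downtime_minutes_per_year(percent)
        print(f"{percent}% -> {minutes:.2f} minutes/year")
```

Each additional nine cuts the downtime budget by a factor of ten, which is why every single point of failure matters at this level.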
"HA", widely used, often confusing
• VMware vSphere HA
‒ Hypervisor and hardware level. Closed source.
‒ Agnostic to the guest OS inside the VM.
• SUSE HA
‒ Inside the Linux OS.
‒ That said, Windows needs its own Windows HA solution.
• Different Industries
‒ We are Enterprise.
‒ Hadoop HA (dot-com)
‒ OpenStack HA (IaaS)
‒ OpenSAF (telecom)
History of HA in Linux OS
• 1990s: the Heartbeat project. Two nodes only.
• Early 2000s: Heartbeat 2.0 had become too complex.
‒ The industry demanded a split:
1) one project for cluster membership
2) one project for resource management
• Today: ClusterLabs.org
‒ Pacemaker + Corosync, a completely different solution from the early days.
‒ The Heartbeat project has been merged in.
‒ 2015 HA Summit
Typical HA Problem - Split Brain
• Clustering
‒ multiple nodes share the same resources.
• Split partitions run the same service
‒ This breaks data integrity!
• Two key concepts as the solution:
‒ Fencing
The cluster does not accept any ambiguous state.
STONITH - "shoot the other node in the head".
‒ Quorum
Quorum means "majority". Without quorum there are no actions:
no resource management, no fencing.
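The majority rule can be sketched in a few lines. This is a simplified illustration that assumes one vote per node; the function names are hypothetical and do not correspond to the Corosync API.

```python
def has_quorum(partition_votes: int, total_votes: int) -> bool:
    """A partition is quorate only if it holds a strict majority of all votes."""
    return 2 * partition_votes > total_votes


def quorate_partitions(partitions: list[set[str]], all_nodes: set[str]) -> list[set[str]]:
    """Return the partitions that are allowed to keep managing resources."""
    # Assumption: every node carries exactly one vote.
    return [p for p in partitions if has_quorum(len(p), len(all_nodes))]


if __name__ == "__main__":
    nodes = {"a", "b", "c", "d", "e"}
    split = [{"a", "b", "c"}, {"d", "e"}]
    # Only the three-node side of the split holds a majority.
    print(quorate_partitions(split, nodes))
```

Because the majority must be strict, at most one partition can be quorate after a split, and an even split (for example 2 + 2) leaves no partition quorate at all.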
HA Hardware Components
• Multiple networks
‒ A user network for end user access.
‒ A dedicated network for cluster communication/heartbeat.
‒ A dedicated storage network infrastructure.
• Network bonding
‒ a.k.a. link aggregation
• Fencing/STONITH devices
‒ remote power switch
• Shared storage
‒ NAS (NFS/CIFS), SAN (FC/iSCSI)
Architectural Software Components
"clusterlabs.org"
‒ Corosync
‒ Pacemaker
‒ Resource Agents
‒ Fencing/STONITH Devices
‒ UI (crmsh and Hawk2)
‒ Booth for Geo clusters
Outside of "clusterlabs.org"
‒ LVS: Layer 4, IP + port, kernel space.
‒ HAProxy: Layer 7 / HTTP, user space.
‒ Shared filesystem: OCFS2 / GFS2
‒ Block device replication: DRBD, cLVM mirroring, cluster-md
‒ Shared storage: SAN (FC / FCoE / iSCSI)
‒ Multipathing
Software Components in Detail
Corosync: messaging and membership
• Consensus algorithm
‒ "Totem Single Ring Ordering and Membership protocol"
• Closed Process Group
‒ Analogous to the TCP three-way handshake.
‒ Membership handling.
‒ Message ordering.
• A quorum system
‒ notifies apps when quorum is achieved or lost.
• In-memory object database
‒ for Configuration engines and Service engines.
‒ Shared-nothing cluster.
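To make "message ordering" concrete, here is a heavily simplified sketch of token-based total ordering in the spirit of a single-ring protocol. It is an illustration only: the `Node` class and `circulate_token` function are hypothetical, and real Totem additionally handles message loss, retransmission and membership changes, none of which is modelled here.

```python
from collections import deque


class Node:
    """A cluster member with pending messages and a delivery log."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.outbox = deque()  # messages waiting for the token
        self.delivered = []    # (seq, sender, payload) in agreed order


def circulate_token(ring: list, rounds: int = 1) -> int:
    """Pass a token around the ring; only the token holder may broadcast."""
    next_seq = 0  # the sequence counter travels with the token
    for _ in range(rounds):
        for holder in ring:
            while holder.outbox:
                payload = holder.outbox.popleft()
                stamped = (next_seq, holder.name, payload)
                next_seq += 1
                # Broadcast to every member, the sender included.
                for node in ring:
                    node.delivered.append(stamped)
    return next_seq


if __name__ == "__main__":
    a, b, c = Node("a"), Node("b"), Node("c")
    b.outbox.append("x")
    a.outbox.append("y")
    c.outbox.append("z")
    circulate_token([a, b, c])
    # Every node ends up with the same delivery order.
    print(a.delivered == b.delivered == c.delivered)
```

Because only the token holder stamps and sends messages, every member sees the same sequence numbers in the same order, regardless of when each message was originally queued.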
Pacemaker: the resource manager
• The brain of the cluster.
• Policy engine for decision making.
‒ Starts/stops resources on a node according to its score.
‒ Monitors resources at the configured interval.
‒ Restarts a resource if its monitor operation fails.
‒ Fences (STONITH) a node if a stop operation fails.
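The decision rules listed above can be sketched as two small functions. This is a toy model, not Pacemaker's policy engine: the function names and event strings are hypothetical, and treating a negative score as "may not run here" is an assumption of this sketch.

```python
from typing import Optional


def choose_node(scores: dict[str, int], online: set[str]) -> Optional[str]:
    """Pick the online node with the highest score for a resource."""
    # Assumption of this sketch: a negative score excludes the node.
    candidates = {n: s for n, s in scores.items() if n in online and s >= 0}
    if not candidates:
        return None
    # Sorting first makes ties resolve deterministically by node name.
    return max(sorted(candidates), key=candidates.get)


def next_action(event: str) -> str:
    """Map a failed operation to the recovery step named on the slide."""
    if event == "monitor_failed":
        return "restart_resource"
    if event == "stop_failed":
        return "fence_node"
    return "no_action"


if __name__ == "__main__":
    scores = {"node1": 100, "node2": 50}
    print(choose_node(scores, {"node1", "node2"}))  # preferred node
    print(choose_node(scores, {"node2"}))           # node1 is offline
    print(next_action("stop_failed"))
```

The escalation from restart to fencing reflects the reasoning in the list: a resource that cannot be stopped cleanly leaves the cluster unable to prove it is safe to start that resource elsewhere.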
Shoot The Other Node In The Head
• Data integrity does not tolerate any ambiguous state. Before migrating resources to another node in the cluster, the cluster must confirm that the suspect node really is down.
• STONITH is mandatory for *enterprise* Linux HA clusters.
Popular STONITH devices
• APC PDU
‒ Network-based power switch.
• Standard protocols integrated with servers
‒ Intel AMT, HP iLO, Dell DRAC, IBM IMM, IPMI
• Software libraries
‒ To deal with KVM, Xen and VMware VMs.
• Software based
‒ SBD (STONITH Block Device) for self-termination.
The implicit last option in the fencing topology.
• NOTE: Fencing devices can be chained.
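Chaining fencing devices can be pictured as an ordered list of levels that are tried until one succeeds. The sketch below is modelled loosely on the idea of a fencing topology; the `fence` function and the device callables are hypothetical stand-ins, not a real fencing API.

```python
from typing import Callable

# A fence device takes a node name and returns True once the node is confirmed off.
FenceDevice = Callable[[str], bool]


def fence(target: str, levels: list[list[FenceDevice]]) -> bool:
    """Try each level in order; a level succeeds only if all its devices succeed."""
    for devices in levels:
        if all(device(target) for device in devices):
            return True
    return False  # no level could confirm the node is down


if __name__ == "__main__":
    # Hypothetical devices: the management board is unreachable, the PDU works.
    def ipmi(node: str) -> bool:
        return False

    def pdu(node: str) -> bool:
        return True

    def sbd(node: str) -> bool:
        return True

    print(fence("node2", [[ipmi], [pdu], [sbd]]))
```

Later levels are only reached when earlier ones fail, which matches the slide's description of SBD as the last option in the chain.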
Resource Agents (RAs)
• Write an RA for your own application
• LSB shell scripts:
start / stop / monitor
• More than a hundred contributors upstream on GitHub.
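The start / stop / monitor contract can be illustrated with a toy agent. Real agents are usually shell scripts; Python is used here purely for illustration. The marker file is hypothetical, and the return codes follow the OCF convention (0 for success, 7 for "not running"), which is an assumption that goes beyond what the slide states.

```python
import os
import sys
import tempfile

# Return codes following the OCF convention (an assumption of this sketch).
OCF_SUCCESS = 0
OCF_ERR_UNIMPLEMENTED = 3
OCF_NOT_RUNNING = 7

# Hypothetical marker file standing in for a real service's state.
STATE_FILE = os.path.join(tempfile.gettempdir(), "demo-ra.state")


def monitor() -> int:
    """Report whether the resource is currently running."""
    return OCF_SUCCESS if os.path.exists(STATE_FILE) else OCF_NOT_RUNNING


def start() -> int:
    """Start the resource, then verify it with monitor()."""
    with open(STATE_FILE, "w") as handle:
        handle.write("running\n")
    return monitor()


def stop() -> int:
    """Stop the resource; stopping an already stopped resource still succeeds."""
    if os.path.exists(STATE_FILE):
        os.remove(STATE_FILE)
    return OCF_SUCCESS


ACTIONS = {"start": start, "stop": stop, "monitor": monitor}


def main(argv: list[str]) -> int:
    """Dispatch the action named on the command line."""
    if len(argv) != 2 or argv[1] not in ACTIONS:
        return OCF_ERR_UNIMPLEMENTED
    return ACTIONS[argv[1]]()


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

Making `stop` idempotent matters in practice: the cluster manager may call it on a resource that is already down, and a failed stop is what escalates to fencing.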
Cluster Filesystem
• OCFS2 / GFS2
‒ On the shared storage.
‒ Multiple nodes concurrently access the same filesystem.
http://clusterlabs.org/doc/fr/Pacemaker/1.1/html/Clusters_from_Scratch/_pacemaker_architecture.html
Cluster Block Device
• DRBD
‒ Network-based RAID1.
‒ High-performance data replication over the network.
• cLVM2 + cmirrord
‒ Clustered LVM2 mirroring.
‒ Multiple nodes can manipulate volumes on the shared disk.
‒ clvmd distributes LVM metadata updates across the cluster.
‒ Data replication is very slow.
• Cluster MD RAID1
‒ Multiple nodes use the shared disks as MD RAID1.
‒ High-performance RAID1 solution for clusters.
Cluster Examples
NFS Server (Highly Available NAS)
Cluster Example in Diagram
[Diagram: two-node cluster, VM1 and VM2, managed by Pacemaker + Corosync]
• VM1: DRBD Master, LVM, Filesystem (ext4), NFS exports, Virtual IP
• VM2: DRBD Slave, LVM, Filesystem (ext4), NFS exports, Virtual IP
• Kernel to kernel: network RAID1
• Failover moves the stack from VM1 to VM2.
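The layered stack in this example implies an ordering: each layer needs the one beneath it. The sketch below shows a planned switchover under the assumption that resources start bottom-up and stop top-down; that ordering is read from the diagram rather than stated on the slide, and all names are illustrative.

```python
# Resource stack from the diagram, listed bottom-up (assumed start order).
STACK = ["drbd_promote", "lvm", "filesystem_ext4", "nfs_exports", "virtual_ip"]


def start_order(stack: list[str]) -> list[str]:
    """Lower layers must be up before the layers that depend on them."""
    return list(stack)


def stop_order(stack: list[str]) -> list[str]:
    """Stop in the reverse order, so nothing loses its foundation while running."""
    return list(reversed(stack))


def switchover(stack: list[str], from_node: str, to_node: str) -> list[tuple[str, str, str]]:
    """Return the ordered (node, action, resource) steps for a planned move."""
    steps = [(from_node, "stop", res) for res in stop_order(stack)]
    steps += [(to_node, "start", res) for res in start_order(stack)]
    return steps


if __name__ == "__main__":
    for node, action, resource in switchover(STACK, "VM1", "VM2"):
        print(f"{node}: {action} {resource}")
```

In an unplanned failover the stop phase on the failed node cannot be trusted, which is where fencing takes its place before the start phase runs on the surviving node.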
HA iSCSI Server (Active/Passive)
Cluster Example in Diagram
[Diagram: two-node cluster, VM1 and VM2, managed by Pacemaker + Corosync]
• VM1: DRBD Master, LVM, iSCSITarget, iSCSILogicalUnit, Virtual IP
• VM2: DRBD Slave, LVM, iSCSITarget, iSCSILogicalUnit, Virtual IP
• Kernel to kernel: network RAID1
• Failover moves the stack from VM1 to VM2.
Cluster FS - OCFS2 on shared disk
Cluster Example in Diagram
[Diagram: Host1, Host2 and Host3 managed by Pacemaker + Corosync]
• Each host: OCFS2, Kernel
• Shared content: Virtual Machine Images and Configuration ( /mnt/images/ )
• Storage: Cluster MD-RAID1 ( /dev/md127 ), Replication / Backup
• VM migration between hosts
Future outlook
• Upstream activities
‒ OpenStack: from the control plane into the compute domain.
‒ Scalability of corosync/pacemaker
‒ Docker adoption
‒ “Zero” Downtime HA VM
‒ ...
Join us ( all *open-source* )
• Play with Leap 42.1: http://www.opensuse.org
Doc: https://www.suse.com/documentation/sle-ha-12/
• Report and Fix Bugs: http://bugzilla.opensuse.org
• Discussion: opensuse-ha@opensuse.org
• HA ClusterLabs: http://clusterlabs.org/
• General HA Users: users@clusterlabs.org
Demo + Q&A + Have fun

Linux High Availability Overview - openSUSE.Asia Summit 2015
