HA Cluster with openSUSE Leap


  1. High Availability Cluster with openSUSE Leap, by M. Edwin Zakaria
  2. Mohammad Edwin Zakaria • Linux user since 1998 • openSUSE user since 6.2, around 1999 • openSUSE member • openSUSE Indonesia
  3.–7. (image-only slides, no text content)
  8. What is a cluster? What is high availability?
  9. Curious? • A computer cluster consists of a set of loosely or tightly connected computers that work together so that, in many respects, they can be viewed as a single system. Unlike grid computers, computer clusters have each node set to perform the same task, controlled and scheduled by software.
  10. Curious? • High availability (HA) is a system design approach that avoids loss of service by reducing or managing failures and minimizing planned downtime. We expect a service to be highly available when life, health, or well-being, including the economic well-being of a company, depends on it.
  11. Curious? • The Harvard Research Group divides HA into several Availability Environment Classifications (AEC): AE4, AE3, AE2, AE1, AE0 • ions.pdf • Other categories: continuous availability, fault tolerance, disaster tolerance
  12. Once again, what is a cluster? • High performance computing • Load balancing (high capacity) • High availability ‒ 99.99% uptime ‒ MTBF (mean time between failures = total operating time / total number of failures) ‒ Eliminating single points of failure
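The MTBF figure above converts directly into an availability ratio once you also know the mean time to repair (MTTR): availability = MTBF / (MTBF + MTTR). A minimal sketch; the hour values are illustrative assumptions, not numbers from the slides:

```shell
# availability = MTBF / (MTBF + MTTR)
mtbf_hours=8760   # assumed: one failure per year of operation
mttr_hours=1      # assumed: one hour to detect the failure and fail over
availability=$(awk -v up="$mtbf_hours" -v down="$mttr_hours" \
    'BEGIN { printf "%.4f", up / (up + down) }')
echo "availability: $availability"
```

With these assumed numbers the result is roughly "four nines", the 99.99% class mentioned on the slide.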
  13. Once again, what is a cluster?
  14. Challenges in HA • Murphy’s Law: “If anything can go wrong, it will” ‒ Loss of data ‒ Service outage • Flood, fire, earthquake, other natural disasters, hardware damage ‒ Can you afford downtime? ‒ Can you afford a low-availability system? ‒ What is the cost of downtime?
  15. Differences between HA offerings • The term HA is widely used • VMware vSphere HA ‒ Closed source ‒ Works at the hypervisor and host-hardware level • openSUSE/SUSE HA ‒ Open source ‒ Works at the OS level ‒ Protects critical resources running in VMs ‒ HA within the Linux OS
  16. HA in Linux • Started with the Heartbeat project in the late 1990s • Now managed by ClusterLabs • The ClusterLabs stack, incorporating Corosync and Pacemaker, defines an open-source high-availability cluster offering suitable for both small and large deployments • Pacemaker has been around since 2004 and is primarily a collaborative effort between Red Hat and SUSE; they also receive considerable help and support from the folks at LINBIT and the community in general
  17. Hardware considerations • External network: high traffic; use fiber optic (FO) or Ethernet bonding • Communication network between cluster nodes: used for messaging, membership, and STONITH • Storage network: use FO or Ethernet bonding • Managed switch • STONITH/fencing device • Shared storage: NAS (NFS/CIFS), SAN (FC/iSCSI)
  18.–19. Hardware considerations (diagram slides)
  20. Software components • Corosync ‒ messaging and membership • Pacemaker ‒ cluster resource management • Resource agents ‒ manage and monitor the availability of services • Fencing device ‒ STONITH, to ensure data integrity • User interfaces ‒ crmsh and Hawk
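As a taste of the crmsh user interface named above, a resource is declared as a "primitive" backed by a resource agent. A hypothetical floating virtual IP managed by the stock IPaddr2 agent might look like this (the resource name and address are made-up examples):

```
# crm configure fragment; vip-web and 192.168.100.100 are assumptions
primitive vip-web ocf:heartbeat:IPaddr2 \
    params ip=192.168.100.100 cidr_netmask=24 \
    op monitor interval=10s timeout=20s
```

The `op monitor` line is what lets Pacemaker detect a failed resource and recover it on another node.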
  21. Other components • LVS (Linux Virtual Server) • HAProxy • Shared file systems: OCFS2, GFS2 • Block-device replication: DRBD • Shared storage: SAN • Geo clusters
  22. More details • Pacemaker: a cluster resource manager. It achieves maximum availability for your cluster resources by detecting and recovering from node- and resource-level failures, making use of the messaging and membership capabilities provided by your preferred cluster infrastructure (either Corosync or Heartbeat).
  23. More details • Corosync: ‒ provides cluster infrastructure functionality ‒ provides messaging and membership functionality ‒ maintains the quorum information ‒ these features are used by Pacemaker to provide the high-availability solution
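The messaging, membership, and quorum roles described above are all configured in `/etc/corosync/corosync.conf`. A hypothetical two-node fragment; the cluster name, network address, and node IPs are assumptions for illustration:

```
totem {
    version: 2
    cluster_name: hacluster
    transport: udpu              # unicast UDP, common in two-node labs
    interface {
        ringnumber: 0
        bindnetaddr: 192.168.100.0
        mcastport: 5405
    }
}
nodelist {
    node {
        ring0_addr: 192.168.100.11
        nodeid: 1
    }
    node {
        ring0_addr: 192.168.100.12
        nodeid: 2
    }
}
quorum {
    provider: corosync_votequorum
    two_node: 1                  # two-node mode: one surviving node keeps quorum
}
```

The `quorum` section is where the votequorum provider that feeds Pacemaker its quorum information is enabled.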
  24. In short ... • Corosync: a quorum system that notifies applications when quorum is achieved or lost • Pacemaker: ‒ starts/stops resources on a node according to the score ‒ monitors resources at the configured interval ‒ restarts resources if a monitor fails ‒ fences/STONITHs a node if a stop operation fails
  25. Pacemaker/Corosync conceptual overview (diagram slide)
  26. Pacemaker components • Non-cluster-aware components (illustrated in green): the resources themselves and the scripts that start, stop, and monitor them • Cluster resource manager: the brain that processes and reacts to events regarding the cluster • Low-level infrastructure: Corosync provides reliable messaging, membership, and quorum information about the cluster
  27. The Pacemaker stack • The Pacemaker + Corosync cluster is called the Pacemaker stack • The Linux kernel ships with DLM (distributed lock manager) by default; it provides the locking used by cluster-aware filesystems • GFS2 (Global File System 2) and OCFS2 (Oracle Cluster File System 2) are cluster-aware filesystems • To access a single filesystem from multiple hosts you need either GFS2 or OCFS2 • Alternatively, you can create a filesystem on top of cLVM (clustered logical volume manager)
  28. (image-only slide)
  29. Cluster filesystems • If you have a shared disk that several nodes must access, you need a cluster-aware filesystem • The open-source solutions are GFS2 (Global File System 2) and OCFS2 (Oracle Cluster File System 2)
  30. Cluster block devices • DRBD (Distributed Replicated Block Device) lets you mirror two block devices located at two different sites across an IP network. Used with Corosync, DRBD supports distributed high-availability Linux clusters. It is network-based RAID 1 with high-performance data replication over the network • cLVM2 • Clustered MD RAID 1, see er.txt
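A DRBD mirror like the one described above is defined in a resource file under `/etc/drbd.d/`. A hypothetical two-node resource; the hostnames, backing disk, and addresses are assumptions:

```
resource r0 {
    protocol C;              # fully synchronous replication
    device    /dev/drbd0;
    disk      /dev/vdb1;     # assumed backing partition on each VM
    meta-disk internal;
    on node1 {
        address 192.168.100.11:7789;
    }
    on node2 {
        address 192.168.100.12:7789;
    }
}
```

Protocol C (synchronous) is the usual choice for HA clusters, since a write is only acknowledged once it has reached both nodes.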
  31. Cluster block devices (diagram slide)
  32. STONITH • STONITH is an acronym for “Shoot-The-Other-Node-In-The-Head” • It protects your data from being corrupted by rogue nodes or concurrent access • Just because a node is unresponsive doesn’t mean it isn’t accessing your data. The only way to be 100% sure that your data is safe is to use STONITH, so we can be certain the node is truly offline before allowing the data to be accessed from another node • STONITH also plays a role when a clustered service cannot be stopped: in that case the cluster uses STONITH to force the whole node offline, making it safe to start the service elsewhere
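For the QEMU/KVM lab used later in this deck, fencing can be done through the hypervisor. A hedged crmsh sketch using the external/libvirt STONITH plugin; the node names and hypervisor URI are assumptions:

```
# crm configure fragment; node1/node2 and the URI are made-up examples
primitive stonith-libvirt stonith:external/libvirt \
    params hostlist="node1 node2" \
           hypervisor_uri="qemu+ssh://kvmhost/system" \
    op monitor interval=60m
property stonith-enabled=true
```

Here "fencing" a misbehaving VM means the plugin asks the KVM host to power it off, which gives the cluster the certainty described above.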
  33. Split brain: the HA problem • Two nodes run the same service and break data integrity • Solutions: ‒ Quorum: if the cluster doesn’t have quorum, no action is taken; fencing and resource management are disabled without quorum ‒ STONITH: shoot the other node in the head • More on STONITH
  34. References • SUSE HA Extension documentation (also applicable to openSUSE) • ClusterLabs HA documentation • Corosync documentation -85-100.pdf • DRBD • OCFS2 • cLVM2 • Linux SCSI
  35. Case Study / Hands-on
  36. Setting up HA on Leap • Scenario: ‒ set up openSUSE Leap 42.1 as the host ‒ create 2 VMs with QEMU/KVM, install openSUSE Leap 42.1, configure the network and all required packages ‒ configure Pacemaker, Corosync, and DRBD ‒ set up an HA web server
  37. Preparation • Install openSUSE Leap 42.1 • Configure all repositories • Install all the required software • Create at least 2 virtual machines with QEMU/KVM • Configure the cluster • Create the DRBD device • Activate the web server (nginx or Apache) • Test the status
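Once the preparation steps above are done, the DRBD device, its filesystem, and the web server are tied together as cluster resources. A hypothetical crmsh sketch of that final wiring; the resource names, device, mount point, and choice of nginx are assumptions:

```
# crm configure fragment: promote DRBD on one node, mount its
# filesystem there, then start nginx on top of it
primitive drbd-r0 ocf:linbit:drbd \
    params drbd_resource=r0 \
    op monitor interval=15s role=Master \
    op monitor interval=30s role=Slave
ms ms-drbd-r0 drbd-r0 \
    meta master-max=1 clone-max=2 notify=true
primitive fs-web ocf:heartbeat:Filesystem \
    params device=/dev/drbd0 directory=/srv/www fstype=ext4
primitive web-server ocf:heartbeat:nginx \
    op monitor interval=10s
group g-web fs-web web-server
colocation col-web-with-drbd inf: g-web ms-drbd-r0:Master
order ord-drbd-first Mandatory: ms-drbd-r0:promote g-web:start
```

The colocation and order constraints are what make failover coherent: the filesystem and web server only ever run where DRBD is primary, and only after it has been promoted.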
  38. Thank you. Join the conversation, contribute, and have a lot of fun!
  39. Have a Lot of Fun, and Join Us At: