Compute node HA - current upstream development

Short presentation made for the OpenStack London "Tokyo Aftermath" meetup, covering current upstream activity in the OpenStack HA developer community around high availability for compute nodes.

  1. Compute node HA a.k.a. “can pets survive in OpenStack?”
     Adam Spiers, Senior Software Engineer, Cloud & High Availability
     OpenStack London Meetup, Wednesday 18th November: short update on upstream development
  2. High Availability in a typical OpenStack cloud today
  3. Typical HA control plane in OpenStack
     [Diagram: a Pacemaker cluster of control nodes (DRBD, PostgreSQL, RabbitMQ, Keystone, Glance, Nova, Dashboard, Cinder, Neutron); a two-node database cluster on DRBD or shared storage running the database and message queue; a three-node services cluster running Orchestration, Keystone, Glance, Nova, Dashboard, Cinder, Telemetry and Neutron]
     • Maximises cloud uptime
     • Automatic restart of OpenStack controller services
     • Active/Active API services with load balancing
     • DB + MQ either Active/Active or Active/Passive
  4. Under the covers
     [Diagram: services cluster of Node 1, Node 2 and Node 3, each running Corosync, Pacemaker and HAProxy]
     • Recommended by official HA guide
     • HAProxy distributes service requests
     • Pacemaker: monitoring and control of nodes and services
     • Corosync: cluster membership / messaging / quorum / leadership election
     But what I really want to do is keep my workloads up!
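
The quorum role listed for Corosync can be illustrated with simple majority arithmetic for the three-node services cluster. This is a minimal sketch, assuming one vote per node and no special quorum options (such as a two-node mode); the function names are hypothetical and not part of any Corosync API.

```python
def majority_quorum(total_votes: int) -> int:
    """Smallest number of votes that forms a strict majority."""
    if total_votes < 1:
        raise ValueError("a cluster needs at least one vote")
    return total_votes // 2 + 1


def has_quorum(total_votes: int, votes_present: int) -> bool:
    """True if the surviving partition holds a strict majority of votes."""
    return votes_present >= majority_quorum(total_votes)


if __name__ == "__main__":
    # Three-node cluster as on the slide: check partitions of size 2 and 1.
    for present in (2, 1):
        print(present, "of 3 nodes present, quorate:", has_quorum(3, present))
```

Under these assumptions a three-node cluster tolerates the loss of one node, which is why control-plane clusters are usually built with an odd number of members.
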
  5. HA only on control plane
     [Diagram: an HA cluster control node (OS, message queue, database, identity, images, block storage, networking, dashboard, compute) alongside three compute nodes outside the cluster, each running OS, nova-compute and libvirt]
  6. Can we simply extend the cluster?
     [Diagram: an HA cluster control node (OS, message queue, database, identity, images, block storage, networking, dashboard, compute) alongside three compute nodes, each running OS, nova-compute and libvirt]
  8. Scaling up
     • Corosync requires <= 32 nodes
     • But we want lots of compute nodes!
     • The obvious workarounds are ugly
       ‒ Multiple compute clusters: introduces unwanted artificial boundaries
       ‒ Clusters inside / between guest VM instances: requires cloud users to modify guest images (installing & configuring cluster software)
         ‒ cluster stacks are not OS-agnostic
         ‒ cloud is supposed to make things easier not harder!
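
To see why the "multiple compute clusters" workaround introduces so many boundaries, it helps to count the clusters involved. This is a rough sketch, assuming the 32-node figure from the slide applies per cluster and that every cluster is filled completely; the function name and the node counts are illustrative only.

```python
import math

COROSYNC_NODE_LIMIT = 32  # per-cluster figure quoted on the slide


def clusters_needed(compute_nodes: int, limit: int = COROSYNC_NODE_LIMIT) -> int:
    """Minimum number of separate clusters needed to hold all compute nodes."""
    if compute_nodes < 0 or limit < 1:
        raise ValueError("node count must be >= 0 and limit must be >= 1")
    return math.ceil(compute_nodes / limit)


if __name__ == "__main__":
    for nodes in (32, 33, 1000):
        print(nodes, "compute nodes ->", clusters_needed(nodes), "cluster(s)")
```

Each extra cluster is a boundary that workloads cannot fail over across, so the count grows into an operational burden as the cloud scales.
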
  9. pacemaker_remote to the rescue!
     • New(-ish) Pacemaker feature
     • Allows arbitrary scalability of an existing Pacemaker cluster
  10. Extending the cluster to compute nodes
      [Diagram: services cluster of Node 1, Node 2 and Node 3 (Corosync, Pacemaker, HAProxy) extended to four compute nodes, each running pacemaker_remote]
  11. Capabilities
      • Increases availability of compute nodes
        ‒ Detects failed compute services
        ‒ Automatic recovery of compute services where possible
      • “Quarantines” failing compute nodes
        ‒ STONITH (fencing) extends to remote nodes
      • Coordinates with control plane
        ‒ VMs on dead compute nodes are resurrected elsewhere
        ‒ In nova, this is described as “evacuation”
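
The capabilities above can be read as a workflow: detect a failure, try to recover in place, otherwise fence the node and resurrect its VMs elsewhere. This is a minimal in-memory simulation under stated assumptions: none of the names correspond to Pacemaker or nova APIs, the `restartable` flag is a hypothetical stand-in for "recovery where possible", and fencing is modelled as setting a flag. Fencing is done before resurrection so that the same VM can never run in two places.

```python
from dataclasses import dataclass, field


@dataclass
class ComputeNode:
    name: str
    service_ok: bool = True    # is the compute service responding?
    restartable: bool = True   # hypothetical: can it be recovered in place?
    fenced: bool = False
    vms: list = field(default_factory=list)


def recover(node: ComputeNode, spare: ComputeNode, log: list) -> None:
    """Detect, then restart in place or fence and resurrect elsewhere."""
    if node.service_ok:
        return  # nothing to do
    log.append(f"detected failure on {node.name}")
    if node.restartable:
        node.service_ok = True
        log.append(f"restarted compute service on {node.name}")
        return
    # Quarantine first, so the dead node cannot keep running its VMs.
    node.fenced = True
    log.append(f"fenced {node.name}")
    # Only now is it safe to bring the VMs back up on another node.
    moved = list(node.vms)
    spare.vms.extend(moved)
    node.vms.clear()
    log.append(f"resurrected {len(moved)} VM(s) from {node.name} on {spare.name}")


if __name__ == "__main__":
    events = []
    dead = ComputeNode("compute-1", service_ok=False, restartable=False,
                       vms=["vm-a", "vm-b"])
    recover(dead, ComputeNode("compute-2"), events)
    print("\n".join(events))
```
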
  12. Public Health Warning: nova evacuate does not really mean evacuation!
  13. Think about earthquakes
      [Two images, captioned: Not too late to evacuate; Too late to evacuate]
  14. nova terminology: nova live-migration vs. nova evacuate
  15. Public Health Warning
      • nova evacuate does not do evacuation
      • nova evacuate does resurrection
      • In Vancouver, nova developers considered a rename
        ‒ Hasn't happened yet
        ‒ Due to impact, seems unlikely to happen any time soon
        ‒ Whenever you see “evacuate” in a nova-related context, pretend you saw “resurrect”
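
The terminology trap can be made concrete as a choice between the two nova subcommands named in the slides. This is a hedged sketch, assuming the `nova` command-line client with its `live-migration` and `evacuate` subcommands each taking a server name; the helper function is hypothetical and only builds an argument list without running anything.

```python
def nova_command(server: str, source_host_alive: bool) -> list:
    """Pick the nova subcommand that matches the state of the source host.

    live-migration moves a running VM off a host that is still up
    ("not too late to evacuate"); evacuate rebuilds the VM elsewhere
    after the host has already died (really a resurrection).
    """
    subcommand = "live-migration" if source_host_alive else "evacuate"
    return ["nova", subcommand, server]


if __name__ == "__main__":
    print(" ".join(nova_command("my-pet-vm", source_host_alive=True)))
    print(" ".join(nova_command("my-pet-vm", source_host_alive=False)))
```
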
  16. Existing solutions
      • NovaCompute / NovaEvacuate custom OCF RAs
        ‒ used by Red Hat / SUSE / Intel
        ‒ works with known limitations
      • EvacuationD
        ‒ PoC to address above limitations
        ‒ decouples resurrection workflow from Pacemaker
      • Masakari (NTT)
        ‒ similar architecture, different code
        ‒ monitoring at 3 layers (node, process, hypervisor)
      • Approach of AWcloud / ChinaMobile
        ‒ very different; uses consul / raft / gossip
  17. Proposed solutions
      • Use Mistral to orchestrate resurrection workflow
      • Intel currently working on prototype
      • Possibly the most promising approach
        ‒ Mistral considered pretty solid
        ‒ This is exactly the kind of thing it was designed for
      • However, Mistral is currently a SPoF … oops
        ‒ Don't worry, should be fixed in the Mitaka cycle
      • Feasibility of convergence with Masakari will probably be analysed within the next week or two
  18. Community developments
      • openstack-resource-agents project now on StackForge
        ‒ maintained by me
      • New #openstack-ha IRC channel on FreeNode
        ‒ automatic notifications for activity on HA repositories
      • New topic category on openstack-dev@ mailing list
        Subject: [HA] i can haz pets in my cloud?
      • Weekly IRC meetings on Mondays at 9am UTC
      • HA guide currently undergoing a revamp
      • Everyone welcome to get involved!
