IBM AIX PowerHA online course taught by a real-time instructor. PowerHA, formerly called HACMP, is designed by IBM Corporation to keep resources highly available with minimal downtime. It manages disk, network, and application resources in a logical manner.
Course curriculum:
Module 1: Introduction to HACMP for AIX
Module 2: Networking Considerations for High Availability
Module 3: Shared storage considerations for high availability
Module 4: Planning for applications and resource groups
Module 5: HACMP Installation
Module 6: Initial Cluster Configuration
Module 7: Basic HACMP administration
Module 8: Events
Module 9: Integrating NFS into HACMP
2. www.kerneltraining.com
Unit objectives
After completing this unit, you should be able to:
Define High Availability and explain why it is needed
List the key considerations when designing and implementing a high availability cluster
Outline the features and benefits of HACMP for AIX
Describe the components of an HACMP for AIX cluster
Explain how HACMP for AIX operates in typical cases
HACMP
High Availability and HACMP concepts
After completing this topic, you should be able to:
Define High Availability
Recognize that eliminating single points of failure (SPOFs) is part of the HACMP implementation process
Outline the features and benefits of HACMP for AIX
Describe the HACMP concepts of topology and resources
Give examples of topology components and resources
Provide a brief description of the software and hardware components of a typical HACMP cluster
So, what is High Availability?
High Availability characteristics:
The masking or elimination of both planned and unplanned downtime
The elimination of single points of failure (SPOFs)
Fault resilience and system hardening
No specialized hardware requirement
[Figure: a client connects over a WAN to a production node/LPAR; the workload falls over to a standby node/LPAR.]
Eliminating single points of failure
Cluster object      Eliminated as a single point of failure by
Node                Using multiple nodes
Power source        Using multiple circuits or uninterruptible power supplies
Network adapter     Using redundant network adapters
Network             Using multiple networks to connect nodes
TCP/IP subsystem    Using non-IP networks to connect adjoining nodes and clients
Disk adapter        Using redundant disk adapter or multipath hardware
Disk                Using multiple disks with mirroring or RAID
Application         Adding node for takeover; configuring application monitor
VIO Server          Implementing dual VIO Servers
Site                Adding an additional site
The fundamental goal of (successful) cluster design is the elimination of single points of failure (SPOFs).
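The table above lends itself to a simple mechanical check. The following is a minimal sketch, assuming a hypothetical inventory that counts how many instances of each cluster object exist. The function name, the inventory format, and the example values are illustrative only and are not part of HACMP.

```python
def find_spofs(inventory):
    """Return the cluster objects that have fewer than two instances.

    `inventory` is a hypothetical mapping of cluster object name to the
    number of redundant instances available; it is not an HACMP format.
    """
    return sorted(name for name, count in inventory.items() if count < 2)


# Illustrative inventory for a small two-node cluster (assumed values).
example_inventory = {
    "node": 2,
    "power source": 2,
    "network adapter": 2,
    "network": 1,       # only one network: a single point of failure
    "disk adapter": 2,
    "disk": 2,
    "site": 1,          # single site: a SPOF unless site failure is out of scope
}

print(find_spofs(example_inventory))  # prints ['network', 'site']
```

A count alone does not prove redundancy (two adapters on the same bus may still fail together), so a check like this is only a starting point for the design review.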
High availability clusters (HACMP base)
System p and AIX RAS features include:
Application and Partition Mobility
First Failure Data Capture (FFDC)
Dynamic CPU Deallocation
Flexible Service Processor
Redundant Power and Cooling
Error Checking and Correcting (ECC) Memory
Hot Swap Adapters
Dynamic Kernel
Journaled Filesystem
Redundant Data Paths
Dual Disk Adapters (MPIO)
Data Mirroring and/or Striping
Hot Swap / Hot Spare Storage
Redundant Power/Cooling for Storage Arrays
With High Availability Clustering (HACMP)
Protection against node and OS failure with Redundant nodes
Protection against NIC failure with Redundant Network Adapters
Protection against Network failure with Redundant Networks
Self-healing clusters with Application Monitoring
Protection against Site Failure (distance typically limited by SAN infrastructure, or no distance limitations with HACMP/XD)
What about site failure?
Limited distance (LVM mirroring and SAN): HACMP for AIX
Extended distance: Geographic Clustering Solution (that is, HACMP/XD)
Distance unlimited
Application, disk, and network independent
Automated site failover and reintegration
A single cluster across two sites
Get more details in HACMP System Administration III – AU620
[Figure: data replication between sites in Toronto and Brussels using Metro Mirror/PPRC, GLVM, or GeoRM.]
IBM's HA solution for AIX
HACMP for AIX characteristics:
Stands for High Availability Cluster Multi-processing
Is based on cluster technology (RSCT)
Provides two environments (which can co-exist simultaneously):
Serial (High Availability): the process of ensuring that an application is available for use through the use of serially accessible shared data and duplicated resources
Parallel (Cluster Multiprocessing): concurrent access to shared data
Fundamental HACMP concepts
Topology: Physical “networking centric” components
Resources: Entities that are being made highly available
Resource group: A collection of resources, which HACMP controls as a single unit
A given resource can appear in at most one resource group
Resource group policies:
startup policy: which node the resource group is activated on
fallover policy: determines target when there is a failure
fallback policy: determines fallback behavior
Customization
The process of augmenting HACMP, typically via implementing scripts
Minimum: application start and stop scripts
Optional:
Application monitoring scripts (highly recommended!)
Event customization
Notification, pre- and post-event scripts, recovery scripts, user-defined events, time until warning (config_too_long timeout)
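The rule that a given resource can belong to at most one resource group can be checked during planning. The following is a minimal sketch, assuming a hypothetical dictionary that maps resource group names to the resources they contain. The data layout and the resource names are illustrative and do not reflect how HACMP stores its configuration.

```python
from collections import defaultdict


def resources_in_multiple_groups(resource_groups):
    """Map each resource that appears in more than one group to those groups.

    `resource_groups` is a hypothetical dict: group name -> list of resources.
    """
    owners = defaultdict(list)
    for group, resources in resource_groups.items():
        for resource in resources:
            owners[resource].append(group)
    return {res: groups for res, groups in owners.items() if len(groups) > 1}


# Illustrative plan; "dbvg" is listed in both groups on purpose.
groups = {
    "dbrg": ["dbvg", "/db", "database_ip"],
    "httprg": ["httpvg", "/http", "dbvg"],
}
print(resources_in_multiple_groups(groups))  # prints {'dbvg': ['dbrg', 'httprg']}
```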
A highly available cluster
[Figure: two nodes, each running clstrmgr, attached to shared storage, with fallover between the nodes.]
Fundamental Concepts
A cluster comprises physical components (topology) and logical components (resource groups and resources).
HACMP’s topology components (2 of 2)
[Figure: servers and a PC connected by Ethernet / Etherchannel; non-IP connections between servers (heartbeat on disk, RS232/422); RS/6000 nodes attached through a Fibre Channel SAN to DS8000 and DS4000 storage.]
Node
Any-to-any, including LPARs
Minimum number of physical adapters for redundancy must be considered
Networking
Ethernet
Physical and virtual
Etherchannel
Non-IP
Heartbeat on disk, RS-232, Target-mode SCSI
Shared storage
Physical
SCSI or Fibre Channel
Virtual SCSI
What is HACMP?
An application which:
Controls where resource groups run
Monitors and reacts to events
Provides tools for cluster-wide configuration and synchronization
Relies on other AIX Subsystems (ODM, LVM, RSCT, TCP/IP, SRC, and so on)
[Figure: the Cluster Manager Subsystem (clstrmgrES) contains the topology manager, resource manager, event manager, and SNMP manager; it is shown with RSCT (topsvcs, grpsvcs, RMC subsystems), snmpd, clinfoES, clcomdES, and clstat.]
Additional features of HACMP
HACMP is shipped with utilities to simplify configuration, monitoring, customization, and cluster administration.
[Figure: OLPW, smit via web, Configuration Assistant, CSPOC, DARE, clstrmgrES, SNMP, Verification, Auto tests, Tivoli Integration, Application Monitoring]
Some assembly required
HACMP can be used out of the box; however, some assembly is required.
Minimum:
Application Start/Stop/Monitor scripts
Optional:
Customized pre/post event scripts
Reaction to events
Error notification methods
User-defined events (UDEs)
Cluster state change
HACMP's flexibility allows for complex customization in order to meet availability goals.
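To make the idea of an application monitor script concrete, here is a minimal sketch in Python. It assumes that the application writes its process ID to a PID file and that the monitor signals health through its exit status (0 for healthy, non-zero for failed). The PID file path and the function names are hypothetical, and the exit-status convention should be confirmed against the HACMP documentation for the version in use.

```python
import os


def process_is_running(pid):
    """Return True if a process with this PID exists (POSIX systems)."""
    if pid <= 0:
        return False
    try:
        os.kill(pid, 0)  # signal 0 checks for existence without sending a signal
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def monitor(pid_file):
    """Return 0 if the application looks healthy, 1 otherwise."""
    try:
        with open(pid_file) as handle:
            pid = int(handle.read().strip())
    except (OSError, ValueError):
        return 1  # a missing or unreadable PID file is treated as a failure
    return 0 if process_is_running(pid) else 1


# A real monitor script would finish with something like:
#     sys.exit(monitor("/var/run/myapp.pid"))   # hypothetical path
```

Checking only that the process exists is the weakest useful test; a monitor that also sends a trivial request to the application detects hangs as well as crashes.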
Let’s review
1. Which of the following items are examples of topology components in HACMP? (Select all that apply.)
a. Node
b. Network
c. Service IP label
d. Hard disk drive
2. True or False?
All nodes in an HACMP cluster must have roughly equivalent performance characteristics.
3. Which of the following is a characteristic of high availability?
a. High availability always requires specially designed hardware components.
b. High availability solutions always require manual intervention to ensure recovery following fallover.
c. High availability solutions never require customization.
d. High availability solutions use redundant standard equipment (no specialized hardware).
4. True or False?
Thorough design and detailed planning are required for all high availability solutions.
What does HACMP do?
After completing this topic, you should be able to:
Describe the failures that HACMP detects directly
Provide an overview of the standby and takeover cluster configuration options in HACMP
Describe some of the considerations and limits of an HACMP cluster
Just what does HACMP do?
HACMP functions:
Monitors the states of nodes, networks, network adapters and devices
Strives to keep resource groups highly available
Optionally, monitors the state of the applications, and can be customized to react to every possible failure
What happens when something fails?
How the cluster responds to a failure depends on what has failed, what the resource group's fallover policy is, and whether there are any resource group dependencies:
Typically, another equivalent component takes over the duties of the failed component (for example, another node takes over from a failed node).
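The node takeover case can be sketched as a small selection function. This is a simplified illustration, assuming a policy in which the resource group moves to the next surviving node in a priority list. The function and node names are hypothetical, and it does not model resource group dependencies or any other fallover policy.

```python
def choose_fallover_target(priority_list, failed_node, active_nodes):
    """Pick the highest-priority node that is still active.

    Returns None when no surviving node can host the resource group.
    """
    for node in priority_list:
        if node != failed_node and node in active_nodes:
            return node
    return None


# Node "usa" fails while "uk" and "germany" are still up.
print(choose_fallover_target(["usa", "uk", "germany"], "usa", {"uk", "germany"}))  # prints uk
```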
What happens when a problem is fixed?
How the cluster responds to the recovery of a failed component depends on what has recovered, what the resource group's fallback policy is, and the resource group dependencies:
Typically, administrators need to indicate or confirm that the fixed component is approved for use. Some components are integrated automatically; for instance, when a communication interface recovers.
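The effect of a fallback policy when a repaired node rejoins can be sketched as follows. This is a simplified illustration that assumes only two behaviours: fall back to a higher-priority node, or never fall back. The function name and the boolean flag are hypothetical and do not correspond to HACMP configuration fields.

```python
def node_after_rejoin(priority_list, current_node, rejoined_node, fallback_enabled):
    """Return the node that should host the resource group after a node rejoins."""
    if not fallback_enabled:
        return current_node  # never fall back: stay where the group is running
    # A lower index means a higher priority in this sketch.
    if priority_list.index(rejoined_node) < priority_list.index(current_node):
        return rejoined_node
    return current_node


# The group is running on "uk" when the higher-priority node "usa" returns.
print(node_after_rejoin(["usa", "uk"], "uk", "usa", True))   # prints usa
print(node_after_rejoin(["usa", "uk"], "uk", "usa", False))  # prints uk
```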
Standby (active/passive) with fallback
[Figure: a two-node cluster (USA and UK) running application A, shown through four events: node USA fails, USA returns, node UK fails, and UK returns (no change).]
One node is primary
RG can be configured to come online on the primary or any node
Concurrent: Multiple active nodes
USA, Germany, and UK are all running Application A, each using a separate IP address
If nodes fail, the application remains continuously available as long as there are surviving nodes to run on.
Fixed nodes resume running their copy of the application.
Application must be designed to run simultaneously on multiple nodes.
This has the potential for essentially zero downtime.
Points to ponder
Resource groups:
Must be serviced by at least two nodes
Can have different policies
Can be migrated (manually or automatically) to rebalance loads
Clusters:
Must have at least one IP network and one non-IP network
Need not have any shared storage
Can have any combination of supported nodes *
Can be split across two sites
Might or might not require replicating data (HACMP/XD).
Applications:
Can be restarted via monitoring
Must be manageable via scripts (start/restart and stop)
* Application performance requirements and other operational issues almost certainly impose practical constraints on the size and complexity of a given cluster.
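Two of the rules above (each resource group needs at least two nodes, and the cluster needs at least one IP and one non-IP network) are easy to check against a draft plan. The following is a minimal sketch, assuming a hypothetical plan format in which resource groups map to node lists and networks are tagged as "ip" or "non-ip". It is not an HACMP verification tool.

```python
def check_cluster_plan(resource_groups, networks):
    """Return a list of rule violations for a hypothetical cluster plan.

    resource_groups: dict of group name -> list of node names
    networks: list of (network name, kind) tuples; kind is "ip" or "non-ip"
    """
    problems = []
    for group, nodes in resource_groups.items():
        if len(set(nodes)) < 2:
            problems.append(f"{group}: must be serviced by at least two nodes")
    kinds = {kind for _name, kind in networks}
    if "ip" not in kinds:
        problems.append("cluster: needs at least one IP network")
    if "non-ip" not in kinds:
        problems.append("cluster: needs at least one non-IP network")
    return problems


# Illustrative draft plan with two deliberate mistakes.
plan_groups = {"dbrg": ["nodea", "nodeb"], "httprg": ["nodeb"]}
plan_networks = [("public", "ip")]
for problem in check_cluster_plan(plan_groups, plan_networks):
    print(problem)
```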
Other considerations for HACMP
Design, planning, testing
Focus on service and availability
Apply appropriate risk analysis
Disciplined system administration practices
Documented operational procedures
[Figure: High availability, Continuous operation, Continuous availability; Systems Management, People, Data, Hardware, Software, Environment, Networking]
Things HACMP does not do
Back-up and restoration
Time synchronization
Application specific configuration
System administration tasks unique to each node
When is HACMP not the correct solution?
Zero downtime required
Maybe a fault tolerant system is the correct choice.
Availability 7x24x365; HACMP occasionally needs to be shut down for maintenance.
Life-critical environments.
Security issues
Too little security
Many people can change the environment.
Too much security
C2 and B1 environments might not allow HACMP to function as designed.
Unstable environments
HACMP cannot make an unstable and poorly managed environment stable.
HACMP tends to reduce the availability of poorly managed systems.
What do we plan to achieve this week?
Your mission this week is to build a two-node mutual takeover highly available cluster using two previously separate AIX systems, each of which has an application which needs to be made highly available.
[Figure: two-node cluster with applications A and B]
Overview of the implementation process
Plan and configure AIX
Elimination of single points of failure
Storage (adapters, LVM volume group, filesystem)
Networks (IP interfaces, /etc/hosts, non-IP networks, and devices)
Application start and stop scripts
Install the HACMP filesets (Note: versions 5.3 and earlier require a reboot!)
Configure the HACMP environment
Topology
Cluster, node names, HACMP IP and non-IP networks
Resources and Resource groups:
Identify name, nodes, policies
Resources: Application Server, service label, VG, filesystem
Synchronize, then start HACMP
Note: If using two nodes and one application, “Configure the HACMP environment” can be done in one step.
Hints to get started
• Draw a diagram.
• Use (online) planning sheets.
• Focus on eliminating SPOFs.
• Always factor in a non-IP network.
• Ensure that you have multipath access to shared storage devices.
• Document a test plan.
• Test the cluster carefully.
• Be methodical.
[Figure: planning diagram of the HACMP cluster for the ABC company. Nodes nodea and nodeb serve the user community over a public network and are also connected by a tmssa network and a serial network.]

Node Name = nodea
Resource group = dbrg
Applications = database
Resources = cascading A-B
Priority = 1,2
CWOF = yes
Label = a_tmssa, Device = /dev/tmssa1
Label = a_tty, Device = /dev/tty1
rootvg = RAID 1, 9.1 GB

Node Name = nodeb
Resource group = httprg
Applications = http
Resources = cascading B-A
Priority = 2,1
CWOF = yes
Label = b_tmssa, Device = /dev/tmssa2
Label = a_tty, Device = /dev/tty1
rootvg = RAID 1, 9.1 GB

Resource Group databaserg contains
Volume Group = dbvg (RAID 5, 100 GB)
hdisk3, hdisk4, hdisk5, hdisk6, hdisk7
Major # = 51
JFS Log = dblvlog
Logical Volume = dblv1, dblv2
FS Mount Point = /db, /dbdata

Resource Group httprg contains
Volume Group = httpvg (RAID 1, 9 GB)
hdisk2, hdisk8
Major # = 50
JFS Log = httplvlog
Logical Volume = httplv
FS Mount Point = /http

Node B     IP Label      IP Address       Netmask
Service    webserv       192.168.9.5      255.255.255.0
Boot       nodebboot     192.168.9.6      255.255.255.0
Standby    nodebstand    192.168.254.3    255.255.255.0

Node A     IP Label      IP Address       Netmask
Service    database      192.168.9.3      255.255.255.0
Boot       nodeaboot     192.168.9.4      255.255.255.0
Standby    nodeastand    192.168.254.3    255.255.255.0
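Planning data such as the address tables above can be checked mechanically before any configuration is done. The following is a minimal sketch that looks for IP addresses assigned to more than one label. The labels and addresses are transcribed from the tables above; the function name and the list-of-tuples layout are illustrative assumptions.

```python
from collections import defaultdict

# IP labels and addresses transcribed from the planning tables.
planned_addresses = [
    ("webserv", "192.168.9.5"),
    ("nodebboot", "192.168.9.6"),
    ("nodebstand", "192.168.254.3"),
    ("database", "192.168.9.3"),
    ("nodeaboot", "192.168.9.4"),
    ("nodeastand", "192.168.254.3"),
]


def duplicate_addresses(entries):
    """Return {address: [labels]} for every address used by more than one label."""
    labels_by_address = defaultdict(list)
    for label, address in entries:
        labels_by_address[address].append(label)
    return {addr: labels for addr, labels in labels_by_address.items() if len(labels) > 1}


print(duplicate_addresses(planned_addresses))
# prints {'192.168.254.3': ['nodebstand', 'nodeastand']}
```

Run against the values shown, the check reports that both standby labels share 192.168.254.3, which is worth confirming against the original planning worksheet.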
Sources of HACMP information
HACMP manuals come with the product
cluster.doc.en_US.es.html
cluster.doc.en_US.es.pdf
HACMP documentation is also available online
http://www.ibm.com/servers/eserver/pseries/library/hacmp_docs.html
Release Notes contain important information about the version release
/usr/es/sbin/cluster/release_notes
Sales manual: http://www.ibm.com/common/ssi
IBM courses:
HACMP Admin. I: Planning and Implementation (AU540/AU54)
HACMP Admin II: Admin. and Problem Determination (AU610/AU61)
HACMP Administration III: Virtualization and Disaster Recovery (AU620/AU62)
HACMP V5 Internals (AU60)
IBM Web site:
http://www-03.ibm.com/systems/p/ha/
Non-IBM sources (not endorsed by IBM but probably worth a look):
http://lpar.co.uk
http://portal.explico.de/
http://www.matilda.com/hacmp/
http://groups.yahoo.com/group/hacmp/
Checkpoint
1. True or False?
Resource Groups can be moved from node to node.
2. True or False?
HACMP/XD is a complete solution for building geographically distributed clusters.
3. Which of the following capabilities does HACMP not provide? (Select all that apply.)
a. Time synchronization
b. Automatic recovery from node and network adapter failure
c. System Administration tasks unique to each node; back-up and restoration
d. Fallover of just a single resource group
4. True or False?
All nodes in a resource group must have equivalent performance characteristics.
Unit summary
Having completed this unit, you should be able to:
Define high availability and explain why it is needed
Outline the various options for implementing high availability
List the key considerations when designing and implementing a high availability cluster
Outline the features and benefits of HACMP for AIX
Describe the components of an HACMP for AIX cluster
Explain how HACMP for AIX operates in typical cases