6. AvatarNode (AN)
Active-Standby Pair
  Coordinated via ZooKeeper
  Failover in a few seconds
  Wrapper over NameNode
Active AvatarNode
  Writes transaction log to NFS filer
Standby AvatarNode
  Reads/consumes transactions from NFS filer
  Processes all messages from DataNodes
  Latest metadata in memory
[Diagram: Client retrieves block location from Primary or Standby. Active AvatarNode (NameNode) writes transactions to the NFS filer; Standby AvatarNode (NameNode) reads transactions from it. DataNodes send block location messages to both AvatarNodes.]
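The write-then-consume relationship between the two AvatarNodes can be sketched roughly as below. This is a toy model, not AvatarNode code: SharedEditLog, AvatarNode.write and AvatarNode.tail are hypothetical names, and an in-memory list stands in for the transaction log on shared NFS storage.

```python
class SharedEditLog:
    """Hypothetical stand-in for the transaction log on shared NFS storage."""

    def __init__(self):
        self.entries = []

    def append(self, txn):
        self.entries.append(txn)


class AvatarNode:
    """Toy node that can play either the active or the standby role."""

    def __init__(self, log):
        self.log = log
        self.namespace = set()   # in-memory metadata: the set of known paths
        self.applied = 0         # number of log entries already applied

    def _apply(self, txn):
        op, path = txn
        if op == "create":
            self.namespace.add(path)
        elif op == "delete":
            self.namespace.discard(path)

    def write(self, op, path):
        """Active role: append to the shared log, then apply in memory."""
        txn = (op, path)
        self.log.append(txn)
        self._apply(txn)
        self.applied = len(self.log.entries)

    def tail(self):
        """Standby role: consume every transaction not yet applied."""
        new = self.log.entries[self.applied:]
        for txn in new:
            self._apply(txn)
        self.applied += len(new)
        return len(new)
```

In this model, a standby that calls tail() regularly holds the same namespace as the active node, which is what keeps the final catch-up during failover short.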
7. Four steps to failover
Wipe ZooKeeper entry. Clients will know the failover is in
progress. (0 seconds)
Stop the primary NameNode. Last bits of data will be
flushed to Transaction Log and it will die. (Seconds)
Switch Standby to Primary. It will consume the rest of the
Transaction log and get out of SafeMode ready to serve
traffic. (Seconds)
Update the entry in ZooKeeper. All the clients waiting for
failover will pick up the new connection. (0 seconds)
After: Start the first node in the Standby Mode (Takes a
while, but the cluster is up and running)
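The four steps above can be sketched as a small simulation. Everything here is illustrative: a plain dict stands in for ZooKeeper, a list stands in for the shared transaction log, and Node, resolve and failover are hypothetical names rather than AvatarNode APIs.

```python
class Node:
    """Toy AvatarNode with just enough state to show the failover steps."""

    def __init__(self, name, role):
        self.name = name
        self.role = role                    # "primary", "standby" or "stopped"
        self.pending = []                   # transactions not yet flushed to the log
        self.applied = 0                    # log entries applied so far
        self.safe_mode = role != "primary"


def resolve(zk):
    """Client-side lookup: None means a failover is in progress."""
    return zk.get("active")


def failover(zk, primary, standby, log):
    # Step 1: wipe the ZooKeeper entry so clients know failover has started.
    zk.pop("active", None)

    # Step 2: stop the primary; its last transactions are flushed to the log.
    log.extend(primary.pending)
    primary.pending = []
    primary.role = "stopped"

    # Step 3: the standby consumes the rest of the log and leaves SafeMode.
    standby.applied = len(log)
    standby.safe_mode = False
    standby.role = "primary"

    # Step 4: publish the new primary; waiting clients pick it up.
    zk["active"] = standby.name
    return standby
```

A client in this model would poll resolve(zk) and retry while it returns None, which mirrors clients waiting for the ZooKeeper entry to reappear.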
9. Conclusions
Complete Hot Standby
  NFS for storage of fsimage and editlogs. (no data loss)
  Standby node consumes transactions from editlogs on NFS continuously. (namespace hot standby)
  DataNodes send messages to both the primary and standby nodes. (block reports hot standby)
Fast Switchover
  Less than a minute
Make sense!
11. BackupNode (BN)
NN synchronously streams transaction log to BackupNode
BackupNode applies log to in-memory and disk image
BN always commits to disk before reporting success to NN
If BN restarts, it has to catch up with NN
Available in HDFS 0.20.1 release
[Diagram: Client retrieves block location from NN. NN (NameNode) synchronously streams transaction logs to BN (BackupNode). DataNodes send block location messages to NN.]
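The synchronous streaming and catch-up behaviour can be illustrated with a toy model. The class and method names are hypothetical, lists stand in for the disk and in-memory images, and the choice to let the NN keep accepting transactions while the BN is down is an assumption made for the sketch, not something the slide states.

```python
class BackupNode:
    """Toy BN: commits each transaction to 'disk' before acknowledging."""

    def __init__(self):
        self.disk = []      # stand-in for the on-disk image
        self.memory = []    # stand-in for the in-memory image
        self.up = True

    def receive(self, txn):
        if not self.up:
            raise ConnectionError("BackupNode is down")
        self.disk.append(txn)      # commit to disk first
        self.memory.append(txn)    # then apply in memory
        return True                # success reported to the NN

    def catch_up(self, nn_log):
        """After a restart, replay whatever the NN logged in the meantime."""
        missing = nn_log[len(self.disk):]
        self.disk.extend(missing)
        self.memory.extend(missing)
        self.up = True
        return len(missing)


class NameNode:
    """Toy NN that streams every transaction to the BN synchronously."""

    def __init__(self, backup):
        self.log = []
        self.backup = backup

    def commit(self, txn):
        self.log.append(txn)
        try:
            self.backup.receive(txn)   # blocks until the BN has committed
            return "synced"
        except ConnectionError:
            # Assumption: the NN carries on alone and the BN catches up later.
            return "backup-missing"
```

Because receive() returns only after the disk commit, a "synced" result in this model means the transaction exists on both machines.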
12. Limitations of BackupNode (BN)
Maximum of one BackupNode per NN
Support only two-machine failure
NN doesn’t forward block reports to BackupNode
Time to restart from 12 GB image, 70M files + 100M blocks
  3-5 minutes to read the image from the disk
  20 min to process block reports
BN will still take 25+ minutes to failover!
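As a quick check of the timing figures above, the two phases can simply be added. This is a minimal sketch that assumes the phases run one after the other; restart_minutes is a hypothetical helper, and the inputs are the numbers quoted on the slide.

```python
def restart_minutes(image_load_min, block_report_min):
    """Total restart time if the image load and block reports run back to back."""
    return image_load_min + block_report_min


# 3-5 minutes to read the image, 20 minutes to process block reports.
low = restart_minutes(3, 20)
high = restart_minutes(5, 20)
print(f"Estimated restart time: {low} to {high} minutes")
```

The block-report phase dominates the total, which is why the lack of block-report forwarding to the BackupNode matters so much for failover time.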
14. Other HA solutions
DRBD + Linux HA
  http://www.cloudera.com/blog/2009/07/hadoop-ha-configuration/
Metadata backup
  http://wiki.apache.org/hadoop/NameNodeFailover