HMS: Scalable Configuration Management System for Hadoop
- 1. HMS: Scalable Configuration Management System for Hadoop
Kan Zhang (IBM)
Eric Yang (IBM)
June 5, 2012 © 2011 IBM Corporation
- 2. Motivation
■ Goal: managing Hadoop stack in a data center
– Multiple clusters, 10,000+ nodes, inter-cluster operations
■ Scalability
– Traditional client/server architecture is cumbersome to scale
– Need to keep server states in sync, load balance client requests
– Fault tolerance adds further complexity
■ Real-time interaction and feedback
– No polling or pushing; components interact via asynchronous notifications
– Visibility into cluster state is a major pain point for sys admins
■ Cross-node ordering dependency
– Example: start JobTracker after NameNode is running
– Simple to specify and efficient to implement
- 3. HMS Approach
[Figure: architecture diagram showing Controller, Agent, and ZooKeeper storage]
■ ZooKeeper plays a central role
– Fault-tolerant and scalable storage
– Asynchronous messaging service
- 4. ZooKeeper
[Figure: znode tree rooted at / with children /dir1 and /dir2, annotated with Watch, Ephemeral, and Queue]
• A hierarchical namespace of znodes for storing data
• Sequential znodes for message queuing
• Watches for asynchronous notification
• Ephemeral znodes for failure detection
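The four primitives above can be illustrated with a toy in-memory model. This is a sketch for intuition only: `ToyZk` and its methods are hypothetical names, not the ZooKeeper client API, and the model ignores persistence, replication, and ACLs. The 10-digit zero-padded sequence suffix and the one-shot watches mirror standard ZooKeeper behavior.

```python
from typing import Callable, Dict, List, Optional


class ToyZk:
    """Toy in-memory stand-in for ZooKeeper (hypothetical, not the real client API)."""

    def __init__(self) -> None:
        self.data: Dict[str, bytes] = {}   # znode path -> payload
        self.owner: Dict[str, str] = {}    # ephemeral znode path -> owning session
        self.seq: Dict[str, int] = {}      # parent path -> next sequence number
        self.watches: Dict[str, List[Callable[[str], None]]] = {}

    @staticmethod
    def _parent(path: str) -> str:
        return path.rsplit("/", 1)[0]

    def create(self, path: str, value: bytes = b"",
               session: Optional[str] = None, sequential: bool = False) -> str:
        if sequential:
            # Sequential znodes get a per-parent, monotonically increasing suffix.
            parent = self._parent(path)
            n = self.seq.get(parent, 0)
            self.seq[parent] = n + 1
            path = f"{path}{n:010d}"
        self.data[path] = value
        if session is not None:
            self.owner[path] = session     # ephemeral: tied to the session
        self._fire(self._parent(path))     # notify watchers of the parent
        return path

    def watch(self, path: str, callback: Callable[[str], None]) -> None:
        self.watches.setdefault(path, []).append(callback)

    def _fire(self, path: str) -> None:
        # Watches are one-shot: they are removed before being invoked.
        for callback in self.watches.pop(path, []):
            callback(path)

    def children(self, parent: str) -> List[str]:
        prefix = parent.rstrip("/") + "/"
        return sorted(p for p in self.data
                      if p.startswith(prefix) and "/" not in p[len(prefix):])

    def close_session(self, session: str) -> None:
        # Ephemeral znodes vanish with their session, which signals a failure.
        for path in [p for p, s in self.owner.items() if s == session]:
            del self.owner[path]
            del self.data[path]
            self._fire(self._parent(path))


zk = ToyZk()
events: List[str] = []
zk.watch("/cmdqueue", events.append)
first = zk.create("/cmdqueue/cmd-", b"start", sequential=True)
second = zk.create("/cmdqueue/cmd-", b"stop", sequential=True)
zk.create("/live/controller1", session="s1")
zk.close_session("s1")
```

Only the first `create` triggers the watch on `/cmdqueue`; a consumer that wants further notifications has to register the watch again.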
- 5. Leveraging ZooKeeper
■ Storing cluster state
– Each cluster node is represented by a znode in ZooKeeper
– Node state is stored in its corresponding znode
■ Storing system state
– Any state needed for failure recovery is persisted in ZooKeeper
– Server failures are detected via ephemeral nodes
■ Distributed orchestration
– Messages are exchanged asynchronously via ZooKeeper queues
– Notifications are triggered by watches on queues
■ Cross-node dependency
– Node states are stored in ZooKeeper and accessible to all
– Watches are registered on state znodes to get notified of state change
- 6. HMS Overview
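[Figure: HMS overview. A Client and two Controllers (Controller1, Controller2) interact through ZooKeeper. The /hms namespace holds cmd-queue (cmd1, cmdStatus), status-queue, clusters (cluster1, cluster2), and live-controllers (Controller1, Controller2). Under a cluster, the NameNode and JobTracker nodes each have an action-queue (action1) and a worklog; an agent watches each action-queue.]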
- 7. Design Implications
■ All cluster state is stored in ZooKeeper
– Built-in fault-tolerance and HA
– Cluster and command status at your fingertips
■ Controllers and agents don’t interact directly
– All communications are via ZooKeeper async notifications
– Good for scalability and fault-isolation
■ Controllers and agents are stateless
– Controllers can be replicated for load balancing
– Controller failures are automatically detected and handled
■ Dependency specified in terms of node states
– Actions come and go, but their effects are captured in node states
– Node state changes will trigger dependent actions to be re-evaluated
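The last point, that dependencies are expressed over node states and re-evaluated on every state change, can be sketched as follows. This is an illustrative model under stated assumptions, not the HMS implementation: `Orchestrator`, `PendingAction`, and `report_state` are hypothetical names, and a plain method call stands in for the ZooKeeper watch that would fire in the real system.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

DaemonKey = Tuple[str, str]   # (type, name), e.g. ("DAEMON", "hadoop-namenode")


@dataclass
class PendingAction:
    name: str
    # host -> (type, name, status) that must hold on that host before dispatch
    needs: Dict[str, Tuple[str, str, str]]


class Orchestrator:
    """Hypothetical controller logic: dispatch actions once dependencies hold."""

    def __init__(self) -> None:
        self.node_states: Dict[str, Dict[DaemonKey, str]] = {}
        self.pending: List[PendingAction] = []
        self.dispatched: List[str] = []

    def submit(self, action: PendingAction) -> None:
        self.pending.append(action)
        self._reevaluate()

    def report_state(self, host: str, kind: str, name: str, status: str) -> None:
        # Stand-in for a watch firing on a node's state znode.
        self.node_states.setdefault(host, {})[(kind, name)] = status
        self._reevaluate()

    def _satisfied(self, action: PendingAction) -> bool:
        return all(
            self.node_states.get(host, {}).get((kind, name)) == status
            for host, (kind, name, status) in action.needs.items()
        )

    def _reevaluate(self) -> None:
        still_waiting = []
        for action in self.pending:
            if self._satisfied(action):
                self.dispatched.append(action.name)
            else:
                still_waiting.append(action)
        self.pending = still_waiting


orch = Orchestrator()
orch.submit(PendingAction("start-jobtracker",
                          {"host1": ("DAEMON", "hadoop-namenode", "STARTED")}))
orch.submit(PendingAction("start-namenode", {}))
orch.report_state("host1", "DAEMON", "hadoop-namenode", "STARTED")
```

The JobTracker action is submitted first but is held back until the NameNode state is reported as STARTED, regardless of which action produced that state.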
- 8. Node List
{
"@url":"http://10.0.1.201:4080/v1/nodes/manifest/test",
"roles":[
{"@name":"namenode","host":"host1"},
{"@name":"jobtracker","host":"host2"},
{...}
]
}
Roles map to hostnames
- 9. Package Manifest
{
  "@url": "http://10.0.1.201:4080/v1/software/stack/hadoop-1.0.3",
  "@name": "hadoop",
  "@version": "1.0.3",
  "roles": [
    {
      "@name": "namenode",
      "package": [
        {
          "name": "http://.../hadoop-1.0.3-1.x86_64.rpm"
        }
      ]
    },
    {
      "@name": "datanode",
      "package": [
        {
          "name": "http://.../hadoop-1.0.3-1.x86_64.rpm"
        }
      ]
    }
  ]
}
Callouts: the software stack is defined by roles; each package "name" is the software download URL.
- 10. Configuration Plan
{
  "@url": "http://host/config/hadoop-1.0.3",
  "@roles": [
    { "role": "namenode", "actions": [ … ] },
    { "role": "jobtracker", "actions": [ … ] },
    { "role": "datanode", "actions": [ … ] }
  ]
}
Run a list of scripts to configure NameNode
- 11. Start NameNode
{
  "@url": "http://host/config/hadoop-1.0.3",
  "@roles": [
    { "role": "namenode", "actions": [ … ] },
    { "role": "jobtracker", "actions": [ … ] },
    { "role": "datanode", "actions": [ … ] }
  ]
}
Detail of one namenode action (callout: define expected result):
{
  "@type": "scriptAction",
  "expectedResults": {
    "type": "DAEMON",
    "name": "hadoop-namenode",
    "status": "STARTED"
  },
  "script": "/usr/sbin/hadoop-setup-hdfs.sh",
  "parameters": [
    "--format",
    "--hdfs-user=hdfs",
    "--mapreduce-user=mapred",
    "--namenode-host=${namenode}"
  ]
},
Run a script to set up HDFS on namenode
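The `${namenode}` placeholder in the script parameters suggests that role names are resolved to hostnames before the script runs. The slides do not say how HMS performs this substitution, so the following is only a sketch under that assumption: `role_hosts` and `expand_parameters` are hypothetical helpers, the node list is the one from slide 8, and Python's `string.Template` is used because it happens to share the `${name}` syntax. It also assumes one host per role.

```python
from string import Template
from typing import Dict, List

# Node list in the shape shown on the "Node List" slide (two roles only).
node_list = {
    "roles": [
        {"@name": "namenode", "host": "host1"},
        {"@name": "jobtracker", "host": "host2"},
    ]
}

parameters = [
    "--format",
    "--hdfs-user=hdfs",
    "--mapreduce-user=mapred",
    "--namenode-host=${namenode}",
]


def role_hosts(nodes: dict) -> Dict[str, str]:
    """Map each role name to its host (assumes a single host per role)."""
    return {entry["@name"]: entry["host"] for entry in nodes["roles"]}


def expand_parameters(params: List[str], nodes: dict) -> List[str]:
    """Replace ${role} placeholders with hostnames; raises KeyError if a role is unknown."""
    mapping = role_hosts(nodes)
    return [Template(p).substitute(mapping) for p in params]


expanded = expand_parameters(parameters, node_list)
```

Using `substitute` rather than `safe_substitute` makes an unresolved role fail loudly instead of passing a literal `${...}` to the script.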
- 12. Compiled Plan
{
  "startTime":"Thu Jun 07 13:11:29 PDT 2012",
  "status":"SUCCEEDED",
  "clusterName":"my-test-cluster",
  "actionEntries":[
    { ... },
    { ... }
  ],
  "completedActions":6,
  "totalActions":6,
  "endTime":"Thu Jun 07 13:12:07 PDT 2012"
}
Detail of an action entry (callout: Host Status):
{
  "action":{
    "@action":"DaemonAction",
    "daemonName":"hadoop-namenode",
    "actionId":1,
    "cmdPath":"/cmdqueue/cmd-0000000000",
    "actionType":"start",
    "expectedResults":[
      {
        "name":"hadoop-namenode",
        "type":"DAEMON",
        "status":"STARTED"
      }
    ],
    "role":"namenode"
  },
  "hostStatus":[
    {
      "host":"bdvm021.svl.ibm.com",
      "status":"SUCCEEDED"
    }
  ]
},
- 13. Start JobTracker
{
  "@url": "http://host/config/hadoop-1.0.3",
  "@roles": [
    { "role": "namenode", "actions": [ … ] },
    { "role": "jobtracker", "actions": [ … ] },
    { "role": "datanode", "actions": [ … ] }
  ]
}
Detail of one jobtracker action (callout: dependency on namenode being started):
{
  "@type": "daemonAction",
  "actionType": "start",
  "dependencies": {
    "states": {
      "type": "DAEMON",
      "name": "hadoop-namenode",
      "status": "STARTED"
    },
    "roles": "namenode"
  },
  "expectedResults": {
    "type": "DAEMON",
    "name": "hadoop-jobtracker",
    "status": "STARTED"
  },
  "daemon": "hadoop-jobtracker"
},
Start JobTracker
- 14. Compiled Plan 2
{
  "startTime":"Thu Jun 07 13:11:29 PDT 2012",
  "status":"SUCCEEDED",
  "clusterName":"my-test-cluster",
  "actionEntries":[
    { ... },
    { ... }
  ],
  "completedActions":6,
  "totalActions":6,
  "endTime":"Thu Jun 07 13:12:07 PDT 2012"
}
Detail of an action entry (callout: translated dependency):
{
  "action":{
    "@action":"DaemonAction",
    "daemonName":"hadoop-jobtracker",
    "actionId":5,
    "cmdPath":"/cmdqueue/cmd-0000000000",
    "actionType":"start",
    "dependencies":[
      {
        "roles":[
          "namenode"
        ],
        "hosts":[
          "/clusters/my-test-cluster/bdvm022.svl.ibm.com"
        ],
        "states":[
          {
            "name":"hadoop-namenode",
            "type":"DAEMON",
            "status":"STARTED"
          }
        ]
      }
    ],
    "role":"jobtracker"
  },
},
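Comparing the dependency in the configuration plan with the one in the compiled plan, the role name appears to be translated into a list of host znode paths, and single values become lists. The slides do not show how HMS does this, so the following is a sketch of one plausible translation: `compile_dependency` and `as_list` are hypothetical helpers, and the `/clusters/<cluster>/<host>` path format is copied from the compiled plan above.

```python
from typing import Any, Dict, List


def as_list(value: Any) -> List[Any]:
    """Wrap a single value in a list; leave lists untouched."""
    return value if isinstance(value, list) else [value]


def compile_dependency(dep: dict, node_list: dict, cluster: str) -> dict:
    """Translate a role-based dependency into host znode paths (assumed behavior)."""
    roles = as_list(dep["roles"])
    hosts_by_role: Dict[str, List[str]] = {}
    for entry in node_list["roles"]:
        hosts_by_role.setdefault(entry["@name"], []).append(entry["host"])
    # One znode path per host that carries any of the required roles.
    hosts = [
        f"/clusters/{cluster}/{host}"
        for role in roles
        for host in hosts_by_role[role]
    ]
    return {"roles": roles, "hosts": hosts, "states": as_list(dep["states"])}


# Dependency as written in the configuration plan for the JobTracker action.
dependency = {
    "states": {"type": "DAEMON", "name": "hadoop-namenode", "status": "STARTED"},
    "roles": "namenode",
}
# Node list in the shape shown on the "Node List" slide.
node_list = {
    "roles": [
        {"@name": "namenode", "host": "host1"},
        {"@name": "jobtracker", "host": "host2"},
    ]
}

compiled = compile_dependency(dependency, node_list, "my-test-cluster")
```

After translation the controller only needs to watch the listed znodes, without consulting the role mapping again.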
- 15. Node State
[zk: localhost:2181(CONNECTED) 5] get /hms/clusters/my-test-cluster/bdvm021.svl.ibm.com
{
  "states":[
    { ... },
    { ... },
    {
      "name":"hadoop-namenode",
      "type":"DAEMON",
      "status":"STARTED"
    }
  ]
}
State updates based on status reported by Agent
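A consumer of this znode would typically parse the payload and check whether a required state is present. A minimal sketch, assuming the payload is JSON in the shape shown above; `has_state` is a hypothetical helper, and the sample payload contains only the one state entry visible on the slide.

```python
import json

# Sample payload with only the state entry that is visible on the slide.
payload = json.dumps({
    "states": [
        {"name": "hadoop-namenode", "type": "DAEMON", "status": "STARTED"}
    ]
})


def has_state(znode_payload: str, expected: dict) -> bool:
    """Return True if any stored state matches every field of `expected`."""
    doc = json.loads(znode_payload)
    return any(
        all(state.get(key) == value for key, value in expected.items())
        for state in doc.get("states", [])
    )


namenode_started = has_state(
    payload, {"type": "DAEMON", "name": "hadoop-namenode", "status": "STARTED"}
)
jobtracker_started = has_state(
    payload, {"type": "DAEMON", "name": "hadoop-jobtracker", "status": "STARTED"}
)
```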
- 16. Q&A
• HMS prototype is available on GitHub
https://github.com/macroadster/hms
• Credits
Kan Zhang (kzhang@apache.org)
Eric Yang (eyang@apache.org)
Jagane Sundar (jagane@apache.org)