HMS: Scalable Configuration Management System for Hadoop

Kan Zhang (IBM)
Eric Yang (IBM)

June 5, 2012
© 2011 IBM Corporation
Motivation
■    Goal: manage the Hadoop stack across a data center
     –  Multiple clusters, 10,000+ nodes, inter-cluster operations
■    Scalability
     –  Traditional client/server architecture is cumbersome to scale
     –  Need to keep server states in sync, load balance client requests
     –  Fault tolerance adds further complexity
■    Real-time interaction and feedback
     –  No polling or pushing; components interact via asynchronous notifications
     –  Visibility into cluster state is a major pain point for sysadmins
■    Cross-node ordering dependency
     –  Example: start JobTracker after NameNode is running
     –  Simple to specify and efficient to implement



HMS Approach


     [Diagram labels: Controller, Agent, ZooKeeper, Storage]




    ■    ZooKeeper plays a central role
         –  Fault-tolerant and scalable storage
         –  Asynchronous messaging service



ZooKeeper
    [Diagram: znode tree with root / and children /dir1 and /dir2; labels: Ephemeral, Queue, Watch]


    •    A hierarchical namespace of znodes for storing data
    •    Sequential znodes for message queuing
    •    Watches for asynchronous notification
    •    Ephemeral znodes for failure detection
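The primitives listed above can be illustrated with a toy in-memory model. This is a sketch only, not the ZooKeeper client API: `ToyZooKeeper` and its method names are hypothetical, and the znode paths are chosen for illustration. It mimics three behaviors: a zero-padded counter appended to sequential znodes, one-shot watches, and ephemeral znodes that disappear when their session closes.

```python
from collections import defaultdict


class ToyZooKeeper:
    """In-memory stand-in for a few ZooKeeper primitives (not the real client)."""

    def __init__(self):
        self.nodes = {}                          # path -> data
        self.owners = {}                         # ephemeral path -> session id
        self.counters = defaultdict(int)         # parent -> next sequence number
        self.child_watches = defaultdict(list)   # parent -> one-shot callbacks

    def create(self, path, data="", sequential=False, session=None):
        parent = path.rsplit("/", 1)[0] or "/"
        if sequential:
            # Sequential znodes get a 10-digit, monotonically increasing suffix.
            path = f"{path}{self.counters[parent]:010d}"
            self.counters[parent] += 1
        self.nodes[path] = data
        if session is not None:
            self.owners[path] = session          # ephemeral: tied to a session
        self._fire(parent)
        return path

    def watch_children(self, parent, callback):
        self.child_watches[parent].append(callback)

    def close_session(self, session):
        # Ephemeral znodes vanish with their session; this is failure detection.
        for path in [p for p, s in self.owners.items() if s == session]:
            del self.nodes[path]
            del self.owners[path]
            self._fire(path.rsplit("/", 1)[0] or "/")

    def _fire(self, parent):
        # Watches are one-shot: clear them before invoking the callbacks.
        callbacks, self.child_watches[parent] = self.child_watches[parent], []
        for callback in callbacks:
            callback(parent)


zk = ToyZooKeeper()
events = []
zk.watch_children("/cmd-queue", events.append)
first = zk.create("/cmd-queue/cmd-", "install", sequential=True)
second = zk.create("/cmd-queue/cmd-", "start", sequential=True)   # watch already spent
zk.create("/live-controllers/controller1", session="s1")
zk.close_session("s1")                                             # controller "dies"
```

Because the watch is one-shot, only the first enqueue produces an event; a consumer that wants every message has to re-register after each notification.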

Leveraging ZooKeeper
■    Storing cluster state
     –  Each cluster node is represented by a znode in ZooKeeper
     –  Node state is stored in its corresponding znode
■    Storing system state
     –  Any state needed for failure recovery is persisted in ZooKeeper
     –  Server failures are detected via ephemeral znodes
■    Distributed orchestration
     –  Messages are exchanged asynchronously via ZooKeeper queues
     –  Notifications are triggered by watches on queues
■    Cross-node dependency
     –  Node states are stored in ZooKeeper and accessible to all
     –  Watches are registered on state znodes to get notified of state changes



HMS Overview
    [Diagram labels: Controller1, Controller2, Client; ZooKeeper; cmd-queue, cmd1, cmdStatus, watch; Status-queue; /hms, clusters, cluster1, cluster2; NameNode and JobTracker, each with action-queue, action1, agent, worklog; live-controllers with Controller1, Controller2]




Design Implications
■    All cluster state is stored in ZooKeeper
     –  Built-in fault-tolerance and HA
     –  Cluster and command status at your fingertips
■    Controllers and agents don’t interact directly
     –  All communications are via ZooKeeper async notifications
     –  Good for scalability and fault-isolation
■    Controllers and agents are stateless
     –  Controllers can be replicated for load balancing
     –  Controller failures are automatically detected and handled
■    Dependency specified in terms of node states
     –  Actions come and go, but their effects are captured in node states
     –  Node state changes will trigger dependent actions to be re-evaluated
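The last point, re-evaluating dependent actions when a node state changes, can be sketched as follows. This is a minimal illustration under stated assumptions, not HMS code: the dictionaries stand in for znodes, `on_state_change` stands in for a watch firing, and all function and field names (`needs`, `reevaluate`, and so on) are hypothetical.

```python
# Hypothetical in-memory structures; HMS keeps the equivalent data in znodes.
node_states = {"host1": {}, "host2": {}}      # host -> {daemon name: status}
pending = [
    {"name": "start-jobtracker", "host": "host2",
     "needs": ("host1", "hadoop-namenode", "STARTED")},
    {"name": "start-namenode", "host": "host1", "needs": None},
]
dispatched = []


def is_ready(action):
    """An action is ready when it has no dependency or the dependency holds."""
    if action["needs"] is None:
        return True
    host, daemon, status = action["needs"]
    return node_states[host].get(daemon) == status


def reevaluate():
    """Dispatch every pending action whose dependency is currently satisfied."""
    for action in [a for a in pending if is_ready(a)]:
        pending.remove(action)
        dispatched.append(action["name"])


def on_state_change(host, daemon, status):
    """Stand-in for a watch firing on a node-state znode."""
    node_states[host][daemon] = status
    reevaluate()


reevaluate()                                              # only start-namenode can run
on_state_change("host1", "hadoop-namenode", "STARTED")    # agent reports success
```

The dependency is expressed against the recorded state rather than against the action that produced it, so it does not matter which action (or which controller) brought the NameNode up.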



Node List
    {
        "@url":"http://10.0.1.201:4080/v1/nodes/manifest/test",
        "roles":[
          {"@name":"namenode","host":"host1"},
          {"@name":"jobtracker","host":"host2"},
          {...}
        ]
    }


                        Roles map to hostnames
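A short sketch of how a consumer might turn this node list into a role-to-hosts lookup. Only the two role entries shown on the slide are used; the function name `hosts_by_role` is hypothetical, and collecting hosts into a list per role is an assumption made so that a role listed more than once keeps all of its hosts.

```python
import json
from collections import defaultdict

node_list = json.loads("""
{
  "@url": "http://10.0.1.201:4080/v1/nodes/manifest/test",
  "roles": [
    {"@name": "namenode", "host": "host1"},
    {"@name": "jobtracker", "host": "host2"}
  ]
}
""")


def hosts_by_role(manifest):
    """Group hostnames by role name, preserving the order of the manifest."""
    mapping = defaultdict(list)
    for entry in manifest["roles"]:
        mapping[entry["@name"]].append(entry["host"])
    return dict(mapping)


roles = hosts_by_role(node_list)
```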




Package Manifest
{
    "@url": "http://10.0.1.201:4080/v1/software/stack/hadoop-1.0.3",
    "@name": "hadoop",
    "@version": "1.0.3",
    "roles": [
      {
        "@name": "namenode",
        "package": [
          {
            "name": "http://.../hadoop-1.0.3-1.x86_64.rpm"
          }
        ]
      },
      {
        "@name": "datanode",
        "package": [
          {
            "name": "http://.../hadoop-1.0.3-1.x86_64.rpm"
          }
        ]
      }
    ]
}

Define software stack by roles ("roles")
Software download URL (package "name")
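As an illustration of reading this manifest, the sketch below looks up the package URLs for one role. The helper `packages_for` is hypothetical, and the truncated download URL is kept exactly as it appears on the slide.

```python
import json

manifest = json.loads("""
{
  "@url": "http://10.0.1.201:4080/v1/software/stack/hadoop-1.0.3",
  "@name": "hadoop",
  "@version": "1.0.3",
  "roles": [
    {"@name": "namenode",
     "package": [{"name": "http://.../hadoop-1.0.3-1.x86_64.rpm"}]},
    {"@name": "datanode",
     "package": [{"name": "http://.../hadoop-1.0.3-1.x86_64.rpm"}]}
  ]
}
""")


def packages_for(manifest, role):
    """Return the download URLs listed for `role`, or an empty list if absent."""
    for entry in manifest["roles"]:
        if entry["@name"] == role:
            return [package["name"] for package in entry["package"]]
    return []


namenode_packages = packages_for(manifest, "namenode")
```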


Configuration Plan
{
    "@url": "http://host/config/hadoop-1.0.3",
    "@roles": [
      { "role" : "namenode", "actions" : [ … ] },
      { "role" : "jobtracker", "actions" : [ … ] },
      { "role" : "datanode", "actions" : [ … ] }
    ]
}


                                    Run a list of scripts to configure NameNode




Start NameNode
{
    "@url": "http://host/config/hadoop-1.0.3",
    "@roles": [
      { "role" : "namenode", "actions" : [ … ] },
      { "role" : "jobtracker", "actions" : [ … ] },
      { "role" : "datanode", "actions" : [ … ] }
    ]
}

Action (callout from the plan above):

{
    "@type": "scriptAction",
    "expectedResults": {
      "type": "DAEMON",
      "name": "hadoop-namenode",
      "status": "STARTED"
    },
    "script": "/usr/sbin/hadoop-setup-hdfs.sh",
    "parameters": [
      "--format",
      "--hdfs-user=hdfs",
      "--mapreduce-user=mapred",
      "--namenode-host=${namenode}"
    ]
},

Define expected result ("expectedResults")
Run a script to set up HDFS on namenode
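The `${namenode}` placeholder suggests that role names are replaced by hostnames before the script runs. The slides do not show how HMS performs this expansion, so the following is only a sketch of the idea using Python's `string.Template`, which happens to share the `${name}` syntax. The `role_hosts` mapping and the `expand` helper are assumptions for illustration.

```python
from string import Template

# Assumed role-to-host mapping, taken from the Node List slide.
role_hosts = {"namenode": "host1", "jobtracker": "host2"}

parameters = [
    "--format",
    "--hdfs-user=hdfs",
    "--mapreduce-user=mapred",
    "--namenode-host=${namenode}",
]


def expand(params, hosts):
    """Replace ${role} placeholders; raises KeyError for an unknown role."""
    return [Template(param).substitute(hosts) for param in params]


argv = ["/usr/sbin/hadoop-setup-hdfs.sh", *expand(parameters, role_hosts)]
```

Using `substitute` rather than `safe_substitute` makes a misspelled role fail loudly instead of passing a literal `${...}` to the script.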

Compiled Plan
{
     "startTime":"Thu Jun 07 13:11:29 PDT 2012",
     "status":"SUCCEEDED",
     "clusterName":"my-test-cluster",
     "actionEntries":[
        { ... },
        { ... }
     ],
     "completedActions":6,
     "totalActions":6,
     "endTime":"Thu Jun 07 13:12:07 PDT 2012"
}

Action entry (callout from the plan above):

{
     "action":{
        "@action":"DaemonAction",
        "daemonName":"hadoop-namenode",
        "actionId":1,
        "cmdPath":"/cmdqueue/cmd-0000000000",
        "actionType":"start",
        "expectedResults":[
           {
              "name":"hadoop-namenode",
              "type":"DAEMON",
              "status":"STARTED"
           }
        ],
        "role":"namenode"
     },
     "hostStatus":[
        {
           "host":"bdvm021.svl.ibm.com",
           "status":"SUCCEEDED"
        }
     ]
},

Host Status ("hostStatus")
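A client reading this status document could derive progress and elapsed time from the top-level fields. The sketch below is illustrative only: the helper `parse_time` is hypothetical, and it drops the zone abbreviation on the assumption that both timestamps share one time zone, as they do in this example.

```python
from datetime import datetime

plan = {
    "startTime": "Thu Jun 07 13:11:29 PDT 2012",
    "status": "SUCCEEDED",
    "completedActions": 6,
    "totalActions": 6,
    "endTime": "Thu Jun 07 13:12:07 PDT 2012",
}


def parse_time(text):
    """Parse the slide's timestamp format, ignoring the zone token (5th field)."""
    parts = text.split()
    without_zone = " ".join(parts[:4] + parts[5:])
    return datetime.strptime(without_zone, "%a %b %d %H:%M:%S %Y")


elapsed = (parse_time(plan["endTime"]) - parse_time(plan["startTime"])).total_seconds()
progress = plan["completedActions"] / plan["totalActions"]
```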




Start JobTracker
{
    "@url": "http://host/config/hadoop-1.0.3",
    "@roles": [
      { "role" : "namenode", "actions" : [ … ] },
      { "role" : "jobtracker", "actions" : [ … ] },
      { "role" : "datanode", "actions" : [ … ] }
    ]
}

Action (callout from the plan above):

{
    "@type": "daemonAction",
    "actionType": "start",
    "dependencies": {
      "states": {
        "type": "DAEMON",
        "name": "hadoop-namenode",
        "status": "STARTED"
      },
      "roles": "namenode"
    },
    "expectedResults": {
      "type": "DAEMON",
      "name": "hadoop-jobtracker",
      "status": "STARTED"
    },
    "daemon": "hadoop-jobtracker"
},

Dependency of namenode started ("dependencies")
Start JobTracker
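One way to read the `dependencies` block is as a predicate over node states: the action may run once the hosts holding the named role report the required state. The sketch below shows such a check. It is not taken from HMS; `dependency_met`, the `role_hosts` mapping, and the rule that every host with the role must match are assumptions.

```python
dependency = {
    "states": {"type": "DAEMON", "name": "hadoop-namenode", "status": "STARTED"},
    "roles": "namenode",
}

# Hypothetical inputs: which hosts hold each role, and what each host reports.
role_hosts = {"namenode": ["host1"]}
node_states = {
    "host1": [{"name": "hadoop-namenode", "type": "DAEMON", "status": "STARTED"}],
}


def dependency_met(dep, role_hosts, node_states):
    """True when every host with the required role reports the required state."""
    hosts = role_hosts.get(dep["roles"], [])
    return bool(hosts) and all(
        dep["states"] in node_states.get(host, []) for host in hosts
    )


ready = dependency_met(dependency, role_hosts, node_states)
not_ready = dependency_met(dependency, role_hosts, {})   # nothing reported yet
```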
Compiled Plan 2
{
     "startTime":"Thu Jun 07 13:11:29 PDT 2012",
     "status":"SUCCEEDED",
     "clusterName":"my-test-cluster",
     "actionEntries":[
        { ... },
        { ... }
     ],
     "completedActions":6,
     "totalActions":6,
     "endTime":"Thu Jun 07 13:12:07 PDT 2012"
}

Action entry (callout from the plan above):

{
     "action":{
        "@action":"DaemonAction",
        "daemonName":"hadoop-jobtracker",
        "actionId":5,
        "cmdPath":"/cmdqueue/cmd-0000000000",
        "actionType":"start",
        "dependencies":[
           {
              "roles":[
                 "namenode"
              ],
              "hosts":[
                 "/clusters/my-test-cluster/bdvm022.svl.ibm.com"
              ],
              "states":[
                 {
                    "name":"hadoop-namenode",
                    "type":"DAEMON",
                    "status":"STARTED"
                 }
              ]
           }
        ],
        "role":"jobtracker"
     },
},

Translated Dependency ("dependencies")
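Comparing the configuration plan with the compiled plan, the role-based dependency appears to be translated into host znode paths, and single values become lists. The sketch below reproduces that shape. The function `compile_dependency` and the `role_hosts` mapping are hypothetical; the hostname and cluster name are the ones shown in the compiled plan.

```python
def compile_dependency(dep, cluster, role_hosts):
    """Rewrite a role-based dependency into the list-based, host-resolved form."""
    roles = [dep["roles"]]
    return {
        "roles": roles,
        "hosts": [
            f"/clusters/{cluster}/{host}"
            for role in roles
            for host in role_hosts[role]
        ],
        "states": [dep["states"]],
    }


plan_dependency = {
    "states": {"type": "DAEMON", "name": "hadoop-namenode", "status": "STARTED"},
    "roles": "namenode",
}
# Assumed mapping; the hostname is copied from the compiled plan above.
role_hosts = {"namenode": ["bdvm022.svl.ibm.com"]}

compiled = compile_dependency(plan_dependency, "my-test-cluster", role_hosts)
```

Resolving roles to znode paths at compile time means the component that later evaluates the dependency only needs to read (or watch) the listed paths.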

Node State
[zk: localhost:2181(CONNECTED) 5] get /hms/clusters/my-test-cluster/bdvm021.svl.ibm.com
{
  "states":[
    { ... },
    { ... },
    {
       "name":"hadoop-namenode",
       "type":"DAEMON",
       "status":"STARTED"
    }
  ]
}

State updates based on status reported by Agent
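The update described above, folding an agent's status report into the node's state list, might look like the following. This is a sketch under assumptions: `apply_report` is a hypothetical helper, and keying entries by `(type, name)` is a guess at how duplicates would be avoided.

```python
import json


def apply_report(states, report):
    """Return a new list where `report` replaces any entry with the same type and name."""
    key = (report["type"], report["name"])
    kept = [s for s in states if (s["type"], s["name"]) != key]
    return kept + [report]


report = {"name": "hadoop-namenode", "type": "DAEMON", "status": "STARTED"}

states = apply_report([], report)        # first report adds the entry
states = apply_report(states, report)    # a repeated report does not duplicate it

# Serialized form, shaped like the znode content shown on the slide.
znode_data = json.dumps({"states": states})
```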


Q&A
•  HMS prototype is available on GitHub

     https://github.com/macroadster/hms

•  Credits
     Kan Zhang (kzhang@apache.org)
     Eric Yang (eyang@apache.org)
     Jagane Sundar (jagane@apache.org)





Más contenido relacionado

La actualidad más candente

EAP6 performance Tuning
EAP6 performance TuningEAP6 performance Tuning
EAP6 performance Tuning
Praveen Adupa
 
MySQL Best Practices - OTN LAD Tour
MySQL Best Practices - OTN LAD TourMySQL Best Practices - OTN LAD Tour
MySQL Best Practices - OTN LAD Tour
Ronald Bradford
 
Configuring Oracle Enterprise Manager Cloud Control 12c for HA White Paper
Configuring Oracle Enterprise Manager Cloud Control 12c for HA White PaperConfiguring Oracle Enterprise Manager Cloud Control 12c for HA White Paper
Configuring Oracle Enterprise Manager Cloud Control 12c for HA White Paper
Leighton Nelson
 

La actualidad más candente (20)

Basics of Logical Replication,Streaming replication vs Logical Replication ,U...
Basics of Logical Replication,Streaming replication vs Logical Replication ,U...Basics of Logical Replication,Streaming replication vs Logical Replication ,U...
Basics of Logical Replication,Streaming replication vs Logical Replication ,U...
 
PostgreSQL Replication in 10 Minutes - SCALE
PostgreSQL Replication in 10  Minutes - SCALEPostgreSQL Replication in 10  Minutes - SCALE
PostgreSQL Replication in 10 Minutes - SCALE
 
PostGreSQL Performance Tuning
PostGreSQL Performance TuningPostGreSQL Performance Tuning
PostGreSQL Performance Tuning
 
Streaming Replication Made Easy in v9.3
Streaming Replication Made Easy in v9.3Streaming Replication Made Easy in v9.3
Streaming Replication Made Easy in v9.3
 
My two cents about Mysql backup
My two cents about Mysql backupMy two cents about Mysql backup
My two cents about Mysql backup
 
GlassFish v2 Clustering
GlassFish v2 ClusteringGlassFish v2 Clustering
GlassFish v2 Clustering
 
EAP6 performance Tuning
EAP6 performance TuningEAP6 performance Tuning
EAP6 performance Tuning
 
Built-in Replication in PostgreSQL
Built-in Replication in PostgreSQLBuilt-in Replication in PostgreSQL
Built-in Replication in PostgreSQL
 
MySQL Backup and Recovery Essentials
MySQL Backup and Recovery EssentialsMySQL Backup and Recovery Essentials
MySQL Backup and Recovery Essentials
 
Online MySQL Backups with Percona XtraBackup
Online MySQL Backups with Percona XtraBackupOnline MySQL Backups with Percona XtraBackup
Online MySQL Backups with Percona XtraBackup
 
Essential Linux Commands for DBAs
Essential Linux Commands for DBAsEssential Linux Commands for DBAs
Essential Linux Commands for DBAs
 
SQL Server vs Postgres
SQL Server vs PostgresSQL Server vs Postgres
SQL Server vs Postgres
 
What's New in Postgres Plus Advanced Server 9.3
What's New in Postgres Plus Advanced Server 9.3What's New in Postgres Plus Advanced Server 9.3
What's New in Postgres Plus Advanced Server 9.3
 
PostgreSQL Scaling And Failover
PostgreSQL Scaling And FailoverPostgreSQL Scaling And Failover
PostgreSQL Scaling And Failover
 
JBoss AS 7
JBoss AS 7JBoss AS 7
JBoss AS 7
 
Sql server 2012 ha dr nova
Sql server 2012 ha dr novaSql server 2012 ha dr nova
Sql server 2012 ha dr nova
 
MySQL Best Practices - OTN LAD Tour
MySQL Best Practices - OTN LAD TourMySQL Best Practices - OTN LAD Tour
MySQL Best Practices - OTN LAD Tour
 
Less13 performance
Less13 performanceLess13 performance
Less13 performance
 
Configuring Oracle Enterprise Manager Cloud Control 12c for HA White Paper
Configuring Oracle Enterprise Manager Cloud Control 12c for HA White PaperConfiguring Oracle Enterprise Manager Cloud Control 12c for HA White Paper
Configuring Oracle Enterprise Manager Cloud Control 12c for HA White Paper
 
Sql server 2012 ha dr 24_hop_final
Sql server 2012 ha dr 24_hop_finalSql server 2012 ha dr 24_hop_final
Sql server 2012 ha dr 24_hop_final
 

Similar a HMS: Scalable Configuration Management System for Hadoop

App cap2956v2-121001194956-phpapp01 (1)
App cap2956v2-121001194956-phpapp01 (1)App cap2956v2-121001194956-phpapp01 (1)
App cap2956v2-121001194956-phpapp01 (1)
outstanding59
 
Inside the Hadoop Machine @ VMworld
Inside the Hadoop Machine @ VMworldInside the Hadoop Machine @ VMworld
Inside the Hadoop Machine @ VMworld
Richard McDougall
 
Performance evaluation of cloudera impala 0.6 beta with comparison to Hive
Performance evaluation of cloudera impala 0.6 beta with comparison to HivePerformance evaluation of cloudera impala 0.6 beta with comparison to Hive
Performance evaluation of cloudera impala 0.6 beta with comparison to Hive
Yukinori Suda
 
2013 11-19-hoya-status
2013 11-19-hoya-status2013 11-19-hoya-status
2013 11-19-hoya-status
Steve Loughran
 
What Big Data Folks Need to Know About DevOps
What Big Data Folks Need to Know About DevOpsWhat Big Data Folks Need to Know About DevOps
What Big Data Folks Need to Know About DevOps
Matt Ray
 
Cloud Foundry Open Tour China
Cloud Foundry Open Tour ChinaCloud Foundry Open Tour China
Cloud Foundry Open Tour China
marklucovsky
 

Similar a HMS: Scalable Configuration Management System for Hadoop (20)

Whirr dev-up-puppetconf2011
Whirr dev-up-puppetconf2011Whirr dev-up-puppetconf2011
Whirr dev-up-puppetconf2011
 
Automação do físico ao NetSecDevOps
Automação do físico ao NetSecDevOpsAutomação do físico ao NetSecDevOps
Automação do físico ao NetSecDevOps
 
App cap2956v2-121001194956-phpapp01 (1)
App cap2956v2-121001194956-phpapp01 (1)App cap2956v2-121001194956-phpapp01 (1)
App cap2956v2-121001194956-phpapp01 (1)
 
Inside the Hadoop Machine @ VMworld
Inside the Hadoop Machine @ VMworldInside the Hadoop Machine @ VMworld
Inside the Hadoop Machine @ VMworld
 
App Cap2956v2 121001194956 Phpapp01 (1)
App Cap2956v2 121001194956 Phpapp01 (1)App Cap2956v2 121001194956 Phpapp01 (1)
App Cap2956v2 121001194956 Phpapp01 (1)
 
GeekAustin DevOps
GeekAustin DevOpsGeekAustin DevOps
GeekAustin DevOps
 
Performance evaluation of cloudera impala 0.6 beta with comparison to Hive
Performance evaluation of cloudera impala 0.6 beta with comparison to HivePerformance evaluation of cloudera impala 0.6 beta with comparison to Hive
Performance evaluation of cloudera impala 0.6 beta with comparison to Hive
 
A Tale of a Server Architecture (Frozen Rails 2012)
A Tale of a Server Architecture (Frozen Rails 2012)A Tale of a Server Architecture (Frozen Rails 2012)
A Tale of a Server Architecture (Frozen Rails 2012)
 
Automation day red hat ansible
   Automation day red hat ansible    Automation day red hat ansible
Automation day red hat ansible
 
A Groovy Kind of Java (San Francisco Java User Group)
A Groovy Kind of Java (San Francisco Java User Group)A Groovy Kind of Java (San Francisco Java User Group)
A Groovy Kind of Java (San Francisco Java User Group)
 
Ansible & Salt - Vincent Boon
Ansible & Salt - Vincent BoonAnsible & Salt - Vincent Boon
Ansible & Salt - Vincent Boon
 
2013 11-19-hoya-status
2013 11-19-hoya-status2013 11-19-hoya-status
2013 11-19-hoya-status
 
Jacopo Nardiello - Monitoring Cloud-Native applications with Prometheus - Cod...
Jacopo Nardiello - Monitoring Cloud-Native applications with Prometheus - Cod...Jacopo Nardiello - Monitoring Cloud-Native applications with Prometheus - Cod...
Jacopo Nardiello - Monitoring Cloud-Native applications with Prometheus - Cod...
 
A tour of Ansible
A tour of AnsibleA tour of Ansible
A tour of Ansible
 
What Big Data Folks Need to Know About DevOps
What Big Data Folks Need to Know About DevOpsWhat Big Data Folks Need to Know About DevOps
What Big Data Folks Need to Know About DevOps
 
Cloud Foundry Open Tour China
Cloud Foundry Open Tour ChinaCloud Foundry Open Tour China
Cloud Foundry Open Tour China
 
Serve Meals, Not Ingredients (ChefConf 2015)
Serve Meals, Not Ingredients (ChefConf 2015)Serve Meals, Not Ingredients (ChefConf 2015)
Serve Meals, Not Ingredients (ChefConf 2015)
 
Serve Meals, Not Ingredients - ChefConf 2015
Serve Meals, Not Ingredients - ChefConf 2015Serve Meals, Not Ingredients - ChefConf 2015
Serve Meals, Not Ingredients - ChefConf 2015
 
Hoya for Code Review
Hoya for Code ReviewHoya for Code Review
Hoya for Code Review
 
Cooking with Chef
Cooking with ChefCooking with Chef
Cooking with Chef
 

Más de DataWorks Summit

HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
DataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
DataWorks Summit
 

Más de DataWorks Summit (20)

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 

Último

Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
vu2urc
 

Último (20)

AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
🐬 The future of MySQL is Postgres 🐘
🐬  The future of MySQL is Postgres   🐘🐬  The future of MySQL is Postgres   🐘
🐬 The future of MySQL is Postgres 🐘
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 

HMS: Scalable Configuration Management System for Hadoop

  • 1. HMS: Scalable Configuration Management System for Hadoop Kan Zhang (IBM) Eric Yang (IBM) June 5, 2012 © 2011 IBM Corporation
  • 2. Motivation ■  Goal: managing Hadoop stack in a data center –  Multiple clusters, 10,000+ nodes, inter-cluster operations ■  Scalability –  Traditional client/server architecture is cumbersome to scale –  Need to keep server states in sync, load balance client requests –  Fault tolerance adds further complexity ■  Real-time interaction and feedback –  No poll or push, interact via asynchronous notification –  Visibility into cluster state is a major pain point for sys admins ■  Cross-node ordering dependency –  Example: start JobTracker after NameNode is running –  Simple to specify and efficient to implement 2 © 2011 IBM Corporation
  • 3. HMS Approach
    [Diagram: Controller, Agent, ZooKeeper, Storage]
    ■  ZooKeeper plays a central role
       –  Fault-tolerant and scalable storage
       –  Asynchronous messaging service
  • 4. ZooKeeper
    [Diagram: znode tree with root / and children /dir1 and /dir2; labels: Watch, Ephemeral, Queue]
    •  A hierarchical namespace of znodes for storing data
    •  Sequential znodes for message queuing
    •  Watches for asynchronous notification
    •  Ephemeral znodes for failure detection
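The three znode features listed on this slide can be illustrated with a toy in-memory model. This is only a sketch for intuition: `ToyZnodeStore` and its methods are hypothetical names, not the ZooKeeper client API, and the paths `/queue` and `/live` are made up for the example. It mimics sequential znodes (a zero-padded counter appended to the name), one-shot child watches, and ephemeral znodes that disappear when their session ends.

```python
from collections import defaultdict


class ToyZnodeStore:
    """In-memory stand-in for a few ZooKeeper features (not a real client)."""

    def __init__(self):
        self.data = {}                           # path -> payload
        self.owner = {}                          # ephemeral path -> session id
        self.counters = defaultdict(int)         # parent -> next sequence number
        self.child_watches = defaultdict(list)   # parent -> one-shot callbacks

    def create(self, path, value="", sequential=False, session=None):
        parent = path.rsplit("/", 1)[0] or "/"
        if sequential:
            # Sequential znodes get a zero-padded, increasing suffix.
            path = f"{path}{self.counters[parent]:010d}"
            self.counters[parent] += 1
        self.data[path] = value
        if session is not None:
            self.owner[path] = session           # ephemeral: tied to a session
        self._fire(parent)
        return path

    def watch_children(self, parent, callback):
        self.child_watches[parent].append(callback)

    def children(self, parent):
        prefix = parent.rstrip("/") + "/"
        return sorted(
            p for p in self.data
            if p.startswith(prefix) and "/" not in p[len(prefix):]
        )

    def close_session(self, session):
        # Ending a session removes its ephemeral znodes and notifies watchers.
        for path in [p for p, s in self.owner.items() if s == session]:
            del self.owner[path]
            del self.data[path]
            self._fire(path.rsplit("/", 1)[0] or "/")

    def _fire(self, parent):
        # Watches are one-shot: clear them before invoking the callbacks.
        callbacks, self.child_watches[parent] = self.child_watches[parent], []
        for callback in callbacks:
            callback(parent)


store = ToyZnodeStore()
events = []
store.watch_children("/queue", events.append)
first = store.create("/queue/cmd-", "install", sequential=True)
second = store.create("/queue/cmd-", "start", sequential=True)
store.create("/live/controller1", session="s1")
store.close_session("s1")
```

Because the watch is one-shot, only the first enqueue is reported; a consumer would re-register the watch after each notification.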
  • 5. Leveraging ZooKeeper
    ■  Storing cluster state
       –  Each cluster node is represented by a znode in ZooKeeper
       –  Node state is stored in its corresponding znode
    ■  Storing system state
       –  Any state needed for failure recovery is persisted in ZooKeeper
       –  Server failures are detected via ephemeral nodes
    ■  Distributed orchestration
       –  Messages are exchanged asynchronously via ZooKeeper queues
       –  Notifications are triggered by watches on queues
    ■  Cross-node dependency
       –  Node states are stored in ZooKeeper and accessible to all
       –  Watches are registered on state znodes to get notified of state change
  • 6. HMS Overview
    [Diagram: ZooKeeper tree rooted at /hms. Labels: cmd-queue (cmd1, cmdStatus), status-queue, watch, clusters (cluster1, cluster2), action-queue (action1), worklog, live-controllers (Controller1, Controller2). Actors: Client, Controller1, Controller2, agent on NameNode, agent on JobTracker.]
  • 7. Design Implications
    ■  All cluster state is stored in ZooKeeper
       –  Built-in fault-tolerance and HA
       –  Cluster and command status at your fingertips
    ■  Controllers and agents don’t interact directly
       –  All communications are via ZooKeeper async notifications
       –  Good for scalability and fault-isolation
    ■  Controllers and agents are stateless
       –  Controllers can be replicated for load balancing
       –  Controller failures are automatically detected and handled
    ■  Dependency specified in terms of node states
       –  Actions come and go, but their effects are captured in node states
       –  Node state changes will trigger dependent actions to be re-evaluated
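The last point, that node state changes trigger dependent actions to be re-evaluated, can be sketched as follows. This is a minimal illustration under assumptions, not the HMS implementation: `PendingActions`, `satisfied`, and the host name `host1` are hypothetical, and the state-change callback stands in for a ZooKeeper watch on a node's state znode.

```python
def satisfied(dependency, node_states):
    """True if every host in the dependency reports every required state."""
    return all(
        required in node_states.get(host, [])
        for host in dependency["hosts"]
        for required in dependency["states"]
    )


class PendingActions:
    """Holds actions until the node states they depend on are reached."""

    def __init__(self):
        self.node_states = {}   # host -> list of state dicts
        self.pending = []       # (action name, dependency) pairs still waiting
        self.dispatched = []    # action names released for execution

    def submit(self, name, dependency):
        self.pending.append((name, dependency))
        self._reevaluate()

    def on_state_change(self, host, states):
        # Assumed to be driven by a watch on the node's state znode.
        self.node_states[host] = states
        self._reevaluate()

    def _reevaluate(self):
        still_waiting = []
        for name, dependency in self.pending:
            if satisfied(dependency, self.node_states):
                self.dispatched.append(name)
            else:
                still_waiting.append((name, dependency))
        self.pending = still_waiting


NAMENODE_STARTED = {"name": "hadoop-namenode", "type": "DAEMON", "status": "STARTED"}

tracker = PendingActions()
tracker.submit("start-jobtracker", {"hosts": ["host1"], "states": [NAMENODE_STARTED]})
dispatched_before = list(tracker.dispatched)
tracker.on_state_change("host1", [NAMENODE_STARTED])
```

The action itself never refers to the action that started the NameNode; it only names the state it needs, which is why the dependency survives retries and controller restarts in this model.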
  • 8. Node List
    {
      "@url": "http://10.0.1.201:4080/v1/nodes/manifest/test",
      "roles": [
        { "@name": "namenode", "host": "host1" },
        { "@name": "jobtracker", "host": "host2" },
        {...}
      ]
    }
    Roles map to hostnames
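As a small illustration of "roles map to hostnames", the node list can be loaded into a role-to-hosts dictionary. The sketch below uses only the two entries shown on the slide (the elided `{...}` entry is left out), and `hosts_by_role` is a hypothetical helper name. Accepting a list of hosts per role is an assumption; the slide only shows a single hostname per role.

```python
import json
from collections import defaultdict

NODE_LIST = """
{
  "@url": "http://10.0.1.201:4080/v1/nodes/manifest/test",
  "roles": [
    {"@name": "namenode", "host": "host1"},
    {"@name": "jobtracker", "host": "host2"}
  ]
}
"""


def hosts_by_role(manifest_text):
    """Return {role name: [hostnames]} for a node-list manifest."""
    manifest = json.loads(manifest_text)
    mapping = defaultdict(list)
    for role in manifest["roles"]:
        hosts = role["host"]
        # Assumption: a role may carry one hostname or a list of them.
        if isinstance(hosts, str):
            hosts = [hosts]
        mapping[role["@name"]].extend(hosts)
    return dict(mapping)


ROLE_HOSTS = hosts_by_role(NODE_LIST)
```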
  • 9. Package Manifest
    {
      "@url": "http://10.0.1.201:4080/v1/software/stack/hadoop-1.0.3",
      "@name": "hadoop",
      "@version": "1.0.3",
      "roles": [
        {
          "@name": "namenode",
          "package": [
            { "name": "http://.../hadoop-1.0.3-1.x86_64.rpm" }
          ]
        },
        {
          "@name": "datanode",
          "package": [
            { "name": "http://.../hadoop-1.0.3-1.x86_64.rpm" }
          ]
        }
      ]
    }
    Callouts: "Define software stack by roles" (roles), "Software download URL" (package name)
  • 10. Configuration Plan
    {
      "@url": "http://host/config/hadoop-1.0.3",
      "@roles": [
        { "role": "namenode", "actions": [ … ] },
        { "role": "jobtracker", "actions": [ … ] },
        { "role": "datanode", "actions": [ … ] }
      ]
    }
    Run a list of scripts to configure NameNode
  • 11. Start NameNode
    An action from the "actions" list of the namenode role in the configuration plan (slide 10):
    {
      "@type": "scriptAction",
      "expectedResults": {
        "type": "DAEMON",
        "name": "hadoop-namenode",
        "status": "STARTED"
      },
      "script": "/usr/sbin/hadoop-setup-hdfs.sh",
      "parameters": [
        "--format",
        "--hdfs-user=hdfs",
        "--mapreduce-user=mapred",
        "--namenode-host=${namenode}"
      ]
    },
    Callout: "Define expected result" (expectedResults)
    Run a script to set up HDFS on the namenode
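The `${namenode}` placeholder in the script parameters suggests that role names are replaced by hostnames before the script runs. How HMS performs this substitution is not shown on the slide; the sketch below simply assumes a one-host-per-role mapping and uses Python's `string.Template`, whose `${name}` syntax happens to match. `expand_parameters` and the host `host1` are illustrative names.

```python
from string import Template

PARAMETERS = [
    "--format",
    "--hdfs-user=hdfs",
    "--mapreduce-user=mapred",
    "--namenode-host=${namenode}",
]


def expand_parameters(parameters, role_hosts):
    """Replace ${role} placeholders with the host assigned to that role.

    Raises KeyError if a parameter names a role that has no host.
    """
    return [Template(p).substitute(role_hosts) for p in parameters]


EXPANDED = expand_parameters(PARAMETERS, {"namenode": "host1"})
```

Using `substitute` rather than `safe_substitute` makes an unknown role fail loudly instead of passing a literal `${...}` to the script.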
  • 12. Compiled Plan
    Overall status:
    {
      "startTime": "Thu Jun 07 13:11:29 PDT 2012",
      "status": "SUCCEEDED",
      "actionEntries": [
        { ... },
        { ... }
      ],
      "completedActions": 6,
      "totalActions": 6,
      "endTime": "Thu Jun 07 13:12:07 PDT 2012"
    }
    Action entry:
    {
      "action": {
        "@action": "DaemonAction",
        "daemonName": "hadoop-namenode",
        "clusterName": "my-test-cluster",
        "actionId": 1,
        "cmdPath": "/cmdqueue/cmd-0000000000",
        "actionType": "start",
        "expectedResults": [
          { "name": "hadoop-namenode", "type": "DAEMON", "status": "STARTED" }
        ],
        "role": "namenode"
      },
      "hostStatus": [
        { "host": "bdvm021.svl.ibm.com", "status": "SUCCEEDED" }
      ]
    },
    Callout: "Host Status" (hostStatus)
  • 13. Start JobTracker
    An action from the "actions" list of the jobtracker role in the configuration plan (slide 10):
    {
      "@type": "daemonAction",
      "actionType": "start",
      "dependencies": {
        "states": {
          "type": "DAEMON",
          "name": "hadoop-namenode",
          "status": "STARTED"
        },
        "roles": "namenode"
      },
      "expectedResults": {
        "type": "DAEMON",
        "name": "hadoop-jobtracker",
        "status": "STARTED"
      },
      "daemon": "hadoop-jobtracker"
    },
    Callout: "Dependency of namenode started" (dependencies)
  • 14. Compiled Plan 2
    Overall status:
    {
      "startTime": "Thu Jun 07 13:11:29 PDT 2012",
      "status": "SUCCEEDED",
      "actionEntries": [
        { ... },
        { ... }
      ],
      "completedActions": 6,
      "totalActions": 6,
      "endTime": "Thu Jun 07 13:12:07 PDT 2012"
    }
    Action entry:
    {
      "action": {
        "@action": "DaemonAction",
        "daemonName": "hadoop-jobtracker",
        "actionId": 5,
        "cmdPath": "/cmdqueue/cmd-0000000000",
        "clusterName": "my-test-cluster",
        "actionType": "start",
        "dependencies": [
          {
            "roles": [ "namenode" ],
            "hosts": [ "/clusters/my-test-cluster/bdvm022.svl.ibm.com" ],
            "states": [
              { "name": "hadoop-namenode", "type": "DAEMON", "status": "STARTED" }
            ]
          }
        ],
        "role": "jobtracker"
      },
    },
    Callout: "Translated Dependency" (dependencies)
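The "translated dependency" on this slide adds a `hosts` list of znode paths to a dependency that was originally written in terms of a role. One way such a translation could work is sketched below. It is an assumption about the compile step, not code taken from HMS: `translate_dependency` is a hypothetical helper, the host `host1` is made up, and the `/clusters/<cluster>/<host>` path format is copied from the slide.

```python
def translate_dependency(dependency, cluster, role_hosts):
    """Expand a role-based dependency into per-host state znode paths."""
    roles = dependency["roles"]
    if isinstance(roles, str):          # the plan allows a single role string
        roles = [roles]
    states = dependency["states"]
    if isinstance(states, dict):        # and a single state object
        states = [states]
    hosts = [
        f"/clusters/{cluster}/{host}"
        for role in roles
        for host in role_hosts[role]
    ]
    return {"roles": roles, "hosts": hosts, "states": states}


PLAN_DEPENDENCY = {
    "states": {"type": "DAEMON", "name": "hadoop-namenode", "status": "STARTED"},
    "roles": "namenode",
}

COMPILED = translate_dependency(
    PLAN_DEPENDENCY, "my-test-cluster", {"namenode": ["host1"]}
)
```

After translation, a controller only needs to watch the listed znode paths; it no longer has to know which role each host plays.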
  • 15. Node State
    [zk: localhost:2181(CONNECTED) 5] get /hms/clusters/my-test-cluster/bdvm021.svl.ibm.com
    {
      "states": [
        { ... },
        { … },
        { "name": "hadoop-namenode", "type": "DAEMON", "status": "STARTED" }
      ]
    }
    State updates based on status reported by Agent
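The slide says the node's state is updated from the status an agent reports. A simple way to model that update is to keep one entry per (type, name) pair in the `states` list and replace it when a new report arrives. The merge rule and the helper name `apply_report` are assumptions for illustration; the slide does not describe how HMS merges reports.

```python
import json


def apply_report(znode_payload, report):
    """Return the node-state JSON after folding in one reported status.

    An existing entry with the same type and name is replaced; otherwise
    the report is appended.
    """
    document = json.loads(znode_payload) if znode_payload else {"states": []}
    key = (report["type"], report["name"])
    kept = [s for s in document["states"] if (s["type"], s["name"]) != key]
    kept.append(report)
    document["states"] = kept
    return json.dumps(document)


payload = apply_report(
    "", {"name": "hadoop-namenode", "type": "DAEMON", "status": "STOPPED"}
)
payload = apply_report(
    payload, {"name": "hadoop-namenode", "type": "DAEMON", "status": "STARTED"}
)
FINAL_STATES = json.loads(payload)["states"]
```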
  • 16. Q&A
    •  HMS prototype is available on GitHub: https://github.com/macroadster/hms
    •  Credits
       Kan Zhang (kzhang@apache.org)
       Eric Yang (eyang@apache.org)
       Jagane Sundar (jagane@apache.org)