Dynamic Reconfiguration of ZooKeeper

             Alex Shraer
    (presented by Benjamin Reed)
Why ZooKeeper?




• Lots of servers
• Lots of processes
• High volumes of data
• Highly complex software systems
• … mere mortal developers
What ZooKeeper gives you
●   Simple programming model
●   Coordination of distributed processes
●   Fast notification of changes
●   Elasticity
●   Easy setup
●   High availability
ZooKeeper Configuration

• Membership
• Role of each server
  – E.g., follower or observer
• Quorum System spec
  – Zookeeper: majority or hierarchical
• Network addresses & ports
• Timeouts, directory paths, etc.
Zookeeper - distributed and replicated
                                 ZooKeeper Service
                                    Leader

             Server     Server        Server            Server        Server




    Client   Client   Client     Client        Client        Client     Client   Client


• All servers store a copy of the data (in memory)
• A leader is elected at startup
• Reads served by followers, all updates go through leader
• Update acked when a quorum of servers have persisted the
  change (on disk)
• Zookeeper uses ZAB - its own atomic broadcast protocol
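The acknowledgement rule on this slide can be illustrated with a small sketch. It assumes a plain majority quorum (the hierarchical quorums mentioned earlier are not modelled), and the helper name `is_majority_quorum` and the server labels are illustrative only, not part of ZooKeeper's API.

```python
def is_majority_quorum(acked, ensemble):
    """Return True when strictly more than half of `ensemble` has acked.

    Hypothetical helper for illustration; acks from servers that are not
    members of the ensemble are ignored.
    """
    ensemble = set(ensemble)
    return len(set(acked) & ensemble) > len(ensemble) / 2


ensemble = {"A", "B", "C", "D", "E"}
print(is_majority_quorum({"A", "B"}, ensemble))       # 2 of 5 servers acked
print(is_majority_quorum({"A", "B", "C"}, ensemble))  # 3 of 5 servers acked
```

With five servers, an update needs three persisted acks before it counts as acknowledged; two are not enough.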
Dynamic Membership Changes
• Necessary in every long-lived system!
• Examples:
   – Cloud computing: adapt to changing load, don’t pre-allocate!
   – Failures: replacing failed nodes with healthy ones
   – Upgrades: replacing out-of-date nodes with up-to-date ones
   – Free up storage space: decreasing the number of replicas
   – Moving nodes: within the network or the data center
   – Increase resilience by changing the set of servers
  Example: asynchronous replication works as long as more than #servers/2 servers are operating
Hazards of Manual Reconfiguration
                                     E
       A

                       C


        {A, B, C}

        B              {A, B, C}     D




           {A, B, C}


       • Goal: add servers E and D
Hazards of Manual Reconfiguration
                                           E
        A

                           C


       {A, B, C, D, E}                      {A, B, C, D, E}

         B               {A, B, C, D, E}   D




       {A, B, C, D, E}
                                           {A, B, C, D, E}


        • Goal: add servers E and D
        • Change Configuration
Hazards of Manual Reconfiguration
                                           E
        A

                           C


       {A, B, C, D, E}                      {A, B, C, D, E}

         B               {A, B, C, D, E}   D




       {A, B, C, D, E}
                                           {A, B, C, D, E}


        • Goal: add servers E and D
        • Change Configuration
        • Restart Servers
Hazards of Manual Reconfiguration
                                           E
        A

                           C


       {A, B, C, D, E}                      {A, B, C, D, E}

         B               {A, B, C, D, E}   D




       {A, B, C, D, E}
                                           {A, B, C, D, E}


        •    Goal: add servers E and D
        •    Change Configuration
        •    Restart Servers
        •    Lost    and    !

          Just use a coordination service!
     • Zookeeper is the coordination service
        – Don’t want to deploy another system to coordinate it!


     • Who will reconfigure that system?
        – GFS has 3 levels of coordination services


     • More system components -> more management overhead


     • Use Zookeeper to reconfigure itself!
        – Other systems store configuration information in Zookeeper
        – Can we do the same??
        – Only if there are no failures
Recovery in Zookeeper

                  C               E

                           B


setData(/x, 5)

                                  D
                      A
This doesn’t work for reconfigurations!
                                                                        E
                               C


                                                     B
                               {A, B, C, D, E}                          {A, B, C, D, E}


setData(/zookeeper/config, {A, B, F})
                                                      {A, B, C, D, E}   D
      remove C, D, E add F



             F
                                                                        {A, B, C, D, E}
                                        A




                                         {A, B, C, D, E}
This doesn’t work for reconfigurations!
                                                                          E
                               C


                                                        B
                               {A, B, C, D, E}                            {A, B, C, D, E}


setData(/zookeeper/config, {A, B, F})
                                                        {A, B, C, D, E}   D
      remove C, D, E add F



             F
                                                                          {A, B, C, D, E}
                                        A



 {A, B, F}
                                            {A, B, F}
This doesn’t work for reconfigurations!
                                                                              E
                                   C


                                                            B
                                   {A, B, C, D, E}                            {A, B, C, D, E}


    setData(/zookeeper/config, {A, B, F})
                                                            {A, B, C, D, E}   D
          remove C, D, E add F



                  F
                                                                              {A, B, C, D, E}
                                            A



      {A, B, F}
                                                {A, B, F}

•   Must persist the decision to reconfigure in the old
    config before activating the new config!
•   Once such decision is reached, must not allow further
    ops to be committed in old config
Our Solution
•   Correct
•   Fully automatic
•   No external services or additional components
•   Minimal changes to Zookeeper
•   Usually unnoticeable to clients
    – Pause operations only in rare circumstances
    – Clients work with a single configuration
• Rebalances clients across servers in new configuration

• Reconfigures immediately

• Speculative Reconfiguration
    – Reconfiguration (and commands that follow it) speculatively sent out by the
      primary, similarly to all other updates
Principles
●   Commit reconfig in a quorum of the old ensemble
    –   Submit reconfig op just like any other update
●   Make sure new ensemble has latest state before
    becoming active
    –   Get quorum of synced followers from new config
    –   Get acks from both old and new ensembles before committing
        updates proposed between reconfig op and activation
    –   Activate new configuration when reconfig commits
●   Once new ensemble active old ensemble cannot commit
    or propose new updates
●   Gossip activation through leader election and syncing
●   Verify configuration id of leader and follower
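The commit rule behind these principles can be sketched as a check over two majority quorums. This is a simplified illustration, not the actual implementation: it assumes majority quorums, ignores follower syncing and configuration ids, and the names `has_majority` and `reconfig_can_commit` are hypothetical.

```python
def has_majority(acked, ensemble):
    """True when strictly more than half of `ensemble` has acked."""
    ensemble = set(ensemble)
    return len(set(acked) & ensemble) > len(ensemble) / 2


def reconfig_can_commit(acked, old_ensemble, new_ensemble):
    """A reconfig may commit only with quorums of BOTH ensembles."""
    return has_majority(acked, old_ensemble) and has_majority(acked, new_ensemble)


# Adding D and E to {A, B, C}.
old, new = {"A", "B", "C"}, {"A", "B", "C", "D", "E"}
print(reconfig_can_commit({"A", "B"}, old, new))       # quorum of old only
print(reconfig_can_commit({"A", "B", "D"}, old, new))  # quorum of both

# The failing case from the earlier slide: {A, B, C, D, E} -> {A, B, F}.
# A and F form a majority of the new ensemble but not of the old one.
print(reconfig_can_commit({"A", "F"}, {"A", "B", "C", "D", "E"}, {"A", "B", "F"}))
```

Requiring the old quorum persists the decision in the old configuration; requiring the new quorum ensures the new ensemble has the latest state before it becomes active.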
Failure free flow
Reconfiguration scenario 1
                                 E
   A

                   C


    {A, B, C}                        {A, B, C}

    B              {A, B, C}     D




       {A, B, C}
                                      {A, B, C}


   • Goal: add servers E and D
Reconfiguration scenario 1
                               E
   A

                   C


    {A, B, C}

    B              {A, B, C}   D




       {A, B, C}


   • Goal: add servers E and D
   •    doesn't commit until quorums of
     both ensembles ack
Reconfiguration scenario 1
                               E
   A

                   C


    {A, B, C}                      {A, B, C}

    B              {A, B, C}   D




       {A, B, C}
                                    {A, B, C}


   • Goal: add servers E and D
   •    doesn't commit until quorums of
     both ensembles ack
Reconfiguration scenario 1
                                 E
    A

                     C


   {A, B, C, D, E}               {A, B, C, D, E}

     B               {A, B, C}   D




   {A, B, C, D, E}
                                  {A, B, C, D, E}


    • Goal: add servers E and D
    •    doesn't commit until quorums of
      both ensembles ack
Reconfiguration scenario 1
                                 E
    A

                     C


   {A, B, C, D, E}               {A, B, C, D, E}

     B               {A, B, C}   D




   {A, B, C, D, E}
                                  {A, B, C, D, E}


    • Goal: add servers E and D
    •    doesn't commit until quorums of
      both ensembles ack
    • E and D gossip new configuration
      to C
Reconfiguration scenario 1
                                       E
    A

                       C


   {A, B, C, D, E}                     {A, B, C, D, E}

     B               {A, B, C, D, E}   D




   {A, B, C, D, E}
                                        {A, B, C, D, E}


    • Goal: add servers E and D
    •    doesn't commit until quorums of
      both ensembles ack
    • E and D gossip new configuration
      to C
Example - reconfig using CLI
reconfig -add 1=host1.com:1234:1235:observer;1239
         -add 2=host2.com:1236:1237:follower;1231 -remove 5
●   Change follower 1 to an observer and change its ports
●   Add follower 2 to the ensemble
●   Remove follower 5 from the ensemble

reconfig -file myNewConfig.txt -v 234547
●   Change the current config to the one in myNewConfig.txt
●   But only if current config version is 234547

getConfig -w -c
●   Set a watch on /zookeeper/config
●   -c means we only want the new connection string for clients
When it will not work
●   Quorum of new ensemble must be in sync
●   Another reconfig in progress
●   Version condition check fails
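The three rejection conditions can be sketched as a single precondition check. Everything here is an assumption made for illustration: the function name, its parameters, the order of the checks, and the majority-quorum rule are not taken from ZooKeeper's code.

```python
def reconfig_rejection_reason(new_ensemble, synced_servers, reconfig_in_progress,
                              current_version, expected_version=None):
    """Return why a reconfig would be refused, or None if it may proceed.

    Hypothetical sketch: `expected_version` models the optional -v condition,
    and `synced_servers` is the set of servers currently in sync with the leader.
    """
    if reconfig_in_progress:
        return "another reconfig in progress"
    if expected_version is not None and expected_version != current_version:
        return "version condition check fails"
    new_ensemble = set(new_ensemble)
    if len(set(synced_servers) & new_ensemble) <= len(new_ensemble) / 2:
        return "quorum of new ensemble is not in sync"
    return None


print(reconfig_rejection_reason({"A", "B", "C", "D", "E"}, {"A", "B", "C"},
                                reconfig_in_progress=False,
                                current_version=234547, expected_version=234547))
```

In this sketch a blind reconfig simply leaves `expected_version` unset, so only the other two conditions apply.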
How do you know you are done
●   Write something somewhere
The “client side” of reconfiguration
• When system changes, clients need to stay connected
   – The usual solution: directory service (e.g., DNS)
• Re-balancing load during reconfiguration is also important!
• Goal: uniform #clients per server with minimal client migration
   – Migration should be proportional to change in membership




                  X 10   X 10   X 10
Our approach - Probabilistic Load Balancing
• Example 1 :


                X 10   X 10   X 10
Our approach - Probabilistic Load Balancing
• Example 1 :


                         X6         X6       X6           X6      X6
    –   Each client moves to a random new server with probability 0.4
    –   1 – 3/5 = 0.4

    –   Exp. 40% clients will move off of each server
• Example 2 :
                                                   4/18        4/18      10/18




                                                        X 10      X 10      X 10

    –   Connected clients don’t move
    –   Disconnected clients move to each old server with prob. 4/18 and to the new one with
        prob. 10/18
    –   Expected: 8 clients move from A, B, C to D and E (4 to each), and 10 move to F
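The numbers in both examples can be reproduced with a short calculation. This is a sketch that assumes uniform starting loads and a uniform target load; the function names `grow` and `shrink` and the server labels used for Example 1 are illustrative, and this is not the client library's actual code.

```python
from fractions import Fraction


def grow(old_loads, added):
    """Example 1: servers are only added and loads start out uniform.

    Each client leaves its server with probability 1 - |S|/|S'| and picks
    one of the newly added servers uniformly at random.
    """
    n_old, n_new = len(old_loads), len(old_loads) + len(added)
    p_move = 1 - Fraction(n_old, n_new)
    moved = sum(old_loads.values()) * p_move  # expected number of movers
    loads = {s: load * (1 - p_move) for s, load in old_loads.items()}
    loads.update({s: moved / len(added) for s in added})
    return p_move, loads


def shrink(surviving_loads, displaced_clients, added):
    """Example 2: clients of removed servers reconnect; the rest stay put.

    The probability of choosing a server is its shortfall from the uniform
    target load, divided by the number of displaced clients.
    """
    servers = list(surviving_loads) + list(added)
    total = sum(surviving_loads.values()) + displaced_clients
    target = Fraction(total, len(servers))
    probs = {s: (target - surviving_loads.get(s, 0)) / displaced_clients
             for s in servers}
    loads = {s: surviving_loads.get(s, 0) + displaced_clients * probs[s]
             for s in servers}
    return probs, loads


p_move, loads = grow({"A": 10, "B": 10, "C": 10}, ["D", "E"])
print(p_move, loads)   # move probability 2/5, expected load 6 everywhere

probs, loads = shrink({"D": 6, "E": 6}, displaced_clients=18, added=["F"])
print(probs, loads)    # 4/18 for D and E, 10/18 for F, expected load 10 everywhere
```

Exact fractions are used so that 4/18 and 10/18 appear without rounding (Python prints them in lowest terms, 2/9 and 5/9).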
Current Load Balancing
Probabilistic Load Balancing
 When moving from config. S to S’:
 $$E(\mathrm{load}(i, S')) = \mathrm{load}(i, S) + \sum_{j \in S,\; j \neq i} \mathrm{load}(j, S) \cdot \Pr(j \to i) \;-\; \mathrm{load}(i, S) \sum_{j \in S',\; j \neq i} \Pr(i \to j)$$
 where:
    – E(load(i, S’)): expected #clients connected to i in S’ (10 in last example)
    – load(i, S): #clients connected to i in S
    – second term: #clients moving to i from other servers in S
    – third term: #clients moving from i to other servers in S’
 Solving for Pr we get case-specific probabilities.
 Input: each client answers locally
 Question 1: Are there more servers now or fewer?
 Question 2: Is my server being removed?
 Output: 1) disconnect or stay connected to my server
         2) if disconnecting: Pr(connect to one of the old servers)
            and Pr(connect to a newly added server)
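As a worked instance of the expected-load equation, take Example 2 from the previous slide. It is assumed here that S = {A, B, C, D, E} with 6 clients per server and S’ = {D, E, F}, as the slide's numbers suggest. Since connected clients do not move, Pr(E → D) = 0 and Pr(D → j) = 0 for every j. For server D:

```latex
\begin{align*}
E(\mathrm{load}(D, S'))
  &= \mathrm{load}(D, S)
     + \sum_{j \in S,\; j \neq D} \mathrm{load}(j, S) \cdot \Pr(j \to D)
     - \mathrm{load}(D, S) \sum_{j \in S',\; j \neq D} \Pr(D \to j) \\
  &= 6 + \underbrace{3 \cdot 6 \cdot \tfrac{4}{18}}_{\text{from } A, B, C}
       + \underbrace{6 \cdot 0}_{\text{from } E}
       - 6 \cdot 0 \\
  &= 10
\end{align*}
```

The same calculation for the new server F, which starts with no clients, gives 0 + 3 · 6 · 10/18 = 10, matching the uniform target of 10 clients per server.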
Implementation
• Implemented in Zookeeper (Java & C), integration ongoing
   – 3 new Zookeeper API calls: reconfig, getConfig, updateServerList
   – Feature requested since 2008, expected in the 3.5.0 release (July 2012)
• Dynamic changes to:
   –   Membership
   –   Quorum System
   –   Server roles
   –   Addresses & ports
• Reconfiguration modes:
   – Incremental (add servers E and D, remove server B)
   – Non-incremental (new config = {A, C, D, E})
   – Blind or conditioned (reconfig only if current config is #5)
• Subscriptions to config changes
   – Client can invoke client-side re-balancing upon change

                                      Summary
     • Design and implementation of reconfiguration for Apache Zookeeper
        – being contributed into Zookeeper codebase


     • Much simpler than state of the art, using properties already provided by Zookeeper

     • Many nice features:
        – Doesn’t limit concurrency
        – Reconfigures immediately
        – Preserves primary order
        – Doesn’t stop client ops
        – Zookeeper used by online systems, any delay must be avoided
        – Clients work with a single configuration at a time
        – No external services
        – Includes client-side rebalancing

Real World Optimization
 
Extend starfish to Support the Growing Hadoop Ecosystem
Extend starfish to Support the Growing Hadoop EcosystemExtend starfish to Support the Growing Hadoop Ecosystem
Extend starfish to Support the Growing Hadoop Ecosystem
 
Tom Peters, Software Engineer, Ufora at MLconf ATL 2016
Tom Peters, Software Engineer, Ufora at MLconf ATL 2016Tom Peters, Software Engineer, Ufora at MLconf ATL 2016
Tom Peters, Software Engineer, Ufora at MLconf ATL 2016
 
MapReduce: A useful parallel tool that still has room for improvement
MapReduce: A useful parallel tool that still has room for improvementMapReduce: A useful parallel tool that still has room for improvement
MapReduce: A useful parallel tool that still has room for improvement
 
What's new in Doctrine
What's new in DoctrineWhat's new in Doctrine
What's new in Doctrine
 
Eventually Consistent Data Structures (from strangeloop12)
Eventually Consistent Data Structures (from strangeloop12)Eventually Consistent Data Structures (from strangeloop12)
Eventually Consistent Data Structures (from strangeloop12)
 
Eventually-Consistent Data Structures
Eventually-Consistent Data StructuresEventually-Consistent Data Structures
Eventually-Consistent Data Structures
 
Towards a General Approach for Symbolic Model-Checker Prototyping
Towards a General Approach for Symbolic Model-Checker PrototypingTowards a General Approach for Symbolic Model-Checker Prototyping
Towards a General Approach for Symbolic Model-Checker Prototyping
 

Más de DataWorks Summit

Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisDataWorks Summit
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiDataWorks Summit
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...DataWorks Summit
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...DataWorks Summit
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal SystemDataWorks Summit
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExampleDataWorks Summit
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberDataWorks Summit
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixDataWorks Summit
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiDataWorks Summit
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsDataWorks Summit
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureDataWorks Summit
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EngineDataWorks Summit
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...DataWorks Summit
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudDataWorks Summit
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiDataWorks Summit
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerDataWorks Summit
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...DataWorks Summit
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouDataWorks Summit
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkDataWorks Summit
 

Más de DataWorks Summit (20)

Data Science Crash Course
Data Science Crash CourseData Science Crash Course
Data Science Crash Course
 
Floating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache RatisFloating on a RAFT: HBase Durability with Apache Ratis
Floating on a RAFT: HBase Durability with Apache Ratis
 
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFiTracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
Tracking Crime as It Occurs with Apache Phoenix, Apache HBase and Apache NiFi
 
HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...HBase Tales From the Trenches - Short stories about most common HBase operati...
HBase Tales From the Trenches - Short stories about most common HBase operati...
 
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
Optimizing Geospatial Operations with Server-side Programming in HBase and Ac...
 
Managing the Dewey Decimal System
Managing the Dewey Decimal SystemManaging the Dewey Decimal System
Managing the Dewey Decimal System
 
Practical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist ExamplePractical NoSQL: Accumulo's dirlist Example
Practical NoSQL: Accumulo's dirlist Example
 
HBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at UberHBase Global Indexing to support large-scale data ingestion at Uber
HBase Global Indexing to support large-scale data ingestion at Uber
 
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and PhoenixScaling Cloud-Scale Translytics Workloads with Omid and Phoenix
Scaling Cloud-Scale Translytics Workloads with Omid and Phoenix
 
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFiBuilding the High Speed Cybersecurity Data Pipeline Using Apache NiFi
Building the High Speed Cybersecurity Data Pipeline Using Apache NiFi
 
Supporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability ImprovementsSupporting Apache HBase : Troubleshooting and Supportability Improvements
Supporting Apache HBase : Troubleshooting and Supportability Improvements
 
Security Framework for Multitenant Architecture
Security Framework for Multitenant ArchitectureSecurity Framework for Multitenant Architecture
Security Framework for Multitenant Architecture
 
Presto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything EnginePresto: Optimizing Performance of SQL-on-Anything Engine
Presto: Optimizing Performance of SQL-on-Anything Engine
 
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
Introducing MlFlow: An Open Source Platform for the Machine Learning Lifecycl...
 
Extending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google CloudExtending Twitter's Data Platform to Google Cloud
Extending Twitter's Data Platform to Google Cloud
 
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFiEvent-Driven Messaging and Actions using Apache Flink and Apache NiFi
Event-Driven Messaging and Actions using Apache Flink and Apache NiFi
 
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache RangerSecuring Data in Hybrid on-premise and Cloud Environments using Apache Ranger
Securing Data in Hybrid on-premise and Cloud Environments using Apache Ranger
 
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
Big Data Meets NVM: Accelerating Big Data Processing with Non-Volatile Memory...
 
Computer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near YouComputer Vision: Coming to a Store Near You
Computer Vision: Coming to a Store Near You
 
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache SparkBig Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
Big Data Genomics: Clustering Billions of DNA Sequences with Apache Spark
 

Último

Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Scott Andery
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...Wes McKinney
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity PlanDatabarracks
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI AgeCprime
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationKnoldus Inc.
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Farhan Tariq
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Mark Goldstein
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demoHarshalMandlekar2
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Strongerpanagenda
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfpanagenda
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsNathaniel Shimoni
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesThousandEyes
 

Último (20)

Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
The Future Roadmap for the Composable Data Stack - Wes McKinney - Data Counci...
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
How to write a Business Continuity Plan
How to write a Business Continuity PlanHow to write a Business Continuity Plan
How to write a Business Continuity Plan
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
A Framework for Development in the AI Age
A Framework for Development in the AI AgeA Framework for Development in the AI Age
A Framework for Development in the AI Age
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog Presentation
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
Arizona Broadband Policy Past, Present, and Future Presentation 3/25/24
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
Sample pptx for embedding into website for demo
Sample pptx for embedding into website for demoSample pptx for embedding into website for demo
Sample pptx for embedding into website for demo
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better StrongerModern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
Modern Roaming for Notes and Nomad – Cheaper Faster Better Stronger
 
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdfSo einfach geht modernes Roaming fuer Notes und Nomad.pdf
So einfach geht modernes Roaming fuer Notes und Nomad.pdf
 
Time Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directionsTime Series Foundation Models - current state and future directions
Time Series Foundation Models - current state and future directions
 
Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 

Dynamic Reconfiguration of Apache ZooKeeper

  • 1. Dynamic Reconfiguration of ZooKeeper Alex Shraer (presented by Benjamin Reed)
  • 2. Why ZooKeeper? • Lots of servers • Lots of processes • High volumes of data • Highly complex software systems • … mere mortal developers
  • 3. What ZooKeeper gives you ● Simple programming model ● Coordination of distributed processes ● Fast notification of changes ● Elasticity ● Easy setup ● High availability
  • 4. ZooKeeper Configuration • Membership • Role of each server – E.g., follower or observer • Quorum System spec – Zookeeper: majority or hierarchical • Network addresses & ports • Timeouts, directory paths, etc.
  • 5. Zookeeper - distributed and replicated ZooKeeper Service Leader Server Server Server Server Server Client Client Client Client Client Client Client Client • All servers store a copy of the data (in memory) • A leader is elected at startup • Reads served by followers, all updates go through leader • Update acked when a quorum of servers have persisted the change (on disk) • Zookeeper uses ZAB - its own atomic broadcast protocol
  • 6. Dynamic Membership Changes • Necessary in every long-lived system! • Examples: – Cloud computing: adapt to changing load, don’t pre-allocate! – Failures: replacing failed nodes with healthy ones – Upgrades: replacing out-of-date nodes with up-to-date ones – Free up storage space: decreasing the number of replicas – Moving nodes: within the network or the data center – Increase resilience by changing the set of servers Example: asynchronous replication works as long as more than half of the servers (> #servers/2) are operating.
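
The majority rule in the example above can be made concrete with a small sketch. The function names `has_quorum` and `tolerated_failures` are hypothetical and chosen only for illustration; the arithmetic is simply the "more than #servers/2" condition from the slide.

```python
def has_quorum(alive: int, total: int) -> bool:
    """Majority rule: strictly more than half of the servers must be operating."""
    return alive > total / 2


def tolerated_failures(total: int) -> int:
    """Largest number of simultaneous failures that still leaves a majority."""
    return (total - 1) // 2


# Growing the ensemble from 3 to 5 servers raises the number of
# tolerated failures from 1 to 2.
for size in (3, 5):
    print(size, "servers tolerate", tolerated_failures(size), "failure(s)")
```

Under this rule an even-sized ensemble tolerates no more failures than the next smaller odd-sized one (4 servers still tolerate only 1 failure), which is one reason changing the set of servers, rather than just adding one, matters for resilience.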
  • 12. Hazards of Manual Reconfiguration E A C {A, B, C} B {A, B, C} D {A, B, C} • Goal: add servers E and D
  • 13. Hazards of Manual Reconfiguration E A C {A, B, C, D, E} {A, B, C, D, E} B {A, B, C, D, E} D {A, B, C, D, E} {A, B, C, D, E} • Goal: add servers E and D • Change Configuration
  • 14. Hazards of Manual Reconfiguration E A C {A, B, C, D, E} {A, B, C, D, E} B {A, B, C, D, E} D {A, B, C, D, E} {A, B, C, D, E} • Goal: add servers E and D • Change Configuration • Restart Servers
  • 17. Hazards of Manual Reconfiguration E A C {A, B, C, D, E} {A, B, C, D, E} B {A, B, C, D, E} D {A, B, C, D, E} {A, B, C, D, E} • Goal: add servers E and D • Change Configuration • Restart Servers • Lost and !
  • 18. Just use a coordination service! • ZooKeeper is the coordination service – Don’t want to deploy another system to coordinate it! • Who will reconfigure that system? – GFS has 3 levels of coordination services • More system components -> more management overhead • Use ZooKeeper to reconfigure itself! – Other systems store configuration information in ZooKeeper – Can we do the same? – Only if there are no failures
  • 19. Recovery in Zookeeper C E B setData(/x, 5) D A
  • 23. This doesn’t work for reconfigurations! E C B {A, B, C, D, E} {A, B, C, D, E} setData(/zookeeper/config, {A, B, F}) {A, B, C, D, E} D remove C, D, E add F F {A, B, C, D, E} A {A, B, C, D, E}
  • 24. This doesn’t work for reconfigurations! E C B {A, B, C, D, E} {A, B, C, D, E} setData(/zookeeper/config, {A, B, F}) {A, B, C, D, E} D remove C, D, E add F F {A, B, C, D, E} A {A, B, F} {A, B, F}
  • 25. This doesn’t work for reconfigurations! E C B {A, B, C, D, E} {A, B, C, D, E} setData(/zookeeper/config, {A, B, F}) {A, B, C, D, E} D remove C, D, E add F F {A, B, C, D, E} A {A, B, F} {A, B, F} • Must persist the decision to reconfigure in the old config before activating the new config! • Once such decision is reached, must not allow further ops to be committed in old config
  • 26. Our Solution • Correct • Fully automatic • No external services or additional components • Minimal changes to Zookeeper • Usually unnoticeable to clients – Pause operations only in rare circumstances – Clients work with a single configuration • Rebalances clients across servers in new configuration • Reconfigures immediately • Speculative Reconfiguration – Reconfiguration (and commands that follow it) speculatively sent out by the primary, similarly to all other updates
  • 27. Principles ● Commit reconfig in a quorum of the old ensemble – Submit reconfig op just like any other update ● Make sure new ensemble has latest state before becoming active – Get quorum of synced followers from new config – Get acks from both old and new ensembles before committing updates proposed between reconfig op and activation – Activate new configuration when reconfig commits ● Once new ensemble active old ensemble cannot commit or propose new updates ● Gossip activation through leader election and syncing ● Verify configuration id of leader and follower
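
The acknowledgement rule in the principles above (updates proposed between the reconfig op and activation need acks from both the old and the new ensemble) can be sketched as follows. This is a minimal illustration that assumes majority quorums; `is_majority` and `can_commit` are hypothetical names, not part of ZooKeeper's code.

```python
from typing import Optional, Set


def is_majority(acks: Set[str], ensemble: Set[str]) -> bool:
    """True if strictly more than half of `ensemble` has acknowledged."""
    return len(acks & ensemble) > len(ensemble) / 2


def can_commit(acks: Set[str], old: Set[str], new: Optional[Set[str]] = None) -> bool:
    """Commit rule sketch.

    `new` is None when no reconfiguration is pending; otherwise the update
    needs a quorum of acks in the old ensemble AND in the new one.
    """
    if not is_majority(acks, old):
        return False
    return new is None or is_majority(acks, new)


old = {"A", "B", "C"}
new = {"A", "B", "C", "D", "E"}
print(can_commit({"A", "B"}, old))            # quorum of old only, no reconfig pending
print(can_commit({"A", "B"}, old, new))       # 2 of 5 is not a quorum of the new ensemble
print(can_commit({"A", "B", "D"}, old, new))  # quorums of both ensembles
```

Requiring both quorums is what keeps an update from being committed in the old ensemble and then lost because no server in the new ensemble's quorum ever saw it.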
  • 29. Reconfiguration scenario 1 [diagram: servers A to E, labeled with the configuration {A, B, C}]
      • Goal: add servers E and D
  • 30. Reconfiguration scenario 1 [diagram, next animation step]
      • Goal: add servers E and D
      • Reconfig doesn't commit until quorums of both ensembles ack
  • 31. Reconfiguration scenario 1 [diagram, next animation step]
      • Goal: add servers E and D
      • Reconfig doesn't commit until quorums of both ensembles ack
  • 33. Reconfiguration scenario 1 [diagram: four servers now hold {A, B, C, D, E}; one server still holds {A, B, C}]
      • Goal: add servers E and D
      • Reconfig doesn't commit until quorums of both ensembles ack
  • 34. Reconfiguration scenario 1 [same diagram as slide 33]
      • Goal: add servers E and D
      • Reconfig doesn't commit until quorums of both ensembles ack
      • E and D gossip new configuration to C
  • 35. Reconfiguration scenario 1 [diagram: all five servers hold {A, B, C, D, E}]
      • Goal: add servers E and D
      • Reconfig doesn't commit until quorums of both ensembles ack
      • E and D gossip new configuration to C
  • 36. Example - reconfig using CLI
      reconfig -add 1=host1.com:1234:1235:observer;1239 -add 2=host2.com:1236:1237:follower;1231 -remove 5
      ● Change follower 1 to an observer and change its ports
      ● Add follower 2 to the ensemble
      ● Remove follower 5 from the ensemble
      reconfig -file myNewConfig.txt -v 234547
      ● Change the current config to the one in myNewConfig.txt
      ● But only if current config version is 234547
      getConfig -w -c
      ● Set a watch on /zookeeper/config
      ● -c means we only want the new connection string for clients
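To make the `-add` argument format easier to read, here is a small sketch that splits one server specification into its parts. `ServerSpec` and `parse_server_spec` are hypothetical helpers, not ZooKeeper code, and the field names are an assumption: the two ports before the role are taken to be the quorum and leader-election ports, and the number after the semicolon the client port.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ServerSpec:
    server_id: int
    host: str
    quorum_port: int      # assumed meaning of the first port
    election_port: int    # assumed meaning of the second port
    role: str             # e.g. "follower" or "observer"
    client_port: int      # assumed meaning of the value after ';'

def parse_server_spec(spec):
    """Parse 'id=host:port:port:role;clientPort' as used on the slide."""
    ident, rest = spec.split("=", 1)
    address, client_port = rest.split(";", 1)
    host, quorum_port, election_port, role = address.split(":")
    return ServerSpec(int(ident), host, int(quorum_port),
                      int(election_port), role, int(client_port))

spec = parse_server_spec("1=host1.com:1234:1235:observer;1239")
print(spec.role, spec.client_port)
```

The sketch only handles the fully specified form shown on the slide; it does not attempt to cover optional fields.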
  • 37. When it will not work
      ● A quorum of the new ensemble is not in sync (it must be)
      ● Another reconfig is in progress
      ● Version condition check fails
  • 38. How do you know you are done?
      ● Write something somewhere
  • 39. The “client side” of reconfiguration [diagram: three servers with 10 clients each]
      • When system changes, clients need to stay connected
          – The usual solution: directory service (e.g., DNS)
      • Re-balancing load during reconfiguration is also important!
      • Goal: uniform #clients per server with minimal client migration
          – Migration should be proportional to change in membership
  • 41. Our approach - Probabilistic Load Balancing
      • Example 1: [diagram: three servers with 10 clients each]
  • 43. Our approach - Probabilistic Load Balancing
      • Example 1: [diagram: five servers with 6 clients each]
          – Each client moves to a random new server with probability 1 − 3/5 = 0.4
          – In expectation, 40% of the clients move off each original server
  • 44. Our approach - Probabilistic Load Balancing
      • Example 1: [diagram: five servers with 6 clients each]
          – Each client moves to a random new server with probability 1 − 3/5 = 0.4
          – In expectation, 40% of the clients move off each original server
      ● Example 2: [diagram: five servers with 6 clients each]
  • 46. Our approach - Probabilistic Load Balancing
      • Example 1: [diagram: five servers with 6 clients each]
          – Each client moves to a random new server with probability 1 − 3/5 = 0.4
          – In expectation, 40% of the clients move off each original server
      ● Example 2: [diagram: five servers with 6 clients each; arrows labeled 4/18, 4/18 and 10/18]
          – Connected clients don’t move
          – Disconnected clients move to each of the old servers with prob. 4/18 and to the new one with prob. 10/18
          – In expectation, 8 clients will move from A, B, C to D, E and 10 to F
  • 47. Our approach - Probabilistic Load Balancing
      • Example 1: [diagram: five servers with 6 clients each]
          – Each client moves to a random new server with probability 1 − 3/5 = 0.4
          – In expectation, 40% of the clients move off each original server
      ● Example 2: [diagram: arrows labeled 4/18, 4/18 and 10/18; three servers with 10 clients each]
          – Connected clients don’t move
          – Disconnected clients move to each of the old servers with prob. 4/18 and to the new one with prob. 10/18
          – In expectation, 8 clients will move from A, B, C to D, E and 10 to F
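The numbers in both examples can be reproduced with exact fractions. The sketch below is an illustration under stated assumptions, not the deck's algorithm: the function names are hypothetical, the target is taken to be an equal share of clients per server in the new configuration, and in Example 2 every surviving server is assumed to be below that target.

```python
from fractions import Fraction

def grow_move_probability(old_count, new_count):
    """Example 1 (servers only added): probability that a client leaves its server."""
    return 1 - Fraction(old_count, new_count)

def reconnect_probabilities(loads, removed, added):
    """Example 2: where a client of a removed server reconnects.

    Each target server is chosen in proportion to how many clients it still
    needs to reach the uniform target load (assumes no survivor exceeds it).
    """
    total = sum(loads.values())
    survivors = [s for s in loads if s not in removed]
    targets = survivors + list(added)
    target_load = Fraction(total, len(targets))
    displaced = sum(loads[s] for s in removed)
    return {s: (target_load - loads.get(s, 0)) / displaced for s in targets}

# Example 1: 3 servers with 10 clients each, 2 servers added.
p_move = grow_move_probability(3, 5)
stay_load = 10 * (1 - p_move)      # expected clients left on each original server
new_load = 3 * 10 * p_move / 2     # movers split evenly over the 2 new servers

# Example 2: A..E with 6 clients each; A, B, C removed and F added.
loads = {s: 6 for s in "ABCDE"}
probs = reconnect_probabilities(loads, removed={"A", "B", "C"}, added=["F"])

print(p_move, stay_load, new_load)
print(probs)
```

With these inputs the move probability is 2/5, every server in Example 1 ends at an expected 6 clients, and Example 2 yields 4/18 for D and E and 10/18 for F (printed in lowest terms).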
  • 49. Probabilistic Load Balancing
      When moving from config. S to S’:
      $$E(\mathrm{load}(i, S')) = \mathrm{load}(i, S) + \sum_{j \in S \wedge j \neq i} \mathrm{load}(j, S) \cdot \Pr(j \to i) - \mathrm{load}(i, S) \sum_{j \in S' \wedge j \neq i} \Pr(i \to j)$$
      – Left-hand side: expected #clients connected to i in S’ (10 in last example)
      – First term: #clients connected to i in S
      – Second term: #clients moving to i from other servers in S
      – Third term: #clients moving from i to other servers in S’
      Solving for Pr we get case-specific probabilities.
      Input: each client answers locally
          Question 1: Are there more servers now or fewer?
          Question 2: Is my server being removed?
      Output:
          1) disconnect or stay connected to my server
          if disconnect:
          2) Pr(connect to one of the old servers) and Pr(connect to newly added server)
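As a check on the expected-load equation, the sketch below evaluates it term by term for Example 2 (A, B, C removed, F added). The function `expected_load` and the dictionary layout for the transition probabilities are assumptions made for this illustration.

```python
from fractions import Fraction

def expected_load(i, old_loads, new_servers, pr):
    """Evaluate E(load(i, S')) from the slide's equation.

    old_loads   -- load(j, S) for every server j in S
    new_servers -- the servers in S'
    pr          -- pr[(j, i)] = Pr(j -> i); missing pairs count as 0
    """
    current = old_loads.get(i, 0)
    # Clients arriving at i from the other servers of S.
    incoming = sum(load * pr.get((j, i), 0)
                   for j, load in old_loads.items() if j != i)
    # Clients leaving i for the other servers of S'.
    outgoing = current * sum(pr.get((i, j), 0)
                             for j in new_servers if j != i)
    return current + incoming - outgoing

old_loads = {s: 6 for s in "ABCDE"}
new_servers = ["D", "E", "F"]

# Clients of removed servers go to D or E with 4/18 each and to F with 10/18;
# clients already on D or E stay put, so they contribute no entries.
pr = {}
for removed in "ABC":
    pr[(removed, "D")] = Fraction(4, 18)
    pr[(removed, "E")] = Fraction(4, 18)
    pr[(removed, "F")] = Fraction(10, 18)

result = {i: expected_load(i, old_loads, new_servers, pr) for i in new_servers}
print(result)
```

Each of D, E and F comes out at an expected load of 10, matching the "10 in last example" annotation.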
  • 50. Implementation
      • Implemented in Zookeeper (Java & C), integration ongoing
          – 3 new Zookeeper API calls: reconfig, getConfig, updateServerList
          – Feature requested since 2008, expected in 3.5.0 release (July 2012)
      • Dynamic changes to:
          – Membership
          – Quorum System
          – Server roles
          – Addresses & ports
      • Reconfiguration modes:
          – Incremental (add servers E and D, remove server B)
          – Non-incremental (new config = {A, C, D, E})
          – Blind or conditioned (reconfig only if current config is #5)
      • Subscriptions to config changes
          – Client can invoke client-side re-balancing upon change
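The reconfiguration modes listed above can be modeled on a plain set of server names. This is a toy sketch, not the ZooKeeper API: `apply_reconfig`, `VersionMismatch` and the rule that the version increases by one are all assumptions made for illustration.

```python
class VersionMismatch(Exception):
    """Raised when a conditioned reconfig names a stale config version."""

def apply_reconfig(members, version, *, add=(), remove=(),
                   new_members=None, expected_version=None):
    """Return (new_members, new_version) for one reconfiguration request.

    Incremental mode uses `add` / `remove`; non-incremental mode passes the
    full `new_members`. `expected_version=None` is a blind reconfig.
    """
    if expected_version is not None and expected_version != version:
        raise VersionMismatch(f"config is #{version}, expected #{expected_version}")
    if new_members is not None:
        if add or remove:
            raise ValueError("use either new_members or add/remove, not both")
        result = set(new_members)
    else:
        result = (set(members) - set(remove)) | set(add)
    return result, version + 1  # assumed versioning rule, for illustration only

current, version = {"A", "B", "C"}, 5

incremental = apply_reconfig(current, version, add={"D", "E"}, remove={"B"})
non_incremental = apply_reconfig(current, version, new_members={"A", "C", "D", "E"})
conditioned = apply_reconfig(current, version, add={"D"}, expected_version=5)
print(incremental == non_incremental)
```

Both the incremental and the non-incremental request from the slide lead to {A, C, D, E}; a conditioned request naming any version other than #5 is rejected.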
  • 51. Summary
      • Design and implementation of reconfiguration for Apache Zookeeper
          – being contributed into Zookeeper codebase
      • Much simpler than state of the art, using properties already provided by Zookeeper
      • Many nice features:
          – Doesn’t limit concurrency
          – Reconfigures immediately
          – Preserves primary order
          – Doesn’t stop client ops
              – Zookeeper used by online systems, any delay must be avoided
          – Clients work with a single configuration at a time
          – No external services
          – Includes client-side rebalancing