2. Why ZooKeeper?
• Lots of servers
• Lots of processes
• High volumes of data
• Highly complex software systems
• … and mere mortal developers
3. What ZooKeeper gives you
● Simple programming model
● Coordination of distributed processes
● Fast notification of changes
● Elasticity
● Easy setup
● High availability
4. ZooKeeper Configuration
• Membership
• Role of each server
– E.g., follower or observer
• Quorum System spec
– ZooKeeper: majority or hierarchical
• Network addresses & ports
• Timeouts, directory paths, etc.
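These options typically live in ZooKeeper's zoo.cfg file. A minimal sketch with illustrative values (hostnames and paths are made up; hierarchical-quorum keys are shown commented out):

    # Minimal zoo.cfg sketch -- all values illustrative
    tickTime=2000                 # basic time unit (ms) used for timeouts
    initLimit=5                   # ticks a follower may take to sync with the leader
    syncLimit=2                   # ticks a follower may lag before being dropped
    dataDir=/var/lib/zookeeper    # directory path for snapshots and transaction logs
    clientPort=2181
    # Membership, per-server roles, network addresses & ports:
    server.1=host1.com:2888:3888            # quorum port : leader-election port
    server.2=host2.com:2888:3888
    server.3=host3.com:2888:3888:observer   # non-voting observer
    # Quorum system spec: simple majority by default; hierarchical quorums
    # would instead be configured with group.N and weight.N keys.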
5. ZooKeeper - distributed and replicated
[Figure: a ZooKeeper service of five servers, one of them the leader, with clients connected to each server]
• All servers store a copy of the data (in memory)
• A leader is elected at startup
• Reads are served by followers; all updates go through the leader
• An update is acked when a quorum of servers has persisted the change (on disk)
• ZooKeeper uses ZAB, its own atomic broadcast protocol
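A minimal client sketch using the standard ZooKeeper Java API (hostnames are hypothetical), illustrating the split above: the handle connects to any server, reads are answered by that server, and writes are forwarded to the leader and acked only after a quorum persists them:

    import org.apache.zookeeper.*;
    import org.apache.zookeeper.data.Stat;

    public class QuickDemo {
        public static void main(String[] args) throws Exception {
            // Connect to any server in the ensemble.
            ZooKeeper zk = new ZooKeeper(
                    "host1.com:2181,host2.com:2181,host3.com:2181",
                    10000,          // session timeout (ms)
                    event -> { });  // watcher for connection events

            // Writes go through the leader; acked once a quorum has persisted them.
            zk.create("/demo", "v1".getBytes(),
                      ZooDefs.Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);

            // Reads are served by whichever server this client is connected to.
            Stat stat = new Stat();
            byte[] data = zk.getData("/demo", false, stat);
            System.out.println(new String(data) + " @zxid " + stat.getMzxid());
            zk.close();
        }
    }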
6. Dynamic Membership Changes
• Necessary in every long-lived system!
• Examples:
– Cloud computing: adapt to changing load, don’t pre-allocate!
– Failures: replacing failed nodes with healthy ones
– Upgrades: replacing out-of-date nodes with up-to-date ones
– Free up storage space: decreasing the number of replicas
– Moving nodes: within the network or the data center
– Increase resilience by changing the set of servers
Example: asynchronous replication works as long as more than #servers/2 servers operate.
12. Hazards of Manual Reconfiguration
[Figure: servers A, B, C, each storing configuration {A, B, C}; E and D are not yet members]
• Goal: add servers E and D
17. Hazards of Manual Reconfiguration
[Figure: all five servers now store {A, B, C, D, E}; during the restart, a quorum of the new configuration can form without the latest acknowledged updates]
• Goal: add servers E and D
• Change configuration
• Restart servers
• Lost updates!
18. Just use a coordination service!
• ZooKeeper is the coordination service
– Don't want to deploy another system to coordinate it!
• Who will reconfigure that system?
– GFS has 3 levels of coordination services
• More system components → more management overhead
• Use ZooKeeper to reconfigure itself!
– Other systems store configuration information in ZooKeeper
– Can we do the same?
– Only if there are no failures
23. This doesn't work for reconfigurations!
[Figure: ensemble {A, B, C, D, E}; a client calls setData(/zookeeper/config, {A, B, F}) to remove C, D, E and add F; every server still stores {A, B, C, D, E}]
25. This doesn't work for reconfigurations!
[Figure: A and F have switched to {A, B, F} while B, C, D, E still store {A, B, C, D, E}]
• Must persist the decision to reconfigure in the old config before activating the new config!
• Once such a decision is reached, must not allow further ops to be committed in the old config
26. Our Solution
• Correct
• Fully automatic
• No external services or additional components
• Minimal changes to ZooKeeper
• Usually unnoticeable to clients
– Pauses operations only in rare circumstances
– Clients work with a single configuration
• Rebalances clients across servers in the new configuration
• Reconfigures immediately
• Speculative reconfiguration
– The reconfiguration (and the commands that follow it) is speculatively sent out by the primary, like all other updates
27. Principles
● Commit the reconfig in a quorum of the old ensemble
– Submit the reconfig op just like any other update
● Make sure the new ensemble has the latest state before becoming active
– Get a quorum of synced followers from the new config
– Get acks from both old and new ensembles before committing updates proposed between the reconfig op and activation
– Activate the new configuration when the reconfig commits
● Once the new ensemble is active, the old ensemble cannot commit or propose new updates
● Gossip activation through leader election and syncing
● Verify the configuration id of leader and follower
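A rough leader-side sketch of these principles in Java-style pseudocode (every name here is invented for exposition; this is not ZooKeeper's actual internal API):

    // Pseudocode sketch of the reconfiguration principles above.
    void reconfigure(Config oldCfg, Config newCfg) {
        Proposal r = propose(reconfigOp(newCfg)); // submitted like any other update
        // Updates proposed between r and activation need acks from quorums of
        // BOTH ensembles, so the new ensemble holds the latest state.
        requireAcksFrom(oldCfg, newCfg);
        waitForQuorumAck(r, oldCfg);    // commit the decision in the old config
        waitForSyncedQuorum(newCfg);    // quorum of synced followers from new config
        activate(newCfg);               // old ensemble can no longer commit or propose
        requireAcksFrom(newCfg);        // activation gossiped via election & syncing
    }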
29. Reconfiguration scenario 1
[Figure: A, B, C each store {A, B, C}; D and E are the servers being added]
• Goal: add servers E and D
35. Reconfiguration scenario 1
[Figure: all five servers now store {A, B, C, D, E}]
• Goal: add servers E and D
• The reconfig doesn't commit until quorums of both ensembles ack
• E and D gossip the new configuration to C
36. Example - reconfig using CLI
reconfig -add 1=host1.com:1234:1235:observer;1239 -add 2=host2.com:1236:1237:follower;1231 -remove 5
● Change follower 1 to an observer and change its ports
● Add follower 2 to the ensemble
● Remove follower 5 from the ensemble
reconfig -file myNewConfig.txt -v 234547
● Change the current config to the one in myNewConfig.txt
● But only if the current config version is 234547
getConfig -w -c
● Set a watch on /zookeeper/config
● -c means we only want the new connection string for clients
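The same operations are exposed programmatically. A hedged sketch using the Java admin API (org.apache.zookeeper.admin.ZooKeeperAdmin.reconfigure, which shipped in later 3.5.x releases; hostnames and ports reuse the hypothetical CLI values above, and the dynamic-config syntax calls voting members "participant"):

    import org.apache.zookeeper.admin.ZooKeeperAdmin;
    import org.apache.zookeeper.data.Stat;

    public class ReconfigDemo {
        public static void main(String[] args) throws Exception {
            ZooKeeperAdmin admin = new ZooKeeperAdmin(
                    "host1.com:1239,host2.com:1231", 10000, event -> { });
            Stat stat = new Stat();
            // Incremental mode: change server 1, add server 2, remove server 5.
            byte[] newConfig = admin.reconfigure(
                    "1=host1.com:1234:1235:observer;1239,"
                    + "2=host2.com:1236:1237:participant;1231", // joining servers
                    "5",   // leaving servers
                    null,  // null = incremental rather than a whole new membership
                    -1,    // -1 = "blind"; pass a config version to condition on it
                    stat);
            System.out.println(new String(newConfig));
            admin.close();
        }
    }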
37. When it will not work
● A quorum of the new ensemble must be in sync
● Another reconfig is in progress
● The version condition check fails
38. How do you know you are done?
● Write something somewhere
39. The “client side” of reconfiguration
• When system changes, clients need to stay connected
– The usual solution: directory service (e.g., DNS)
• Re-balancing load during reconfiguration is also important!
• Goal: uniform #clients per server with minimal client migration
– Migration should be proportional to change in membership
[Figure: three servers with 10 clients each]
47. Our approach - Probabilistic Load Balancing
• Example 1: grow from {A, B, C}, 10 clients each, to {A, B, C, D, E}
[Figure: after rebalancing, each of the five servers has 6 clients]
– Each client moves to a random new server with probability 1 − 3/5 = 0.4
– Exp. 40% of clients will move off of each server
● Example 2: shrink from {A, B, C, D, E}, 6 clients each, to {A, B, F}
[Figure: after rebalancing, A, B, and F have 10 clients each]
– Connected clients (those on A and B) don't move
– Disconnected clients move to each old server with prob 4/18 and to the new one with prob 10/18
– Exp. 8 clients will move from C, D, E to A and B, and 10 to F
49. Probabilistic Load Balancing
When moving from configuration S to S':

$$E(\mathrm{load}(i, S')) = \mathrm{load}(i, S) + \sum_{j \in S,\, j \neq i} \mathrm{load}(j, S) \cdot \Pr(j \to i) - \mathrm{load}(i, S) \cdot \sum_{j \in S',\, j \neq i} \Pr(i \to j)$$

where E(load(i, S')) is the expected #clients connected to i in S' (10 in the last example), load(i, S) is the #clients connected to i in S, the first sum is the expected #clients moving to i from other servers in S, and the second term is the expected #clients moving from i to other servers in S'.
Solving for Pr we get case-specific probabilities.
Input: each client answers locally:
Question 1: Are there more servers now or fewer?
Question 2: Is my server being removed?
Output: 1) disconnect or stay connected to my server;
if disconnecting, 2) Pr(connect to one of the old servers) and Pr(connect to a newly added server)
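A client-side sketch of this decision for the growing case of Example 1 (method names are illustrative; the real client logic also implements the weighted probabilities of Example 2, hinted at in the removal branch):

    import java.util.List;
    import java.util.Random;
    import java.util.stream.Collectors;

    class RebalanceSketch {
        static final Random rnd = new Random();

        /** Decide which server this client should use after a config change. */
        static String pickServer(String myServer, List<String> oldS, List<String> newS) {
            if (!newS.contains(myServer)) {
                // My server was removed: must reconnect. (In Example 2 this choice
                // is weighted, e.g. 4/18 per old server and 10/18 for the new one.)
                return newS.get(rnd.nextInt(newS.size()));
            }
            // More servers than before: move with probability 1 - |old|/|new|
            // (1 - 3/5 = 0.4 in Example 1), uniformly among the added servers.
            double pMove = 1.0 - (double) oldS.size() / newS.size();
            if (rnd.nextDouble() < pMove) {
                List<String> added = newS.stream()
                        .filter(s -> !oldS.contains(s))
                        .collect(Collectors.toList());
                return added.get(rnd.nextInt(added.size()));
            }
            return myServer; // stay connected
        }
    }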
50. Implementation
• Implemented in ZooKeeper (Java & C), integration ongoing
– 3 new ZooKeeper API calls: reconfig, getConfig, updateServerList
– Feature requested since 2008, expected in the 3.5.0 release (July 2012)
• Dynamic changes to:
– Membership
– Quorum system
– Server roles
– Addresses & ports
• Reconfiguration modes:
– Incremental (add servers E and D, remove server B)
– Non-incremental (new config = {A, C, D, E})
– Blind or conditioned (reconfig only if current config is #5)
• Subscriptions to config changes
– A client can invoke client-side re-balancing upon change (a watch sketch follows)
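For instance, a client can subscribe to configuration changes with the standard 3.5+ Java API (getConfig with a watcher); the re-balancing hook is where the decision sketch above would run:

    import org.apache.zookeeper.*;
    import org.apache.zookeeper.data.Stat;

    public class ConfigWatchDemo {
        public static void main(String[] args) throws Exception {
            ZooKeeper zk = new ZooKeeper("host1.com:2181", 10000, event -> { });

            Watcher configWatcher = new Watcher() {
                @Override
                public void process(WatchedEvent event) {
                    try {
                        // Re-read /zookeeper/config and re-arm the watch.
                        byte[] cfg = zk.getConfig(this, new Stat());
                        System.out.println("New config: " + new String(cfg));
                        // ...trigger client-side re-balancing here...
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                }
            };
            System.out.println("Config: " + new String(zk.getConfig(configWatcher, new Stat())));
            Thread.sleep(60_000); // keep the session alive to receive the watch
        }
    }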
51. Summary
• Design and implementation of reconfiguration for Apache ZooKeeper
– Being contributed into the ZooKeeper codebase
• Much simpler than the state of the art, using properties already provided by ZooKeeper
• Many nice features:
– Doesn't limit concurrency
– Reconfigures immediately
– Preserves primary order
– Doesn't stop client ops
– ZooKeeper is used by online systems, so any delay must be avoided
– Clients work with a single configuration at a time
– No external services
– Includes client-side rebalancing