Rebalance Protocol inside-out:
a Developer Perspective
Boyang Chen
Agenda
● What is rebalancing?
● How to design a rebalancing protocol?
● Unnecessary rebalances
● Look into the future: static membership
● Debugging tip
Boyang Chen’s Bio
● Software engineer (Kafka Streams)
● Software engineer (Ads infrastructure)
● Kafka Summit SF 2018: Building Pinterest Real-Time Ads Platform Using Kafka Streams
● Kafka Summit SF 2019: Static Membership: Rebalance Strategy Designed for the Cloud
What is rebalancing?
● Group membership
● Resource assignment
● Example: Coordinator – Worker model
1. New members join the group
2. Perform assignment
3. Propagate
4. Done!
[Diagram: a coordinator holds tasks T1–T6; after the rebalance one member owns T1–T3 and the other owns T4–T6.]
How to design a rebalance protocol?
1. Membership changes:
(a) Member joins/leaves the group
(b) Member times out
(c) Zombie member fencing
2. Assignor changes
3. Task changes
Catch membership change
1. Spin up a new member
2. A new member joins
3. Revoke active tasks
4. Require members to rejoin
5. Perform new assignment
6. Propagate to members
7. Done!
[Diagram: two members owning (T1, T2, T3) and (T4, T5, T6) become three members owning (T1, T4), (T2, T5), and (T3, T6).]
Catch membership change
1. Remove an active member
2. Member sends leave group request
3. Revoke other members’ active tasks
4. Require members to rejoin
5. Perform assignment
6. Propagate to members
7. Done!
[Diagram: three members owning (T1, T4), (T2, T5), and (T3, T6) become two members owning (T1, T2, T3) and (T4, T5, T6).]
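The join and leave walkthroughs above follow the same pattern: any membership change revokes every active task, forces a rejoin, and produces a fresh assignment. A minimal sketch of that pattern, assuming a toy in-memory coordinator (the `EagerGroup` class and the round-robin placement are illustrative choices, not Kafka's implementation):

```python
from typing import Dict, List


class EagerGroup:
    """Toy coordinator: any membership change revokes everything and reassigns."""

    def __init__(self, tasks: List[str]) -> None:
        self.tasks = list(tasks)
        self.members: List[str] = []
        self.assignment: Dict[str, List[str]] = {}

    def _rebalance(self) -> None:
        # Revoke all active tasks; every remaining member starts from empty.
        self.assignment = {m: [] for m in self.members}
        # Perform a new assignment (round robin is an arbitrary choice here).
        for i, task in enumerate(self.tasks):
            if self.members:
                owner = self.members[i % len(self.members)]
                self.assignment[owner].append(task)
        # "Propagation" is simply exposing self.assignment in this sketch.

    def join(self, member: str) -> None:
        self.members.append(member)
        self._rebalance()

    def leave(self, member: str) -> None:
        self.members.remove(member)
        self._rebalance()


group = EagerGroup([f"T{i}" for i in range(1, 7)])
for m in ("m1", "m2", "m3"):
    group.join(m)
print(group.assignment)   # three members, two tasks each
group.leave("m3")
print(group.assignment)   # two members, three tasks each
```

Note that even the members that did not change lose their tasks for the duration of each rebalance, which is the cost the later slides try to avoid.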
Timeout configs
● Liveness guarantee
○ session.timeout.ms
● Progress guarantee
○ max.poll.interval.ms
○ rebalance.timeout.ms
Session timeout
● Background thread sends heartbeat
1. Member crashes without sending leave group
2. Session timeout is reached
3. Require other members to revoke tasks/rejoin
4. Perform assignment
5. Propagate…
6. Done!
[Diagram: after the crashed member is removed, the surviving member owns T1–T6.]
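The liveness check above can be sketched as a coordinator-side table of last-heartbeat timestamps. This is a simplified model under stated assumptions: the `SessionTracker` class, its method names, and the 10-second timeout are hypothetical, and timestamps are passed in explicitly instead of being read from a clock.

```python
from typing import Dict, List


class SessionTracker:
    """Coordinator-side liveness check based on the last heartbeat time."""

    def __init__(self, session_timeout_ms: int) -> None:
        self.session_timeout_ms = session_timeout_ms
        self.last_heartbeat_ms: Dict[str, int] = {}

    def heartbeat(self, member: str, now_ms: int) -> None:
        self.last_heartbeat_ms[member] = now_ms

    def expire(self, now_ms: int) -> List[str]:
        """Remove and return members whose session has timed out."""
        dead = [m for m, t in self.last_heartbeat_ms.items()
                if now_ms - t >= self.session_timeout_ms]
        for m in dead:
            del self.last_heartbeat_ms[m]
        return dead


tracker = SessionTracker(session_timeout_ms=10_000)
tracker.heartbeat("m1", now_ms=0)
tracker.heartbeat("m2", now_ms=0)
tracker.heartbeat("m1", now_ms=8_000)    # m2 crashed and stays silent
print(tracker.expire(now_ms=10_000))     # m2 has been silent for a full session timeout
```

A non-empty result from `expire` is what would trigger the revoke/rejoin steps for the remaining members.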
Max poll interval timeout
● Poll – Process – Commit
● One process takes too long
● Reach timeout limit (time since the last Poll() >= max.poll.interval.ms)
1. Member takes too long to process
2. Background thread stops heartbeat and sends leave group
3. Ask others to revoke task/rejoin
4. Perform assignment
5. Propagate to members, done!
[Diagram: each member repeats Poll() and processing; one member's gap between Poll() calls grows to >= max.poll.interval.ms, and its task T4 is reassigned to the other member.]
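The progress check above runs on the client: the background thread compares the time since the last poll against the limit. A minimal sketch, assuming a hypothetical `PollWatchdog` helper and explicit timestamps (the string return values are illustrative, not real request types):

```python
class PollWatchdog:
    """Client-side progress check: time elapsed since the last poll() call."""

    def __init__(self, max_poll_interval_ms: int) -> None:
        self.max_poll_interval_ms = max_poll_interval_ms
        self.last_poll_ms = 0
        self.in_group = True

    def on_poll(self, now_ms: int) -> None:
        self.last_poll_ms = now_ms

    def heartbeat_tick(self, now_ms: int) -> str:
        """Called by the background thread; returns the request it would send."""
        if not self.in_group:
            return "none"
        if now_ms - self.last_poll_ms >= self.max_poll_interval_ms:
            # Processing stalled: stop heartbeating and leave the group.
            self.in_group = False
            return "leave-group"
        return "heartbeat"


dog = PollWatchdog(max_poll_interval_ms=300_000)
dog.on_poll(now_ms=0)
print(dog.heartbeat_tick(now_ms=3_000))      # still making progress
print(dog.heartbeat_tick(now_ms=300_000))    # no poll for the whole interval
```

The design point is that heartbeats alone prove the process is alive, while the poll timestamp proves it is still making progress.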
Rebalance timeout
● Max time for a member to rejoin during rebalance
● Use the max value of max.poll.interval.ms among all clients
○ Member has to finish ongoing work
● Track with given member id
● Register callback
1. Group starts to rebalance
2. Member m1 rejoins successfully
3. Member m2 gets stuck
4. Rebalance timeout reached
5. Perform assignment
6. Propagate, done!
[Diagram: the coordinator tracks members m1 and m2 and registers a join callback for each member that rejoins; m2 never rejoins, so after the timeout it is dropped from the member list and m1 receives T1–T6.]
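The join phase above can be modelled as "wait until every known member has rejoined or the deadline passes". A rough sketch, assuming rejoin requests arrive as `(timestamp_ms, member_id)` pairs; the function name and parameters are hypothetical:

```python
from typing import Iterable, List, Set, Tuple


def complete_join_phase(
    known_members: Iterable[str],
    rejoin_events: List[Tuple[int, str]],
    start_ms: int,
    rebalance_timeout_ms: int,
) -> Set[str]:
    """Return the members that survive the join phase."""
    known = set(known_members)
    deadline = start_ms + rebalance_timeout_ms
    rejoined: Set[str] = set()
    for ts, member in sorted(rejoin_events):
        if ts > deadline:
            break                   # too late: the join phase already closed
        if member in known:
            rejoined.add(member)    # the coordinator parks a join callback here
        if rejoined == known:
            break                   # everyone is back, no need to wait longer
    return rejoined


survivors = complete_join_phase(
    known_members=["m1", "m2"],
    rejoin_events=[(1_000, "m1")],  # m2 is stuck and never rejoins
    start_ms=0,
    rebalance_timeout_ms=60_000,
)
print(survivors)                    # only m1 remains in the group
```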
Fencing zombie
● What if a zombie member rejoins?
● Bump generation number after each rebalance
1. Member m1 rejoins group
2. All members rejoin/revoke tasks
3. Bump generation number
4. Perform assignment
5. Propagate, and done!
6. Group currently stable at generation 2
7. Rebalance triggers again
8. Member m1 rejoins
9. Member m2 has transient failure
10. Rebalance timeout is reached, kicking out m2
11. Bump generation to 3
12. Perform assignment
13. Propagate, and done!
14. Group stable at generation 3
15. Member m2 rejoins as a zombie
16. Fenced by mismatched generation
17. Reset local generation info
18. Rejoin as unknown member without generation
19. Registered as m3
20. Group transitions to rebalance
21. Bump generation to 4
22. Perform assignment
23. Propagate, and done!
[Diagram: the coordinator records the group generation and member list, and each member stores its own member id and generation. m2 still holds generation 2 when the group is at generation 3, so its request is rejected; it clears its local id and generation, rejoins as m3, and the group settles at generation 4 with members m1 and m3.]
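The fencing rule above comes down to comparing the generation carried by a request with the coordinator's current one. A minimal sketch under stated assumptions: `FencingCoordinator`, its methods, and the returned strings are illustrative names, not Kafka's actual classes or error codes.

```python
from typing import Iterable, Set


class FencingCoordinator:
    """Tracks the current generation and rejects requests from stale members."""

    def __init__(self) -> None:
        self.generation = 0
        self.members: Set[str] = set()

    def complete_rebalance(self, rejoined: Iterable[str]) -> int:
        """Install the new member list and bump the generation."""
        self.members = set(rejoined)
        self.generation += 1
        return self.generation

    def validate(self, member_id: str, generation: int) -> str:
        if generation != self.generation:
            return "stale-generation"    # zombie: fenced
        if member_id not in self.members:
            return "unknown-member"
        return "ok"


coord = FencingCoordinator()
coord.complete_rebalance(["m1", "m2"])           # generation 1
gen2 = coord.complete_rebalance(["m1", "m2"])    # generation 2
gen3 = coord.complete_rebalance(["m1"])          # m2 timed out, generation 3
print(coord.validate("m1", gen3))                # current member, current generation
print(coord.validate("m2", gen2))                # zombie still holding generation 2
print(coord.complete_rebalance(["m1", "m3"]))    # zombie rejoins under a new id
```

After being fenced, the zombie has to drop its local member id and generation and join again as a brand-new member, which is what costs the group another rebalance.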
Should we use the broker or a client as the assignor?
Do assignment on broker
1. Stable with range assignment
2. Redeploy coordinator to use round robin (RR)
3. Coordinator bounce completes
● Not an ideal approach
○ Restart stateful service
○ Affect other clients
○ Data protocol consistency
[Diagram: the coordinator's assignor changes from Range to Round Robin; the members hold (T1, T2, T3), (T4, T5, T6), and (T7, T8).]
Do assignment on client
1. Designated leader member
2. Redeploy leader to use RR
3. Leader restarted
4. Leader asks coordinator to rebalance
5. Coordinator requires members to revoke tasks/rejoin
6. Coordinator informs the leader that all members have rejoined
7. Leader performs assignment
8. Leader calls sync group to send back assignment
9. Coordinator propagates…
10. Done!
[Diagram: the leader's assignor changes from Range to Round Robin; assignments move from (T1, T2, T3), (T4, T5, T6), (T7, T8) to (T1, T4, T7), (T2, T5, T8), (T3, T6).]
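Because the leader is just another client, swapping the assignor is a client redeploy and not a broker restart. The two strategies from the walkthrough can be sketched as plain functions; the function names and the `leader_assign` helper are hypothetical, and the range logic is a simplified single-topic version.

```python
from typing import Callable, Dict, List

Assignor = Callable[[List[str], List[str]], Dict[str, List[str]]]


def range_assignor(tasks: List[str], members: List[str]) -> Dict[str, List[str]]:
    """Give each member one contiguous chunk; earlier members get the extras."""
    base, extra = divmod(len(tasks), len(members))
    out: Dict[str, List[str]] = {}
    start = 0
    for i, member in enumerate(members):
        size = base + (1 if i < extra else 0)
        out[member] = tasks[start:start + size]
        start += size
    return out


def round_robin_assignor(tasks: List[str], members: List[str]) -> Dict[str, List[str]]:
    """Deal tasks out one at a time, cycling through the members."""
    out: Dict[str, List[str]] = {m: [] for m in members}
    for i, task in enumerate(tasks):
        out[members[i % len(members)]].append(task)
    return out


def leader_assign(assignor: Assignor, tasks: List[str],
                  members: List[str]) -> Dict[str, List[str]]:
    """What the leader computes before sending the result back via sync group."""
    return assignor(tasks, members)


tasks = [f"T{i}" for i in range(1, 9)]
members = ["m1", "m2", "m3"]
print(leader_assign(range_assignor, tasks, members))
print(leader_assign(round_robin_assignor, tasks, members))
```

The coordinator never needs to understand either strategy; it only relays the member list to the leader and the leader's result back to the members.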
Catch task change
1. Add two new tasks
2. Revoke all members’ active tasks
3. Require members to rejoin
4. Perform assignment
5. Propagate to members
6. Done!
[Diagram: tasks T7 and T8 are added; assignments move from (T1, T4), (T2, T5), (T3, T6) to (T1, T2, T3), (T4, T5, T6), (T7, T8).]
Recap:
1. Membership changes:
(a) Member joins/leaves the group
(b) Member times out
(c) Zombie member fencing
2. Assignor changes
3. Task changes
Congratulations! You have walked through all
the necessary parts of a rebalance algorithm!
Now let’s take a systematic view of it.
State Machine View: Two-phase protocol
● States: Stable, Rebalance, Sync
● Stable → Rebalance: rebalance condition triggered
● Rebalance → Sync: all current members join, or rebalance timeout (bump generation)
● Sync → Stable: leader sends back the assignment
● Sync → Rebalance: rebalance condition triggered
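The two-phase state machine can be written down as a small transition table. This is a sketch of the slide's three states only; the event names are made up for illustration, and the Sync → Rebalance edge is read from the second "Rebalance condition triggered" label in the diagram.

```python
from enum import Enum


class State(Enum):
    STABLE = "Stable"
    REBALANCE = "Rebalance"
    SYNC = "Sync"


# (current state, event) -> next state
TRANSITIONS = {
    (State.STABLE, "rebalance_triggered"): State.REBALANCE,
    (State.REBALANCE, "all_joined_or_timeout"): State.SYNC,
    (State.SYNC, "leader_sent_assignment"): State.STABLE,
    (State.SYNC, "rebalance_triggered"): State.REBALANCE,
}


class Group:
    def __init__(self) -> None:
        self.state = State.STABLE
        self.generation = 0

    def on_event(self, event: str) -> State:
        nxt = TRANSITIONS.get((self.state, event))
        if nxt is None:
            raise ValueError(f"{event!r} is not valid in state {self.state.value}")
        if nxt is State.SYNC:
            self.generation += 1    # bumped when the join phase ends
        self.state = nxt
        return nxt


group = Group()
for event in ("rebalance_triggered", "all_joined_or_timeout", "leader_sent_assignment"):
    print(event, "->", group.on_event(event).value)
print("generation:", group.generation)
```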
Rebalance is helpful, but sometimes harmful.
1. Transient failure
2. Rolling bounce
Transient unavailability
1. One member couldn’t connect to coordinator
2. Session timeout is reached
3. Require other members to revoke tasks/rejoin
4. Perform assignment
5. Propagate…
6. Done! However, one member is now a zombie
7. Zombie member rejoins
8. Zombie resets generation and rejoins
9. Coordinator requires all members to revoke tasks/rejoin
10. Perform assignment (different from last time)
11. Propagate, and done!
● An unnecessary assignment change
● Solution:
○ Increase session.timeout.ms
○ No rebalance triggered
○ Trade-off: assignment stickiness vs. availability
[Diagram: without the fix, assignments move from (T1, T2, T3), (T4, T5, T6), (T7, T8) to (T1, T2, T7), (T4, T5, T8), (T3, T6); with a larger session.timeout.ms the original assignment is kept.]
Rolling bounce
1. Restart member fleet
2. Some members send leave group requests
3. Members rejoin
4. Member assignment gets shuffled
5. Perform assignment, and hand out new member.id values
6. Propagate…
7. Done!
● Another unnecessary assignment change
● No persistence of member identity. After restart, the member is unknown to the coordinator.
[Diagram: members m1, m2, m3 come back as m4, m5, m6; assignments move from (T1, T2, T3), (T4, T5, T6), (T7, T8) to (T3, T4, T7), (T2, T5, T8), (T1, T6).]
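The identity loss described above can be reproduced with a coordinator that hands out member ids at join time while clients keep them only in memory. A toy sketch, assuming a hypothetical `DynamicCoordinator` with sequential ids (real member ids are not sequential):

```python
import itertools
from typing import List, Optional


class DynamicCoordinator:
    """Hands out a fresh member id to every joiner it does not recognise."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)
        self.members: List[str] = []

    def join(self, member_id: Optional[str] = None) -> str:
        if member_id is None or member_id not in self.members:
            member_id = f"m{next(self._ids)}"
            self.members.append(member_id)
        return member_id

    def leave(self, member_id: str) -> None:
        self.members.remove(member_id)


coord = DynamicCoordinator()
before = [coord.join() for _ in range(3)]
for member_id in before:
    coord.leave(member_id)                 # shutdown sends a leave group request
after = [coord.join() for _ in range(3)]   # restarted processes forgot their ids
print(before, "->", after)
```

Since the restarted processes look like three strangers, the coordinator has no basis for giving them their old tasks back.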
Look into the future: static membership
1. Unique id for each member
2. Enlarge session timeout to make it effective
3. No rebalance if just doing rolling bounce
4. Works well with Kubernetes (K8s)
Static membership
● Give each member a unique id
○ Config: group.instance.id
○ Remember assignment info on coordinator
○ Static member never sends leave group request
○ No rebalance upon known static member rejoin
[Diagram: members with ids w1, w2, w3 own (T1, T2, T3), (T4, T5, T6), and (T7, T8).]
Static membership
1. Restart member fleet
175Coordinator
T1 T2 T3
T5T4 T6
T7
T8
T3T1 T2 T6T4 T5 T8T7
ID: w1 ID: w2 ID: w3
Static membership
1. Restart member fleet
2. Member w1 rejoins
176Coordinator
T1 T2 T3
T5T4 T6
T7
T8
T6T4 T5 T8T7
ID: w1 ID: w2 ID: w3
T3T1 T2
Static membership
1. Restart member fleet
2. Member w1 rejoins
3. Coordinator gets w1’s
assignment
177Coordinator
T1 T2 T3
T5T4 T6
T7
T8
T6T4 T5 T8T7
ID: w1 ID: w2 ID: w3
T3T1 T2
Static membership
1. Restart member fleet
2. Member w1 rejoins
3. Coordinator gets w1’s
assignment
4. Member w1 gets the same
assignment
178Coordinator
T1 T2 T3
T5T4 T6
T7
T8
T6T4 T5 T8T7
ID: w1 ID: w2 ID: w3
T3T1 T2
Static membership
1. …
5. Member w2 rejoins
179Coordinator
T1 T2 T3
T5T4 T6
T7
T8
T8T7
ID: w1 ID: w2 ID: w3
T3T1 T2 T6T4 T5
Static membership
1. …
5. Member w2 rejoins
6. Coordinator gets w2’s
assignment
180Coordinator
T1 T2 T3
T5T4 T6
T7
T8
T8T7
ID: w1 ID: w2 ID: w3
T3T1 T2 T6T4 T5
Static membership
1. …
5. Member w2 rejoins
6. Coordinator gets w2’s
assignment
7. Member w2 gets the same
assignment
181Coordinator
T1 T2 T3
T5T4 T6
T7
T8
T8T7
ID: w1 ID: w2 ID: w3
T3T1 T2 T6T4 T5
Static membership
1. …
8. Member w3 rejoins
182Coordinator
T1 T2 T3
T5T4 T6
T7
T8
T8T7
ID: w1 ID: w2 ID: w3
T3T1 T2 T6T4 T5
Static membership
1. …
8. Member w3 rejoins
9. Coordinator gets w3’s
assignment
183Coordinator
T1 T2 T3
T5T4 T6
T7
T8
T8T7
ID: w1 ID: w2 ID: w3
T3T1 T2 T6T4 T5
Static membership
1. …
8. Member w3 rejoins
9. Coordinator gets w3’s
assignment
10. Member w3 gets the same
assignment
11. Done!
184Coordinator
T1 T2 T3
T5T4 T6
T7
T8
T8T7
ID: w1 ID: w2 ID: w3
T3T1 T2 T6T4 T5
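The static rolling bounce walked through above can be sketched the same way. This is a toy model, not the broker's code: `StaticCoordinator` and its fields are hypothetical, and the handling of unknown instances is deliberately simplified.

```python
class StaticCoordinator:
    """Toy coordinator that remembers assignments per group.instance.id (hypothetical)."""

    def __init__(self, assignment):
        self.assignment = dict(assignment)   # instance id -> tasks, kept across restarts
        self.member_ids = {}                 # instance id -> current member id
        self.generation = 1
        self._next = 1

    def join(self, instance_id):
        known = instance_id in self.assignment
        # The member id still changes on every join; only the instance id is stable.
        self.member_ids[instance_id] = f"m{self._next}"
        self._next += 1
        if not known:
            # Unknown instance: membership really changed, so a rebalance is needed.
            # (The reassignment itself is omitted in this sketch.)
            self.assignment[instance_id] = []
            self.generation += 1
        # Known instance: hand back the remembered assignment, no rebalance.
        return self.assignment[instance_id]

coordinator = StaticCoordinator({
    "w1": ["T1", "T2", "T3"],
    "w2": ["T4", "T5", "T6"],
    "w3": ["T7", "T8"],
})
generation_before = coordinator.generation

# Rolling bounce: no leave-group request is sent; each instance simply rejoins.
after_bounce = {w: coordinator.join(w) for w in ("w1", "w2", "w3")}
print(after_bounce, coordinator.generation == generation_before)
```

Every instance gets its old tasks back and the generation never moves, which is the "no rebalance if just doing rolling bounce" property.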
185
Oops! I configured duplicate instances!
Fencing conflict
instance
● Maintain a mapping from
instance id to member id
186Coordinator
T3T1 T2 T6T4 T5
gID: w1
w1 -> m1,
w2 -> m2
Members
mID: m1
gID: w2
mID: m2
T1 T2 T3
T5T4 T6
Fencing conflict
instance
● Maintain a mapping from
instance id to member id
● Update member id when a
known instance rejoins
187Coordinator
T3T1 T2 T6T4 T5
gID: w1
w1 -> m1,
w2 -> m2
Members
mID: m1
gID: w2
mID: m2
T1 T2 T3
T5T4 T6
Fencing conflict
instance
● Maintain a mapping from
instance id to member id
● Update member id when a
known instance rejoins
1. One conflict member joins
188Coordinator
T3T1 T2 T6T4 T5
gID: w1
w1 -> m1,
w2 -> m2
Members
mID: m1
gID: w2
mID: m2
gID: w2
mID: --
T1 T2 T3
T5T4 T6
Fencing conflict
instance
● Maintain a mapping from
instance id to member id
● Update member id when a
known instance rejoins
1. One conflict member joins
2. Update w2’s member id to m3
189Coordinator
T3T1 T2 T6T4 T5
gID: w1
w1 -> m1,
w2 -> m2 m3
Members
mID: m1
gID: w2
mID: m2
gID: w2
mID: --
T1 T2 T3
T5T4 T6
Fencing conflict
instance
● Maintain a mapping from
instance id to member id
● Update member id when a
known instance rejoins
1. One conflict member joins
2. Update w2’s member id to m3
3. Old member m2 calls heartbeat()
190Coordinator
T3T1 T2 T6T4 T5
gID: w1
w1 -> m1,
w2 -> m2 m3
Members
mID: m1
gID: w2
mID: m2
gID: w2
mID: m3
T6T4 T5
T1 T2 T3
T5T4 T6
hb(m2)
Fencing conflict
instance
● Maintain a mapping from
instance id to member id
● Update member id when a
known instance rejoins
1. One conflict member joins
2. Update w2’s member id to m3
3. Old member m2 calls heartbeat()
4. Member m2 will be fenced since
w2’s member id != m2
191Coordinator
T3T1 T2 T6T4 T5
gID: w1
w1 -> m1,
w2 -> m2 m3
Members
mID: m1
gID: w2
mID: m2
gID: w2
mID: m3
T6T4 T5
T1 T2 T3
T5T4 T6
hb(m2)
Fencing conflict
instance
● Maintain a mapping from
instance id to member id
● Update member id when a
known instance rejoins
1. One conflict member joins
2. Update w2’s member id to m3
3. Old member m2 calls heartbeat()
4. Member m2 will be fenced since
w2’s member id != m2
5. Immediately crash m2
192Coordinator
T3T1 T2 T6T4 T5
gID: w1
w1 -> m1,
w2 -> m2 m3
Members
mID: m1
gID: w2
mID: m2
gID: w2
mID: m3
T6T4 T5
T1 T2 T3
T5T4 T6
hb(m2)
Fencing conflict
instance
1. One conflict member joins
2. Update w2’s member id to m3
3. Old member m2 calls heartbeat()
4. Member m2 will be fenced since
w2’s member id != m2
5. Immediately crash m2
6. Group stays stable
193Coordinator
T3T1 T2
gID: w1
w1 -> m1,
w2 -> m3
Members
mID: m1
gID: w2
mID: m3
T6T4 T5
T1 T2 T3
T5T4 T6
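The fencing steps above can be sketched with a small instance-id to member-id map. This is an illustrative model only: `FencingCoordinator`, `FencedError`, and the method signatures are hypothetical and do not mirror the real broker or client APIs.

```python
class FencedError(Exception):
    """Raised when a request carries a stale member id (hypothetical name)."""

class FencingCoordinator:
    def __init__(self):
        self.member_of = {}   # group.instance.id -> latest member id
        self._next = 1

    def join(self, instance_id):
        # A (re)joining instance always receives a fresh member id, which
        # replaces whatever id was previously mapped to that instance.
        member_id = f"m{self._next}"
        self._next += 1
        self.member_of[instance_id] = member_id
        return member_id

    def heartbeat(self, instance_id, member_id):
        # Only the member id currently mapped to the instance is accepted.
        if self.member_of.get(instance_id) != member_id:
            raise FencedError(f"{member_id} no longer owns {instance_id}")
        return "ok"

coordinator = FencingCoordinator()
m1 = coordinator.join("w1")
m2 = coordinator.join("w2")
m3 = coordinator.join("w2")      # duplicate instance configured with id w2

try:
    coordinator.heartbeat("w2", m2)
    fenced = False
except FencedError:
    fenced = True                # the old member should now shut down

print(fenced, coordinator.member_of)
```

The newer owner of w2 keeps heartbeating normally, so the group itself stays stable while the stale member is rejected.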
Fencing conflict
instance (Nice to have)
● A caveat for concurrent joining
194Coordinator
T3T1 T2
gID: w1
mID: m1
T1 T2 T3
w1 -> m1
Members
Fencing conflict
instance (Nice to have)
● A caveat for concurrent joining
1. First member joins with id w2
195Coordinator
gID: w1
w1 -> m1
w2 -> m2
Members
mID: m1
T1 T2 T3
gID: w2
mID: --
T3T1 T2
Fencing conflict
instance (Nice to have)
● A caveat for concurrent joining
1. First member joins with id w2
2. In the meantime, a conflict w2
joins
196Coordinator
gID: w1
w1 -> m1
w2 -> m2
Members
mID: m1
T1 T2 T3
gID: w2
mID: --
T3T1 T2
gID: w2
mID: --
Fencing conflict
instance (Nice to have)
● A caveat for concurrent joining
1. First member joins with id w2
2. In the meantime, a conflict w2
joins
3. Replace w2’s member id with m3
197Coordinator
gID: w1
w1 -> m1
w2 -> m2, m3
Members
mID: m1
T1 T2 T3
gID: w2
mID: --
T3T1 T2
gID: w2
mID: --
Fencing conflict
instance (Nice to have)
● A caveat for concurrent joining
1. First member joins with id w2
2. In the meantime, a conflict w2
joins
3. Replace w2’s member id with m3
4. Coordinator requires w1 to rejoin
198Coordinator
gID: w1
w1 -> m1
w2 -> m2 m3
Members
mID: m1
T1 T2 T3
gID: w2
mID: --
gID: w2
mID: --
T3T1 T2
Fencing conflict
instance (Nice to have)
● A caveat for concurrent joining
1. First member joins with id w2
2. In the meantime, a conflict w2
joins
3. Replace w2’s member id with m3
4. Coordinator requires w1 to rejoin
5. Group performs assignment
199Coordinator
gID: w1
w1 -> m1
w2 -> m2 m3
Members
mID: m1
gID: w2
mID: --
gID: w2
mID: --
T3T1 T2
T1 T2 T3
Fencing conflict
instance (Nice to have)
● A caveat for concurrent joining
1. First member joins with id w2
2. In the meantime, a conflict w2
joins
3. Replace w2’s member id with m3
4. Coordinator requires w1 to rejoin
5. Group performs assignment
6. Propagate new assignment
200Coordinator
gID: w1
w1 -> m1
w2 -> m3
Members
mID: m1
gID: w2
mID: --
gID: w2
mID: --
T1 T2
T1 T2 T3
Fencing conflict
instance (Nice to have)
● A caveat for concurrent joining
1. First member joins with id w2
2. In the meantime, a conflict w2
joins
3. Replace w2’s member id with m3
4. Coordinator requires w1 to rejoin
5. Group performs assignment
6. Propagate new assignment
7. Done!
201Coordinator
gID: w1
w1 -> m1
w2 -> m3
Members
mID: m1
gID: w2
mID: --
gID: w2
mID: m3
T1 T2
T1 T2 T3
T3
Fencing conflict
instance (Nice to have)
● A caveat for concurrent joining
1. …
8. Out of scope member times out,
rejoining
202Coordinator
gID: w1
w1 -> m1
w2 -> m3
Members
mID: m1
gID: w2
mID: --
gID: w2
mID: m3
T1 T2
T1 T2 T3
T3
Fencing conflict
instance (Nice to have)
● A caveat for concurrent joining
1. …
8. Out of scope member times out,
rejoining
9. Update w2’s member id to m4
203Coordinator
gID: w1
w1 -> m1
w2 -> m3 m4
Members
mID: m1
gID: w2
mID: --
gID: w2
mID: m3
T1 T2
T1 T2 T3
T3
Fencing conflict
instance (Nice to have)
● A caveat for concurrent joining
1. …
8. Out of scope member times out,
rejoining
9. Update w2’s member id to m4
10. Get conflict assignment
204Coordinator
gID: w1
w1 -> m1
w2 -> m3 m4
Members
mID: m1
gID: w2
mID: m4
gID: w2
mID: m3
T1 T2
T1 T2 T3
T3T3
Fencing conflict
instance (Nice to have)
● A caveat for concurrent joining
1. …
8. Out of scope member times out,
rejoining
9. Update w2’s member id to m4
10. Get conflict assignment
11. Old member m3 calls heartbeat()
205Coordinator
gID: w1
w1 -> m1
w2 -> m3 m4
Members
mID: m1
gID: w2
mID: m4
gID: w2
mID: m3
T1 T2
T1 T2 T3
T3T3
hb(m3)
Fencing conflict
instance (Nice to have)
● A caveat for concurrent joining
1. …
8. Out of scope member times out,
rejoining
9. Update w2’s member id to m4
10. Get conflict assignment
11. Old member m3 calls heartbeat()
12. Mismatch member id, fencing
m3
206Coordinator
gID: w1
w1 -> m1
w2 -> m3 m4
Members
mID: m1
gID: w2
mID: m4
gID: w2
mID: m3
T1 T2
T1 T2 T3
T3T3
hb(m3)
Fencing conflict
instance (Nice to have)
● A caveat for concurrent joining
● Downsides:
○ Risk of concurrent
processing
○ Delayed conflict detection
207Coordinator
gID: w1
w1 -> m1
w2 -> m3 m4
Members
mID: m1
gID: w2
mID: m4
gID: w2
mID: m3
T1 T2
T1 T2 T3
T3T3
hb(m3)
Fencing conflict
instance (Nice to have)
● Fence against callback
208Coordinator
T3T1 T2
gID: w1
mID: m1
T1 T2 T3
w1 -> m1
MembersJoin callback
Fencing conflict
instance (Nice to have)
● Fence against callback
1. New member with id w2 joins,
registering a callback
209Coordinator
gID: w1
w1 -> m1
w2 -> m2
Members
mID: m1
T1 T2 T3
gID: w2
mID: --
<w2, m2>
Join callback
T3T1 T2
Fencing conflict
instance (Nice to have)
● Fence against callback
1. New member with id w2 joins,
registering a callback
2. A conflict member joins at the
same time
210Coordinator
gID: w1
w1 -> m1
w2 -> m2
Members
mID: m1
T1 T2 T3
gID: w2
mID: --
gID: w2
mID: --
<w1, m1>
<w2, m2>
Join callback
T3T1 T2
Fencing conflict
instance (Nice to have)
● Fence against callback
1. New member with id w2 joins,
registering a callback
2. A conflict member joins at the
same time
3. Replace member id with m3
211Coordinator
gID: w1
w1 -> m1
w2 -> m2, m3
Members
mID: m1
T1 T2 T3
gID: w2
mID: --
gID: w2
mID: --
<w1, m1>
<w2, m2>
Join callback
T3T1 T2
Fencing conflict
instance (Nice to have)
● Fence against callback
1. New member with id w2 joins,
registering a callback
2. A conflict member joins at the
same time
3. Replace member id with m3
4. Return m2 callback with fenced
exception
212Coordinator
gID: w1
w1 -> m1,
w2 -> m3
Members
mID: m1
T1 T2 T3
gID: w2
mID: --
gID: w2
mID: --
<w1, m1>
<w2, m2> X
Join callback
T3T1 T2
Fencing conflict
instance (Nice to have)
● Fence against callback
1. New member with id w2 joins,
registering a callback
2. A conflict member joins at the
same time
3. Replace member id with m3
4. Return m2 callback with fenced
exception
5. Shutdown m2 immediately
213Coordinator
gID: w1
w1 -> m1,
w2 -> m3
Members
mID: m1
T1 T2 T3
gID: w2
mID: --
gID: w2
mID: --
<w1, m1>
<w2, m2> X
Join callback
T3T1 T2
Fencing conflict
instance (Nice to have)
● Fence against callback
1. …
6. Require all members to
revoke/rejoin
214Coordinator
gID: w1
w1 -> m1
w2 -> m3
Members
mID: m1
gID: w2
mID: --
<w1, m1>
<w2, m3>
Join callback
T3T1 T2
T1 T2 T3
Fencing conflict
instance (Nice to have)
● Fence against callback
1. …
6. Require all members to
revoke/rejoin
7. Perform assignment
215Coordinator
gID: w1
w1 -> m1
w2 -> m3
Members
mID: m1
T1 T2 T3
gID: w2
mID: --
<w1, m1>
<w2, m3>
Join callback
T3T1 T2
Fencing conflict
instance (Nice to have)
● Fence against callback
1. …
6. Require all members to
revoke/rejoin
7. Perform assignment
8. Propagate through callbacks,
done!
Coordinator
gID: w1
w1 -> m1
w2 -> m3
Members
mID: m1
gID: w2
mID: --
Join callback
T3T3T2
T1 T2 T3
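The "fence against callback" idea can be sketched as a coordinator that keeps one pending join callback per instance id and completes the older one with an error as soon as a duplicate arrives. Everything here is an assumption made for illustration: the class `CallbackFencingCoordinator`, the `"FENCED"` marker, and the task split in the final assignment.

```python
class CallbackFencingCoordinator:
    """Toy coordinator holding one pending join callback per instance (hypothetical)."""

    def __init__(self):
        self.pending = {}     # group.instance.id -> (member id, callback)
        self._next = 1

    def join(self, instance_id, callback):
        member_id = f"m{self._next}"
        self._next += 1
        previous = self.pending.get(instance_id)
        if previous is not None:
            old_member_id, old_callback = previous
            # Complete the older join immediately with a fenced error,
            # instead of waiting for its next heartbeat.
            old_callback(old_member_id, "FENCED")
        self.pending[instance_id] = (member_id, callback)
        return member_id

    def complete_rebalance(self, assignment):
        # Propagate the assignment through the callbacks that are still pending.
        for instance_id, (member_id, callback) in self.pending.items():
            callback(member_id, assignment.get(instance_id, []))
        self.pending.clear()

events = []

def record(member_id, result):
    events.append((member_id, result))

coordinator = CallbackFencingCoordinator()
coordinator.join("w1", record)          # becomes m1
coordinator.join("w2", record)          # becomes m2
coordinator.join("w2", record)          # becomes m3 and fences m2's pending join
coordinator.complete_rebalance({"w1": ["T1", "T2"], "w2": ["T3"]})
print(events)
```

Compared with heartbeat-based fencing, the stale member learns about the conflict before any assignment is handed out, which avoids the window of concurrent processing described earlier.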
217
Lastly, some debug tips
218
Developer
debugging …
219
Developer
debugging …
Find your
server!
220
Developer
debugging
1. Log in to your client
application
…
Find your
server!
221
Developer
debugging
1. Log in to your client
application
2. Look into client log
and search for
“Discovered group
coordinator”
[2019-06-14 00:23:47,020] INFO [Consumer instanceId=consumer-A-2, clientId=StaticMemberTestClient-019a5efe-87ef-4c62-9891-27330df67049-StreamThread-2-consumer, groupId=StaticMemberTestClient] Discovered group coordinator ducker04:9092 (id: 2147483645 rack: null) (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
222
Developer
debugging
1. Log in to your client
application
2. Look into client log
and search for
“Discovered group
coordinator”
3. Find your server and
log in
…
ducker04
Find your
server!
223
Developer
debugging
1. Log in to your client
application
2. Look into client log
and search for
“Discovered group
coordinator”
3. Find your server and
log in
4. Check server log for
“Preparing to rebalance”
and its “reason”
[2019-06-14 00:23:47,389] INFO [GroupCoordinator 2]: Preparing to rebalance group StaticMemberTestClient in state PreparingRebalance with old generation 0 (__consumer_offsets-2) (reason: Adding new member consumer-A-1-1560471827287 with group instanceid Some(consumer-A-1)) (kafka.coordinator.group.GroupCoordinator)
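Step 2 of the procedure above can be automated with a small log scan. This is a sketch that assumes the client log line has the shape shown on the earlier slide; the function name `find_coordinators` and the second, unrelated sample line are made up for the example.

```python
import re

# Pattern for the client log line that names the group coordinator.
COORDINATOR_RE = re.compile(
    r"Discovered group coordinator (?P<host>[^:\s]+):(?P<port>\d+)"
)

def find_coordinators(log_lines):
    """Return (host, port) pairs for every coordinator discovery in the log."""
    found = []
    for line in log_lines:
        match = COORDINATOR_RE.search(line)
        if match:
            found.append((match.group("host"), int(match.group("port"))))
    return found

sample = [
    "[2019-06-14 00:23:47,020] INFO [Consumer instanceId=consumer-A-2, "
    "groupId=StaticMemberTestClient] Discovered group coordinator "
    "ducker04:9092 (id: 2147483645 rack: null)",
    "[2019-06-14 00:23:47,100] INFO some unrelated line",   # invented filler
]
coordinators = find_coordinators(sample)
print(coordinators)
```

The host it returns is the broker to log in to for step 3; in a real deployment the lines would come from the client's log file rather than an in-memory list.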
224
Developer
debugging
Server metrics:
● NumGroupsPreparingRebalance
● NumGroupsCompletingRebalance
● NumGroupsStable
● NumGroupsDead
● NumGroupsEmpty
…
225
Developer
debugging
Client metrics:
● Join-rate/total
● Join-time-avg/max
● Sync-rate/total
● Sync-time-avg/max
● Assigned-partitions
● Commit-rate/total
● Heartbeat-rate/total
…
226
Takeaways
227
Takeaways
● Different timeouts:
○ Enlarge your session.timeout.ms to achieve better stability
○ max.poll.interval.ms bounds how long a member may take between poll() calls
○ rebalance.timeout.ms removes members that have not rejoined by the deadline
228
Takeaways
● Different timeouts:
○ Enlarge your session.timeout.ms to achieve better stability
○ max.poll.interval.ms bounds how long a member may take between poll() calls
○ rebalance.timeout.ms removes members that have not rejoined by the deadline
● What is group generation?
229
Takeaways
● Different timeouts:
○ Enlarge your session.timeout.ms to achieve better stability
○ max.poll.interval.ms bounds how long a member may take between poll() calls
○ rebalance.timeout.ms removes members that have not rejoined by the deadline
● What is group generation?
● Why do we let the client do assignment?
230
Takeaways
● Different timeouts:
○ Enlarge your session.timeout.ms to achieve better stability
○ max.poll.interval.ms bounds how long a member may take between poll() calls
○ rebalance.timeout.ms removes members that have not rejoined by the deadline
● What is group generation?
● Why do we let the client do assignment?
● Static membership is generally available in AK 2.3, for Consumer and Streams
○ Upgrade your broker to 2.3
○ Set a unique group.instance.id for each client (and monitor for fencing)
○ Make the session timeout long enough
231
Resources
• KIP-62: Allow consumer to send heartbeats from a background thread
• KIP-180: Add a broker metric specifying the number of consumer group rebalances in progress
• KIP-345: Introduce static membership protocol to reduce consumer rebalances (accepted)
• Kafka Client redesign proposal
• "The Magical Rebalance Protocol of Apache Kafka" by Gwen Shapira (Strange Loop Talk, Sep 2018)
https://www.youtube.com/watch?v=MmLezWRI3Ys&t=8s
232
Special thanks to
Guozhang Wang, Jason Gustafson, Liquan
Pei and Matthias J Sax
233
KS19Meetup.
CONFLUENT COMMUNITY DISCOUNT CODE
25% OFF*
*Standard Priced Conference pass

Más contenido relacionado

Más de confluent

Citi TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep DiveCiti TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep Diveconfluent
 
Build real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with ConfluentBuild real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with Confluentconfluent
 
Q&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service MeshQ&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service Meshconfluent
 
Citi Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka MicroservicesCiti Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka Microservicesconfluent
 
Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3confluent
 
Citi Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging ModernizationCiti Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging Modernizationconfluent
 
Citi Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time dataCiti Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time dataconfluent
 
Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2confluent
 
Data In Motion Paris 2023
Data In Motion Paris 2023Data In Motion Paris 2023
Data In Motion Paris 2023confluent
 
Confluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with SynthesisConfluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with Synthesisconfluent
 
The Future of Application Development - API Days - Melbourne 2023
The Future of Application Development - API Days - Melbourne 2023The Future of Application Development - API Days - Melbourne 2023
The Future of Application Development - API Days - Melbourne 2023confluent
 
The Playful Bond Between REST And Data Streams
The Playful Bond Between REST And Data StreamsThe Playful Bond Between REST And Data Streams
The Playful Bond Between REST And Data Streamsconfluent
 
The Journey to Data Mesh with Confluent
The Journey to Data Mesh with ConfluentThe Journey to Data Mesh with Confluent
The Journey to Data Mesh with Confluentconfluent
 
Citi Tech Talk: Monitoring and Performance
Citi Tech Talk: Monitoring and PerformanceCiti Tech Talk: Monitoring and Performance
Citi Tech Talk: Monitoring and Performanceconfluent
 
Confluent Partner Tech Talk with Reply
Confluent Partner Tech Talk with ReplyConfluent Partner Tech Talk with Reply
Confluent Partner Tech Talk with Replyconfluent
 
Citi Tech Talk Disaster Recovery Solutions Deep Dive
Citi Tech Talk  Disaster Recovery Solutions Deep DiveCiti Tech Talk  Disaster Recovery Solutions Deep Dive
Citi Tech Talk Disaster Recovery Solutions Deep Diveconfluent
 
Citi Tech Talk: Hybrid Cloud
Citi Tech Talk: Hybrid CloudCiti Tech Talk: Hybrid Cloud
Citi Tech Talk: Hybrid Cloudconfluent
 
Partner Tech Talk Q3: Q&A with PS - Migration and Upgrade
Partner Tech Talk Q3: Q&A with PS - Migration and UpgradePartner Tech Talk Q3: Q&A with PS - Migration and Upgrade
Partner Tech Talk Q3: Q&A with PS - Migration and Upgradeconfluent
 
Confluent Partner Tech Talk with QLIK
Confluent Partner Tech Talk with QLIKConfluent Partner Tech Talk with QLIK
Confluent Partner Tech Talk with QLIKconfluent
 
Real-time Streaming for Government and the Public Sector
Real-time Streaming for Government and the Public SectorReal-time Streaming for Government and the Public Sector
Real-time Streaming for Government and the Public Sectorconfluent
 

Más de confluent (20)

Citi TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep DiveCiti TechTalk Session 2: Kafka Deep Dive
Citi TechTalk Session 2: Kafka Deep Dive
 
Build real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with ConfluentBuild real-time streaming data pipelines to AWS with Confluent
Build real-time streaming data pipelines to AWS with Confluent
 
Q&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service MeshQ&A with Confluent Professional Services: Confluent Service Mesh
Q&A with Confluent Professional Services: Confluent Service Mesh
 
Citi Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka MicroservicesCiti Tech Talk: Event Driven Kafka Microservices
Citi Tech Talk: Event Driven Kafka Microservices
 
Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3Confluent & GSI Webinars series - Session 3
Confluent & GSI Webinars series - Session 3
 
Citi Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging ModernizationCiti Tech Talk: Messaging Modernization
Citi Tech Talk: Messaging Modernization
 
Citi Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time dataCiti Tech Talk: Data Governance for streaming and real time data
Citi Tech Talk: Data Governance for streaming and real time data
 
Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2Confluent & GSI Webinars series: Session 2
Confluent & GSI Webinars series: Session 2
 
Data In Motion Paris 2023
Data In Motion Paris 2023Data In Motion Paris 2023
Data In Motion Paris 2023
 
Confluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with SynthesisConfluent Partner Tech Talk with Synthesis
Confluent Partner Tech Talk with Synthesis
 
The Future of Application Development - API Days - Melbourne 2023
The Future of Application Development - API Days - Melbourne 2023The Future of Application Development - API Days - Melbourne 2023
The Future of Application Development - API Days - Melbourne 2023
 
The Playful Bond Between REST And Data Streams
The Playful Bond Between REST And Data StreamsThe Playful Bond Between REST And Data Streams
The Playful Bond Between REST And Data Streams
 
The Journey to Data Mesh with Confluent
The Journey to Data Mesh with ConfluentThe Journey to Data Mesh with Confluent
The Journey to Data Mesh with Confluent
 
Citi Tech Talk: Monitoring and Performance
Citi Tech Talk: Monitoring and PerformanceCiti Tech Talk: Monitoring and Performance
Citi Tech Talk: Monitoring and Performance
 
Confluent Partner Tech Talk with Reply
Confluent Partner Tech Talk with ReplyConfluent Partner Tech Talk with Reply
Confluent Partner Tech Talk with Reply
 
Citi Tech Talk Disaster Recovery Solutions Deep Dive
Citi Tech Talk  Disaster Recovery Solutions Deep DiveCiti Tech Talk  Disaster Recovery Solutions Deep Dive
Citi Tech Talk Disaster Recovery Solutions Deep Dive
 
Citi Tech Talk: Hybrid Cloud
Citi Tech Talk: Hybrid CloudCiti Tech Talk: Hybrid Cloud
Citi Tech Talk: Hybrid Cloud
 
Partner Tech Talk Q3: Q&A with PS - Migration and Upgrade
Partner Tech Talk Q3: Q&A with PS - Migration and UpgradePartner Tech Talk Q3: Q&A with PS - Migration and Upgrade
Partner Tech Talk Q3: Q&A with PS - Migration and Upgrade
 
Confluent Partner Tech Talk with QLIK
Confluent Partner Tech Talk with QLIKConfluent Partner Tech Talk with QLIK
Confluent Partner Tech Talk with QLIK
 
Real-time Streaming for Government and the Public Sector
Real-time Streaming for Government and the Public SectorReal-time Streaming for Government and the Public Sector
Real-time Streaming for Government and the Public Sector
 

Último

EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWERMadyBayot
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu SubbuApidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbuapidays
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024The Digital Insurer
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...apidays
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfOverkill Security
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)wesley chun
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
Data Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonData Cloud, More than a CDP by Matt Robison
Data Cloud, More than a CDP by Matt RobisonAnna Loughnan Colquhoun
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDropbox
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 

Último (20)

EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu SubbuApidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
FWD Group - Insurer Innovation Award 2024

Rebalance Protocol Inside-out: a Developer Perspective

  • 1. Rebalance Protocol inside-out: a Developer Perspective. Boyang Chen
  • 2. Agenda ● What is rebalancing? ● How to design a rebalancing protocol? ● Unnecessary rebalances ● Look into the future: static membership ● Debugging tip
  • 7. Boyang Chen’s Bio ● Software engineer (Kafka Streams) ● Software engineer (Ads infrastructure) ● Kafka Summit SF 2018: Building Pinterest Real-Time Ads Platform Using Kafka Streams ● Kafka Summit SF 2019: Static Membership: Rebalance Strategy Designed for the Cloud
  • 16. What is rebalancing? ● Group membership ● Resource assignment ● Example: Coordinator – Worker model 1. New members join the group 2. Perform assignment 3. Propagate 4. Done! [Diagram: the coordinator holds tasks T1–T6; one member receives T1–T3, the other T4–T6]
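The coordinator–worker example above can be sketched in a few lines. This is a minimal illustration, not Kafka's implementation: `assign_contiguous` and the worker names are hypothetical, and the contiguous split is simply one policy that reproduces the outcome shown on the slide (T1–T3 to one member, T4–T6 to the other).

```python
def assign_contiguous(tasks, members):
    """Split tasks into contiguous chunks, one chunk per member."""
    if not members:
        return {}
    base, extra = divmod(len(tasks), len(members))
    assignment, start = {}, 0
    for index, member in enumerate(members):
        # The first `extra` members take one additional task each.
        size = base + (1 if index < extra else 0)
        assignment[member] = tasks[start:start + size]
        start += size
    return assignment


# Hypothetical group: two workers join, the coordinator assigns six tasks.
tasks = ["T1", "T2", "T3", "T4", "T5", "T6"]
members = ["worker-1", "worker-2"]
assignment = assign_contiguous(tasks, members)
for member, owned in assignment.items():
    print(member, owned)
```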
  • 23. How to design a rebalance protocol? 1. Membership changes: (a) Member joins/leaves the group (b) Member times out (c) Zombie member fencing 2. Assignor changes 3. Task changes
  • 31. Catch membership change 1. Spin up a new member 2. A new member joins 3. Revoke active tasks 4. Require members to rejoin 5. Perform new assignment 6. Propagate to members 7. Done! [Diagram: before, two members own T1–T3 and T4–T6; after, three members own {T1, T4}, {T2, T5} and {T3, T6}]
  • 39. Catch membership change 1. Remove an active member 2. Member sends leave group request 3. Revoke other members’ active tasks 4. Require members to rejoin 5. Perform assignment 6. Propagate to members 7. Done! [Diagram: before, three members own {T1, T4}, {T2, T5} and {T3, T6}; after, the two remaining members own T1–T3 and T4–T6]
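The join and leave walkthroughs follow the same pattern: a membership change makes every member give up its tasks, after which a fresh assignment is computed and propagated. A minimal sketch of that eager behaviour, assuming a hypothetical in-memory `Coordinator` class and a round-robin policy throughout (the slides do not name the policy used in these diagrams):

```python
class Coordinator:
    """Toy coordinator: any join or leave triggers a full reassignment."""

    def __init__(self, tasks):
        self.tasks = list(tasks)
        self.members = []
        self.assignment = {}

    def _rebalance(self):
        # Eager protocol: all active tasks are revoked before reassigning.
        self.assignment = {member: [] for member in self.members}
        for index, task in enumerate(self.tasks):
            if self.members:
                owner = self.members[index % len(self.members)]
                self.assignment[owner].append(task)

    def join(self, member):
        self.members.append(member)
        self._rebalance()

    def leave(self, member):
        # Corresponds to the member sending a leave group request.
        self.members.remove(member)
        self._rebalance()


coordinator = Coordinator(["T1", "T2", "T3", "T4", "T5", "T6"])
coordinator.join("m1")
coordinator.join("m2")
coordinator.join("m3")
after_join = dict(coordinator.assignment)
coordinator.leave("m3")
after_leave = dict(coordinator.assignment)
print(after_join)
print(after_leave)
```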
  • 40. How to design a rebalance protocol? 1. Membership changes: (a) Member joins/leaves the group (b) Member times out (c) Zombie member fencing 2. Assignor changes 3. Task changes
  • 43. Timeout configs ● Liveness guarantee ○ session.timeout.ms ● Progress guarantee ○ max.poll.interval.ms ○ rebalance.timeout.ms
  • 45. Session timeout ● Background thread sends heartbeat
  • 51. Session timeout 1. Member crashes without sending leave group 2. Session timeout is reached 3. Require other members to revoke tasks/rejoin 4. Perform assignment 5. Propagate… 6. Done!
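The session timeout amounts to a liveness check on the coordinator side: a member that has not sent a heartbeat within session.timeout.ms is considered dead. The sketch below is an assumption-laden illustration; `HeartbeatTracker`, the 10-second timeout and the timestamps are all made up for the example.

```python
class HeartbeatTracker:
    """Toy liveness tracker keyed by member id (times in milliseconds)."""

    def __init__(self, session_timeout_ms):
        self.session_timeout_ms = session_timeout_ms
        self.last_heartbeat_ms = {}

    def heartbeat(self, member, now_ms):
        self.last_heartbeat_ms[member] = now_ms

    def expire(self, now_ms):
        """Remove and return members whose last heartbeat is too old."""
        expired = sorted(
            member
            for member, last in self.last_heartbeat_ms.items()
            if now_ms - last > self.session_timeout_ms
        )
        for member in expired:
            del self.last_heartbeat_ms[member]
        return expired


tracker = HeartbeatTracker(session_timeout_ms=10_000)  # illustrative value
for now in (0, 3_000, 6_000, 9_000, 12_000):
    tracker.heartbeat("m1", now)      # m1 keeps heartbeating
tracker.heartbeat("m2", 3_000)        # m2 crashes right after this heartbeat

still_alive_at_12s = tracker.expire(12_000)   # m2 silent for 9 s: not yet expired
expired_at_14s = tracker.expire(14_000)       # m2 silent for 11 s: expired
print(still_alive_at_12s, expired_at_14s)
```

Once `expire` returns a non-empty list, the coordinator would start the revoke/rejoin steps listed above.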
  • 61. Max poll interval timeout ● Poll – Process – Commit ● One process takes too long ● Reach timeout limit 1. Member takes too long to process (>= max.poll.interval.ms between two Poll() calls) 2. Background thread stops heartbeat and sends leave group 3. Ask others to revoke task/rejoin 4. Perform assignment 5. Propagate to members, done!
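The progress check can be pictured as measuring the gap between consecutive Poll() calls. The helper below is a hypothetical sketch, not client code: `first_poll_violation` and the sample durations are assumptions, and it uses the `>=` comparison exactly as written on the slide.

```python
def first_poll_violation(processing_times_ms, max_poll_interval_ms):
    """Return the index of the first poll-process-commit round whose
    processing time reaches the limit, or None if every round is in time."""
    for index, elapsed_ms in enumerate(processing_times_ms):
        if elapsed_ms >= max_poll_interval_ms:
            return index
    return None


# Illustrative numbers: two quick rounds, then one that takes 400 seconds.
limit_ms = 300_000
healthy = first_poll_violation([200, 350], limit_ms)
stuck = first_poll_violation([200, 350, 400_000], limit_ms)
print(healthy, stuck)
```

In the walkthrough above, the round flagged here is the point where the background thread stops heartbeating and sends the leave group request.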
  • 65. Rebalance timeout ● Max time for a member to rejoin during rebalance ● Use the max value of max.poll.interval among all clients ○ Member has to finish ongoing work ● Track with given member id ● Register callback [Diagram: the coordinator tracks members m1 and m2 and holds a join callback]
  • 71. Rebalance timeout ● … ● Register callback 1. Group starts to rebalance 2. Member m1 rejoins successfully 3. Member m2 gets stuck 4. Rebalance timeout reached 5. Perform assignment 6. Propagate, done! [Diagram: only m1 remains in the member list]
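The join phase can be sketched as a deadline: members that rejoin before the rebalance timeout stay in the group, the rest are dropped before assignment. The function name, the dictionary of rejoin times and the numbers below are illustrative assumptions.

```python
def complete_join_phase(members, rejoin_times_ms, rebalance_timeout_ms):
    """Split members into those that rejoined in time and those kicked out.

    rejoin_times_ms maps member id -> milliseconds after the rebalance
    started at which the member rejoined; a missing entry means it never did.
    """
    joined, kicked_out = [], []
    for member in members:
        rejoined_at = rejoin_times_ms.get(member)
        if rejoined_at is not None and rejoined_at <= rebalance_timeout_ms:
            joined.append(member)
        else:
            kicked_out.append(member)
    return joined, kicked_out


# m1 rejoins after 500 ms; m2 is stuck and never rejoins.
joined, kicked_out = complete_join_phase(
    members=["m1", "m2"],
    rejoin_times_ms={"m1": 500},
    rebalance_timeout_ms=300_000,  # illustrative value
)
print(joined, kicked_out)
```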
  • 72. How to design a rebalance protocol? 1. Membership changes: (a) Member joins/leaves the group (b) Member times out (c) Zombie member fencing 2. Assignor changes 3. Task changes
  • 74. Fencing zombie ● What if a zombie member rejoins?
  • 80. Fencing zombie ● Bump generation number after each rebalance 1. Member m1 rejoins group 2. All members rejoin/revoke tasks 3. Bump generation number 4. Perform assignment 5. Propagate, and done! 6. Group currently stable at generation 2 [Diagram: m1 and m2 both hold gen: 2]
  • 87. Fencing zombie 1. … 6. Group currently stable at generation 2 7. Rebalance triggers again 8. Member m1 rejoins 9. Member m2 has a transient failure 10. Rebalance timeout is reached, kicking out m2 11. Bump generation to 3 12. Perform assignment 13. Propagate, and done! 14. Group stable at generation 3 [Diagram: m1 holds gen: 3; m2 still holds gen: 2 and is no longer in the member list]
  • 92. Fencing zombie 1. … 14. Group stable at generation 3 15. Member m2 rejoins in zombie mode 16. Fenced by mismatched generation 17. Reset local generation info 18. Rejoin as unknown member without generation 19. Registered as m3 20. Group transitions to rebalance
  • 95. Fencing zombie 1. … 20. Group transitions to rebalance 21. Bump generation to 4 22. Perform assignment 23. Propagate, and done! [Diagram: members m1 and m3 both hold gen: 4]
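The fencing scenario boils down to one rule: the coordinator rejects any request whose member id or generation does not match the group's current state. The `Group` class below is a hypothetical sketch that replays the m1/m2/m3 story from the slides; its method names are not taken from Kafka.

```python
class Group:
    """Toy group state: a generation counter plus the current member set."""

    def __init__(self):
        self.generation = 0
        self.members = set()
        self._counter = 0

    def register(self):
        """Hand out a fresh member id to an unknown member."""
        self._counter += 1
        return f"m{self._counter}"

    def rebalance(self, rejoined):
        """Complete a rebalance with the members that rejoined in time."""
        self.members = set(rejoined)
        self.generation += 1
        return self.generation

    def accepts(self, member_id, generation):
        """A request is fenced unless both member id and generation match."""
        return member_id in self.members and generation == self.generation


group = Group()
m1, m2 = group.register(), group.register()
group.rebalance([m1, m2])                 # generation 1
group.rebalance([m1, m2])                 # generation 2
group.rebalance([m1])                     # m2 misses the deadline: generation 3

zombie_accepted = group.accepts(m2, 2)    # m2 comes back with stale state
m3 = group.register()                     # it resets and rejoins as unknown
final_generation = group.rebalance([m1, m3])
print(zombie_accepted, m3, final_generation)
```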
  • 96. How to design a rebalance protocol? 1. Membership changes: (a) Member joins/leaves the group (b) Member times out (c) Zombie member fencing 2. Assignor changes 3. Task changes
  • 97. Should we use the broker or the client as the assignor?
  • 101. Do assignment on broker 1. Stable with range assignment 2. Redeploy coordinator to use round robin (RR) 3. Coordinator bounce completes ● Not an ideal approach ○ Requires restarting a stateful service ○ Affects other clients ○ Data protocol consistency
  • 108. Do assignment on client 1. Designated leader member 2. Redeploy leader to use RR 3. Leader restarted 4. Leader asks coordinator to rebalance 5. Coordinator requires members to revoke tasks/rejoin 6. Coordinator informs the leader that all members have rejoined
  • 111. Do assignment on client 1. … 5. Coordinator requires members to revoke tasks/rejoin 6. Coordinator informs the leader that all members have rejoined 7. Leader performs assignment 8. Leader calls sync group to send back assignment 9. Coordinator propagates… 10. Done! [Diagram: under range, members own T1–T3, T4–T6 and T7–T8; under round robin, they own {T1, T4, T7}, {T2, T5, T8} and {T3, T6}]
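With client-side assignment, switching strategy only means the leader runs a different function over the member list the coordinator hands it. The two functions below are simplified sketches of range-style and round-robin-style policies over a flat task list (the names and the `ASSIGNORS` table are assumptions, not the real assignor classes); they reproduce the before/after layouts in the diagram.

```python
def range_assignor(tasks, members):
    """Give each member a contiguous block of tasks."""
    base, extra = divmod(len(tasks), len(members))
    assignment, start = {}, 0
    for index, member in enumerate(members):
        size = base + (1 if index < extra else 0)
        assignment[member] = tasks[start:start + size]
        start += size
    return assignment


def round_robin_assignor(tasks, members):
    """Deal tasks out to members one at a time."""
    assignment = {member: [] for member in members}
    for index, task in enumerate(tasks):
        assignment[members[index % len(members)]].append(task)
    return assignment


# The leader picks its strategy locally; the coordinator only relays the result.
ASSIGNORS = {"range": range_assignor, "roundrobin": round_robin_assignor}

tasks = [f"T{n}" for n in range(1, 9)]
members = ["m1", "m2", "m3"]
before = ASSIGNORS["range"](tasks, members)
after = ASSIGNORS["roundrobin"](tasks, members)
print(before)
print(after)
```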
  • 112. How to design a rebalance protocol? 1. Membership changes: (a) Member joins/leaves the group (b) Member times out (c) Zombie member fencing 2. Assignor changes 3. Task changes
  • 121. Catch task change 1. Add two new tasks 2. Revoke all members’ active tasks 3. Require members to rejoin 4. Perform assignment 5. Propagate to members 6. Done! [Diagram: tasks T7 and T8 are added; members move from {T1, T4}, {T2, T5}, {T3, T6} to T1–T3, T4–T6 and T7–T8]
  • 122. Recap: 1. Membership changes: (a) Member joins/leaves the group (b) Member times out (c) Zombie member fencing 2. Assignor changes 3. Task changes
  • 123. Congratulations! You have walked through all the necessary parts of a rebalance algorithm! Now let’s take a systematic view of it.
  • 129. State Machine View: Two-phase protocol ● States: Stable, Rebalance, Sync ● Stable → Rebalance: rebalance condition triggered ● Rebalance → Sync: all current members join, or rebalance timeout (bump generation) ● Sync → Stable: leader sends back the assignment ● Sync → Rebalance: rebalance condition triggered
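The state machine can be written as a transition table. The sketch below assumes one reading of the flattened diagram: three states (Stable, Rebalance, Sync), the generation bumped on the move from Rebalance to Sync, and the second "rebalance condition triggered" edge leading from Sync back to Rebalance. The event names are invented for the example.

```python
from enum import Enum


class State(Enum):
    STABLE = "Stable"
    REBALANCE = "Rebalance"
    SYNC = "Sync"


# (current state, event) -> next state; event names are hypothetical.
TRANSITIONS = {
    (State.STABLE, "rebalance_triggered"): State.REBALANCE,
    (State.REBALANCE, "all_joined_or_timeout"): State.SYNC,
    (State.SYNC, "assignment_synced"): State.STABLE,
    (State.SYNC, "rebalance_triggered"): State.REBALANCE,
}


class GroupStateMachine:
    def __init__(self):
        self.state = State.STABLE
        self.generation = 0

    def fire(self, event):
        key = (self.state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"{event!r} is not valid in state {self.state.value}")
        if self.state is State.REBALANCE and TRANSITIONS[key] is State.SYNC:
            # The join phase is over: bump the generation.
            self.generation += 1
        self.state = TRANSITIONS[key]
        return self.state


group = GroupStateMachine()
for event in ["rebalance_triggered", "all_joined_or_timeout", "assignment_synced"]:
    print(event, "->", group.fire(event).value, "generation", group.generation)
```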
  • 132. Rebalance is helpful, but sometimes harmful. 1. Transient failure 2. Rolling bounce
  • 138. Transient unavailability 1. One member couldn’t connect to coordinator 2. Session timeout is reached 3. Require other members to revoke tasks/rejoin 4. Perform assignment 5. Propagate… 6. Done! However, one member is now a zombie
  • 143. Transient unavailability 1. … 6. Done! However, one member is now a zombie 7. Zombie member rejoins 8. Zombie resets generation and rejoins 9. Coordinator requires all members to revoke tasks/rejoin 10. Perform assignment (different from last time) 11. Propagate, and done! [Diagram: members originally own T1–T3, T4–T6 and T7–T8; they end up with {T1, T2, T7}, {T4, T5, T8} and {T3, T6}]
  • 149. Transient unavailability ● An unnecessary assignment change ● Solution: ○ Increase session.timeout.ms ○ No rebalance triggered ○ Trade-off: assignment stickiness vs. availability
  • 150. Rebalance is helpful, but sometimes harmful. 1. Transient failure 2. Rolling bounce
  • 151. Rolling bounce 151Coordinator T1 T2 T3 T5T4 T6 T7 T8 T3T1 T2 T6T4 T5 T8T7 ID: m1 ID: m2 ID: m3 m1, m2, m3 Members
  • 152. Rolling bounce 1. Restart member fleet 152Coordinator T1 T2 T3 T5T4 T6 T7 T8 ID: -- ID: -- ID: -- m1, m2, m3 Members T3T1 T2 T6T4 T5 T8T7
  • 153. Rolling bounce 1. Restart member fleet 2. Some member sends leave group request 3. Members rejoin 153Coordinator T1 T2 T3 T5T4 T6 T7 T8 T6T4 T5T3T1 T2 T8T7 ID: -- ID: -- ID: -- Members [ ]
  • 154. Rolling bounce 1. Restart member fleet 2. Some member sends leave group request 3. Members rejoin 154Coordinator T1 T2 T3 T5T4 T6 T7 T8 T6T4 T5T3T1 T2 T8T7 ID: -- ID: -- ID: -- Members m4
  • 155. Rolling bounce 1. Restart member fleet 2. Some member sends leave group request 3. Members rejoin 155Coordinator T1 T2 T3 T5T4 T6 T7 T8 T6T4 T5T3T1 T2 T8T7 ID: -- ID: -- ID: -- Members m4, m5
  • 156. Rolling bounce 1. Restart member fleet 2. Some member sends leave group request 3. Members rejoin 156Coordinator T1 T2 T3 T5T4 T6 T7 T8 T6T4 T5T3T1 T2 T8T7 ID: -- ID: -- ID: -- Members m4, m5, m6
  • 157. Rolling bounce 1. Restart member fleet 2. Some member sends leave group request 3. Members rejoin 4. Member assignment gets shuffled 157Coordinator T1 T2 T3 T5T4 T6 T7 T8 T6T4 T5T3T1 T2 T8T7 m4, m5, m6 Members ID: -- ID: -- ID: --
  • 158. Rolling bounce 1. Restart member fleet 2. Some member sends leave group request 3. Members rejoin 4. Member assignment gets shuffled 158Coordinator T7 T4 T3 T5T2 T8 T1 T6 T6T4 T5T3T1 T2 T8T7 ID: -- ID: -- ID: -- m4, m5, m6 Members
  • 159. Rolling bounce 1. Restart member fleet 2. Some member sends leave group request 3. Members rejoin 4. Member assignment gets shuffled 5. Perform assignment and issue new member ids [Diagram: shuffled tasks on the coordinator; three members with no member id; member list: m4, m5, m6]
  • 160. Rolling bounce 1. Restart member fleet 2. Some member sends leave group request 3. Members rejoin 4. Member assignment gets shuffled 5. Perform assignment and issue new member ids [Diagram: the first member receives id m4 and tasks T7 T3 T4; the other two still have no member id]
  • 161. Rolling bounce 1. Restart member fleet 2. Some member sends leave group request 3. Members rejoin 4. Member assignment gets shuffled 5. Perform assignment and issue new member ids [Diagram: the second member receives id m5 and tasks T8 T2 T5; the third still has no member id]
  • 162. Rolling bounce 1. Restart member fleet 2. Some member sends leave group request 3. Members rejoin 4. Member assignment gets shuffled 5. Perform assignment and issue new member ids 6. Propagate… 7. Done! [Diagram: members m4 (T7 T3 T4), m5 (T8 T2 T5), m6 (T6 T1); member list: m4, m5, m6]
  • 163. Rolling bounce ● Another unnecessary assignment change [Diagram: member assignments before the bounce (T1 T2 T3, T4 T5 T6, T7 T8) and after it (T7 T3 T4, T8 T2 T5, T6 T1)]
  • 164. Rolling bounce ● Another unnecessary assignment change ● No persistence of member identity. After restart, the member is unknown to the coordinator. [Diagram: member assignments before and after the bounce; member list changes from m1, m2, m3 to m4, m5, m6]
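The rolling-bounce problem can be reproduced with a small simulation. This is only a sketch under stated assumptions: `DynamicCoordinator`, its `m<N>` id generator, and the round-robin assignor are hypothetical stand-ins, not Kafka's broker code. It shows that when every join produces a fresh member id, none of the pre-bounce identities survive, so the coordinator has nothing to match the old assignment against.

```python
import itertools


class DynamicCoordinator:
    """Toy coordinator: every join gets a brand-new member id (no identity persistence)."""

    def __init__(self, tasks):
        self.tasks = list(tasks)
        self.members = []                 # current member ids, in join order
        self._ids = itertools.count(1)    # hypothetical id generator: m1, m2, ...
        self.rebalances = 0

    def join(self):
        member_id = f"m{next(self._ids)}"
        self.members.append(member_id)
        return member_id

    def leave(self, member_id):
        self.members.remove(member_id)

    def rebalance(self):
        """Round-robin the tasks over the current members (a stand-in assignor)."""
        self.rebalances += 1
        assignment = {member_id: [] for member_id in self.members}
        for i, task in enumerate(self.tasks):
            assignment[self.members[i % len(self.members)]].append(task)
        return assignment


coordinator = DynamicCoordinator([f"T{i}" for i in range(1, 9)])
old_ids = [coordinator.join() for _ in range(3)]
before = coordinator.rebalance()

# Rolling bounce: each member leaves, restarts, and rejoins as a stranger.
for member_id in old_ids:
    coordinator.leave(member_id)
new_ids = [coordinator.join() for _ in range(3)]
after = coordinator.rebalance()

print(old_ids, "->", new_ids)
print("rebalances:", coordinator.rebalances)
```

The old and new assignments share no keys, which is the "unknown to the coordinator" situation described on the slide.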
  • 165. Look into the future: static membership
  • 166. Look into the future: static membership 1. Unique id for each member
  • 167. Look into the future: static membership 1. Unique id for each member 2. Increase the session timeout so that static membership takes effect
  • 168. Look into the future: static membership 1. Unique id for each member 2. Increase the session timeout so that static membership takes effect 3. No rebalance for a plain rolling bounce
  • 169. Look into the future: static membership 1. Unique id for each member 2. Increase the session timeout so that static membership takes effect 3. No rebalance for a plain rolling bounce 4. Works well with K8s
  • 170. Static membership [Diagram: coordinator holding tasks T1-T8; three members holding T1 T2 T3, T4 T5 T6, and T7 T8]
  • 171. Static membership ● Give each member a unique id ○ Config: group.instance.id [Diagram: coordinator holding tasks T1-T8; members with ids w1 (T1 T2 T3), w2 (T4 T5 T6), w3 (T7 T8)]
  • 172. Static membership ● Give each member a unique id ○ Config: group.instance.id ○ Remember assignment info on coordinator [Diagram: coordinator holding tasks T1-T8; members with ids w1 (T1 T2 T3), w2 (T4 T5 T6), w3 (T7 T8)]
  • 173. Static membership ● Give each member a unique id ○ Config: group.instance.id ○ Remember assignment info on coordinator ○ Static member never sends leave group request [Diagram: coordinator holding tasks T1-T8; members with ids w1 (T1 T2 T3), w2 (T4 T5 T6), w3 (T7 T8)]
  • 174. Static membership ● Give each member a unique id ○ Config: group.instance.id ○ Remember assignment info on coordinator ○ Static member never sends leave group request ○ No rebalance upon known static member rejoin [Diagram: coordinator holding tasks T1-T8; members with ids w1 (T1 T2 T3), w2 (T4 T5 T6), w3 (T7 T8)]
  • 175. Static membership 1. Restart member fleet [Diagram: coordinator holding tasks T1-T8; members w1 (T1 T2 T3), w2 (T4 T5 T6), w3 (T7 T8)]
  • 176. Static membership 1. Restart member fleet 2. Member w1 rejoins [Diagram: w1 (T1 T2 T3) rejoining; coordinator assignment unchanged]
  • 177. Static membership 1. Restart member fleet 2. Member w1 rejoins 3. Coordinator gets w1’s assignment [Diagram: w1 (T1 T2 T3) rejoining; coordinator assignment unchanged]
  • 178. Static membership 1. Restart member fleet 2. Member w1 rejoins 3. Coordinator gets w1’s assignment 4. Member w1 gets the same assignment [Diagram: w1 holds T1 T2 T3 again; coordinator assignment unchanged]
  • 179. Static membership 1. … 5. Member w2 rejoins [Diagram: w2 (T4 T5 T6) rejoining; coordinator assignment unchanged]
  • 180. Static membership 1. … 5. Member w2 rejoins 6. Coordinator gets w2’s assignment [Diagram: w2 (T4 T5 T6) rejoining; coordinator assignment unchanged]
  • 181. Static membership 1. … 5. Member w2 rejoins 6. Coordinator gets w2’s assignment 7. Member w2 gets the same assignment [Diagram: w2 holds T4 T5 T6 again; coordinator assignment unchanged]
  • 182. Static membership 1. … 8. Member w3 rejoins [Diagram: w3 (T7 T8) rejoining; coordinator assignment unchanged]
  • 183. Static membership 1. … 8. Member w3 rejoins 9. Coordinator gets w3’s assignment [Diagram: w3 (T7 T8) rejoining; coordinator assignment unchanged]
  • 184. Static membership 1. … 8. Member w3 rejoins 9. Coordinator gets w3’s assignment 10. Member w3 gets the same assignment 11. Done! [Diagram: members w1 (T1 T2 T3), w2 (T4 T5 T6), w3 (T7 T8); coordinator assignment unchanged]
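The static-membership walkthrough above can be sketched as a coordinator that keys the remembered assignment by `group.instance.id` rather than by a generated member id. Everything here is illustrative: `StaticCoordinator`, the contiguous-chunk assignor, and the `LookupError` for unknown instances are assumptions made for the sketch, not Kafka's implementation.

```python
class StaticCoordinator:
    """Toy coordinator that remembers assignments by group.instance.id."""

    def __init__(self):
        self.assignment_by_instance = {}   # instance id -> list of tasks
        self.rebalances = 0

    def initial_assign(self, instance_ids, tasks):
        """First-time assignment in contiguous chunks; counts as one rebalance."""
        self.rebalances += 1
        size = -(-len(tasks) // len(instance_ids))   # ceiling division
        for i, instance_id in enumerate(instance_ids):
            self.assignment_by_instance[instance_id] = tasks[i * size:(i + 1) * size]

    def rejoin(self, instance_id):
        """A known static member gets its old assignment back without a rebalance."""
        if instance_id not in self.assignment_by_instance:
            # Assumption for this sketch: unknown instances are simply rejected.
            raise LookupError(f"unknown instance {instance_id!r}")
        return self.assignment_by_instance[instance_id]


coordinator = StaticCoordinator()
tasks = [f"T{i}" for i in range(1, 9)]
coordinator.initial_assign(["w1", "w2", "w3"], tasks)

# Rolling bounce: members restart one by one and rejoin under the same instance id.
for instance_id in ["w1", "w2", "w3"]:
    print(instance_id, coordinator.rejoin(instance_id))
print("rebalances:", coordinator.rebalances)
```

The rebalance counter stays at one after the whole fleet has bounced, which matches the "No rebalance upon known static member rejoin" bullet.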
  • 185. Oops! I configured duplicate instances!
  • 186. Fencing conflict instance ● Maintain a mapping from instance id to member id [Diagram: members w1 (member id m1, tasks T1 T2 T3) and w2 (member id m2, tasks T4 T5 T6); coordinator mapping: w1 -> m1, w2 -> m2]
  • 187. Fencing conflict instance ● Maintain a mapping from instance id to member id ● Update member id when a known instance rejoins [Diagram: members w1 (member id m1, tasks T1 T2 T3) and w2 (member id m2, tasks T4 T5 T6); coordinator mapping: w1 -> m1, w2 -> m2]
  • 188. Fencing conflict instance ● Maintain a mapping from instance id to member id ● Update member id when a known instance rejoins 1. A conflicting member joins [Diagram: a second member claiming w2 appears with no member id yet; coordinator mapping: w1 -> m1, w2 -> m2]
  • 189. Fencing conflict instance ● Maintain a mapping from instance id to member id ● Update member id when a known instance rejoins 1. A conflicting member joins 2. Update w2’s member id to m3 [Diagram: coordinator mapping for w2 changes from m2 to m3; the new w2 member has no member id yet]
  • 190. Fencing conflict instance ● Maintain a mapping from instance id to member id ● Update member id when a known instance rejoins 1. A conflicting member joins 2. Update w2’s member id to m3 3. Old member m2 calls heartbeat() [Diagram: the new w2 member now has member id m3 and tasks T4 T5 T6; the old member sends hb(m2)]
  • 191. Fencing conflict instance ● Maintain a mapping from instance id to member id ● Update member id when a known instance rejoins 1. A conflicting member joins 2. Update w2’s member id to m3 3. Old member m2 calls heartbeat() 4. Member m2 is fenced because w2’s member id != m2 [Diagram: the new w2 member now has member id m3 and tasks T4 T5 T6; the old member sends hb(m2)]
  • 192. Fencing conflict instance ● Maintain a mapping from instance id to member id ● Update member id when a known instance rejoins 1. A conflicting member joins 2. Update w2’s member id to m3 3. Old member m2 calls heartbeat() 4. Member m2 is fenced because w2’s member id != m2 5. Immediately crash m2 [Diagram: the new w2 member now has member id m3 and tasks T4 T5 T6; the old member sends hb(m2)]
  • 193. Fencing conflict instance 1. A conflicting member joins 2. Update w2’s member id to m3 3. Old member m2 calls heartbeat() 4. Member m2 is fenced because w2’s member id != m2 5. Immediately crash m2 6. Group stays stable [Diagram: members w1 (member id m1, tasks T1 T2 T3) and w2 (member id m3, tasks T4 T5 T6); coordinator mapping: w1 -> m1, w2 -> m3]
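The fencing steps above can be sketched in a few lines. This is a simplified model, not the broker's code: `FencingCoordinator`, `FencedError`, and the `m<N>` id scheme are hypothetical names. It keeps the instance-id-to-member-id mapping from the slides, lets the newest joiner overwrite the entry, and rejects a heartbeat whose member id no longer matches.

```python
import itertools


class FencedError(Exception):
    """Raised when a member's id no longer matches the mapping for its instance id."""


class FencingCoordinator:
    def __init__(self):
        self.member_by_instance = {}      # group.instance.id -> current member id
        self._ids = itertools.count(1)    # hypothetical id generator: m1, m2, ...

    def join(self, instance_id):
        """The newest joiner wins: its member id replaces any previous mapping."""
        member_id = f"m{next(self._ids)}"
        self.member_by_instance[instance_id] = member_id
        return member_id

    def heartbeat(self, instance_id, member_id):
        if self.member_by_instance.get(instance_id) != member_id:
            raise FencedError(f"{member_id} is fenced for instance {instance_id}")


coordinator = FencingCoordinator()
m1 = coordinator.join("w1")
m2 = coordinator.join("w2")
m3 = coordinator.join("w2")        # duplicate instance: w2 now maps to m3

try:
    coordinator.heartbeat("w2", m2)
    m2_fenced = False
except FencedError:
    m2_fenced = True               # the old member should shut down here

coordinator.heartbeat("w2", m3)    # the new owner heartbeats normally
print(coordinator.member_by_instance, "m2 fenced:", m2_fenced)
```

Note that the conflict is only detected on the old member's next heartbeat, which is the delay the later "caveat" slides discuss.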
  • 194. Fencing conflict instance (Nice to have) ● A caveat for concurrent joining [Diagram: single member w1 (member id m1, tasks T1 T2 T3); coordinator mapping: w1 -> m1]
  • 195. Fencing conflict instance (Nice to have) ● A caveat for concurrent joining 1. First member joins with id w2 [Diagram: a new member claiming w2 has no member id yet; coordinator mapping: w1 -> m1, w2 -> m2]
  • 196. Fencing conflict instance (Nice to have) ● A caveat for concurrent joining 1. First member joins with id w2 2. In the meantime, a conflicting w2 joins [Diagram: two members claiming w2, neither with a member id yet; coordinator mapping: w1 -> m1, w2 -> m2]
  • 197. Fencing conflict instance (Nice to have) ● A caveat for concurrent joining 1. First member joins with id w2 2. In the meantime, a conflicting w2 joins 3. Replace w2’s member id with m3 [Diagram: two members claiming w2, neither with a member id yet; coordinator mapping for w2 changes from m2 to m3]
  • 198. Fencing conflict instance (Nice to have) ● A caveat for concurrent joining 1. First member joins with id w2 2. In the meantime, a conflicting w2 joins 3. Replace w2’s member id with m3 4. Coordinator requires w1 to rejoin [Diagram: two members claiming w2, neither with a member id yet; coordinator mapping for w2 changes from m2 to m3]
  • 199. Fencing conflict instance (Nice to have) ● A caveat for concurrent joining 1. First member joins with id w2 2. In the meantime, a conflicting w2 joins 3. Replace w2’s member id with m3 4. Coordinator requires w1 to rejoin 5. Group performs assignment [Diagram: two members claiming w2, neither with a member id yet; coordinator mapping for w2 changes from m2 to m3]
  • 200. Fencing conflict instance (Nice to have) ● A caveat for concurrent joining 1. First member joins with id w2 2. In the meantime, a conflicting w2 joins 3. Replace w2’s member id with m3 4. Coordinator requires w1 to rejoin 5. Group performs assignment 6. Propagate new assignment [Diagram: w1 now holds T1 T2; both w2 members still without a member id; coordinator mapping: w1 -> m1, w2 -> m3]
  • 201. Fencing conflict instance (Nice to have) ● A caveat for concurrent joining 1. First member joins with id w2 2. In the meantime, a conflicting w2 joins 3. Replace w2’s member id with m3 4. Coordinator requires w1 to rejoin 5. Group performs assignment 6. Propagate new assignment 7. Done! [Diagram: w1 holds T1 T2; one w2 member has member id m3 and holds T3; the other w2 member still has no member id; coordinator mapping: w1 -> m1, w2 -> m3]
  • 202. Fencing conflict instance (Nice to have) ● A caveat for concurrent joining 1. … 8. The out-of-scope member times out and rejoins [Diagram: w1 holds T1 T2; one w2 member has member id m3 and holds T3; the other w2 member has no member id; coordinator mapping: w1 -> m1, w2 -> m3]
  • 203. Fencing conflict instance (Nice to have) ● A caveat for concurrent joining 1. … 8. The out-of-scope member times out and rejoins 9. Update w2’s member id to m4 [Diagram: coordinator mapping for w2 changes from m3 to m4; the rejoining w2 member has no member id yet]
  • 204. Fencing conflict instance (Nice to have) ● A caveat for concurrent joining 1. … 8. The out-of-scope member times out and rejoins 9. Update w2’s member id to m4 10. It gets a conflicting assignment [Diagram: the rejoined w2 member now has member id m4 and also holds T3, alongside the w2 member with member id m3]
  • 205. Fencing conflict instance (Nice to have) ● A caveat for concurrent joining 1. … 8. The out-of-scope member times out and rejoins 9. Update w2’s member id to m4 10. It gets a conflicting assignment 11. Old member m3 calls heartbeat() [Diagram: both w2 members (m3 and m4) hold T3; the old member sends hb(m3)]
  • 206. Fencing conflict instance (Nice to have) ● A caveat for concurrent joining 1. … 8. The out-of-scope member times out and rejoins 9. Update w2’s member id to m4 10. It gets a conflicting assignment 11. Old member m3 calls heartbeat() 12. Member id mismatch, so m3 is fenced [Diagram: both w2 members (m3 and m4) hold T3; the old member sends hb(m3)]
  • 207. Fencing conflict instance (Nice to have) ● A caveat for concurrent joining ● Downsides: ○ Risk of concurrent processing ○ Delayed conflict detection [Diagram: both w2 members (m3 and m4) hold T3; the old member sends hb(m3)]
  • 208. Fencing conflict instance (Nice to have) ● Fence against callback [Diagram: single member w1 (member id m1, tasks T1 T2 T3); coordinator mapping: w1 -> m1; empty join-callback table]
  • 209. Fencing conflict instance (Nice to have) ● Fence against callback 1. New member with id w2 joins, registering a callback [Diagram: new w2 member with no member id yet; coordinator mapping: w1 -> m1, w2 -> m2; join callbacks: <w2, m2>]
  • 210. Fencing conflict instance (Nice to have) ● Fence against callback 1. New member with id w2 joins, registering a callback 2. A conflicting member joins at the same time [Diagram: two members claiming w2, neither with a member id yet; coordinator mapping: w1 -> m1, w2 -> m2; join callbacks: <w1, m1>, <w2, m2>]
  • 211. Fencing conflict instance (Nice to have) ● Fence against callback 1. New member with id w2 joins, registering a callback 2. A conflicting member joins at the same time 3. Replace member id with m3 [Diagram: coordinator mapping for w2 changes from m2 to m3; join callbacks: <w1, m1>, <w2, m2>]
  • 212. Fencing conflict instance (Nice to have) ● Fence against callback 1. New member with id w2 joins, registering a callback 2. A conflicting member joins at the same time 3. Replace member id with m3 4. Complete m2’s callback with a fenced exception [Diagram: coordinator mapping: w1 -> m1, w2 -> m3; join callbacks: <w1, m1>, <w2, m2> marked X]
  • 213. Fencing conflict instance (Nice to have) ● Fence against callback 1. New member with id w2 joins, registering a callback 2. A conflicting member joins at the same time 3. Replace member id with m3 4. Complete m2’s callback with a fenced exception 5. Shut down m2 immediately [Diagram: coordinator mapping: w1 -> m1, w2 -> m3; join callbacks: <w1, m1>, <w2, m2> marked X]
  • 214. Fencing conflict instance (Nice to have) ● Fence against callback 1. … 6. Require all members to revoke/rejoin [Diagram: members w1 and the remaining w2; coordinator mapping: w1 -> m1, w2 -> m3; join callbacks: <w1, m1>, <w2, m3>]
  • 215. Fencing conflict instance (Nice to have) ● Fence against callback 1. … 6. Require all members to revoke/rejoin 7. Perform assignment [Diagram: members w1 and the remaining w2; coordinator mapping: w1 -> m1, w2 -> m3; join callbacks: <w1, m1>, <w2, m3>]
  • 216. Fencing conflict instance (Nice to have) ● Fence against callback 1. … 6. Require all members to revoke/rejoin 7. Perform assignment 8. Propagate through callbacks, done! [Diagram: new assignment delivered to w1 and the remaining w2 member; coordinator mapping: w1 -> m1, w2 -> m3; join-callback table cleared]
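The callback-based fencing idea can be sketched as follows. It is a toy model under stated assumptions: `CallbackCoordinator`, `FencedError`, the callback signature, and the final assignment are all hypothetical. The point it illustrates is that the coordinator holds one pending join callback per instance id, so a conflicting join can fail the earlier callback right away instead of waiting for a later heartbeat.

```python
class FencedError(Exception):
    """Delivered to a pending join whose instance id was claimed by a newer member."""


class CallbackCoordinator:
    def __init__(self):
        self.pending = {}     # instance id -> (member id, join callback)
        self._next = 1        # hypothetical id counter: m1, m2, ...

    def join(self, instance_id, callback):
        member_id = f"m{self._next}"
        self._next += 1
        previous = self.pending.get(instance_id)
        if previous is not None:
            old_member_id, old_callback = previous
            # Fence the earlier joiner immediately through its own callback.
            old_callback(old_member_id, None, FencedError(old_member_id))
        self.pending[instance_id] = (member_id, callback)
        return member_id

    def complete_rebalance(self, assignment):
        """Propagate the assignment through the surviving callbacks."""
        for instance_id, (member_id, callback) in self.pending.items():
            callback(member_id, assignment.get(instance_id, []), None)
        self.pending.clear()


events = []


def make_callback(label):
    def callback(member_id, tasks, error):
        events.append((label, member_id, tasks, type(error).__name__ if error else None))
    return callback


coordinator = CallbackCoordinator()
coordinator.join("w1", make_callback("w1"))
coordinator.join("w2", make_callback("w2-first"))
coordinator.join("w2", make_callback("w2-second"))   # concurrent duplicate
coordinator.complete_rebalance({"w1": ["T1", "T2"], "w2": ["T3"]})

for event in events:
    print(event)
```

Compared with heartbeat-based fencing, the duplicate is rejected before any assignment is handed out, so two members never hold the same tasks at once in this model.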
  • 220. Developer debugging 1. Log in to your client application … [Diagram: "Find your server!"]
  • 221. Developer debugging 1. Log in to your client application 2. Look into the client log and search for “Discovered group coordinator”. Example: [2019-06-14 00:23:47,020] INFO [Consumer instanceId=consumer-A-2, clientId=StaticMemberTestClient-019a5efe-87ef-4c62-9891-27330df67049-StreamThread-2-consumer, groupId=StaticMemberTestClient] Discovered group coordinator ducker04:9092 (id: 2147483645 rack: null) (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
  • 222. Developer debugging 1. Log in to your client application 2. Look into the client log and search for “Discovered group coordinator” 3. Find your server and log in … [Diagram: host ducker04, "Find your server!"]
  • 223. Developer debugging 1. Log in to your client application 2. Look into the client log and search for “Discovered group coordinator” 3. Find your server and log in 4. Check the server log for the rebalance reason. Example: [2019-06-14 00:23:47,389] INFO [GroupCoordinator 2]: Preparing to rebalance group StaticMemberTestClient in state PreparingRebalance with old generation 0 (__consumer_offsets-2) (reason: Adding new member consumer-A-1-1560471827287 with group instanceid Some(consumer-A-1)) (kafka.coordinator.group.GroupCoordinator)
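The two log searches above can be scripted. The sketch below is only as good as its regular expressions, which were written against the two sample lines shown on the slides; other Kafka versions may format these messages differently, and the helper names `find_coordinator` and `rebalance_reasons` are made up for this example.

```python
import re

# Patterns derived from the two sample log lines on the slides (an assumption:
# other broker/client versions may word these messages differently).
COORDINATOR_RE = re.compile(
    r"Discovered group coordinator (?P<host>[^\s:]+):(?P<port>\d+) \(id: (?P<id>\d+)"
)
REASON_RE = re.compile(
    r"Preparing to rebalance group (?P<group>\S+) in state .*?"
    r"\(reason: (?P<reason>.*)\)\s+\("
)

CLIENT_LINE = (
    "[2019-06-14 00:23:47,020] INFO [Consumer instanceId=consumer-A-2, "
    "clientId=StaticMemberTestClient-019a5efe-87ef-4c62-9891-27330df67049-StreamThread-2-consumer, "
    "groupId=StaticMemberTestClient] Discovered group coordinator ducker04:9092 "
    "(id: 2147483645 rack: null) "
    "(org.apache.kafka.clients.consumer.internals.AbstractCoordinator)"
)
SERVER_LINE = (
    "[2019-06-14 00:23:47,389] INFO [GroupCoordinator 2]: Preparing to rebalance group "
    "StaticMemberTestClient in state PreparingRebalance with old generation 0 "
    "(__consumer_offsets-2) (reason: Adding new member consumer-A-1-1560471827287 "
    "with group instanceid Some(consumer-A-1)) (kafka.coordinator.group.GroupCoordinator)"
)


def find_coordinator(client_log_lines):
    """Return (host, port) of the first discovered coordinator, or None."""
    for line in client_log_lines:
        match = COORDINATOR_RE.search(line)
        if match:
            return match["host"], int(match["port"])
    return None


def rebalance_reasons(server_log_lines):
    """Return (group, reason) pairs for every 'Preparing to rebalance' line."""
    found = []
    for line in server_log_lines:
        match = REASON_RE.search(line)
        if match:
            found.append((match["group"], match["reason"]))
    return found


print(find_coordinator([CLIENT_LINE]))
print(rebalance_reasons([SERVER_LINE]))
```

In practice the line lists would come from reading the client and broker log files, for example by iterating over an open file object.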
  • 224. Developer debugging Server metrics: ● NumGroupsPreparingRebalance ● NumGroupsCompletingRebalance ● NumGroupsStable ● NumGroupsDead ● NumGroupsEmpty …
  • 225. Developer debugging Client metrics: ● Join-rate/total ● Join-time-avg/max ● Sync-rate/total ● Sync-time-avg/max ● Assigned-partitions ● Commit-rate/total ● Heartbeat-rate/total …
  • 227. Takeaways ● Different timeouts: ○ Increase session.timeout.ms for better stability ○ max.poll.interval.ms bounds how long a member may take between poll() calls ○ When rebalance.timeout.ms expires, members that have not rejoined are kicked out
  • 228. Takeaways ● Different timeouts: ○ Increase session.timeout.ms for better stability ○ max.poll.interval.ms bounds how long a member may take between poll() calls ○ When rebalance.timeout.ms expires, members that have not rejoined are kicked out ● What is group generation?
  • 229. Takeaways ● Different timeouts: ○ Increase session.timeout.ms for better stability ○ max.poll.interval.ms bounds how long a member may take between poll() calls ○ When rebalance.timeout.ms expires, members that have not rejoined are kicked out ● What is group generation? ● Why do we let the client do assignment?
  • 230. Takeaways ● Different timeouts: ○ Increase session.timeout.ms for better stability ○ max.poll.interval.ms bounds how long a member may take between poll() calls ○ When rebalance.timeout.ms expires, members that have not rejoined are kicked out ● What is group generation? ● Why do we let the client do assignment? ● Static membership is generally available in AK 2.3, for Consumer and Streams ○ Upgrade your broker to 2.3 ○ Set a unique group.instance.id for each client (and monitor for fencing) ○ Make the session timeout long enough
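As a rough illustration of the last takeaway, the helper below builds consumer settings for a fleet of static members. The property keys (`bootstrap.servers`, `group.id`, `group.instance.id`, `session.timeout.ms`) are standard Kafka consumer configs; the function names, the `orders-N` instance names, the `kafka:9092` address, and the 120-second timeout are assumptions chosen for the example, not recommendations from the talk.

```python
def static_member_config(group_id, instance_id, bootstrap_servers,
                         session_timeout_ms=120_000):
    """Build a plain dict of consumer settings for one static member.

    The default timeout is an illustrative value: pick one longer than the
    time a single member needs to restart during a rolling bounce.
    """
    if not instance_id:
        raise ValueError("group.instance.id must be a non-empty string")
    return {
        "bootstrap.servers": bootstrap_servers,
        "group.id": group_id,
        "group.instance.id": instance_id,
        "session.timeout.ms": session_timeout_ms,
    }


def check_unique_instance_ids(configs):
    """Fail fast on duplicate instance ids, which would otherwise get fenced."""
    seen = set()
    for config in configs:
        instance_id = config["group.instance.id"]
        if instance_id in seen:
            raise ValueError(f"duplicate group.instance.id: {instance_id}")
        seen.add(instance_id)


# Hypothetical fleet: one stable name per replica, e.g. K8s StatefulSet pod names.
fleet = [
    static_member_config("orders", f"orders-{i}", "kafka:9092")
    for i in range(3)
]
check_unique_instance_ids(fleet)
print([config["group.instance.id"] for config in fleet])
```

Deriving the instance id from a name that is stable across restarts is what lets a restarted member be recognized by the coordinator.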
  • 231. Resources • KIP-62: Allow consumer to send heartbeats from a background thread • KIP-180: Add a broker metric specifying the number of consumer group rebalances in progress • KIP-345: Introduce static membership protocol to reduce consumer rebalances (accepted) • Kafka Client redesign proposal • "The Magical Rebalance Protocol of Apache Kafka" by Gwen Shapira (Strange Loop Talk, Sep 2018) https://www.youtube.com/watch?v=MmLezWRI3Ys&t=8s
  • 232. Special thanks to Guozhang Wang, Jason Gustafson, Liquan Pei and Matthias J Sax