Speaker: Boyang Chen, Infrastructure Engineer, Confluent
The rebalance protocol is the coordination algorithm that lets Kafka clients, including Consumer, Connect and Streams, process data in a dynamically scaling fashion. If you have ever been in the shoes of a Kafka application developer, you have probably heard of this term and may even have been bitten by it a few times. In fact, it is one of the known performance killers for large member groups and state-heavy applications today.
In this talk, we will dive deep into this protocol, demo some troubleshooting experience and introduce the two most recent improvements built on top of it: static membership and incremental rebalancing. After the talk, you will have a deeper understanding of the rebalance protocol, which should boost your Kafka development velocity right away!
https://www.meetup.com/KafkaBayArea/events/261932534/
2. Agenda
● What is rebalancing?
● How to design a rebalancing protocol?
● Unnecessary rebalances
● Look into the future: static membership
● Debugging tip
11-16. What is rebalancing?
● Group membership
● Resource assignment
● Example: Coordinator – Worker model
1. New members join the group
2. Perform assignment
3. Propagate
4. Done!
[Diagram: the coordinator holds tasks T1-T6; after propagation one member owns T1 T2 T3 and the other owns T4 T5 T6]
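The join, assign and propagate steps above can be sketched in a few lines of Python. This is only an illustration, not the Kafka implementation: the function name `assign_contiguous` and the member names are made up for the example, and the contiguous split is an assumption chosen because it reproduces the two-member picture on the slide.

```python
from typing import Dict, List


def assign_contiguous(tasks: List[str], members: List[str]) -> Dict[str, List[str]]:
    """Split tasks into contiguous chunks, one chunk per member."""
    if not members:
        return {}
    base, extra = divmod(len(tasks), len(members))
    assignment: Dict[str, List[str]] = {}
    start = 0
    for index, member in enumerate(members):
        # The first `extra` members take one additional task each.
        size = base + (1 if index < extra else 0)
        assignment[member] = tasks[start:start + size]
        start += size
    return assignment


tasks = ["T1", "T2", "T3", "T4", "T5", "T6"]
# Hypothetical member names; in the talk the members are simply two workers.
assignment = assign_contiguous(tasks, ["member-a", "member-b"])
```

Here `assignment` plays the role of what the coordinator propagates back to the members once the assignment step is finished.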
19-23. How to design a rebalance protocol?
1. Membership changes:
(a) Member joins/leaves the group
(b) Member times out
(c) Zombie member fencing
2. Assignor changes
3. Task changes
25-31. Catch membership change
1. Spin up a new member
2. A new member joins
3. Revoke active tasks
4. Require members to rejoin
5. Perform new assignment
6. Propagate to members
7. Done!
[Diagram: the coordinator holds tasks T1-T6; the two existing members give up T1 T2 T3 and T4 T5 T6, and the three members end with T1 T4, T2 T5 and T3 T6]
33-39. Catch membership change
1. Remove an active member
2. Member sends leave group request
3. Revoke other members’ active tasks
4. Require members to rejoin
5. Perform assignment
6. Propagate to members
7. Done!
[Diagram: the coordinator holds tasks T1-T6; after one of three members leaves, the two remaining members end with T1 T2 T3 and T4 T5 T6]
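A rough way to see the cost of the join and leave flows above is to count how many tasks get revoked. The sketch below is a toy model under stated assumptions: the class `EagerGroup`, its `revoked_count` field and the deal-in-turn assignment are all invented for illustration and are not Kafka APIs.

```python
from typing import Dict, List


class EagerGroup:
    """Toy group where every membership change revokes and reassigns all tasks."""

    def __init__(self, tasks: List[str]) -> None:
        self.tasks = list(tasks)
        self.members: List[str] = []
        self.assignment: Dict[str, List[str]] = {}
        self.revoked_count = 0  # total tasks revoked across all rebalances

    def _rebalance(self) -> None:
        # Step "revoke active tasks": every current owner gives up everything.
        self.revoked_count += sum(len(owned) for owned in self.assignment.values())
        # Steps "perform assignment" and "propagate": deal tasks out in turn.
        count = len(self.members)
        self.assignment = {
            member: self.tasks[index::count]
            for index, member in enumerate(self.members)
        }

    def join(self, member: str) -> None:
        self.members.append(member)
        self._rebalance()

    def leave(self, member: str) -> None:
        self.members.remove(member)
        self._rebalance()


group = EagerGroup(["T1", "T2", "T3", "T4", "T5", "T6"])
group.join("m1")
group.join("m2")
group.join("m3")   # m1 and m2 lose all six tasks before the new assignment
group.leave("m3")  # the remaining members' tasks are revoked as well
```

In this model a single member joining or leaving forces all six tasks to be revoked, which is the behaviour the later slides on unnecessary rebalances try to avoid.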
40. How to design a rebalance protocol?
1. Membership changes:
(a) Member joins/leaves the group
(b) Member times out
(c) Zombie member fencing
2. Assignor changes
3. Task changes
46-51. Session timeout
1. Member crashes without sending leave group
2. Session timeout is reached
3. Require other members to revoke tasks/rejoin
4. Perform assignment
5. Propagate…
6. Done!
[Diagram: coordinator with tasks T1-T6 and two members holding T1 T2 T3 and T4 T5 T6; one member crashes and the group is rebalanced without it]
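The session-timeout detection described above can be sketched as coordinator-side bookkeeping of the last heartbeat per member. This is a simplified illustration: `HeartbeatTracker`, its methods and the 10-second value are assumptions made for the example, not the broker's actual code.

```python
from typing import Dict, List


class HeartbeatTracker:
    """Toy coordinator-side bookkeeping for session timeouts."""

    def __init__(self, session_timeout_ms: int) -> None:
        self.session_timeout_ms = session_timeout_ms
        self.last_heartbeat_ms: Dict[str, int] = {}

    def heartbeat(self, member: str, now_ms: int) -> None:
        self.last_heartbeat_ms[member] = now_ms

    def expire(self, now_ms: int) -> List[str]:
        """Remove and return members whose last heartbeat is too old."""
        expired = [
            member
            for member, seen in self.last_heartbeat_ms.items()
            if now_ms - seen >= self.session_timeout_ms
        ]
        for member in expired:
            del self.last_heartbeat_ms[member]
        return expired


tracker = HeartbeatTracker(session_timeout_ms=10_000)
tracker.heartbeat("m1", now_ms=0)
tracker.heartbeat("m2", now_ms=0)
tracker.heartbeat("m1", now_ms=8_000)    # m1 stays alive; m2 has crashed silently
expired = tracker.expire(now_ms=10_000)  # a non-empty result would trigger a rebalance
```

The crashed member never sends a leave group request, so the coordinator only notices it once the timeout elapses; a larger timeout therefore delays the rebalance.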
58-61. Max poll interval timeout
● Poll – Process – Commit
● One process takes too long
● Reach timeout limit
1. Member takes too long to process
2. Background thread stops heartbeat and sends leave group
3. Ask others to revoke tasks/rejoin
4. Perform assignment
5. Propagate to members, done!
[Diagram: a timeline of Poll() calls in which the gap before the next Poll() is >= max.poll.interval.ms; coordinator with tasks T1-T4]
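The poll, process, commit loop above times out when the gap between two consecutive Poll() calls reaches max.poll.interval.ms. A minimal sketch of that check follows; the helper name `first_poll_violation` and the timestamps are illustrative assumptions, and the real client performs this check in its background thread rather than over a recorded list.

```python
from typing import List, Optional


def first_poll_violation(poll_times_ms: List[int], max_poll_interval_ms: int) -> Optional[int]:
    """Return the index of the first poll() that arrived too late, or None."""
    for index in range(1, len(poll_times_ms)):
        gap = poll_times_ms[index] - poll_times_ms[index - 1]
        if gap >= max_poll_interval_ms:
            return index
    return None


# Three quick polls, then one record batch that takes far too long to process.
poll_times_ms = [0, 1_000, 2_000, 400_000]
violation = first_poll_violation(poll_times_ms, max_poll_interval_ms=300_000)
```

By the time the late poll arrives, the member has already left the group, so its tasks have been handed to someone else.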
62-65. Rebalance timeout
● Max time for a member to rejoin during rebalance
● Use the max value of max.poll.interval.ms among all clients
○ Member has to finish ongoing work
● Track with given member id
● Register callback
[Diagram: the coordinator tracks members m1 and m2 by member id (mID) and holds a join callback]
72. How to design a rebalance protocol?
1. Membership changes:
(a) Member joins/leaves the group
(b) Member times out
(c) Zombie member fencing
2. Assignor changes
3. Task changes
74-95. Fencing zombie
● What if a zombie member rejoins?
● Bump generation number after each rebalance
1. Member m1 rejoins group
2. All members rejoin/revoke tasks
3. Bump generation number
4. Perform assignment
5. Propagate, and done!
6. Group currently stable at generation 2
7. Rebalance triggers again
8. Member m1 rejoins
9. Member m2 has a transient failure
10. Rebalance timeout is reached, kicking out m2
11. Bump generation to 3
12. Perform assignment
13. Propagate, and done!
14. Group stable at generation 3
15. Member m2 rejoins in zombie mode
16. Fenced by mismatched generation
17. Reset local generation info
18. Rejoin as unknown member without generation
19. Registered as m3
20. Group transitions to rebalance
21. Bump generation to 4
22. Perform assignment
23. Propagate, and done!
[Diagram: the coordinator tracks the member list and the group generation; each member stores its member id (mID) and generation (gen); a join callback collects the rejoining members]
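The generation-based fencing walked through above can be modelled with a counter that is bumped on every rebalance and compared on every request. The sketch below is a toy under stated assumptions: `GenerationCoordinator` and the `"FENCED"` / `"OK"` strings are invented for illustration and do not correspond to real broker classes or error codes.

```python
from typing import Set


class GenerationCoordinator:
    """Toy coordinator that fences requests carrying a stale generation."""

    def __init__(self) -> None:
        self.generation = 0
        self.members: Set[str] = set()

    def complete_rebalance(self, joined: Set[str]) -> int:
        # Members that did not rejoin in time are dropped from the group.
        self.members = set(joined)
        self.generation += 1
        return self.generation

    def heartbeat(self, member_id: str, generation: int) -> str:
        if member_id not in self.members or generation != self.generation:
            return "FENCED"
        return "OK"


coordinator = GenerationCoordinator()
coordinator.complete_rebalance({"m1", "m2"})   # generation 1
coordinator.complete_rebalance({"m1", "m2"})   # generation 2
coordinator.complete_rebalance({"m1"})         # m2 missed the rebalance: generation 3
zombie_result = coordinator.heartbeat("m2", generation=2)
# m2 resets its local state and comes back as a brand-new member, m3.
final_generation = coordinator.complete_rebalance({"m1", "m3"})
```

The zombie cannot act on its old assignment because its generation no longer matches, so it has to rejoin as an unknown member, which is what triggers the extra rebalance to generation 4.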
96. How to design a rebalance protocol?
1. Membership changes:
(a) Member joins/leaves the group
(b) Member times out
(c) Zombie member fencing
2. Assignor changes
3. Task changes
98-101. Do assignment on broker
1. Stable with range assignment
2. Redeploy coordinator to use round robin (RR)
3. Coordinator bounce completes
● Not an ideal approach
○ Restart stateful service
○ Affect other clients
○ Data protocol consistency
[Diagram: coordinator with tasks T1-T8 and three members holding T1 T2 T3, T4 T5 T6 and T7 T8; the coordinator's assignor switches from Range to Round Robin]
102-111. Do assignment on client
1. Designated leader member
2. Redeploy leader to use RR
3. Leader restarted
4. Leader asks coordinator to rebalance
5. Coordinator requires members to revoke tasks/rejoin
6. Coordinator informs leader that all the members have rejoined
7. Leader performs assignment
8. Leader calls sync group to send back assignment
9. Coordinator propagates…
10. Done!
[Diagram: coordinator with tasks T1-T8 and three members; the leader's assignor switches from Range to Round Robin, and the members move from T1 T2 T3, T4 T5 T6, T7 T8 to T1 T4 T7, T2 T5 T8, T3 T6]
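The client-side flow above keeps the coordinator ignorant of the assignment strategy: it only collects joins and relays whatever the leader sends back. A minimal sketch under stated assumptions follows; `ToyCoordinator`, `round_robin` and the rule that the first joiner becomes leader are choices made for this example, not a description of the broker code.

```python
from typing import Callable, Dict, List

Assignor = Callable[[List[str], List[str]], Dict[str, List[str]]]


def round_robin(tasks: List[str], members: List[str]) -> Dict[str, List[str]]:
    """Deal tasks out one at a time, cycling through the members."""
    return {
        member: tasks[index::len(members)]
        for index, member in enumerate(members)
    }


class ToyCoordinator:
    """Collects joins and relays the leader's assignment; it never computes one."""

    def __init__(self) -> None:
        self.joined: List[str] = []
        self.assignment: Dict[str, List[str]] = {}

    def join_group(self, member: str) -> None:
        self.joined.append(member)

    def leader(self) -> str:
        # Assumption for this sketch: the first member to join is the leader.
        return self.joined[0]

    def sync_group(self, leader_assignment: Dict[str, List[str]]) -> Dict[str, List[str]]:
        # The leader's sync request carries the full assignment; the
        # coordinator stores it and hands each member its own entry.
        self.assignment = dict(leader_assignment)
        return self.assignment


tasks = [f"T{n}" for n in range(1, 9)]
coordinator = ToyCoordinator()
for member in ["m1", "m2", "m3"]:
    coordinator.join_group(member)

# Only the leader runs the assignor, so swapping it means redeploying the
# leader rather than the coordinator.
assignor: Assignor = round_robin
propagated = coordinator.sync_group(assignor(tasks, coordinator.joined))
```

Because the assignor is just a function held by the leader, changing strategy does not require restarting the stateful coordinator or affecting other groups, which addresses the downsides listed for broker-side assignment.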
112. How to design a rebalance protocol?
1. Membership changes:
(a) Member joins/leaves the group
(b) Member times out
(c) Zombie member fencing
2. Assignor changes
3. Task changes
114-121. Catch task change
1. Add two new tasks
2. Revoke all members’ active tasks
3. Require members to rejoin
4. Perform assignment
5. Propagate to members
6. Done!
[Diagram: tasks T7 and T8 are added to T1-T6; the three members move from T1 T4, T2 T5, T3 T6 to T1 T2 T3, T4 T5 T6, T7 T8]
126-129. State Machine View: Two-phase protocol
[Diagram: state machine with three states: Stable, Rebalance and Sync]
● Stable -> Rebalance: rebalance condition triggered
● Rebalance -> Sync: all current members join, or rebalance timeout (bump generation)
● Sync -> Stable: leader sends back the assignment
● Sync -> Rebalance: rebalance condition triggered
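The two-phase state machine above can be written down as a small transition table. This is a sketch under stated assumptions: the event names are invented for the example, and the reading that the second "Rebalance condition triggered" arrow leaves the Sync state is an interpretation of the flattened diagram.

```python
from enum import Enum
from typing import Tuple


class GroupState(Enum):
    STABLE = "Stable"
    REBALANCE = "Rebalance"
    SYNC = "Sync"


# (current state, event) -> next state; event names are invented for this sketch.
TRANSITIONS = {
    (GroupState.STABLE, "rebalance_triggered"): GroupState.REBALANCE,
    (GroupState.REBALANCE, "all_joined_or_timeout"): GroupState.SYNC,
    (GroupState.SYNC, "leader_sent_assignment"): GroupState.STABLE,
    (GroupState.SYNC, "rebalance_triggered"): GroupState.REBALANCE,
}


def step(state: GroupState, generation: int, event: str) -> Tuple[GroupState, int]:
    """Apply one event; return the new (state, generation) pair."""
    next_state = TRANSITIONS.get((state, event))
    if next_state is None:
        raise ValueError(f"{event!r} is not valid in state {state.value}")
    # The generation is bumped when the join phase closes.
    if state is GroupState.REBALANCE and next_state is GroupState.SYNC:
        generation += 1
    return next_state, generation


state, generation = GroupState.STABLE, 1
for event in ["rebalance_triggered", "all_joined_or_timeout", "leader_sent_assignment"]:
    state, generation = step(state, generation, event)
```

One full cycle returns the group to Stable with the generation incremented by one, which is the property the zombie fencing relies on.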
135-143. Transient unavailability
1. One member couldn’t connect to coordinator
2. Session timeout is reached
3. Require other members to revoke tasks/rejoin
4. Perform assignment
5. Propagate…
6. Done! However, one member has now become a zombie
7. Zombie member rejoins
8. Zombie resets generation and rejoins
9. Coordinator requires all members to revoke tasks/rejoin
10. Perform assignment (different from last time)
11. Propagate, and done!
[Diagram: coordinator with tasks T1-T8 and three members starting with T1 T2 T3, T4 T5 T6 and T7 T8; after the second rebalance they hold T1 T2 T7, T4 T5 T8 and T3 T6]
152-164. Rolling bounce
1. Restart member fleet
2. Some member sends leave group request
3. Members rejoin
4. Member assignment gets shuffled
5. Perform assignment and hand out new member ids
6. Propagate…
7. Done!
● Another unnecessary assignment change
● No persistence of member identity. After restart, the member is unknown to the coordinator.
[Diagram: members m1, m2, m3 restart without ids and are registered as m4, m5, m6; the assignment changes from T1 T2 T3, T4 T5 T6, T7 T8 to T3 T4 T7, T2 T5 T8, T1 T6]
167-169. Look into the future: static membership
1. Unique id for each member
2. Enlarge session timeout to make it effective
3. No rebalance when only doing a rolling bounce
4. Works great with K8s
171-174. Static membership
● Give each member a unique id
○ Config: group.instance.id
○ Remember assignment info on coordinator
○ Static member never sends leave group request
○ No rebalance upon known static member rejoin
[Diagram: coordinator with tasks T1-T8 and three members with ids w1, w2, w3 holding T1 T2 T3, T4 T5 T6 and T7 T8]
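The effect of static membership on a rolling bounce can be illustrated with a coordinator that remembers assignments by instance id. The class `StaticGroup` and its methods are hypothetical names for this sketch; in a real client the instance id would come from the group.instance.id config mentioned above.

```python
from typing import Dict, List


class StaticGroup:
    """Toy coordinator that remembers assignments by instance id."""

    def __init__(self) -> None:
        self.assignment_by_instance: Dict[str, List[str]] = {}
        self.rebalance_count = 0

    def rebalance(self, assignment: Dict[str, List[str]]) -> None:
        self.assignment_by_instance = dict(assignment)
        self.rebalance_count += 1

    def rejoin(self, instance_id: str) -> List[str]:
        """A known static member gets its old tasks back without a rebalance."""
        if instance_id not in self.assignment_by_instance:
            raise KeyError(f"unknown instance {instance_id!r}: a rebalance is needed")
        return self.assignment_by_instance[instance_id]


group = StaticGroup()
group.rebalance({"w1": ["T1", "T2", "T3"], "w2": ["T4", "T5", "T6"], "w3": ["T7", "T8"]})

# Rolling bounce: each instance restarts and rejoins under the same id.
after_bounce = {instance: group.rejoin(instance) for instance in ["w1", "w2", "w3"]}
```

This only works if each restarted instance comes back before its session times out, which is why the talk pairs static membership with an enlarged session timeout.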
186-193. Fencing conflict instance
● Maintain a mapping from instance id to member id
● Update member id when a known instance rejoins
1. One conflicting member joins
2. Update w2’s member id to m3
3. Old member m2 calls heartbeat()
4. Member m2 will be fenced since w2’s member id != m2
5. Immediately crash m2
6. Group stays stable
[Diagram: the coordinator's mapping w1 -> m1, w2 -> m2 becomes w1 -> m1, w2 -> m3; the heartbeat hb(m2) from the old w2 instance is rejected]
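The instance-id-to-member-id mapping above can be sketched as a small registry that hands out a fresh member id on every join and rejects heartbeats from superseded ids. `InstanceRegistry` and the `"FENCED"` / `"OK"` results are illustrative assumptions, not real broker classes or error codes.

```python
from typing import Dict


class InstanceRegistry:
    """Toy mapping from static instance id to the current member id."""

    def __init__(self) -> None:
        self.member_by_instance: Dict[str, str] = {}
        self._next_member = 1

    def join(self, instance_id: str) -> str:
        # A (re)joining instance always receives a fresh member id, which
        # replaces whatever id was previously mapped to that instance.
        member_id = f"m{self._next_member}"
        self._next_member += 1
        self.member_by_instance[instance_id] = member_id
        return member_id

    def heartbeat(self, instance_id: str, member_id: str) -> str:
        if self.member_by_instance.get(instance_id) != member_id:
            return "FENCED"  # the caller is a stale duplicate and should shut down
        return "OK"


registry = InstanceRegistry()
m1 = registry.join("w1")
m2 = registry.join("w2")
m3 = registry.join("w2")  # a second process configured with the same instance id
old_result = registry.heartbeat("w2", m2)
new_result = registry.heartbeat("w2", m3)
```

Since only the mapping changes and no task moves, the group can stay stable while the duplicate instance is shut down.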
194. Fencing conflict
instance (Nice to
have)● A caveat for concurrent joining
194Coordinator
T3T1 T2
gID: w1
mID: m1
T1 T2 T3
w1 -> m1
Members
195. Fencing conflict
instance (Nice to
have)● A caveat for concurrent joining
1. First member joins with id w2
195Coordinator
gID: w1
w1 -> m1
w2 -> m2
Members
mID: m1
T1 T2 T3
gID: w2
mID: --
T3T1 T2
196. Fencing conflict
instance (Nice to
have)● A caveat for concurrent joining
1. First member joins with id w2
2. In the meantime, a conflict w2
joins
196Coordinator
gID: w1
w1 -> m1
w2 -> m2
Members
mID: m1
T1 T2 T3
gID: w2
mID: --
T3T1 T2
gID: w2
mID: --
197. Fencing conflict
instance (Nice to
have)● A caveat for concurrent joining
1. First member joins with id w2
2. In the meantime, a conflict w2
joins
3. Replace w2’s member id to m3
197Coordinator
gID: w1
w1 -> m1
w2 -> m2, m3
Members
mID: m1
T1 T2 T3
gID: w2
mID: --
T3T1 T2
gID: w2
mID: --
198. Fencing conflict
instance (Nice to
have)● A caveat for concurrent joining
1. First member joins with id w2
2. In the meantime, a conflict w2
joins
3. Replace w2’s member id to m3
4. Coordinator requires w1 to rejoin
198Coordinator
gID: w1
w1 -> m1
w2 -> m2 m3
Members
mID: m1
T1 T2 T3
gID: w2
mID: --
gID: w2
mID: --
T3T1 T2
199. Fencing conflict
instance (Nice to
have)● A caveat for concurrent joining
1. First member joins with id w2
2. In the meantime, a conflict w2
joins
3. Replace w2’s member id to m3
4. Coordinator requires w1 to rejoin
5. Group performs assignment
199Coordinator
gID: w1
w1 -> m1
w2 -> m2 m3
Members
mID: m1
gID: w2
mID: --
gID: w2
mID: --
T3T1 T2
T1 T2 T3
200. Fencing conflict
instance (Nice to
have)● A caveat for concurrent joining
1. First member joins with id w2
2. In the meantime, a conflict w2
joins
3. Replace w2’s member id to m3
4. Coordinator requires w1 to rejoin
5. Group performs assignment
6. Propagate new assignment
200Coordinator
gID: w1
w1 -> m1
w2 -> m3
Members
mID: m1
gID: w2
mID: --
gID: w2
mID: --
T1 T2
T1 T2 T3
201. Fencing conflict instance (Nice to have)
● A caveat for concurrent joining
1. First member joins with id w2
2. In the meantime, a conflicting member also joins with id w2
3. Replace w2’s member id with m3
4. Coordinator requires w1 to rejoin
5. Group performs assignment
6. Propagate new assignment
7. Done!
[Diagram: Coordinator; Members table: w1 -> m1; w2 -> m3; workers: (gID: w1, mID: m1), (gID: w2, mID: --), (gID: w2, mID: m3); tasks: T1 T2 T3]
202. Fencing conflict instance (Nice to have)
● A caveat for concurrent joining
1. …
8. The out-of-scope member times out and rejoins
[Diagram: Coordinator; Members table: w1 -> m1; w2 -> m3; workers: (gID: w1, mID: m1), (gID: w2, mID: --), (gID: w2, mID: m3); tasks: T1 T2 T3]
203. Fencing conflict instance (Nice to have)
● A caveat for concurrent joining
1. …
8. The out-of-scope member times out and rejoins
9. Update w2’s member id to m4
[Diagram: Coordinator; Members table: w1 -> m1; w2 -> m3, m4; workers: (gID: w1, mID: m1), (gID: w2, mID: --), (gID: w2, mID: m3); tasks: T1 T2 T3]
204. Fencing conflict instance (Nice to have)
● A caveat for concurrent joining
1. …
8. The out-of-scope member times out and rejoins
9. Update w2’s member id to m4
10. The rejoined member gets a conflicting assignment
[Diagram: Coordinator; Members table: w1 -> m1; w2 -> m3, m4; workers: (gID: w1, mID: m1), (gID: w2, mID: m4), (gID: w2, mID: m3); tasks: T1 T2 T3, with T3 shown twice]
205. Fencing conflict instance (Nice to have)
● A caveat for concurrent joining
1. …
8. The out-of-scope member times out and rejoins
9. Update w2’s member id to m4
10. The rejoined member gets a conflicting assignment
11. Old member m3 calls heartbeat()
[Diagram: Coordinator; Members table: w1 -> m1; w2 -> m3, m4; workers: (gID: w1, mID: m1), (gID: w2, mID: m4), (gID: w2, mID: m3); tasks: T1 T2 T3, with T3 shown twice; request: hb(m3)]
206. Fencing conflict instance (Nice to have)
● A caveat for concurrent joining
1. …
8. The out-of-scope member times out and rejoins
9. Update w2’s member id to m4
10. The rejoined member gets a conflicting assignment
11. Old member m3 calls heartbeat()
12. Member id mismatch: the coordinator fences m3
[Diagram: Coordinator; Members table: w1 -> m1; w2 -> m3, m4; workers: (gID: w1, mID: m1), (gID: w2, mID: m4), (gID: w2, mID: m3); tasks: T1 T2 T3, with T3 shown twice; request: hb(m3)]
207. Fencing conflict instance (Nice to have)
● A caveat for concurrent joining
● Downsides:
○ Risk of concurrent processing
○ Delayed conflict detection
[Diagram: Coordinator; Members table: w1 -> m1; w2 -> m3, m4; workers: (gID: w1, mID: m1), (gID: w2, mID: m4), (gID: w2, mID: m3); tasks: T1 T2 T3, with T3 shown twice; request: hb(m3)]
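The concurrent-joining caveat above can be replayed with a toy model. This is only a sketch: the `Coordinator` class, its `join` and `heartbeat` methods, and `FencedError` are hypothetical names made up for illustration, not Kafka APIs. The one rule taken from the slides is that the coordinator keeps a single member id per instance id (gID) and only notices the stale member when it sends a heartbeat.

```python
class FencedError(Exception):
    """The request carried a member id that is no longer current."""


class Coordinator:
    """Toy coordinator (hypothetical): one member id (mID) per instance id (gID)."""

    def __init__(self):
        self.members = {}   # gID -> current mID
        self._counter = 0

    def join(self, instance_id):
        # Every join gets a fresh member id. A join for a known instance id
        # silently replaces the previous one (w2 -> m2 becomes w2 -> m3).
        self._counter += 1
        member_id = f"m{self._counter}"
        self.members[instance_id] = member_id
        return member_id

    def heartbeat(self, instance_id, member_id):
        # The conflict only surfaces here, when the stale member calls in.
        if self.members.get(instance_id) != member_id:
            raise FencedError(f"{member_id} is fenced for {instance_id}")


def replay():
    """Replay the slide sequence and return a log of what happened."""
    log = []
    coordinator = Coordinator()
    log.append(("w1", coordinator.join("w1")))
    log.append(("w2", coordinator.join("w2")))   # first w2 instance
    m3 = coordinator.join("w2")                  # conflicting w2 instance
    log.append(("w2", m3))
    coordinator.heartbeat("w2", m3)              # m3 is current, so this passes
    m4 = coordinator.join("w2")                  # out-of-scope member rejoins
    log.append(("w2", m4))
    try:
        coordinator.heartbeat("w2", m3)          # stale id, detected only now
    except FencedError:
        log.append(("fenced", m3))
    return log


if __name__ == "__main__":
    for entry in replay():
        print(entry)
```

Between the last join and the failing heartbeat, both w2 instances believe they own the same work, which is the "risk of concurrent processing" and "delayed conflict detection" listed as downsides.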
208. Fencing conflict instance (Nice to have)
● Fence against callback
[Diagram: Coordinator; Members table: w1 -> m1; Join callback: (empty); workers: (gID: w1, mID: m1); tasks: T1 T2 T3]
209. Fencing conflict instance (Nice to have)
● Fence against callback
1. New member with id w2 joins, registering a callback
[Diagram: Coordinator; Members table: w1 -> m1; w2 -> m2; Join callback: <w2, m2>; workers: (gID: w1, mID: m1), (gID: w2, mID: --); tasks: T1 T2 T3]
210. Fencing conflict instance (Nice to have)
● Fence against callback
1. New member with id w2 joins, registering a callback
2. A conflicting member joins at the same time
[Diagram: Coordinator; Members table: w1 -> m1; w2 -> m2; Join callback: <w1, m1>, <w2, m2>; workers: (gID: w1, mID: m1), (gID: w2, mID: --), (gID: w2, mID: --); tasks: T1 T2 T3]
211. Fencing conflict instance (Nice to have)
● Fence against callback
1. New member with id w2 joins, registering a callback
2. A conflicting member joins at the same time
3. Replace the member id with m3
[Diagram: Coordinator; Members table: w1 -> m1; w2 -> m2, m3; Join callback: <w1, m1>, <w2, m2>; workers: (gID: w1, mID: m1), (gID: w2, mID: --), (gID: w2, mID: --); tasks: T1 T2 T3]
212. Fencing conflict instance (Nice to have)
● Fence against callback
1. New member with id w2 joins, registering a callback
2. A conflicting member joins at the same time
3. Replace the member id with m3
4. Return m2’s callback with a fenced exception
[Diagram: Coordinator; Members table: w1 -> m1; w2 -> m3; Join callback: <w1, m1>, <w2, m2> X; workers: (gID: w1, mID: m1), (gID: w2, mID: --), (gID: w2, mID: --); tasks: T1 T2 T3]
213. Fencing conflict instance (Nice to have)
● Fence against callback
1. New member with id w2 joins, registering a callback
2. A conflicting member joins at the same time
3. Replace the member id with m3
4. Return m2’s callback with a fenced exception
5. Shut down m2 immediately
[Diagram: Coordinator; Members table: w1 -> m1; w2 -> m3; Join callback: <w1, m1>, <w2, m2> X; workers: (gID: w1, mID: m1), (gID: w2, mID: --), (gID: w2, mID: --); tasks: T1 T2 T3]
214. Fencing conflict instance (Nice to have)
● Fence against callback
1. …
6. Require all members to revoke/rejoin
[Diagram: Coordinator; Members table: w1 -> m1; w2 -> m3; Join callback: <w1, m1>, <w2, m3>; workers: (gID: w1, mID: m1), (gID: w2, mID: --); tasks: T1 T2 T3]
215. Fencing conflict instance (Nice to have)
● Fence against callback
1. …
6. Require all members to revoke/rejoin
7. Perform assignment
[Diagram: Coordinator; Members table: w1 -> m1; w2 -> m3; Join callback: <w1, m1>, <w2, m3>; workers: (gID: w1, mID: m1), (gID: w2, mID: --); tasks: T1 T2 T3]
216. Fencing conflict instance (Nice to have)
● Fence against callback
1. …
6. Require all members to revoke/rejoin
7. Perform assignment
8. Propagate through callbacks, done!
[Diagram: Coordinator; Members table: w1 -> m1; w2 -> m3; Join callback: (empty); workers: (gID: w1, mID: m1), (gID: w2, mID: --); tasks: T1 T2 T3]
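The "fence against callback" sequence above can be sketched the same way. Again, this is an illustrative model under stated assumptions: `CallbackCoordinator`, `join`, `complete_rebalance`, and the callback signature are hypothetical and not taken from the Kafka code base. The idea it illustrates is that the coordinator holds the pending join callback for each instance id, so a conflicting join can fail the older callback right away instead of waiting for a later heartbeat.

```python
class CallbackCoordinator:
    """Toy coordinator (hypothetical) that keeps pending join callbacks."""

    def __init__(self):
        self.members = {}   # instance id -> current member id
        self.pending = {}   # instance id -> (member id, join callback)
        self._counter = 0

    def join(self, instance_id, callback):
        self._counter += 1
        member_id = f"m{self._counter}"
        # If a join for this instance id is still pending, answer it at once
        # with a fenced error so the older member can shut down immediately.
        previous = self.pending.pop(instance_id, None)
        if previous is not None:
            old_member_id, old_callback = previous
            old_callback(old_member_id, error="FENCED")
        self.members[instance_id] = member_id
        self.pending[instance_id] = (member_id, callback)
        return member_id

    def complete_rebalance(self, assignment):
        # Propagate the assignment through the surviving callbacks.
        for instance_id, (member_id, callback) in list(self.pending.items()):
            callback(member_id, assignment=assignment.get(instance_id, []))
        self.pending.clear()


def run_demo():
    events = []

    def recorder(name):
        def callback(member_id, error=None, assignment=None):
            events.append((name, member_id, error, assignment))
        return callback

    coordinator = CallbackCoordinator()
    coordinator.join("w1", recorder("w1"))
    coordinator.join("w2", recorder("w2-first"))    # registers <w2, m2>
    coordinator.join("w2", recorder("w2-second"))   # conflict: m2 is fenced now
    coordinator.complete_rebalance({"w1": ["T1", "T2"], "w2": ["T3"]})
    return events


if __name__ == "__main__":
    for event in run_demo():
        print(event)
```

Compared with the heartbeat-based detection, the fenced member learns about the conflict before any assignment is handed out, so the two w2 instances never process the same task at the same time in this model.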
221. Developer debugging
1. Log in to your client application
2. Look into the client log and search for “Discovered group coordinator”
[2019-06-14 00:23:47,020] INFO [Consumer instanceId=consumer-A-2, clientId=StaticMemberTestClient-019a5efe-87ef-4c62-9891-27330df67049-StreamThread-2-consumer, groupId=StaticMemberTestClient] Discovered group coordinator ducker04:9092 (id: 2147483645 rack: null) (org.apache.kafka.clients.consumer.internals.AbstractCoordinator)
222. Developer debugging
1. Log in to your client application
2. Look into the client log and search for “Discovered group coordinator”
3. Find your server and log in
[Diagram: servers … ducker04, captioned “Find your server!”]
223. Developer debugging
1. Log in to your client application
2. Look into the client log and search for “Discovered group coordinator”
3. Find your server and log in
4. Check the server log for “rebalance reason”
[2019-06-14 00:23:47,389] INFO [GroupCoordinator 2]: Preparing to rebalance group StaticMemberTestClient in state PreparingRebalance with old generation 0 (__consumer_offsets-2) (reason: Adding new member consumer-A-1-1560471827287 with group instanceid Some(consumer-A-1)) (kafka.coordinator.group.GroupCoordinator)
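The two log searches in the debugging steps can be scripted. The sketch below is a minimal example, assuming each log entry sits on a single line and looks like the two sample lines shown on the slides; the regular expressions and the function names `find_coordinators` and `find_rebalance_reasons` are illustrative, and log wording may differ between Kafka versions.

```python
import re

# Patterns derived only from the two sample log lines on the slides.
COORDINATOR_RE = re.compile(r"Discovered group coordinator (\S+):(\d+)")
REASON_RE = re.compile(r"Preparing to rebalance group (\S+) .*?\(reason: (.*)\) \(")


def find_coordinators(client_log_lines):
    """Return the set of (host, port) coordinators seen in a client log."""
    found = set()
    for line in client_log_lines:
        match = COORDINATOR_RE.search(line)
        if match:
            found.add((match.group(1), int(match.group(2))))
    return found


def find_rebalance_reasons(server_log_lines):
    """Return (group, reason) pairs from a coordinator's server log."""
    reasons = []
    for line in server_log_lines:
        match = REASON_RE.search(line)
        if match:
            reasons.append((match.group(1), match.group(2)))
    return reasons


CLIENT_LINE = (
    "[2019-06-14 00:23:47,020] INFO [Consumer instanceId=consumer-A-2, "
    "clientId=StaticMemberTestClient-019a5efe-87ef-4c62-9891-27330df67049-"
    "StreamThread-2-consumer, groupId=StaticMemberTestClient] "
    "Discovered group coordinator ducker04:9092 (id: 2147483645 rack: null) "
    "(org.apache.kafka.clients.consumer.internals.AbstractCoordinator)"
)
SERVER_LINE = (
    "[2019-06-14 00:23:47,389] INFO [GroupCoordinator 2]: Preparing to "
    "rebalance group StaticMemberTestClient in state PreparingRebalance with "
    "old generation 0 (__consumer_offsets-2) (reason: Adding new member "
    "consumer-A-1-1560471827287 with group instanceid Some(consumer-A-1)) "
    "(kafka.coordinator.group.GroupCoordinator)"
)

if __name__ == "__main__":
    print(find_coordinators([CLIENT_LINE]))
    print(find_rebalance_reasons([SERVER_LINE]))
```

In practice the line lists would come from reading the client and server log files; the sample constants are there only so the sketch runs on its own.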
227. Takeaways
● Different timeouts:
○ Increase session.timeout.ms for better stability
○ max.poll.interval.ms is the maximum time a member may take between poll() calls
○ rebalance.timeout.ms: members that have not rejoined by the deadline are kicked out
228. Takeaways
● Different timeouts:
○ Increase session.timeout.ms for better stability
○ max.poll.interval.ms is the maximum time a member may take between poll() calls
○ rebalance.timeout.ms: members that have not rejoined by the deadline are kicked out
● What is group generation?
229. Takeaways
● Different timeouts:
○ Increase session.timeout.ms for better stability
○ max.poll.interval.ms is the maximum time a member may take between poll() calls
○ rebalance.timeout.ms: members that have not rejoined by the deadline are kicked out
● What is group generation?
● Why do we let the client do the assignment?
230. Takeaways
● Different timeouts:
○ Increase session.timeout.ms for better stability
○ max.poll.interval.ms is the maximum time a member may take between poll() calls
○ rebalance.timeout.ms: members that have not rejoined by the deadline are kicked out
● What is group generation?
● Why do we let the client do the assignment?
● Static membership is generally available in Apache Kafka 2.3, for Consumer and Streams
○ Upgrade your broker to 2.3
○ Set a unique group.instance.id for each client (and monitor for fencing)
○ Make session.timeout.ms long enough
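The static membership checklist above can be turned into a small configuration helper. This is a sketch only: the helper names `static_member_config`, `check_unique_instance_ids`, and `to_properties` are hypothetical, and the timeout values are placeholder assumptions to be tuned per deployment. The property keys themselves (`group.id`, `group.instance.id`, `session.timeout.ms`, `max.poll.interval.ms`) are standard Kafka consumer settings.

```python
def static_member_config(group_id, instance_id,
                         session_timeout_ms=120_000,      # assumed value
                         max_poll_interval_ms=300_000):   # assumed value
    """Build consumer properties for one static member."""
    return {
        "group.id": group_id,
        "group.instance.id": instance_id,   # must be unique within the group
        "session.timeout.ms": session_timeout_ms,
        "max.poll.interval.ms": max_poll_interval_ms,
    }


def check_unique_instance_ids(configs):
    """Raise ValueError if two clients of one group share group.instance.id."""
    seen = set()
    for config in configs:
        key = (config["group.id"], config["group.instance.id"])
        if key in seen:
            raise ValueError(f"duplicate group.instance.id: {key[1]}")
        seen.add(key)


def to_properties(config):
    """Render the config as the text of a .properties file."""
    return "\n".join(f"{key}={config[key]}" for key in sorted(config))


if __name__ == "__main__":
    configs = [static_member_config("my-group", f"consumer-{i}") for i in range(3)]
    check_unique_instance_ids(configs)
    print(to_properties(configs[0]))
```

Checking uniqueness before deployment matters because a duplicated `group.instance.id` is exactly the conflict that the fencing logic in the earlier slides has to resolve at runtime.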
231. Resources
• KIP-62: Allow consumer to send heartbeats from a background thread
• KIP-180: Add a broker metric specifying the number of consumer group rebalances in progress
• KIP-345: Introduce static membership protocol to reduce consumer rebalances (accepted)
• Kafka Client redesign proposal
• "The Magical Rebalance Protocol of Apache Kafka" by Gwen Shapira (Strange Loop Talk, Sep 2018)
https://www.youtube.com/watch?v=MmLezWRI3Ys&t=8s