"Getting started with a Kafka Consumer or Streams application is relatively straightforward, but having those clients be highly available and resilient in a real-world application where compute is becoming more ephemeral is another matter. Kubernetes is a popular place to deploy these workloads, making these challenges more accessible and prevalent than ever.
In this talk I will introduce a simple consumer implementation with a default configuration and discuss the KIPs and features that have been introduced over time to limit how the hostile world of cloud computing can impact your real-time consuming applications.
Once the Kafka Consumer configurations are under our belt, we can see how these same concepts are applied and augmented in Kafka Streams, and then cater for new concepts such as maintaining and restoring local state store data.
Throughout the talk I will show out-of-the-box Kubernetes features and deployment configurations that harmonise with the Kafka clients and their configurations to achieve a highly available consumer or streams deployment.
If you have found your real-time streaming applications stopping the world through rebalancing, starting up slower than expected during a routine deployment, taking an age to restore state, or becoming less reliable over time as your platform engineers make your Kubernetes cluster more awesome, you will hopefully find something in this talk you can apply tomorrow."
3. 0.10.0 → 3.X
• In Production since 2017
• Nearly 700 consumer groups
• Realtime, stream-processing applications
• Kafka Streams ‘early adopters’
• Kubernetes hosted
• … not been without pain …
27. Make Kubernetes aware of consumer state
Tie-in readiness to rebalancing state
Kafka Consumer
class org.apache.kafka.clients.consumer.KafkaConsumer<K,V>
void subscribe(Collection<String> topics, ConsumerRebalanceListener callback)
public interface org.apache.kafka.clients.consumer.ConsumerRebalanceListener
void onPartitionsRevoked(Collection<TopicPartition> partitions)
void onPartitionsAssigned(Collection<TopicPartition> partitions)
28. Make Kubernetes aware of consumer state
Tie-in readiness to rebalancing state
Kafka Streams
class org.apache.kafka.streams.KafkaStreams
public void setStateListener(KafkaStreams.StateListener listener)
interface org.apache.kafka.streams.KafkaStreams.StateListener
void onChange(KafkaStreams.State newState, KafkaStreams.State oldState)
Ready: oldState == State.REBALANCING && newState == State.RUNNING
Unready: newState != State.RUNNING
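The ready/unready rule above can be sketched without the Kafka dependency. This is only an illustration under stated assumptions: `StreamsState` is a local stand-in for a few values of `KafkaStreams.State`, and `ReadinessGate` is a hypothetical name. In a real application the `onChange` logic would live in a listener registered through `setStateListener`.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Local stand-in for a subset of KafkaStreams.State, so the sketch compiles without kafka-streams.
enum StreamsState { CREATED, REBALANCING, RUNNING }

// Hypothetical helper: remembers whether the pod should currently report itself as ready.
final class ReadinessGate {
    private final AtomicBoolean ready = new AtomicBoolean(false);

    // Same parameter order as KafkaStreams.StateListener#onChange(newState, oldState).
    void onChange(StreamsState newState, StreamsState oldState) {
        if (newState != StreamsState.RUNNING) {
            ready.set(false);   // Unready: any state other than RUNNING
        } else if (oldState == StreamsState.REBALANCING) {
            ready.set(true);    // Ready: REBALANCING -> RUNNING
        }
    }

    boolean isReady() {
        return ready.get();
    }
}
```

A readiness probe handler would then only need to read `isReady()`, which keeps the probe cheap and free of external calls.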
30. Kubernetes good citizens
Factor out your applications carefully
• Try to stick to a single API type per application
• Kafka Streams: Interactive Queries can be an exception
• Keep your microservices ‘micro’
• Probes should be efficient and not depend on external services
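One way to keep a probe efficient and independent of external services is to have it read nothing but an in-process flag. The following is a minimal sketch using the JDK's built-in `com.sun.net.httpserver.HttpServer`; the `/ready` path, the `ReadinessEndpoint` class and the `AtomicBoolean` flag are assumptions for illustration, and the flag is expected to be flipped elsewhere by a rebalance or state listener.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical readiness endpoint: answers from local state only, no broker or database calls.
final class ReadinessEndpoint {

    // 200 tells Kubernetes the pod is ready; 503 marks it unready.
    static int statusFor(boolean ready) {
        return ready ? 200 : 503;
    }

    // Starts an HTTP server on the given port (0 picks a free port) serving GET /ready.
    static HttpServer start(int port, AtomicBoolean ready) {
        try {
            HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
            server.createContext("/ready", exchange -> {
                // -1 means the response has no body.
                exchange.sendResponseHeaders(statusFor(ready.get()), -1);
                exchange.close();
            });
            server.start();
            return server;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The container's readiness probe would point at the chosen port and path; the probe itself never touches Kafka.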
31. Kubernetes will respect the disruption budget
Multiple scenarios supported
• Cluster maintenance
• Cluster autoscaling
• Rolling deployment*
• Involuntary disruptions*
* Not limited by the budget; only counts towards it
32. Kubernetes rolling deployment
Minimise the fallout from a rollout
kind: Deployment
apiVersion: apps/v1
metadata:
  name: consumer-app
  ...
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 0
      maxUnavailable: 1
  ...
  template:
    metadata:
      labels:
        app.kubernetes.io/name: consumer-app
    spec:
      ...
      containers:
      - ...
The default strategy is generally unfriendly for consumers
43. Cooperative incremental rebalancing
Read the documentation!
Kafka Consumer
partition.assignment.strategy = [org.apache.kafka.clients.consumer.CooperativeStickyAssignor]
Before 3.0 default: [RangeAssignor]
After 3.0 default: [RangeAssignor, CooperativeStickyAssignor]
Opting in
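As a rough sketch of opting in from application code, the setting can be placed in the consumer's `Properties`. Plain string keys are used so the example has no Kafka dependency; the `CooperativeConsumerConfig` class, the bootstrap address and the group name are hypothetical.

```java
import java.util.Properties;

// Hypothetical helper that builds consumer properties with cooperative rebalancing enabled.
final class CooperativeConsumerConfig {
    // Config key and assignor class name as shown on the slide.
    static final String STRATEGY_KEY = "partition.assignment.strategy";
    static final String COOPERATIVE_STICKY =
            "org.apache.kafka.clients.consumer.CooperativeStickyAssignor";

    static Properties build(String bootstrapServers, String groupId) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        props.setProperty("group.id", groupId);
        // Opt in explicitly instead of relying on the version-dependent default list.
        props.setProperty(STRATEGY_KEY, COOPERATIVE_STICKY);
        return props;
    }
}
```

The result of `CooperativeConsumerConfig.build("localhost:9092", "demo-group")` would be passed, together with deserializer settings, to the `KafkaConsumer` constructor. A group that is already running with an eager assignor should follow the rolling upgrade path in the documentation rather than switching in a single step.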
44. Cooperative incremental rebalancing
Opting in
Read the documentation!
Kafka Streams
Built into the StreamsPartitionAssignor, for free™
Available from Kafka clients 2.4
https://kafka.apache.org/34/documentation/streams/upgrade-guide
45. Almost there..
Minimise the rebalance time, but still wait for session timeout
recovery time = session timeout + rebalance time
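The formula above is simple addition, which a small sketch makes concrete. The `RecoveryTime` class is hypothetical, and the rebalance duration used in the usage note is an invented figure for illustration, not a measurement.

```java
import java.time.Duration;

// Hypothetical helper mirroring: recovery time = session timeout + rebalance time.
final class RecoveryTime {
    static Duration estimate(Duration sessionTimeout, Duration rebalanceTime) {
        // The group only starts rebalancing once the failed member's session has expired.
        return sessionTimeout.plus(rebalanceTime);
    }
}
```

For example, `RecoveryTime.estimate(Duration.ofSeconds(45), Duration.ofSeconds(5))` yields 50 seconds: however short the rebalance becomes, the session timeout still dominates the recovery time after a crash.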
46. Static consumer group membership
Making static membership work well with Kubernetes
“group.instance.id”
CommonClientConfigs.GROUP_INSTANCE_ID_CONFIG
Works together with session.timeout.ms
Increased to 45s in 3.0 (from 10s)
KIP-345: Introduce static membership protocol to reduce consumer rebalances (2.3)
47. Use the Kubernetes ‘downward API’
Making the pod name available to the application
kind: Deployment
apiVersion: apps/v1
metadata:
  name: consumer-app
  ...
spec:
  ...
  template:
    ...
    spec:
      ...
      containers:
      - ...
        env:
        - name: K8S_POD_NAME
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.name
        ...
48. Static consumer group membership
Making static membership work well with Kubernetes
49. Get a static name with StatefulSets
Constant set of pod names
* Bonus: no surges, pod by pod rolling deployments
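With stable pod names, the pod name can serve as the static member ID. Below is a minimal sketch, assuming the pod name is exposed to the container in the `K8S_POD_NAME` environment variable via the downward API; the `StaticMembershipConfig` class is hypothetical, and the environment is passed in as a map so the logic can be exercised without a cluster.

```java
import java.util.Map;
import java.util.Properties;

// Hypothetical helper: derives group.instance.id from the pod name when one is available.
final class StaticMembershipConfig {
    static Properties apply(Properties props, Map<String, String> env) {
        String podName = env.get("K8S_POD_NAME");
        if (podName != null && !podName.isEmpty()) {
            // A StatefulSet pod keeps its name across restarts, so the member ID stays constant.
            props.setProperty("group.instance.id", podName);
        }
        // Without a pod name the consumer falls back to dynamic membership.
        return props;
    }
}
```

In the application this would be called as `StaticMembershipConfig.apply(props, System.getenv())` before the consumer is constructed.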
50. Autoscaling?
Watch out for static membership
• Static membership prevents consumers from sending leave group requests
• session.timeout.ms could be set low, at the risk of more rebalances
• Dynamic membership?
• Could use an operator to scale down
interface org.apache.kafka.clients.admin.Admin
RemoveMembersFromConsumerGroupResult
removeMembersFromConsumerGroup(
String groupId, RemoveMembersFromConsumerGroupOptions options);
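As a sketch of what such an operator would first need to work out, the function below lists the static member IDs that disappear when a StatefulSet is scaled down. It assumes `group.instance.id` equals the pod name, that pods are named `<statefulset>-<ordinal>`, and that scale-down removes the highest ordinals first; the `ScaleDownPlan` class is hypothetical. The Admin call itself is left out because it needs the Kafka client library and a running broker.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for an operator: which static members leave the group on scale-down?
final class ScaleDownPlan {
    static List<String> instanceIdsToRemove(String statefulSetName, int fromReplicas, int toReplicas) {
        List<String> ids = new ArrayList<>();
        // Pods with ordinals toReplicas .. fromReplicas-1 are the ones terminated.
        for (int ordinal = toReplicas; ordinal < fromReplicas; ordinal++) {
            ids.add(statefulSetName + "-" + ordinal);
        }
        return ids;
    }
}
```

Scaling `consumer-app` from 5 to 3 replicas gives `consumer-app-3` and `consumer-app-4`. An operator could hand those IDs to `removeMembersFromConsumerGroup` so their partitions are reassigned without waiting for the session timeout.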
51. In review
Optimised pod layout
• Topology spread constraints
Made Kubernetes aware of replica state
• Health checks + Pod Disruption Budgets
Minimised rebalances + stop-the-world
• Incremental Co-operative Rebalancing
Minimised session timeout impact
• Static membership
recovery time = session timeout + rebalance time
60. Kafka Streams: to be aware of
• KIP-441: Smooth scaling out of Kafka Streams (2.6)
• KIP-535: Allow IQ reads during rebalance (2.5)
• KAFKA-13439: Eager rebalancing of Kafka Streams deprecated (3.1)
• KIP-708: Rack aware Kafka Streams apps (3.2)
61. Future developments
• KIP-848: The Next Generation of the Consumer Rebalance Protocol
• KAFKA-10199: Separate state restoration into separate threads (3.5?)