Slide 7: It is tradeoffs all the way down
● Retention - disk size
● Throughput - network, CPU
● Producer performance - disk IO
● Consumer performance - CPU, memory
Just performance requirements!
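The retention/disk tradeoff above can be turned into a back-of-envelope capacity estimate: bytes written per second, times retention, times replication, plus headroom. A minimal sketch (the 1.3 overhead factor is an assumption for indexes and imbalance, not a Kafka constant):

```python
def required_disk_gb(write_mb_per_sec, retention_hours, replication_factor,
                     overhead=1.3):
    """Rough disk capacity estimate for a Kafka cluster.

    `overhead` is an assumed headroom factor for index files, open
    segments, and imperfect partition balancing - tune to taste.
    """
    mb_retained = write_mb_per_sec * 3600 * retention_hours
    return mb_retained * replication_factor * overhead / 1024  # GB

# e.g. 50 MB/s of writes, 7 days of retention, replication factor 3
print(round(required_disk_gb(50, 7 * 24, 3)))  # ~115 TB across the cluster
```

The same exercise works in reverse for the other bullets: measure the network and CPU cost of your target throughput, then treat all of them as plain performance requirements.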
Slide 9: Separate clusters for...
● DR
● Geographical distribution
● Writing vs Reading
● Real-time vs Batch
● Dev / Test
● High throughput
● Highly reliable
● Security
Slide 12: Kafka is built to scale horizontally
● Largest cluster: 200+ nodes
● Lots of work on improving the controller in 1.1 and 2.0
● Larger / more loaded brokers mean longer restarts and recovery.
● Larger brokers require tuning to take full advantage.
Slide 13: Disk recommendations depend on version
- Before 1.1: RAID 10 recommended
- 1.1 and up (JBOD support):
  - KIP-112 - the broker survives the loss of a single disk
  - KIP-113 - replicas can be assigned to specific disks
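With the 1.1+ JBOD support, a broker can be pointed at several independent data directories instead of a single RAID volume. A sketch of the relevant broker setting (paths are placeholders):

```properties
# server.properties - one log dir per physical disk (JBOD, Kafka 1.1+)
log.dirs=/data/disk1/kafka-logs,/data/disk2/kafka-logs,/data/disk3/kafka-logs
```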
Slide 20: The 25K question
1. Read Jun Rao's blog post on this topic
2. More partitions == more scale
3. More partitions == more throughput
4. More partitions != more speed
5. Controller improvements in 1.1 allow more partitions per broker
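The rule of thumb from Jun Rao's post can be sketched as: measure per-partition producer and consumer throughput, then size the partition count for your target rate. A minimal sketch (the example numbers are placeholders, not benchmarks):

```python
import math

def partition_count(target_mb_s, producer_mb_s_per_partition,
                    consumer_mb_s_per_partition):
    """Rule-of-thumb partition count: whichever side (produce or
    consume) needs more partitions to hit the target wins."""
    return math.ceil(max(target_mb_s / producer_mb_s_per_partition,
                         target_mb_s / consumer_mb_s_per_partition))

# e.g. target 200 MB/s; producers measured at 20 MB/s per partition,
# consumers at 25 MB/s per partition
print(partition_count(200, 20, 25))  # -> 10
```

Note this sizes for throughput only - it says nothing about per-message latency, which extra partitions do not improve (point 4 above).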
Slide 27: Not all clients are the same
1. Producers have very high throughput
2. Especially when tuned
3. EOS / ordering require a single writer per entity
4. Supporting many consumer groups is where Kafka shines
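The "especially when tuned" point usually comes down to batching and compression. A sketch of commonly tuned producer properties (the values are illustrative, not recommendations):

```properties
# producer.properties - batching and compression (illustrative values)
linger.ms=20
batch.size=131072
compression.type=lz4
acks=all
# idempotence is required for EOS / the single-writer-per-entity pattern
enable.idempotence=true
```

Larger batches and a small linger trade a little latency for much higher throughput per producer.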
Slide 29: Benchmarks
● Kafka ships with performance tools
● And your favorite language's client has tools too
● Your own workload (or similar)
● Your own configuration
● Your own failure scenarios
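The perf tools that ship with Kafka let you plug in your own record sizes and producer configuration. An invocation sketch (topic name and broker address are placeholders):

```shell
# 1M records of 1 KB each, unthrottled, with your own producer config
bin/kafka-producer-perf-test.sh \
  --topic perf-test \
  --num-records 1000000 \
  --record-size 1024 \
  --throughput -1 \
  --producer-props bootstrap.servers=broker1:9092 acks=all
```

Swap in record sizes and producer properties that match your real workload - a benchmark with default settings tells you about the defaults, not about your system.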
Slide 30: Tuning
● Don’t fly blind
● Why is it slow?
● Where is the bottleneck?
● Version control for all configuration
● Automate the “change->test->observe” loop
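The “change->test->observe” loop can be sketched as a small harness. `run_benchmark` here is a hypothetical stand-in for whatever perf tool you drive; its fake response surface exists only to make the sketch runnable:

```python
def run_benchmark(config):
    """Hypothetical stand-in: apply `config`, drive load, return
    observed throughput in MB/s. A real version would render the
    config, redeploy, run the perf tool, and scrape metrics."""
    return 100 + config["linger.ms"] * 0.5  # fake response surface

def tune(candidate_linger_ms):
    """change -> test -> observe: try each candidate, keep the best."""
    results = {}
    for linger_ms in candidate_linger_ms:
        config = {"linger.ms": linger_ms}   # change
        results[linger_ms] = run_benchmark(config)  # test + observe
    best = max(results, key=results.get)
    return best, results

best, results = tune([0, 5, 20, 100])
print(best)
```

Keeping every candidate config in version control (the previous bullet) is what makes a loop like this reproducible rather than folklore.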
Slide 31: My broker is slow 101
● Are all brokers working?
● Did you saturate network capacity?
● Is CPU utilization high?
● Are you running an old version?
● Do you have HUGE messages?
● Is it really the broker?