(SPRING) KAFKA - ONE MORE ARSENAL IN A DISTRIBUTED TOOLBOX
1. Nklmish
@nklmish
“Therefore, just as water retains no constant shape, so in
warfare there are no constant conditions” - Sun Tzu
3. •Swedish startup founded in 2012.
•Strong advocate of DDD & CQRS.
•Event store with billions of events.
•Millions of events every day.
•Every event is EQUALLY important - “I was thinking I might win 50 pounds
but when it went all the way to the jackpot I was shocked.” - Mega Fortune
£2,700,000 jackpot won on the 3rd spin.
•And… we have an amazing culture.
11. Kafka - NOT a service bus
•ESB - Integrate legacy & off the shelf systems.
•Messaging layer (Low throughput).
•Central team governance (validations, schemas, etc.).
•Beware: stay away from recreating ESB anti-patterns with Kafka.
12. Kafka - more than a message queue
•Supports both Point-to-point & publish-subscribe
•Extremely fast
•Massive msg throughput
•Msg replaying + retention
•Doesn’t slow down as the no. of consumers increases.
•Scalable
•Stronger ordering guarantees than a traditional messaging system
How come?
13. Traditional queue, ordering guarantees
•Server side: queue retains records in order (R0 R1 R2 R3 …)
•Parallel consumption: consumers C0, C1, C2
•Async delivery: t=0, R1; t=1, R0; t=2, R2
•Messaging systems solve this via an “exclusive consumer”
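The ordering problem on this slide can be sketched as a small deterministic simulation. This is only an illustration: the class and method names (QueueOrderingSketch, Delivery, observedOrder, exclusiveConsumerOrder) are hypothetical and not part of any messaging library, and the completion times are the ones shown on the slide.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch; names are illustrative, not from any messaging library.
class QueueOrderingSketch {

    // A record handed to a consumer, plus the time at which its processing completed.
    static final class Delivery {
        final String recordId;
        final int completedAt;

        Delivery(String recordId, int completedAt) {
            this.recordId = recordId;
            this.completedAt = completedAt;
        }
    }

    // With parallel consumers, results become visible in completion order,
    // not in the order the queue stored the records.
    static List<String> observedOrder(List<Delivery> deliveries) {
        List<Delivery> sorted = new ArrayList<>(deliveries);
        sorted.sort(Comparator.comparingInt((Delivery d) -> d.completedAt));
        List<String> ids = new ArrayList<>();
        for (Delivery d : sorted) {
            ids.add(d.recordId);
        }
        return ids;
    }

    // With one exclusive consumer, records complete one after another in queue order.
    static List<String> exclusiveConsumerOrder(List<String> queue) {
        List<Delivery> deliveries = new ArrayList<>();
        for (int t = 0; t < queue.size(); t++) {
            deliveries.add(new Delivery(queue.get(t), t));
        }
        return observedOrder(deliveries);
    }

    public static void main(String[] args) {
        // Queue order is R0, R1, R2; completion times are taken from the slide.
        List<Delivery> parallel = Arrays.asList(
                new Delivery("R0", 1), new Delivery("R1", 0), new Delivery("R2", 2));
        System.out.println(observedOrder(parallel));                                  // [R1, R0, R2]
        System.out.println(exclusiveConsumerOrder(Arrays.asList("R0", "R1", "R2"))); // [R0, R1, R2]
    }
}
```

The exclusive consumer restores ordering, but only by giving up parallel consumption.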
18. Kafka, a streaming platform
•Streaming API (can be stateful): Kafka Streams & KSQL
•Utilities: Schema Registry, Replicator, MirrorMaker, Confluent Platform
•Communication API: API clients …
•Connector API: pull & push data to/from Kafka (S3, JDBC, HDFS…)
23. “In the midst of chaos, there is also opportunity”
Sun Tzu
24. MOM vs. Kafka
MOM | Kafka
Broker-centric approach | Client-centric approach
Index structures (B-tree or hash tables) | Log structured
Retention impacts performance | Designed for retention
Outage: significant slowdown | Outage: won’t cause infrastructure to slow down significantly
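The log-structured, client-centric side of this comparison can be sketched with a toy in-memory log. This is a simplified illustration, not Kafka's implementation: MiniLogSketch and its methods are hypothetical names, and a real broker stores segments on disk. The point is that the log only appends and serves reads by offset, while each consumer tracks its own position and can rewind to replay.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical toy log; a real broker persists segments on disk.
class MiniLogSketch {
    private final List<String> records = new ArrayList<>();

    // Append at the tail and return the offset assigned to the record.
    long append(String value) {
        records.add(value);
        return records.size() - 1;
    }

    // Read up to maxRecords starting at the caller-supplied offset.
    // The log keeps no per-consumer state; retained records are never modified.
    List<String> read(long fromOffset, int maxRecords) {
        if (maxRecords <= 0 || fromOffset < 0 || fromOffset >= records.size()) {
            return Collections.emptyList();
        }
        int from = (int) fromOffset;
        int to = Math.min(records.size(), from + maxRecords);
        return new ArrayList<>(records.subList(from, to));
    }

    public static void main(String[] args) {
        MiniLogSketch log = new MiniLogSketch();
        for (String event : new String[] {"e0", "e1", "e2", "e3"}) {
            log.append(event);
        }

        long consumerA = 0; // each consumer owns its own offset
        long consumerB = 2;

        List<String> batchA = log.read(consumerA, 2);
        consumerA += batchA.size();
        System.out.println(batchA + " next=" + consumerA); // [e0, e1] next=2
        System.out.println(log.read(consumerB, 10));       // [e2, e3]

        consumerA = 0; // replay: rewind the offset, nothing changes on the log side
        System.out.println(log.read(consumerA, 10));       // [e0, e1, e2, e3]
    }
}
```

Because reads never mutate the log, adding consumers or retaining more records does not require updating any index structure in this sketch.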
29. Caching Comes From Simplicity - Sequential Disk Access
•Relies heavily on the filesystem (the OS page cache) for storing & caching messages.
•The cache lives in the OS rather than in the JVM heap, so it stays warm even if the service is restarted.
30. Massive Throughput - Comes From Simplicity
•Zero copy - on reads, data is copied directly from the disk buffer to the network buffer, bypassing the JVM; in a nutshell, we can saturate the network.
•java.nio.channels.FileChannel#transferTo()
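A minimal sketch of the transferTo() call named above, using only the JDK. The class and method names (TransferToSketch, sendFile, roundTrip) are hypothetical. To stay runnable without a network, the target here is an in-memory channel, so this run does copy through the JVM; the kernel-level shortcut applies when the target is a socket channel on a platform that supports it.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch of FileChannel#transferTo; not Kafka's actual code.
class TransferToSketch {

    // Sends the whole file to the target channel without reading it into a user buffer.
    // transferTo may move fewer bytes than requested, so loop until everything is sent.
    static long sendFile(Path file, WritableByteChannel target) throws IOException {
        try (FileChannel source = FileChannel.open(file, StandardOpenOption.READ)) {
            long position = 0;
            long size = source.size();
            while (position < size) {
                position += source.transferTo(position, size - position, target);
            }
            return position;
        }
    }

    // Writes the payload to a temporary "segment" file and sends it to an in-memory sink.
    // In a broker the target would be a socket channel (assumption for illustration).
    static String roundTrip(String payload) {
        try {
            Path segment = Files.createTempFile("segment", ".log");
            try {
                Files.write(segment, payload.getBytes(StandardCharsets.UTF_8));
                ByteArrayOutputStream sink = new ByteArrayOutputStream();
                try (WritableByteChannel target = Channels.newChannel(sink)) {
                    sendFile(segment, target);
                }
                return new String(sink.toByteArray(), StandardCharsets.UTF_8);
            } finally {
                Files.deleteIfExists(segment);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("R0,R1,R2")); // prints R0,R1,R2
    }
}
```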
32. Writes Velocity - Comes From Simplicity
•Kafka avoids flushing, i.e. the sync() operation.
•Achieves durability via replication instead.
•Kafka broker (with 64 GB of RAM)
•Conclusion: operates 1000x faster than a traditional messaging system.
•Replication is built into the low-level design.
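On the client side, relying on replication rather than a disk flush is usually paired with the producer setting acks=all, which makes the producer wait for the in-sync replicas to acknowledge a write. The sketch below only builds the configuration with java.util.Properties so that it runs without a Kafka dependency. The class name, method name and the localhost:9092 address are placeholders, and nothing in the slides says this is the configuration the author used.

```java
import java.util.Properties;

// Hypothetical helper; the broker address passed in is a placeholder.
class DurableProducerConfigSketch {

    static Properties producerProps(String bootstrapServers) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        // Wait for all in-sync replicas to acknowledge each write.
        props.setProperty("acks", "all");
        props.setProperty("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        props.setProperty("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        return props;
    }

    public static void main(String[] args) {
        Properties props = producerProps("localhost:9092");
        System.out.println("acks=" + props.getProperty("acks")); // acks=all
    }
}
```

These properties could then be passed to a KafkaProducer, or mapped onto the equivalent Spring Kafka producer settings.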
37. What we achieved so far
•Parallel Consumer Reads - 3,080,000/sec
•Single Producer Writes - 640,000/sec
•Total events: 30+ billion
•Replay time: < 3-5 hours
•NOTE: We hit hardware limitations rather than Kafka’s.
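As a rough sanity check, the replay time can be estimated from the figures on this slide. The sketch assumes the whole replay runs at the quoted parallel read rate, which the slides do not state, so treat the result as a lower bound. The class and method names are hypothetical.

```java
import java.util.Locale;

// Back-of-the-envelope estimate; assumes a constant read rate for the entire replay.
class ReplayEstimateSketch {

    static double replayHours(long totalEvents, long eventsPerSecond) {
        return (double) totalEvents / eventsPerSecond / 3600.0;
    }

    public static void main(String[] args) {
        long totalEvents = 30_000_000_000L; // "30+ billion events"
        long readRate = 3_080_000L;         // parallel consumer reads per second
        System.out.printf(Locale.ROOT, "%.1f hours%n", replayHours(totalEvents, readRate)); // 2.7 hours
    }
}
```

About 2.7 hours at peak read rate is consistent with the reported replay time of under 3-5 hours.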