Apache Kafka: A high-throughput distributed messaging system @ JCConf 2014

#JCConf
Apache Kafka 
A high-throughput distributed messaging
system
陸振恩 (popcorny)
popcorny@cacaﬂy.com

#JCConf
● What is Kafka
● Basic concept
● Why Kafka fast
● Programming Kafka
● Using scenarios
● Recap
Outline
2

#JCConf
● More and more data and metrics need to be
collected
- Web activity tracking
- Operation metrics
- Application log aggregation
- Commit log
- …
● We need a message bus to collect and relay these
data
- Big Volume
- Fast and Scalable
Motivation
4

#JCConf
● Developed by Linkedin
● Message System
- Queue
- Pub / Sub
● Written in Scala
● Features
- Durability
- Scalability
- High Availability
- High Throughput
Kafka
5

#JCConf
BigData World
6
Traditional BigData
File System NFS HDFS, S3
Database RDBMS Cassandra, HBase
Batch
Processing
SQL
Hadoop MapReduce 
Spark
Stream
Processing
In-App Processing
Strom,  
Spark Streaming
Message
Service
AMQP-compliant Kafka

#JCConf
● Durability
- All messages are persisted
- Sequential read & write (like log file)
- Consumers keep the message offset (like file
descriptor)
- The log files are rotated (like logrotate)
- Messages are only deleted on expired. (like
logrotate)
- Support Batch Load and Real Time Usage!
• cat access.log | grep ‘jcconf’
● tail -F access.log | grep ‘jcconf’
Features
7

#JCConf
Design like Message Queue
Implementation like Distributed Log File
8

#JCConf
● Scalability
- Horizontal scale out
- Topic is partitioned (sharded)
● High Availability
- Partition can be replicated
Features
9

#JCConf
● High Throughput
Features
10
source: http://www.infoq.com/articles/apache-kafka

#JCConf
● Producer - The role to send message to broker
● Consumer -The role to receive message from broker
● Broker - One node of Kafka cluster.
● ZooKeeper - Coordinator of Kafka cluster and costumer
groups.
Kafka Cluster
Physical Components
12
Producer BroakerBroaker
Broaker
Zookeeper
Consumer Group
Consumer

#JCConf
● Topic!
- The named destination of
partition.
● Partition
- One Topic can have multiple
partition
- Unit of parallelism
● Message!
• Key/value pair
• Message offset
Logical Components
Topic
B
Partition 2
0 1
E
2
F
3
M
4
N
5
Q
6
R
7
S
8
Y
9
b
C
Partition 3
0 1
D
2
K
3
L
4
O
5
P
6
T
7
U
A
Partition 1
0 1
G
2
H
3
I
4
J
5
V
6
W
7
X
8
c
13

#JCConf
One Partition One Consumer  
(Queue)
P CA
Partition 1
0 1
B
2
C
3
D
4
E
5
F
6
G
7
H
8
I
9
J
offset = 8
14
Consumers keep the offset. 
Broker has no idea about if message is proceeded

#JCConf
One Partition Multiple Consumer  
(Pub/Sub)
P A
Partition 1
0 1
B
2
C
3
D
4
E
5
F
6
G
7
H
8
I
9
J
C1
C2
C3
offset = 8
offset = 7
offset = 9
15
Each Consumer keep its own offset.

#JCConf
broker2
Multiple Partitions
broker1
P
A
Partition 1
0 1
G
2
H
3
I
4
J
5
V
6
W
7
X
8
c
B
Partition 2
0 1
E
2
F
3
M
4
N
5
Q
6
R
7
S
8
Y
9
b
C
Partition 3
0 1
D
2
K
3
L
4
O
5
P
6
T
7
U
16
C1
p1.offset = 7
p2.offset = 9
p3.offset = 7
Dispatched by hashed key

#JCConf
broker2
Multiple Partitions
broker1
P
A
Partition 1
0 1
G
2
H
3
I
4
J
5
V
6
W
7
X
8
c
B
Partition 2
0 1
E
2
F
3
M
4
N
5
Q
6
R
7
S
8
Y
9
b
C
Partition 3
0 1
D
2
K
3
L
4
O
5
P
6
T
7
U
17
C2
offset = 9
offset = 7
C3
offset = 7
C1

#JCConf
Can we auto-rebalance the consumers
to partitions?
18
Yes, Consumer Group!!

#JCConf
● A group of workers
● Share the offsets
● Offsets are synced to ZooKeeper
● Auto Rebalancing
Consumer Group
19

#JCConf
Consumer Group
20
broker2
broker1
P
A
Partition 1
0 1
G
2
H
3
I
4
J
5
V
6
W
7
X
8
c
B
Partition 2
0 1
E
2
F
3
M
4
N
5
Q
6
R
7
S
8
Y
9
b
C
Partition 3
0 1
D
2
K
3
L
4
O
5
P
6
T
7
U
Consumer Group
‘group1’
C2
p1.offset = 7
p2.offset = 9
p3.offset = 7
C1

#JCConf
Consumer Group
21
broker2
broker1
P
A
Partition 1
0 1
G
2
H
3
I
4
J
5
V
6
W
7
X
8
c
B
Partition 2
0 1
E
2
F
3
M
4
N
5
Q
6
R
7
S
8
Y
9
b
C
Partition 3
0 1
D
2
K
3
L
4
O
5
P
6
T
7
U
’group1’
C2
C1
C1
’group2’

#JCConf
Consumer Group
P A
Partition 1
0 1
B
2
C
3
D
4
E
5
F
6
G
7
H
8
I
9
J
C1
C2
C3
offset = 9
Consumer Group
22
Partition to Consumer is Many to One relation (In One
Consumer Group)

#JCConf
● Messages from the same partition guarantee FIFO
semantic
● Traditional MQ can only guarantee message are
delivered in order
● Kafka can guarantee messages are handled in order (for
same partition)
Message Ordering
23
P B
C1
C2
P
P1 C1
C2P2
Traditional MQ Kafka

#JCConf
● At most once - Messages may be lost but are
never redelivered.
● At least once - Messages are never lost but may
be redelivered.
● Exactly once - each message is delivered once
and only once. (this is what people actually want)
- Two-Phase Commit
- At least once + Idempotence
Delivery Semantic
24
Apply multiple times without changing the ﬁnal result

#JCConf
● Which part do we discuss?
Delivery Semantic
25
Producer Broker Consumer

#JCConf
● At most once - Async send
● At least once - Sync send (with retry count)
" Exactly once!
- Idempotent delivery does not support until next
version (0.9)
Producer To Broker
26

#JCConf
● At most once - Store the offset before handling the
message
● At least once - Store the offset after handling the
message
● Exactly once - At least once + Idempotent
operation
Broker to Consumer
27

#JCConf
● The unit of replication is the partition!
● Each partition has a single leader and zero or more
followers
● All reads and writes go to the leader of the partition
Replication
28
source: http://www.infoq.com/articles/apache-kafka
Leader FollowerFollower
Producer Consumer
sync sync
write read

#JCConf
● Many data system retain a latest state for data by
some key.
● Log compaction adds an alternative retention
mechanism, log compaction, to support retaining
messages by key instead of purely by time.
● This would describe both many common data
systems — a search index, a cache, etc
Log Compaction
30

#JCConf34
Persistence and Fast?

#JCConf
● Don’t fear ﬁle system
● Six 7,200 RPM SATA RAID-5 array
- Sequential write: 600MB/sec
- Random write: 100K/sec
● Sequential read in disk faster than random access in memory?
Sequential vs Random
35
source: http://queue.acm.org/detail.cfm?id=1563874

#JCConf
If we persist data, should we cache
the data in memory?
36

#JCConf
● In-Process Cache
- Message as object
- Cache in JVM heap.
● Page Cache
- Disk cache by OS
In-Process Cache vs Page Cache
37

#JCConf
38
In Process Cache Disk Page Cache
Memory
Usage
In-heap memory Free Memory
Overhead Object overhead No
Garbage
Collection
Yes No
Process
Restart
Lost Still Warm
Controled
by
App OS

#JCConf
● Fact
- All disk reads and writes will go through page
cache. This feature cannot easily be turned off
without using direct I/O, so even if a process
maintains an in-process cache of the data, this
data will likely be duplicated in OS pagecache,
effectively storing everything twice.
● Conclusion
- Relying on pagecache is superior to maintaining
an in-process cache or other structure
39

#JCConf
How to transfer to consumers?
40

#JCConf
Application Copy vs Zero Copying
41

#JCConf
● Traditional Queue
- Broker keep the message state and metadata
- B-Tree O(log n)
- Random Access
● Kafka
- Consumers keep the offset
- Sequential Disk Read/Write O(1)
Constant Time
42

#JCConf
Producer
45
Async Send

#JCConf
High Level Consumer
46
Open The Consumer Connector
Open the stream for topic

#JCConf
High Level Consumer
47
Receive the message

#JCConf
● Realtime processing and analyzing
● Stream processing frameworks
- Strom
- Spark Streaming
- Samza
● Distributed stream source + Distributed stream
processing
● All these three frameworks support Kafka as stream
source.
Source of Stream Processing
49
Kafka
Cluster
Stream
Processing

#JCConf
● The most reliable source for stream processing
Source of Stream Processing
50
source: http://www.slideshare.net/ptgoetz/apache-storm-vs-spark-streaming

#JCConf
● Centralized Log Framework
● Distributed Log Collectors
- Logstash
- Fluentd
- Flume
Source and/or Sink of Distributed Log
Collectors
51
Kafka
Cluster
Distributed Log  
Collector
Other
Sink
Kafka
Cluster
Distributed Log  
Collector
Other
Source

#JCConf
● Push vs Pull 
 
 
 
 
 
● Distributed Log Collector provide Conﬁgurable
producer and consumer
● Kafka Cluster provide distributed, high availability,
reliable message system
Source and/or Sink of Distributed Log
Collectors (cont.)
52
Distributed Log  
Collector
Kafka Cluster
pull
pull
push
push

#JCConf
● What is lambda architecture?
- Stream for realtime data
- Batch for historical data
- Query by merged view.
Source of Lambda Architecture
53
source: http://lambda-architecture.net/

#JCConf
Lambda Architecture (cont.)
54
source: https://metamarkets.com/2014/building-a-data-pipeline-that-handles-billions-of-events-in-real-time/

#JCConf
● Features
- Durability
- Scalability
- High Availability
- High Throughput
● Basic Concept
- Producer, Broker, Consumer, Consumer Group
- Topic, Partition, Message
- Message Ordering
- Delivery Semantic
- Replication
● Why Kafka fast
● Using Scenarios
- Source of stream processing
- Source or sink of distributed log framework
- Source of lambda architecture
Recap
55

#JCConf
● Kafka Documentation 
kafka.apache.org/documentation.html
● Kafka Wiki 
https://cwiki.apache.org/conﬂuence/display/KAFKA/Index
● The Log: What every software engineer should know about real-
time data's unifying abstraction 
engineering.linkedin.com/distributed-systems/log-what-every-software-
engineer-should-know-about-real-time-datas-unifying
● Benchmarking Apache Kafka: 2 Million Writes Per Second (On
Three Cheap Machines) 
engineering.linkedin.com/kafka/benchmarking-apache-kafka-2-million-
writes-second-three-cheap-machines
● Apache Kafka for Beginners 
blog.cloudera.com/blog/2014/09/apache-kafka-for-beginners/
Reference
56

#JCConf57
producer.send(“thanks”);

#JCConf
// any question? 
question = consumer.receive();
58

Apache Kafka: A high-throughput distributed messaging system @ JCConf 2014

Recomendados

Recomendados

Más contenido relacionado

La actualidad más candente

La actualidad más candente (20)

Destacado

Destacado (20)

Similar a Apache Kafka: A high-throughput distributed messaging system @ JCConf 2014

Similar a Apache Kafka: A high-throughput distributed messaging system @ JCConf 2014 (20)

Último

Último (20)

Apache Kafka: A high-throughput distributed messaging system @ JCConf 2014