Nikhil Bhatia, Confluent, Engineering Leader, Global Kafka + Sanjana Kaundinya, Confluent, Software Engineer + Luke Knepper, Confluent, Product Manager
Kafka is the backbone for many of the Global 2000’s real-time streaming data architectures. It can create a “global event mesh” to power real-time applications and analytics at scale. Kafka is a fast, scalable, durable, and fault-tolerant publish-subscribe messaging system. In this talk, we’ll cover the state-of-the-art patterns for efficient, resilient, highly available global Kafka. We’ll show how Confluent Platform and Confluent Cloud enable multi-region stretch clusters, global data sharing, hybrid cloud, and edge computing.
https://www.meetup.com/KafkaBayArea/events/275694619/
2. Nikhil Bhatia
Engineering Leader, Global Kafka, Confluent
About Myself
• Previous project at Confluent -
Infinite Storage for Kafka
• Principal Engineer at Microsoft
@nikhilbhatia
linkedin.com/in/nikhil-bhatia-a2a8115
4. Kafka Overview
● Broker - Stores messages in partitions
● Topic - A named group of one or more partitions
● Partition - An append-only log file on disk; Kafka guarantees message ordering within a partition
[Diagram: producers (P) writing to, and consumers (C) reading from, partitions P1 and P2 of topic T1 on a broker]
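The broker/topic/partition model above can be sketched as a minimal in-memory model (illustrative only; the `Partition` and `Topic` classes here are my own, not real Kafka code):

```python
# Minimal in-memory model of Kafka's storage concepts (illustrative, not real Kafka code).

class Partition:
    """An append-only log; each message gets the next sequential offset."""
    def __init__(self):
        self.log = []  # messages in write order

    def append(self, message):
        offset = len(self.log)  # offsets are assigned sequentially
        self.log.append(message)
        return offset

class Topic:
    """A named group of one or more partitions."""
    def __init__(self, name, num_partitions):
        self.name = name
        self.partitions = [Partition() for _ in range(num_partitions)]

t1 = Topic("T1", num_partitions=2)
# Ordering is guaranteed only within a partition, not across the whole topic.
assert t1.partitions[0].append("a") == 0
assert t1.partitions[0].append("b") == 1
assert t1.partitions[1].append("c") == 0
```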
5. Kafka Log Offsets
[Diagram: Partition 1 holding messages at offsets 0-9; consumer groups CG1 and CG2 positioned at different committed offsets (tracked in the __consumer_offsets topic); startOffset, high watermark (HW), and endOffset marked, with a producer appending at the end]
● The log end offset is the offset of the last message written to a log.
● The high watermark offset is the offset of the last message that was successfully copied to all of the log's in-sync replicas.
● The consumer offset tracks a consumer group's position in a partition: the offset of the next message the group will read.
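One rough way to see how these offsets relate (a sketch, not Kafka's implementation; here the high watermark is simply the minimum log end offset across replicas):

```python
# Sketch of log end offset, high watermark, and consumer offset (illustrative only).

def log_end_offset(replica_log):
    """Offset at which the next message will be written."""
    return len(replica_log)

def high_watermark(replica_logs):
    """Highest offset replicated to ALL replicas: the min of the log end offsets."""
    return min(log_end_offset(log) for log in replica_logs)

# Leader has 10 messages (offsets 0-9); one follower has only caught up to offset 7.
leader    = list(range(10))
follower1 = list(range(10))
follower2 = list(range(8))

hw = high_watermark([leader, follower1, follower2])
print(hw)  # 8 -> consumers may only read offsets below the high watermark

# A consumer group's committed offset is the next offset it will read.
committed = {"CG1": 4, "CG2": 8}
assert committed["CG2"] <= hw  # consumers never read past the high watermark
```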
6. Why Do We Need Replication?
Brokers can go down
● Controlled - software/config update
● Uncontrolled - compute/disk fault, bugs
When a broker goes down, durability and availability suffer
● Data loss
● Some partitions on the cluster become unavailable
From a major cloud provider's SLA:
“For any Single Instance Virtual Machine using Standard HDD Managed Disks for Operating System Disks and Data Disks, we guarantee you will have Virtual Machine Connectivity of at least 95%.”
8. How Are Messages Committed?
● The leader maintains the set of In-Sync Replicas (ISR)
● The leader waits for followers to replicate each message
● If a follower fails, the leader shrinks the ISR and remains available
● Leader and follower failure cases are handled by including the leader epoch in each message and truncating the follower's log on recovery (refer to KIP-101)
[Diagram: partition P1's replicas R1-R4 spread across brokers; the leader (L) role moves between replicas as brokers fail and recover]
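The commit rule above can be sketched as: a write is committed once every replica currently in the ISR has it, and a lagging follower is dropped from the ISR rather than blocking availability. This is a simplification (real Kafka tracks lag by time via replica.lag.time.max.ms, not by a fixed message count; `MAX_LAG` here is a hypothetical threshold):

```python
# Sketch of ISR-based commits (illustrative; real Kafka tracks lag by time, not count).

MAX_LAG = 2  # hypothetical lag threshold, in messages

class PartitionLeader:
    def __init__(self, replica_ids):
        self.log = []
        self.isr = set(replica_ids)                 # in-sync replicas
        self.fetched = {r: 0 for r in replica_ids}  # how far each replica has fetched

    def produce(self, msg):
        self.log.append(msg)

    def follower_fetch(self, replica, upto):
        self.fetched[replica] = upto

    def maybe_shrink_isr(self):
        end = len(self.log)
        for r in list(self.isr):
            if end - self.fetched[r] > MAX_LAG:
                self.isr.discard(r)  # drop the laggard, stay available

    def high_watermark(self):
        # Committed = replicated to every replica still in the ISR.
        return min(self.fetched[r] for r in self.isr)

leader = PartitionLeader(["r1", "r2", "r3"])
for m in ["a", "b", "c", "d"]:
    leader.produce(m)
leader.follower_fetch("r1", 4)
leader.follower_fetch("r2", 4)
leader.follower_fetch("r3", 1)   # r3 is 3 messages behind
leader.maybe_shrink_isr()
print(sorted(leader.isr))        # ['r1', 'r2'] -- r3 was dropped
print(leader.high_watermark())   # 4 -- all four messages are now committed
```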
9. Salient Points for Replication
● Intra-cluster replication improves durability and availability under node-level failures.
● Offsets are a core piece of the Kafka producer and consumer ecosystem.
● Kafka's replication protocol ensures strong consistency through byte-by-byte replication and message ordering guarantees.
10. Multi-Zone (MZ) HA Kafka Cluster
[Diagram: brokers (B) and ZooKeeper nodes (zk) spread across availability zones AZ1-AZ3, with a producer (P) and consumer (C) attached]
Inter-zone latency: <10 ms, typically ~3 ms
11. Why Globally Replicate?
● Disaster Recovery
○ A DC can go down; regional failures happen even in major clouds
■ AWS us-east-1 region failure last November
■ Azure East US outage last March
○ Planned failovers (expected hurricane: prepare and prevent an outage)
○ Passing DR audits is a requirement
● Fan-out - topic sharing - data needs to be near the consumer
● Fan-in - aggregate clusters, e.g. IoT use cases
● Cluster/Cloud Migration
12. Disaster Recovery - Metrics
Recovery Point Objective (RPO): the maximum amount of data – as measured by time – that can be lost after a recovery
Recovery Time Objective (RTO): the targeted duration of time and a service level within which a business process must be restored after a disaster
*Source: Wikipedia
14. Luke Knepper
Product Manager, Global Kafka, Confluent
About Myself
● Stanford CS ⇒
● Lead Software Engineer ⇒
● Stanford MBA ⇒
● Product Manager
@knep
linkedin.com/in/knepper
16. Stretched Clusters
● Fast Disaster Recovery
● Offset Preserving
● Automated Client Failover
with No Custom Code
● Sync or Async Replication per
Topic with Confluent’s
Multi-Region Clusters
17. Replica Placement
● Ensure your partition’s
replicas are spread
throughout your data centers
placement.json
{
  "version": 1,
  "replicas": [
    { "count": 2, "constraints": { "rack": "east" } },
    { "count": 2, "constraints": { "rack": "west" } }
  ]
}
kafka-topics --create \
  --bootstrap-server localhost:9091 \
  --topic annas_topic \
  --partitions 1 \
  --config min.insync.replicas=3 \
  --replica-placement placement.json
19. Observers: Asynchronous Replicas
● Observers replicate partitions from the leader, just like followers
● Not considered when incrementing the high watermark
● Improved durability without sacrificing write throughput
● Replicate across slower/higher-latency links without falling in and out of sync (also known as ISR thrashing)
● Available in Confluent Server
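Observers are declared in the same replica placement file shown on slide 17, via an "observers" section. A sketch (the rack names and counts are examples; check your Confluent Server version's documentation for the exact schema):

```json
{
  "version": 1,
  "replicas": [
    { "count": 2, "constraints": { "rack": "east" } },
    { "count": 2, "constraints": { "rack": "west" } }
  ],
  "observers": [
    { "count": 1, "constraints": { "rack": "east" } },
    { "count": 1, "constraints": { "rack": "west" } }
  ]
}
```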
20. Example: Datacenter Failover for 2.5 DCs
2.5 datacenters, 4 replicas (“R”) + 2 observers (“O”), min ISR: 3, acks=all
[Diagram: DC A and DC B each host two replicas (R), one observer (O), and two ZooKeeper nodes; DC 0.5 hosts a single tie-breaker ZooKeeper node; the ISR spans the four replicas across both DCs]
Steady State
Observers stay out of the ISR. A min ISR of 3 forces writes to go to both datacenters.
Note: the “half-DC” is needed to prevent a “split brain” between the two datacenters.
21. Example: Datacenter Failover for 2.5 DCs
2.5 datacenters, 4 replicas (“R”) + 2 observers (“O”), min ISR: 3, acks=all
[Diagram: same layout as the previous slide, but DC B is offline; the observer in DC A has joined the ISR]
DC Failure
The in-sync replica count falls below min ISR, so an observer automatically joins the ISR. Min ISR is satisfied again and writes can go to the remaining datacenter. Availability is automatically maintained.
22. Sanjana Kaundinya
Software Engineer, Global Kafka, Confluent
About Myself
- Working at Confluent for the past 1.5
years supporting and developing
global replication technologies
@skaundinya15
linkedin.com/in/sanjanakaundinya
27. Offset Translation in MirrorMaker 2.0
● The offset_sync topic stores: topic, partition, source offset, matching destination offset
● The checkpoints topic stores: topic, partition, consumer group name, consumer group source offset, matching destination offset
● A consumer calls translateOffsets to find its committed positions on the destination cluster
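Conceptually, translation is a lookup: given a consumer group, find its checkpoint records and return the matching destination offsets. A minimal sketch of the idea (the data structures and `translate_offsets` function are my own, not MirrorMaker's actual API):

```python
# Sketch of MirrorMaker 2.0-style offset translation (illustrative data, not the real API).

# Records from the checkpoints topic: one per (group, source topic, partition),
# mapping a consumer group's source offset to the matching destination offset.
checkpoints = [
    {"group": "CG1", "topic": "orders", "partition": 0, "src_offset": 42, "dest_offset": 40},
    {"group": "CG1", "topic": "orders", "partition": 1, "src_offset": 17, "dest_offset": 17},
    {"group": "CG2", "topic": "orders", "partition": 0, "src_offset": 99, "dest_offset": 95},
]

def translate_offsets(group):
    """Return {(topic, partition): dest_offset} for a group from its checkpoints."""
    return {
        (c["topic"], c["partition"]): c["dest_offset"]
        for c in checkpoints
        if c["group"] == group
    }

# A consumer failing over to the destination cluster seeks to the translated offsets.
# Note the destination offsets can differ from the source offsets, since the two logs
# need not start at the same offset.
print(translate_offsets("CG1"))  # {('orders', 0): 40, ('orders', 1): 17}
```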
28. Active-Active Replication in MirrorMaker 2.0
● Two clusters can be configured to replicate to each other
○ Known as an “Active-Active” replication scenario
● Records are produced to both clusters and can be seen by clients in both clusters
○ The problem that comes from this is cyclic replication
● MirrorMaker 2.0 prefixes replicated topics with the source cluster's alias and uses that prefix for cycle detection
○ Example: topics with “us-west” in their prefix won't be replicated back to the “us-west” cluster
● This holds true regardless of cluster topology
● A cluster can be replicated to several downstream clusters
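The alias check above can be sketched as follows (a simplification of MirrorMaker 2.0's default replication policy; the function names are my own):

```python
# Sketch of MirrorMaker 2.0's prefix-based cycle detection (simplified).

SEPARATOR = "."  # MM2's default cluster-alias separator

def remote_topic_name(source_alias, topic):
    """MM2 renames replicated topics with the source cluster alias as a prefix."""
    return f"{source_alias}{SEPARATOR}{topic}"

def should_replicate(topic, target_alias):
    """Skip topics whose prefix chain already contains the target cluster's alias."""
    aliases = topic.split(SEPARATOR)[:-1]  # everything before the base topic name
    return target_alias not in aliases

# us-east replicates "clicks" to us-west; on us-west it becomes "us-east.clicks":
t = remote_topic_name("us-east", "clicks")
print(t)  # us-east.clicks

# us-west would try to replicate it back -- the alias check stops the cycle:
print(should_replicate(t, target_alias="us-east"))         # False
print(should_replicate("clicks", target_alias="us-east"))  # True
```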
30. Active-Active Replication in Replicator
● Replicator adds a provenance header to each record with the following information:
○ ID of the origin cluster where the message was first produced
○ Name of the topic to which the message was first produced
○ Timestamp when Replicator first copied the record
● By default, Replicator skips any record whose destination cluster ID matches the origin cluster ID in the provenance header
● By adding a little overhead to each record, cyclic replication is prevented
● Works with more than two clusters
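The skip rule can be sketched like this (illustrative: real Replicator stores this in a Kafka record header, and the field names below are my own):

```python
# Sketch of Replicator-style provenance filtering (illustrative only).

def make_provenance(origin_cluster_id, origin_topic, timestamp_ms):
    """Provenance attached when a record is first copied between clusters."""
    return {"cluster": origin_cluster_id, "topic": origin_topic, "ts": timestamp_ms}

def should_copy(record, destination_cluster_id):
    """Skip records that originated in the destination cluster -- breaks the cycle."""
    prov = record.get("provenance")
    if prov is None:
        return True  # never replicated before, safe to copy
    return prov["cluster"] != destination_cluster_id

rec = {"value": b"order-1",
       "provenance": make_provenance("us-west", "orders", 1700000000000)}

print(should_copy(rec, destination_cluster_id="us-east"))  # True
print(should_copy(rec, destination_cluster_id="us-west"))  # False: would be a cycle
print(should_copy({"value": b"fresh"}, "us-west"))         # True: no provenance yet
```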
31. Cluster Linking: Connecting Clusters Sans Kafka Connect
● Multi-continent replication without the need for an external system
● Offset preserving, thereby eliminating the need for offset translation
● Use cases include data sharing, cluster migration, and hybrid cloud architectures
32. Cluster Linking Architecture Overview
● Extends the existing replica fetching protocol
○ Uses the same protocol to fetch across clusters
● The cluster link holds the information the destination needs to talk to the source
● The destination cluster can create mirror topics
○ Mirror topics fetch from the source and share the source topic's configs
● Mirror topics are immutable on the destination
○ The destination recognizes the partition as a mirror and fetches it over the cluster link
● This makes a mirror topic a byte-for-byte replica of the source topic
○ Maintains offset consistency across clusters, eliminating the need for offset translation
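Operationally, setting up a link looks roughly like the following sketch. Treat it as illustrative: the exact CLI tool names and flags vary by Confluent Platform version, and the link name, hostnames, and topic are placeholders.

```properties
# link.properties -- tells the destination cluster how to reach the source cluster
bootstrap.servers=source-cluster:9092

# On the destination cluster (commands are illustrative; check your version's docs):
#   kafka-cluster-links --bootstrap-server dest:9092 --create \
#       --link my-link --config-file link.properties
#   kafka-mirrors --bootstrap-server dest:9092 --create \
#       --mirror-topic orders --link my-link
```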