1. Lessons from the field:
Catalog of Kafka Deployments
Joseph Niemiec | Sr. Product Manager
3.
© 2021 Cloudera, Inc. All rights reserved.
Failure/Fault Domains
Limiting the state and scope of failure
What is a Failure/Fault Domain?
A set of components that share a single point of failure and fail in a correlated manner
4.
Serial and Parallel Systems Reliability
Serial Systems
• Failure of any component results in failure of the entire system
• Adding more components weakens the system, e.g. 90% × 90% = 81%
– Rs = R1 × R2 × … × Rn
Parallel Systems
• One component online keeps the system online
• Adding more components strengthens the system, e.g. 1 − (1 − 0.90) × (1 − 0.90) = 99%
– Rs = 1 − (1 − R1) × (1 − R2) × … × (1 − Rn)
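The two formulas on this slide can be checked with a few lines of Python. This is a minimal sketch; the function names are illustrative, and both formulas assume component failures are independent, which is exactly what a shared fault domain breaks.

```python
from math import prod


def serial_reliability(reliabilities):
    """Rs = R1 * R2 * ... * Rn: every component must be up."""
    return prod(reliabilities)


def parallel_reliability(reliabilities):
    """Rs = 1 - (1 - R1) * ... * (1 - Rn): one component up is enough."""
    return 1 - prod(1 - r for r in reliabilities)


if __name__ == "__main__":
    # Two components at 90% each, as on the slide.
    print(f"serial:   {serial_reliability([0.90, 0.90]):.2f}")
    print(f"parallel: {parallel_reliability([0.90, 0.90]):.2f}")
```

For two 90% components this prints 0.81 for the serial case and 0.99 for the parallel case, matching the slide's arithmetic.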
5.
Basic Topic Design Theory
Partitions
● As few as needed
● More partitions increase open-file pressure on the brokers
Topic Replicas
● Parallel availability
min.insync.replicas
● Durability vs. availability trade-off
● acks=all
○ The leader waits for every in-sync replica (ISR) and rejects the write if fewer than min.insync.replicas are in sync
● replica.lag.time.max.ms
○ Default is 30 seconds!
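The durability vs. availability trade-off can be made concrete with a small model. This is a sketch of the rule only, not of broker internals; the helper names are hypothetical.

```python
def accepts_write(isr_size, min_insync_replicas):
    """With acks=all, a write succeeds only while the ISR is large enough."""
    return isr_size >= min_insync_replicas


def tolerated_failures(replication_factor, min_insync_replicas):
    """Replicas that can drop out before acks=all producers see errors."""
    return max(replication_factor - min_insync_replicas, 0)


if __name__ == "__main__":
    # Compare settings for a topic with replication factor 3.
    for min_isr in (1, 2, 3):
        print(f"RF=3 min.insync.replicas={min_isr}: "
              f"tolerates {tolerated_failures(3, min_isr)} replica(s) down")
```

Raising min.insync.replicas favors durability (more copies acknowledged per write), while lowering it favors availability (writes continue with fewer replicas in sync).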
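Use log.retention.bytes
● Can be used together with time-based retention
● Protects against topics growing larger than the brokers' disks
Log segments need to be eligible for retention policies
● Only closed segments are eligible; log.segment.bytes and log.roll.ms control when a segment rolls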
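A rough way to see how these settings bound disk usage is sketched below. It assumes log.retention.bytes is applied per partition and that roughly one extra segment can sit on disk before the oldest one is deleted; the function name and the example numbers are illustrative, not taken from the talk.

```python
GIB = 2 ** 30


def topic_disk_upper_bound(partitions, replication_factor,
                           retention_bytes, segment_bytes):
    """Approximate worst-case bytes a topic occupies across the cluster."""
    # Assumption: retained data plus about one not-yet-deleted segment.
    per_partition = retention_bytes + segment_bytes
    return partitions * replication_factor * per_partition


if __name__ == "__main__":
    bound = topic_disk_upper_bound(
        partitions=12, replication_factor=3,
        retention_bytes=1 * GIB, segment_bytes=1 * GIB)
    print(f"approximate upper bound: {bound / GIB:.0f} GiB")
```

Comparing this estimate against the total disk available on the brokers gives an early warning when a topic could outgrow the cluster.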
7.
Single Node Dev Cluster
• Everything is Serial
• Failure/Fault Domain
– The Cluster
– The Broker & Zookeeper
– The Laptop
– Everything!
• Failure/Fault Tolerance
– Deploy over Multiple Servers
Kafka Cluster
1 Broker
1 Zookeeper
8.
Single Cluster - Single Rack, with or without colocated ZooKeeper
• Some things are Serial
• Some things are Parallel
• Failure/Fault Domain
– Rack
– Brokers
• Other Notes
– Shared Log Dirs for Brokers and ZooKeeper
9.
Single Cluster - VMs
• Some things are Serial
• Some things are Parallel
• Failure/Fault Domain
– VM Hosts
– Replica Placement
• Replica per VM Host
• Other Notes
– Debugging Complexity
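The "replica per VM host" rule can be checked mechanically. The sketch below flags partitions that keep more than one replica on brokers sharing a VM host; the broker IDs, host names, and function name are made up for illustration. In a real cluster, setting Kafka's broker.rack to the VM host identifier is one way to let rack-aware assignment spread replicas.

```python
from collections import Counter


def colocated_replicas(assignment, broker_host):
    """Return {partition: [hosts that hold more than one of its replicas]}."""
    problems = {}
    for partition, brokers in assignment.items():
        counts = Counter(broker_host[b] for b in brokers)
        shared = sorted(host for host, n in counts.items() if n > 1)
        if shared:
            problems[partition] = shared
    return problems


if __name__ == "__main__":
    # Hypothetical layout: brokers 1 and 2 run on the same VM host.
    broker_host = {1: "hv-a", 2: "hv-a", 3: "hv-b", 4: "hv-c"}
    assignment = {0: [1, 2, 3], 1: [1, 3, 4]}
    print(colocated_replicas(assignment, broker_host))
```

Any partition reported here would lose two replicas at once if that single VM host failed.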
10.
Classic Failover - Two Clusters Active / Passive
• Mostly Parallel*
• Failure/Fault Domain
– Mirror Makers
– Cluster
– WAN*
– Data Center
• Other Notes
– Offsets During Failover
– WAN Bandwidth
– Mirror Maker Producer Side
11.
Streams Replication Manager (SRM) with MirrorMaker 2
• Supports active-active, multi-cluster, cross-DC replication and other scenarios
• Leverages Kafka Connect for scalability and HA
• Replicates data and configurations
• Offset translation for easy failover
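To illustrate why offset translation matters during failover, here is a conceptual sketch. It is not how SRM or MirrorMaker 2 is implemented; it assumes a hypothetical list of checkpoints that pair an upstream offset with the matching downstream offset, and picks the latest checkpoint at or before the consumer's committed position.

```python
from bisect import bisect_right


def translate_offset(checkpoints, committed_upstream):
    """Map a committed upstream offset to a safe downstream offset.

    checkpoints: list of (upstream_offset, downstream_offset) tuples,
    sorted by upstream_offset. Returns None if no checkpoint applies.
    """
    upstream = [u for u, _ in checkpoints]
    i = bisect_right(upstream, committed_upstream)
    if i == 0:
        return None  # nothing checkpointed at or before this position
    return checkpoints[i - 1][1]


if __name__ == "__main__":
    # Hypothetical checkpoints for one topic partition.
    checkpoints = [(100, 40), (200, 135), (300, 240)]
    print(translate_offset(checkpoints, 250))
```

Resuming from an earlier checkpoint means some records may be read again after failover, so consumers should tolerate duplicates.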
12.
SRM Monitoring Services
• SMM Cluster Replications View provides monitoring
• SRM Service calculates replication metrics (latency, throughput)
• REST endpoints with Swagger UI
• SMM Replication Flow Alerting
13.
Triple OnSite Cluster - In/Out/SysOps
• Serial Pipeline
• Parallel Systems
• Failure/Fault Domain
– Cluster
• Isolated Failures in Pipeline
• In / Out Pipeline
– Producers / Apps / Consumers
– Datacenter
• Other Notes
– Test Configs/Apps on DevOps
– Ingress / Egress Isolation
14.
Multi-Geo Distribution and Aggregation
• Serial Pipeline
• Parallel Systems
• Failure/Fault Domain
– Data Center
– Mirror Maker
– Cluster
• Other Notes
– Fat Network Pipe
15.
Dual Aggregation Active Active
• Mostly Parallel*
• Failure/Fault Domain
– Data Center
– Cluster
– Mirror Maker
– Consumers / Producers*
• Other Notes
– Active/Active
– Ingress / Egress Isolation
– Consumers on any side
16.
Multi-AZ/Cloud Spanning
• Parallel*
• Failure/Fault Domain
– Availability Zone
– Replica Placement*
• Other Notes
– Rack Awareness
– min.insync.replicas
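How rack awareness and the minimum in-sync replica setting interact can be sketched as a simple check: after losing any one availability zone, are enough replicas left to keep accepting acks=all writes? The function and zone names are hypothetical, and the model assumes every surviving replica is in sync.

```python
def survives_zone_loss(replica_zones, min_insync_replicas):
    """True if losing any single zone still leaves enough replicas.

    replica_zones: one zone name per replica of a partition.
    """
    return all(
        sum(1 for zone in replica_zones if zone != lost) >= min_insync_replicas
        for lost in set(replica_zones)
    )


if __name__ == "__main__":
    # One replica per zone vs. two replicas stacked in the same zone.
    print(survives_zone_loss(["az1", "az2", "az3"], 2))
    print(survives_zone_loss(["az1", "az1", "az2"], 2))
```

With one replica per zone the partition stays writable after a zone outage, while stacking two replicas in one zone makes that zone a single point of failure for writes.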