SlideShare una empresa de Scribd logo
1 de 17
Descargar para leer sin conexión
Lessons from the field:
Catalog of Kafka Deployments
Joseph Niemiec | Sr. Product Manager
Failure / Fault
Domains and Tolerance
3
© 2021 Cloudera, Inc. All rights reserved.
Failure/Fault Domains
Limiting the state and scope of failure
What is a Failure/Fault Domain?
A set of components that share a
single point of failure and fail in a
correlated manner
4
© 2021 Cloudera, Inc. All rights reserved.
Serial and Parallel Systems Reliability
Serial Systems
• Failure of any component results
in failure of entire system
• Adding more weakens the
system ie - 90% * 90% = 81%
– Rs= R1* R2… Rn
Parallel Systems
• One component online results in
system online
• Adding more strengthens the system
ie - 1-(1-0.90)*(1-0.90) = 99%
– Rs= 1 - (1 -R1) *(1-R2) … (1-Rn)
5
© 2021 Cloudera, Inc. All rights reserved.
Basic Topic Design Theory
Partitions
● Few as needed
● Increased open files pressure
Topic Replicas
● Parallel availability
min-insync.replicas
● Durability vs Availability
● ack=all
○ for all isr with at least the min isr available
● replica.lag.time.max.ms
○ default 30 seconds!!!
Use log.retention.bytes
● Can use together
● Protect from topics growing larger
than brokers
Log Segments need to be
eligible for retention policies
● log.segment.bytes and log.roll.ms
Kafka Deployments
7
© 2021 Cloudera, Inc. All rights reserved.
Single Node Dev Cluster
• Everything is Serial
• Failure/Fault Domain
– The Cluster
– The Broker & Zookeeper
– The Laptop
– Everything!
• Failure/Fault Tolerance
– Deploy over Multiple Servers
Kafka Cluster
1 Broker
1 Zookeeper
8
© 2021 Cloudera, Inc. All rights reserved.
Single Cluster - Single Rack w/wo colocated Zookeeper
• Some things are Serial
• Some things are Parallel
• Failure/Fault Domain
– Rack
– Brokers
• Other Notes
– Shared Log Dirs for Brokers
and Zookeeper
9
© 2021 Cloudera, Inc. All rights reserved.
Single Cluster - VMs
• Some things are Serial
• Some things are Parallel
• Failure/Fault Domain
– VM Hosts
– Replica Placement
• Replica per VM Host
• Other Notes
– Debugging Complexity
10
© 2021 Cloudera, Inc. All rights reserved.
Classic Failover - Two Clusters AcFve / Passive
• Mostly Parallel*
• Failure/Fault Domain
– Mirror Makers
– Cluster
– WAN*
– Data Center
• Other Notes
– Offsets During Failover
– WAN Bandwidth
– Mirror Maker Producer Side
11
© 2021 Cloudera, Inc. All rights reserved.
STREAMS
REPLICATION
MANAGER (SRM)
with Mirror Maker2
• Supports active-active, multi-
cluster, cross DC replication &
other scenarios
• Leverage Kafka Connect for
scalability and HA
• Replicate data and configurations
• Offset translation for easy
failover
12
© 2021 Cloudera, Inc. All rights reserved.
SRM
MONITORING
SERVICES
• SMM Cluster Replications View
provides monitoring
• SRM Service calculates replication
metrics (latency, throughput)
• REST endpoints with Swagger UI
• SMM Replication Flow Alerting
13
© 2021 Cloudera, Inc. All rights reserved.
Triple OnSite Cluster - In/Out/SysOps
• Serial Pipeline
• Parallel Systems
• Failure/Fault Domain
– Cluster
• Isolated Failures in Pipeline
• IN / Out Pipeline
– Producers / Apps / Consumers
– Datacenter
• Other Notes
– Test Configs/Apps on DevOps
– Ingress / Egress Isolation
14
© 2021 Cloudera, Inc. All rights reserved.
MulF-Geo DistribuFon and AggregaFon
• Serial Pipeline
• Parallel Systems
• Failure/Fault Domain
– Data Center
– Mirror Maker
– Cluster
• Other Notes
– Fat Network Pipe
15
© 2021 Cloudera, Inc. All rights reserved.
Dual Aggregation Active Active
• Mostly Parallel*
• Failure/Fault Domain
– Data Center
– Cluster
– Mirror Maker
– Consumers / Producers*
• Other Notes
– Active/Active
– Ingress / Egress Isolation
– Consumers on any side
16
© 2021 Cloudera, Inc. All rights reserved.
Multi-AZ/Cloud Spanning
• Parallel*
• Failure/Fault Domain
– Availability Zone
– Replica Placement*
• Other Notes
– Rack Awareness
– min-insync.replicas
Thank You!
Q&A

Más contenido relacionado

La actualidad más candente

Building a Modern, Scalable Cyber Intelligence Platform with Apache Kafka | J...
Building a Modern, Scalable Cyber Intelligence Platform with Apache Kafka | J...Building a Modern, Scalable Cyber Intelligence Platform with Apache Kafka | J...
Building a Modern, Scalable Cyber Intelligence Platform with Apache Kafka | J...
HostedbyConfluent
 
Feed Your SIEM Smart with Kafka Connect (Vitalii Rudenskyi, McKesson Corp) Ka...
Feed Your SIEM Smart with Kafka Connect (Vitalii Rudenskyi, McKesson Corp) Ka...Feed Your SIEM Smart with Kafka Connect (Vitalii Rudenskyi, McKesson Corp) Ka...
Feed Your SIEM Smart with Kafka Connect (Vitalii Rudenskyi, McKesson Corp) Ka...
HostedbyConfluent
 

La actualidad más candente (20)

Low-latency real-time data processing at giga-scale with Kafka | John DesJard...
Low-latency real-time data processing at giga-scale with Kafka | John DesJard...Low-latency real-time data processing at giga-scale with Kafka | John DesJard...
Low-latency real-time data processing at giga-scale with Kafka | John DesJard...
 
A Look into the Mirror: Patterns and Best Practices for MirrorMaker2 | Cliff ...
A Look into the Mirror: Patterns and Best Practices for MirrorMaker2 | Cliff ...A Look into the Mirror: Patterns and Best Practices for MirrorMaker2 | Cliff ...
A Look into the Mirror: Patterns and Best Practices for MirrorMaker2 | Cliff ...
 
Kafka error handling patterns and best practices | Hemant Desale and Aruna Ka...
Kafka error handling patterns and best practices | Hemant Desale and Aruna Ka...Kafka error handling patterns and best practices | Hemant Desale and Aruna Ka...
Kafka error handling patterns and best practices | Hemant Desale and Aruna Ka...
 
Building Scalable Real-Time Data Pipelines with the Couchbase Kafka Connector...
Building Scalable Real-Time Data Pipelines with the Couchbase Kafka Connector...Building Scalable Real-Time Data Pipelines with the Couchbase Kafka Connector...
Building Scalable Real-Time Data Pipelines with the Couchbase Kafka Connector...
 
Supercharge Your Real-time Event Processing with Neo4j's Streams Kafka Connec...
Supercharge Your Real-time Event Processing with Neo4j's Streams Kafka Connec...Supercharge Your Real-time Event Processing with Neo4j's Streams Kafka Connec...
Supercharge Your Real-time Event Processing with Neo4j's Streams Kafka Connec...
 
Streaming Data Analytics with ksqlDB and Superset | Robert Stolz, Preset
Streaming Data Analytics with ksqlDB and Superset | Robert Stolz, PresetStreaming Data Analytics with ksqlDB and Superset | Robert Stolz, Preset
Streaming Data Analytics with ksqlDB and Superset | Robert Stolz, Preset
 
Kafka Excellence at Scale – Cloud, Kubernetes, Infrastructure as Code (Vik Wa...
Kafka Excellence at Scale – Cloud, Kubernetes, Infrastructure as Code (Vik Wa...Kafka Excellence at Scale – Cloud, Kubernetes, Infrastructure as Code (Vik Wa...
Kafka Excellence at Scale – Cloud, Kubernetes, Infrastructure as Code (Vik Wa...
 
Death of the dumb pipes: Using Apache Kafka® for Integration projects
Death of the dumb pipes: Using Apache Kafka® for Integration projectsDeath of the dumb pipes: Using Apache Kafka® for Integration projects
Death of the dumb pipes: Using Apache Kafka® for Integration projects
 
Building a Modern, Scalable Cyber Intelligence Platform with Apache Kafka | J...
Building a Modern, Scalable Cyber Intelligence Platform with Apache Kafka | J...Building a Modern, Scalable Cyber Intelligence Platform with Apache Kafka | J...
Building a Modern, Scalable Cyber Intelligence Platform with Apache Kafka | J...
 
Event-driven Applications with Kafka, Micronaut, and AWS Lambda | Dave Klein,...
Event-driven Applications with Kafka, Micronaut, and AWS Lambda | Dave Klein,...Event-driven Applications with Kafka, Micronaut, and AWS Lambda | Dave Klein,...
Event-driven Applications with Kafka, Micronaut, and AWS Lambda | Dave Klein,...
 
The Road Most Traveled: A Kafka Story | Heikki Nousiainen, Aiven
The Road Most Traveled: A Kafka Story | Heikki Nousiainen, AivenThe Road Most Traveled: A Kafka Story | Heikki Nousiainen, Aiven
The Road Most Traveled: A Kafka Story | Heikki Nousiainen, Aiven
 
Guaranteed Event Delivery with Kafka and NodeJS | Amitesh Madhur, Nutanix
Guaranteed Event Delivery with Kafka and NodeJS | Amitesh Madhur, NutanixGuaranteed Event Delivery with Kafka and NodeJS | Amitesh Madhur, Nutanix
Guaranteed Event Delivery with Kafka and NodeJS | Amitesh Madhur, Nutanix
 
Feed Your SIEM Smart with Kafka Connect (Vitalii Rudenskyi, McKesson Corp) Ka...
Feed Your SIEM Smart with Kafka Connect (Vitalii Rudenskyi, McKesson Corp) Ka...Feed Your SIEM Smart with Kafka Connect (Vitalii Rudenskyi, McKesson Corp) Ka...
Feed Your SIEM Smart with Kafka Connect (Vitalii Rudenskyi, McKesson Corp) Ka...
 
How Confluent Completes the Event Streaming Platform (Addison Huddy & Dan Ros...
How Confluent Completes the Event Streaming Platform (Addison Huddy & Dan Ros...How Confluent Completes the Event Streaming Platform (Addison Huddy & Dan Ros...
How Confluent Completes the Event Streaming Platform (Addison Huddy & Dan Ros...
 
SingleStore & Kafka: Better Together to Power Modern Real-Time Data Architect...
SingleStore & Kafka: Better Together to Power Modern Real-Time Data Architect...SingleStore & Kafka: Better Together to Power Modern Real-Time Data Architect...
SingleStore & Kafka: Better Together to Power Modern Real-Time Data Architect...
 
Exposing and Controlling Kafka Event Streaming with Kong Konnect Enterprise |...
Exposing and Controlling Kafka Event Streaming with Kong Konnect Enterprise |...Exposing and Controlling Kafka Event Streaming with Kong Konnect Enterprise |...
Exposing and Controlling Kafka Event Streaming with Kong Konnect Enterprise |...
 
Understanding Kafka Produce and Fetch api calls for high throughtput applicat...
Understanding Kafka Produce and Fetch api calls for high throughtput applicat...Understanding Kafka Produce and Fetch api calls for high throughtput applicat...
Understanding Kafka Produce and Fetch api calls for high throughtput applicat...
 
Keeping Analytics Data Fresh in a Streaming Architecture | John Neal, Qlik
Keeping Analytics Data Fresh in a Streaming Architecture | John Neal, QlikKeeping Analytics Data Fresh in a Streaming Architecture | John Neal, Qlik
Keeping Analytics Data Fresh in a Streaming Architecture | John Neal, Qlik
 
5 lessons learned for successful migration to Confluent cloud | Natan Silinit...
5 lessons learned for successful migration to Confluent cloud | Natan Silinit...5 lessons learned for successful migration to Confluent cloud | Natan Silinit...
5 lessons learned for successful migration to Confluent cloud | Natan Silinit...
 
Distributed Data Storage & Streaming for Real-time Decisioning Using Kafka, S...
Distributed Data Storage & Streaming for Real-time Decisioning Using Kafka, S...Distributed Data Storage & Streaming for Real-time Decisioning Using Kafka, S...
Distributed Data Storage & Streaming for Real-time Decisioning Using Kafka, S...
 

Similar a Lessons from the field: Catalog of Kafka Deployments | Joseph Niemiec, Cloudera

Best Practices of HA and Replication of PostgreSQL in Virtualized Environments
Best Practices of HA and Replication of PostgreSQL in Virtualized EnvironmentsBest Practices of HA and Replication of PostgreSQL in Virtualized Environments
Best Practices of HA and Replication of PostgreSQL in Virtualized Environments
Jignesh Shah
 
Business_Continuity_Planning_with_SQL_Server_HADR_options_TechEd_Bangalore_20...
Business_Continuity_Planning_with_SQL_Server_HADR_options_TechEd_Bangalore_20...Business_Continuity_Planning_with_SQL_Server_HADR_options_TechEd_Bangalore_20...
Business_Continuity_Planning_with_SQL_Server_HADR_options_TechEd_Bangalore_20...
LarryZaman
 

Similar a Lessons from the field: Catalog of Kafka Deployments | Joseph Niemiec, Cloudera (20)

VMworld 2013: Virtualizing Highly Available SQL Servers
VMworld 2013: Virtualizing Highly Available SQL Servers VMworld 2013: Virtualizing Highly Available SQL Servers
VMworld 2013: Virtualizing Highly Available SQL Servers
 
5 Quick Wins for the Cloud
5 Quick Wins for the Cloud5 Quick Wins for the Cloud
5 Quick Wins for the Cloud
 
Best Practices of HA and Replication of PostgreSQL in Virtualized Environments
Best Practices of HA and Replication of PostgreSQL in Virtualized EnvironmentsBest Practices of HA and Replication of PostgreSQL in Virtualized Environments
Best Practices of HA and Replication of PostgreSQL in Virtualized Environments
 
1 architecture & design
1   architecture & design1   architecture & design
1 architecture & design
 
SQL Server Clustering and High Availability
SQL Server Clustering and High AvailabilitySQL Server Clustering and High Availability
SQL Server Clustering and High Availability
 
Next-Gen Decision Making in Under 2ms
Next-Gen Decision Making in Under 2msNext-Gen Decision Making in Under 2ms
Next-Gen Decision Making in Under 2ms
 
Oracle RAC and Your Way to the Cloud by Angelo Pruscino
Oracle RAC and Your Way to the Cloud by Angelo PruscinoOracle RAC and Your Way to the Cloud by Angelo Pruscino
Oracle RAC and Your Way to the Cloud by Angelo Pruscino
 
Geographically Distributed Multi-Master MySQL Clusters
Geographically Distributed Multi-Master MySQL ClustersGeographically Distributed Multi-Master MySQL Clusters
Geographically Distributed Multi-Master MySQL Clusters
 
Capital One's Next Generation Decision in less than 2 ms
Capital One's Next Generation Decision in less than 2 msCapital One's Next Generation Decision in less than 2 ms
Capital One's Next Generation Decision in less than 2 ms
 
Availability Considerations for SQL Server
Availability Considerations for SQL ServerAvailability Considerations for SQL Server
Availability Considerations for SQL Server
 
Presentation bringing xen server to mission critical system
Presentation   bringing xen server to mission critical systemPresentation   bringing xen server to mission critical system
Presentation bringing xen server to mission critical system
 
Oracle Linux/Oracle VM & Oracle Cloud Overview
Oracle Linux/Oracle VM & Oracle Cloud OverviewOracle Linux/Oracle VM & Oracle Cloud Overview
Oracle Linux/Oracle VM & Oracle Cloud Overview
 
VMworld Europe 2014: A Blueprint for Disaster Recovery of Business Critical A...
VMworld Europe 2014: A Blueprint for Disaster Recovery of Business Critical A...VMworld Europe 2014: A Blueprint for Disaster Recovery of Business Critical A...
VMworld Europe 2014: A Blueprint for Disaster Recovery of Business Critical A...
 
V mware v fabric 5 - what's new technical sales training presentation
V mware v fabric 5 - what's new technical sales training presentationV mware v fabric 5 - what's new technical sales training presentation
V mware v fabric 5 - what's new technical sales training presentation
 
Sneak Peek into the New ChangeMan ZMF Release
Sneak Peek into the New ChangeMan ZMF ReleaseSneak Peek into the New ChangeMan ZMF Release
Sneak Peek into the New ChangeMan ZMF Release
 
Sneak Peek into the New ChangeMan ZMF Release
Sneak Peek into the New ChangeMan ZMF ReleaseSneak Peek into the New ChangeMan ZMF Release
Sneak Peek into the New ChangeMan ZMF Release
 
Oracle business continuity for virtualization and cloud infrastructure
Oracle business continuity for virtualization and cloud infrastructureOracle business continuity for virtualization and cloud infrastructure
Oracle business continuity for virtualization and cloud infrastructure
 
Tokyo azure meetup #12 service fabric internals
Tokyo azure meetup #12   service fabric internalsTokyo azure meetup #12   service fabric internals
Tokyo azure meetup #12 service fabric internals
 
Business_Continuity_Planning_with_SQL_Server_HADR_options_TechEd_Bangalore_20...
Business_Continuity_Planning_with_SQL_Server_HADR_options_TechEd_Bangalore_20...Business_Continuity_Planning_with_SQL_Server_HADR_options_TechEd_Bangalore_20...
Business_Continuity_Planning_with_SQL_Server_HADR_options_TechEd_Bangalore_20...
 
Pro sphere customer technical
Pro sphere customer technicalPro sphere customer technical
Pro sphere customer technical
 

Más de HostedbyConfluent

Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
HostedbyConfluent
 
Evolution of NRT Data Ingestion Pipeline at Trendyol
Evolution of NRT Data Ingestion Pipeline at TrendyolEvolution of NRT Data Ingestion Pipeline at Trendyol
Evolution of NRT Data Ingestion Pipeline at Trendyol
HostedbyConfluent
 

Más de HostedbyConfluent (20)

Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
Renaming a Kafka Topic | Kafka Summit London
Renaming a Kafka Topic | Kafka Summit LondonRenaming a Kafka Topic | Kafka Summit London
Renaming a Kafka Topic | Kafka Summit London
 
Evolution of NRT Data Ingestion Pipeline at Trendyol
Evolution of NRT Data Ingestion Pipeline at TrendyolEvolution of NRT Data Ingestion Pipeline at Trendyol
Evolution of NRT Data Ingestion Pipeline at Trendyol
 
Ensuring Kafka Service Resilience: A Dive into Health-Checking Techniques
Ensuring Kafka Service Resilience: A Dive into Health-Checking TechniquesEnsuring Kafka Service Resilience: A Dive into Health-Checking Techniques
Ensuring Kafka Service Resilience: A Dive into Health-Checking Techniques
 
Exactly-once Stream Processing with Arroyo and Kafka
Exactly-once Stream Processing with Arroyo and KafkaExactly-once Stream Processing with Arroyo and Kafka
Exactly-once Stream Processing with Arroyo and Kafka
 
Fish Plays Pokemon | Kafka Summit London
Fish Plays Pokemon | Kafka Summit LondonFish Plays Pokemon | Kafka Summit London
Fish Plays Pokemon | Kafka Summit London
 
Tiered Storage 101 | Kafla Summit London
Tiered Storage 101 | Kafla Summit LondonTiered Storage 101 | Kafla Summit London
Tiered Storage 101 | Kafla Summit London
 
Building a Self-Service Stream Processing Portal: How And Why
Building a Self-Service Stream Processing Portal: How And WhyBuilding a Self-Service Stream Processing Portal: How And Why
Building a Self-Service Stream Processing Portal: How And Why
 
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...
From the Trenches: Improving Kafka Connect Source Connector Ingestion from 7 ...
 
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...
Future with Zero Down-Time: End-to-end Resiliency with Chaos Engineering and ...
 
Navigating Private Network Connectivity Options for Kafka Clusters
Navigating Private Network Connectivity Options for Kafka ClustersNavigating Private Network Connectivity Options for Kafka Clusters
Navigating Private Network Connectivity Options for Kafka Clusters
 
Apache Flink: Building a Company-wide Self-service Streaming Data Platform
Apache Flink: Building a Company-wide Self-service Streaming Data PlatformApache Flink: Building a Company-wide Self-service Streaming Data Platform
Apache Flink: Building a Company-wide Self-service Streaming Data Platform
 
Explaining How Real-Time GenAI Works in a Noisy Pub
Explaining How Real-Time GenAI Works in a Noisy PubExplaining How Real-Time GenAI Works in a Noisy Pub
Explaining How Real-Time GenAI Works in a Noisy Pub
 
TL;DR Kafka Metrics | Kafka Summit London
TL;DR Kafka Metrics | Kafka Summit LondonTL;DR Kafka Metrics | Kafka Summit London
TL;DR Kafka Metrics | Kafka Summit London
 
A Window Into Your Kafka Streams Tasks | KSL
A Window Into Your Kafka Streams Tasks | KSLA Window Into Your Kafka Streams Tasks | KSL
A Window Into Your Kafka Streams Tasks | KSL
 
Mastering Kafka Producer Configs: A Guide to Optimizing Performance
Mastering Kafka Producer Configs: A Guide to Optimizing PerformanceMastering Kafka Producer Configs: A Guide to Optimizing Performance
Mastering Kafka Producer Configs: A Guide to Optimizing Performance
 
Data Contracts Management: Schema Registry and Beyond
Data Contracts Management: Schema Registry and BeyondData Contracts Management: Schema Registry and Beyond
Data Contracts Management: Schema Registry and Beyond
 
Code-First Approach: Crafting Efficient Flink Apps
Code-First Approach: Crafting Efficient Flink AppsCode-First Approach: Crafting Efficient Flink Apps
Code-First Approach: Crafting Efficient Flink Apps
 
Debezium vs. the World: An Overview of the CDC Ecosystem
Debezium vs. the World: An Overview of the CDC EcosystemDebezium vs. the World: An Overview of the CDC Ecosystem
Debezium vs. the World: An Overview of the CDC Ecosystem
 
Beyond Tiered Storage: Serverless Kafka with No Local Disks
Beyond Tiered Storage: Serverless Kafka with No Local DisksBeyond Tiered Storage: Serverless Kafka with No Local Disks
Beyond Tiered Storage: Serverless Kafka with No Local Disks
 

Último

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
WSO2
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Victor Rentea
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Victor Rentea
 

Último (20)

DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 AmsterdamDEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
DEV meet-up UiPath Document Understanding May 7 2024 Amsterdam
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
Apidays New York 2024 - APIs in 2030: The Risk of Technological Sleepwalk by ...
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024Finding Java's Hidden Performance Traps @ DevoxxUK 2024
Finding Java's Hidden Performance Traps @ DevoxxUK 2024
 
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
Modular Monolith - a Practical Alternative to Microservices @ Devoxx UK 2024
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 

Lessons from the field: Catalog of Kafka Deployments | Joseph Niemiec, Cloudera

  • 1. Lessons from the field: Catalog of Kafka Deployments Joseph Niemiec | Sr. Product Manager
  • 2. Failure / Fault Domains and Tolerance
  • 3. 3 © 2021 Cloudera, Inc. All rights reserved. Failure/Fault Domains Limiting the state and scope of failure What is a Failure/Fault Domain? A set of components that share a single point of failure and fail in a correlated manner
  • 4. 4 © 2021 Cloudera, Inc. All rights reserved. Serial and Parallel Systems Reliability Serial Systems • Failure of any component results in failure of entire system • Adding more weakens the system ie - 90% * 90% = 81% – Rs= R1* R2… Rn Parallel Systems • One component online results in system online • Adding more strengthens the system ie - 1-(1-0.90)*(1-0.90) = 99% – Rs= 1 - (1 -R1) *(1-R2) … (1-Rn)
  • 5. 5 © 2021 Cloudera, Inc. All rights reserved. Basic Topic Design Theory Partitions ● Few as needed ● Increased open files pressure Topic Replicas ● Parallel availability min-insync.replicas ● Durability vs Availability ● ack=all ○ for all isr with at least the min isr available ● replica.lag.time.max.ms ○ default 30 seconds!!! Use log.retention.bytes ● Can use together ● Protect from topics growing larger than brokers Log Segments need to be eligible for retention policies ● log.segment.bytes and log.roll.ms
  • 7. 7 © 2021 Cloudera, Inc. All rights reserved. Single Node Dev Cluster • Everything is Serial • Failure/Fault Domain – The Cluster – The Broker & Zookeeper – The Laptop – Everything! • Failure/Fault Tolerance – Deploy over Multiple Servers Kafka Cluster 1 Broker 1 Zookeeper
  • 8. 8 © 2021 Cloudera, Inc. All rights reserved. Single Cluster - Single Rack w/wo colocated Zookeeper • Some things are Serial • Some things are Parallel • Failure/Fault Domain – Rack – Brokers • Other Notes – Shared Log Dirs for Brokers and Zookeeper
  • 9. 9 © 2021 Cloudera, Inc. All rights reserved. Single Cluster - VMs • Some things are Serial • Some things are Parallel • Failure/Fault Domain – VM Hosts – Replica Placement • Replica per VM Host • Other Notes – Debugging Complexity
  • 10. 10 © 2021 Cloudera, Inc. All rights reserved. Classic Failover - Two Clusters AcFve / Passive • Mostly Parallel* • Failure/Fault Domain – Mirror Makers – Cluster – WAN* – Data Center • Other Notes – Offsets During Failover – WAN Bandwidth – Mirror Maker Producer Side
  • 11. 11 © 2021 Cloudera, Inc. All rights reserved. STREAMS REPLICATION MANAGER (SRM) with Mirror Maker2 • Supports active-active, multi- cluster, cross DC replication & other scenarios • Leverage Kafka Connect for scalability and HA • Replicate data and configurations • Offset translation for easy failover
  • 12. 12 © 2021 Cloudera, Inc. All rights reserved. SRM MONITORING SERVICES • SMM Cluster Replications View provides monitoring • SRM Service calculates replication metrics (latency, throughput) • REST endpoints with Swagger UI • SMM Replication Flow Alerting
  • 13. 13 © 2021 Cloudera, Inc. All rights reserved. Triple OnSite Cluster - In/Out/SysOps • Serial Pipeline • Parallel Systems • Failure/Fault Domain – Cluster • Isolated Failures in Pipeline • IN / Out Pipeline – Producers / Apps / Consumers – Datacenter • Other Notes – Test Configs/Apps on DevOps – Ingress / Egress Isolation
  • 14. 14 © 2021 Cloudera, Inc. All rights reserved. MulF-Geo DistribuFon and AggregaFon • Serial Pipeline • Parallel Systems • Failure/Fault Domain – Data Center – Mirror Maker – Cluster • Other Notes – Fat Network Pipe
  • 15. 15 © 2021 Cloudera, Inc. All rights reserved. Dual Aggregation Active Active • Mostly Parallel* • Failure/Fault Domain – Data Center – Cluster – Mirror Maker – Consumers / Producers* • Other Notes – Active/Active – Ingress / Egress Isolation – Consumers on any side
  • 16. 16 © 2021 Cloudera, Inc. All rights reserved. Multi-AZ/Cloud Spanning • Parallel* • Failure/Fault Domain – Availability Zone – Replica Placement* • Other Notes – Rack Awareness – min-insync.replicas