SlideShare una empresa de Scribd logo
1 de 14
Presented By:
Kafka: Internals
Sourabh Verma
Software Consultant
Agenda
● Brief Introduction to Kafka - it’s need?
● Key Terminologies - Zookeeper, Broker, Topic, Partitions,
Offsets, and Replication.
● IT Team and Kafka Cluster Analogy.
● Summarize the Core components and responsibilities.
● Kafka Tuning: Availability & Consistency.
● Delivery Semantics.
● Producer Tuning - Configs. (BONUS)
● DEMO
Brief Introduction to Kafka - it’s need?
● Kafka is a horizontally scalable, fault tolerant, and fast messaging system. It’s a pub-
sub model in which various producers and consumers can write and read. It
decouples source and target systems.
Few Use Cases:
● Scale to hundreds of nodes.
● Can handle millions of messages per second.
● Real-time processing (~10ms).
Key Terminologies
● ZooKeeper is a centralized service for managing distributed systems. It acts as
ensemble layer (ties things together) and ensures high availability of the Kafka
cluster.
● ZooKeeper stores metadata and the current state of the Kafka cluster. For example,
details like topic name, the number of partitions, replication, leader details of
partition, and In-Sync Replicas are stored in ZooKeeper. (After Kafka 0.10,
consumer offsets are not stored in ZooKeeper).
● Broker is a single Kafka node that is managed by ZooKeeper. A set of brokers form a
Kafka cluster. Topics that are created in Kafka are distributed across brokers based
on the partition, replication, and other factors.
Note: When a broker node fails based on the state stored in ZooKeeper it
automatically rebalances the cluster and if a leader partition is lost then one of the
follower partition (ISR) is elected as the leader.
● Topic is a specific stream of data. It is very similar to a table in a NoSQL database.
Like tables in a NoSQL database, the topic is split into partitions that enable topics
to be distributed across various nodes. Like primary keys in tables, topics have
offsets per partitions. You can uniquely identify a message using its topic,
partition, and offset.
Key Terminologies
● Partitions enable topics to be distributed across the cluster. Partitions are a unit of
parallelism for horizontal scalability. One topic can have more than one partition
scaling across nodes.
Messages are assigned to partitions based on partition keys, if there are no partition
keys then the partition is randomly assigned. It’s important to use the correct key
to avoid hotspots.
● Offsets - Each message in a partition is assigned an incremental id called an offset.
Offsets are unique per partition and messages are ordered only within a
partition. Messages written to partitions are immutable.
Note: Messages are not ordered between multiple partitions.
Key Terminologies
● Replication is making a copy of a partition available in another broker. Replication
enables Kafka to be fault tolerant. When a partition of the topic is available in
multiple brokers then one of the partitions in a broker is elected as the leader and the
rest of the replications of the partition are followers.
● Replication enables Kafka to be fault tolerant even when a broker is down, the
partition from another broker is elected as a leader and it starts serving the producers
and consumer groups. Replica partitions that are in sync with the leader are flagged
as ISR (In Sync Replica).
IT Team and Kafka Cluster Analogy.
Summarize the Core components and responsibilities.
● ZooKeeper manages Kafka brokers and their metadata.
● Brokers are horizontally scalable Kafka nodes that contain topics and it's replications.
● Topics are message streams with one or more partitions.
● Partitions contains messages with unique offsets per partition.
● Replication enables Kafka to be fault tolerant using follower partitions (ISRs).
Kafka Tuning: Availability & Consistency
● Cluster Size (N): Number of nodes/brokers in the Kafka cluster, we should have
2x+1, i.e. at least 3 nodes or more in an odd number.
● Partitions: Topic is divided into partitions (by default 1), but we should have M times
N, where M can be any integer number, i.e. M >= 1, to achieve more parallelism and
partitioning of data over the cluster. (only if order is not a concern.)
● Replication Factor: determines the number of copies (including the original/Leader)
of each partition in the cluster. All replicas of a partition exist on separate
node/broker, and we should never have R.F. > N, but at least 3.
We recommend having 3 RF with 3 or 5 nodes cluster. This helps in having both
availabilities as well as consistency.
● In-sync Replica (ISR): Number of minimum replicas (including the leader) synced
up, i.e. available for the producer to successfully send messages to the partition.
This inversely impacts the availability for producer i.e. lower the ISR more the
availability and lesser the consistency and vice versa. we should always have ISR
lower than RF. We recommend having 2 ISR for topics with RF as 3.
Note: Setting ISR to 1 is almost equivalent to having no replication in a system.
Kafka Tuning: Availability & Consistency
● Acknowledgment: message to be written into the number of replicas before it is
acknowledged to the producer.
a. Setting acks to 0 will make the system to send acknowledgment without writing
the message which may lose the data,
b. setting it to 1 means it should be written at least to the leader replica,
c. and setting it to all means message should be written to all in-sync replica which
helps in consistency but drops the availability.
Note: Setting acks to 0 or 1 can lead to loss of data & inconsistent partitions, in case
of leader failure, the next ISR replica might not be aware of the recent message which will
cause inconsistency in order of events in replicas.
● Unclean Leader Election: in case of failure of all ISR, out-of-sync replica is elected
as Leader, setting this to TRUE is not recommended at all, as it will lose the
consistency of the system, this should be used only and only if we need the 100%
availability irrespective of the consistency.
Delivery Semantics
● Acks = 0: At most once delivery semantics. The producer uses “send and forget
approach”. High data loss.
● Acks = 1: At least once delivery semantics. Moderate data loss and Duplicate.
● Acks = ALL: exactly once delivery semantics (acks to all min.insync.replica). No
data loss.
Producer Tuning - Configs
● Batch.size – batch size (messages) per request, producer will write to the partition
leader when the batch will be full (even if the linger.ms is not completed.)
● Linger.ms – Time to wait before sending the current batch, if the time is elapsed
even the batch is not full, producer will start sending the events to the Broker.
● Max.in.flight.requests.per.connection - The number of messages to be sent
without any acknowledgment. Default is 5. Set this to 1 to avoid out of order message
due to retry.
References
● Kafka Documentation
● Kafka Technical Overview
● Kafka Tuning: Consistency vs Availability
Thank You !

Más contenido relacionado

La actualidad más candente

APACHE KAFKA / Kafka Connect / Kafka Streams
APACHE KAFKA / Kafka Connect / Kafka StreamsAPACHE KAFKA / Kafka Connect / Kafka Streams
APACHE KAFKA / Kafka Connect / Kafka StreamsKetan Gote
 
Apache Kafka Fundamentals for Architects, Admins and Developers
Apache Kafka Fundamentals for Architects, Admins and DevelopersApache Kafka Fundamentals for Architects, Admins and Developers
Apache Kafka Fundamentals for Architects, Admins and Developersconfluent
 
Kafka Overview
Kafka OverviewKafka Overview
Kafka Overviewiamtodor
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache KafkaShiao-An Yuan
 
Kafka Tutorial - basics of the Kafka streaming platform
Kafka Tutorial - basics of the Kafka streaming platformKafka Tutorial - basics of the Kafka streaming platform
Kafka Tutorial - basics of the Kafka streaming platformJean-Paul Azar
 
Apache Kafka Introduction
Apache Kafka IntroductionApache Kafka Introduction
Apache Kafka IntroductionAmita Mirajkar
 
Apache Kafka
Apache KafkaApache Kafka
Apache Kafkaemreakis
 
How Apache Kafka® Works
How Apache Kafka® WorksHow Apache Kafka® Works
How Apache Kafka® Worksconfluent
 
A Deep Dive into Kafka Controller
A Deep Dive into Kafka ControllerA Deep Dive into Kafka Controller
A Deep Dive into Kafka Controllerconfluent
 
Developing Real-Time Data Pipelines with Apache Kafka
Developing Real-Time Data Pipelines with Apache KafkaDeveloping Real-Time Data Pipelines with Apache Kafka
Developing Real-Time Data Pipelines with Apache KafkaJoe Stein
 
Introduction to Apache Kafka and Confluent... and why they matter
Introduction to Apache Kafka and Confluent... and why they matterIntroduction to Apache Kafka and Confluent... and why they matter
Introduction to Apache Kafka and Confluent... and why they matterconfluent
 

La actualidad más candente (20)

Kafka basics
Kafka basicsKafka basics
Kafka basics
 
APACHE KAFKA / Kafka Connect / Kafka Streams
APACHE KAFKA / Kafka Connect / Kafka StreamsAPACHE KAFKA / Kafka Connect / Kafka Streams
APACHE KAFKA / Kafka Connect / Kafka Streams
 
Kafka presentation
Kafka presentationKafka presentation
Kafka presentation
 
Apache Kafka Fundamentals for Architects, Admins and Developers
Apache Kafka Fundamentals for Architects, Admins and DevelopersApache Kafka Fundamentals for Architects, Admins and Developers
Apache Kafka Fundamentals for Architects, Admins and Developers
 
Apache Kafka
Apache KafkaApache Kafka
Apache Kafka
 
Kafka Overview
Kafka OverviewKafka Overview
Kafka Overview
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache Kafka
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 
Apache Kafka
Apache KafkaApache Kafka
Apache Kafka
 
Kafka Tutorial - basics of the Kafka streaming platform
Kafka Tutorial - basics of the Kafka streaming platformKafka Tutorial - basics of the Kafka streaming platform
Kafka Tutorial - basics of the Kafka streaming platform
 
Apache Kafka Introduction
Apache Kafka IntroductionApache Kafka Introduction
Apache Kafka Introduction
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 
Apache Kafka
Apache KafkaApache Kafka
Apache Kafka
 
Apache Kafka at LinkedIn
Apache Kafka at LinkedInApache Kafka at LinkedIn
Apache Kafka at LinkedIn
 
How Apache Kafka® Works
How Apache Kafka® WorksHow Apache Kafka® Works
How Apache Kafka® Works
 
A Deep Dive into Kafka Controller
A Deep Dive into Kafka ControllerA Deep Dive into Kafka Controller
A Deep Dive into Kafka Controller
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 
Developing Real-Time Data Pipelines with Apache Kafka
Developing Real-Time Data Pipelines with Apache KafkaDeveloping Real-Time Data Pipelines with Apache Kafka
Developing Real-Time Data Pipelines with Apache Kafka
 
Introduction to apache kafka
Introduction to apache kafkaIntroduction to apache kafka
Introduction to apache kafka
 
Introduction to Apache Kafka and Confluent... and why they matter
Introduction to Apache Kafka and Confluent... and why they matterIntroduction to Apache Kafka and Confluent... and why they matter
Introduction to Apache Kafka and Confluent... and why they matter
 

Similar a Kafka: Internals

Kafka basics and best prectices
Kafka basics and best precticesKafka basics and best prectices
Kafka basics and best precticesRohitSingh542417
 
Kafka in action - Tech Talk - Paytm
Kafka in action - Tech Talk - PaytmKafka in action - Tech Talk - Paytm
Kafka in action - Tech Talk - PaytmSumit Jain
 
Messaging queue - Kafka
Messaging queue - KafkaMessaging queue - Kafka
Messaging queue - KafkaMayank Bansal
 
MySQL 5.7 clustering: The developer perspective
MySQL 5.7 clustering: The developer perspectiveMySQL 5.7 clustering: The developer perspective
MySQL 5.7 clustering: The developer perspectiveUlf Wendel
 
Introduction to Kafka Streams Presentation
Introduction to Kafka Streams PresentationIntroduction to Kafka Streams Presentation
Introduction to Kafka Streams PresentationKnoldus Inc.
 
Near Real time Indexing Kafka Messages to Apache Blur using Spark Streaming
Near Real time Indexing Kafka Messages to Apache Blur using Spark StreamingNear Real time Indexing Kafka Messages to Apache Blur using Spark Streaming
Near Real time Indexing Kafka Messages to Apache Blur using Spark StreamingDibyendu Bhattacharya
 
A Quick Guide to Refresh Kafka Skills
A Quick Guide to Refresh Kafka SkillsA Quick Guide to Refresh Kafka Skills
A Quick Guide to Refresh Kafka SkillsRavindra kumar
 
Kafka Reliable Data Delivery
Kafka Reliable Data DeliveryKafka Reliable Data Delivery
Kafka Reliable Data DeliveryMostafa Asgari
 
Timothy Spann: Apache Pulsar for ML
Timothy Spann: Apache Pulsar for MLTimothy Spann: Apache Pulsar for ML
Timothy Spann: Apache Pulsar for MLEdunomica
 
Fundamentals and Architecture of Apache Kafka
Fundamentals and Architecture of Apache KafkaFundamentals and Architecture of Apache Kafka
Fundamentals and Architecture of Apache KafkaAngelo Cesaro
 
Non-Kafkaesque Apache Kafka - Yottabyte 2018
Non-Kafkaesque Apache Kafka - Yottabyte 2018Non-Kafkaesque Apache Kafka - Yottabyte 2018
Non-Kafkaesque Apache Kafka - Yottabyte 2018Otávio Carvalho
 
Stateful stream processing with kafka and samza
Stateful stream processing with kafka and samzaStateful stream processing with kafka and samza
Stateful stream processing with kafka and samzaGeorge Li
 
bigdata 2022_ FLiP Into Pulsar Apps
bigdata 2022_ FLiP Into Pulsar Appsbigdata 2022_ FLiP Into Pulsar Apps
bigdata 2022_ FLiP Into Pulsar AppsTimothy Spann
 
Introduction to Kafka and Event-Driven
Introduction to Kafka and Event-DrivenIntroduction to Kafka and Event-Driven
Introduction to Kafka and Event-DrivenDimosthenis Botsaris
 
Introduction to Kafka and Event-Driven
Introduction to Kafka and Event-DrivenIntroduction to Kafka and Event-Driven
Introduction to Kafka and Event-Drivenarconsis
 

Similar a Kafka: Internals (20)

Kafka basics and best prectices
Kafka basics and best precticesKafka basics and best prectices
Kafka basics and best prectices
 
Kafka in action - Tech Talk - Paytm
Kafka in action - Tech Talk - PaytmKafka in action - Tech Talk - Paytm
Kafka in action - Tech Talk - Paytm
 
Messaging queue - Kafka
Messaging queue - KafkaMessaging queue - Kafka
Messaging queue - Kafka
 
MySQL 5.7 clustering: The developer perspective
MySQL 5.7 clustering: The developer perspectiveMySQL 5.7 clustering: The developer perspective
MySQL 5.7 clustering: The developer perspective
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 
Kafka overview v0.1
Kafka overview v0.1Kafka overview v0.1
Kafka overview v0.1
 
Introduction to Kafka Streams Presentation
Introduction to Kafka Streams PresentationIntroduction to Kafka Streams Presentation
Introduction to Kafka Streams Presentation
 
Near Real time Indexing Kafka Messages to Apache Blur using Spark Streaming
Near Real time Indexing Kafka Messages to Apache Blur using Spark StreamingNear Real time Indexing Kafka Messages to Apache Blur using Spark Streaming
Near Real time Indexing Kafka Messages to Apache Blur using Spark Streaming
 
A Quick Guide to Refresh Kafka Skills
A Quick Guide to Refresh Kafka SkillsA Quick Guide to Refresh Kafka Skills
A Quick Guide to Refresh Kafka Skills
 
Cassandra Insider
Cassandra InsiderCassandra Insider
Cassandra Insider
 
Kafka Reliable Data Delivery
Kafka Reliable Data DeliveryKafka Reliable Data Delivery
Kafka Reliable Data Delivery
 
Timothy Spann: Apache Pulsar for ML
Timothy Spann: Apache Pulsar for MLTimothy Spann: Apache Pulsar for ML
Timothy Spann: Apache Pulsar for ML
 
Fundamentals and Architecture of Apache Kafka
Fundamentals and Architecture of Apache KafkaFundamentals and Architecture of Apache Kafka
Fundamentals and Architecture of Apache Kafka
 
Non-Kafkaesque Apache Kafka - Yottabyte 2018
Non-Kafkaesque Apache Kafka - Yottabyte 2018Non-Kafkaesque Apache Kafka - Yottabyte 2018
Non-Kafkaesque Apache Kafka - Yottabyte 2018
 
Stateful stream processing with kafka and samza
Stateful stream processing with kafka and samzaStateful stream processing with kafka and samza
Stateful stream processing with kafka and samza
 
bigdata 2022_ FLiP Into Pulsar Apps
bigdata 2022_ FLiP Into Pulsar Appsbigdata 2022_ FLiP Into Pulsar Apps
bigdata 2022_ FLiP Into Pulsar Apps
 
Introduction to Kafka and Event-Driven
Introduction to Kafka and Event-DrivenIntroduction to Kafka and Event-Driven
Introduction to Kafka and Event-Driven
 
Introduction to Kafka and Event-Driven
Introduction to Kafka and Event-DrivenIntroduction to Kafka and Event-Driven
Introduction to Kafka and Event-Driven
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 
kafka
kafkakafka
kafka
 

Más de Knoldus Inc.

Mastering Web Scraping with JSoup Unlocking the Secrets of HTML Parsing
Mastering Web Scraping with JSoup Unlocking the Secrets of HTML ParsingMastering Web Scraping with JSoup Unlocking the Secrets of HTML Parsing
Mastering Web Scraping with JSoup Unlocking the Secrets of HTML ParsingKnoldus Inc.
 
Akka gRPC Essentials A Hands-On Introduction
Akka gRPC Essentials A Hands-On IntroductionAkka gRPC Essentials A Hands-On Introduction
Akka gRPC Essentials A Hands-On IntroductionKnoldus Inc.
 
Entity Core with Core Microservices.pptx
Entity Core with Core Microservices.pptxEntity Core with Core Microservices.pptx
Entity Core with Core Microservices.pptxKnoldus Inc.
 
Introduction to Redis and its features.pptx
Introduction to Redis and its features.pptxIntroduction to Redis and its features.pptx
Introduction to Redis and its features.pptxKnoldus Inc.
 
GraphQL with .NET Core Microservices.pdf
GraphQL with .NET Core Microservices.pdfGraphQL with .NET Core Microservices.pdf
GraphQL with .NET Core Microservices.pdfKnoldus Inc.
 
NuGet Packages Presentation (DoT NeT).pptx
NuGet Packages Presentation (DoT NeT).pptxNuGet Packages Presentation (DoT NeT).pptx
NuGet Packages Presentation (DoT NeT).pptxKnoldus Inc.
 
Data Quality in Test Automation Navigating the Path to Reliable Testing
Data Quality in Test Automation Navigating the Path to Reliable TestingData Quality in Test Automation Navigating the Path to Reliable Testing
Data Quality in Test Automation Navigating the Path to Reliable TestingKnoldus Inc.
 
K8sGPTThe AI​ way to diagnose Kubernetes
K8sGPTThe AI​ way to diagnose KubernetesK8sGPTThe AI​ way to diagnose Kubernetes
K8sGPTThe AI​ way to diagnose KubernetesKnoldus Inc.
 
Introduction to Circle Ci Presentation.pptx
Introduction to Circle Ci Presentation.pptxIntroduction to Circle Ci Presentation.pptx
Introduction to Circle Ci Presentation.pptxKnoldus Inc.
 
Robusta -Tool Presentation (DevOps).pptx
Robusta -Tool Presentation (DevOps).pptxRobusta -Tool Presentation (DevOps).pptx
Robusta -Tool Presentation (DevOps).pptxKnoldus Inc.
 
Optimizing Kubernetes using GOLDILOCKS.pptx
Optimizing Kubernetes using GOLDILOCKS.pptxOptimizing Kubernetes using GOLDILOCKS.pptx
Optimizing Kubernetes using GOLDILOCKS.pptxKnoldus Inc.
 
Azure Function App Exception Handling.pptx
Azure Function App Exception Handling.pptxAzure Function App Exception Handling.pptx
Azure Function App Exception Handling.pptxKnoldus Inc.
 
CQRS Design Pattern Presentation (Java).pptx
CQRS Design Pattern Presentation (Java).pptxCQRS Design Pattern Presentation (Java).pptx
CQRS Design Pattern Presentation (Java).pptxKnoldus Inc.
 
ETL Observability: Azure to Snowflake Presentation
ETL Observability: Azure to Snowflake PresentationETL Observability: Azure to Snowflake Presentation
ETL Observability: Azure to Snowflake PresentationKnoldus Inc.
 
Scripting with K6 - Beyond the Basics Presentation
Scripting with K6 - Beyond the Basics PresentationScripting with K6 - Beyond the Basics Presentation
Scripting with K6 - Beyond the Basics PresentationKnoldus Inc.
 
Getting started with dotnet core Web APIs
Getting started with dotnet core Web APIsGetting started with dotnet core Web APIs
Getting started with dotnet core Web APIsKnoldus Inc.
 
Introduction To Rust part II Presentation
Introduction To Rust part II PresentationIntroduction To Rust part II Presentation
Introduction To Rust part II PresentationKnoldus Inc.
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationKnoldus Inc.
 
Configuring Workflows & Validators in JIRA
Configuring Workflows & Validators in JIRAConfiguring Workflows & Validators in JIRA
Configuring Workflows & Validators in JIRAKnoldus Inc.
 
Advanced Python (with dependency injection and hydra configuration packages)
Advanced Python (with dependency injection and hydra configuration packages)Advanced Python (with dependency injection and hydra configuration packages)
Advanced Python (with dependency injection and hydra configuration packages)Knoldus Inc.
 

Más de Knoldus Inc. (20)

Mastering Web Scraping with JSoup Unlocking the Secrets of HTML Parsing
Mastering Web Scraping with JSoup Unlocking the Secrets of HTML ParsingMastering Web Scraping with JSoup Unlocking the Secrets of HTML Parsing
Mastering Web Scraping with JSoup Unlocking the Secrets of HTML Parsing
 
Akka gRPC Essentials A Hands-On Introduction
Akka gRPC Essentials A Hands-On IntroductionAkka gRPC Essentials A Hands-On Introduction
Akka gRPC Essentials A Hands-On Introduction
 
Entity Core with Core Microservices.pptx
Entity Core with Core Microservices.pptxEntity Core with Core Microservices.pptx
Entity Core with Core Microservices.pptx
 
Introduction to Redis and its features.pptx
Introduction to Redis and its features.pptxIntroduction to Redis and its features.pptx
Introduction to Redis and its features.pptx
 
GraphQL with .NET Core Microservices.pdf
GraphQL with .NET Core Microservices.pdfGraphQL with .NET Core Microservices.pdf
GraphQL with .NET Core Microservices.pdf
 
NuGet Packages Presentation (DoT NeT).pptx
NuGet Packages Presentation (DoT NeT).pptxNuGet Packages Presentation (DoT NeT).pptx
NuGet Packages Presentation (DoT NeT).pptx
 
Data Quality in Test Automation Navigating the Path to Reliable Testing
Data Quality in Test Automation Navigating the Path to Reliable TestingData Quality in Test Automation Navigating the Path to Reliable Testing
Data Quality in Test Automation Navigating the Path to Reliable Testing
 
K8sGPTThe AI​ way to diagnose Kubernetes
K8sGPTThe AI​ way to diagnose KubernetesK8sGPTThe AI​ way to diagnose Kubernetes
K8sGPTThe AI​ way to diagnose Kubernetes
 
Introduction to Circle Ci Presentation.pptx
Introduction to Circle Ci Presentation.pptxIntroduction to Circle Ci Presentation.pptx
Introduction to Circle Ci Presentation.pptx
 
Robusta -Tool Presentation (DevOps).pptx
Robusta -Tool Presentation (DevOps).pptxRobusta -Tool Presentation (DevOps).pptx
Robusta -Tool Presentation (DevOps).pptx
 
Optimizing Kubernetes using GOLDILOCKS.pptx
Optimizing Kubernetes using GOLDILOCKS.pptxOptimizing Kubernetes using GOLDILOCKS.pptx
Optimizing Kubernetes using GOLDILOCKS.pptx
 
Azure Function App Exception Handling.pptx
Azure Function App Exception Handling.pptxAzure Function App Exception Handling.pptx
Azure Function App Exception Handling.pptx
 
CQRS Design Pattern Presentation (Java).pptx
CQRS Design Pattern Presentation (Java).pptxCQRS Design Pattern Presentation (Java).pptx
CQRS Design Pattern Presentation (Java).pptx
 
ETL Observability: Azure to Snowflake Presentation
ETL Observability: Azure to Snowflake PresentationETL Observability: Azure to Snowflake Presentation
ETL Observability: Azure to Snowflake Presentation
 
Scripting with K6 - Beyond the Basics Presentation
Scripting with K6 - Beyond the Basics PresentationScripting with K6 - Beyond the Basics Presentation
Scripting with K6 - Beyond the Basics Presentation
 
Getting started with dotnet core Web APIs
Getting started with dotnet core Web APIsGetting started with dotnet core Web APIs
Getting started with dotnet core Web APIs
 
Introduction To Rust part II Presentation
Introduction To Rust part II PresentationIntroduction To Rust part II Presentation
Introduction To Rust part II Presentation
 
Data governance with Unity Catalog Presentation
Data governance with Unity Catalog PresentationData governance with Unity Catalog Presentation
Data governance with Unity Catalog Presentation
 
Configuring Workflows & Validators in JIRA
Configuring Workflows & Validators in JIRAConfiguring Workflows & Validators in JIRA
Configuring Workflows & Validators in JIRA
 
Advanced Python (with dependency injection and hydra configuration packages)
Advanced Python (with dependency injection and hydra configuration packages)Advanced Python (with dependency injection and hydra configuration packages)
Advanced Python (with dependency injection and hydra configuration packages)
 

Último

Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsJhone kinadey
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)OPEN KNOWLEDGE GmbH
 
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AISyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AIABDERRAOUF MEHENNI
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providermohitmore19
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerThousandEyes
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...stazi3110
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...ICS
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfkalichargn70th171
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionSolGuruz
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...OnePlan Solutions
 
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...soniya singh
 
why an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfwhy an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfjoe51371421
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...MyIntelliSource, Inc.
 
Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantAxelRicardoTrocheRiq
 
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataAdobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataBradBedford3
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsAlberto González Trastoy
 
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfThe Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfkalichargn70th171
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...harshavardhanraghave
 

Último (20)

Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial Goals
 
Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)Der Spagat zwischen BIAS und FAIRNESS (2024)
Der Spagat zwischen BIAS und FAIRNESS (2024)
 
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS LiveVip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
 
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AISyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
SyndBuddy AI 2k Review 2024: Revolutionizing Content Syndication with AI
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with Precision
 
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
Tech Tuesday-Harness the Power of Effective Resource Planning with OnePlan’s ...
 
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
Russian Call Girls in Karol Bagh Aasnvi ➡️ 8264348440 💋📞 Independent Escort S...
 
why an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdfwhy an Opensea Clone Script might be your perfect match.pdf
why an Opensea Clone Script might be your perfect match.pdf
 
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
 
Salesforce Certified Field Service Consultant
Salesforce Certified Field Service ConsultantSalesforce Certified Field Service Consultant
Salesforce Certified Field Service Consultant
 
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataAdobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
 
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdfThe Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
The Essentials of Digital Experience Monitoring_ A Comprehensive Guide.pdf
 
Exploring iOS App Development: Simplifying the Process
Exploring iOS App Development: Simplifying the ProcessExploring iOS App Development: Simplifying the Process
Exploring iOS App Development: Simplifying the Process
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
 

Kafka: Internals

  • 1. Presented By: Kafka: Internals Sourabh Verma Software Consultant
  • 2. Agenda ● Brief Introduction to Kafka - it’s need? ● Key Terminologies - Zookeeper, Broker, Topic, Partitions, Offsets, and Replication. ● IT Team and Kafka Cluster Analogy. ● Summarize the Core components and responsibilities. ● Kafka Tuning: Availability & Consistency. ● Delivery Semantics. ● Producer Tuning - Configs. (BONUS) ● DEMO
  • 3. Brief Introduction to Kafka - it’s need? ● Kafka is a horizontally scalable, fault tolerant, and fast messaging system. It’s a pub- sub model in which various producers and consumers can write and read. It decouples source and target systems. Few Use Cases: ● Scale to hundreds of nodes. ● Can handle millions of messages per second. ● Real-time processing (~10ms).
  • 4. Key Terminologies ● ZooKeeper is a centralized service for managing distributed systems. It acts as ensemble layer (ties things together) and ensures high availability of the Kafka cluster. ● ZooKeeper stores metadata and the current state of the Kafka cluster. For example, details like topic name, the number of partitions, replication, leader details of partition, and In-Sync Replicas are stored in ZooKeeper. (After Kafka 0.10, consumer offsets are not stored in ZooKeeper). ● Broker is a single Kafka node that is managed by ZooKeeper. A set of brokers form a Kafka cluster. Topics that are created in Kafka are distributed across brokers based on the partition, replication, and other factors. Note: When a broker node fails based on the state stored in ZooKeeper it automatically rebalances the cluster and if a leader partition is lost then one of the follower partition (ISR) is elected as the leader. ● Topic is a specific stream of data. It is very similar to a table in a NoSQL database. Like tables in a NoSQL database, the topic is split into partitions that enable topics to be distributed across various nodes. Like primary keys in tables, topics have offsets per partitions. You can uniquely identify a message using its topic, partition, and offset.
  • 5. Key Terminologies ● Partitions enable topics to be distributed across the cluster. Partitions are a unit of parallelism for horizontal scalability. One topic can have more than one partition scaling across nodes. Messages are assigned to partitions based on partition keys, if there are no partition keys then the partition is randomly assigned. It’s important to use the correct key to avoid hotspots. ● Offsets - Each message in a partition is assigned an incremental id called an offset. Offsets are unique per partition and messages are ordered only within a partition. Messages written to partitions are immutable. Note: Messages are not ordered between multiple partitions.
  • 6. Key Terminologies ● Replication is making a copy of a partition available in another broker. Replication enables Kafka to be fault tolerant. When a partition of the topic is available in multiple brokers then one of the partitions in a broker is elected as the leader and the rest of the replications of the partition are followers. ● Replication enables Kafka to be fault tolerant even when a broker is down, the partition from another broker is elected as a leader and it starts serving the producers and consumer groups. Replica partitions that are in sync with the leader are flagged as ISR (In Sync Replica).
  • 7. IT Team and Kafka Cluster Analogy.
  • 8. Summarize the Core components and responsibilities. ● ZooKeeper manages Kafka brokers and their metadata. ● Brokers are horizontally scalable Kafka nodes that contain topics and it's replications. ● Topics are message streams with one or more partitions. ● Partitions contains messages with unique offsets per partition. ● Replication enables Kafka to be fault tolerant using follower partitions (ISRs).
  • 9. Kafka Tuning: Availability & Consistency ● Cluster Size (N): Number of nodes/brokers in the Kafka cluster, we should have 2x+1, i.e. at least 3 nodes or more in an odd number. ● Partitions: Topic is divided into partitions (by default 1), but we should have M times N, where M can be any integer number, i.e. M >= 1, to achieve more parallelism and partitioning of data over the cluster. (only if order is not a concern.) ● Replication Factor: determines the number of copies (including the original/Leader) of each partition in the cluster. All replicas of a partition exist on separate node/broker, and we should never have R.F. > N, but at least 3. We recommend having 3 RF with 3 or 5 nodes cluster. This helps in having both availabilities as well as consistency. ● In-sync Replica (ISR): Number of minimum replicas (including the leader) synced up, i.e. available for the producer to successfully send messages to the partition. This inversely impacts the availability for producer i.e. lower the ISR more the availability and lesser the consistency and vice versa. we should always have ISR lower than RF. We recommend having 2 ISR for topics with RF as 3. Note: Setting ISR to 1 is almost equivalent to having no replication in a system.
  • 10. Kafka Tuning: Availability & Consistency ● Acknowledgment: message to be written into the number of replicas before it is acknowledged to the producer. a. Setting acks to 0 will make the system to send acknowledgment without writing the message which may lose the data, b. setting it to 1 means it should be written at least to the leader replica, c. and setting it to all means message should be written to all in-sync replica which helps in consistency but drops the availability. Note: Setting acks to 0 or 1 can lead to loss of data & inconsistent partitions, in case of leader failure, the next ISR replica might not be aware of the recent message which will cause inconsistency in order of events in replicas. ● Unclean Leader Election: in case of failure of all ISR, out-of-sync replica is elected as Leader, setting this to TRUE is not recommended at all, as it will lose the consistency of the system, this should be used only and only if we need the 100% availability irrespective of the consistency.
  • 11. Delivery Semantics ● Acks = 0: At most once delivery semantics. The producer uses “send and forget approach”. High data loss. ● Acks = 1: At least once delivery semantics. Moderate data loss and Duplicate. ● Acks = ALL: exactly once delivery semantics (acks to all min.insync.replica). No data loss.
  • 12. Producer Tuning - Configs ● Batch.size – batch size (messages) per request, producer will write to the partition leader when the batch will be full (even if the linger.ms is not completed.) ● Linger.ms – Time to wait before sending the current batch, if the time is elapsed even the batch is not full, producer will start sending the events to the Broker. ● Max.in.flight.requests.per.connection - The number of messages to be sent without any acknowledgment. Default is 5. Set this to 1 to avoid out of order message due to retry.
  • 13. References ● Kafka Documentation ● Kafka Technical Overview ● Kafka Tuning: Consistency vs Availability