SlideShare a Scribd company logo
1 of 25
APACHE KAFKA
WHAT IS KAFKA?
• KAFKA is a distributed messaging system that was
originally developed at LinkedIn to serve as the
foundation for LinkedIn's activity stream and operational
data processing pipeline. It is now used at a variety of
different companies for various data pipeline and
messaging uses.
• LinkedIn developed Kafka for collecting and delivering
high volumes of log data with low latency for real-time
log processing.
• Scala and Zookeeper based.
WHAT IS KAFKA? CONTD..
• KAFKA can be used for both ONLINE(realtime) and
OFFLINE (integration with hadoop) analysis of log data.
• Distributed and scalable .
• High Throughput.
• More Priority to efficiency then fancy features.
• Simple API.
• Low Overhead.
APACHE KAFKA
=
+LOG AGGREGATOR MESSAGING SYSTEM
Where to use APACHE KAFKA?
• user activity events corresponding to logins, page-views,
clicks, “likes” ,sharing, comments, and search queries
• operational metrics (health monitoring)
• search relevance.
• Recommendations.
• Ad targeting and reporting.
• Protection against abnormal behavior.
• newsfeed
• Etc.
FLAWS in Traditional Messaging Systems.
• More focus on delivery guarantee.
• Less focus on throughput.
• Weak in distributed support.
• Performance degrades as Queue increases.
KAFKA DESIGN
PRINCIPLES
ARCHITECTURE
ARCHITECTURE contd..
• A stream of messages of particular type is defined by a topic.
• A topic is then divided into multiple partitions
• Each partition is a directory that has list of files and the files store
the message data.
• A producer can publish messages to a topic (push).
• Published messages are then stored on a set of servers called
brokers.
• Each broker stores one or more partitions
• Consumers can subscribe to any of the topics and can consume
messages by pulling data from the brokers.
• Supports both Point-to-point Delivery model and Publish/subscribe
Delivery model.
• Uses “Pull Based” system model.
WHY “PULL MODEL”?
• A “push-based” system has difficulty dealing with diverse
consumers as the broker controls the rate at which data
is transferred, (DOS attack = loss of data).
• In a Pull based system, a consumer simply falls behind
and catches up when it can.
• Consumer can rewind back to consume old messages.
DEPLOYMENT
Storage
• Simple Storage
• One each partition corresponds to a logical log.(a log ==
a list of files.)
• Physically a log is implemented using segmentation
where each segment is approximately of equal size.
• When a message is published , the broker simply
appends the message to the last segment file.
• Messages are flushed on to disk in a batch for better
performance.
• Messages is only exposed to consumers after they are
flushed.
• Messages are addressed by log offset. #lessoverhead.
Storage contd..
• Id of next message = length of current message + id of current
message.
Efficient Transfer
• Producer can submit multiple messages in a single send
request.( #End-to-endmessagebatching).
• Each pull request can consume multiple messages up to
a certain size.
• No caching of messages on the Kafta process layer.
Messages are only cached in the page cache, --“no
double buffering”. Hence very little overhead in garbage
collecting its memory.(#filesystemcaching)#lessoverhead
• Producer and Consumer access the data files
Sequentially. (disks are fast when accessed
sequentially).
Efficient Transfer :from local file to remote
socket
1. read data from the storage media to the page cache in an OS.
2. copy data in the page cache to an application buffer.
3. copy application buffer to another kernel buffer.
4. send the kernel buffer to the socket
• This process usually takes 4 data copying and one system call.
SENDFILE API
avoids 2 copies call and 1 system call
#zerocopytransfer
broker consumer
Stateless Broker
• Consumer maintains its own state hence reducing
complexity and overhead on the broker.
• Disadvantage is that broker doesn’t know whether all
subscriber’s have consumed the message , this problem
is solved by using a simple time based SLA for the
retention policy. A message is automatically deleted if it
has been
• A consumer can deliberately rewind back to an old offset
and re-consume data. #violateslogicofqueue.
Distributed Model
Producer 1 Producer 2
Broker 1 Broker 2 Broker 3
Consumer 1 Consumer 2
Zookeeper
Distributed Model contd..
• Consumer Group consists of one or more consumers.
• All messages from a partition are consumed by a single
consumer in a consumer group. #lessoverheadonbroker.
• No master node  less complexity and no master
failures to worry about.
• Zookeeper co-ordinates between the producers,
consumers and brokers.
Zookeeper functions.
• Detection of addition/ removal of brokers, producers, consumers.
• Triggering rebalance process in effect to above detection.
• Tracking.
Broker registry hostname, port and
set of topics of broker.
ephemeral
Consumer Registry consumer group Ephemeral
Owner registry consumer currently
consuming a partition
persistent
Offset registry stores the offset of last
consumed partition for
each subscribed
partition.
persistent
Delivery Guarantee
• At least once delivery. #cancauseduplication.#cost-
effective
• Messages from single partition in order.
• Messages from multiple partitions not necessarily in
order. #noguarantee
• CRC for each message in the logs #avoidlogcorruption
• If I/O error on broker then kafka removes messages with
inconsistent CRCs.
• If the storage system is completely damaged and
consumer have not consumed the message then the
message is lost forever. #futureplanstoaddreplication.
Recent Developments
• End-to-End Batch Level Compression.
• Improved Stream Processing Libraries.
• Hadoop Consumer.
• Hadoop Producer.
Apache Kafka @
References.
• http://incubator.apache.org/kafka/
• http://research.microsoft.com/en-us/um/people/srikan
• http://vimeo.com/27592622

More Related Content

What's hot

[WSO2Con EU 2017] WSO2 Unleashed: Full Stack Automation, Pitfalls and Solutions
[WSO2Con EU 2017] WSO2 Unleashed: Full Stack Automation, Pitfalls and Solutions[WSO2Con EU 2017] WSO2 Unleashed: Full Stack Automation, Pitfalls and Solutions
[WSO2Con EU 2017] WSO2 Unleashed: Full Stack Automation, Pitfalls and SolutionsWSO2
 
Azure Application insights - An Introduction
Azure Application insights - An IntroductionAzure Application insights - An Introduction
Azure Application insights - An IntroductionMatthias Güntert
 
Real User Monitoring (RUM)
Real User Monitoring (RUM)Real User Monitoring (RUM)
Real User Monitoring (RUM)Site24x7
 
Site24x7 Cloud Monitoring
Site24x7 Cloud MonitoringSite24x7 Cloud Monitoring
Site24x7 Cloud MonitoringSite24x7
 
[WSO2Con EU 2017] How a Large Organization Weighted on a WSO2 Integration Pla...
[WSO2Con EU 2017] How a Large Organization Weighted on a WSO2 Integration Pla...[WSO2Con EU 2017] How a Large Organization Weighted on a WSO2 Integration Pla...
[WSO2Con EU 2017] How a Large Organization Weighted on a WSO2 Integration Pla...WSO2
 
Modernizing Cloud and Hyperconverged Infrastructure monitoring
Modernizing Cloud and Hyperconverged Infrastructure monitoringModernizing Cloud and Hyperconverged Infrastructure monitoring
Modernizing Cloud and Hyperconverged Infrastructure monitoringManageEngine, Zoho Corporation
 
Cloud Bursting with A10 Lightning ADS
Cloud Bursting with A10 Lightning ADSCloud Bursting with A10 Lightning ADS
Cloud Bursting with A10 Lightning ADSAkshay Mathur
 
"What database can tell about application issues? What application can tell a...
"What database can tell about application issues? What application can tell a..."What database can tell about application issues? What application can tell a...
"What database can tell about application issues? What application can tell a...Fwdays
 
One Azure Monitor to Rule Them All? - Marius Zaharia
One Azure Monitor to Rule Them All? - Marius ZahariaOne Azure Monitor to Rule Them All? - Marius Zaharia
One Azure Monitor to Rule Them All? - Marius ZahariaITCamp
 
Azure API Manegement Introduction and Integeration with BizTalk
Azure API Manegement Introduction and Integeration with BizTalkAzure API Manegement Introduction and Integeration with BizTalk
Azure API Manegement Introduction and Integeration with BizTalkShailesh Dwivedi
 
Full Stack Reactive In Practice
Full Stack Reactive In PracticeFull Stack Reactive In Practice
Full Stack Reactive In PracticeLightbend
 
Nava SIEM Agent Datasheet
Nava SIEM Agent DatasheetNava SIEM Agent Datasheet
Nava SIEM Agent DatasheetLinkgard
 
Restful Asynchronous Notification
Restful Asynchronous NotificationRestful Asynchronous Notification
Restful Asynchronous NotificationMichael Koster
 
Cloud monitoring
Cloud monitoringCloud monitoring
Cloud monitoringGang Tao
 
Building Lightweight Microservices With Redis & Hydra
Building Lightweight Microservices With Redis & HydraBuilding Lightweight Microservices With Redis & Hydra
Building Lightweight Microservices With Redis & HydraRedis Labs
 
Flink Forward Berlin 2018: Yonatan Most & Avihai Berkovitz - "Anomaly Detecti...
Flink Forward Berlin 2018: Yonatan Most & Avihai Berkovitz - "Anomaly Detecti...Flink Forward Berlin 2018: Yonatan Most & Avihai Berkovitz - "Anomaly Detecti...
Flink Forward Berlin 2018: Yonatan Most & Avihai Berkovitz - "Anomaly Detecti...Flink Forward
 
Keynote : évolution et vision d'Elastic Observability
Keynote : évolution et vision d'Elastic ObservabilityKeynote : évolution et vision d'Elastic Observability
Keynote : évolution et vision d'Elastic ObservabilityElasticsearch
 

What's hot (20)

[WSO2Con EU 2017] WSO2 Unleashed: Full Stack Automation, Pitfalls and Solutions
[WSO2Con EU 2017] WSO2 Unleashed: Full Stack Automation, Pitfalls and Solutions[WSO2Con EU 2017] WSO2 Unleashed: Full Stack Automation, Pitfalls and Solutions
[WSO2Con EU 2017] WSO2 Unleashed: Full Stack Automation, Pitfalls and Solutions
 
Blr hadoop meetup
Blr hadoop meetupBlr hadoop meetup
Blr hadoop meetup
 
Monitor Cloud Resources using Alerts & Insights
Monitor Cloud Resources using Alerts & InsightsMonitor Cloud Resources using Alerts & Insights
Monitor Cloud Resources using Alerts & Insights
 
Azure Application insights - An Introduction
Azure Application insights - An IntroductionAzure Application insights - An Introduction
Azure Application insights - An Introduction
 
Real User Monitoring (RUM)
Real User Monitoring (RUM)Real User Monitoring (RUM)
Real User Monitoring (RUM)
 
Site24x7 Cloud Monitoring
Site24x7 Cloud MonitoringSite24x7 Cloud Monitoring
Site24x7 Cloud Monitoring
 
[WSO2Con EU 2017] How a Large Organization Weighted on a WSO2 Integration Pla...
[WSO2Con EU 2017] How a Large Organization Weighted on a WSO2 Integration Pla...[WSO2Con EU 2017] How a Large Organization Weighted on a WSO2 Integration Pla...
[WSO2Con EU 2017] How a Large Organization Weighted on a WSO2 Integration Pla...
 
Modernizing Cloud and Hyperconverged Infrastructure monitoring
Modernizing Cloud and Hyperconverged Infrastructure monitoringModernizing Cloud and Hyperconverged Infrastructure monitoring
Modernizing Cloud and Hyperconverged Infrastructure monitoring
 
Cloud Bursting with A10 Lightning ADS
Cloud Bursting with A10 Lightning ADSCloud Bursting with A10 Lightning ADS
Cloud Bursting with A10 Lightning ADS
 
"What database can tell about application issues? What application can tell a...
"What database can tell about application issues? What application can tell a..."What database can tell about application issues? What application can tell a...
"What database can tell about application issues? What application can tell a...
 
One Azure Monitor to Rule Them All? - Marius Zaharia
One Azure Monitor to Rule Them All? - Marius ZahariaOne Azure Monitor to Rule Them All? - Marius Zaharia
One Azure Monitor to Rule Them All? - Marius Zaharia
 
Azure API Manegement Introduction and Integeration with BizTalk
Azure API Manegement Introduction and Integeration with BizTalkAzure API Manegement Introduction and Integeration with BizTalk
Azure API Manegement Introduction and Integeration with BizTalk
 
Full Stack Reactive In Practice
Full Stack Reactive In PracticeFull Stack Reactive In Practice
Full Stack Reactive In Practice
 
implementing the right website monitoring strategy
 implementing the right website monitoring strategy implementing the right website monitoring strategy
implementing the right website monitoring strategy
 
Nava SIEM Agent Datasheet
Nava SIEM Agent DatasheetNava SIEM Agent Datasheet
Nava SIEM Agent Datasheet
 
Restful Asynchronous Notification
Restful Asynchronous NotificationRestful Asynchronous Notification
Restful Asynchronous Notification
 
Cloud monitoring
Cloud monitoringCloud monitoring
Cloud monitoring
 
Building Lightweight Microservices With Redis & Hydra
Building Lightweight Microservices With Redis & HydraBuilding Lightweight Microservices With Redis & Hydra
Building Lightweight Microservices With Redis & Hydra
 
Flink Forward Berlin 2018: Yonatan Most & Avihai Berkovitz - "Anomaly Detecti...
Flink Forward Berlin 2018: Yonatan Most & Avihai Berkovitz - "Anomaly Detecti...Flink Forward Berlin 2018: Yonatan Most & Avihai Berkovitz - "Anomaly Detecti...
Flink Forward Berlin 2018: Yonatan Most & Avihai Berkovitz - "Anomaly Detecti...
 
Keynote : évolution et vision d'Elastic Observability
Keynote : évolution et vision d'Elastic ObservabilityKeynote : évolution et vision d'Elastic Observability
Keynote : évolution et vision d'Elastic Observability
 

Viewers also liked

3450 - Writing and optimising applications for performance in a hybrid messag...
3450 - Writing and optimising applications for performance in a hybrid messag...3450 - Writing and optimising applications for performance in a hybrid messag...
3450 - Writing and optimising applications for performance in a hybrid messag...Timothy McCormick
 
3425 - Using publish/subscribe to integrate applications
3425 - Using publish/subscribe to integrate applications3425 - Using publish/subscribe to integrate applications
3425 - Using publish/subscribe to integrate applicationsTimothy McCormick
 
Introducing IBM Message Hub: Cloud-scale messaging based on Apache Kafka
Introducing IBM Message Hub: Cloud-scale messaging based on Apache KafkaIntroducing IBM Message Hub: Cloud-scale messaging based on Apache Kafka
Introducing IBM Message Hub: Cloud-scale messaging based on Apache KafkaAndrew Schofield
 
#1922 rest-push2 ap-im-v6
#1922 rest-push2 ap-im-v6#1922 rest-push2 ap-im-v6
#1922 rest-push2 ap-im-v6Jack Carnes
 
HHM-3540: The IBM MQ Light API: From Developer Laptop to Enterprise Data Cen...
 HHM-3540: The IBM MQ Light API: From Developer Laptop to Enterprise Data Cen... HHM-3540: The IBM MQ Light API: From Developer Laptop to Enterprise Data Cen...
HHM-3540: The IBM MQ Light API: From Developer Laptop to Enterprise Data Cen...Matt Leming
 
3429 How to transform your messaging environment to a secure messaging envi...
3429   How to transform your messaging environment to a secure messaging envi...3429   How to transform your messaging environment to a secure messaging envi...
3429 How to transform your messaging environment to a secure messaging envi...Robert Parker
 
WhatsNewIBMIntegrationBus10FP4
WhatsNewIBMIntegrationBus10FP4WhatsNewIBMIntegrationBus10FP4
WhatsNewIBMIntegrationBus10FP4bthomps1979
 
ConnectorsForIntegration
ConnectorsForIntegrationConnectorsForIntegration
ConnectorsForIntegrationbthomps1979
 
IBM Messaging Security - Why securing your environment is important : IBM Int...
IBM Messaging Security - Why securing your environment is important : IBM Int...IBM Messaging Security - Why securing your environment is important : IBM Int...
IBM Messaging Security - Why securing your environment is important : IBM Int...Leif Davidsen
 
Hia 1691-using iib-to_support_api_economy
Hia 1691-using iib-to_support_api_economyHia 1691-using iib-to_support_api_economy
Hia 1691-using iib-to_support_api_economyAndrew Coleman
 
Hia 1689-techinical introduction-to_iib
Hia 1689-techinical introduction-to_iibHia 1689-techinical introduction-to_iib
Hia 1689-techinical introduction-to_iibAndrew Coleman
 
Java zone 2015 How to make life with kafka easier.
Java zone 2015 How to make life with kafka easier.Java zone 2015 How to make life with kafka easier.
Java zone 2015 How to make life with kafka easier.Krzysztof Debski
 
Microservices: Where do they fit within a rapidly evolving integration archit...
Microservices: Where do they fit within a rapidly evolving integration archit...Microservices: Where do they fit within a rapidly evolving integration archit...
Microservices: Where do they fit within a rapidly evolving integration archit...Kim Clark
 

Viewers also liked (13)

3450 - Writing and optimising applications for performance in a hybrid messag...
3450 - Writing and optimising applications for performance in a hybrid messag...3450 - Writing and optimising applications for performance in a hybrid messag...
3450 - Writing and optimising applications for performance in a hybrid messag...
 
3425 - Using publish/subscribe to integrate applications
3425 - Using publish/subscribe to integrate applications3425 - Using publish/subscribe to integrate applications
3425 - Using publish/subscribe to integrate applications
 
Introducing IBM Message Hub: Cloud-scale messaging based on Apache Kafka
Introducing IBM Message Hub: Cloud-scale messaging based on Apache KafkaIntroducing IBM Message Hub: Cloud-scale messaging based on Apache Kafka
Introducing IBM Message Hub: Cloud-scale messaging based on Apache Kafka
 
#1922 rest-push2 ap-im-v6
#1922 rest-push2 ap-im-v6#1922 rest-push2 ap-im-v6
#1922 rest-push2 ap-im-v6
 
HHM-3540: The IBM MQ Light API: From Developer Laptop to Enterprise Data Cen...
 HHM-3540: The IBM MQ Light API: From Developer Laptop to Enterprise Data Cen... HHM-3540: The IBM MQ Light API: From Developer Laptop to Enterprise Data Cen...
HHM-3540: The IBM MQ Light API: From Developer Laptop to Enterprise Data Cen...
 
3429 How to transform your messaging environment to a secure messaging envi...
3429   How to transform your messaging environment to a secure messaging envi...3429   How to transform your messaging environment to a secure messaging envi...
3429 How to transform your messaging environment to a secure messaging envi...
 
WhatsNewIBMIntegrationBus10FP4
WhatsNewIBMIntegrationBus10FP4WhatsNewIBMIntegrationBus10FP4
WhatsNewIBMIntegrationBus10FP4
 
ConnectorsForIntegration
ConnectorsForIntegrationConnectorsForIntegration
ConnectorsForIntegration
 
IBM Messaging Security - Why securing your environment is important : IBM Int...
IBM Messaging Security - Why securing your environment is important : IBM Int...IBM Messaging Security - Why securing your environment is important : IBM Int...
IBM Messaging Security - Why securing your environment is important : IBM Int...
 
Hia 1691-using iib-to_support_api_economy
Hia 1691-using iib-to_support_api_economyHia 1691-using iib-to_support_api_economy
Hia 1691-using iib-to_support_api_economy
 
Hia 1689-techinical introduction-to_iib
Hia 1689-techinical introduction-to_iibHia 1689-techinical introduction-to_iib
Hia 1689-techinical introduction-to_iib
 
Java zone 2015 How to make life with kafka easier.
Java zone 2015 How to make life with kafka easier.Java zone 2015 How to make life with kafka easier.
Java zone 2015 How to make life with kafka easier.
 
Microservices: Where do they fit within a rapidly evolving integration archit...
Microservices: Where do they fit within a rapidly evolving integration archit...Microservices: Where do they fit within a rapidly evolving integration archit...
Microservices: Where do they fit within a rapidly evolving integration archit...
 

Similar to Apache kafka- Onkar Kadam

Distributed messaging with Apache Kafka
Distributed messaging with Apache KafkaDistributed messaging with Apache Kafka
Distributed messaging with Apache KafkaSaumitra Srivastav
 
Removing dependencies between services: Messaging and Apache Kafka
Removing dependencies between services: Messaging and Apache KafkaRemoving dependencies between services: Messaging and Apache Kafka
Removing dependencies between services: Messaging and Apache KafkaDaniel Muñoz Garrido
 
Building an Event Bus at Scale
Building an Event Bus at ScaleBuilding an Event Bus at Scale
Building an Event Bus at Scalejimriecken
 
apachekafka-160907180205.pdf
apachekafka-160907180205.pdfapachekafka-160907180205.pdf
apachekafka-160907180205.pdfTarekHamdi8
 
Apache Kafka Introduction
Apache Kafka IntroductionApache Kafka Introduction
Apache Kafka IntroductionAmita Mirajkar
 
kafka simplicity and complexity
kafka simplicity and complexitykafka simplicity and complexity
kafka simplicity and complexityPaolo Platter
 
Zookeeper Tutorial for beginners
Zookeeper Tutorial for beginnersZookeeper Tutorial for beginners
Zookeeper Tutorial for beginnersjeetendra mandal
 
Kafka Overview
Kafka OverviewKafka Overview
Kafka Overviewiamtodor
 
Introduction to Microservices
Introduction to MicroservicesIntroduction to Microservices
Introduction to MicroservicesMahmoudZidan41
 
Fundamentals and Architecture of Apache Kafka
Fundamentals and Architecture of Apache KafkaFundamentals and Architecture of Apache Kafka
Fundamentals and Architecture of Apache KafkaAngelo Cesaro
 
OnPrem Monitoring.pdf
OnPrem Monitoring.pdfOnPrem Monitoring.pdf
OnPrem Monitoring.pdfTarekHamdi8
 
Unleashing Real-time Power with Kafka.pptx
Unleashing Real-time Power with Kafka.pptxUnleashing Real-time Power with Kafka.pptx
Unleashing Real-time Power with Kafka.pptxKnoldus Inc.
 
AWS Study Group - Chapter 07 - Integrating Application Services [Solution Arc...
AWS Study Group - Chapter 07 - Integrating Application Services [Solution Arc...AWS Study Group - Chapter 07 - Integrating Application Services [Solution Arc...
AWS Study Group - Chapter 07 - Integrating Application Services [Solution Arc...QCloudMentor
 
Metrics are Not Enough: Monitoring Apache Kafka / Gwen Shapira (Confluent)
Metrics are Not Enough: Monitoring Apache Kafka / Gwen Shapira (Confluent)Metrics are Not Enough: Monitoring Apache Kafka / Gwen Shapira (Confluent)
Metrics are Not Enough: Monitoring Apache Kafka / Gwen Shapira (Confluent)Ontico
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache KafkaJeff Holoman
 
WSO2 Message Broker - Product Overview
WSO2 Message Broker - Product OverviewWSO2 Message Broker - Product Overview
WSO2 Message Broker - Product OverviewWSO2
 
Microservice message routing on Kubernetes
Microservice message routing on KubernetesMicroservice message routing on Kubernetes
Microservice message routing on KubernetesFrans van Buul
 

Similar to Apache kafka- Onkar Kadam (20)

Distributed messaging with Apache Kafka
Distributed messaging with Apache KafkaDistributed messaging with Apache Kafka
Distributed messaging with Apache Kafka
 
Removing dependencies between services: Messaging and Apache Kafka
Removing dependencies between services: Messaging and Apache KafkaRemoving dependencies between services: Messaging and Apache Kafka
Removing dependencies between services: Messaging and Apache Kafka
 
Building an Event Bus at Scale
Building an Event Bus at ScaleBuilding an Event Bus at Scale
Building an Event Bus at Scale
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 
Kafka tutorial
Kafka tutorialKafka tutorial
Kafka tutorial
 
apachekafka-160907180205.pdf
apachekafka-160907180205.pdfapachekafka-160907180205.pdf
apachekafka-160907180205.pdf
 
Apache Kafka Introduction
Apache Kafka IntroductionApache Kafka Introduction
Apache Kafka Introduction
 
kafka simplicity and complexity
kafka simplicity and complexitykafka simplicity and complexity
kafka simplicity and complexity
 
Zookeeper Tutorial for beginners
Zookeeper Tutorial for beginnersZookeeper Tutorial for beginners
Zookeeper Tutorial for beginners
 
Kafka Overview
Kafka OverviewKafka Overview
Kafka Overview
 
Introduction to Microservices
Introduction to MicroservicesIntroduction to Microservices
Introduction to Microservices
 
Fundamentals and Architecture of Apache Kafka
Fundamentals and Architecture of Apache KafkaFundamentals and Architecture of Apache Kafka
Fundamentals and Architecture of Apache Kafka
 
OnPrem Monitoring.pdf
OnPrem Monitoring.pdfOnPrem Monitoring.pdf
OnPrem Monitoring.pdf
 
Unleashing Real-time Power with Kafka.pptx
Unleashing Real-time Power with Kafka.pptxUnleashing Real-time Power with Kafka.pptx
Unleashing Real-time Power with Kafka.pptx
 
AWS Study Group - Chapter 07 - Integrating Application Services [Solution Arc...
AWS Study Group - Chapter 07 - Integrating Application Services [Solution Arc...AWS Study Group - Chapter 07 - Integrating Application Services [Solution Arc...
AWS Study Group - Chapter 07 - Integrating Application Services [Solution Arc...
 
Metrics are Not Enough: Monitoring Apache Kafka / Gwen Shapira (Confluent)
Metrics are Not Enough: Monitoring Apache Kafka / Gwen Shapira (Confluent)Metrics are Not Enough: Monitoring Apache Kafka / Gwen Shapira (Confluent)
Metrics are Not Enough: Monitoring Apache Kafka / Gwen Shapira (Confluent)
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache Kafka
 
WSO2 Message Broker - Product Overview
WSO2 Message Broker - Product OverviewWSO2 Message Broker - Product Overview
WSO2 Message Broker - Product Overview
 
Microservices deck
Microservices deckMicroservices deck
Microservices deck
 
Microservice message routing on Kubernetes
Microservice message routing on KubernetesMicroservice message routing on Kubernetes
Microservice message routing on Kubernetes
 

Recently uploaded

"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKJago de Vreede
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Angeliki Cooney
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistandanishmna97
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesrafiqahmad00786416
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfOverkill Security
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWERMadyBayot
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businesspanagenda
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...apidays
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...apidays
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Jeffrey Haguewood
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyKhushali Kathiriya
 

Recently uploaded (20)

"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUKSpring Boot vs Quarkus the ultimate battle - DevoxxUK
Spring Boot vs Quarkus the ultimate battle - DevoxxUK
 
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
Biography Of Angeliki Cooney | Senior Vice President Life Sciences | Albany, ...
 
CNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In PakistanCNIC Information System with Pakdata Cf In Pakistan
CNIC Information System with Pakdata Cf In Pakistan
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
 
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
Apidays New York 2024 - Passkeys: Developing APIs to enable passwordless auth...
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Artificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : UncertaintyArtificial Intelligence Chap.5 : Uncertainty
Artificial Intelligence Chap.5 : Uncertainty
 

Apache kafka- Onkar Kadam

  • 2. WHAT IS KAFKA? • KAFKA is a distributed messaging system that was originally developed at LinkedIn to serve as the foundation for LinkedIn's activity stream and operational data processing pipeline. It is now used at a variety of different companies for various data pipeline and messaging uses. • LinkedIn developed Kafka for collecting and delivering high volumes of log data with low latency for real-time log processing. • Scala and Zookeeper based.
  • 3. WHAT IS KAFKA? CONTD.. • KAFKA can be used for both ONLINE(realtime) and OFFLINE (integration with hadoop) analysis of log data. • Distributed and scalable . • High Throughput. • More Priority to efficiency then fancy features. • Simple API. • Low Overhead.
  • 5. Where to use APACHE KAFKA? • user activity events corresponding to logins, page-views, clicks, “likes” ,sharing, comments, and search queries • operational metrics (health monitoring) • search relevance. • Recommendations. • Ad targeting and reporting. • Protection against abnormal behavior. • newsfeed • Etc.
  • 6. FLAWS in Traditional Messaging Systems. • More focus on delivery guarantee. • Less focus on throughput. • Weak in distributed support. • Performance degrades as Queue increases.
  • 9. ARCHITECTURE contd.. • A stream of messages of particular type is defined by a topic. • A topic is then divided into multiple partitions • Each partition is a directory that has list of files and the files store the message data. • A producer can publish messages to a topic (push). • Published messages are then stored on a set of servers called brokers. • Each broker stores one or more partitions • Consumers can subscribe to any of the topics and can consume messages by pulling data from the brokers. • Supports both Point-to-point Delivery model and Publish/subscribe Delivery model. • Uses “Pull Based” system model.
  • 10. WHY “PULL MODEL”? • A “push-based” system has difficulty dealing with diverse consumers as the broker controls the rate at which data is transferred, (DOS attack = loss of data). • In a Pull based system, a consumer simply falls behind and catches up when it can. • Consumer can rewind back to consume old messages.
  • 12.
  • 13.
  • 14. Storage • Simple Storage • One each partition corresponds to a logical log.(a log == a list of files.) • Physically a log is implemented using segmentation where each segment is approximately of equal size. • When a message is published , the broker simply appends the message to the last segment file. • Messages are flushed on to disk in a batch for better performance. • Messages is only exposed to consumers after they are flushed. • Messages are addressed by log offset. #lessoverhead.
  • 15. Storage contd.. • Id of next message = length of current message + id of current message.
  • 16. Efficient Transfer • Producer can submit multiple messages in a single send request.( #End-to-endmessagebatching). • Each pull request can consume multiple messages up to a certain size. • No caching of messages on the Kafta process layer. Messages are only cached in the page cache, --“no double buffering”. Hence very little overhead in garbage collecting its memory.(#filesystemcaching)#lessoverhead • Producer and Consumer access the data files Sequentially. (disks are fast when accessed sequentially).
  • 17. Efficient Transfer :from local file to remote socket 1. read data from the storage media to the page cache in an OS. 2. copy data in the page cache to an application buffer. 3. copy application buffer to another kernel buffer. 4. send the kernel buffer to the socket • This process usually takes 4 data copying and one system call. SENDFILE API avoids 2 copies call and 1 system call #zerocopytransfer broker consumer
  • 18. Stateless Broker • Consumer maintains its own state hence reducing complexity and overhead on the broker. • Disadvantage is that broker doesn’t know whether all subscriber’s have consumed the message , this problem is solved by using a simple time based SLA for the retention policy. A message is automatically deleted if it has been • A consumer can deliberately rewind back to an old offset and re-consume data. #violateslogicofqueue.
  • 19. Distributed Model Producer 1 Producer 2 Broker 1 Broker 2 Broker 3 Consumer 1 Consumer 2 Zookeeper
  • 20. Distributed Model contd.. • Consumer Group consists of one or more consumers. • All messages from a partition are consumed by a single consumer in a consumer group. #lessoverheadonbroker. • No master node  less complexity and no master failures to worry about. • Zookeeper co-ordinates between the producers, consumers and brokers.
  • 21. Zookeeper functions. • Detection of addition/ removal of brokers, producers, consumers. • Triggering rebalance process in effect to above detection. • Tracking. Broker registry hostname, port and set of topics of broker. ephemeral Consumer Registry consumer group Ephemeral Owner registry consumer currently consuming a partition persistent Offset registry stores the offset of last consumed partition for each subscribed partition. persistent
  • 22. Delivery Guarantee • At least once delivery. #cancauseduplication.#cost- effective • Messages from single partition in order. • Messages from multiple partitions not necessarily in order. #noguarantee • CRC for each message in the logs #avoidlogcorruption • If I/O error on broker then kafka removes messages with inconsistent CRCs. • If the storage system is completely damaged and consumer have not consumed the message then the message is lost forever. #futureplanstoaddreplication.
  • 23. Recent Developments • End-to-End Batch Level Compression. • Improved Stream Processing Libraries. • Hadoop Consumer. • Hadoop Producer.