SlideShare una empresa de Scribd logo
1 de 19
Introduction to
Kafka and Zookeeper
June Hadoop Meetup
Rahul Jain
@rahuldausa
Who am I?
 Software Engineer
 Member of Core technology @ IVY Comptech,
Hyderabad, India
 6 years of programming experience
 Areas of expertise/interest
 High traffic web applications
 JAVA/J2EE
 Big data, NoSQL
 Information-Retrieval, Machine learning
2
Agenda
• Overview
• Zookeeper
• Messaging System (Basic Concepts)
• Kafka
• Q&A
3
Apache Zookeeper TM
What is a Distributed System
“A Distributed system consists of multiple computers
that communicate and coordinate their actions by
passing messages. The components interact with each
other in order to achieve a common goal. ”
- Wikipedia
What is Zookeeper
• An Open source, High Performance coordination service
for distributed applications
• Centralized service for
– Configuration Management
– Locks and Synchronization for providing coordination
between distributed systems
– Naming service (Registry)
– Group Membership
• Features
– hierarchical namespace
– provides watcher on a znode
– allows to form a cluster of nodes
• Supports a large volume of request for data retrieval and
update
• http://zookeeper.apache.org/
6
Source : http://zookeeper.apache.org
Zookeeper Use cases
• Configuration Management
• Cluster member nodes Bootstrapping configuration from a
central source
• Distributed Cluster Management
• Node Join/Leave
• Node Status in real time
• Naming Service – e.g. DNS
• Distributed Synchronization – locks, barriers
• Leader election
• Centralized and Highly reliable Registry
Zookeeper Data Model
 Hierarchical Namespace
 Each node is called “znode”
 Each znode has data(stores data in
byte[] array) and can have children
 znode
– Maintains “Stat” structure with
version of data changes , ACL
changes and timestamp
– Version number increases with each
changes
Let’s recall basic concepts of
Messaging System
Point to Point Messaging
(Queue)
Credit: http://fusesource.com/docs/broker/5.3/getting_started/FuseMBStartedKeyJMS.html
Publish-Subscribe Messaging
(Topic)
Credit: http://fusesource.com/docs/broker/5.3/getting_started/FuseMBStartedKeyJMS.html
Apache Kafka
Overview
• An apache project initially developed at LinkedIn
• Distributed publish-subscribe messaging system
• Designed for processing of real time activity stream data e.g.
logs, metrics collections
• Written in Scala
• Does not follow JMS Standards, neither uses JMS APIs
• Features
– Persistent messaging
– High-throughput
– Supports both queue and topic semantics
– Uses Zookeeper for forming a cluster of nodes
(producer/consumer/broker)
and many more…
• http://kafka.apache.org/
13
How it works
Credit : http://kafka.apache.org/design.html
Real time transfer
15
Consumer3
(Group2)
Kafka
Broker
Consumer4
(Group2)
Producer
Zookeeper
Consumer2
(Group1)
Consumer1
(Group1)
Update Consumed
Message offset
Queue
Topology
Topic
Topology
Kafka
Broker
Design Elements
• Uses Filesystem Cache
• Zero-copy transfer of messages
• Batching of Messages
• Batch Compression
• Automatic Producer Load balancing.
• Broker does not Push messages to Consumer, Consumer
Polls messages from Broker.
Design Elements (Contd.)
• Cluster formation of Broker/Consumer using Zookeeper,
– So on the fly more consumer, broker can be introduced. The new
cluster rebalancing will be taken care by Zookeeper
• Data is persisted in broker
– But not removed on consumption (till retention period), so if one
consumer fails while consuming, same message can be re-consumed
again later from broker.
• Simplified storage mechanism for message,
– not for each message per consumer.
Performance Numbers
Credit : http://research.microsoft.com/en-us/UM/people/srikanth/netdb11/netdb11papers/netdb11-final12.pdf
Producer Performance Consumer Performance
Questions ?
@rahuldausa on twitter and slideshare
http://www.linkedin.com/in/rahuldausa

Más contenido relacionado

La actualidad más candente

La actualidad más candente (20)

Apache Kafka Introduction
Apache Kafka IntroductionApache Kafka Introduction
Apache Kafka Introduction
 
Fundamentals of Apache Kafka
Fundamentals of Apache KafkaFundamentals of Apache Kafka
Fundamentals of Apache Kafka
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache Kafka
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache Kafka
 
Apache Kafka Fundamentals for Architects, Admins and Developers
Apache Kafka Fundamentals for Architects, Admins and DevelopersApache Kafka Fundamentals for Architects, Admins and Developers
Apache Kafka Fundamentals for Architects, Admins and Developers
 
Stability Patterns for Microservices
Stability Patterns for MicroservicesStability Patterns for Microservices
Stability Patterns for Microservices
 
Uber: Kafka Consumer Proxy
Uber: Kafka Consumer ProxyUber: Kafka Consumer Proxy
Uber: Kafka Consumer Proxy
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 
Kafka presentation
Kafka presentationKafka presentation
Kafka presentation
 
Grafana.pptx
Grafana.pptxGrafana.pptx
Grafana.pptx
 
A Deep Dive into Kafka Controller
A Deep Dive into Kafka ControllerA Deep Dive into Kafka Controller
A Deep Dive into Kafka Controller
 
Kafka 101
Kafka 101Kafka 101
Kafka 101
 
Producer Performance Tuning for Apache Kafka
Producer Performance Tuning for Apache KafkaProducer Performance Tuning for Apache Kafka
Producer Performance Tuning for Apache Kafka
 
APACHE KAFKA / Kafka Connect / Kafka Streams
APACHE KAFKA / Kafka Connect / Kafka StreamsAPACHE KAFKA / Kafka Connect / Kafka Streams
APACHE KAFKA / Kafka Connect / Kafka Streams
 
State management in Structured Streaming
State management in Structured StreamingState management in Structured Streaming
State management in Structured Streaming
 
Kafka Tutorial: Advanced Producers
Kafka Tutorial: Advanced ProducersKafka Tutorial: Advanced Producers
Kafka Tutorial: Advanced Producers
 
Grafana introduction
Grafana introductionGrafana introduction
Grafana introduction
 
Apache Kafka
Apache KafkaApache Kafka
Apache Kafka
 
Kafka: Internals
Kafka: InternalsKafka: Internals
Kafka: Internals
 
Removing performance bottlenecks with Kafka Monitoring and topic configuration
Removing performance bottlenecks with Kafka Monitoring and topic configurationRemoving performance bottlenecks with Kafka Monitoring and topic configuration
Removing performance bottlenecks with Kafka Monitoring and topic configuration
 

Similar a Introduction to Kafka and Zookeeper

Accumulo Summit 2015: Real-Time Distributed and Reactive Systems with Apache ...
Accumulo Summit 2015: Real-Time Distributed and Reactive Systems with Apache ...Accumulo Summit 2015: Real-Time Distributed and Reactive Systems with Apache ...
Accumulo Summit 2015: Real-Time Distributed and Reactive Systems with Apache ...
Accumulo Summit
 
OpenNaaS Overview Complete
OpenNaaS Overview CompleteOpenNaaS Overview Complete
OpenNaaS Overview Complete
Joan Garcia
 
Architectures, Frameworks and Infrastructure
Architectures, Frameworks and InfrastructureArchitectures, Frameworks and Infrastructure
Architectures, Frameworks and Infrastructure
harendra_pathak
 
Big data conference europe real-time streaming in any and all clouds, hybri...
Big data conference europe   real-time streaming in any and all clouds, hybri...Big data conference europe   real-time streaming in any and all clouds, hybri...
Big data conference europe real-time streaming in any and all clouds, hybri...
Timothy Spann
 

Similar a Introduction to Kafka and Zookeeper (20)

Webinar: What's new in CDAP 3.5?
Webinar: What's new in CDAP 3.5?Webinar: What's new in CDAP 3.5?
Webinar: What's new in CDAP 3.5?
 
Real time Analytics with Apache Kafka and Apache Spark
Real time Analytics with Apache Kafka and Apache SparkReal time Analytics with Apache Kafka and Apache Spark
Real time Analytics with Apache Kafka and Apache Spark
 
Big Data Streams Architectures. Why? What? How?
Big Data Streams Architectures. Why? What? How?Big Data Streams Architectures. Why? What? How?
Big Data Streams Architectures. Why? What? How?
 
AI&BigData Lab 2016. Сарапин Виктор: Размер имеет значение: анализ по требова...
AI&BigData Lab 2016. Сарапин Виктор: Размер имеет значение: анализ по требова...AI&BigData Lab 2016. Сарапин Виктор: Размер имеет значение: анализ по требова...
AI&BigData Lab 2016. Сарапин Виктор: Размер имеет значение: анализ по требова...
 
Scenic City Summit (2021): Real-Time Streaming in any and all clouds, hybrid...
Scenic City Summit (2021):  Real-Time Streaming in any and all clouds, hybrid...Scenic City Summit (2021):  Real-Time Streaming in any and all clouds, hybrid...
Scenic City Summit (2021): Real-Time Streaming in any and all clouds, hybrid...
 
Apache phoenix
Apache phoenixApache phoenix
Apache phoenix
 
Real-Time Distributed and Reactive Systems with Apache Kafka and Apache Accumulo
Real-Time Distributed and Reactive Systems with Apache Kafka and Apache AccumuloReal-Time Distributed and Reactive Systems with Apache Kafka and Apache Accumulo
Real-Time Distributed and Reactive Systems with Apache Kafka and Apache Accumulo
 
Accumulo Summit 2015: Real-Time Distributed and Reactive Systems with Apache ...
Accumulo Summit 2015: Real-Time Distributed and Reactive Systems with Apache ...Accumulo Summit 2015: Real-Time Distributed and Reactive Systems with Apache ...
Accumulo Summit 2015: Real-Time Distributed and Reactive Systems with Apache ...
 
Real time web apps
Real time web appsReal time web apps
Real time web apps
 
OpenNaaS Overview Complete
OpenNaaS Overview CompleteOpenNaaS Overview Complete
OpenNaaS Overview Complete
 
Apereo OAE - Architectural overview
Apereo OAE - Architectural overviewApereo OAE - Architectural overview
Apereo OAE - Architectural overview
 
Event Driven Architectures with Apache Kafka
Event Driven Architectures with Apache KafkaEvent Driven Architectures with Apache Kafka
Event Driven Architectures with Apache Kafka
 
Architectures, Frameworks and Infrastructure
Architectures, Frameworks and InfrastructureArchitectures, Frameworks and Infrastructure
Architectures, Frameworks and Infrastructure
 
Stream your Operational Data with Apache Spark & Kafka into Hadoop using Couc...
Stream your Operational Data with Apache Spark & Kafka into Hadoop using Couc...Stream your Operational Data with Apache Spark & Kafka into Hadoop using Couc...
Stream your Operational Data with Apache Spark & Kafka into Hadoop using Couc...
 
Big data conference europe real-time streaming in any and all clouds, hybri...
Big data conference europe   real-time streaming in any and all clouds, hybri...Big data conference europe   real-time streaming in any and all clouds, hybri...
Big data conference europe real-time streaming in any and all clouds, hybri...
 
Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !
 
Streaming Processing with a Distributed Commit Log
Streaming Processing with a Distributed Commit LogStreaming Processing with a Distributed Commit Log
Streaming Processing with a Distributed Commit Log
 
Distributed messaging through Kafka
Distributed messaging through KafkaDistributed messaging through Kafka
Distributed messaging through Kafka
 
Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra
Real-Time Log Analysis with Apache Mesos, Kafka and CassandraReal-Time Log Analysis with Apache Mesos, Kafka and Cassandra
Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra
 
HPC and cloud distributed computing, as a journey
HPC and cloud distributed computing, as a journeyHPC and cloud distributed computing, as a journey
HPC and cloud distributed computing, as a journey
 

Más de Rahul Jain

Introduction to Apache Lucene/Solr
Introduction to Apache Lucene/SolrIntroduction to Apache Lucene/Solr
Introduction to Apache Lucene/Solr
Rahul Jain
 

Más de Rahul Jain (13)

Flipkart Strategy Analysis and Recommendation
Flipkart Strategy Analysis and RecommendationFlipkart Strategy Analysis and Recommendation
Flipkart Strategy Analysis and Recommendation
 
Emerging technologies /frameworks in Big Data
Emerging technologies /frameworks in Big DataEmerging technologies /frameworks in Big Data
Emerging technologies /frameworks in Big Data
 
Case study of Rujhaan.com (A social news app )
Case study of Rujhaan.com (A social news app )Case study of Rujhaan.com (A social news app )
Case study of Rujhaan.com (A social news app )
 
Building a Large Scale SEO/SEM Application with Apache Solr
Building a Large Scale SEO/SEM Application with Apache SolrBuilding a Large Scale SEO/SEM Application with Apache Solr
Building a Large Scale SEO/SEM Application with Apache Solr
 
Introduction to Apache Spark
Introduction to Apache SparkIntroduction to Apache Spark
Introduction to Apache Spark
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learning
 
Introduction to Scala
Introduction to ScalaIntroduction to Scala
Introduction to Scala
 
What is NoSQL and CAP Theorem
What is NoSQL and CAP TheoremWhat is NoSQL and CAP Theorem
What is NoSQL and CAP Theorem
 
Introduction to Elasticsearch with basics of Lucene
Introduction to Elasticsearch with basics of LuceneIntroduction to Elasticsearch with basics of Lucene
Introduction to Elasticsearch with basics of Lucene
 
Introduction to Apache Lucene/Solr
Introduction to Apache Lucene/SolrIntroduction to Apache Lucene/Solr
Introduction to Apache Lucene/Solr
 
Introduction to Lucene & Solr and Usecases
Introduction to Lucene & Solr and UsecasesIntroduction to Lucene & Solr and Usecases
Introduction to Lucene & Solr and Usecases
 
Hadoop & HDFS for Beginners
Hadoop & HDFS for BeginnersHadoop & HDFS for Beginners
Hadoop & HDFS for Beginners
 
Hibernate tutorial for beginners
Hibernate tutorial for beginnersHibernate tutorial for beginners
Hibernate tutorial for beginners
 

Último

+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 

Último (20)

Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
 
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
Apidays New York 2024 - Accelerating FinTech Innovation by Vasa Krishnan, Fin...
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
Apidays Singapore 2024 - Scalable LLM APIs for AI and Generative AI Applicati...
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin WoodPolkadot JAM Slides - Token2049 - By Dr. Gavin Wood
Polkadot JAM Slides - Token2049 - By Dr. Gavin Wood
 
A Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source MilvusA Beginners Guide to Building a RAG App Using Open Source Milvus
A Beginners Guide to Building a RAG App Using Open Source Milvus
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Ransomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdfRansomware_Q4_2023. The report. [EN].pdf
Ransomware_Q4_2023. The report. [EN].pdf
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
ICT role in 21st century education and its challenges
ICT role in 21st century education and its challengesICT role in 21st century education and its challenges
ICT role in 21st century education and its challenges
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 

Introduction to Kafka and Zookeeper

  • 1. Introduction to Kafka and Zookeeper June Hadoop Meetup Rahul Jain @rahuldausa
  • 2. Who am I?  Software Engineer  Member of Core technology @ IVY Comptech, Hyderabad, India  6 years of programming experience  Areas of expertise/interest  High traffic web applications  JAVA/J2EE  Big data, NoSQL  Information-Retrieval, Machine learning 2
  • 3. Agenda • Overview • Zookeeper • Messaging System (Basic Concepts) • Kafka • Q&A 3
  • 5. What is a Distributed System “A Distributed system consists of multiple computers that communicate and coordinate their actions by passing messages. The components interact with each other in order to achieve a common goal. ” - Wikipedia
  • 6. What is Zookeeper • An Open source, High Performance coordination service for distributed applications • Centralized service for – Configuration Management – Locks and Synchronization for providing coordination between distributed systems – Naming service (Registry) – Group Membership • Features – hierarchical namespace – provides watcher on a znode – allows to form a cluster of nodes • Supports a large volume of request for data retrieval and update • http://zookeeper.apache.org/ 6 Source : http://zookeeper.apache.org
  • 7. Zookeeper Use cases • Configuration Management • Cluster member nodes Bootstrapping configuration from a central source • Distributed Cluster Management • Node Join/Leave • Node Status in real time • Naming Service – e.g. DNS • Distributed Synchronization – locks, barriers • Leader election • Centralized and Highly reliable Registry
  • 8. Zookeeper Data Model  Hierarchical Namespace  Each node is called “znode”  Each znode has data(stores data in byte[] array) and can have children  znode – Maintains “Stat” structure with version of data changes , ACL changes and timestamp – Version number increases with each changes
  • 9. Let’s recall basic concepts of Messaging System
  • 10. Point to Point Messaging (Queue) Credit: http://fusesource.com/docs/broker/5.3/getting_started/FuseMBStartedKeyJMS.html
  • 13. Overview • An apache project initially developed at LinkedIn • Distributed publish-subscribe messaging system • Designed for processing of real time activity stream data e.g. logs, metrics collections • Written in Scala • Does not follow JMS Standards, neither uses JMS APIs • Features – Persistent messaging – High-throughput – Supports both queue and topic semantics – Uses Zookeeper for forming a cluster of nodes (producer/consumer/broker) and many more… • http://kafka.apache.org/ 13
  • 14. How it works Credit : http://kafka.apache.org/design.html
  • 16. Design Elements • Uses Filesystem Cache • Zero-copy transfer of messages • Batching of Messages • Batch Compression • Automatic Producer Load balancing. • Broker does not Push messages to Consumer, Consumer Polls messages from Broker.
  • 17. Design Elements (Contd.) • Cluster formation of Broker/Consumer using Zookeeper, – So on the fly more consumer, broker can be introduced. The new cluster rebalancing will be taken care by Zookeeper • Data is persisted in broker – But not removed on consumption (till retention period), so if one consumer fails while consuming, same message can be re-consumed again later from broker. • Simplified storage mechanism for message, – not for each message per consumer.
  • 18. Performance Numbers Credit : http://research.microsoft.com/en-us/UM/people/srikanth/netdb11/netdb11papers/netdb11-final12.pdf Producer Performance Consumer Performance
  • 19. Questions ? @rahuldausa on twitter and slideshare http://www.linkedin.com/in/rahuldausa