SlideShare a Scribd company logo
1 of 33
Download to read offline
Data Pipeline with Kafka by Peerapat A. is licensed under a Creative Commons Attribution 4.0 International License.
Data Pipeline
with Kafka
Peerapat Asoktummarungsri
AGODA
Data Pipeline with Kafka by Peerapat A. is licensed under a Creative Commons Attribution 4.0 International License.
Senior Software
Engineer Agoda.com
Contributor Thai Java
User Group (THJUG.com)
Contributor Agile66
Data Pipeline with Kafka by Peerapat A. is licensed under a Creative Commons Attribution 4.0 International License.
AGENDA
Big Data & Data Pipeline
Kafka Introduction
Quick Start
Monitoring
Data Pipeline for Search API
Hadoop integration with Camus
Data Pipeline with Kafka by Peerapat A. is licensed under a Creative Commons Attribution 4.0 International License.
Hadoop
+

HDFS
Information
Big Data
MapReduce
Data Pipeline with Kafka by Peerapat A. is licensed under a Creative Commons Attribution 4.0 International License.
Pipeline
hadoopWebsite
log
Data Pipeline with Kafka by Peerapat A. is licensed under a Creative Commons Attribution 4.0 International License.
hadoopWebsite
Mobile
Growth
log
Data Pipeline with Kafka by Peerapat A. is licensed under a Creative Commons Attribution 4.0 International License.
hadoopWebsite
Mobile
realtime
monitoring
Complex
log
message
Data Pipeline with Kafka by Peerapat A. is licensed under a Creative Commons Attribution 4.0 International License.
New
New
hadoopWebsite
Mobile
realtime
monitoring
Data
Warehouse
API
Features becomes the problem
NEW
Data Pipeline with Kafka by Peerapat A. is licensed under a Creative Commons Attribution 4.0 International License.
hadoop
Website
Mobile
realtime
monitoring
API
Data Pipeline
Produce
Consume
Data Pipeline
Warehouse
Data Pipeline with Kafka by Peerapat A. is licensed under a Creative Commons Attribution 4.0 International License.
compare
Topic
Queue Consumer
Consumer
Consumer
Consumer
Consumer
Consumer
1
2
3
1
1
1
Data Pipeline with Kafka by Peerapat A. is licensed under a Creative Commons Attribution 4.0 International License.
General Topic Implement
Topic
Consumer 1
Consumer 2
Consumer 3
2
2
This consumer will lose a message.
Data Pipeline with Kafka by Peerapat A. is licensed under a Creative Commons Attribution 4.0 International License.
Distributed by Design
Fast
Scalable - It can be elastically and transparently
expanded without downtime.
Durable - Messages are persisted on disk and
replicated within the cluster to prevent data loss.
Data Pipeline with Kafka by Peerapat A. is licensed under a Creative Commons Attribution 4.0 International License.
Topic Consumer 1
Consumer 2
Consumer 3
msg
gid = Group ID
msg
msg
1
2
3
4
7
6 5
gid = hadoop
Data Pipeline with Kafka by Peerapat A. is licensed under a Creative Commons Attribution 4.0 International License.
Topic hadoop
gid = hadoop
realtime
monitoring
data
warehouse
msg
gid = Group ID
msg
msg
12
gid = rtmon
gid = warehouse
3
123
123
Data Pipeline with Kafka by Peerapat A. is licensed under a Creative Commons Attribution 4.0 International License.
Topic hadoop
gid = hadoop
realtime
monitoring
data
warehouse
msg
gid = Group ID
msg 9
gid = rtmon
gid = warehouse
9
9
New
Consumer
1
2
3
gid = newconsumer
Data Pipeline with Kafka by Peerapat A. is licensed under a Creative Commons Attribution 4.0 International License.
Data Pipeline with Kafka by Peerapat A. is licensed under a Creative Commons Attribution 4.0 International License.
Vagrant
Install Vagrant
Install Virtual Box
Clone https://github.com/stealthly/scala-kafka
vagrant up
Data Pipeline with Kafka by Peerapat A. is licensed under a Creative Commons Attribution 4.0 International License.
BREW
brew update
brew install zookeeper kafka -y
Data Pipeline with Kafka by Peerapat A. is licensed under a Creative Commons Attribution 4.0 International License.
Some Kafka Config
# The id of the broker. This must be set to a unique integer for each broker.
broker.id=0
# The port the socket server listens on
port=9092
# Zookeeper connection string (see zookeeper docs for details).
zookeeper.connect=localhost:2181
# Timeout in ms for connecting to zookeeper
zookeeper.connection.timeout.ms=6000
# The minimum age of a log file to be eligible for deletion
log.retention.hours=168
Data Pipeline with Kafka by Peerapat A. is licensed under a Creative Commons Attribution 4.0 International License.
Kafka @ Linkedin (2013)
10 billion message writes per day
55 billion messages delivered to real-time consumers
367 topics that cover both user activity topics and
operational data
the largest of which adds an average of 92GB per day of
batch-compressed messages
Messages are kept for 7 days, and these average at
about 9.5 TB of compressed messages across all topics.
Data Pipeline with Kafka by Peerapat A. is licensed under a Creative Commons Attribution 4.0 International License.
KafkaOffsetMonitor
java -cp KafkaOffsetMonitor-assembly-0.2.1.jar 
com.quantifind.kafka.offsetapp.OffsetGetterWeb 
--zk localhost 
--port 8080 
--refresh 10.seconds 
--retain 2.days
Download KafkaOffsetMonitor from Github
https://github.com/quantifind/KafkaOffsetMonitor
1 Jar file, KafkaOffsetMonitor-assembly-0.2.1.jar
Data Pipeline with Kafka by Peerapat A. is licensed under a Creative Commons Attribution 4.0 International License.
Data Pipeline with Kafka by Peerapat A. is licensed under a Creative Commons Attribution 4.0 International License.
Data Pipeline with Kafka by Peerapat A. is licensed under a Creative Commons Attribution 4.0 International License.
Data Pipeline with Kafka by Peerapat A. is licensed under a Creative Commons Attribution 4.0 International License.
Data Pipeline with Kafka by Peerapat A. is licensed under a Creative Commons Attribution 4.0 International License.
Data Pipeline with Kafka by Peerapat A. is licensed under a Creative Commons Attribution 4.0 International License.
Data Pipeline with Kafka by Peerapat A. is licensed under a Creative Commons Attribution 4.0 International License.
CHANGE
Produce Change
Price & Inventory
Consumer
Cassandra
Search API
Calculate Price
HTTP
KafkaAPI
Hotel Manager
Hotels
Data Pipeline with Kafka by Peerapat A. is licensed under a Creative Commons Attribution 4.0 International License.
CHANGE
KafkaAPI
Hotel Manager
Hotels
B Consumer
A Consumer
Price & Inventory
Consumer
Data Pipeline with Kafka by Peerapat A. is licensed under a Creative Commons Attribution 4.0 International License.
Camus
Data Pipeline with Kafka by Peerapat A. is licensed under a Creative Commons Attribution 4.0 International License.
http://www.slideshare.net/nuboat
https://github.com/nuboat/akkakafkaexam
Slide available here
Sourcecode available here
Data Pipeline with Kafka by Peerapat A. is licensed under a Creative Commons Attribution 4.0 International License.
REFERENCES
http://www.slideshare.net/charmalloc/
developingwithapachekafka-29910685
http://www.infoq.com/articles/apache-kafka
http://kafka.apache.org/
https://github.com/stealthly/scala-kafka
https://github.com/quantifind/KafkaOffsetMonitor
Data Pipeline with Kafka by Peerapat A. is licensed under a Creative Commons Attribution 4.0 International License.
Q & A

More Related Content

What's hot

Jim Dowling – Interactive Flink analytics with HopsWorks and Zeppelin
Jim Dowling – Interactive Flink analytics with HopsWorks and ZeppelinJim Dowling – Interactive Flink analytics with HopsWorks and Zeppelin
Jim Dowling – Interactive Flink analytics with HopsWorks and Zeppelin
Flink Forward
 

What's hot (20)

Bullet: A Real Time Data Query Engine
Bullet: A Real Time Data Query EngineBullet: A Real Time Data Query Engine
Bullet: A Real Time Data Query Engine
 
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streamin...
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streamin...Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streamin...
Bravo Six, Going Realtime. Transitioning Activision Data Pipeline to Streamin...
 
Kafka connect-london-meetup-2016
Kafka connect-london-meetup-2016Kafka connect-london-meetup-2016
Kafka connect-london-meetup-2016
 
Building Realtim Data Pipelines with Kafka Connect and Spark Streaming
Building Realtim Data Pipelines with Kafka Connect and Spark StreamingBuilding Realtim Data Pipelines with Kafka Connect and Spark Streaming
Building Realtim Data Pipelines with Kafka Connect and Spark Streaming
 
Kafka Summit SF 2017 - Real-Time Document Rankings with Kafka Streams
Kafka Summit SF 2017 - Real-Time Document Rankings with Kafka StreamsKafka Summit SF 2017 - Real-Time Document Rankings with Kafka Streams
Kafka Summit SF 2017 - Real-Time Document Rankings with Kafka Streams
 
Jim Dowling – Interactive Flink analytics with HopsWorks and Zeppelin
Jim Dowling – Interactive Flink analytics with HopsWorks and ZeppelinJim Dowling – Interactive Flink analytics with HopsWorks and Zeppelin
Jim Dowling – Interactive Flink analytics with HopsWorks and Zeppelin
 
Kafka Summit SF 2017 - Query the Application, Not a Database: “Interactive Qu...
Kafka Summit SF 2017 - Query the Application, Not a Database: “Interactive Qu...Kafka Summit SF 2017 - Query the Application, Not a Database: “Interactive Qu...
Kafka Summit SF 2017 - Query the Application, Not a Database: “Interactive Qu...
 
Bridging the gap of Relational to Hadoop using Sqoop @ Expedia
Bridging the gap of Relational to Hadoop using Sqoop @ ExpediaBridging the gap of Relational to Hadoop using Sqoop @ Expedia
Bridging the gap of Relational to Hadoop using Sqoop @ Expedia
 
Faster, Faster, Faster: The True Story of a Mobile Analytics Data Mart on Hive
Faster, Faster, Faster: The True Story of a Mobile Analytics Data Mart on HiveFaster, Faster, Faster: The True Story of a Mobile Analytics Data Mart on Hive
Faster, Faster, Faster: The True Story of a Mobile Analytics Data Mart on Hive
 
How Tencent Applies Apache Pulsar to Apache InLong - Pulsar Summit Asia 2021
How Tencent Applies Apache Pulsar to Apache InLong - Pulsar Summit Asia 2021How Tencent Applies Apache Pulsar to Apache InLong - Pulsar Summit Asia 2021
How Tencent Applies Apache Pulsar to Apache InLong - Pulsar Summit Asia 2021
 
fluentd -- the missing log collector
fluentd -- the missing log collectorfluentd -- the missing log collector
fluentd -- the missing log collector
 
Lambda Architecture with Spark
Lambda Architecture with SparkLambda Architecture with Spark
Lambda Architecture with Spark
 
Developing Real-Time Data Pipelines with Apache Kafka
Developing Real-Time Data Pipelines with Apache KafkaDeveloping Real-Time Data Pipelines with Apache Kafka
Developing Real-Time Data Pipelines with Apache Kafka
 
Hadoop made fast - Why Virtual Reality Needed Stream Processing to Survive
Hadoop made fast - Why Virtual Reality Needed Stream Processing to SurviveHadoop made fast - Why Virtual Reality Needed Stream Processing to Survive
Hadoop made fast - Why Virtual Reality Needed Stream Processing to Survive
 
Uber Real Time Data Analytics
Uber Real Time Data AnalyticsUber Real Time Data Analytics
Uber Real Time Data Analytics
 
Stream, Stream, Stream: Different Streaming Methods with Spark and Kafka
Stream, Stream, Stream: Different Streaming Methods with Spark and KafkaStream, Stream, Stream: Different Streaming Methods with Spark and Kafka
Stream, Stream, Stream: Different Streaming Methods with Spark and Kafka
 
Lambda architecture on Spark, Kafka for real-time large scale ML
Lambda architecture on Spark, Kafka for real-time large scale MLLambda architecture on Spark, Kafka for real-time large scale ML
Lambda architecture on Spark, Kafka for real-time large scale ML
 
Unified, Efficient, and Portable Data Processing with Apache Beam
Unified, Efficient, and Portable Data Processing with Apache BeamUnified, Efficient, and Portable Data Processing with Apache Beam
Unified, Efficient, and Portable Data Processing with Apache Beam
 
How Apache Kafka is transforming Hadoop, Spark and Storm
How Apache Kafka is transforming Hadoop, Spark and StormHow Apache Kafka is transforming Hadoop, Spark and Storm
How Apache Kafka is transforming Hadoop, Spark and Storm
 
Volta: Logging, Metrics, and Monitoring as a Service
Volta: Logging, Metrics, and Monitoring as a ServiceVolta: Logging, Metrics, and Monitoring as a Service
Volta: Logging, Metrics, and Monitoring as a Service
 

Viewers also liked

Hadoop Streaming Tutorial With Python
Hadoop Streaming Tutorial With PythonHadoop Streaming Tutorial With Python
Hadoop Streaming Tutorial With Python
Joe Stein
 
Apache Kafka
Apache KafkaApache Kafka
Apache Kafka
Joe Stein
 

Viewers also liked (20)

jstein.cassandra.nyc.2011
jstein.cassandra.nyc.2011jstein.cassandra.nyc.2011
jstein.cassandra.nyc.2011
 
Storing Time Series Metrics With Cassandra and Composite Columns
Storing Time Series Metrics With Cassandra and Composite ColumnsStoring Time Series Metrics With Cassandra and Composite Columns
Storing Time Series Metrics With Cassandra and Composite Columns
 
Making Distributed Data Persistent Services Elastic (Without Losing All Your ...
Making Distributed Data Persistent Services Elastic (Without Losing All Your ...Making Distributed Data Persistent Services Elastic (Without Losing All Your ...
Making Distributed Data Persistent Services Elastic (Without Losing All Your ...
 
Developing Realtime Data Pipelines With Apache Kafka
Developing Realtime Data Pipelines With Apache KafkaDeveloping Realtime Data Pipelines With Apache Kafka
Developing Realtime Data Pipelines With Apache Kafka
 
Containerized Data Persistence on Mesos
Containerized Data Persistence on MesosContainerized Data Persistence on Mesos
Containerized Data Persistence on Mesos
 
Apache Cassandra 2.0
Apache Cassandra 2.0Apache Cassandra 2.0
Apache Cassandra 2.0
 
Developing Real-Time Data Pipelines with Apache Kafka
Developing Real-Time Data Pipelines with Apache KafkaDeveloping Real-Time Data Pipelines with Apache Kafka
Developing Real-Time Data Pipelines with Apache Kafka
 
Making Apache Kafka Elastic with Apache Mesos
Making Apache Kafka Elastic with Apache MesosMaking Apache Kafka Elastic with Apache Mesos
Making Apache Kafka Elastic with Apache Mesos
 
Introduction Apache Kafka
Introduction Apache KafkaIntroduction Apache Kafka
Introduction Apache Kafka
 
Developing Frameworks for Apache Mesos
Developing Frameworks  for Apache MesosDeveloping Frameworks  for Apache Mesos
Developing Frameworks for Apache Mesos
 
Developing with the Go client for Apache Kafka
Developing with the Go client for Apache KafkaDeveloping with the Go client for Apache Kafka
Developing with the Go client for Apache Kafka
 
Introduction To Apache Mesos
Introduction To Apache MesosIntroduction To Apache Mesos
Introduction To Apache Mesos
 
Current and Future of Apache Kafka
Current and Future of Apache KafkaCurrent and Future of Apache Kafka
Current and Future of Apache Kafka
 
Apache Kafka, HDFS, Accumulo and more on Mesos
Apache Kafka, HDFS, Accumulo and more on MesosApache Kafka, HDFS, Accumulo and more on Mesos
Apache Kafka, HDFS, Accumulo and more on Mesos
 
SMACK Stack 1.1
SMACK Stack 1.1SMACK Stack 1.1
SMACK Stack 1.1
 
Hadoop Streaming Tutorial With Python
Hadoop Streaming Tutorial With PythonHadoop Streaming Tutorial With Python
Hadoop Streaming Tutorial With Python
 
Streaming Processing with a Distributed Commit Log
Streaming Processing with a Distributed Commit LogStreaming Processing with a Distributed Commit Log
Streaming Processing with a Distributed Commit Log
 
Real-Time Distributed and Reactive Systems with Apache Kafka and Apache Accumulo
Real-Time Distributed and Reactive Systems with Apache Kafka and Apache AccumuloReal-Time Distributed and Reactive Systems with Apache Kafka and Apache Accumulo
Real-Time Distributed and Reactive Systems with Apache Kafka and Apache Accumulo
 
Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra
Real-Time Log Analysis with Apache Mesos, Kafka and CassandraReal-Time Log Analysis with Apache Mesos, Kafka and Cassandra
Real-Time Log Analysis with Apache Mesos, Kafka and Cassandra
 
Apache Kafka
Apache KafkaApache Kafka
Apache Kafka
 

Similar to Data Pipeline with Kafka

TechDays 2010 Portugal - Scaling your data tier with app fabric 16x9
TechDays 2010 Portugal - Scaling your data tier with app fabric 16x9TechDays 2010 Portugal - Scaling your data tier with app fabric 16x9
TechDays 2010 Portugal - Scaling your data tier with app fabric 16x9
Nuno Godinho
 
OSACon 2023_ Unlocking Financial Data with Real-Time Pipelines
OSACon 2023_ Unlocking Financial Data with Real-Time PipelinesOSACon 2023_ Unlocking Financial Data with Real-Time Pipelines
OSACon 2023_ Unlocking Financial Data with Real-Time Pipelines
Timothy Spann
 
26Oct2023_Adding Generative AI to Real-Time Streaming Pipelines_ NYC Meetup
26Oct2023_Adding Generative AI to Real-Time Streaming Pipelines_ NYC Meetup26Oct2023_Adding Generative AI to Real-Time Streaming Pipelines_ NYC Meetup
26Oct2023_Adding Generative AI to Real-Time Streaming Pipelines_ NYC Meetup
Timothy Spann
 
June 2004 IPv6 – Hands on
June 2004 IPv6 – Hands on June 2004 IPv6 – Hands on
June 2004 IPv6 – Hands on
Videoguy
 
Kafka Lag Monitoring For Human Beings (Elad Leev, AppsFlyer) Kafka Summit 2020
Kafka Lag Monitoring For Human Beings (Elad Leev, AppsFlyer) Kafka Summit 2020Kafka Lag Monitoring For Human Beings (Elad Leev, AppsFlyer) Kafka Summit 2020
Kafka Lag Monitoring For Human Beings (Elad Leev, AppsFlyer) Kafka Summit 2020
HostedbyConfluent
 
Fast Streaming into Clickhouse with Apache Pulsar
Fast Streaming into Clickhouse with Apache PulsarFast Streaming into Clickhouse with Apache Pulsar
Fast Streaming into Clickhouse with Apache Pulsar
Timothy Spann
 
Budapest Data/ML - Building Modern Data Streaming Apps with NiFi, Flink and K...
Budapest Data/ML - Building Modern Data Streaming Apps with NiFi, Flink and K...Budapest Data/ML - Building Modern Data Streaming Apps with NiFi, Flink and K...
Budapest Data/ML - Building Modern Data Streaming Apps with NiFi, Flink and K...
Timothy Spann
 

Similar to Data Pipeline with Kafka (20)

PortoTechHub - Hail Hydrate! From Stream to Lake with Apache Pulsar and Friends
PortoTechHub  - Hail Hydrate! From Stream to Lake with Apache Pulsar and FriendsPortoTechHub  - Hail Hydrate! From Stream to Lake with Apache Pulsar and Friends
PortoTechHub - Hail Hydrate! From Stream to Lake with Apache Pulsar and Friends
 
Extending the Yahoo Streaming Benchmark
Extending the Yahoo Streaming BenchmarkExtending the Yahoo Streaming Benchmark
Extending the Yahoo Streaming Benchmark
 
[Spark Summit EU 2017] Apache spark streaming + kafka 0.10 an integration story
[Spark Summit EU 2017] Apache spark streaming + kafka 0.10  an integration story[Spark Summit EU 2017] Apache spark streaming + kafka 0.10  an integration story
[Spark Summit EU 2017] Apache spark streaming + kafka 0.10 an integration story
 
Apache Spark Streaming + Kafka 0.10 with Joan Viladrosariera
Apache Spark Streaming + Kafka 0.10 with Joan ViladrosarieraApache Spark Streaming + Kafka 0.10 with Joan Viladrosariera
Apache Spark Streaming + Kafka 0.10 with Joan Viladrosariera
 
NY Open Source Data Meetup Feb 8 2024 Building Real-time Pipelines with FLaNK...
NY Open Source Data Meetup Feb 8 2024 Building Real-time Pipelines with FLaNK...NY Open Source Data Meetup Feb 8 2024 Building Real-time Pipelines with FLaNK...
NY Open Source Data Meetup Feb 8 2024 Building Real-time Pipelines with FLaNK...
 
Conf42-Python-Building Apache NiFi 2.0 Python Processors
Conf42-Python-Building Apache NiFi 2.0 Python ProcessorsConf42-Python-Building Apache NiFi 2.0 Python Processors
Conf42-Python-Building Apache NiFi 2.0 Python Processors
 
GCP for Apache Kafka® Users: Stream Ingestion and Processing
GCP for Apache Kafka® Users: Stream Ingestion and ProcessingGCP for Apache Kafka® Users: Stream Ingestion and Processing
GCP for Apache Kafka® Users: Stream Ingestion and Processing
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 
TechDays 2010 Portugal - Scaling your data tier with app fabric 16x9
TechDays 2010 Portugal - Scaling your data tier with app fabric 16x9TechDays 2010 Portugal - Scaling your data tier with app fabric 16x9
TechDays 2010 Portugal - Scaling your data tier with app fabric 16x9
 
How a distributed graph analytics platform uses Apache Kafka for data ingesti...
How a distributed graph analytics platform uses Apache Kafka for data ingesti...How a distributed graph analytics platform uses Apache Kafka for data ingesti...
How a distributed graph analytics platform uses Apache Kafka for data ingesti...
 
OSACon 2023_ Unlocking Financial Data with Real-Time Pipelines
OSACon 2023_ Unlocking Financial Data with Real-Time PipelinesOSACon 2023_ Unlocking Financial Data with Real-Time Pipelines
OSACon 2023_ Unlocking Financial Data with Real-Time Pipelines
 
Leverage Kafka to build a stream processing platform
Leverage Kafka to build a stream processing platformLeverage Kafka to build a stream processing platform
Leverage Kafka to build a stream processing platform
 
Apache Kafka - Strakin Technologies Pvt Ltd
Apache Kafka - Strakin Technologies Pvt LtdApache Kafka - Strakin Technologies Pvt Ltd
Apache Kafka - Strakin Technologies Pvt Ltd
 
Introduction to Apache Kafka
Introduction to Apache KafkaIntroduction to Apache Kafka
Introduction to Apache Kafka
 
26Oct2023_Adding Generative AI to Real-Time Streaming Pipelines_ NYC Meetup
26Oct2023_Adding Generative AI to Real-Time Streaming Pipelines_ NYC Meetup26Oct2023_Adding Generative AI to Real-Time Streaming Pipelines_ NYC Meetup
26Oct2023_Adding Generative AI to Real-Time Streaming Pipelines_ NYC Meetup
 
June 2004 IPv6 – Hands on
June 2004 IPv6 – Hands on June 2004 IPv6 – Hands on
June 2004 IPv6 – Hands on
 
Kafka Lag Monitoring For Human Beings (Elad Leev, AppsFlyer) Kafka Summit 2020
Kafka Lag Monitoring For Human Beings (Elad Leev, AppsFlyer) Kafka Summit 2020Kafka Lag Monitoring For Human Beings (Elad Leev, AppsFlyer) Kafka Summit 2020
Kafka Lag Monitoring For Human Beings (Elad Leev, AppsFlyer) Kafka Summit 2020
 
[Big Data Spain] Apache Spark Streaming + Kafka 0.10: an Integration Story
[Big Data Spain] Apache Spark Streaming + Kafka 0.10:  an Integration Story[Big Data Spain] Apache Spark Streaming + Kafka 0.10:  an Integration Story
[Big Data Spain] Apache Spark Streaming + Kafka 0.10: an Integration Story
 
Fast Streaming into Clickhouse with Apache Pulsar
Fast Streaming into Clickhouse with Apache PulsarFast Streaming into Clickhouse with Apache Pulsar
Fast Streaming into Clickhouse with Apache Pulsar
 
Budapest Data/ML - Building Modern Data Streaming Apps with NiFi, Flink and K...
Budapest Data/ML - Building Modern Data Streaming Apps with NiFi, Flink and K...Budapest Data/ML - Building Modern Data Streaming Apps with NiFi, Flink and K...
Budapest Data/ML - Building Modern Data Streaming Apps with NiFi, Flink and K...
 

More from Peerapat Asoktummarungsri

More from Peerapat Asoktummarungsri (13)

ePassport eKYC for Financial
ePassport eKYC for FinancialePassport eKYC for Financial
ePassport eKYC for Financial
 
Security Deployment by CI/CD
Security Deployment by CI/CDSecurity Deployment by CI/CD
Security Deployment by CI/CD
 
Cassandra - Distributed Data Store
Cassandra - Distributed Data StoreCassandra - Distributed Data Store
Cassandra - Distributed Data Store
 
Modern Java Development
Modern Java DevelopmentModern Java Development
Modern Java Development
 
Sonarqube
SonarqubeSonarqube
Sonarqube
 
Meetup Big Data by THJUG
Meetup Big Data by THJUGMeetup Big Data by THJUG
Meetup Big Data by THJUG
 
Sonar
SonarSonar
Sonar
 
Roboguice
RoboguiceRoboguice
Roboguice
 
Homeloan
HomeloanHomeloan
Homeloan
 
Lightweight javaEE with Guice
Lightweight javaEE with GuiceLightweight javaEE with Guice
Lightweight javaEE with Guice
 
Hadoop
HadoopHadoop
Hadoop
 
Meet Django
Meet DjangoMeet Django
Meet Django
 
Easy java
Easy javaEasy java
Easy java
 

Recently uploaded

VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
dharasingh5698
 
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
Call Girls in Netaji Nagar, Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Netaji Nagar, Delhi 💯 Call Us 🔝9953056974 🔝 Escort ServiceCall Girls in Netaji Nagar, Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Netaji Nagar, Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar ≼🔝 Delhi door step de...
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar  ≼🔝 Delhi door step de...Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar  ≼🔝 Delhi door step de...
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar ≼🔝 Delhi door step de...
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
notes on Evolution Of Analytic Scalability.ppt
notes on Evolution Of Analytic Scalability.pptnotes on Evolution Of Analytic Scalability.ppt
notes on Evolution Of Analytic Scalability.ppt
MsecMca
 
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak HamilCara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Kandungan 087776558899
 

Recently uploaded (20)

Thermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.pptThermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.ppt
 
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Ankleshwar 7001035870 Whatsapp Number, 24/07 Booking
 
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
 
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdfONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghly
 
Thermal Engineering Unit - I & II . ppt
Thermal Engineering  Unit - I & II . pptThermal Engineering  Unit - I & II . ppt
Thermal Engineering Unit - I & II . ppt
 
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
 
Unit 1 - Soil Classification and Compaction.pdf
Unit 1 - Soil Classification and Compaction.pdfUnit 1 - Soil Classification and Compaction.pdf
Unit 1 - Soil Classification and Compaction.pdf
 
Design For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the startDesign For Accessibility: Getting it right from the start
Design For Accessibility: Getting it right from the start
 
data_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdfdata_management_and _data_science_cheat_sheet.pdf
data_management_and _data_science_cheat_sheet.pdf
 
Call Girls in Netaji Nagar, Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Netaji Nagar, Delhi 💯 Call Us 🔝9953056974 🔝 Escort ServiceCall Girls in Netaji Nagar, Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Netaji Nagar, Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
 
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar ≼🔝 Delhi door step de...
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar  ≼🔝 Delhi door step de...Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar  ≼🔝 Delhi door step de...
Call Now ≽ 9953056974 ≼🔝 Call Girls In New Ashok Nagar ≼🔝 Delhi door step de...
 
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
(INDIRA) Call Girl Aurangabad Call Now 8617697112 Aurangabad Escorts 24x7
 
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
 
Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024
 
Work-Permit-Receiver-in-Saudi-Aramco.pptx
Work-Permit-Receiver-in-Saudi-Aramco.pptxWork-Permit-Receiver-in-Saudi-Aramco.pptx
Work-Permit-Receiver-in-Saudi-Aramco.pptx
 
Call Girls Wakad Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Wakad Call Me 7737669865 Budget Friendly No Advance BookingCall Girls Wakad Call Me 7737669865 Budget Friendly No Advance Booking
Call Girls Wakad Call Me 7737669865 Budget Friendly No Advance Booking
 
notes on Evolution Of Analytic Scalability.ppt
notes on Evolution Of Analytic Scalability.pptnotes on Evolution Of Analytic Scalability.ppt
notes on Evolution Of Analytic Scalability.ppt
 
Unit 2- Effective stress & Permeability.pdf
Unit 2- Effective stress & Permeability.pdfUnit 2- Effective stress & Permeability.pdf
Unit 2- Effective stress & Permeability.pdf
 
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak HamilCara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
Cara Menggugurkan Sperma Yang Masuk Rahim Biyar Tidak Hamil
 

Data Pipeline with Kafka

  • 1. Data Pipeline with Kafka by Peerapat A. is licensed under a Creative Commons Attribution 4.0 International License. Data Pipeline with Kafka Peerapat Asoktummarungsri AGODA
  • 2. Data Pipeline with Kafka by Peerapat A. is licensed under a Creative Commons Attribution 4.0 International License. Senior Software Engineer Agoda.com Contributor Thai Java User Group (THJUG.com) Contributor Agile66
  • 3. Data Pipeline with Kafka by Peerapat A. is licensed under a Creative Commons Attribution 4.0 International License. AGENDA Big Data & Data Pipeline Kafka Introduction Quick Start Monitoring Data Pipeline for Search API Hadoop integration with Camus
  • 4. Data Pipeline with Kafka by Peerapat A. is licensed under a Creative Commons Attribution 4.0 International License. Hadoop +
 HDFS Information Big Data MapReduce
  • 5. Data Pipeline with Kafka by Peerapat A. is licensed under a Creative Commons Attribution 4.0 International License. Pipeline hadoopWebsite log
  • 6. Data Pipeline with Kafka by Peerapat A. is licensed under a Creative Commons Attribution 4.0 International License. hadoopWebsite Mobile Growth log
  • 7. Data Pipeline with Kafka by Peerapat A. is licensed under a Creative Commons Attribution 4.0 International License. hadoopWebsite Mobile realtime monitoring Complex log message
  • 8. Data Pipeline with Kafka by Peerapat A. is licensed under a Creative Commons Attribution 4.0 International License. New New hadoopWebsite Mobile realtime monitoring Data Warehouse API Features becomes the problem NEW
  • 9. Data Pipeline with Kafka by Peerapat A. is licensed under a Creative Commons Attribution 4.0 International License. hadoop Website Mobile realtime monitoring API Data Pipeline Produce Consume Data Pipeline Warehouse
  • 10. Data Pipeline with Kafka by Peerapat A. is licensed under a Creative Commons Attribution 4.0 International License. compare Topic Queue Consumer Consumer Consumer Consumer Consumer Consumer 1 2 3 1 1 1
  • 11. Data Pipeline with Kafka by Peerapat A. is licensed under a Creative Commons Attribution 4.0 International License. General Topic Implement Topic Consumer 1 Consumer 2 Consumer 3 2 2 This consumer will lose a message.
  • 12. Data Pipeline with Kafka by Peerapat A. is licensed under a Creative Commons Attribution 4.0 International License. Distributed by Design Fast Scalable - It can be elastically and transparently expanded without downtime. Durable - Messages are persisted on disk and replicated within the cluster to prevent data loss.
  • 13. Data Pipeline with Kafka by Peerapat A. is licensed under a Creative Commons Attribution 4.0 International License. Topic Consumer 1 Consumer 2 Consumer 3 msg gid = Group ID msg msg 1 2 3 4 7 6 5 gid = hadoop
  • 14. Data Pipeline with Kafka by Peerapat A. is licensed under a Creative Commons Attribution 4.0 International License. Topic hadoop gid = hadoop realtime monitoring data warehouse msg gid = Group ID msg msg 12 gid = rtmon gid = warehouse 3 123 123
  • 15. Data Pipeline with Kafka by Peerapat A. is licensed under a Creative Commons Attribution 4.0 International License. Topic hadoop gid = hadoop realtime monitoring data warehouse msg gid = Group ID msg 9 gid = rtmon gid = warehouse 9 9 New Consumer 1 2 3 gid = newconsumer
  • 16. Data Pipeline with Kafka by Peerapat A. is licensed under a Creative Commons Attribution 4.0 International License.
  • 17. Data Pipeline with Kafka by Peerapat A. is licensed under a Creative Commons Attribution 4.0 International License. Vagrant Install Vagrant Install Virtual Box Clone https://github.com/stealthly/scala-kafka vagrant up
  • 18. Data Pipeline with Kafka by Peerapat A. is licensed under a Creative Commons Attribution 4.0 International License. BREW brew update brew install zookeeper kafka -y
  • 19. Data Pipeline with Kafka by Peerapat A. is licensed under a Creative Commons Attribution 4.0 International License. Some Kafka Config # The id of the broker. This must be set to a unique integer for each broker. broker.id=0 # The port the socket server listens on port=9092 # Zookeeper connection string (see zookeeper docs for details). zookeeper.connect=localhost:2181 # Timeout in ms for connecting to zookeeper zookeeper.connection.timeout.ms=6000 # The minimum age of a log file to be eligible for deletion log.retention.hours=168
  • 20. Data Pipeline with Kafka by Peerapat A. is licensed under a Creative Commons Attribution 4.0 International License. Kafka @ Linkedin (2013) 10 billion message writes per day 55 billion messages delivered to real-time consumers 367 topics that cover both user activity topics and operational data the largest of which adds an average of 92GB per day of batch-compressed messages Messages are kept for 7 days, and these average at about 9.5 TB of compressed messages across all topics.
  • 21. Data Pipeline with Kafka by Peerapat A. is licensed under a Creative Commons Attribution 4.0 International License. KafkaOffsetMonitor java -cp KafkaOffsetMonitor-assembly-0.2.1.jar com.quantifind.kafka.offsetapp.OffsetGetterWeb --zk localhost --port 8080 --refresh 10.seconds --retain 2.days Download KafkaOffsetMonitor from Github https://github.com/quantifind/KafkaOffsetMonitor 1 Jar file, KafkaOffsetMonitor-assembly-0.2.1.jar
  • 22. Data Pipeline with Kafka by Peerapat A. is licensed under a Creative Commons Attribution 4.0 International License.
  • 23. Data Pipeline with Kafka by Peerapat A. is licensed under a Creative Commons Attribution 4.0 International License.
  • 24. Data Pipeline with Kafka by Peerapat A. is licensed under a Creative Commons Attribution 4.0 International License.
  • 25. Data Pipeline with Kafka by Peerapat A. is licensed under a Creative Commons Attribution 4.0 International License.
  • 26. Data Pipeline with Kafka by Peerapat A. is licensed under a Creative Commons Attribution 4.0 International License.
  • 27. Data Pipeline with Kafka by Peerapat A. is licensed under a Creative Commons Attribution 4.0 International License.
  • 28. Data Pipeline with Kafka by Peerapat A. is licensed under a Creative Commons Attribution 4.0 International License. CHANGE Produce Change Price & Inventory Consumer Cassandra Search API Calculate Price HTTP KafkaAPI Hotel Manager Hotels
  • 29. Data Pipeline with Kafka by Peerapat A. is licensed under a Creative Commons Attribution 4.0 International License. CHANGE KafkaAPI Hotel Manager Hotels B Consumer A Consumer Price & Inventory Consumer
  • 30. Data Pipeline with Kafka by Peerapat A. is licensed under a Creative Commons Attribution 4.0 International License. Camus
  • 31. Data Pipeline with Kafka by Peerapat A. is licensed under a Creative Commons Attribution 4.0 International License. http://www.slideshare.net/nuboat https://github.com/nuboat/akkakafkaexam Slide available here Sourcecode available here
  • 32. Data Pipeline with Kafka by Peerapat A. is licensed under a Creative Commons Attribution 4.0 International License. REFERENCES http://www.slideshare.net/charmalloc/ developingwithapachekafka-29910685 http://www.infoq.com/articles/apache-kafka http://kafka.apache.org/ https://github.com/stealthly/scala-kafka https://github.com/quantifind/KafkaOffsetMonitor
  • 33. Data Pipeline with Kafka by Peerapat A. is licensed under a Creative Commons Attribution 4.0 International License. Q & A