SlideShare una empresa de Scribd logo
1 de 30
From a Kafkaesque Story to
the Promised Land
7/7/2013
Ran Silberman
Open Source paradigm
The Cathedral & the Bazaar by Eric S Raymond, 1999
the struggle between top-down and bottom-up design
Challenges of data platform[1]
• High throughput
• Horizontal scale to address growth
• High availability of data services
• No Data loss
• Satisfy Real-Time demands
• Enforce structural data with schemas
• Process Big Data and Enterprise Data
• Single Source of Truth (SSOT)
SLA's of data platform
BI DWH
Real-time
Customers
Real-time
dashboards
Data Bus
Offline
Customers
SLA:
1. 98% in < 1/2 hr
2. 99.999% < 4 hrs
SLA:
1. 98% in < 500 msec
2. No send > 2 sec
Real-time
servers
Legacy Data flow in LivePerson
BI DWH
(Oracle)
RealTime
servers
View
Reports
Customers
ETL
Sessionize
Modeling
Schema
View
1st phase - move to Hadoop
ETL
Sessionize
Modeling
Schema
View
RealTime
servers
BI DWH
(Vertica)
View
Reports
HDFS
Hadoop
MR Job transfers
data to BI DWH
Customers
BI DWH
(Oracle)
2. move to Kafka
6
RealTime
servers
HDFS
BI DWH
(Vertica)
Hadoop
MR Job transfers
data to BI DWH
Kafka
Topic-1
View
Reports
Customers
3. Integrate with new producers
6
RealTime
servers
HDFS
BI DWH
(Vertica)
Hadoop
MR Job transfers
data to BI DWH
Kafka
Topic-1 Topic-2
New
RealTime
servers
View
Reports
Customers
4. Add Real-time BI
View
Reports
6
Customers
RealTime
servers
HDFS
BI DWH
(Vertica)
Hadoop
MR Job transfers
data to BI DWH
Kafka
Topic-1 Topic-2
New
RealTime
servers
Storm
Topology
5. Standardize Data-Model using Avro
View
Reports
6
Customers
RealTime
servers
HDFS
BI DWH
(Vertica)
Hadoop
MR Job transfers
data to BI DWH
Kafka
Topic-1 Topic-2
New
RealTime
servers
Storm
Topology
Camus
6. Define Single Source of Truth (SSOT)
View
Reports
6
Customers
RealTime
servers
HDFS
BI DWH
(Vertica)
Hadoop
MR Job transfers
data to BI DWH
Kafka
Topic-1 Topic-2
New
RealTime
servers
Storm
Topology
Camus
Kafka[2] as Backbone for Data
• Central "Message Bus"
• Support multiple topics (MQ style)
• Write ahead to files
• Distributed & Highly Available
• Horizontal Scale
• High throughput (10s MB/Sec per server)
• Service is agnostic to consumers' state
• Retention policy
Kafka Architecture
Kafka Architecture cont.
Node 1
Zookeeper
Producer 1 Producer 2 Producer 3
Node 2 Node 3
Consumer 1 Consumer 1Consumer 1
Group1
Kafka Architecture cont.
Node 1
Zookeeper
Producer 1 Producer 2
Node 3 Node 4
Consumer 2 Consumer 3Consumer 1
Node 2
Topic1 Topic2
Kafka replay messages.
Zookeeper
Node 3 Node 4
Min Offset ->
Max Offset ->
fetchRequest = new fetchRequest(topic, partition, offset, size);
currentOffset : taken from zookeeper
Earliest offset: -2
Latest offset : -1
Kafka API[3]
• Producer API
• Consumer API
o High-level API
 using zookeeper to access brokers and to save
offsets
o SimpleConsumer API
 direct access to Kafka brokers
• Kafka-Spout, Camus, and KafkaHadoopConsumer all
use SimpleConsumer
Kafka API[3]
• Producer
messages = new List<KeyedMessage<K, V>>()
Messages.add(new KeyedMessage(“topic1”, null, msg1));
Send(messages);
• Consumer
streams[] = Consumer.createMessageStream((“topic1”, 1);
for (message: streams[0]{
//do something with message
}
Kafka in Unit Testing
• Use of class KafkaServer
• Run embedded server
Introducing Avro[5]
• Schema representation using JSON
• Support types
o Primitive types: boolean, int, long, string, etc.
o Complex types:
Record, Enum, Union, Arrays, Maps, Fixed
• Data is serialized using its schema
• Avro files include file-header of the schema
Add Avro protocol to the story
Topic 1
Schema Repo
Producer 1
Topic 2
Consumers:
Camus/Storm
Create Message
according to
Schema 1.0
Register schema 1.0
Add revision to
message header
Send message
Read message Extract header
and obtain
schema version
Get schema by
version 1.0
Encode message
with Schema 1.0
Decode message
with schema 1.0
{event1:{header:{sessionId:"102122"),{timestamp:"12346")}...
Header Avro message
Kafka message
Pass message
1.0
Kafka + Storm + Avro example
• Demonstrating use of Avro data passing from Kafka to
Storm
• Explains Avro revision evolution
• Requires Kafka and Zookeeper installed
• Uses Storm artifact and Kafka-Spout artifact in Maven
• Plugin generates Java classes from Avro Schema
• https://github.com/ransilberman/avro-kafka-storm
Producer machine
Resiliency
Producer
Consistent
Topic
Send message
to Kafka
local file
Persist message to
local disk
Kafka Bridge
Send message
to Kafka
Fast
Topic
Real-time
Consumer:
Storm
Offline
Consumer:
Hadoop
Challenges of Kafka
• Still not mature enough
• Not enough supporting tools (viewers, maintenance)
• Duplications may occur
• API not documented enough
• Open Source - support by community only
• Difficult to replay messages from specific point in time
• Eventually Consistent...
Eventually Consistent
Because it is a distributed system -
• No guarantee for delivery order
• No way to tell to which broker message is sent
• Kafka do not guarantee that there are no duplications
• ...But eventually, all message will arrive!
Event
generated
Event
destination
Desert
Major Improvements in Kafka 0.8[4]
• Partitions replication
• Message send guarantee
• Consumer offsets are represented numbers instead of
bytes (e.g., 1, 2, 3, ..)
Addressing Data Challenges
• High throughput
o Kafka, Hadoop
• Horizontal scale to address growth
o Kafka, Storm, Hadoop
• High availability of data services
o Kafka, Storm, Zookeeper
• No Data loss
o Highly Available services, No ETL
Addressing Data Challenges Cont.
• Satisfy Real-Time demands
o Storm
• Enforce structural data with schemas
o Avro
• Process Big Data and Enterprise Data
o Kafka, Hadoop
• Single Source of Truth (SSOT)
o Hadoop, No ETL
References
• [1]Satisfying new requirements for Data Integration By
David Loshin
• [2]Apache Kafka
• [3]Kafka API
• [4]Kafka 0.8 Quick Start
• [5]Apache Avro
• [5]Storm
Thank you!

Más contenido relacionado

La actualidad más candente

Samza at LinkedIn: Taking Stream Processing to the Next Level
Samza at LinkedIn: Taking Stream Processing to the Next LevelSamza at LinkedIn: Taking Stream Processing to the Next Level
Samza at LinkedIn: Taking Stream Processing to the Next LevelMartin Kleppmann
 
ApacheCon2019 Talk: Kafka, Cassandra and Kubernetes at Scale – Real-time Ano...
ApacheCon2019 Talk: Kafka, Cassandra and Kubernetesat Scale – Real-time Ano...ApacheCon2019 Talk: Kafka, Cassandra and Kubernetesat Scale – Real-time Ano...
ApacheCon2019 Talk: Kafka, Cassandra and Kubernetes at Scale – Real-time Ano...Paul Brebner
 
Beaming flink to the cloud @ netflix ff 2016-monal-daxini
Beaming flink to the cloud @ netflix   ff 2016-monal-daxiniBeaming flink to the cloud @ netflix   ff 2016-monal-daxini
Beaming flink to the cloud @ netflix ff 2016-monal-daxiniMonal Daxini
 
Building a distributed Key-Value store with Cassandra
Building a distributed Key-Value store with CassandraBuilding a distributed Key-Value store with Cassandra
Building a distributed Key-Value store with Cassandraaaronmorton
 
Kafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ Uber
Kafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ UberKafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ Uber
Kafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ Uberconfluent
 
The Netflix Way to deal with Big Data Problems
The Netflix Way to deal with Big Data ProblemsThe Netflix Way to deal with Big Data Problems
The Netflix Way to deal with Big Data ProblemsMonal Daxini
 
Kafka Summit NYC 2017 Introduction to Kafka Streams with a Real-life Example
Kafka Summit NYC 2017 Introduction to Kafka Streams with a Real-life ExampleKafka Summit NYC 2017 Introduction to Kafka Streams with a Real-life Example
Kafka Summit NYC 2017 Introduction to Kafka Streams with a Real-life Exampleconfluent
 
Netflix Keystone—Cloud scale event processing pipeline
Netflix Keystone—Cloud scale event processing pipelineNetflix Keystone—Cloud scale event processing pipeline
Netflix Keystone—Cloud scale event processing pipelineMonal Daxini
 
Experience with Kafka & Storm
Experience with Kafka & StormExperience with Kafka & Storm
Experience with Kafka & StormOtto Mok
 
Bulletproof Kafka with Fault Tree Analysis (Andrey Falko, Lyft) Kafka Summit ...
Bulletproof Kafka with Fault Tree Analysis (Andrey Falko, Lyft) Kafka Summit ...Bulletproof Kafka with Fault Tree Analysis (Andrey Falko, Lyft) Kafka Summit ...
Bulletproof Kafka with Fault Tree Analysis (Andrey Falko, Lyft) Kafka Summit ...confluent
 
Introduction to Streaming Distributed Processing with Storm
Introduction to Streaming Distributed Processing with StormIntroduction to Streaming Distributed Processing with Storm
Introduction to Streaming Distributed Processing with StormBrandon O'Brien
 
Hadoop summit - Scaling Uber’s Real-Time Infra for Trillion Events per Day
Hadoop summit - Scaling Uber’s Real-Time Infra for  Trillion Events per DayHadoop summit - Scaling Uber’s Real-Time Infra for  Trillion Events per Day
Hadoop summit - Scaling Uber’s Real-Time Infra for Trillion Events per DayAnkur Bansal
 
Ingesting Healthcare Data, Micah Whitacre
Ingesting Healthcare Data, Micah WhitacreIngesting Healthcare Data, Micah Whitacre
Ingesting Healthcare Data, Micah Whitacreconfluent
 
Jim Dowling - Multi-tenant Flink-as-a-Service on YARN
Jim Dowling - Multi-tenant Flink-as-a-Service on YARN Jim Dowling - Multi-tenant Flink-as-a-Service on YARN
Jim Dowling - Multi-tenant Flink-as-a-Service on YARN Flink Forward
 
Deploying Kafka at Dropbox, Mark Smith, Sean Fellows
Deploying Kafka at Dropbox, Mark Smith, Sean FellowsDeploying Kafka at Dropbox, Mark Smith, Sean Fellows
Deploying Kafka at Dropbox, Mark Smith, Sean Fellowsconfluent
 
Easily Build a Smart Pulsar Stream Processor_Simon Crosby
Easily Build a Smart Pulsar Stream Processor_Simon CrosbyEasily Build a Smart Pulsar Stream Processor_Simon Crosby
Easily Build a Smart Pulsar Stream Processor_Simon CrosbyStreamNative
 
High cardinality time series search: A new level of scale - Data Day Texas 2016
High cardinality time series search: A new level of scale - Data Day Texas 2016High cardinality time series search: A new level of scale - Data Day Texas 2016
High cardinality time series search: A new level of scale - Data Day Texas 2016Eric Sammer
 
Maximizing Audience Engagement in Media Delivery (MED303) | AWS re:Invent 2013
Maximizing Audience Engagement in Media Delivery (MED303) | AWS re:Invent 2013Maximizing Audience Engagement in Media Delivery (MED303) | AWS re:Invent 2013
Maximizing Audience Engagement in Media Delivery (MED303) | AWS re:Invent 2013Amazon Web Services
 

La actualidad más candente (20)

Samza at LinkedIn: Taking Stream Processing to the Next Level
Samza at LinkedIn: Taking Stream Processing to the Next LevelSamza at LinkedIn: Taking Stream Processing to the Next Level
Samza at LinkedIn: Taking Stream Processing to the Next Level
 
ApacheCon2019 Talk: Kafka, Cassandra and Kubernetes at Scale – Real-time Ano...
ApacheCon2019 Talk: Kafka, Cassandra and Kubernetesat Scale – Real-time Ano...ApacheCon2019 Talk: Kafka, Cassandra and Kubernetesat Scale – Real-time Ano...
ApacheCon2019 Talk: Kafka, Cassandra and Kubernetes at Scale – Real-time Ano...
 
Beaming flink to the cloud @ netflix ff 2016-monal-daxini
Beaming flink to the cloud @ netflix   ff 2016-monal-daxiniBeaming flink to the cloud @ netflix   ff 2016-monal-daxini
Beaming flink to the cloud @ netflix ff 2016-monal-daxini
 
Building a distributed Key-Value store with Cassandra
Building a distributed Key-Value store with CassandraBuilding a distributed Key-Value store with Cassandra
Building a distributed Key-Value store with Cassandra
 
Kafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ Uber
Kafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ UberKafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ Uber
Kafka Summit NYC 2017 - Scalable Real-Time Complex Event Processing @ Uber
 
The Netflix Way to deal with Big Data Problems
The Netflix Way to deal with Big Data ProblemsThe Netflix Way to deal with Big Data Problems
The Netflix Way to deal with Big Data Problems
 
Data Pipeline with Kafka
Data Pipeline with KafkaData Pipeline with Kafka
Data Pipeline with Kafka
 
Kafka Summit NYC 2017 Introduction to Kafka Streams with a Real-life Example
Kafka Summit NYC 2017 Introduction to Kafka Streams with a Real-life ExampleKafka Summit NYC 2017 Introduction to Kafka Streams with a Real-life Example
Kafka Summit NYC 2017 Introduction to Kafka Streams with a Real-life Example
 
Netflix Keystone—Cloud scale event processing pipeline
Netflix Keystone—Cloud scale event processing pipelineNetflix Keystone—Cloud scale event processing pipeline
Netflix Keystone—Cloud scale event processing pipeline
 
Experience with Kafka & Storm
Experience with Kafka & StormExperience with Kafka & Storm
Experience with Kafka & Storm
 
Bulletproof Kafka with Fault Tree Analysis (Andrey Falko, Lyft) Kafka Summit ...
Bulletproof Kafka with Fault Tree Analysis (Andrey Falko, Lyft) Kafka Summit ...Bulletproof Kafka with Fault Tree Analysis (Andrey Falko, Lyft) Kafka Summit ...
Bulletproof Kafka with Fault Tree Analysis (Andrey Falko, Lyft) Kafka Summit ...
 
Introduction to Streaming Distributed Processing with Storm
Introduction to Streaming Distributed Processing with StormIntroduction to Streaming Distributed Processing with Storm
Introduction to Streaming Distributed Processing with Storm
 
Hadoop summit - Scaling Uber’s Real-Time Infra for Trillion Events per Day
Hadoop summit - Scaling Uber’s Real-Time Infra for  Trillion Events per DayHadoop summit - Scaling Uber’s Real-Time Infra for  Trillion Events per Day
Hadoop summit - Scaling Uber’s Real-Time Infra for Trillion Events per Day
 
Ingesting Healthcare Data, Micah Whitacre
Ingesting Healthcare Data, Micah WhitacreIngesting Healthcare Data, Micah Whitacre
Ingesting Healthcare Data, Micah Whitacre
 
Jim Dowling - Multi-tenant Flink-as-a-Service on YARN
Jim Dowling - Multi-tenant Flink-as-a-Service on YARN Jim Dowling - Multi-tenant Flink-as-a-Service on YARN
Jim Dowling - Multi-tenant Flink-as-a-Service on YARN
 
Deploying Kafka at Dropbox, Mark Smith, Sean Fellows
Deploying Kafka at Dropbox, Mark Smith, Sean FellowsDeploying Kafka at Dropbox, Mark Smith, Sean Fellows
Deploying Kafka at Dropbox, Mark Smith, Sean Fellows
 
ApacheCon BigData Europe 2015
ApacheCon BigData Europe 2015 ApacheCon BigData Europe 2015
ApacheCon BigData Europe 2015
 
Easily Build a Smart Pulsar Stream Processor_Simon Crosby
Easily Build a Smart Pulsar Stream Processor_Simon CrosbyEasily Build a Smart Pulsar Stream Processor_Simon Crosby
Easily Build a Smart Pulsar Stream Processor_Simon Crosby
 
High cardinality time series search: A new level of scale - Data Day Texas 2016
High cardinality time series search: A new level of scale - Data Day Texas 2016High cardinality time series search: A new level of scale - Data Day Texas 2016
High cardinality time series search: A new level of scale - Data Day Texas 2016
 
Maximizing Audience Engagement in Media Delivery (MED303) | AWS re:Invent 2013
Maximizing Audience Engagement in Media Delivery (MED303) | AWS re:Invent 2013Maximizing Audience Engagement in Media Delivery (MED303) | AWS re:Invent 2013
Maximizing Audience Engagement in Media Delivery (MED303) | AWS re:Invent 2013
 

Destacado

Clash of clans data structures
Clash of clans   data structuresClash of clans   data structures
Clash of clans data structuresRan Silberman
 
Sharding with spider solutions 20160721
Sharding with spider solutions 20160721Sharding with spider solutions 20160721
Sharding with spider solutions 20160721Kentoku
 
Serialization (Avro, Message Pack, Kryo)
Serialization (Avro, Message Pack, Kryo)Serialization (Avro, Message Pack, Kryo)
Serialization (Avro, Message Pack, Kryo)오석 한
 
Kafka replication apachecon_2013
Kafka replication apachecon_2013Kafka replication apachecon_2013
Kafka replication apachecon_2013Jun Rao
 
Building large-scale analytics platform with Storm, Kafka and Cassandra - NYC...
Building large-scale analytics platform with Storm, Kafka and Cassandra - NYC...Building large-scale analytics platform with Storm, Kafka and Cassandra - NYC...
Building large-scale analytics platform with Storm, Kafka and Cassandra - NYC...Alexey Kharlamov
 
Dev ops for big data cluster management tools
Dev ops for big data  cluster management toolsDev ops for big data  cluster management tools
Dev ops for big data cluster management toolsRan Silberman
 
Using spider for sharding in production
Using spider for sharding in productionUsing spider for sharding in production
Using spider for sharding in productionKentoku
 
歡迎回來:全面圖譜,金融 3.0 顧客行銷新視界
歡迎回來:全面圖譜,金融 3.0 顧客行銷新視界歡迎回來:全面圖譜,金融 3.0 顧客行銷新視界
歡迎回來:全面圖譜,金融 3.0 顧客行銷新視界Etu Solution
 
Spider storage engine (dec212016)
Spider storage engine (dec212016)Spider storage engine (dec212016)
Spider storage engine (dec212016)Kentoku
 
終歸:分群消費者x多元商機的實現
終歸:分群消費者x多元商機的實現終歸:分群消費者x多元商機的實現
終歸:分群消費者x多元商機的實現Etu Solution
 
Introduction to Kafka Streams
Introduction to Kafka StreamsIntroduction to Kafka Streams
Introduction to Kafka StreamsGuozhang Wang
 
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...Helena Edelson
 
Spiderストレージエンジンのご紹介
Spiderストレージエンジンのご紹介Spiderストレージエンジンのご紹介
Spiderストレージエンジンのご紹介Kentoku
 
Lambda Architecture with Spark Streaming, Kafka, Cassandra, Akka, Scala
Lambda Architecture with Spark Streaming, Kafka, Cassandra, Akka, ScalaLambda Architecture with Spark Streaming, Kafka, Cassandra, Akka, Scala
Lambda Architecture with Spark Streaming, Kafka, Cassandra, Akka, ScalaHelena Edelson
 
Kafka and Storm - event processing in realtime
Kafka and Storm - event processing in realtimeKafka and Storm - event processing in realtime
Kafka and Storm - event processing in realtimeGuido Schmutz
 
Introduction to Kafka and Zookeeper
Introduction to Kafka and ZookeeperIntroduction to Kafka and Zookeeper
Introduction to Kafka and ZookeeperRahul Jain
 

Destacado (17)

Clash of clans data structures
Clash of clans   data structuresClash of clans   data structures
Clash of clans data structures
 
Sharding with spider solutions 20160721
Sharding with spider solutions 20160721Sharding with spider solutions 20160721
Sharding with spider solutions 20160721
 
Serialization (Avro, Message Pack, Kryo)
Serialization (Avro, Message Pack, Kryo)Serialization (Avro, Message Pack, Kryo)
Serialization (Avro, Message Pack, Kryo)
 
Kafka replication apachecon_2013
Kafka replication apachecon_2013Kafka replication apachecon_2013
Kafka replication apachecon_2013
 
Hadoop ecosystem
Hadoop ecosystemHadoop ecosystem
Hadoop ecosystem
 
Building large-scale analytics platform with Storm, Kafka and Cassandra - NYC...
Building large-scale analytics platform with Storm, Kafka and Cassandra - NYC...Building large-scale analytics platform with Storm, Kafka and Cassandra - NYC...
Building large-scale analytics platform with Storm, Kafka and Cassandra - NYC...
 
Dev ops for big data cluster management tools
Dev ops for big data  cluster management toolsDev ops for big data  cluster management tools
Dev ops for big data cluster management tools
 
Using spider for sharding in production
Using spider for sharding in productionUsing spider for sharding in production
Using spider for sharding in production
 
歡迎回來:全面圖譜,金融 3.0 顧客行銷新視界
歡迎回來:全面圖譜,金融 3.0 顧客行銷新視界歡迎回來:全面圖譜,金融 3.0 顧客行銷新視界
歡迎回來:全面圖譜,金融 3.0 顧客行銷新視界
 
Spider storage engine (dec212016)
Spider storage engine (dec212016)Spider storage engine (dec212016)
Spider storage engine (dec212016)
 
終歸:分群消費者x多元商機的實現
終歸:分群消費者x多元商機的實現終歸:分群消費者x多元商機的實現
終歸:分群消費者x多元商機的實現
 
Introduction to Kafka Streams
Introduction to Kafka StreamsIntroduction to Kafka Streams
Introduction to Kafka Streams
 
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
Lambda Architecture with Spark, Spark Streaming, Kafka, Cassandra, Akka and S...
 
Spiderストレージエンジンのご紹介
Spiderストレージエンジンのご紹介Spiderストレージエンジンのご紹介
Spiderストレージエンジンのご紹介
 
Lambda Architecture with Spark Streaming, Kafka, Cassandra, Akka, Scala
Lambda Architecture with Spark Streaming, Kafka, Cassandra, Akka, ScalaLambda Architecture with Spark Streaming, Kafka, Cassandra, Akka, Scala
Lambda Architecture with Spark Streaming, Kafka, Cassandra, Akka, Scala
 
Kafka and Storm - event processing in realtime
Kafka and Storm - event processing in realtimeKafka and Storm - event processing in realtime
Kafka and Storm - event processing in realtime
 
Introduction to Kafka and Zookeeper
Introduction to Kafka and ZookeeperIntroduction to Kafka and Zookeeper
Introduction to Kafka and Zookeeper
 

Similar a From a kafkaesque story to The Promised Land

From a Kafkaesque Story to The Promised Land at LivePerson
From a Kafkaesque Story to The Promised Land at LivePersonFrom a Kafkaesque Story to The Promised Land at LivePerson
From a Kafkaesque Story to The Promised Land at LivePersonLivePerson
 
Reducing Microservice Complexity with Kafka and Reactive Streams
Reducing Microservice Complexity with Kafka and Reactive StreamsReducing Microservice Complexity with Kafka and Reactive Streams
Reducing Microservice Complexity with Kafka and Reactive Streamsjimriecken
 
Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Guido Schmutz
 
Building a company-wide data pipeline on Apache Kafka - engineering for 150 b...
Building a company-wide data pipeline on Apache Kafka - engineering for 150 b...Building a company-wide data pipeline on Apache Kafka - engineering for 150 b...
Building a company-wide data pipeline on Apache Kafka - engineering for 150 b...LINE Corporation
 
Keystone - ApacheCon 2016
Keystone - ApacheCon 2016Keystone - ApacheCon 2016
Keystone - ApacheCon 2016Peter Bakas
 
World of Tanks Experience of Using Kafka
World of Tanks Experience of Using KafkaWorld of Tanks Experience of Using Kafka
World of Tanks Experience of Using KafkaLevon Avakyan
 
Data Models and Consumer Idioms Using Apache Kafka for Continuous Data Stream...
Data Models and Consumer Idioms Using Apache Kafka for Continuous Data Stream...Data Models and Consumer Idioms Using Apache Kafka for Continuous Data Stream...
Data Models and Consumer Idioms Using Apache Kafka for Continuous Data Stream...Erik Onnen
 
Akka in Production - ScalaDays 2015
Akka in Production - ScalaDays 2015Akka in Production - ScalaDays 2015
Akka in Production - ScalaDays 2015Evan Chan
 
Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Guido Schmutz
 
Apache Kafka
Apache KafkaApache Kafka
Apache KafkaJoe Stein
 
Capital One Delivers Risk Insights in Real Time with Stream Processing
Capital One Delivers Risk Insights in Real Time with Stream ProcessingCapital One Delivers Risk Insights in Real Time with Stream Processing
Capital One Delivers Risk Insights in Real Time with Stream Processingconfluent
 
Apache Kafka - Scalable Message Processing and more!
Apache Kafka - Scalable Message Processing and more!Apache Kafka - Scalable Message Processing and more!
Apache Kafka - Scalable Message Processing and more!Guido Schmutz
 
NoSQL afternoon in Japan Kumofs & MessagePack
NoSQL afternoon in Japan Kumofs & MessagePackNoSQL afternoon in Japan Kumofs & MessagePack
NoSQL afternoon in Japan Kumofs & MessagePackSadayuki Furuhashi
 
NoSQL afternoon in Japan kumofs & MessagePack
NoSQL afternoon in Japan kumofs & MessagePackNoSQL afternoon in Japan kumofs & MessagePack
NoSQL afternoon in Japan kumofs & MessagePackSadayuki Furuhashi
 
F_1330_Narkhede_Kafka .pptx
F_1330_Narkhede_Kafka .pptxF_1330_Narkhede_Kafka .pptx
F_1330_Narkhede_Kafka .pptxNIMITJAIN71
 
Actors or Not: Async Event Architectures
Actors or Not: Async Event ArchitecturesActors or Not: Async Event Architectures
Actors or Not: Async Event ArchitecturesYaroslav Tkachenko
 
Event Streaming Architectures with Confluent and ScyllaDB
Event Streaming Architectures with Confluent and ScyllaDBEvent Streaming Architectures with Confluent and ScyllaDB
Event Streaming Architectures with Confluent and ScyllaDBScyllaDB
 

Similar a From a kafkaesque story to The Promised Land (20)

From a Kafkaesque Story to The Promised Land at LivePerson
From a Kafkaesque Story to The Promised Land at LivePersonFrom a Kafkaesque Story to The Promised Land at LivePerson
From a Kafkaesque Story to The Promised Land at LivePerson
 
Reducing Microservice Complexity with Kafka and Reactive Streams
Reducing Microservice Complexity with Kafka and Reactive StreamsReducing Microservice Complexity with Kafka and Reactive Streams
Reducing Microservice Complexity with Kafka and Reactive Streams
 
Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !
 
Building a company-wide data pipeline on Apache Kafka - engineering for 150 b...
Building a company-wide data pipeline on Apache Kafka - engineering for 150 b...Building a company-wide data pipeline on Apache Kafka - engineering for 150 b...
Building a company-wide data pipeline on Apache Kafka - engineering for 150 b...
 
Keystone - ApacheCon 2016
Keystone - ApacheCon 2016Keystone - ApacheCon 2016
Keystone - ApacheCon 2016
 
Kafka Explainaton
Kafka ExplainatonKafka Explainaton
Kafka Explainaton
 
World of Tanks Experience of Using Kafka
World of Tanks Experience of Using KafkaWorld of Tanks Experience of Using Kafka
World of Tanks Experience of Using Kafka
 
Data Models and Consumer Idioms Using Apache Kafka for Continuous Data Stream...
Data Models and Consumer Idioms Using Apache Kafka for Continuous Data Stream...Data Models and Consumer Idioms Using Apache Kafka for Continuous Data Stream...
Data Models and Consumer Idioms Using Apache Kafka for Continuous Data Stream...
 
Akka in Production - ScalaDays 2015
Akka in Production - ScalaDays 2015Akka in Production - ScalaDays 2015
Akka in Production - ScalaDays 2015
 
Apache kafka
Apache kafkaApache kafka
Apache kafka
 
Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !
 
Apache Kafka
Apache KafkaApache Kafka
Apache Kafka
 
Capital One Delivers Risk Insights in Real Time with Stream Processing
Capital One Delivers Risk Insights in Real Time with Stream ProcessingCapital One Delivers Risk Insights in Real Time with Stream Processing
Capital One Delivers Risk Insights in Real Time with Stream Processing
 
Apache Kafka - Scalable Message Processing and more!
Apache Kafka - Scalable Message Processing and more!Apache Kafka - Scalable Message Processing and more!
Apache Kafka - Scalable Message Processing and more!
 
Kafka presentation
Kafka presentationKafka presentation
Kafka presentation
 
NoSQL afternoon in Japan Kumofs & MessagePack
NoSQL afternoon in Japan Kumofs & MessagePackNoSQL afternoon in Japan Kumofs & MessagePack
NoSQL afternoon in Japan Kumofs & MessagePack
 
NoSQL afternoon in Japan kumofs & MessagePack
NoSQL afternoon in Japan kumofs & MessagePackNoSQL afternoon in Japan kumofs & MessagePack
NoSQL afternoon in Japan kumofs & MessagePack
 
F_1330_Narkhede_Kafka .pptx
F_1330_Narkhede_Kafka .pptxF_1330_Narkhede_Kafka .pptx
F_1330_Narkhede_Kafka .pptx
 
Actors or Not: Async Event Architectures
Actors or Not: Async Event ArchitecturesActors or Not: Async Event Architectures
Actors or Not: Async Event Architectures
 
Event Streaming Architectures with Confluent and ScyllaDB
Event Streaming Architectures with Confluent and ScyllaDBEvent Streaming Architectures with Confluent and ScyllaDB
Event Streaming Architectures with Confluent and ScyllaDB
 

Último

Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsRavi Sanghani
 
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Scott Andery
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteDianaGray10
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityIES VE
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Farhan Tariq
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxLoriGlavin3
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024Lonnie McRorey
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Hiroshi SHIBATA
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesKari Kakkonen
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxLoriGlavin3
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesThousandEyes
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfMounikaPolabathina
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentPim van der Noll
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesThousandEyes
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rick Flair
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...panagenda
 

Último (20)

Potential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and InsightsPotential of AI (Generative AI) in Business: Learnings and Insights
Potential of AI (Generative AI) in Business: Learnings and Insights
 
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
Enhancing User Experience - Exploring the Latest Features of Tallyman Axis Lo...
 
Take control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test SuiteTake control of your SAP testing with UiPath Test Suite
Take control of your SAP testing with UiPath Test Suite
 
Decarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a realityDecarbonising Buildings: Making a net-zero built environment a reality
Decarbonising Buildings: Making a net-zero built environment a reality
 
Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...Genislab builds better products and faster go-to-market with Lean project man...
Genislab builds better products and faster go-to-market with Lean project man...
 
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptxUse of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
Use of FIDO in the Payments and Identity Landscape: FIDO Paris Seminar.pptx
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024Long journey of Ruby standard library at RubyConf AU 2024
Long journey of Ruby standard library at RubyConf AU 2024
 
Testing tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examplesTesting tools and AI - ideas what to try with some tool examples
Testing tools and AI - ideas what to try with some tool examples
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 
The State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptxThe State of Passkeys with FIDO Alliance.pptx
The State of Passkeys with FIDO Alliance.pptx
 
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyesHow to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
How to Effectively Monitor SD-WAN and SASE Environments with ThousandEyes
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native developmentEmixa Mendix Meetup 11 April 2024 about Mendix Native development
Emixa Mendix Meetup 11 April 2024 about Mendix Native development
 
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyesAssure Ecommerce and Retail Operations Uptime with ThousandEyes
Assure Ecommerce and Retail Operations Uptime with ThousandEyes
 
Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...Rise of the Machines: Known As Drones...
Rise of the Machines: Known As Drones...
 
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
Why device, WIFI, and ISP insights are crucial to supporting remote Microsoft...
 

From a kafkaesque story to The Promised Land

  • 1. From a Kafkaesque Story to the Promised Land 7/7/2013 Ran Silberman
  • 2. Open Source paradigm The Cathedral & the Bazaar by Eric S Raymond, 1999 the struggle between top-down and bottom-up design
  • 3. Challenges of data platform[1] • High throughput • Horizontal scale to address growth • High availability of data services • No Data loss • Satisfy Real-Time demands • Enforce structural data with schemas • Process Big Data and Enterprise Data • Single Source of Truth (SSOT)
  • 4. SLA's of data platform BI DWH Real-time Customers Real-time dashboards Data Bus Offline Customers SLA: 1. 98% in < 1/2 hr 2. 99.999% < 4 hrs SLA: 1. 98% in < 500 msec 2. No send > 2 sec Real-time servers
  • 5. Legacy Data flow in LivePerson BI DWH (Oracle) RealTime servers View Reports Customers ETL Sessionize Modeling Schema View
  • 6. 1st phase - move to Hadoop ETL Sessionize Modeling Schema View RealTime servers BI DWH (Vertica) View Reports HDFS Hadoop MR Job transfers data to BI DWH Customers BI DWH (Oracle)
  • 7. 2. move to Kafka 6 RealTime servers HDFS BI DWH (Vertica) Hadoop MR Job transfers data to BI DWH Kafka Topic-1 View Reports Customers
  • 8. 3. Integrate with new producers 6 RealTime servers HDFS BI DWH (Vertica) Hadoop MR Job transfers data to BI DWH Kafka Topic-1 Topic-2 New RealTime servers View Reports Customers
  • 9. 4. Add Real-time BI View Reports 6 Customers RealTime servers HDFS BI DWH (Vertica) Hadoop MR Job transfers data to BI DWH Kafka Topic-1 Topic-2 New RealTime servers Storm Topology
  • 10. 5. Standardize Data-Model using Avro View Reports 6 Customers RealTime servers HDFS BI DWH (Vertica) Hadoop MR Job transfers data to BI DWH Kafka Topic-1 Topic-2 New RealTime servers Storm Topology Camus
  • 11. 6. Define Single Source of Truth (SSOT) View Reports 6 Customers RealTime servers HDFS BI DWH (Vertica) Hadoop MR Job transfers data to BI DWH Kafka Topic-1 Topic-2 New RealTime servers Storm Topology Camus
  • 12. Kafka[2] as Backbone for Data • Central "Message Bus" • Support multiple topics (MQ style) • Write ahead to files • Distributed & Highly Available • Horizontal Scale • High throughput (10s MB/Sec per server) • Service is agnostic to consumers' state • Retention policy
  • 14. Kafka Architecture cont. Node 1 Zookeeper Producer 1 Producer 2 Producer 3 Node 2 Node 3 Consumer 1 Consumer 1Consumer 1
  • 15. Group1 Kafka Architecture cont. Node 1 Zookeeper Producer 1 Producer 2 Node 3 Node 4 Consumer 2 Consumer 3Consumer 1 Node 2 Topic1 Topic2
  • 16. Kafka replay messages. Zookeeper Node 3 Node 4 Min Offset -> Max Offset -> fetchRequest = new fetchRequest(topic, partition, offset, size); currentOffset : taken from zookeeper Earliest offset: -2 Latest offset : -1
  • 17. Kafka API[3] • Producer API • Consumer API o High-level API  using zookeeper to access brokers and to save offsets o SimpleConsumer API  direct access to Kafka brokers • Kafka-Spout, Camus, and KafkaHadoopConsumer all use SimpleConsumer
  • 18. Kafka API[3] • Producer messages = new List<KeyedMessage<K, V>>() Messages.add(new KeyedMessage(“topic1”, null, msg1)); Send(messages); • Consumer streams[] = Consumer.createMessageStream((“topic1”, 1); for (message: streams[0]{ //do something with message }
  • 19. Kafka in Unit Testing • Use of class KafkaServer • Run embedded server
  • 20. Introducing Avro[5] • Schema representation using JSON • Support types o Primitive types: boolean, int, long, string, etc. o Complex types: Record, Enum, Union, Arrays, Maps, Fixed • Data is serialized using its schema • Avro files include file-header of the schema
  • 21. Add Avro protocol to the story Topic 1 Schema Repo Producer 1 Topic 2 Consumers: Camus/Storm Create Message according to Schema 1.0 Register schema 1.0 Add revision to message header Send message Read message Extract header and obtain schema version Get schema by version 1.0 Encode message with Schema 1.0 Decode message with schema 1.0 {event1:{header:{sessionId:"102122"),{timestamp:"12346")}... Header Avro message Kafka message Pass message 1.0
  • 22. Kafka + Storm + Avro example • Demonstrating use of Avro data passing from Kafka to Storm • Explains Avro revision evolution • Requires Kafka and Zookeeper installed • Uses Storm artifact and Kafka-Spout artifact in Maven • Plugin generates Java classes from Avro Schema • https://github.com/ransilberman/avro-kafka-storm
  • 23. Producer machine Resiliency Producer Consistent Topic Send message to Kafka local file Persist message to local disk Kafka Bridge Send message to Kafka Fast Topic Real-time Consumer: Storm Offline Consumer: Hadoop
  • 24. Challenges of Kafka • Still not mature enough • Not enough supporting tools (viewers, maintenance) • Duplications may occur • API not documented enough • Open Source - support by community only • Difficult to replay messages from specific point in time • Eventually Consistent...
  • 25. Eventually Consistent Because it is a distributed system - • No guarantee for delivery order • No way to tell to which broker message is sent • Kafka do not guarantee that there are no duplications • ...But eventually, all message will arrive! Event generated Event destination Desert
  • 26. Major Improvements in Kafka 0.8[4] • Partitions replication • Message send guarantee • Consumer offsets are represented numbers instead of bytes (e.g., 1, 2, 3, ..)
  • 27. Addressing Data Challenges • High throughput o Kafka, Hadoop • Horizontal scale to address growth o Kafka, Storm, Hadoop • High availability of data services o Kafka, Storm, Zookeeper • No Data loss o Highly Available services, No ETL
  • 28. Addressing Data Challenges Cont. • Satisfy Real-Time demands o Storm • Enforce structural data with schemas o Avro • Process Big Data and Enterprise Data o Kafka, Hadoop • Single Source of Truth (SSOT) o Hadoop, No ETL
  • 29. References • [1]Satisfying new requirements for Data Integration By David Loshin • [2]Apache Kafka • [3]Kafka API • [4]Kafka 0.8 Quick Start • [5]Apache Avro • [5]Storm