SlideShare a Scribd company logo
1 of 40
Asynchronous micro-services and
the unified log
Crunch Conference, Budapest, 7th October 2016
Introducing myself
• Alexander Dean
• Co-founder and technical lead at Snowplow, the
open-source event data pipeline
• Weekend writer of Unified Log Processing,
available on the Manning Early Access Program
• Co-author at Snowplow of Iglu, our open-source
schema registry system, and Sauna, our open-
source decisioning and response platform
We are witnessing the convergence of two separate technology
tracks towards asynchronous or event-driven micro-services
Transactional workloads
Analytical workloads
Asynchronous
micro-services
Software monoliths
Synchronous
micro-services
Classic data
warehousing
Hybrid data
pipelines
Unified log
architectures
Analytical workloads, and
the rise of the unified log
A quick history lesson: the three eras of business data
processing [1]
1. Classic data warehousing, 1996+
2. Hybrid data pipelines, 2005+
3. Unified log architectures, 2013+
[1] http://snowplowanalytics.com/blog/
2014/01/20/the-three-eras-of-business-data-processing/
The era of classic data warehousing, 1996+
OWN DATA CENTER
Data warehouse
HIGH LATENCY
Point-to-point
connections
WIDE DATA
COVERAGE
CMS
Silo
CRM
Local loop Local loop
NARROW DATA SILOES LOW LATENCY LOCAL LOOPS
E-comm
Silo
Local loop
Management
reporting
ERP
Silo
Local loop
Silo
Nightly batch ETL process
FULL DATA
HISTORY
The era of hybrid data pipelines, 2005+
CLOUD VENDOR / OWN DATA CENTER
Search
Silo
Local loop
LOW LATENCY LOCAL LOOPS
E-comm
Silo
Local loop
CRM
Local loop
SAAS VENDOR #2
Email
marketing
Local loop
ERP
Silo
Local loop
CMS
Silo
Local loop
SAAS VENDOR #1
NARROW DATA SILOES
Stream
processing
Product
rec’s
Micro-batch
processing
Systems
monitoring
Batch
processing
Data
warehouse
Management
reporting
Batch
processing
Ad hoc
analytics
Hadoop
SAAS VENDOR #3
Web
analytics
Local loop
Local loop Local loop
LOW LATENCY LOW LATENCY
HIGH LATENCY HIGH LATENCY
APIs
Bulk exports
The hybrid era: a surfeit of software vendors
CLOUD VENDOR / OWN DATA CENTER
Search
Silo
Local loop
LOW LATENCY LOCAL LOOPS
E-comm
Silo
Local loop
CRM
Local loop
SAAS VENDOR #2
Email
marketing
Local loop
ERP
Silo
Local loop
CMS
Silo
Local loop
SAAS VENDOR #1
NARROW DATA SILOES
Stream
processing
Product
rec’s
Micro-batch
processing
Systems
monitoring
Batch
processing
Data
warehouse
Management
reporting
Batch
processing
Ad hoc
analytics
Hadoop
SAAS VENDOR #3
Web
analytics
Local loop
Local loop Local loop
LOW LATENCY LOW LATENCY
HIGH LATENCY HIGH LATENCY
APIs
Bulk exports
The hybrid era: company-wide reporting and
analytics ends up like Rashomon
The bandit’s story
vs.
The wife’s story
vs.
The samurai’s story
vs.
The woodcutter’s story
The hybrid era: the number of data integrations
is unsustainable
So how do we unravel the
hairball?
The advent of the unified log, 2013+
CLOUD VENDOR / OWN DATA CENTER
Search
Silo
SOME LOW LATENCY LOCAL LOOPS
E-comm
Silo
CRM
SAAS VENDOR #2
Email
marketing
ERP
Silo
CMS
Silo
SAAS VENDOR #1
NARROW DATA SILOES
Streaming APIs /
web hooks
Unified log
LOW LATENCY WIDE DATA
COVERAGE
Archiving
Hadoop
< WIDE DATA
COVERAGE >
< FULL DATA
HISTORY >
FEW DAYS’
DATA HISTORY
Systems
monitoring
Eventstream
HIGH LATENCY LOW LATENCY
Product rec’s
Ad hoc
analytics
Management
reporting
Fraud
detection
Churn
prevention
APIs
CLOUD VENDOR / OWN DATA CENTER
Search
Silo
SOME LOW LATENCY LOCAL LOOPS
E-comm
Silo
CRM
SAAS VENDOR #2
Email
marketing
ERP
Silo
CMS
Silo
SAAS VENDOR #1
NARROW DATA SILOES
Streaming APIs /
web hooks
Unified log
Archiving
Hadoop
< WIDE DATA
COVERAGE >
< FULL DATA
HISTORY >
Systems
monitoring
Eventstream
HIGH LATENCY LOW LATENCY
Product rec’s
Ad hoc
analytics
Management
reporting
Fraud
detection
Churn
prevention
APIs
The unified log is Amazon Kinesis, or Apache Kafka
• Amazon Kinesis, a
hosted AWS service
• Extremely similar
semantics to Kafka
• Apache Kafka, an append-
only, distributed, ordered
commit log
• Developed at LinkedIn to
serve as their
organization’s unified log
“Kafka is designed to allow a
single cluster to serve as the
central data backbone for a
large organization” [1]
[1] http://kafka.apache.org/
So what does a unified log give us?
A single version of the truth
Our truth is now upstream from the data warehouse
The hairball of point-to-point connections has been
unravelled
Local loops have been unbundled
1
2
3
4
Coming up to the end of 2016, and unified log architectures have
seen extremely rapid and widespread adoption
Transactional data
processing and the move to
micro-services
In parallel, we have seen a steady (if spotty) rejection of
software monoliths for transactional workloads
In a micro-services architecture, the individual capabilities of the
system are split out into separate services
Synchronous
communication using
request and response
(often using RESTful
HTTP or RPC)
What do synchronous micro-services give us?
Strong module boundaries
• Network boundaries between modules can be
helpful for larger teams
Independent deployment
• Deploy individual micro-services independently
• Simpler to deploy and less likely to cause whole
system failures
Support diversity
• Can use the best language/framework/database
for the capability
• Reduces monoculture risk (anti-fragile)
1
2
3
Convergence on
asynchronous or event-
driven micro-services
When we re-architected Snowplow around the unified log 2.5
years ago, we designed it around small, composable workers
Diagram from our February 2014
Snowplow v0.9.0 release post
This was based on the insight that real-time pipelines can be
composed a little like Unix pipes
We avoided monolithic Spark Streaming or Storm jobs, based on
our experiences with “heavy” Hadoop jobs in our batch pipeline
Kinesis
stream
Kinesis
stream
Kinesis
stream
Kinesis
stream
Kinesis
stream
What we didn’t do: an inner Storm topology
We wanted to avoid the “inner topology” effect, with
effectively two tiers of topology to reason about
Difficult to unit test the inner topologies – complex
behaviours inside each unit
Difficult to operationalize the inner topologies – how
do they handle backpressure, how do they scale,
how do we upgrade them?
Difficult to monitor the inner topologies
Fundamental problem: the event streams in an inner
topology are not first class entities
1
2
3
It worked: today the Snowplow real-time pipeline is a collection
of individual event-driven micro-services
Stream
Collector
Stream
Enrich
Kinesis S3
Kinesis
Elasticsearch
Kinesis Tee
Kinesis
Redshift
(design stage)
User’s AWS
Lambda
function
User’s KCL
worker app
User’s Spark
Streaming
job
“the most compelling applications for
stream processing are actually pretty
different from what you would typically
do with a Hive or Spark job—they are
closer to being a kind of asynchronous
microservice rather than being a faster
version of a batch analytics job. …”
Meanwhile, the Kafka team (now at Confluent) were
seeing something interesting in the adoption of Kafka…
“…What I mean is that these stream
processing apps were most often
software that implemented core
functions in the business rather than
computing analytics about the
business.” [1]
[1] http://www.confluent.io/blog/introducing-kafka-streams-stream-processing-made-simple/
And these micro-services were substituting not just for batch
analytical workloads, but also for transactional workloads
Why are async micro-
services replacing “classic”
request/response micro-
services (at least amongst
Kafka users)?
To start with, asynchronous micro-services have the same
benefits as synchronous micro-services…
Strong module boundaries
• Network boundaries between modules can be
helpful for larger teams
Independent deployment
• Deploy individual micro-services independently
• Easier to deploy and less likely to cause whole
system failures
Support diversity
• Can use the best language/framework/database
for the capability
• Reduces monoculture risk (anti-fragile)
1
2
3
… but in addition, event-driven asynchronous micro-
services are extremely loosely-coupled, because they are
intermediated by first class streams
Compare this to request/response synchronous micro-services,
which have dependencies between upstream and downstream
Downstream
Upstream
Single-page web app
Login service
Notifications
service
Content
service
Customer profile service
Personalization
service
Ad service
If you can afford the (small) latency tax, there are some
clear advantages in going asynchronous
Much better toolkit for upgrades – schema evolution,
running old and new service versions in parallel etc
Adding new downstream services doesn’t increase the
load on upstream services
Failure of individual services introduces lag into the
overall system, rather than overall system failure
Easier to debug, because service inputs and outputs
are directly inspectable in the event streams
1
2
3
4
So what’s next in this
convergence?
?
More and more tooling for building asynchronous micro-
services, including the unstoppable rise of “serverless” 
Kinesis Client Library
AWS Lambda
IBM OpenWhisk
Event-driven cloud functionsStream processing as a library
Azure Functions
Greater adoption of open source schema registries as the
canonical source of truth for the events in our topologies
Confluent or Iglu
schema registries
Request/response micro-services aren’t going away –
they are just too useful
But expect a move from slower HTTP-based to faster RPC-
based options
• e.g. Google’s gRPC
• These are easier to read from and write from in high-
volume event-driven architectures
Expect wider adoption of API definition languages
• OpenAPI (Swagger), RAML, API Blueprint
Eventual harmonization of types?
• Using JSON for RESTful APIs, Protocol Buffers for RPC
and Avro for stream processing is crazy
• Needs company-wide standardization, or dynamic
translation (but this is lossy)
1
2
3
We also need new fabrics – or extensions of existing ones like
Kubernetes – to address the challenges of running our topologies
“How do we
monitor this
topology, and
alert if
something
(data loss;
event lag) is
going wrong?”
“How do we
scale our
streams and
micro-services
to handle
event peaks
and troughs
smoothly?”
“How do we
re-configure or
upgrade our
micro-services
without
breaking
things?”
At Snowplow we are working on a unified log fabric,
called Tupilak, to solve this problem
Questions?
http://snowplowanalytics.com
https://github.com/snowplow/snowplow
@snowplowdata
To meet up or chat, @alexcrdean on Twitter or
alex@snowplowanalytics.com

More Related Content

What's hot

Stream Processing Live Traffic Data with Kafka Streams
Stream Processing Live Traffic Data with Kafka StreamsStream Processing Live Traffic Data with Kafka Streams
Stream Processing Live Traffic Data with Kafka StreamsTom Van den Bulck
 
Siphon - Near Real Time Databus Using Kafka, Eric Boyd, Nitin Kumar
Siphon - Near Real Time Databus Using Kafka, Eric Boyd, Nitin KumarSiphon - Near Real Time Databus Using Kafka, Eric Boyd, Nitin Kumar
Siphon - Near Real Time Databus Using Kafka, Eric Boyd, Nitin Kumarconfluent
 
How Disney+ uses fast data ubiquity to improve the customer experience
 How Disney+ uses fast data ubiquity to improve the customer experience  How Disney+ uses fast data ubiquity to improve the customer experience
How Disney+ uses fast data ubiquity to improve the customer experience Martin Zapletal
 
Data Driven Enterprise with Apache Kafka
Data Driven Enterprise with Apache KafkaData Driven Enterprise with Apache Kafka
Data Driven Enterprise with Apache Kafkaconfluent
 
Maximizing Audience Engagement in Media Delivery (MED303) | AWS re:Invent 2013
Maximizing Audience Engagement in Media Delivery (MED303) | AWS re:Invent 2013Maximizing Audience Engagement in Media Delivery (MED303) | AWS re:Invent 2013
Maximizing Audience Engagement in Media Delivery (MED303) | AWS re:Invent 2013Amazon Web Services
 
0-60: Tesla's Streaming Data Platform ( Jesse Yates, Tesla) Kafka Summit SF 2019
0-60: Tesla's Streaming Data Platform ( Jesse Yates, Tesla) Kafka Summit SF 20190-60: Tesla's Streaming Data Platform ( Jesse Yates, Tesla) Kafka Summit SF 2019
0-60: Tesla's Streaming Data Platform ( Jesse Yates, Tesla) Kafka Summit SF 2019confluent
 
Lambda Architecture in Practice
Lambda Architecture in PracticeLambda Architecture in Practice
Lambda Architecture in PracticeNavneet kumar
 
Using Apache Kafka to Analyze Session Windows
Using Apache Kafka to Analyze Session WindowsUsing Apache Kafka to Analyze Session Windows
Using Apache Kafka to Analyze Session Windowsconfluent
 
Data Streaming with Apache Kafka & MongoDB
Data Streaming with Apache Kafka & MongoDBData Streaming with Apache Kafka & MongoDB
Data Streaming with Apache Kafka & MongoDBconfluent
 
Event Stream Processing with Kafka and Samza
Event Stream Processing with Kafka and SamzaEvent Stream Processing with Kafka and Samza
Event Stream Processing with Kafka and SamzaZach Cox
 
Kafka Summit NYC 2017 - Data Processing at LinkedIn with Apache Kafka
Kafka Summit NYC 2017 - Data Processing at LinkedIn with Apache KafkaKafka Summit NYC 2017 - Data Processing at LinkedIn with Apache Kafka
Kafka Summit NYC 2017 - Data Processing at LinkedIn with Apache Kafkaconfluent
 
Hybrid Kafka, Taking Real-time Analytics to the Business (Cody Irwin, Google ...
Hybrid Kafka, Taking Real-time Analytics to the Business (Cody Irwin, Google ...Hybrid Kafka, Taking Real-time Analytics to the Business (Cody Irwin, Google ...
Hybrid Kafka, Taking Real-time Analytics to the Business (Cody Irwin, Google ...HostedbyConfluent
 
Hands On With Spark: Creating A Fast Data Pipeline With Structured Streaming ...
Hands On With Spark: Creating A Fast Data Pipeline With Structured Streaming ...Hands On With Spark: Creating A Fast Data Pipeline With Structured Streaming ...
Hands On With Spark: Creating A Fast Data Pipeline With Structured Streaming ...Lightbend
 
Hadoop made fast - Why Virtual Reality Needed Stream Processing to Survive
Hadoop made fast - Why Virtual Reality Needed Stream Processing to SurviveHadoop made fast - Why Virtual Reality Needed Stream Processing to Survive
Hadoop made fast - Why Virtual Reality Needed Stream Processing to Surviveconfluent
 
Structure Data 2014: BIG DATA ANALYTICS RE-INVENTED, Ryan Waite
Structure Data 2014: BIG DATA ANALYTICS RE-INVENTED, Ryan WaiteStructure Data 2014: BIG DATA ANALYTICS RE-INVENTED, Ryan Waite
Structure Data 2014: BIG DATA ANALYTICS RE-INVENTED, Ryan WaiteGigaom
 
Kafka: Journey from Just Another Software to Being a Critical Part of PayPal ...
Kafka: Journey from Just Another Software to Being a Critical Part of PayPal ...Kafka: Journey from Just Another Software to Being a Critical Part of PayPal ...
Kafka: Journey from Just Another Software to Being a Critical Part of PayPal ...confluent
 
Streaming Transformations - Putting the T in Streaming ETL
Streaming Transformations - Putting the T in Streaming ETLStreaming Transformations - Putting the T in Streaming ETL
Streaming Transformations - Putting the T in Streaming ETLconfluent
 
Neo4j Graph Streaming Services with Apache Kafka
Neo4j Graph Streaming Services with Apache KafkaNeo4j Graph Streaming Services with Apache Kafka
Neo4j Graph Streaming Services with Apache Kafkajexp
 
Pivoting Spring XD to Spring Cloud Data Flow with Sabby Anandan
Pivoting Spring XD to Spring Cloud Data Flow with Sabby AnandanPivoting Spring XD to Spring Cloud Data Flow with Sabby Anandan
Pivoting Spring XD to Spring Cloud Data Flow with Sabby AnandanPivotalOpenSourceHub
 
Confluent On Azure: Why you should add Confluent to your Azure toolkit | Alic...
Confluent On Azure: Why you should add Confluent to your Azure toolkit | Alic...Confluent On Azure: Why you should add Confluent to your Azure toolkit | Alic...
Confluent On Azure: Why you should add Confluent to your Azure toolkit | Alic...HostedbyConfluent
 

What's hot (20)

Stream Processing Live Traffic Data with Kafka Streams
Stream Processing Live Traffic Data with Kafka StreamsStream Processing Live Traffic Data with Kafka Streams
Stream Processing Live Traffic Data with Kafka Streams
 
Siphon - Near Real Time Databus Using Kafka, Eric Boyd, Nitin Kumar
Siphon - Near Real Time Databus Using Kafka, Eric Boyd, Nitin KumarSiphon - Near Real Time Databus Using Kafka, Eric Boyd, Nitin Kumar
Siphon - Near Real Time Databus Using Kafka, Eric Boyd, Nitin Kumar
 
How Disney+ uses fast data ubiquity to improve the customer experience
 How Disney+ uses fast data ubiquity to improve the customer experience  How Disney+ uses fast data ubiquity to improve the customer experience
How Disney+ uses fast data ubiquity to improve the customer experience
 
Data Driven Enterprise with Apache Kafka
Data Driven Enterprise with Apache KafkaData Driven Enterprise with Apache Kafka
Data Driven Enterprise with Apache Kafka
 
Maximizing Audience Engagement in Media Delivery (MED303) | AWS re:Invent 2013
Maximizing Audience Engagement in Media Delivery (MED303) | AWS re:Invent 2013Maximizing Audience Engagement in Media Delivery (MED303) | AWS re:Invent 2013
Maximizing Audience Engagement in Media Delivery (MED303) | AWS re:Invent 2013
 
0-60: Tesla's Streaming Data Platform ( Jesse Yates, Tesla) Kafka Summit SF 2019
0-60: Tesla's Streaming Data Platform ( Jesse Yates, Tesla) Kafka Summit SF 20190-60: Tesla's Streaming Data Platform ( Jesse Yates, Tesla) Kafka Summit SF 2019
0-60: Tesla's Streaming Data Platform ( Jesse Yates, Tesla) Kafka Summit SF 2019
 
Lambda Architecture in Practice
Lambda Architecture in PracticeLambda Architecture in Practice
Lambda Architecture in Practice
 
Using Apache Kafka to Analyze Session Windows
Using Apache Kafka to Analyze Session WindowsUsing Apache Kafka to Analyze Session Windows
Using Apache Kafka to Analyze Session Windows
 
Data Streaming with Apache Kafka & MongoDB
Data Streaming with Apache Kafka & MongoDBData Streaming with Apache Kafka & MongoDB
Data Streaming with Apache Kafka & MongoDB
 
Event Stream Processing with Kafka and Samza
Event Stream Processing with Kafka and SamzaEvent Stream Processing with Kafka and Samza
Event Stream Processing with Kafka and Samza
 
Kafka Summit NYC 2017 - Data Processing at LinkedIn with Apache Kafka
Kafka Summit NYC 2017 - Data Processing at LinkedIn with Apache KafkaKafka Summit NYC 2017 - Data Processing at LinkedIn with Apache Kafka
Kafka Summit NYC 2017 - Data Processing at LinkedIn with Apache Kafka
 
Hybrid Kafka, Taking Real-time Analytics to the Business (Cody Irwin, Google ...
Hybrid Kafka, Taking Real-time Analytics to the Business (Cody Irwin, Google ...Hybrid Kafka, Taking Real-time Analytics to the Business (Cody Irwin, Google ...
Hybrid Kafka, Taking Real-time Analytics to the Business (Cody Irwin, Google ...
 
Hands On With Spark: Creating A Fast Data Pipeline With Structured Streaming ...
Hands On With Spark: Creating A Fast Data Pipeline With Structured Streaming ...Hands On With Spark: Creating A Fast Data Pipeline With Structured Streaming ...
Hands On With Spark: Creating A Fast Data Pipeline With Structured Streaming ...
 
Hadoop made fast - Why Virtual Reality Needed Stream Processing to Survive
Hadoop made fast - Why Virtual Reality Needed Stream Processing to SurviveHadoop made fast - Why Virtual Reality Needed Stream Processing to Survive
Hadoop made fast - Why Virtual Reality Needed Stream Processing to Survive
 
Structure Data 2014: BIG DATA ANALYTICS RE-INVENTED, Ryan Waite
Structure Data 2014: BIG DATA ANALYTICS RE-INVENTED, Ryan WaiteStructure Data 2014: BIG DATA ANALYTICS RE-INVENTED, Ryan Waite
Structure Data 2014: BIG DATA ANALYTICS RE-INVENTED, Ryan Waite
 
Kafka: Journey from Just Another Software to Being a Critical Part of PayPal ...
Kafka: Journey from Just Another Software to Being a Critical Part of PayPal ...Kafka: Journey from Just Another Software to Being a Critical Part of PayPal ...
Kafka: Journey from Just Another Software to Being a Critical Part of PayPal ...
 
Streaming Transformations - Putting the T in Streaming ETL
Streaming Transformations - Putting the T in Streaming ETLStreaming Transformations - Putting the T in Streaming ETL
Streaming Transformations - Putting the T in Streaming ETL
 
Neo4j Graph Streaming Services with Apache Kafka
Neo4j Graph Streaming Services with Apache KafkaNeo4j Graph Streaming Services with Apache Kafka
Neo4j Graph Streaming Services with Apache Kafka
 
Pivoting Spring XD to Spring Cloud Data Flow with Sabby Anandan
Pivoting Spring XD to Spring Cloud Data Flow with Sabby AnandanPivoting Spring XD to Spring Cloud Data Flow with Sabby Anandan
Pivoting Spring XD to Spring Cloud Data Flow with Sabby Anandan
 
Confluent On Azure: Why you should add Confluent to your Azure toolkit | Alic...
Confluent On Azure: Why you should add Confluent to your Azure toolkit | Alic...Confluent On Azure: Why you should add Confluent to your Azure toolkit | Alic...
Confluent On Azure: Why you should add Confluent to your Azure toolkit | Alic...
 

Viewers also liked

Unified Log Processing Architecture
Unified Log Processing ArchitectureUnified Log Processing Architecture
Unified Log Processing ArchitectureGuido Schmutz
 
Comments: Why not What
Comments: Why not WhatComments: Why not What
Comments: Why not WhatSean Kelly
 
Scalable Algorithm Design with MapReduce
Scalable Algorithm Design with MapReduceScalable Algorithm Design with MapReduce
Scalable Algorithm Design with MapReducePietro Michiardi
 
Duplicating data or replicating data in Micro Services
Duplicating data or replicating data in Micro ServicesDuplicating data or replicating data in Micro Services
Duplicating data or replicating data in Micro ServicesDennis van der Stelt
 
Architektur von Big Data Lösungen
Architektur von Big Data LösungenArchitektur von Big Data Lösungen
Architektur von Big Data LösungenGuido Schmutz
 
Big data and its impact on SOA
Big data and its impact on SOABig data and its impact on SOA
Big data and its impact on SOADemed L'Her
 
Relational Algebra and MapReduce
Relational Algebra and MapReduceRelational Algebra and MapReduce
Relational Algebra and MapReducePietro Michiardi
 
High-level Programming Languages: Apache Pig and Pig Latin
High-level Programming Languages: Apache Pig and Pig LatinHigh-level Programming Languages: Apache Pig and Pig Latin
High-level Programming Languages: Apache Pig and Pig LatinPietro Michiardi
 
Distributed Stream Processing with Apache Kafka
Distributed Stream Processing with Apache KafkaDistributed Stream Processing with Apache Kafka
Distributed Stream Processing with Apache KafkaJay Kreps
 
Introduction to Streaming Analytics
Introduction to Streaming AnalyticsIntroduction to Streaming Analytics
Introduction to Streaming AnalyticsGuido Schmutz
 
Apache Storm vs. Spark Streaming – two Stream Processing Platforms compared
Apache Storm vs. Spark Streaming – two Stream Processing Platforms comparedApache Storm vs. Spark Streaming – two Stream Processing Platforms compared
Apache Storm vs. Spark Streaming – two Stream Processing Platforms comparedGuido Schmutz
 
Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Guido Schmutz
 
I Heart Log: Real-time Data and Apache Kafka
I Heart Log: Real-time Data and Apache KafkaI Heart Log: Real-time Data and Apache Kafka
I Heart Log: Real-time Data and Apache KafkaJay Kreps
 
CQRS recipes or how to cook your architecture
CQRS recipes or how to cook your architectureCQRS recipes or how to cook your architecture
CQRS recipes or how to cook your architectureThomas Jaskula
 
How We Made our Tech Organization and Architecture Converge Towards Scalability
How We Made our Tech Organization and Architecture Converge Towards ScalabilityHow We Made our Tech Organization and Architecture Converge Towards Scalability
How We Made our Tech Organization and Architecture Converge Towards ScalabilityZalando Technology
 
Flink in Zalando's World of Microservices
Flink in Zalando's World of Microservices  Flink in Zalando's World of Microservices
Flink in Zalando's World of Microservices Zalando Technology
 
Marshall hm poster_vra2015
Marshall hm poster_vra2015Marshall hm poster_vra2015
Marshall hm poster_vra2015Hannah Marshall
 
Boletín 30/03/2017
Boletín 30/03/2017Boletín 30/03/2017
Boletín 30/03/2017Openbank
 

Viewers also liked (20)

Unified Log Processing Architecture
Unified Log Processing ArchitectureUnified Log Processing Architecture
Unified Log Processing Architecture
 
Comments: Why not What
Comments: Why not WhatComments: Why not What
Comments: Why not What
 
Scalable Algorithm Design with MapReduce
Scalable Algorithm Design with MapReduceScalable Algorithm Design with MapReduce
Scalable Algorithm Design with MapReduce
 
Duplicating data or replicating data in Micro Services
Duplicating data or replicating data in Micro ServicesDuplicating data or replicating data in Micro Services
Duplicating data or replicating data in Micro Services
 
Architektur von Big Data Lösungen
Architektur von Big Data LösungenArchitektur von Big Data Lösungen
Architektur von Big Data Lösungen
 
SOA & Big Data
SOA & Big DataSOA & Big Data
SOA & Big Data
 
Hadoop Internals
Hadoop InternalsHadoop Internals
Hadoop Internals
 
Big data and its impact on SOA
Big data and its impact on SOABig data and its impact on SOA
Big data and its impact on SOA
 
Relational Algebra and MapReduce
Relational Algebra and MapReduceRelational Algebra and MapReduce
Relational Algebra and MapReduce
 
High-level Programming Languages: Apache Pig and Pig Latin
High-level Programming Languages: Apache Pig and Pig LatinHigh-level Programming Languages: Apache Pig and Pig Latin
High-level Programming Languages: Apache Pig and Pig Latin
 
Distributed Stream Processing with Apache Kafka
Distributed Stream Processing with Apache KafkaDistributed Stream Processing with Apache Kafka
Distributed Stream Processing with Apache Kafka
 
Introduction to Streaming Analytics
Introduction to Streaming AnalyticsIntroduction to Streaming Analytics
Introduction to Streaming Analytics
 
Apache Storm vs. Spark Streaming – two Stream Processing Platforms compared
Apache Storm vs. Spark Streaming – two Stream Processing Platforms comparedApache Storm vs. Spark Streaming – two Stream Processing Platforms compared
Apache Storm vs. Spark Streaming – two Stream Processing Platforms compared
 
Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !
 
I Heart Log: Real-time Data and Apache Kafka
I Heart Log: Real-time Data and Apache KafkaI Heart Log: Real-time Data and Apache Kafka
I Heart Log: Real-time Data and Apache Kafka
 
CQRS recipes or how to cook your architecture
CQRS recipes or how to cook your architectureCQRS recipes or how to cook your architecture
CQRS recipes or how to cook your architecture
 
How We Made our Tech Organization and Architecture Converge Towards Scalability
How We Made our Tech Organization and Architecture Converge Towards ScalabilityHow We Made our Tech Organization and Architecture Converge Towards Scalability
How We Made our Tech Organization and Architecture Converge Towards Scalability
 
Flink in Zalando's World of Microservices
Flink in Zalando's World of Microservices  Flink in Zalando's World of Microservices
Flink in Zalando's World of Microservices
 
Marshall hm poster_vra2015
Marshall hm poster_vra2015Marshall hm poster_vra2015
Marshall hm poster_vra2015
 
Boletín 30/03/2017
Boletín 30/03/2017Boletín 30/03/2017
Boletín 30/03/2017
 

Similar to Asynchronous micro-services and the unified log

Snowplow Analytics: from NoSQL to SQL and back again
Snowplow Analytics: from NoSQL to SQL and back againSnowplow Analytics: from NoSQL to SQL and back again
Snowplow Analytics: from NoSQL to SQL and back againAlexander Dean
 
Apache Spark Streaming -Real time web server log analytics
Apache Spark Streaming -Real time web server log analyticsApache Spark Streaming -Real time web server log analytics
Apache Spark Streaming -Real time web server log analyticsANKIT GUPTA
 
Service Discovery and Registration in a Microservices Architecture
Service Discovery and Registration in a Microservices ArchitectureService Discovery and Registration in a Microservices Architecture
Service Discovery and Registration in a Microservices ArchitecturePLUMgrid
 
No More Hops Towards A Linearly Scalable Application Infrastructure
No More Hops Towards A Linearly Scalable Application InfrastructureNo More Hops Towards A Linearly Scalable Application Infrastructure
No More Hops Towards A Linearly Scalable Application InfrastructureConSanFrancisco123
 
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...confluent
 
Event Sourcing, Stream Processing and Serverless (Benjamin Stopford, Confluen...
Event Sourcing, Stream Processing and Serverless (Benjamin Stopford, Confluen...Event Sourcing, Stream Processing and Serverless (Benjamin Stopford, Confluen...
Event Sourcing, Stream Processing and Serverless (Benjamin Stopford, Confluen...confluent
 
A Microservices Journey - Susanne Kaiser
A Microservices Journey - Susanne KaiserA Microservices Journey - Susanne Kaiser
A Microservices Journey - Susanne KaiserThoughtworks
 
Data Streaming with Apache Kafka & MongoDB - EMEA
Data Streaming with Apache Kafka & MongoDB - EMEAData Streaming with Apache Kafka & MongoDB - EMEA
Data Streaming with Apache Kafka & MongoDB - EMEAAndrew Morgan
 
Webinar: Data Streaming with Apache Kafka & MongoDB
Webinar: Data Streaming with Apache Kafka & MongoDBWebinar: Data Streaming with Apache Kafka & MongoDB
Webinar: Data Streaming with Apache Kafka & MongoDBMongoDB
 
apidays LIVE Jakarta - Building an Event-Driven Architecture by Harin Honesty...
apidays LIVE Jakarta - Building an Event-Driven Architecture by Harin Honesty...apidays LIVE Jakarta - Building an Event-Driven Architecture by Harin Honesty...
apidays LIVE Jakarta - Building an Event-Driven Architecture by Harin Honesty...apidays
 
Building real time data-driven products
Building real time data-driven productsBuilding real time data-driven products
Building real time data-driven productsLars Albertsson
 
Building Streaming And Fast Data Applications With Spark, Mesos, Akka, Cassan...
Building Streaming And Fast Data Applications With Spark, Mesos, Akka, Cassan...Building Streaming And Fast Data Applications With Spark, Mesos, Akka, Cassan...
Building Streaming And Fast Data Applications With Spark, Mesos, Akka, Cassan...Lightbend
 
Cloud Computing:An Economic Solution for Libraries
Cloud Computing:An Economic Solution for LibrariesCloud Computing:An Economic Solution for Libraries
Cloud Computing:An Economic Solution for LibrariesAmit Shaw
 
Amazon aws big data demystified | Introduction to streaming and messaging flu...
Amazon aws big data demystified | Introduction to streaming and messaging flu...Amazon aws big data demystified | Introduction to streaming and messaging flu...
Amazon aws big data demystified | Introduction to streaming and messaging flu...Omid Vahdaty
 
Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Guido Schmutz
 
Deep dive and best practices on real time streaming applications nyc-loft_oct...
Deep dive and best practices on real time streaming applications nyc-loft_oct...Deep dive and best practices on real time streaming applications nyc-loft_oct...
Deep dive and best practices on real time streaming applications nyc-loft_oct...Amazon Web Services
 
Etl is Dead; Long Live Streams
Etl is Dead; Long Live StreamsEtl is Dead; Long Live Streams
Etl is Dead; Long Live Streamsconfluent
 
ETL Is Dead, Long-live Streams
ETL Is Dead, Long-live StreamsETL Is Dead, Long-live Streams
ETL Is Dead, Long-live StreamsC4Media
 

Similar to Asynchronous micro-services and the unified log (20)

Snowplow Analytics: from NoSQL to SQL and back again
Snowplow Analytics: from NoSQL to SQL and back againSnowplow Analytics: from NoSQL to SQL and back again
Snowplow Analytics: from NoSQL to SQL and back again
 
Apache Spark Streaming -Real time web server log analytics
Apache Spark Streaming -Real time web server log analyticsApache Spark Streaming -Real time web server log analytics
Apache Spark Streaming -Real time web server log analytics
 
Service Discovery and Registration in a Microservices Architecture
Service Discovery and Registration in a Microservices ArchitectureService Discovery and Registration in a Microservices Architecture
Service Discovery and Registration in a Microservices Architecture
 
Real-Time Event Processing
Real-Time Event ProcessingReal-Time Event Processing
Real-Time Event Processing
 
No More Hops Towards A Linearly Scalable Application Infrastructure
No More Hops Towards A Linearly Scalable Application InfrastructureNo More Hops Towards A Linearly Scalable Application Infrastructure
No More Hops Towards A Linearly Scalable Application Infrastructure
 
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
 
Event Sourcing, Stream Processing and Serverless (Benjamin Stopford, Confluen...
Event Sourcing, Stream Processing and Serverless (Benjamin Stopford, Confluen...Event Sourcing, Stream Processing and Serverless (Benjamin Stopford, Confluen...
Event Sourcing, Stream Processing and Serverless (Benjamin Stopford, Confluen...
 
A Microservices Journey - Susanne Kaiser
A Microservices Journey - Susanne KaiserA Microservices Journey - Susanne Kaiser
A Microservices Journey - Susanne Kaiser
 
Data Streaming with Apache Kafka & MongoDB - EMEA
Data Streaming with Apache Kafka & MongoDB - EMEAData Streaming with Apache Kafka & MongoDB - EMEA
Data Streaming with Apache Kafka & MongoDB - EMEA
 
Webinar: Data Streaming with Apache Kafka & MongoDB
Webinar: Data Streaming with Apache Kafka & MongoDBWebinar: Data Streaming with Apache Kafka & MongoDB
Webinar: Data Streaming with Apache Kafka & MongoDB
 
apidays LIVE Jakarta - Building an Event-Driven Architecture by Harin Honesty...
apidays LIVE Jakarta - Building an Event-Driven Architecture by Harin Honesty...apidays LIVE Jakarta - Building an Event-Driven Architecture by Harin Honesty...
apidays LIVE Jakarta - Building an Event-Driven Architecture by Harin Honesty...
 
Building real time data-driven products
Building real time data-driven productsBuilding real time data-driven products
Building real time data-driven products
 
Building Streaming And Fast Data Applications With Spark, Mesos, Akka, Cassan...
Building Streaming And Fast Data Applications With Spark, Mesos, Akka, Cassan...Building Streaming And Fast Data Applications With Spark, Mesos, Akka, Cassan...
Building Streaming And Fast Data Applications With Spark, Mesos, Akka, Cassan...
 
Cloud Computing:An Economic Solution for Libraries
Cloud Computing:An Economic Solution for LibrariesCloud Computing:An Economic Solution for Libraries
Cloud Computing:An Economic Solution for Libraries
 
Amazon aws big data demystified | Introduction to streaming and messaging flu...
Amazon aws big data demystified | Introduction to streaming and messaging flu...Amazon aws big data demystified | Introduction to streaming and messaging flu...
Amazon aws big data demystified | Introduction to streaming and messaging flu...
 
Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !Apache Kafka - Scalable Message-Processing and more !
Apache Kafka - Scalable Message-Processing and more !
 
Streaming analytics
Streaming analyticsStreaming analytics
Streaming analytics
 
Deep dive and best practices on real time streaming applications nyc-loft_oct...
Deep dive and best practices on real time streaming applications nyc-loft_oct...Deep dive and best practices on real time streaming applications nyc-loft_oct...
Deep dive and best practices on real time streaming applications nyc-loft_oct...
 
Etl is Dead; Long Live Streams
Etl is Dead; Long Live StreamsEtl is Dead; Long Live Streams
Etl is Dead; Long Live Streams
 
ETL Is Dead, Long-live Streams
ETL Is Dead, Long-live StreamsETL Is Dead, Long-live Streams
ETL Is Dead, Long-live Streams
 

Recently uploaded

ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...Christina Lin
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdfWave PLM
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVshikhaohhpro
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Modelsaagamshah0812
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...MyIntelliSource, Inc.
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providermohitmore19
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfkalichargn70th171
 
chapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptchapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptkotipi9215
 
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideChristina Lin
 
Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...OnePlan Solutions
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio, Inc.
 
DNT_Corporate presentation know about us
DNT_Corporate presentation know about usDNT_Corporate presentation know about us
DNT_Corporate presentation know about usDynamic Netsoft
 
Project Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationProject Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationkaushalgiri8080
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...kellynguyen01
 
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataAdobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataBradBedford3
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...stazi3110
 
Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...aditisharan08
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfkalichargn70th171
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...harshavardhanraghave
 

Recently uploaded (20)

ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
ODSC - Batch to Stream workshop - integration of Apache Spark, Cassandra, Pos...
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf
 
Optimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTVOptimizing AI for immediate response in Smart CCTV
Optimizing AI for immediate response in Smart CCTV
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
 
chapter--4-software-project-planning.ppt
chapter--4-software-project-planning.pptchapter--4-software-project-planning.ppt
chapter--4-software-project-planning.ppt
 
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop SlideBuilding Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
Building Real-Time Data Pipelines: Stream & Batch Processing workshop Slide
 
Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...Advancing Engineering with AI through the Next Generation of Strategic Projec...
Advancing Engineering with AI through the Next Generation of Strategic Projec...
 
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed DataAlluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
Alluxio Monthly Webinar | Cloud-Native Model Training on Distributed Data
 
DNT_Corporate presentation know about us
DNT_Corporate presentation know about usDNT_Corporate presentation know about us
DNT_Corporate presentation know about us
 
Project Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanationProject Based Learning (A.I).pptx detail explanation
Project Based Learning (A.I).pptx detail explanation
 
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
Short Story: Unveiling the Reasoning Abilities of Large Language Models by Ke...
 
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer DataAdobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
Adobe Marketo Engage Deep Dives: Using Webhooks to Transfer Data
 
Exploring iOS App Development: Simplifying the Process
Exploring iOS App Development: Simplifying the ProcessExploring iOS App Development: Simplifying the Process
Exploring iOS App Development: Simplifying the Process
 
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
Building a General PDE Solving Framework with Symbolic-Numeric Scientific Mac...
 
Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...Unit 1.1 Excite Part 1, class 9, cbse...
Unit 1.1 Excite Part 1, class 9, cbse...
 
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdfLearn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
Learn the Fundamentals of XCUITest Framework_ A Beginner's Guide.pdf
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
 

Asynchronous micro-services and the unified log

  • 1. Asynchronous micro-services and the unified log Crunch Conference, Budapest, 7th October 2016
  • 2. Introducing myself • Alexander Dean • Co-founder and technical lead at Snowplow, the open-source event data pipeline • Weekend writer of Unified Log Processing, available on the Manning Early Access Program • Co-author at Snowplow of Iglu, our open-source schema registry system, and Sauna, our open- source decisioning and response platform
  • 3. We are witnessing the convergence of two separate technology tracks towards asynchronous or event-driven micro-services Transactional workloads Analytical workloads Asynchronous micro-services Software monoliths Synchronous micro-services Classic data warehousing Hybrid data pipelines Unified log architectures
  • 4. Analytical workloads, and the rise of the unified log
  • 5. A quick history lesson: the three eras of business data processing [1] 1. Classic data warehousing, 1996+ 2. Hybrid data pipelines, 2005+ 3. Unified log architectures, 2013+ [1] http://snowplowanalytics.com/blog/ 2014/01/20/the-three-eras-of-business-data-processing/
  • 6. The era of classic data warehousing, 1996+ OWN DATA CENTER Data warehouse HIGH LATENCY Point-to-point connections WIDE DATA COVERAGE CMS Silo CRM Local loop Local loop NARROW DATA SILOES LOW LATENCY LOCAL LOOPS E-comm Silo Local loop Management reporting ERP Silo Local loop Silo Nightly batch ETL process FULL DATA HISTORY
  • 7. The era of hybrid data pipelines, 2005+ CLOUD VENDOR / OWN DATA CENTER Search Silo Local loop LOW LATENCY LOCAL LOOPS E-comm Silo Local loop CRM Local loop SAAS VENDOR #2 Email marketing Local loop ERP Silo Local loop CMS Silo Local loop SAAS VENDOR #1 NARROW DATA SILOES Stream processing Product rec’s Micro-batch processing Systems monitoring Batch processing Data warehouse Management reporting Batch processing Ad hoc analytics Hadoop SAAS VENDOR #3 Web analytics Local loop Local loop Local loop LOW LATENCY LOW LATENCY HIGH LATENCY HIGH LATENCY APIs Bulk exports
  • 8. The hybrid era: a surfeit of software vendors CLOUD VENDOR / OWN DATA CENTER Search Silo Local loop LOW LATENCY LOCAL LOOPS E-comm Silo Local loop CRM Local loop SAAS VENDOR #2 Email marketing Local loop ERP Silo Local loop CMS Silo Local loop SAAS VENDOR #1 NARROW DATA SILOES Stream processing Product rec’s Micro-batch processing Systems monitoring Batch processing Data warehouse Management reporting Batch processing Ad hoc analytics Hadoop SAAS VENDOR #3 Web analytics Local loop Local loop Local loop LOW LATENCY LOW LATENCY HIGH LATENCY HIGH LATENCY APIs Bulk exports
  • 9. The hybrid era: company-wide reporting and analytics ends up like Rashomon The bandit’s story vs. The wife’s story vs. The samurai’s story vs. The woodcutter’s story
  • 10. The hybrid era: the number of data integrations is unsustainable
  • 11. So how do we unravel the hairball?
  • 12. The advent of the unified log, 2013+ CLOUD VENDOR / OWN DATA CENTER Search Silo SOME LOW LATENCY LOCAL LOOPS E-comm Silo CRM SAAS VENDOR #2 Email marketing ERP Silo CMS Silo SAAS VENDOR #1 NARROW DATA SILOES Streaming APIs / web hooks Unified log LOW LATENCY WIDE DATA COVERAGE Archiving Hadoop < WIDE DATA COVERAGE > < FULL DATA HISTORY > FEW DAYS’ DATA HISTORY Systems monitoring Eventstream HIGH LATENCY LOW LATENCY Product rec’s Ad hoc analytics Management reporting Fraud detection Churn prevention APIs
  • 13. CLOUD VENDOR / OWN DATA CENTER Search Silo SOME LOW LATENCY LOCAL LOOPS E-comm Silo CRM SAAS VENDOR #2 Email marketing ERP Silo CMS Silo SAAS VENDOR #1 NARROW DATA SILOES Streaming APIs / web hooks Unified log Archiving Hadoop < WIDE DATA COVERAGE > < FULL DATA HISTORY > Systems monitoring Eventstream HIGH LATENCY LOW LATENCY Product rec’s Ad hoc analytics Management reporting Fraud detection Churn prevention APIs The unified log is Amazon Kinesis, or Apache Kafka • Amazon Kinesis, a hosted AWS service • Extremely similar semantics to Kafka • Apache Kafka, an append- only, distributed, ordered commit log • Developed at LinkedIn to serve as their organization’s unified log
  • 14. “Kafka is designed to allow a single cluster to serve as the central data backbone for a large organization” [1] [1] http://kafka.apache.org/
  • 15. So what does a unified log give us? A single version of the truth Our truth is now upstream from the data warehouse The hairball of point-to-point connections has been unravelled Local loops have been unbundled 1 2 3 4
  • 16. Coming up to the end of 2016, and unified log architectures have seen extremely rapid and widespread adoption
  • 17. Transactional data processing and the move to micro-services
  • 18. In parallel, we have seen a steady (if spotty) rejection of software monoliths for transactional workloads
  • 19. In a micro-services architecture, the individual capabilities of the system are split out into separate services Synchronous communication using request and response (often using RESTful HTTP or RPC)
  • 20. What do synchronous micro-services give us? Strong module boundaries • Network boundaries between modules can be helpful for larger teams Independent deployment • Deploy individual micro-services independently • Simpler to deploy and less likely to cause whole system failures Support diversity • Can use the best language/framework/database for the capability • Reduces monoculture risk (anti-fragile) 1 2 3
  • 21. Convergence on asynchronous or event- driven micro-services
  • 22. When we re-architected Snowplow around the unified log 2.5 years ago, we designed it around small, composable workers Diagram from our February 2014 Snowplow v0.9.0 release post
  • 23. This was based on the insight that real-time pipelines can be composed a little like Unix pipes
  • 24. We avoided monolithic Spark Streaming or Storm jobs, based on our experiences with “heavy” Hadoop jobs in our batch pipeline Kinesis stream Kinesis stream Kinesis stream Kinesis stream Kinesis stream What we didn’t do: an inner Storm topology
  • 25. We wanted to avoid the “inner topology” effect, with effectively two tiers of topology to reason about Difficult to unit test the inner topologies – complex behaviours inside each unit Difficult to operationalize the inner topologies – how do they handle backpressure, how do they scale, how do we upgrade them? Difficult to monitor the inner topologies Fundamental problem: the event streams in an inner topology are not first class entities 1 2 3
  • 26. It worked: today the Snowplow real-time pipeline is a collection of individual event-driven micro-services Stream Collector Stream Enrich Kinesis S3 Kinesis Elasticsearch Kinesis Tee Kinesis Redshift (design stage) User’s AWS Lambda function User’s KCL worker app User’s Spark Streaming job
  • 27. “the most compelling applications for stream processing are actually pretty different from what you would typically do with a Hive or Spark job—they are closer to being a kind of asynchronous microservice rather than being a faster version of a batch analytics job. …” Meanwhile, the Kafka team (now at Confluent) were seeing something interesting in the adoption of Kafka…
  • 28. “…What I mean is that these stream processing apps were most often software that implemented core functions in the business rather than computing analytics about the business.” [1] [1] http://www.confluent.io/blog/introducing-kafka-streams-stream-processing-made-simple/ And these micro-services were substituting not just for batch analytical workloads, but also for transactional workloads
  • 29. Why are async micro- services replacing “classic” request/response micro- services (at least amongst Kafka users)?
  • 30. To start with, asynchronous micro-services have the same benefits as synchronous micro-services… Strong module boundaries • Network boundaries between modules can be helpful for larger teams Independent deployment • Deploy individual micro-services independently • Easier to deploy and less likely to cause whole system failures Support diversity • Can use the best language/framework/database for the capability • Reduces monoculture risk (anti-fragile) 1 2 3
  • 31. … but in addition, event-driven asynchronous micro- services are extremely loosely-coupled, because they are intermediated by first class streams
  • 32. Compare this to request/response synchronous micro-services, which have dependencies between upstream and downstream Downstream Upstream Single-page web app Login service Notifications service Content service Customer profile service Personalization service Ad service
  • 33. If you can afford the (small) latency tax, there are some clear advantages in going asynchronous Much better toolkit for upgrades – schema evolution, running old and new service versions in parallel etc Adding new downstream services doesn’t increase the load on upstream services Failure of individual services introduces lag into the overall system, rather than overall system failure Easier to debug, because service inputs and outputs are directly inspectable in the event streams 1 2 3 4
  • 34. So what’s next in this convergence? ?
  • 35. More and more tooling for building asynchronous micro- services, including the unstoppable rise of “serverless”  Kinesis Client Library AWS Lambda IBM OpenWhisk Event-driven cloud functionsStream processing as a library Azure Functions
  • 36. Greater adoption of open source schema registries as the canonical source of truth for the events in our topologies Confluent or Iglu schema registries
  • 37. Request/response micro-services aren’t going away – they are just too useful But expect a move from slower HTTP-based to faster RPC- based options • e.g. Google’s gRPC • These are easier to read from and write from in high- volume event-driven architectures Expect wider adoption of API definition languages • OpenAPI (Swagger), RAML, API Blueprint Eventual harmonization of types? • Using JSON for RESTful APIs, Protocol Buffers for RPC and Avro for stream processing is crazy • Needs company-wide standardization, or dynamic translation (but this is lossy) 1 2 3
  • 38. We also need new fabrics – or extensions of existing ones like Kubernetes – to address the challenges of running our topologies “How do we monitor this topology, and alert if something (data loss; event lag) is going wrong?” “How do we scale our streams and micro-services to handle event peaks and troughs smoothly?” “How do we re-configure or upgrade our micro-services without breaking things?”
  • 39. At Snowplow we are working on a unified log fabric, called Tupilak, to solve this problem

Editor's Notes

  1. We have a single version of the truth – together, the unified log plus Hadoop archive represent our single version of the truth. They contain exactly the same data - our event stream - they just have different time windows of data The single version of the truth is upstream from the data warehouse – in the classic era, the data warehouse provided the single version of the truth, making all reports generated from it consistent. In the unified era, the log provides the single version of the truth: as a result, operational systems (e.g. recommendation and ad targeting systems) compute on the same truth as analysts producing management reports Point-to-point connections have largely been unravelled - in their place, applications can append to the unified log and other applications can read their writes Local loops have been unbundled - in place of local silos, applications can collaborate on near-real-time decision-making via the unified log
  2. We have a single version of the truth – together, the unified log plus Hadoop archive represent our single version of the truth. They contain exactly the same data - our event stream - they just have different time windows of data The single version of the truth is upstream from the data warehouse – in the classic era, the data warehouse provided the single version of the truth, making all reports generated from it consistent. In the unified era, the log provides the single version of the truth: as a result, operational systems (e.g. recommendation and ad targeting systems) compute on the same truth as analysts producing management reports Point-to-point connections have largely been unravelled - in their place, applications can append to the unified log and other applications can read their writes Local loops have been unbundled - in place of local silos, applications can collaborate on near-real-time decision-making via the unified log
  3. We have a single version of the truth – together, the unified log plus Hadoop archive represent our single version of the truth. They contain exactly the same data - our event stream - they just have different time windows of data The single version of the truth is upstream from the data warehouse – in the classic era, the data warehouse provided the single version of the truth, making all reports generated from it consistent. In the unified era, the log provides the single version of the truth: as a result, operational systems (e.g. recommendation and ad targeting systems) compute on the same truth as analysts producing management reports Point-to-point connections have largely been unravelled - in their place, applications can append to the unified log and other applications can read their writes Local loops have been unbundled - in place of local silos, applications can collaborate on near-real-time decision-making via the unified log
  4. We have a single version of the truth – together, the unified log plus Hadoop archive represent our single version of the truth. They contain exactly the same data - our event stream - they just have different time windows of data The single version of the truth is upstream from the data warehouse – in the classic era, the data warehouse provided the single version of the truth, making all reports generated from it consistent. In the unified era, the log provides the single version of the truth: as a result, operational systems (e.g. recommendation and ad targeting systems) compute on the same truth as analysts producing management reports Point-to-point connections have largely been unravelled - in their place, applications can append to the unified log and other applications can read their writes Local loops have been unbundled - in place of local silos, applications can collaborate on near-real-time decision-making via the unified log
  5. We have a single version of the truth – together, the unified log plus Hadoop archive represent our single version of the truth. They contain exactly the same data - our event stream - they just have different time windows of data The single version of the truth is upstream from the data warehouse – in the classic era, the data warehouse provided the single version of the truth, making all reports generated from it consistent. In the unified era, the log provides the single version of the truth: as a result, operational systems (e.g. recommendation and ad targeting systems) compute on the same truth as analysts producing management reports Point-to-point connections have largely been unravelled - in their place, applications can append to the unified log and other applications can read their writes Local loops have been unbundled - in place of local silos, applications can collaborate on near-real-time decision-making via the unified log
  6. We have a single version of the truth – together, the unified log plus Hadoop archive represent our single version of the truth. They contain exactly the same data - our event stream - they just have different time windows of data The single version of the truth is upstream from the data warehouse – in the classic era, the data warehouse provided the single version of the truth, making all reports generated from it consistent. In the unified era, the log provides the single version of the truth: as a result, operational systems (e.g. recommendation and ad targeting systems) compute on the same truth as analysts producing management reports Point-to-point connections have largely been unravelled - in their place, applications can append to the unified log and other applications can read their writes Local loops have been unbundled - in place of local silos, applications can collaborate on near-real-time decision-making via the unified log
  7. We have a single version of the truth – together, the unified log plus Hadoop archive represent our single version of the truth. They contain exactly the same data - our event stream - they just have different time windows of data The single version of the truth is upstream from the data warehouse – in the classic era, the data warehouse provided the single version of the truth, making all reports generated from it consistent. In the unified era, the log provides the single version of the truth: as a result, operational systems (e.g. recommendation and ad targeting systems) compute on the same truth as analysts producing management reports Point-to-point connections have largely been unravelled - in their place, applications can append to the unified log and other applications can read their writes Local loops have been unbundled - in place of local silos, applications can collaborate on near-real-time decision-making via the unified log
  8. We have a single version of the truth – together, the unified log plus Hadoop archive represent our single version of the truth. They contain exactly the same data - our event stream - they just have different time windows of data The single version of the truth is upstream from the data warehouse – in the classic era, the data warehouse provided the single version of the truth, making all reports generated from it consistent. In the unified era, the log provides the single version of the truth: as a result, operational systems (e.g. recommendation and ad targeting systems) compute on the same truth as analysts producing management reports Point-to-point connections have largely been unravelled - in their place, applications can append to the unified log and other applications can read their writes Local loops have been unbundled - in place of local silos, applications can collaborate on near-real-time decision-making via the unified log
  9. We have a single version of the truth – together, the unified log plus Hadoop archive represent our single version of the truth. They contain exactly the same data - our event stream - they just have different time windows of data The single version of the truth is upstream from the data warehouse – in the classic era, the data warehouse provided the single version of the truth, making all reports generated from it consistent. In the unified era, the log provides the single version of the truth: as a result, operational systems (e.g. recommendation and ad targeting systems) compute on the same truth as analysts producing management reports Point-to-point connections have largely been unravelled - in their place, applications can append to the unified log and other applications can read their writes Local loops have been unbundled - in place of local silos, applications can collaborate on near-real-time decision-making via the unified log
  10. We have a single version of the truth – together, the unified log plus Hadoop archive represent our single version of the truth. They contain exactly the same data - our event stream - they just have different time windows of data The single version of the truth is upstream from the data warehouse – in the classic era, the data warehouse provided the single version of the truth, making all reports generated from it consistent. In the unified era, the log provides the single version of the truth: as a result, operational systems (e.g. recommendation and ad targeting systems) compute on the same truth as analysts producing management reports Point-to-point connections have largely been unravelled - in their place, applications can append to the unified log and other applications can read their writes Local loops have been unbundled - in place of local silos, applications can collaborate on near-real-time decision-making via the unified log
  11. We have a single version of the truth – together, the unified log plus Hadoop archive represent our single version of the truth. They contain exactly the same data - our event stream - they just have different time windows of data The single version of the truth is upstream from the data warehouse – in the classic era, the data warehouse provided the single version of the truth, making all reports generated from it consistent. In the unified era, the log provides the single version of the truth: as a result, operational systems (e.g. recommendation and ad targeting systems) compute on the same truth as analysts producing management reports Point-to-point connections have largely been unravelled - in their place, applications can append to the unified log and other applications can read their writes Local loops have been unbundled - in place of local silos, applications can collaborate on near-real-time decision-making via the unified log
  12. We have a single version of the truth – together, the unified log plus Hadoop archive represent our single version of the truth. They contain exactly the same data - our event stream - they just have different time windows of data The single version of the truth is upstream from the data warehouse – in the classic era, the data warehouse provided the single version of the truth, making all reports generated from it consistent. In the unified era, the log provides the single version of the truth: as a result, operational systems (e.g. recommendation and ad targeting systems) compute on the same truth as analysts producing management reports Point-to-point connections have largely been unravelled - in their place, applications can append to the unified log and other applications can read their writes Local loops have been unbundled - in place of local silos, applications can collaborate on near-real-time decision-making via the unified log
  13. We have a single version of the truth – together, the unified log plus Hadoop archive represent our single version of the truth. They contain exactly the same data - our event stream - they just have different time windows of data The single version of the truth is upstream from the data warehouse – in the classic era, the data warehouse provided the single version of the truth, making all reports generated from it consistent. In the unified era, the log provides the single version of the truth: as a result, operational systems (e.g. recommendation and ad targeting systems) compute on the same truth as analysts producing management reports Point-to-point connections have largely been unravelled - in their place, applications can append to the unified log and other applications can read their writes Local loops have been unbundled - in place of local silos, applications can collaborate on near-real-time decision-making via the unified log
  14. Confluent Schema Registry – an integral part of the Confluent Platform for Kafka-based data pipelines https://github.com/confluentinc/schema-registry Iglu – an integral part of the Snowplow open source event data pipeline https://github.com/snowplow/iglu
  15. We have a single version of the truth – together, the unified log plus Hadoop archive represent our single version of the truth. They contain exactly the same data - our event stream - they just have different time windows of data The single version of the truth is upstream from the data warehouse – in the classic era, the data warehouse provided the single version of the truth, making all reports generated from it consistent. In the unified era, the log provides the single version of the truth: as a result, operational systems (e.g. recommendation and ad targeting systems) compute on the same truth as analysts producing management reports Point-to-point connections have largely been unravelled - in their place, applications can append to the unified log and other applications can read their writes Local loops have been unbundled - in place of local silos, applications can collaborate on near-real-time decision-making via the unified log
  16. We have a single version of the truth – together, the unified log plus Hadoop archive represent our single version of the truth. They contain exactly the same data - our event stream - they just have different time windows of data The single version of the truth is upstream from the data warehouse – in the classic era, the data warehouse provided the single version of the truth, making all reports generated from it consistent. In the unified era, the log provides the single version of the truth: as a result, operational systems (e.g. recommendation and ad targeting systems) compute on the same truth as analysts producing management reports Point-to-point connections have largely been unravelled - in their place, applications can append to the unified log and other applications can read their writes Local loops have been unbundled - in place of local silos, applications can collaborate on near-real-time decision-making via the unified log
  17. We have a single version of the truth – together, the unified log plus Hadoop archive represent our single version of the truth. They contain exactly the same data - our event stream - they just have different time windows of data The single version of the truth is upstream from the data warehouse – in the classic era, the data warehouse provided the single version of the truth, making all reports generated from it consistent. In the unified era, the log provides the single version of the truth: as a result, operational systems (e.g. recommendation and ad targeting systems) compute on the same truth as analysts producing management reports Point-to-point connections have largely been unravelled - in their place, applications can append to the unified log and other applications can read their writes Local loops have been unbundled - in place of local silos, applications can collaborate on near-real-time decision-making via the unified log