SlideShare una empresa de Scribd logo
1 de 26
Descargar para leer sin conexión
Down the event-driven road:
Experiences of integrating
streaming into analytic data
platforms
Dr. Dominik Benz, Head of Machine Learning Engineering, inovex GmbH
Confluent Meetup Munich, 8.10.2018
Integrate
existing (batch)
data sources?
Check
consistency
with data
sources?
Build realtime
data
visualizations?
//flic.kr/p/5eQA7e
//flic.kr/p/bpFt7U
Down the event-driven road ..
Analytic
(Streaming)
Data Platforms
Integrating existing
(batch) data sources
Checking
consistency
Building realtime
visualizations
Wrap up & Summary
A typical analytic data platform
raw processed datahub analysisingress egress
Scheduling, orchestration, metadata
user access, system integration,
development
(Hive) Tables
Airflow, Hive
Metastore
Batch Processin
(Spark, Hive, ..
Flat files,
abases, APIs, ...
SQL, Notebooks
(Zeppelin, ..)
A typical (?) streaming data platform
raw processed datahub analysisingress egress
Scheduling, orchestration, metadata
user access, system integration,
development
(Kafka) Topics,
KTables, ..
(Confluent)
Schema Registry
Stream Process
(Kafka Streams,
..)
Kafka Conne
nput Data
(Streams)
KSQL
Down the event-driven road ..
Analytic
(Streaming)
Data Platforms
Integrating existing
(batch) data sources
Checking
consistency
Building realtime
visualizations
Wrap up & Summary
Integrating web tracking
company
website
tracking
service
tracking pixel
raw
tracking
data
› Hortonworks-based platform, including Nifi
and Confluent Platform
› Apache Airflow established scheduling / workflow
tool, integrated into monitoring, alerting, ..
› Tracking Service: Currently batch-oriented API
(request data, get download links, ..),
but click event stream planned
› Developers / Analysts with mixed background
w.r.t. programming skills
Integrating web tracking: setup / constraints
› drag-and-drop visual definition of data
pipelines
› various built-in connectors (file, stream,
database, service, ...)
› event-based processing paradigm
› built-in queues, data provenance,
backpressure handling, registry, ...
› focus: ingest & lightweight (!) transform
› not a complex event processor (like Kaf
Streams, Flink, Spark Streaming, ...)
› integrated into HDP stack
Apache Nifi in a Nutshell
› python library to define & schedule batch
workflows
› programmatic specification of a „DAG“
(= tasks + dependencies)
› clean handling of job run metadata (succ
duration, ..)
› developed by AirBnB, open-sourced 201
› built-in standard operators (bash, hive,
spark, kubernetes, ..)
› easily extendible (custom operators, ..)
› once used -> never Oozie again J
Apache Airflow in a nutshell
Integrating web tracking: options
tracking
service
ing
Option Aspects
Airflow only + integrated into monitoring, ..
+ job status handling, reloading
- not prepared for future stream
API
- handling file content complicat
Unified Abstraction
(e.g. Apache Beam)
+ one model for batch / stream
ingest
- comparatively high entry barrie
Nifi only + visual pipeline definition
+ easy handling of file content
+ event-based paradigm
+ operators available
- custom status handling, reloadi
Kafka-Connect + fault-tolerant
+ scalable setup
- custom connector coding
- custom status handling, reloadi
› Combines
advantages
of Airflow & Nifi
› Prepared for futur
streaming API
› Integrated into
monitoring, alertin
..
› Status handling /
reloading easy
Integrating web tracking: chosen solution – Airflow + Nifi
tracking
service
trigger
(hourly)
download
check
status
(sensors)
trigger, fetch
download links
download,
process, store
data
Down the event-driven road ..
Analytic
(Streaming)
Data Platforms
Integrating existing
(batch) data sources
Checking
consistency
Building realtime
visualizations
Wrap up & Summary
Checking consistency: Customer Consent
customer
portal
grants / revokes
consent
writes
consent
to hive
kafka
consent
event
in sync?
https://flic.kr/p/9y
Customer
(consent)
database
stores
consent
› Analysts need up-to-date version of customer
consent information in platform
› Hard correctness requirements (especially
regarding revoked consent)
› Continuous monitoring of correctness
› Alerting in case of differences
Checking consistency: setup / constraints
Checking Consistency: Statistics Events
ustomer
portal
kafka
use existing channel (kafka)
source inject periodic „statistics events“
into stream with defined measure point
(in time)
{type:GRANT, cid:12, ts:2018-10-01 11:00:00 ..}
{type:GRANT, cid:10, ts:2018-10-01 11:01:00 ..}
{type:REVOK, cid:09, ts:2018-10-01 11:01:05 ..}
{type=STAT, measure_ts=2018-10-01 11:01:20,
stats={num_consent_v1:72625,
num_consent_v2: 6252, ..}
}
Checking Consistency: Evaluate Statistics Event
› perform count on
target side (Hive) up
to
$measurePoint
› compare counts
› counts = simple
plausibility check, b
more elaborated
checks (hashes)
thinkable
{type=STAT, measure_ts=2018-10-01 11:01:20,
stats={num_consent_v1:72625,
num_consent_v2: 6252, ..}
}
in sync?
{
measure_ts=2018-10-01 11:01:20,
hive_stats={
num_consent_v1:72625,
num_consent_v2: 6252, ..}
}
ome
r
sent
)
base
Down the event-driven road ..
Analytic
(Streaming)
Data Platforms
Integrating existing
(batch) data sources
Checking
consistency
Building realtime
visualizations
Wrap up & Summary
Realtime visualizations: Online Shop Purchases
online
shop
JMS
purchase
event
normalization,
filtering,
aggregation, ..
https://flic.kr/p/9yH
realtime
dashboard
› Goal: timely insights into various purchase
aspects (items bought last 5min, ..)
› flexible / configurable frontend (time window,
aggregation dimension, ..)
› scalable to 100s / 1000s of dashboard users
› low latency of dashboard backend
Realtime visualizations: setup / constraints
Realtime visualizations: components / options
JM
S
ransport layer
ervice backend
service API
processing
Kafka-connect
Kafka
Kafka-streams
Kafka-connect
HBase
Phoenix / JDBC
Spring Boot
Nifi
Kafka
Tranquility
Druid
Spring Boot
aggregation during
processing
aggregation at
query-time
Built-in
configurab
aggregati
Nifi
Kafka
Kafka-connect
HBase
Phoenix / JDBC
Spring Boot
Realtime visualizations: chosen solution
JM
S
Nifi
Kafka
Tranquility
Druid
Spring Boot
› Druid: time series database with focus on
› Realtime ingestion, good Kafka integation
› „slice-and-dice“ queries
› distributed scale-out architecture
› Event processing kept simple in Nifi
› mainly cleaning, transformation
› aggregation is pushed down to Druid
› But: yet another distributed system .. L
› Experiences good so far, but needs work / skills
Down the event-driven road ..
Analytic
(Streaming)
Data Platforms
Integrating existing
(batch) data sources
Checking
consistency
Building realtime
visualizations
Wrap up & Summary
› Technology moves from batch to stream – what
about people?
› Analysts‘ world = often batch world
› tooling centered around static datasets
› can (and must) be generated from streams
› but: education towards stream / event-based
thinking necessary!
› Incremental / stream-based data exchange =
paradigm shift
› efforts / commitment „from both ends“ necessary
The human factor ..
https://flic.kr/p/f2W
Stream me up, Scotty ..
The future is event-based, but on the way:
› Existing batch-oriented APIs
› use (scheduled) event-based tools for easier later migration
› Checking consistency
› inject plausibility checks into data stream
› Realtime visualizations
› Druid + Kafka powerful and flexible combination
› Don‘t forget the human in the loop!
Vielen Dank
Dr. Dominik Benz
dominik.benz@inovex.de
inovex GmbH
Park Plaza
Ludwig-Erhard-Allee 6
76131 Karlsruhe

Más contenido relacionado

La actualidad más candente

Kafka Summit NYC 2017 Hanging Out with Your Past Self in VR
Kafka Summit NYC 2017 Hanging Out with Your Past Self in VRKafka Summit NYC 2017 Hanging Out with Your Past Self in VR
Kafka Summit NYC 2017 Hanging Out with Your Past Self in VRconfluent
 
Operational Analytics on Event Streams in Kafka
Operational Analytics on Event Streams in KafkaOperational Analytics on Event Streams in Kafka
Operational Analytics on Event Streams in Kafkaconfluent
 
Towards sql for streams
Towards sql for streamsTowards sql for streams
Towards sql for streamsRadu Tudoran
 
Flink Forward San Francisco 2018: Ken Krugler - "Building a scalable focused ...
Flink Forward San Francisco 2018: Ken Krugler - "Building a scalable focused ...Flink Forward San Francisco 2018: Ken Krugler - "Building a scalable focused ...
Flink Forward San Francisco 2018: Ken Krugler - "Building a scalable focused ...Flink Forward
 
Building a Streaming Platform with Kafka
Building a Streaming Platform with KafkaBuilding a Streaming Platform with Kafka
Building a Streaming Platform with Kafkaconfluent
 
KSQL: Open Source Streaming for Apache Kafka
KSQL: Open Source Streaming for Apache KafkaKSQL: Open Source Streaming for Apache Kafka
KSQL: Open Source Streaming for Apache Kafkaconfluent
 
Solutions for bi-directional integration between Oracle RDBMS & Apache Kafka
Solutions for bi-directional integration between Oracle RDBMS & Apache KafkaSolutions for bi-directional integration between Oracle RDBMS & Apache Kafka
Solutions for bi-directional integration between Oracle RDBMS & Apache KafkaGuido Schmutz
 
Streaming etl in practice with postgre sql, apache kafka, and ksql mic
Streaming etl in practice with postgre sql, apache kafka, and ksql micStreaming etl in practice with postgre sql, apache kafka, and ksql mic
Streaming etl in practice with postgre sql, apache kafka, and ksql micBas van Oudenaarde
 
Kafka as your Data Lake - is it Feasible? (Guido Schmutz, Trivadis) Kafka Sum...
Kafka as your Data Lake - is it Feasible? (Guido Schmutz, Trivadis) Kafka Sum...Kafka as your Data Lake - is it Feasible? (Guido Schmutz, Trivadis) Kafka Sum...
Kafka as your Data Lake - is it Feasible? (Guido Schmutz, Trivadis) Kafka Sum...HostedbyConfluent
 
Solutions for bi-directional integration between Oracle RDBMS and Apache Kafka
Solutions for bi-directional integration between Oracle RDBMS and Apache KafkaSolutions for bi-directional integration between Oracle RDBMS and Apache Kafka
Solutions for bi-directional integration between Oracle RDBMS and Apache KafkaGuido Schmutz
 
Secure Kafka at scale in true multi-tenant environment ( Vishnu Balusu & Asho...
Secure Kafka at scale in true multi-tenant environment ( Vishnu Balusu & Asho...Secure Kafka at scale in true multi-tenant environment ( Vishnu Balusu & Asho...
Secure Kafka at scale in true multi-tenant environment ( Vishnu Balusu & Asho...confluent
 
Confluent Kafka and KSQL: Streaming Data Pipelines Made Easy
Confluent Kafka and KSQL: Streaming Data Pipelines Made EasyConfluent Kafka and KSQL: Streaming Data Pipelines Made Easy
Confluent Kafka and KSQL: Streaming Data Pipelines Made EasyKairo Tavares
 
KSQL – An Open Source Streaming Engine for Apache Kafka
KSQL – An Open Source Streaming Engine for Apache KafkaKSQL – An Open Source Streaming Engine for Apache Kafka
KSQL – An Open Source Streaming Engine for Apache KafkaKai Wähner
 
The State of Stream Processing
The State of Stream ProcessingThe State of Stream Processing
The State of Stream Processingconfluent
 
Data Driven Enterprise with Apache Kafka
Data Driven Enterprise with Apache KafkaData Driven Enterprise with Apache Kafka
Data Driven Enterprise with Apache Kafkaconfluent
 
Apache Flink Meetup Munich (November 2015): Flink Overview, Architecture, Int...
Apache Flink Meetup Munich (November 2015): Flink Overview, Architecture, Int...Apache Flink Meetup Munich (November 2015): Flink Overview, Architecture, Int...
Apache Flink Meetup Munich (November 2015): Flink Overview, Architecture, Int...Robert Metzger
 
Apache Kafka and KSQL in Action: Let's Build a Streaming Data Pipeline!
Apache Kafka and KSQL in Action: Let's Build a Streaming Data Pipeline!Apache Kafka and KSQL in Action: Let's Build a Streaming Data Pipeline!
Apache Kafka and KSQL in Action: Let's Build a Streaming Data Pipeline!confluent
 
Kafka Connect and Streams (Concepts, Architecture, Features)
Kafka Connect and Streams (Concepts, Architecture, Features)Kafka Connect and Streams (Concepts, Architecture, Features)
Kafka Connect and Streams (Concepts, Architecture, Features)Kai Wähner
 
Real time analytics in Azure IoT
Real time analytics in Azure IoT Real time analytics in Azure IoT
Real time analytics in Azure IoT Sam Vanhoutte
 
How to use Standard SQL over Kafka: From the basics to advanced use cases | F...
How to use Standard SQL over Kafka: From the basics to advanced use cases | F...How to use Standard SQL over Kafka: From the basics to advanced use cases | F...
How to use Standard SQL over Kafka: From the basics to advanced use cases | F...HostedbyConfluent
 

La actualidad más candente (20)

Kafka Summit NYC 2017 Hanging Out with Your Past Self in VR
Kafka Summit NYC 2017 Hanging Out with Your Past Self in VRKafka Summit NYC 2017 Hanging Out with Your Past Self in VR
Kafka Summit NYC 2017 Hanging Out with Your Past Self in VR
 
Operational Analytics on Event Streams in Kafka
Operational Analytics on Event Streams in KafkaOperational Analytics on Event Streams in Kafka
Operational Analytics on Event Streams in Kafka
 
Towards sql for streams
Towards sql for streamsTowards sql for streams
Towards sql for streams
 
Flink Forward San Francisco 2018: Ken Krugler - "Building a scalable focused ...
Flink Forward San Francisco 2018: Ken Krugler - "Building a scalable focused ...Flink Forward San Francisco 2018: Ken Krugler - "Building a scalable focused ...
Flink Forward San Francisco 2018: Ken Krugler - "Building a scalable focused ...
 
Building a Streaming Platform with Kafka
Building a Streaming Platform with KafkaBuilding a Streaming Platform with Kafka
Building a Streaming Platform with Kafka
 
KSQL: Open Source Streaming for Apache Kafka
KSQL: Open Source Streaming for Apache KafkaKSQL: Open Source Streaming for Apache Kafka
KSQL: Open Source Streaming for Apache Kafka
 
Solutions for bi-directional integration between Oracle RDBMS & Apache Kafka
Solutions for bi-directional integration between Oracle RDBMS & Apache KafkaSolutions for bi-directional integration between Oracle RDBMS & Apache Kafka
Solutions for bi-directional integration between Oracle RDBMS & Apache Kafka
 
Streaming etl in practice with postgre sql, apache kafka, and ksql mic
Streaming etl in practice with postgre sql, apache kafka, and ksql micStreaming etl in practice with postgre sql, apache kafka, and ksql mic
Streaming etl in practice with postgre sql, apache kafka, and ksql mic
 
Kafka as your Data Lake - is it Feasible? (Guido Schmutz, Trivadis) Kafka Sum...
Kafka as your Data Lake - is it Feasible? (Guido Schmutz, Trivadis) Kafka Sum...Kafka as your Data Lake - is it Feasible? (Guido Schmutz, Trivadis) Kafka Sum...
Kafka as your Data Lake - is it Feasible? (Guido Schmutz, Trivadis) Kafka Sum...
 
Solutions for bi-directional integration between Oracle RDBMS and Apache Kafka
Solutions for bi-directional integration between Oracle RDBMS and Apache KafkaSolutions for bi-directional integration between Oracle RDBMS and Apache Kafka
Solutions for bi-directional integration between Oracle RDBMS and Apache Kafka
 
Secure Kafka at scale in true multi-tenant environment ( Vishnu Balusu & Asho...
Secure Kafka at scale in true multi-tenant environment ( Vishnu Balusu & Asho...Secure Kafka at scale in true multi-tenant environment ( Vishnu Balusu & Asho...
Secure Kafka at scale in true multi-tenant environment ( Vishnu Balusu & Asho...
 
Confluent Kafka and KSQL: Streaming Data Pipelines Made Easy
Confluent Kafka and KSQL: Streaming Data Pipelines Made EasyConfluent Kafka and KSQL: Streaming Data Pipelines Made Easy
Confluent Kafka and KSQL: Streaming Data Pipelines Made Easy
 
KSQL – An Open Source Streaming Engine for Apache Kafka
KSQL – An Open Source Streaming Engine for Apache KafkaKSQL – An Open Source Streaming Engine for Apache Kafka
KSQL – An Open Source Streaming Engine for Apache Kafka
 
The State of Stream Processing
The State of Stream ProcessingThe State of Stream Processing
The State of Stream Processing
 
Data Driven Enterprise with Apache Kafka
Data Driven Enterprise with Apache KafkaData Driven Enterprise with Apache Kafka
Data Driven Enterprise with Apache Kafka
 
Apache Flink Meetup Munich (November 2015): Flink Overview, Architecture, Int...
Apache Flink Meetup Munich (November 2015): Flink Overview, Architecture, Int...Apache Flink Meetup Munich (November 2015): Flink Overview, Architecture, Int...
Apache Flink Meetup Munich (November 2015): Flink Overview, Architecture, Int...
 
Apache Kafka and KSQL in Action: Let's Build a Streaming Data Pipeline!
Apache Kafka and KSQL in Action: Let's Build a Streaming Data Pipeline!Apache Kafka and KSQL in Action: Let's Build a Streaming Data Pipeline!
Apache Kafka and KSQL in Action: Let's Build a Streaming Data Pipeline!
 
Kafka Connect and Streams (Concepts, Architecture, Features)
Kafka Connect and Streams (Concepts, Architecture, Features)Kafka Connect and Streams (Concepts, Architecture, Features)
Kafka Connect and Streams (Concepts, Architecture, Features)
 
Real time analytics in Azure IoT
Real time analytics in Azure IoT Real time analytics in Azure IoT
Real time analytics in Azure IoT
 
How to use Standard SQL over Kafka: From the basics to advanced use cases | F...
How to use Standard SQL over Kafka: From the basics to advanced use cases | F...How to use Standard SQL over Kafka: From the basics to advanced use cases | F...
How to use Standard SQL over Kafka: From the basics to advanced use cases | F...
 

Similar a Down the event-driven road: Experiences of integrating streaming into analytic data platforms

Data / Streaming / Microservices Platform with Devops
Data / Streaming / Microservices Platform with DevopsData / Streaming / Microservices Platform with Devops
Data / Streaming / Microservices Platform with DevopsKidong Lee
 
Honu - A Large Scale Streaming Data Collection and Processing Pipeline__Hadoo...
Honu - A Large Scale Streaming Data Collection and Processing Pipeline__Hadoo...Honu - A Large Scale Streaming Data Collection and Processing Pipeline__Hadoo...
Honu - A Large Scale Streaming Data Collection and Processing Pipeline__Hadoo...Yahoo Developer Network
 
MLOps with a Feature Store: Filling the Gap in ML Infrastructure
MLOps with a Feature Store: Filling the Gap in ML InfrastructureMLOps with a Feature Store: Filling the Gap in ML Infrastructure
MLOps with a Feature Store: Filling the Gap in ML InfrastructureData Science Milan
 
Introduction to apache kafka, confluent and why they matter
Introduction to apache kafka, confluent and why they matterIntroduction to apache kafka, confluent and why they matter
Introduction to apache kafka, confluent and why they matterPaolo Castagna
 
Serverless and Streaming: Building ‘eBay’ by ‘Turning the Database Inside Out’
Serverless and Streaming: Building ‘eBay’ by ‘Turning the Database Inside Out’ Serverless and Streaming: Building ‘eBay’ by ‘Turning the Database Inside Out’
Serverless and Streaming: Building ‘eBay’ by ‘Turning the Database Inside Out’ confluent
 
Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...
Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...
Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...StreamNative
 
Building Scalable Data Pipelines - 2016 DataPalooza Seattle
Building Scalable Data Pipelines - 2016 DataPalooza SeattleBuilding Scalable Data Pipelines - 2016 DataPalooza Seattle
Building Scalable Data Pipelines - 2016 DataPalooza SeattleEvan Chan
 
Large scale stream processing with Apache Flink
Large scale stream processing with Apache FlinkLarge scale stream processing with Apache Flink
Large scale stream processing with Apache FlinkNikolay Stoitsev
 
Scalable Stream Processing with Apache Samza
Scalable Stream Processing with Apache SamzaScalable Stream Processing with Apache Samza
Scalable Stream Processing with Apache SamzaPrateek Maheshwari
 
Logging for Production Systems in The Container Era
Logging for Production Systems in The Container EraLogging for Production Systems in The Container Era
Logging for Production Systems in The Container EraSadayuki Furuhashi
 
Keeping Analytics Data Fresh in a Streaming Architecture | John Neal, Qlik
Keeping Analytics Data Fresh in a Streaming Architecture | John Neal, QlikKeeping Analytics Data Fresh in a Streaming Architecture | John Neal, Qlik
Keeping Analytics Data Fresh in a Streaming Architecture | John Neal, QlikHostedbyConfluent
 
Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...
Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...
Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...Cloudera, Inc.
 
thinking in key value stores
thinking in key value storesthinking in key value stores
thinking in key value storesBhasker Kode
 
Hopsworks Feature Store 2.0 a new paradigm
Hopsworks Feature Store  2.0   a new paradigmHopsworks Feature Store  2.0   a new paradigm
Hopsworks Feature Store 2.0 a new paradigmJim Dowling
 
Why apache Flink is the 4G of Big Data Analytics Frameworks
Why apache Flink is the 4G of Big Data Analytics FrameworksWhy apache Flink is the 4G of Big Data Analytics Frameworks
Why apache Flink is the 4G of Big Data Analytics FrameworksSlim Baltagi
 
Evolving Your Big Data Use Cases from Batch to Real-Time - AWS May 2016 Webi...
Evolving Your Big Data Use Cases from Batch to Real-Time - AWS May 2016  Webi...Evolving Your Big Data Use Cases from Batch to Real-Time - AWS May 2016  Webi...
Evolving Your Big Data Use Cases from Batch to Real-Time - AWS May 2016 Webi...Amazon Web Services
 
App modernization on AWS with Apache Kafka and Confluent Cloud
App modernization on AWS with Apache Kafka and Confluent CloudApp modernization on AWS with Apache Kafka and Confluent Cloud
App modernization on AWS with Apache Kafka and Confluent CloudKai Wähner
 
nuclio Overview October 2017
nuclio Overview October 2017nuclio Overview October 2017
nuclio Overview October 2017iguazio
 
Event Streaming Architectures with Confluent and ScyllaDB
Event Streaming Architectures with Confluent and ScyllaDBEvent Streaming Architectures with Confluent and ScyllaDB
Event Streaming Architectures with Confluent and ScyllaDBScyllaDB
 

Similar a Down the event-driven road: Experiences of integrating streaming into analytic data platforms (20)

Data / Streaming / Microservices Platform with Devops
Data / Streaming / Microservices Platform with DevopsData / Streaming / Microservices Platform with Devops
Data / Streaming / Microservices Platform with Devops
 
Honu - A Large Scale Streaming Data Collection and Processing Pipeline__Hadoo...
Honu - A Large Scale Streaming Data Collection and Processing Pipeline__Hadoo...Honu - A Large Scale Streaming Data Collection and Processing Pipeline__Hadoo...
Honu - A Large Scale Streaming Data Collection and Processing Pipeline__Hadoo...
 
MLOps with a Feature Store: Filling the Gap in ML Infrastructure
MLOps with a Feature Store: Filling the Gap in ML InfrastructureMLOps with a Feature Store: Filling the Gap in ML Infrastructure
MLOps with a Feature Store: Filling the Gap in ML Infrastructure
 
Introduction to apache kafka, confluent and why they matter
Introduction to apache kafka, confluent and why they matterIntroduction to apache kafka, confluent and why they matter
Introduction to apache kafka, confluent and why they matter
 
Serverless and Streaming: Building ‘eBay’ by ‘Turning the Database Inside Out’
Serverless and Streaming: Building ‘eBay’ by ‘Turning the Database Inside Out’ Serverless and Streaming: Building ‘eBay’ by ‘Turning the Database Inside Out’
Serverless and Streaming: Building ‘eBay’ by ‘Turning the Database Inside Out’
 
Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...
Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...
Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi - Pulsa...
 
Building Scalable Data Pipelines - 2016 DataPalooza Seattle
Building Scalable Data Pipelines - 2016 DataPalooza SeattleBuilding Scalable Data Pipelines - 2016 DataPalooza Seattle
Building Scalable Data Pipelines - 2016 DataPalooza Seattle
 
Large scale stream processing with Apache Flink
Large scale stream processing with Apache FlinkLarge scale stream processing with Apache Flink
Large scale stream processing with Apache Flink
 
Journey to the cloud
Journey to the cloudJourney to the cloud
Journey to the cloud
 
Scalable Stream Processing with Apache Samza
Scalable Stream Processing with Apache SamzaScalable Stream Processing with Apache Samza
Scalable Stream Processing with Apache Samza
 
Logging for Production Systems in The Container Era
Logging for Production Systems in The Container EraLogging for Production Systems in The Container Era
Logging for Production Systems in The Container Era
 
Keeping Analytics Data Fresh in a Streaming Architecture | John Neal, Qlik
Keeping Analytics Data Fresh in a Streaming Architecture | John Neal, QlikKeeping Analytics Data Fresh in a Streaming Architecture | John Neal, Qlik
Keeping Analytics Data Fresh in a Streaming Architecture | John Neal, Qlik
 
Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...
Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...
Hadoop World 2011: Building Web Analytics Processing on Hadoop at CBS Interac...
 
thinking in key value stores
thinking in key value storesthinking in key value stores
thinking in key value stores
 
Hopsworks Feature Store 2.0 a new paradigm
Hopsworks Feature Store  2.0   a new paradigmHopsworks Feature Store  2.0   a new paradigm
Hopsworks Feature Store 2.0 a new paradigm
 
Why apache Flink is the 4G of Big Data Analytics Frameworks
Why apache Flink is the 4G of Big Data Analytics FrameworksWhy apache Flink is the 4G of Big Data Analytics Frameworks
Why apache Flink is the 4G of Big Data Analytics Frameworks
 
Evolving Your Big Data Use Cases from Batch to Real-Time - AWS May 2016 Webi...
Evolving Your Big Data Use Cases from Batch to Real-Time - AWS May 2016  Webi...Evolving Your Big Data Use Cases from Batch to Real-Time - AWS May 2016  Webi...
Evolving Your Big Data Use Cases from Batch to Real-Time - AWS May 2016 Webi...
 
App modernization on AWS with Apache Kafka and Confluent Cloud
App modernization on AWS with Apache Kafka and Confluent CloudApp modernization on AWS with Apache Kafka and Confluent Cloud
App modernization on AWS with Apache Kafka and Confluent Cloud
 
nuclio Overview October 2017
nuclio Overview October 2017nuclio Overview October 2017
nuclio Overview October 2017
 
Event Streaming Architectures with Confluent and ScyllaDB
Event Streaming Architectures with Confluent and ScyllaDBEvent Streaming Architectures with Confluent and ScyllaDB
Event Streaming Architectures with Confluent and ScyllaDB
 

Más de inovex GmbH

lldb – Debugger auf Abwegen
lldb – Debugger auf Abwegenlldb – Debugger auf Abwegen
lldb – Debugger auf Abwegeninovex GmbH
 
Are you sure about that?! Uncertainty Quantification in AI
Are you sure about that?! Uncertainty Quantification in AIAre you sure about that?! Uncertainty Quantification in AI
Are you sure about that?! Uncertainty Quantification in AIinovex GmbH
 
Why natural language is next step in the AI evolution
Why natural language is next step in the AI evolutionWhy natural language is next step in the AI evolution
Why natural language is next step in the AI evolutioninovex GmbH
 
Network Policies
Network PoliciesNetwork Policies
Network Policiesinovex GmbH
 
Interpretable Machine Learning
Interpretable Machine LearningInterpretable Machine Learning
Interpretable Machine Learninginovex GmbH
 
Jenkins X – CI/CD in wolkigen Umgebungen
Jenkins X – CI/CD in wolkigen UmgebungenJenkins X – CI/CD in wolkigen Umgebungen
Jenkins X – CI/CD in wolkigen Umgebungeninovex GmbH
 
AI auf Edge-Geraeten
AI auf Edge-GeraetenAI auf Edge-Geraeten
AI auf Edge-Geraeteninovex GmbH
 
Prometheus on Kubernetes
Prometheus on KubernetesPrometheus on Kubernetes
Prometheus on Kubernetesinovex GmbH
 
Deep Learning for Recommender Systems
Deep Learning for Recommender SystemsDeep Learning for Recommender Systems
Deep Learning for Recommender Systemsinovex GmbH
 
Representation Learning von Zeitreihen
Representation Learning von ZeitreihenRepresentation Learning von Zeitreihen
Representation Learning von Zeitreiheninovex GmbH
 
Talk to me – Chatbots und digitale Assistenten
Talk to me – Chatbots und digitale AssistentenTalk to me – Chatbots und digitale Assistenten
Talk to me – Chatbots und digitale Assistenteninovex GmbH
 
Künstlich intelligent?
Künstlich intelligent?Künstlich intelligent?
Künstlich intelligent?inovex GmbH
 
Das Android Open Source Project
Das Android Open Source ProjectDas Android Open Source Project
Das Android Open Source Projectinovex GmbH
 
Machine Learning Interpretability
Machine Learning InterpretabilityMachine Learning Interpretability
Machine Learning Interpretabilityinovex GmbH
 
Performance evaluation of GANs in a semisupervised OCR use case
Performance evaluation of GANs in a semisupervised OCR use casePerformance evaluation of GANs in a semisupervised OCR use case
Performance evaluation of GANs in a semisupervised OCR use caseinovex GmbH
 
People & Products – Lessons learned from the daily IT madness
People & Products – Lessons learned from the daily IT madnessPeople & Products – Lessons learned from the daily IT madness
People & Products – Lessons learned from the daily IT madnessinovex GmbH
 
Infrastructure as (real) Code – Manage your K8s resources with Pulumi
Infrastructure as (real) Code – Manage your K8s resources with PulumiInfrastructure as (real) Code – Manage your K8s resources with Pulumi
Infrastructure as (real) Code – Manage your K8s resources with Pulumiinovex GmbH
 

Más de inovex GmbH (20)

lldb – Debugger auf Abwegen
lldb – Debugger auf Abwegenlldb – Debugger auf Abwegen
lldb – Debugger auf Abwegen
 
Are you sure about that?! Uncertainty Quantification in AI
Are you sure about that?! Uncertainty Quantification in AIAre you sure about that?! Uncertainty Quantification in AI
Are you sure about that?! Uncertainty Quantification in AI
 
Why natural language is next step in the AI evolution
Why natural language is next step in the AI evolutionWhy natural language is next step in the AI evolution
Why natural language is next step in the AI evolution
 
WWDC 2019 Recap
WWDC 2019 RecapWWDC 2019 Recap
WWDC 2019 Recap
 
Network Policies
Network PoliciesNetwork Policies
Network Policies
 
Interpretable Machine Learning
Interpretable Machine LearningInterpretable Machine Learning
Interpretable Machine Learning
 
Jenkins X – CI/CD in wolkigen Umgebungen
Jenkins X – CI/CD in wolkigen UmgebungenJenkins X – CI/CD in wolkigen Umgebungen
Jenkins X – CI/CD in wolkigen Umgebungen
 
AI auf Edge-Geraeten
AI auf Edge-GeraetenAI auf Edge-Geraeten
AI auf Edge-Geraeten
 
Prometheus on Kubernetes
Prometheus on KubernetesPrometheus on Kubernetes
Prometheus on Kubernetes
 
Deep Learning for Recommender Systems
Deep Learning for Recommender SystemsDeep Learning for Recommender Systems
Deep Learning for Recommender Systems
 
Azure IoT Edge
Azure IoT EdgeAzure IoT Edge
Azure IoT Edge
 
Representation Learning von Zeitreihen
Representation Learning von ZeitreihenRepresentation Learning von Zeitreihen
Representation Learning von Zeitreihen
 
Talk to me – Chatbots und digitale Assistenten
Talk to me – Chatbots und digitale AssistentenTalk to me – Chatbots und digitale Assistenten
Talk to me – Chatbots und digitale Assistenten
 
Künstlich intelligent?
Künstlich intelligent?Künstlich intelligent?
Künstlich intelligent?
 
Dev + Ops = Go
Dev + Ops = GoDev + Ops = Go
Dev + Ops = Go
 
Das Android Open Source Project
Das Android Open Source ProjectDas Android Open Source Project
Das Android Open Source Project
 
Machine Learning Interpretability
Machine Learning InterpretabilityMachine Learning Interpretability
Machine Learning Interpretability
 
Performance evaluation of GANs in a semisupervised OCR use case
Performance evaluation of GANs in a semisupervised OCR use casePerformance evaluation of GANs in a semisupervised OCR use case
Performance evaluation of GANs in a semisupervised OCR use case
 
People & Products – Lessons learned from the daily IT madness
People & Products – Lessons learned from the daily IT madnessPeople & Products – Lessons learned from the daily IT madness
People & Products – Lessons learned from the daily IT madness
 
Infrastructure as (real) Code – Manage your K8s resources with Pulumi
Infrastructure as (real) Code – Manage your K8s resources with PulumiInfrastructure as (real) Code – Manage your K8s resources with Pulumi
Infrastructure as (real) Code – Manage your K8s resources with Pulumi
 

Último

The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxOnBoard
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...shyamraj55
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksSoftradix Technologies
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024Scott Keck-Warren
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersThousandEyes
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 

Último (20)

The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Maximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptxMaximizing Board Effectiveness 2024 Webinar.pptx
Maximizing Board Effectiveness 2024 Webinar.pptx
 
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
Transcript: #StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
Automating Business Process via MuleSoft Composer | Bangalore MuleSoft Meetup...
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Benefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other FrameworksBenefits Of Flutter Compared To Other Frameworks
Benefits Of Flutter Compared To Other Frameworks
 
SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024SQL Database Design For Developers at php[tek] 2024
SQL Database Design For Developers at php[tek] 2024
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for PartnersEnhancing Worker Digital Experience: A Hands-on Workshop for Partners
Enhancing Worker Digital Experience: A Hands-on Workshop for Partners
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 

Down the event-driven road: Experiences of integrating streaming into analytic data platforms

  • 1. Down the event-driven road: Experiences of integrating streaming into analytic data platforms Dr. Dominik Benz, Head of Machine Learning Engineering, inovex GmbH Confluent Meetup Munich, 8.10.2018
  • 2. Integrate existing (batch) data sources? Check consistency with data sources? Build realtime data visualizations? //flic.kr/p/5eQA7e //flic.kr/p/bpFt7U
  • 3. Down the event-driven road .. Analytic (Streaming) Data Platforms Integrating existing (batch) data sources Checking consistency Building realtime visualizations Wrap up & Summary
  • 4. A typical analytic data platform raw processed datahub analysisingress egress Scheduling, orchestration, metadata user access, system integration, development (Hive) Tables Airflow, Hive Metastore Batch Processin (Spark, Hive, .. Flat files, abases, APIs, ... SQL, Notebooks (Zeppelin, ..)
  • 5. A typical (?) streaming data platform raw processed datahub analysisingress egress Scheduling, orchestration, metadata user access, system integration, development (Kafka) Topics, KTables, .. (Confluent) Schema Registry Stream Process (Kafka Streams, ..) Kafka Conne nput Data (Streams) KSQL
  • 6. Down the event-driven road .. Analytic (Streaming) Data Platforms Integrating existing (batch) data sources Checking consistency Building realtime visualizations Wrap up & Summary
  • 8. › Hortonworks-based platform, including Nifi and Confluent Platform › Apache Airflow established scheduling / workflow tool, integrated into monitoring, alerting, .. › Tracking Service: Currently batch-oriented API (request data, get download links, ..), but click event stream planned › Developers / Analysts with mixed background w.r.t. programming skills Integrating web tracking: setup / constraints
  • 9. › drag-and-drop visual definition of data pipelines › various built-in connectors (file, stream, database, service, ...) › event-based processing paradigm › built-in queues, data provenance, backpressure handling, registry, ... › focus: ingest & lightweight (!) transform › not a complex event processor (like Kaf Streams, Flink, Spark Streaming, ...) › integrated into HDP stack Apache Nifi in a Nutshell
  • 10. › python library to define & schedule batch workflows › programmatic specification of a „DAG“ (= tasks + dependencies) › clean handling of job run metadata (succ duration, ..) › developed by AirBnB, open-sourced 201 › built-in standard operators (bash, hive, spark, kubernetes, ..) › easily extendible (custom operators, ..) › once used -> never Oozie again J Apache Airflow in a nutshell
  • 11. Integrating web tracking: options tracking service ing Option Aspects Airflow only + integrated into monitoring, .. + job status handling, reloading - not prepared for future stream API - handling file content complicat Unified Abstraction (e.g. Apache Beam) + one model for batch / stream ingest - comparatively high entry barrie Nifi only + visual pipeline definition + easy handling of file content + event-based paradigm + operators available - custom status handling, reloadi Kafka-Connect + fault-tolerant + scalable setup - custom connector coding - custom status handling, reloadi
  • 12. › Combines advantages of Airflow & Nifi › Prepared for futur streaming API › Integrated into monitoring, alertin .. › Status handling / reloading easy Integrating web tracking: chosen solution – Airflow + Nifi tracking service trigger (hourly) download check status (sensors) trigger, fetch download links download, process, store data
  • 13. Down the event-driven road .. Analytic (Streaming) Data Platforms Integrating existing (batch) data sources Checking consistency Building realtime visualizations Wrap up & Summary
  • 14. Checking consistency: Customer Consent customer portal grants / revokes consent writes consent to hive kafka consent event in sync? https://flic.kr/p/9y Customer (consent) database stores consent
  • 15. › Analysts need up-to-date version of customer consent information in platform › Hard correctness requirements (especially regarding revoked consent) › Continuous monitoring of correctness › Alerting in case of differences Checking consistency: setup / constraints
  • 16. Checking Consistency: Statistics Events ustomer portal kafka use existing channel (kafka) source inject periodic „statistics events“ into stream with defined measure point (in time) {type:GRANT, cid:12, ts:2018-10-01 11:00:00 ..} {type:GRANT, cid:10, ts:2018-10-01 11:01:00 ..} {type:REVOK, cid:09, ts:2018-10-01 11:01:05 ..} {type=STAT, measure_ts=2018-10-01 11:01:20, stats={num_consent_v1:72625, num_consent_v2: 6252, ..} }
  • 17. Checking Consistency: Evaluate Statistics Event › perform count on target side (Hive) up to $measurePoint › compare counts › counts = simple plausibility check, b more elaborated checks (hashes) thinkable {type=STAT, measure_ts=2018-10-01 11:01:20, stats={num_consent_v1:72625, num_consent_v2: 6252, ..} } in sync? { measure_ts=2018-10-01 11:01:20, hive_stats={ num_consent_v1:72625, num_consent_v2: 6252, ..} } ome r sent ) base
  • 18. Down the event-driven road .. Analytic (Streaming) Data Platforms Integrating existing (batch) data sources Checking consistency Building realtime visualizations Wrap up & Summary
  • 19. Realtime visualizations: Online Shop Purchases online shop JMS purchase event normalization, filtering, aggregation, .. https://flic.kr/p/9yH realtime dashboard
  • 20. › Goal: timely insights into various purchase aspects (items bought last 5min, ..) › flexible / configurable frontend (time window, aggregation dimension, ..) › scalable to 100s / 1000s of dashboard users › low latency of dashboard backend Realtime visualizations: setup / constraints
  • 21. Realtime visualizations: components / options JM S ransport layer ervice backend service API processing Kafka-connect Kafka Kafka-streams Kafka-connect HBase Phoenix / JDBC Spring Boot Nifi Kafka Tranquility Druid Spring Boot aggregation during processing aggregation at query-time Built-in configurab aggregati Nifi Kafka Kafka-connect HBase Phoenix / JDBC Spring Boot
  • 22. Realtime visualizations: chosen solution JM S Nifi Kafka Tranquility Druid Spring Boot › Druid: time series database with focus on › Realtime ingestion, good Kafka integation › „slice-and-dice“ queries › distributed scale-out architecture › Event processing kept simple in Nifi › mainly cleaning, transformation › aggregation is pushed down to Druid › But: yet another distributed system .. L › Experiences good so far, but needs work / skills
  • 23. Down the event-driven road .. Analytic (Streaming) Data Platforms Integrating existing (batch) data sources Checking consistency Building realtime visualizations Wrap up & Summary
  • 24. › Technology moves from batch to stream – what about people? › Analysts‘ world = often batch world › tooling centered around static datasets › can (and must) be generated from streams › but: education towards stream / event-based thinking necessary! › Incremental / stream-based data exchange = paradigm shift › efforts / commitment „from both ends“ necessary The human factor .. https://flic.kr/p/f2W
  • 25. Stream me up, Scotty .. The future is event-based, but on the way: › Existing batch-oriented APIs › use (scheduled) event-based tools for easier later migration › Checking consistency › inject plausibility checks into data stream › Realtime visualizations › Druid + Kafka powerful and flexible combination › Don‘t forget the human in the loop!
  • 26. Vielen Dank Dr. Dominik Benz dominik.benz@inovex.de inovex GmbH Park Plaza Ludwig-Erhard-Allee 6 76131 Karlsruhe