SlideShare a Scribd company logo
1 of 62
Download to read offline
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Streaming analytics better
than batch - when and why ?
_Adam Kawa - Dawid Wysakowicz -_
Krzysztof Zarzycki_
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Have you ever
built cool
Big Data
pipelines?
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Example Use-Case
■ Can be done in batch and
real-time
■ User session analytics at Spotify
● Simple stats
■ Duration, number of songs,
skips, searches etc.
● Advanced analytics
■ Mood, physical activity, real-time
content, ads
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Example Output
How long do users listen
to a new edition of
Discover Weekly?
_1. Dashboards_
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Example Output
How long do users listen
to a new edition of
Discover Weekly?
Australian users are
listening to Discover
Weekly too short !!!
_1. Dashboards_ _2. Alerts_
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Example Output
How long do users listen
to a new edition of
Discover Weekly?
Australian users are
listening to Discover
Weekly too short !!!
Recommend songs
and ads based on
current activity.
_1. Dashboards_ _2. Alerts_ _3. Content_
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
1st
- Batch Architecture
1h
1h
1h
1h - 1d
1h
User
Events
User
Sessions
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
1st
- Batch Architecture
1h
1h
1h
1d
1h
User
Events
User
Sessions
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
The More Moving Parts …
⬇ The higher learning curve
⬇ The more gluing code
⬇ The larger administrative effort
⬇ The more error-prone solution
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Long Waiting Time
Image source: “Continuous Analytics: Stream Query Processing in Practice”, Michael J Franklin, Professor, UC Berkley, Dec 2009 and
http://www.slideshare.net/JoshBaer/shortening-the-feedback-loop-big-data-spain-external
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
2nd
- Micro-Batch Architecture
1m - 1h
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
♪ ♪
No Built-In Session Windows
♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪
♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪
♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪
[10:00 - 11:00) [11:00 - 12:00)
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
♪ ♪
No Built-In Session Windows
♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪
♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪
♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪
[10:00 - 11:00) [11:00 - 12:00)
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Late Data …
♪ ♪ ♪ ♪ ♪ ♪ Event Time
14:55 - 16:35
Processing
Time
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
... Included in Current Batch
♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪
♪ ♪ ♪ ♪ ♪
♪
14:55 - 16:35 16:50 - …
Event Time
Processing
Time
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Out-Of-Order Data …
♪ ♫ ♪ Event Time
Processing
Time
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Out-Of-Order Data …
♪ ♫ ♪ ♪ ♪ ♫
♪ ♪ ♫
Event Time
Processing
Time
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Out-Of-Order Data …
♪ ♫ ♪ ♪ ♪ ♫
♪ ♪ ♫
Event Time
Processing
Time
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
... Breaks Correctness
♪ ♫ ♪ ♪ ♪ ♫ ♪ ♫ ♪ ♬ ♫
♪ ♪ ♫ ♪ ♫ ♪ ♬ ♫
♪
Event Time
Processing
Time
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Problems
FILES,
BATCHES,
DATA LAKES
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Solving Streaming Problem With Batch?
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
3rd
- Streaming-First Architecture
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
User Session Windows
♪Case A ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪
Case B ♪ ♪ ♪ ♪ ♪ ♪
Session gap
eg. 15
minutes
♪
♪♪
5
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
User Session Windows
♪Case A ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪
Case B ♪ ♪ ♪ ♪ ♪ ♪
Session gap
eg. 15
minutes
♪
♪♪
5
[3,2]
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Reading From Kafka
val sessionStream : DataStream[SessionStats] = sEnv
.addSource(new KafkaConsumer(...))
♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪♪ ♪ ♪ ♪ ♪ ♪ ♪
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Session Windows With Gap
val sessionStream : DataStream[SessionStats] = sEnv
.addSource(new KafkaConsumer(...))
.keyBy(_.userId)
♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪
♪ ♪ ♪ ♪ ♪ ♪ ♪
User 1
User 2
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Session Windows With Gap
val sessionStream : DataStream[SessionStats] = sEnv
.addSource(new KafkaConsumer(...))
.keyBy(_.userId)
.window(EventTimeSessionWindows.withGap(Time.minutes(15)))
User 1 ♪ ♪ ♪ ♪ ♪ ♪
Session gap
- 15 minutes
♪♪
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Analyzing User Session
val sessionStream : DataStream[SessionStats] = sEnv
.addSource(new KafkaConsumer(...))
.keyBy(_.userId)
.window(EventTimeSessionWindows.withGap(Time.minutes(15)))
.apply(new CountSessionStats())
User 1 ♪ ♪ ♪ ♪ ♪ ♪ ♪♪
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Handling Late Events
val sessionStream : DataStream[SessionStats] = sEnv
.addSource(new KafkaConsumer(...))
.keyBy(_.userId)
.window(EventTimeSessionWindows.withGap(Time.minutes(15)))
.allowedLateness(Time.minutes(60))
.apply(new CountSessionStats())
User 1 ♪ ♪ ♪ ♪ ♪ ♪ ♪♪ ♪
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Triggering Early Results
val sessionStream : DataStream[SessionStats] = sEnv
.addSource(new KafkaConsumer(...))
.keyBy(_.userId)
.window(EventTimeSessionWindows.withGap(Time.minutes(15)))
.trigger(EarlyTriggeringTrigger.every(Time.minutes(10)))
.allowedLateness(Time.minutes(60))
.apply(new CountSessionStats())
User 1 ♪ ♪ ♪ ♪ ♪ ♪ ♪♪
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Sessionization Example
val sessionStream : DataStream[SessionStats] = sEnv
.addSource(new KafkaConsumer(...))
.keyBy(_.userId)
.window(EventTimeSessionWindows.withGap(Time.minutes(15)))
.trigger(EarlyTriggeringTrigger.every(Time.minutes(10)))
.allowedLateness(Time.minutes(60))
.apply(new CountSessionStats())
Working example:
https://github.com/getindata/flink-use-case
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Modern Stream Processing Engines
■ Rich stream processing semantic
● Built-in support for event-time windows
● Accurate results for late / out-of-order events and replays
● Early triggers
■ Low latency and high-throughput
■ Exactly-once stateful processing
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Modern Stream Processing Engines
■ Rich stream processing semantic
● Built-in support for event-time windows
● Accurate results for late / out-of-order events and replays
● Early triggers
■ Low latency and high-throughput
■ Exactly-once stateful processing
User survey:
http://data-artisans.com/flink-user-survey-2016-part-1
http://data-artisans.com/flink-user-survey-2016-part-2
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
How can I reprocess data?
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Reprocessing Events In Flink
1. Take periodic snapshots of a job
● It stores Kafka offsets, on-flight sessions, application state
2. Restart a job from a savepoint rather than from a beginning
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
What if data is no longer in Kafka?
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Consuming Data From HDFS
■ Run your streaming code on HDFS (bounded data)
● You need to read data in event-time based order
● Implement mechanism of proper watermark generation
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
What are usual stream processing
applications?
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Stream Analytics
Image source: https://www.slideshare.net/sinisalyh/storm-at-spotify
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Stream 24/7 Applications
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
When is batch processing good?
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Batch Processing Use-Cases
■ Ad-hoc analytics and data exploration
● Notebooks, Spark/Flink/Hive, Parquet, complete data sets
■ Technical advantages
● A large swaths of historical data in HDFS
● High-level libraries in mature batch technologies
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Batch Processing Use-Cases
■ Ad-hoc analytics and data exploration
● Notebooks, Spark/Flink/Hive, Parquet, complete data sets
■ Implementation advantages
● Offline experiments over large historical data
■ Historical events are usually stored in HDFS, not Kafka
● High-level libraries in batch processing technologies
■ Spark MLlib, H2O
(when data arrives continuously)
don’t solve
streaming problem
with batch jobs
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Who Are You, actually?
■ At GetInData, we build custom Big Data solutions
● Hadoop, Flink, Spark, Kafka and more
■ Our team is today represented by
Krzysztof
Zarzycki
Dawid
Wysakowicz
Adam
Kawa
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
■ Stream often the natural representation of your data
■ Stream processing is not only about low latency
Summary
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Q&A
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Thanks !
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Log Abstraction
11:00 -
12:00
12:00 -
13:00
…
…
10:00 - …
10:00 - …
10:00 -
11:00
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Spark Structured Streaming
⬇ Operates on top of micro-batches (Spark SQL engine)
■ The ALPHA version and the experimental API until July 11,
2017
⬆ Easy-to-learn API (Dataset/DataFrame)
⬆ Rich ecosystem of tools and libraries e.g. MLlib
⬆ Supports event-time
⬇ Sessionization not yet supported - SPARK-10816
⬇ Queryable state not yet supported - SPARK-16738
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Kafka Streams
⬇ No exactly-once (just at-least-once)
⬇ Kafka as the only data source
⬇ No bounded streams (batch) optimizations
⬆ Simplicity
⬆ Embedded into application
⬆ Supports event-time
⬇ Lack of session windows
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Apache Beam
⬆ Unified API for batch and streaming
⬆ Rich streaming processing semantics
⬆ Complex TriggerDSL
⬆ Multiple runtime environments
⬆ Spark, Flink, Apex, Dataflow
⬆ Side inputs and outputs
⬇ Verbose Java API
⬇ New project - Top level since 01/2017
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Google Dataflow
■ Runtime environment for Apache Beam in Google Cloud
⬇ No support for Iterative Computations
⬆ Supports Side Outputs
⬆ Works with every Google Cloud Service (Pub/Sub, BigTable
etc.)
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
How to join with other data
sets/streams?
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Join With Other Datasets / Streams
■ Flink can join windowed streams easily
■ Join of data stream with data set is WIP
● Even with slowly changing data set!
● Even keyed data
Stream 2
Stream 1
Joined Stream Input Stream Joined Stream
+
Id Name
1 John Doe
2 Jane Doe
Dataset
+
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
I like this streaming API.
Can I use it for batch?
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Unified batch and streaming API
■ Not with raw Flink API
■ But with Flink Table API
■ Apache Beam
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Is Flink production ready?
© Copyright. All rights reserved. Not to be reproduced without prior written consent.
Powered By Flink

More Related Content

What's hot

What's hot (9)

Node Tools For Your Grails Toolbox - Gr8Conf 2013
Node Tools For Your Grails Toolbox - Gr8Conf 2013Node Tools For Your Grails Toolbox - Gr8Conf 2013
Node Tools For Your Grails Toolbox - Gr8Conf 2013
 
Working with Git
Working with GitWorking with Git
Working with Git
 
LetSwift 2017 - 토스 iOS 앱의 개발/배포 환경
LetSwift 2017 - 토스 iOS 앱의 개발/배포 환경LetSwift 2017 - 토스 iOS 앱의 개발/배포 환경
LetSwift 2017 - 토스 iOS 앱의 개발/배포 환경
 
Gittalk
GittalkGittalk
Gittalk
 
4Developers: Dns vs webapp
4Developers: Dns vs webapp4Developers: Dns vs webapp
4Developers: Dns vs webapp
 
Introduction To Git Workshop
Introduction To Git WorkshopIntroduction To Git Workshop
Introduction To Git Workshop
 
Introducción a git y GitHub
Introducción a git y GitHubIntroducción a git y GitHub
Introducción a git y GitHub
 
Git - Get Ready To Use It
Git - Get Ready To Use ItGit - Get Ready To Use It
Git - Get Ready To Use It
 
Astricon 2017 Superpower of deployment tools
Astricon 2017 Superpower of deployment toolsAstricon 2017 Superpower of deployment tools
Astricon 2017 Superpower of deployment tools
 

Similar to Streaming analytics better than batch – when and why by Dawid Wysakowicz and Adam Kawa at Big Data Spain 2017

NLP for videos: Understanding customers' feelings in videos - Albert Lewandow...
NLP for videos: Understanding customers' feelings in videos - Albert Lewandow...NLP for videos: Understanding customers' feelings in videos - Albert Lewandow...
NLP for videos: Understanding customers' feelings in videos - Albert Lewandow...
GetInData
 
Time-State Analytics
Time-State AnalyticsTime-State Analytics
Time-State Analytics
HostedbyConfluent
 
Lost in Translation:varnishlog, varnishtest(VUG7)
Lost in Translation:varnishlog, varnishtest(VUG7)Lost in Translation:varnishlog, varnishtest(VUG7)
Lost in Translation:varnishlog, varnishtest(VUG7)
Iwana Chan
 
Kubernetes and real-time analytics - how to connect these two worlds with Apa...
Kubernetes and real-time analytics - how to connect these two worlds with Apa...Kubernetes and real-time analytics - how to connect these two worlds with Apa...
Kubernetes and real-time analytics - how to connect these two worlds with Apa...
GetInData
 

Similar to Streaming analytics better than batch – when and why by Dawid Wysakowicz and Adam Kawa at Big Data Spain 2017 (20)

Streaming analytics better than batch when and why - (Big Data Tech 2017)
Streaming analytics better than batch   when and why - (Big Data Tech 2017)Streaming analytics better than batch   when and why - (Big Data Tech 2017)
Streaming analytics better than batch when and why - (Big Data Tech 2017)
 
Complex event processing platform handling millions of users - Krzysztof Zarz...
Complex event processing platform handling millions of users - Krzysztof Zarz...Complex event processing platform handling millions of users - Krzysztof Zarz...
Complex event processing platform handling millions of users - Krzysztof Zarz...
 
Flink. Pure Streaming
Flink. Pure StreamingFlink. Pure Streaming
Flink. Pure Streaming
 
Apache Flume
Apache FlumeApache Flume
Apache Flume
 
Building a system for machine and event-oriented data - Velocity, Santa Clara...
Building a system for machine and event-oriented data - Velocity, Santa Clara...Building a system for machine and event-oriented data - Velocity, Santa Clara...
Building a system for machine and event-oriented data - Velocity, Santa Clara...
 
Observability at Spotify
Observability at SpotifyObservability at Spotify
Observability at Spotify
 
Cache aware-server-push in H2O version 1.5
Cache aware-server-push in H2O version 1.5Cache aware-server-push in H2O version 1.5
Cache aware-server-push in H2O version 1.5
 
Building a system for machine and event-oriented data - Data Day Seattle 2015
Building a system for machine and event-oriented data - Data Day Seattle 2015Building a system for machine and event-oriented data - Data Day Seattle 2015
Building a system for machine and event-oriented data - Data Day Seattle 2015
 
NTTドコモ様 導入事例 OpenStack Summit 2015 Tokyo 講演「After One year of OpenStack Cloud...
NTTドコモ様 導入事例 OpenStack Summit 2015 Tokyo 講演「After One year of OpenStack Cloud...NTTドコモ様 導入事例 OpenStack Summit 2015 Tokyo 講演「After One year of OpenStack Cloud...
NTTドコモ様 導入事例 OpenStack Summit 2015 Tokyo 講演「After One year of OpenStack Cloud...
 
NLP for videos: Understanding customers' feelings in videos - Albert Lewandow...
NLP for videos: Understanding customers' feelings in videos - Albert Lewandow...NLP for videos: Understanding customers' feelings in videos - Albert Lewandow...
NLP for videos: Understanding customers' feelings in videos - Albert Lewandow...
 
Apache httpd 2.4: The Cloud Killer App
Apache httpd 2.4: The Cloud Killer AppApache httpd 2.4: The Cloud Killer App
Apache httpd 2.4: The Cloud Killer App
 
Why And When Should We Consider Stream Processing In Our Solutions Teqnation ...
Why And When Should We Consider Stream Processing In Our Solutions Teqnation ...Why And When Should We Consider Stream Processing In Our Solutions Teqnation ...
Why And When Should We Consider Stream Processing In Our Solutions Teqnation ...
 
Building a system for machine and event-oriented data with Rocana
Building a system for machine and event-oriented data with RocanaBuilding a system for machine and event-oriented data with Rocana
Building a system for machine and event-oriented data with Rocana
 
Time-State Analytics
Time-State AnalyticsTime-State Analytics
Time-State Analytics
 
Spring Framework 5.0による Reactive Web Application #JavaDayTokyo
Spring Framework 5.0による Reactive Web Application #JavaDayTokyoSpring Framework 5.0による Reactive Web Application #JavaDayTokyo
Spring Framework 5.0による Reactive Web Application #JavaDayTokyo
 
FlinkCEP Library - Dawid Wysakowicz, GetInData (WHUG)
FlinkCEP Library - Dawid Wysakowicz, GetInData (WHUG)FlinkCEP Library - Dawid Wysakowicz, GetInData (WHUG)
FlinkCEP Library - Dawid Wysakowicz, GetInData (WHUG)
 
Lost in Translation:varnishlog, varnishtest(VUG7)
Lost in Translation:varnishlog, varnishtest(VUG7)Lost in Translation:varnishlog, varnishtest(VUG7)
Lost in Translation:varnishlog, varnishtest(VUG7)
 
About time
About timeAbout time
About time
 
Kubernetes and real-time analytics - how to connect these two worlds with Apa...
Kubernetes and real-time analytics - how to connect these two worlds with Apa...Kubernetes and real-time analytics - how to connect these two worlds with Apa...
Kubernetes and real-time analytics - how to connect these two worlds with Apa...
 
Velocity 2015 Amsterdam: Alerts overload
Velocity 2015 Amsterdam: Alerts overloadVelocity 2015 Amsterdam: Alerts overload
Velocity 2015 Amsterdam: Alerts overload
 

More from Big Data Spain

More from Big Data Spain (20)

Big Data, Big Quality? by Irene Gonzálvez at Big Data Spain 2017
Big Data, Big Quality? by Irene Gonzálvez at Big Data Spain 2017Big Data, Big Quality? by Irene Gonzálvez at Big Data Spain 2017
Big Data, Big Quality? by Irene Gonzálvez at Big Data Spain 2017
 
Scaling a backend for a big data and blockchain environment by Rafael Ríos at...
Scaling a backend for a big data and blockchain environment by Rafael Ríos at...Scaling a backend for a big data and blockchain environment by Rafael Ríos at...
Scaling a backend for a big data and blockchain environment by Rafael Ríos at...
 
AI: The next frontier by Amparo Alonso at Big Data Spain 2017
AI: The next frontier by Amparo Alonso at Big Data Spain 2017AI: The next frontier by Amparo Alonso at Big Data Spain 2017
AI: The next frontier by Amparo Alonso at Big Data Spain 2017
 
Disaster Recovery for Big Data by Carlos Izquierdo at Big Data Spain 2017
Disaster Recovery for Big Data by Carlos Izquierdo at Big Data Spain 2017Disaster Recovery for Big Data by Carlos Izquierdo at Big Data Spain 2017
Disaster Recovery for Big Data by Carlos Izquierdo at Big Data Spain 2017
 
Presentation: Boost Hadoop and Spark with in-memory technologies by Akmal Cha...
Presentation: Boost Hadoop and Spark with in-memory technologies by Akmal Cha...Presentation: Boost Hadoop and Spark with in-memory technologies by Akmal Cha...
Presentation: Boost Hadoop and Spark with in-memory technologies by Akmal Cha...
 
Data science for lazy people, Automated Machine Learning by Diego Hueltes at ...
Data science for lazy people, Automated Machine Learning by Diego Hueltes at ...Data science for lazy people, Automated Machine Learning by Diego Hueltes at ...
Data science for lazy people, Automated Machine Learning by Diego Hueltes at ...
 
Training Deep Learning Models on Multiple GPUs in the Cloud by Enrique Otero ...
Training Deep Learning Models on Multiple GPUs in the Cloud by Enrique Otero ...Training Deep Learning Models on Multiple GPUs in the Cloud by Enrique Otero ...
Training Deep Learning Models on Multiple GPUs in the Cloud by Enrique Otero ...
 
Unbalanced data: Same algorithms different techniques by Eric Martín at Big D...
Unbalanced data: Same algorithms different techniques by Eric Martín at Big D...Unbalanced data: Same algorithms different techniques by Eric Martín at Big D...
Unbalanced data: Same algorithms different techniques by Eric Martín at Big D...
 
State of the art time-series analysis with deep learning by Javier Ordóñez at...
State of the art time-series analysis with deep learning by Javier Ordóñez at...State of the art time-series analysis with deep learning by Javier Ordóñez at...
State of the art time-series analysis with deep learning by Javier Ordóñez at...
 
Trading at market speed with the latest Kafka features by Iñigo González at B...
Trading at market speed with the latest Kafka features by Iñigo González at B...Trading at market speed with the latest Kafka features by Iñigo González at B...
Trading at market speed with the latest Kafka features by Iñigo González at B...
 
Unified Stream Processing at Scale with Apache Samza by Jake Maes at Big Data...
Unified Stream Processing at Scale with Apache Samza by Jake Maes at Big Data...Unified Stream Processing at Scale with Apache Samza by Jake Maes at Big Data...
Unified Stream Processing at Scale with Apache Samza by Jake Maes at Big Data...
 
The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...
 The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a... The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...
The Analytic Platform behind IBM’s Watson Data Platform by Luciano Resende a...
 
Artificial Intelligence and Data-centric businesses by Óscar Méndez at Big Da...
Artificial Intelligence and Data-centric businesses by Óscar Méndez at Big Da...Artificial Intelligence and Data-centric businesses by Óscar Méndez at Big Da...
Artificial Intelligence and Data-centric businesses by Óscar Méndez at Big Da...
 
Why big data didn’t end causal inference by Totte Harinen at Big Data Spain 2017
Why big data didn’t end causal inference by Totte Harinen at Big Data Spain 2017Why big data didn’t end causal inference by Totte Harinen at Big Data Spain 2017
Why big data didn’t end causal inference by Totte Harinen at Big Data Spain 2017
 
Meme Index. Analyzing fads and sensations on the Internet by Miguel Romero at...
Meme Index. Analyzing fads and sensations on the Internet by Miguel Romero at...Meme Index. Analyzing fads and sensations on the Internet by Miguel Romero at...
Meme Index. Analyzing fads and sensations on the Internet by Miguel Romero at...
 
Vehicle Big Data that Drives Smart City Advancement by Mike Branch at Big Dat...
Vehicle Big Data that Drives Smart City Advancement by Mike Branch at Big Dat...Vehicle Big Data that Drives Smart City Advancement by Mike Branch at Big Dat...
Vehicle Big Data that Drives Smart City Advancement by Mike Branch at Big Dat...
 
End of the Myth: Ultra-Scalable Transactional Management by Ricardo Jiménez-P...
End of the Myth: Ultra-Scalable Transactional Management by Ricardo Jiménez-P...End of the Myth: Ultra-Scalable Transactional Management by Ricardo Jiménez-P...
End of the Myth: Ultra-Scalable Transactional Management by Ricardo Jiménez-P...
 
Attacking Machine Learning used in AntiVirus with Reinforcement by Rubén Mart...
Attacking Machine Learning used in AntiVirus with Reinforcement by Rubén Mart...Attacking Machine Learning used in AntiVirus with Reinforcement by Rubén Mart...
Attacking Machine Learning used in AntiVirus with Reinforcement by Rubén Mart...
 
More people, less banking: Blockchain by Salvador Casquero at Big Data Spain ...
More people, less banking: Blockchain by Salvador Casquero at Big Data Spain ...More people, less banking: Blockchain by Salvador Casquero at Big Data Spain ...
More people, less banking: Blockchain by Salvador Casquero at Big Data Spain ...
 
Make the elephant fly, once again by Sourygna Luangsay at Big Data Spain 2017
Make the elephant fly, once again by Sourygna Luangsay at Big Data Spain 2017Make the elephant fly, once again by Sourygna Luangsay at Big Data Spain 2017
Make the elephant fly, once again by Sourygna Luangsay at Big Data Spain 2017
 

Recently uploaded

Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
vu2urc
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
Earley Information Science
 

Recently uploaded (20)

ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdfThe Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
The Role of Taxonomy and Ontology in Semantic Layers - Heather Hedden.pdf
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Tech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdfTech Trends Report 2024 Future Today Institute.pdf
Tech Trends Report 2024 Future Today Institute.pdf
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot TakeoffStrategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
Strategize a Smooth Tenant-to-tenant Migration and Copilot Takeoff
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptxEIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
EIS-Webinar-Prompt-Knowledge-Eng-2024-04-08.pptx
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 

Streaming analytics better than batch – when and why by Dawid Wysakowicz and Adam Kawa at Big Data Spain 2017

  • 1.
  • 2. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Streaming analytics better than batch - when and why ? _Adam Kawa - Dawid Wysakowicz -_ Krzysztof Zarzycki_
  • 3. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Have you ever built cool Big Data pipelines?
  • 4. © Copyright. All rights reserved. Not to be reproduced without prior written consent.
  • 5. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Example Use-Case ■ Can be done in batch and real-time ■ User session analytics at Spotify ● Simple stats ■ Duration, number of songs, skips, searches etc. ● Advanced analytics ■ Mood, physical activity, real-time content, ads
  • 6. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Example Output How long do users listen to a new edition of Discover Weekly? _1. Dashboards_
  • 7. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Example Output How long do users listen to a new edition of Discover Weekly? Australian users are listening to Discover Weekly too short !!! _1. Dashboards_ _2. Alerts_
  • 8. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Example Output How long do users listen to a new edition of Discover Weekly? Australian users are listening to Discover Weekly too short !!! Recommend songs and ads based on current activity. _1. Dashboards_ _2. Alerts_ _3. Content_
  • 9. © Copyright. All rights reserved. Not to be reproduced without prior written consent. 1st - Batch Architecture 1h 1h 1h 1h - 1d 1h User Events User Sessions
  • 10. © Copyright. All rights reserved. Not to be reproduced without prior written consent. 1st - Batch Architecture 1h 1h 1h 1d 1h User Events User Sessions
  • 11. © Copyright. All rights reserved. Not to be reproduced without prior written consent. The More Moving Parts … ⬇ The higher learning curve ⬇ The more gluing code ⬇ The larger administrative effort ⬇ The more error-prone solution
  • 12. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Long Waiting Time Image source: “Continuous Analytics: Stream Query Processing in Practice”, Michael J Franklin, Professor, UC Berkley, Dec 2009 and http://www.slideshare.net/JoshBaer/shortening-the-feedback-loop-big-data-spain-external
  • 13. © Copyright. All rights reserved. Not to be reproduced without prior written consent. 2nd - Micro-Batch Architecture 1m - 1h
  • 14. © Copyright. All rights reserved. Not to be reproduced without prior written consent. ♪ ♪ No Built-In Session Windows ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ [10:00 - 11:00) [11:00 - 12:00)
  • 15. © Copyright. All rights reserved. Not to be reproduced without prior written consent. ♪ ♪ No Built-In Session Windows ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ [10:00 - 11:00) [11:00 - 12:00)
  • 16. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Late Data … ♪ ♪ ♪ ♪ ♪ ♪ Event Time 14:55 - 16:35 Processing Time
  • 17. © Copyright. All rights reserved. Not to be reproduced without prior written consent. ... Included in Current Batch ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ 14:55 - 16:35 16:50 - … Event Time Processing Time
  • 18. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Out-Of-Order Data … ♪ ♫ ♪ Event Time Processing Time
  • 19. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Out-Of-Order Data … ♪ ♫ ♪ ♪ ♪ ♫ ♪ ♪ ♫ Event Time Processing Time
  • 20. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Out-Of-Order Data … ♪ ♫ ♪ ♪ ♪ ♫ ♪ ♪ ♫ Event Time Processing Time
  • 21. © Copyright. All rights reserved. Not to be reproduced without prior written consent. ... Breaks Correctness ♪ ♫ ♪ ♪ ♪ ♫ ♪ ♫ ♪ ♬ ♫ ♪ ♪ ♫ ♪ ♫ ♪ ♬ ♫ ♪ Event Time Processing Time
  • 22. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Problems FILES, BATCHES, DATA LAKES
  • 23. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Solving Streaming Problem With Batch?
  • 24. © Copyright. All rights reserved. Not to be reproduced without prior written consent. 3rd - Streaming-First Architecture
  • 25. © Copyright. All rights reserved. Not to be reproduced without prior written consent. User Session Windows ♪Case A ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ Case B ♪ ♪ ♪ ♪ ♪ ♪ Session gap eg. 15 minutes ♪ ♪♪ 5
  • 26. © Copyright. All rights reserved. Not to be reproduced without prior written consent. User Session Windows ♪Case A ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ Case B ♪ ♪ ♪ ♪ ♪ ♪ Session gap eg. 15 minutes ♪ ♪♪ 5 [3,2]
  • 27. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Reading From Kafka val sessionStream : DataStream[SessionStats] = sEnv .addSource(new KafkaConsumer(...)) ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪♪ ♪ ♪ ♪ ♪ ♪ ♪
  • 28. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Session Windows With Gap val sessionStream : DataStream[SessionStats] = sEnv .addSource(new KafkaConsumer(...)) .keyBy(_.userId) ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ ♪ User 1 User 2
  • 29. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Session Windows With Gap val sessionStream : DataStream[SessionStats] = sEnv .addSource(new KafkaConsumer(...)) .keyBy(_.userId) .window(EventTimeSessionWindows.withGap(Time.minutes(15))) User 1 ♪ ♪ ♪ ♪ ♪ ♪ Session gap - 15 minutes ♪♪
  • 30. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Analyzing User Session val sessionStream : DataStream[SessionStats] = sEnv .addSource(new KafkaConsumer(...)) .keyBy(_.userId) .window(EventTimeSessionWindows.withGap(Time.minutes(15))) .apply(new CountSessionStats()) User 1 ♪ ♪ ♪ ♪ ♪ ♪ ♪♪
  • 31. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Handling Late Events val sessionStream : DataStream[SessionStats] = sEnv .addSource(new KafkaConsumer(...)) .keyBy(_.userId) .window(EventTimeSessionWindows.withGap(Time.minutes(15))) .allowedLateness(Time.minutes(60)) .apply(new CountSessionStats()) User 1 ♪ ♪ ♪ ♪ ♪ ♪ ♪♪ ♪
  • 32. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Triggering Early Results val sessionStream : DataStream[SessionStats] = sEnv .addSource(new KafkaConsumer(...)) .keyBy(_.userId) .window(EventTimeSessionWindows.withGap(Time.minutes(15))) .trigger(EarlyTriggeringTrigger.every(Time.minutes(10))) .allowedLateness(Time.minutes(60)) .apply(new CountSessionStats()) User 1 ♪ ♪ ♪ ♪ ♪ ♪ ♪♪
  • 33. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Sessionization Example val sessionStream : DataStream[SessionStats] = sEnv .addSource(new KafkaConsumer(...)) .keyBy(_.userId) .window(EventTimeSessionWindows.withGap(Time.minutes(15))) .trigger(EarlyTriggeringTrigger.every(Time.minutes(10))) .allowedLateness(Time.minutes(60)) .apply(new CountSessionStats()) Working example: https://github.com/getindata/flink-use-case
  • 34. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Modern Stream Processing Engines ■ Rich stream processing semantic ● Built-in support for event-time windows ● Accurate results for late / out-of-order events and replays ● Early triggers ■ Low latency and high-throughput ■ Exactly-once stateful processing
  • 35. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Modern Stream Processing Engines ■ Rich stream processing semantic ● Built-in support for event-time windows ● Accurate results for late / out-of-order events and replays ● Early triggers ■ Low latency and high-throughput ■ Exactly-once stateful processing User survey: http://data-artisans.com/flink-user-survey-2016-part-1 http://data-artisans.com/flink-user-survey-2016-part-2
  • 36. © Copyright. All rights reserved. Not to be reproduced without prior written consent.
  • 37. © Copyright. All rights reserved. Not to be reproduced without prior written consent. How can I reprocess data?
  • 38. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Reprocessing Events In Flink 1. Take periodic snapshots of a job ● It stores Kafka offsets, on-flight sessions, application state 2. Restart a job from a savepoint rather than from a beginning
  • 39. © Copyright. All rights reserved. Not to be reproduced without prior written consent. What if data is no longer in Kafka?
  • 40. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Consuming Data From HDFS ■ Run your streaming code on HDFS (bounded data) ● You need to read data in event-time based order ● Implement mechanism of proper watermark generation
  • 41. © Copyright. All rights reserved. Not to be reproduced without prior written consent. What are usual stream processing applications?
  • 42. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Stream Analytics Image source: https://www.slideshare.net/sinisalyh/storm-at-spotify
  • 43. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Stream 24/7 Applications
  • 44. © Copyright. All rights reserved. Not to be reproduced without prior written consent. When is batch processing good?
  • 45. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Batch Processing Use-Cases ■ Ad-hoc analytics and data exploration ● Notebooks, Spark/Flink/Hive, Parquet, complete data sets ■ Technical advantages ● A large swaths of historical data in HDFS ● High-level libraries in mature batch technologies
  • 46. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Batch Processing Use-Cases ■ Ad-hoc analytics and data exploration ● Notebooks, Spark/Flink/Hive, Parquet, complete data sets ■ Implementation advantages ● Offline experiments over large historical data ■ Historical events are usually stored in HDFS, not Kafka ● High-level libraries in batch processing technologies ■ Spark MLlib, H2O (when data arrives continuously) don’t solve streaming problem with batch jobs
  • 47. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Who Are You, actually? ■ At GetInData, we build custom Big Data solutions ● Hadoop, Flink, Spark, Kafka and more ■ Our team is today represented by Krzysztof Zarzycki Dawid Wysakowicz Adam Kawa
  • 48. © Copyright. All rights reserved. Not to be reproduced without prior written consent. ■ Stream often the natural representation of your data ■ Stream processing is not only about low latency Summary
  • 49. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Q&A
  • 50. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Thanks !
  • 51. © Copyright. All rights reserved. Not to be reproduced without prior written consent.
  • 52. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Log Abstraction 11:00 - 12:00 12:00 - 13:00 … … 10:00 - … 10:00 - … 10:00 - 11:00
  • 53. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Spark Structured Streaming ⬇ Operates on top of micro-batches (Spark SQL engine) ■ The ALPHA version and the experimental API until July 11, 2017 ⬆ Easy-to-learn API (Dataset/DataFrame) ⬆ Rich ecosystem of tools and libraries e.g. MLlib ⬆ Supports event-time ⬇ Sessionization not yet supported - SPARK-10816 ⬇ Queryable state not yet supported - SPARK-16738
  • 54. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Kafka Streams ⬇ No exactly-once (just at-least-once) ⬇ Kafka as the only data source ⬇ No bounded streams (batch) optimizations ⬆ Simplicity ⬆ Embedded into application ⬆ Supports event-time ⬇ Lack of session windows
  • 55. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Apache Beam ⬆ Unified API for batch and streaming ⬆ Rich streaming processing semantics ⬆ Complex TriggerDSL ⬆ Multiple runtime environments ⬆ Spark, Flink, Apex, Dataflow ⬆ Side inputs and outputs ⬇ Verbose Java API ⬇ New project - Top level since 01/2017
  • 56. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Google Dataflow ■ Runtime environment for Apache Beam in Google Cloud ⬇ No support for Iterative Computations ⬆ Supports Side Outputs ⬆ Works with every Google Cloud Service (Pub/Sub, BigTable etc.)
  • 57. © Copyright. All rights reserved. Not to be reproduced without prior written consent. How to join with other data sets/streams?
  • 58. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Join With Other Datasets / Streams ■ Flink can join windowed streams easily ■ Join of data stream with data set is WIP ● Even with slowly changing data set! ● Even keyed data Stream 2 Stream 1 Joined Stream Input Stream Joined Stream + Id Name 1 John Doe 2 Jane Doe Dataset +
  • 59. © Copyright. All rights reserved. Not to be reproduced without prior written consent. I like this streaming API. Can I use it for batch?
  • 60. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Unified batch and streaming API ■ Not with raw Flink API ■ But with Flink Table API ■ Apache Beam
  • 61. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Is Flink production ready?
  • 62. © Copyright. All rights reserved. Not to be reproduced without prior written consent. Powered By Flink