Ufuk Celebi
uce@apache.org
HUG London
October 15, 2015
Streaming Data Flow with
Apache Flink
Recent History
Timeline: April ’14 (Project Incubation), December ’14 (Top Level Project), April ’15
Releases: v0.5, v0.6, v0.7, v0.8, v0.9
Currently moving towards the 0.10 and 1.0 releases.
What is Flink?
Streaming Topologies · Low Latency
[Figure: a stream divided into time and count windows]
Long Batch Pipelines · Resource Utilization
Machine Learning · Iterative Algorithms
[Figure: Rating Matrix = User Matrix × Item Matrix]
Graph Analysis · Mutable State
[Figure: small graph with weighted edges]
Overview
Deployment

Local (Single JVM) · Cluster (Standalone, YARN)
DataStream API
Unbounded Data
DataSet API
Bounded Data
Runtime
Distributed Streaming Data Flow
Libraries
Machine Learning · Graph Processing · SQL-like API
Stream Processing
Real-world data is unbounded and is pushed to systems.
Batch · Streaming
Stream Platform Architecture
Server Logs · Trxn Logs · Sensor Logs
Downstream Systems
Flink
– Analyze and correlate streams
– Create derived streams
Kafka
– Gather and back up streams
– Offer streams
Cornerstones of Flink
Low Latency for fast results.
High Throughput to handle many events per second.
Exactly-once guarantees for correct results.
Intuitive APIs for productivity.
DataStream API
StreamExecutionEnvironment env = StreamExecutionEnvironment
    .getExecutionEnvironment();
DataStream<String> data = env.fromElements(
    "O Romeo, Romeo! wherefore art thou Romeo?", ...);
// DataStream Windowed WordCount
DataStream<Tuple2<String, Integer>> counts = data
    .flatMap(new SplitByWhitespace()) // (word, 1)
    .keyBy(0) // [word, [1, 1, …]] for 10 seconds
    .timeWindow(Time.of(10, TimeUnit.SECONDS))
    .sum(1); // sum per word per 10 second window
counts.print();
env.execute();
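To make the windowing step concrete, here is a minimal plain-Java sketch of what a 10-second tumbling-window word count computes. It does not use the Flink API. The class WindowedWordCountSketch, the type TimedLine, and the explicit timestamps are hypothetical names introduced for illustration, and the second input line is sample data that is not on the slide. The sketch assumes non-negative timestamps and windows aligned to multiples of the window size.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Plain-Java illustration of a tumbling-window word count (no Flink dependency). */
final class WindowedWordCountSketch {

    /** A line of text plus its arrival time in milliseconds (hypothetical type). */
    static final class TimedLine {
        final long timestampMillis;
        final String text;

        TimedLine(long timestampMillis, String text) {
            this.timestampMillis = timestampMillis;
            this.text = text;
        }
    }

    /** Plays the role of SplitByWhitespace: one token per whitespace-separated word. */
    static List<String> splitByWhitespace(String line) {
        List<String> words = new ArrayList<>();
        for (String token : line.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                words.add(token);
            }
        }
        return words;
    }

    /** Returns windowStart -> (word -> count); assumes non-negative timestamps. */
    static Map<Long, Map<String, Integer>> count(List<TimedLine> lines, long windowMillis) {
        Map<Long, Map<String, Integer>> result = new TreeMap<>();
        for (TimedLine line : lines) {
            // Assign the line to the window that contains its timestamp.
            long windowStart = line.timestampMillis - (line.timestampMillis % windowMillis);
            Map<String, Integer> counts = result.computeIfAbsent(windowStart, k -> new TreeMap<>());
            for (String word : splitByWhitespace(line.text)) {
                counts.merge(word, 1, Integer::sum); // sum per word per window
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<TimedLine> lines = new ArrayList<>();
        lines.add(new TimedLine(1_000L, "O Romeo, Romeo! wherefore art thou Romeo?"));
        lines.add(new TimedLine(12_000L, "Deny thy father and refuse thy name"));
        System.out.println(count(lines, 10_000L));
    }
}
```

Splitting on whitespace alone keeps punctuation attached, so "Romeo," and "Romeo?" are counted as different words. A real job would probably normalize tokens first.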
Pipelining
s1 t1 w1
s2 t2 w2
Source Tokenizer Window Count
Complete Pipeline Online Concurrently.
Pipelining
s1 t1 w1
s2 t2 w2
Source Tokenizer Window Count
Complete Pipeline Online Concurrently.
Chained tasks
Pipelining
s1
s2 t2 w2
t1 w1
Source Tokenizer Window Count
Complete Pipeline Online Concurrently.
Chained tasks Pipelined Shuffle
Streaming Fault Tolerance
At Least Once
• Ensure that all operators see all events.
Exactly Once
• Ensure that all operators see all events.
• Do not perform duplicate updates to operator state.
Flink guarantees exactly once processing.
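The difference between the two guarantees can be illustrated with a small plain-Java sketch of a counting operator that fails and then replays its input from a checkpointed offset. This is not Flink code. DeliverySemanticsSketch, runWithFailure, and its parameters are hypothetical names, and the model assumes a single operator reading from a replayable source.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java illustration of recovery semantics for a counting operator. */
final class DeliverySemanticsSketch {

    /**
     * Processes events[0..failAt), simulates a crash, then replays from checkpointOffset.
     * The checkpoint is taken just before the event at checkpointOffset is processed.
     * restoreState = false: counts survive the crash, so replayed events are applied twice
     * (every event is seen at least once, but state updates are duplicated).
     * restoreState = true: counts are rolled back to the checkpoint before the replay,
     * so each event updates the state exactly once.
     */
    static Map<String, Integer> runWithFailure(List<String> events, int checkpointOffset,
                                               int failAt, boolean restoreState) {
        if (checkpointOffset < 0 || checkpointOffset >= failAt || failAt > events.size()) {
            throw new IllegalArgumentException("need 0 <= checkpointOffset < failAt <= size");
        }
        Map<String, Integer> counts = new HashMap<>();
        Map<String, Integer> checkpointedCounts = new HashMap<>();

        for (int i = 0; i < failAt; i++) {
            if (i == checkpointOffset) {
                checkpointedCounts = new HashMap<>(counts); // snapshot of operator state
            }
            counts.merge(events.get(i), 1, Integer::sum);
        }

        // Simulated failure and recovery.
        if (restoreState) {
            counts = new HashMap<>(checkpointedCounts);
        }
        for (int i = checkpointOffset; i < events.size(); i++) {
            counts.merge(events.get(i), 1, Integer::sum); // replay from the checkpointed offset
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> events = Arrays.asList("a", "b", "a", "c");
        System.out.println("without state restore: " + runWithFailure(events, 2, 3, false));
        System.out.println("with state restore:    " + runWithFailure(events, 2, 3, true));
    }
}
```

In this run the event at offset 2 is processed once before the failure and once during replay. Only the variant that restores the checkpointed state avoids counting it twice.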
Distributed Snapshots
Barriers flow through the topology in line with data.
Flink guarantees exactly once processing.
[Figure label: Part of snapshot]
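One way to picture how barriers separate the records that are part of a snapshot from those that are not is the following plain-Java sketch of an operator with several inputs that sums integers. It is a simplified model and not Flink's implementation. BarrierAlignmentSketch and its methods are hypothetical names, and holding back records from inputs that have already delivered their barrier is an assumption of the sketch.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;

/** Simplified model of a multi-input operator that snapshots its state at barriers. */
final class BarrierAlignmentSketch {

    private final boolean[] barrierSeen;                     // per input: barrier already arrived?
    private final Queue<Integer> heldBack = new ArrayDeque<>();
    private long sum = 0;                                    // the operator state
    final List<Long> snapshots = new ArrayList<>();          // one entry per completed snapshot

    BarrierAlignmentSketch(int numInputs) {
        this.barrierSeen = new boolean[numInputs];
    }

    /** Handles a record arriving on the given input. */
    void onRecord(int input, int value) {
        if (barrierSeen[input]) {
            heldBack.add(value); // arrived after this input's barrier: not part of the snapshot
        } else {
            sum += value;        // arrived before the barrier: part of the snapshot
        }
    }

    /** Handles a barrier arriving on the given input. */
    void onBarrier(int input) {
        barrierSeen[input] = true;
        for (boolean seen : barrierSeen) {
            if (!seen) {
                return; // still waiting for the barrier on another input
            }
        }
        snapshots.add(sum);              // every input delivered its barrier: snapshot the state
        Arrays.fill(barrierSeen, false); // ready for the next barrier
        while (!heldBack.isEmpty()) {
            sum += heldBack.poll();      // now apply the records that were held back
        }
    }

    long currentSum() {
        return sum;
    }

    public static void main(String[] args) {
        BarrierAlignmentSketch op = new BarrierAlignmentSketch(2);
        op.onRecord(0, 1);
        op.onRecord(1, 10);
        op.onBarrier(0);
        op.onRecord(0, 2);  // held back until input 1 delivers its barrier
        op.onRecord(1, 20); // still before input 1's barrier
        op.onBarrier(1);
        System.out.println("snapshot=" + op.snapshots + " sum=" + op.currentSum());
    }
}
```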
Distributed Snapshots
Components: JobManager (Master) · State Backend · Checkpoint Data
• The JobManager sends a Start Checkpoint message to the sources.
• The sources emit barriers and acknowledge with their position (offsets 6791, 7252, 5589, 6843).
• An operator that has received the barrier at each input writes a snapshot of its state (s1, s2) to the state backend.
• The operator acknowledges with a pointer to its state (PTR1, PTR2).
• The sinks, having received the barrier at each input, acknowledge the checkpoint.
Checkpoint Data (complete)
Source 1: 6791 State 1: PTR1
Source 2: 7252 State 2: PTR2
Source 3: 5589 Sink 1: ACK
Source 4: 6843 Sink 2: ACK
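The bookkeeping on the JobManager side can be sketched as a pending checkpoint that collects one acknowledgement per task and is complete only when none is missing. This is a plain-Java illustration and not Flink's actual coordinator. PendingCheckpointSketch and its methods are hypothetical names, while the task names and payloads are taken from the Checkpoint Data table on the slides.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Collects acknowledgements for one checkpoint until every task has reported. */
final class PendingCheckpointSketch {

    private final Set<String> awaiting;                                    // tasks that have not acknowledged yet
    private final Map<String, String> checkpointData = new LinkedHashMap<>();

    PendingCheckpointSketch(Collection<String> tasks) {
        this.awaiting = new LinkedHashSet<>(tasks);
    }

    /** Stores a task's acknowledgement; returns true once all tasks have acknowledged. */
    boolean acknowledge(String task, String payload) {
        if (!awaiting.remove(task)) {
            throw new IllegalArgumentException("unexpected acknowledgement from " + task);
        }
        checkpointData.put(task, payload);
        return awaiting.isEmpty();
    }

    boolean isComplete() {
        return awaiting.isEmpty();
    }

    Map<String, String> data() {
        return checkpointData;
    }

    public static void main(String[] args) {
        PendingCheckpointSketch cp = new PendingCheckpointSketch(Arrays.asList(
                "Source 1", "Source 2", "Source 3", "Source 4",
                "State 1", "State 2", "Sink 1", "Sink 2"));
        cp.acknowledge("Source 1", "6791"); // sources acknowledge with their position
        cp.acknowledge("Source 2", "7252");
        cp.acknowledge("Source 3", "5589");
        cp.acknowledge("Source 4", "6843");
        cp.acknowledge("State 1", "PTR1");  // stateful operators acknowledge with a state pointer
        cp.acknowledge("State 2", "PTR2");
        cp.acknowledge("Sink 1", "ACK");
        boolean done = cp.acknowledge("Sink 2", "ACK");
        System.out.println("complete=" + done + " data=" + cp.data());
    }
}
```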
Operator State
User-defined state
• Flink’s transformations are long running operators
• Feel free to keep objects around
• Hooks to include this state in the system’s checkpoints
Windowed streams
• Time, count, and data-driven windows
• Managed by the system
Batch on Streaming
DataStream API
Unbounded Data
DataSet API
Bounded Data
Runtime
Distributed Streaming Data Flow
Libraries
Machine Learning · Graph Processing · SQL-like API
Batch on Streaming
Run a bounded stream (data set) on a stream processor.
Infinite Streams (unbounded data stream): Stream Windows · Pipelined Data Exchange
Finite Streams (bounded data set): Global View · Pipelined or Blocking Data Exchange
Batch Pipelines
Data exchange is mostly streamed
Some operators block (e.g. sort, hash table)
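The distinction between streamed and blocking operators can be sketched with plain Java iterators. A map-style operator hands each record on as soon as it is pulled, while a sort has to read its entire input before it can emit the first record. PipelinedVsBlockingSketch and its methods are hypothetical names, and the pull-based iterator model is an assumption of the sketch rather than a description of Flink's runtime.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

/** Contrasts a pipelined operator with a blocking one using pull-based iterators. */
final class PipelinedVsBlockingSketch {

    /** Source that logs every record when it is read, so consumption becomes visible. */
    static Iterator<Integer> source(List<Integer> data, List<String> log) {
        Iterator<Integer> it = data.iterator();
        return new Iterator<Integer>() {
            @Override public boolean hasNext() { return it.hasNext(); }
            @Override public Integer next() {
                Integer value = it.next();
                log.add("read " + value);
                return value;
            }
        };
    }

    /** Pipelined operator: produces one output for each input, one record at a time. */
    static Iterator<Integer> doubled(Iterator<Integer> in) {
        return new Iterator<Integer>() {
            @Override public boolean hasNext() { return in.hasNext(); }
            @Override public Integer next() { return in.next() * 2; }
        };
    }

    /** Blocking operator: consumes the whole input before the first output is available. */
    static Iterator<Integer> sorted(Iterator<Integer> in) {
        List<Integer> all = new ArrayList<>();
        while (in.hasNext()) {
            all.add(in.next());
        }
        Collections.sort(all);
        return all.iterator();
    }

    public static void main(String[] args) {
        List<String> pipelinedLog = new ArrayList<>();
        Iterator<Integer> pipelined = doubled(source(Arrays.asList(3, 1, 2), pipelinedLog));
        System.out.println("first doubled=" + pipelined.next() + ", reads so far=" + pipelinedLog.size());

        List<String> blockingLog = new ArrayList<>();
        Iterator<Integer> blocking = sorted(source(Arrays.asList(3, 1, 2), blockingLog));
        System.out.println("first sorted=" + blocking.next() + ", reads so far=" + blockingLog.size());
    }
}
```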
DataSet API
ExecutionEnvironment env = ExecutionEnvironment
    .getExecutionEnvironment();
DataSet<String> data = env.fromElements(
    "O Romeo, Romeo! wherefore art thou Romeo?", ...);
// DataSet WordCount
DataSet<Tuple2<String, Integer>> counts = data
    .flatMap(new SplitByWhitespace()) // (word, 1)
    .groupBy(0) // [word, [1, 1, …]]
    .sum(1); // sum per word for all occurrences
counts.print();
Batch-specific optimizations
Managed memory
• On- and off-heap memory
• Internal operators (e.g. join or sort) with out-of-core support
• Serialization stack for user-types
Cost-based optimizer
• Program adapts to changing data size
Getting Started
Project Page: http://flink.apache.org
Quickstarts: Java & Scala API
Docs: Programming Guides
Get Involved: Mailing Lists, Stack Overflow, IRC, …
Blogs
http://flink.apache.org/blog
http://data-artisans.com/blog
Twitter
@ApacheFlink
Mailing lists
(news|user|dev)@flink.apache.org
Apache Flink

Lambda architecture on Spark, Kafka for real-time large scale ML
 
Today’s reality Hadoop with Spark- How to select the best Data Science approa...
Today’s reality Hadoop with Spark- How to select the best Data Science approa...Today’s reality Hadoop with Spark- How to select the best Data Science approa...
Today’s reality Hadoop with Spark- How to select the best Data Science approa...
 
Jonathon Southam: Venture Capital, Funding & Pitching
Jonathon Southam: Venture Capital, Funding & PitchingJonathon Southam: Venture Capital, Funding & Pitching
Jonathon Southam: Venture Capital, Funding & Pitching
 
Signal Media: Real-Time Media & News Monitoring
Signal Media: Real-Time Media & News MonitoringSignal Media: Real-Time Media & News Monitoring
Signal Media: Real-Time Media & News Monitoring
 
Dean Bryen: Scaling The Platform For Your Startup
Dean Bryen: Scaling The Platform For Your StartupDean Bryen: Scaling The Platform For Your Startup
Dean Bryen: Scaling The Platform For Your Startup
 
Peter Karney: Intro to the Digital catapult
Peter Karney: Intro to the Digital catapultPeter Karney: Intro to the Digital catapult
Peter Karney: Intro to the Digital catapult
 
Cytora: Real-Time Political Risk Analysis
Cytora:  Real-Time Political Risk AnalysisCytora:  Real-Time Political Risk Analysis
Cytora: Real-Time Political Risk Analysis
 
Cubitic: Predictive Analytics
Cubitic: Predictive AnalyticsCubitic: Predictive Analytics
Cubitic: Predictive Analytics
 
Bird.i: Earth Observation Data Made Social
Bird.i: Earth Observation Data Made SocialBird.i: Earth Observation Data Made Social
Bird.i: Earth Observation Data Made Social
 
Aiseedo: Real Time Machine Intelligence
Aiseedo: Real Time Machine IntelligenceAiseedo: Real Time Machine Intelligence
Aiseedo: Real Time Machine Intelligence
 
Secrets of Spark's success - Deenar Toraskar, Think Reactive
Secrets of Spark's success - Deenar Toraskar, Think Reactive Secrets of Spark's success - Deenar Toraskar, Think Reactive
Secrets of Spark's success - Deenar Toraskar, Think Reactive
 
TV Marketing and big data: cat and dog or thick as thieves? Krzysztof Osiewal...
TV Marketing and big data: cat and dog or thick as thieves? Krzysztof Osiewal...TV Marketing and big data: cat and dog or thick as thieves? Krzysztof Osiewal...
TV Marketing and big data: cat and dog or thick as thieves? Krzysztof Osiewal...
 
Hadoop - Looking to the Future By Arun Murthy
Hadoop - Looking to the Future By Arun MurthyHadoop - Looking to the Future By Arun Murthy
Hadoop - Looking to the Future By Arun Murthy
 
Fast real-time approximations using Spark streaming
Fast real-time approximations using Spark streamingFast real-time approximations using Spark streaming
Fast real-time approximations using Spark streaming
 

Último

IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsEnterprise Knowledge
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationMichael W. Hawkins
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Miguel Araújo
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsMaria Levchenko
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Servicegiselly40
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024Results
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUK Journal
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 

Último (20)

IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
GenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day PresentationGenCyber Cyber Security Day Presentation
GenCyber Cyber Security Day Presentation
 
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
Mastering MySQL Database Architecture: Deep Dive into MySQL Shell and MySQL R...
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
CNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of ServiceCNv6 Instructor Chapter 6 Quality of Service
CNv6 Instructor Chapter 6 Quality of Service
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024A Call to Action for Generative AI in 2024
A Call to Action for Generative AI in 2024
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdfUnderstanding Discord NSFW Servers A Guide for Responsible Users.pdf
Understanding Discord NSFW Servers A Guide for Responsible Users.pdf
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 

Streaming Dataflow with Apache Flink

  • 1. Ufuk Celebi uce@apache.org HUG London October 15, 2015 Streaming Data Flow with Apache Flink
  • 2. Recent History April ‘14 December ‘14 v0.5 v0.6 v0.7 April ‘15 Project Incubation Top Level Project v0.8 v0.9 Currently moving towards 0.10 and 1.0 release.
  • 3. What is Flink? Streaming Topologies: Low Latency · Long Batch Pipelines: Resource Utilization · Machine Learning: Iterative Algorithms · Graph Analysis: Mutable State [Figures: stream with time window count; Rating Matrix = User Matrix × Item Matrix; weighted graph]
  • 4. Overview Deployment: Local (Single JVM) · Cluster (Standalone, YARN) DataStream API: Unbounded Data DataSet API: Bounded Data Runtime: Distributed Streaming Data Flow Libraries: Machine Learning · Graph Processing · SQL-like API
  • 5. Stream Processing Real world data is unbounded and is pushed to systems. Batch · Streaming
  • 6. Stream Platform Architecture Server Logs Trxn Logs Sensor Logs Downstream Systems Flink – Analyze and correlate streams – Create derived streams Kafka – Gather and backup streams – Offer streams
  • 7. Cornerstones of Flink Low Latency for fast results. High Throughput to handle many events per second. Exactly-once guarantees for correct results. Intuitive APIs for productivity.
  • 8. DataStream API
    StreamExecutionEnvironment env = StreamExecutionEnvironment
        .getExecutionEnvironment();
    DataStream<String> data = env.fromElements(
        "O Romeo, Romeo! wherefore art thou Romeo?", ...);
    // DataStream Windowed WordCount
    DataStream<Tuple2<String, Integer>> counts = data
        .flatMap(new SplitByWhitespace()) // (word, 1)
        .keyBy(0) // [word, [1, 1, …]] for 10 seconds
        .timeWindow(Time.of(10, TimeUnit.SECONDS))
        .sum(1); // sum per word per 10 second window
    counts.print();
    env.execute();
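To make the semantics of the windowed word count concrete, here is a minimal sketch in plain Java (no Flink dependency) of what the split, keyBy(0), 10-second window and sum(1) steps compute together. The class name, the TimedLine type and the timestamps are hypothetical and exist only for this illustration. The sketch also assumes that SplitByWhitespace does nothing more than split on whitespace, which the slides do not spell out, and that timestamps are non-negative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only: a single-threaded model of
// flatMap -> keyBy(word) -> 10 s tumbling window -> sum. Not Flink code.
final class WindowedWordCountSketch {

    // A line of text together with the time (in milliseconds) at which it arrived.
    static final class TimedLine {
        final long timestampMillis;
        final String text;

        TimedLine(long timestampMillis, String text) {
            this.timestampMillis = timestampMillis;
            this.text = text;
        }
    }

    // Returns windowStart -> (word -> count) for tumbling windows of the given size.
    static Map<Long, Map<String, Integer>> count(List<TimedLine> lines, long windowMillis) {
        Map<Long, Map<String, Integer>> result = new TreeMap<>();
        for (TimedLine line : lines) {
            // Assign the line to the tumbling window that contains its timestamp.
            long windowStart = line.timestampMillis - (line.timestampMillis % windowMillis);
            Map<String, Integer> counts = result.computeIfAbsent(windowStart, k -> new TreeMap<>());
            // Counterpart of the flatMap step: split on whitespace, one (word, 1) per token.
            for (String word : line.text.trim().split("\\s+")) {
                if (!word.isEmpty()) {
                    // Counterpart of keyBy + sum: add 1 to this word's count in this window.
                    counts.merge(word, 1, Integer::sum);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<TimedLine> lines = new ArrayList<>();
        lines.add(new TimedLine(1_000L, "O Romeo, Romeo! wherefore art thou Romeo?"));
        lines.add(new TimedLine(12_000L, "Romeo? Romeo?"));
        System.out.println(count(lines, 10_000L));
    }
}
```

Because the split is purely on whitespace, "Romeo," and "Romeo?" count as different words here; a real tokenizer would probably also normalize punctuation and case.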
  • 18. Pipelining Source (s1, s2) → Tokenizer (t1, t2) → Window Count (w1, w2). Complete Pipeline Online Concurrently. Chained tasks. Pipelined Shuffle.
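The idea that the complete pipeline is online concurrently can be sketched in plain Java with one thread per stage and small bounded queues between them. This is only an analogy under stated assumptions: the class name, the queue capacity and the end-of-stream marker are hypothetical, there is a single instance per stage rather than parallel subtasks, and neither task chaining nor a network shuffle is modelled.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Hypothetical illustration only: source, tokenizer and counter run at the same
// time, so downstream stages start working before upstream stages have finished.
final class PipeliningSketch {

    // Unique end-of-stream marker, compared by identity so it cannot collide with data.
    private static final String END = new String("<end>");

    static Map<String, Integer> run(String[] lines) {
        BlockingQueue<String> sourceToTokenizer = new ArrayBlockingQueue<>(2);
        BlockingQueue<String> tokenizerToCounter = new ArrayBlockingQueue<>(2);
        Map<String, Integer> counts = new TreeMap<>();

        Thread source = new Thread(() -> {
            try {
                for (String line : lines) {
                    sourceToTokenizer.put(line); // blocks while the queue is full
                }
                sourceToTokenizer.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        Thread tokenizer = new Thread(() -> {
            try {
                String line;
                while ((line = sourceToTokenizer.take()) != END) {
                    for (String word : line.trim().split("\\s+")) {
                        if (!word.isEmpty()) {
                            tokenizerToCounter.put(word);
                        }
                    }
                }
                tokenizerToCounter.put(END);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        Thread counter = new Thread(() -> {
            try {
                String word;
                while ((word = tokenizerToCounter.take()) != END) {
                    counts.merge(word, 1, Integer::sum);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        source.start();
        tokenizer.start();
        counter.start();
        try {
            source.join();
            tokenizer.join();
            counter.join(); // after join(), the counter's writes are visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the pipeline", e);
        }
        return counts;
    }
}
```

With a queue capacity of 2, the source cannot run far ahead of the tokenizer, which is the property that distinguishes a pipelined exchange from one that materializes the full intermediate result first.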
  • 20. Streaming Fault Tolerance At Least Once • Ensure that all operators see all events. Exactly Once • Ensure that all operators see all events. • Do not perform duplicate updates to operator state. Flink guarantees exactly once processing.
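The difference between the two guarantees can be illustrated with a small plain-Java sketch of a summing operator that fails and then replays its input from a checkpointed offset. Everything here is hypothetical (class name, method, offsets); it is a toy model of the definitions above, not of Flink's recovery code. When only the input position is rewound, every event is seen but some updates are applied twice; when the operator state is restored together with the position, each event is reflected exactly once.

```java
// Hypothetical illustration only: replay after a failure, with and without
// restoring the operator state that belongs to the checkpointed offset.
final class DeliveryGuaranteeSketch {

    // Sums the events. A checkpoint is taken just before events[checkpointOffset],
    // the operator fails just before events[crashOffset], and the source then
    // replays everything from checkpointOffset onwards.
    static int sumWithFailure(int[] events, int checkpointOffset, int crashOffset,
                              boolean restoreState) {
        if (checkpointOffset < 0 || checkpointOffset >= crashOffset
                || crashOffset > events.length) {
            throw new IllegalArgumentException(
                "need 0 <= checkpointOffset < crashOffset <= events.length");
        }
        int state = 0;
        int checkpointedState = 0;
        for (int i = 0; i < crashOffset; i++) {
            if (i == checkpointOffset) {
                checkpointedState = state; // snapshot consistent with the offset
            }
            state += events[i];
        }
        // Failure and recovery: the source rewinds to the checkpointed offset.
        if (restoreState) {
            state = checkpointedState; // state and offset are restored together
        }
        for (int i = checkpointOffset; i < events.length; i++) {
            state += events[i];
        }
        return state;
    }
}
```

For the events 1, 2, 3, 4, 5 with a checkpoint before the third event and a failure before the fifth, restoring the state yields 15, the true sum, while replaying without restoring yields 22 because the events 3 and 4 are added twice.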
  • 22. Distributed Snapshots Barriers flow through the topology in line with data. Flink guarantees exactly once processing. Part of snapshot
  • 32. Distributed Snapshots Flink guarantees exactly once processing. JobManager Master State Backend Checkpoint Data Source 1: State 1: Source 2: State 2: Source 3: Sink 1: Source 4: Sink 2: Offset: 6791 Offset: 7252 Offset: 5589 Offset: 6843
  • 33. Distributed Snapshots Flink guarantees exactly once processing. JobManager Master State Backend Checkpoint Data Source 1: State 1: Source 2: State 2: Source 3: Sink 1: Source 4: Sink 2: Offset: 6791 Offset: 7252 Offset: 5589 Offset: 6843 Start Checkpoint Message
  • 34. Distributed Snapshots Flink guarantees exactly once processing. JobManager Master State Backend Checkpoint Data Source 1: 6791 State 1: Source 2: 7252 State 2: Source 3: 5589 Sink 1: Source 4: 6843 Sink 2: Emit Barriers Acknowledge with Position
  • 35. Distributed Snapshots Flink guarantees exactly once processing. JobManager Master State Backend Checkpoint Data Source 1: 6791 State 1: Source 2: 7252 State 2: Source 3: 5589 Sink 1: Source 4: 6843 Sink 2: Received barrier at each input
  • 36. Distributed Snapshots Flink guarantees exactly once processing. JobManager Master State Backend Checkpoint Data Source 1: 6791 State 1: Source 2: 7252 State 2: Source 3: 5589 Sink 1: Source 4: 6843 Sink 2: s1 Write Snapshot of its state Received barrier at each input
  • 37. Distributed Snapshots Flink guarantees exactly once processing. JobManager Master State Backend Checkpoint Data Source 1: 6791 State 1: PTR1 Source 2: 7252 State 2: PTR2 Source 3: 5589 Sink 1: Source 4: 6843 Sink 2: s1 Acknowledge with pointer to state s2
  • 38. Distributed Snapshots Flink guarantees exactly once processing. JobManager Master State Backend Checkpoint Data Source 1: 6791 State 1: PTR1 Source 2: 7252 State 2: PTR2 Source 3: 5589 Sink 1: ACK Source 4: 6843 Sink 2: ACK s1 s2 Acknowledge Checkpoint Received barrier at each input
  • 39. Distributed Snapshots Flink guarantees exactly once processing. JobManager Master State Backend Checkpoint Data Source 1: 6791 State 1: PTR1 Source 2: 7252 State 2: PTR2 Source 3: 5589 Sink 1: ACK Source 4: 6843 Sink 2: ACK s1 s2
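The step "received barrier at each input, then write a snapshot of its state" can be sketched as follows for an operator with several inputs. This is a plain-Java toy model, not Flink's implementation: the class, the marker value and the summing state are hypothetical. It also assumes that records arriving behind a barrier are held back until the barriers of all inputs have arrived, and that only one checkpoint is in flight at a time.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Hypothetical illustration only: an operator that sums its inputs and snapshots
// the sum once a barrier has been received on every input.
final class BarrierAlignmentSketch {

    // Hypothetical marker for a checkpoint barrier travelling in line with the data.
    static final int BARRIER = Integer.MIN_VALUE;

    private final boolean[] barrierSeen;
    private final List<Deque<Integer>> buffered = new ArrayList<>();
    private final List<Integer> snapshots = new ArrayList<>();
    private int sum = 0; // the operator state

    BarrierAlignmentSketch(int numInputs) {
        barrierSeen = new boolean[numInputs];
        for (int i = 0; i < numInputs; i++) {
            buffered.add(new ArrayDeque<>());
        }
    }

    void onElement(int input, int value) {
        if (value == BARRIER) {
            if (barrierSeen[input]) {
                throw new IllegalStateException("only one checkpoint in flight is modelled");
            }
            barrierSeen[input] = true;
            if (allBarriersSeen()) {
                completeAlignment();
            }
        } else if (barrierSeen[input]) {
            // This record follows the barrier, so it belongs after the snapshot.
            buffered.get(input).add(value);
        } else {
            sum += value;
        }
    }

    private boolean allBarriersSeen() {
        for (boolean seen : barrierSeen) {
            if (!seen) {
                return false;
            }
        }
        return true;
    }

    private void completeAlignment() {
        snapshots.add(sum); // state reflects exactly the pre-barrier records
        Arrays.fill(barrierSeen, false);
        for (Deque<Integer> queue : buffered) {
            while (!queue.isEmpty()) {
                sum += queue.poll(); // now process the records that were held back
            }
        }
    }

    int currentSum() {
        return sum;
    }

    List<Integer> snapshots() {
        return snapshots;
    }
}
```

In the real system the snapshot would be written to the state backend and acknowledged to the JobManager with a pointer to it; here it is simply appended to a list.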
  • 40. Operator State User-defined state • Flink’s transformations are long running operators • Feel free to keep objects around • Hooks to include into system’s checkpoint Windowed streams • Time, count, and data-driven windows • Managed by the system
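As a rough sketch of what "hooks to include into the system's checkpoint" might look like for user-defined state, the following plain-Java example keeps an ordinary field in a long-running operator and exposes it through two callbacks. The SnapshotHooks interface and the operator are hypothetical and defined here only for illustration; they are not Flink's API.

```java
// Hypothetical hook interface, defined only for this illustration; not Flink's API.
interface SnapshotHooks<S> {
    S snapshotState();          // called by the system when a checkpoint is taken
    void restoreState(S state); // called by the system when recovering from a failure
}

// A long-running operator that simply keeps a plain field around as its state.
final class RunningMaxOperator implements SnapshotHooks<Integer> {
    private int max = Integer.MIN_VALUE;

    // Processes one record and returns the largest value seen so far.
    int process(int value) {
        max = Math.max(max, value);
        return max;
    }

    @Override
    public Integer snapshotState() {
        return max;
    }

    @Override
    public void restoreState(Integer state) {
        max = state;
    }
}
```

After processing 3 and 7, taking a snapshot, processing 9 and then restoring the snapshot, the operator continues from a maximum of 7, as if the 9 had never been seen.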
  • 41. Batch on Streaming DataStream API Unbounded Data DataSet API Bounded Data Runtime Distributed Streaming Data Flow Libraries Machine Learning · Graph Processing · SQL-like API
  • 42. Batch on Streaming Run a bounded stream (data set) on a stream processor. Bounded data set · Unbounded data stream
  • 43. Batch on Streaming Run a bounded stream (data set) on a stream processor. Streaming: Infinite Streams, Stream Windows, Pipelined Data Exchange. Batch: Finite Streams, Global View, Pipelined or Blocking Data Exchange.
  • 46. Batch Pipelines Data exchange is mostly streamed. Some operators block (e.g. sort, hash table).
  • 47. DataSet API
    ExecutionEnvironment env = ExecutionEnvironment
        .getExecutionEnvironment();
    DataSet<String> data = env.fromElements(
        "O Romeo, Romeo! wherefore art thou Romeo?", ...);
    // DataSet WordCount
    DataSet<Tuple2<String, Integer>> counts = data
        .flatMap(new SplitByWhitespace()) // (word, 1)
        .groupBy(0) // [word, [1, 1, …]]
        .sum(1); // sum per word for all occurrences
    counts.print();
  • 54. Batch-specific optimizations Managed memory • On- and off-heap memory • Internal operators (e.g. join or sort) with out-of-core support • Serialization stack for user-types Cost-based optimizer • Program adapts to changing data size
  • 55. Getting Started Project Page: http://flink.apache.org · Quickstarts: Java & Scala API · Docs: Programming Guides · Get Involved: Mailing Lists, Stack Overflow, IRC, …