Flink 0.10
Graduating streaming
Márton Balassi
mbalassi@apache.org / @MartonBalassi
Hungarian Academy of Sciences
Streaming in Flink 0.10
• Operational readiness
High Availability
Monitoring
Integration with other systems
• First-class support for event-time
• Hardened statefulness support
• Redefined API
Streaming in Flink 0.10
• Some breaking changes
GroupBy -> KeyBy
Windowing API completely changed
DataStream and alike naming
Internal rewrite
The goal is to harden for 1.0
API in one take 
Windowing
• Why put your data into windows?
• That is why:
Streaming data never stops
Window (5 min)
Count #Hashtags
Just saw #Trump on #CNN,
super cool. :D
Trump: 2394
Cheese: 12984
Money: 42
What I didn’t mention
• tweets have a timestamp,
their event time
• tweets from across the globe
arrive with delay
=> tweets with different
timestamps arrive out-of-order
Window (5 min)
Count #Hashtags
12:34 (13.10.2015):
Just saw #Trump on #CNN,
super cool. :D
Trump: 2394
Cheese: 12984
Money: 42
These arrive with 3
minutes slack
Form windows based on
processing time of the
machine.
Processing Time != Event Time
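To make the mismatch concrete, here is a minimal plain-Java sketch (no Flink dependency) that assigns the 12:34 tweet from the slide to a 5-minute tumbling window twice: once by its event time and once by its arrival time, assuming the 3 minutes of slack mentioned above. The class name `WindowAssignment` and the helper `windowStart` are illustrative assumptions, not part of any Flink API.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;

final class WindowAssignment {
    static final long WINDOW_MS = Duration.ofMinutes(5).toMillis();

    /** Start of the 5-minute tumbling window that contains the given timestamp. */
    static long windowStart(long timestampMs) {
        return timestampMs - Math.floorMod(timestampMs, WINDOW_MS);
    }

    public static void main(String[] args) {
        // Event time of the tweet on the slide: 12:34 on 13.10.2015 (UTC assumed).
        long eventTime = LocalDateTime.of(2015, 10, 13, 12, 34)
                .toInstant(ZoneOffset.UTC).toEpochMilli();
        // The tweet reaches the window operator 3 minutes later.
        long processingTime = eventTime + Duration.ofMinutes(3).toMillis();

        System.out.println("event-time window starts at      "
                + Instant.ofEpochMilli(windowStart(eventTime)));
        System.out.println("processing-time window starts at "
                + Instant.ofEpochMilli(windowStart(processingTime)));
    }
}
```

With these inputs the tweet is counted in the 12:30 window by event time but in the 12:35 window by processing time, so the two notions of time produce different hashtag counts per window.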
Why do people use this?
• easy to implement
• low latency
• this is what systems give you
(Spark Streaming, Apex,
Samza, Storm)*
*not Google Cloud Dataflow
Let's look at a more
complex example.
Window (5 min)
Correlate Tweets
and News
something...
These still have 3 min slack.
These have 8 min slack.
12:33 (13.10.2015):
Donald Trump speaks at
Cheese conference.
Processing Time != Event Time
=> Mismatch in the
timespace continuum
Use cases
• out-of-order elements
• sources with delay
• recovery/fault-tolerance
• “catching up” with a stream
Who does it?
• Google Cloud Dataflow
• Apache Flink
How can we do this?
We need a
Global Clock
that runs on
event time
instead of
processing time.
[Figure: dataflow diagram with the labels "This is a source", "This is our window operator", "This is the current event time", and "This is a watermark."]
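One way to picture how an operator derives its event-time clock from watermarks is the sketch below. It assumes an operator with several input channels whose clock is the minimum of the latest watermark seen on each channel, so the slowest input holds the clock back. `WatermarkTracker` and its methods are hypothetical names for illustration, not Flink classes.

```java
import java.util.Arrays;

final class WatermarkTracker {
    // Latest watermark received on each input channel.
    private final long[] perInput;

    WatermarkTracker(int numInputs) {
        perInput = new long[numInputs];
        // No input has reported yet, so the clock starts at the lowest possible value.
        Arrays.fill(perInput, Long.MIN_VALUE);
    }

    /** Records a watermark from one input and returns the operator's event-time clock. */
    long onWatermark(int input, long watermark) {
        // Watermarks never move backwards on a channel.
        perInput[input] = Math.max(perInput[input], watermark);
        return currentEventTime();
    }

    /** The event-time clock is the minimum watermark over all inputs. */
    long currentEventTime() {
        return Arrays.stream(perInput).min().getAsLong();
    }

    public static void main(String[] args) {
        WatermarkTracker tracker = new WatermarkTracker(2);
        System.out.println(tracker.onWatermark(0, 2)); // input 1 has not reported yet
        System.out.println(tracker.onWatermark(1, 1)); // clock advances to 1
        System.out.println(tracker.onWatermark(1, 2)); // both inputs at 2, clock is 2
    }
}
```

The clock only reaches 2 once every input has sent a watermark of at least 2, which is what lets a downstream window operator treat everything up to that time as complete.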
Now, show me the API!
Processing Time
StreamExecutionEnvironment env =
    StreamExecutionEnvironment.getExecutionEnvironment();
env.setStreamTimeCharacteristic(TimeCharacteristic.ProcessingTime);
DataStream<Tweet> text = env.addSource(new TwitterSrc());
DataStream<Tuple2<String, Integer>> counts = text
    .flatMap(new ExtractHashtags())
    .keyBy("name")
    .timeWindow(Time.of(5, TimeUnit.MINUTES))
    .apply(new HashtagCounter());
Event Time
StreamExecutionEnvironment env =
    StreamExecutionEnvironment.getExecutionEnvironment();
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime);
DataStream<Tweet> text = env.addSource(new TwitterSrc());
text = text.assignTimestamps(new MyTimestampExtractor());
DataStream<Tuple2<String, Integer>> counts = text
    .flatMap(new ExtractHashtags())
    .keyBy("name")
    .timeWindow(Time.of(5, TimeUnit.MINUTES))
    .apply(new HashtagCounter());
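What the event-time pipeline above computes can be approximated in plain Java without Flink. The sketch below is a simplified model, not Flink's window operator: `EventTimeHashtagWindows`, `onElement`, and `onWatermark` are hypothetical names, timestamps are plain longs, and late elements (those arriving behind the watermark) are not handled. Elements are buffered per window by their own timestamp, and a window is emitted once the watermark reaches its last timestamp.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class EventTimeHashtagWindows {
    private final long windowSize;
    // window start -> (hashtag -> count); TreeMap keeps windows ordered by start time.
    private final TreeMap<Long, Map<String, Integer>> open = new TreeMap<>();

    EventTimeHashtagWindows(long windowSize) {
        this.windowSize = windowSize;
    }

    /** Adds a hashtag to the window chosen by its event time, whatever the arrival order. */
    void onElement(long eventTime, String hashtag) {
        long start = eventTime - Math.floorMod(eventTime, windowSize);
        open.computeIfAbsent(start, s -> new TreeMap<>()).merge(hashtag, 1, Integer::sum);
    }

    /** Emits every window whose last timestamp (end - 1) is at or before the watermark. */
    List<String> onWatermark(long watermark) {
        List<String> fired = new ArrayList<>();
        while (!open.isEmpty() && open.firstKey() + windowSize - 1 <= watermark) {
            Map.Entry<Long, Map<String, Integer>> window = open.pollFirstEntry();
            fired.add(window.getKey() + " " + window.getValue());
        }
        return fired;
    }

    public static void main(String[] args) {
        EventTimeHashtagWindows windows = new EventTimeHashtagWindows(5);
        windows.onElement(1, "Trump");
        windows.onElement(6, "Cheese");
        windows.onElement(3, "Trump"); // arrives out of order, still lands in window 0
        System.out.println(windows.onWatermark(4)); // fires window [0, 5)
        System.out.println(windows.onWatermark(9)); // fires window [5, 10)
    }
}
```

The out-of-order element with timestamp 3 is still counted in the first window, which is the behaviour the processing-time variant cannot provide.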
Fault tolerance in streaming
Fault-tolerance in streaming systems is inherently harder than in batch
• Can’t just restart computation
• State is a problem
• Fast recovery is crucial
• Streaming topologies run 24/7 for a long period
Fault-tolerance is a complex issue
• No single point of failure is allowed
• Guaranteeing input processing
• Consistent operator state
• Fast recovery
• At-least-once vs Exactly-once semantics
High Availability
Consistency - Flink distributed snapshots
Based on consistent global snapshots
Algorithm designed for stateful dataflows (minimal runtime overhead)
Exactly-once semantics
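The effect of a consistent snapshot can be illustrated with a single-operator simulation. This is not Flink's distributed snapshot algorithm, only a sketch of the guarantee it gives: a snapshot pairs the source position with the operator state at that position, and recovery restores both together before replaying. All names (`SnapshotReplay`, `Snapshot`, `run`) are hypothetical, and the input is assumed to be replayable from any offset.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class SnapshotReplay {
    /** A consistent snapshot: source offset plus the operator state at that offset. */
    static final class Snapshot {
        final int offset;
        final Map<String, Integer> counts;

        Snapshot(int offset, Map<String, Integer> counts) {
            this.offset = offset;
            this.counts = new HashMap<>(counts); // copy, so later updates do not leak in
        }
    }

    /**
     * Counts words, snapshots every `interval` records, and simulates one crash
     * right after the record at index `failAt` is processed (-1 means no crash).
     */
    static Map<String, Integer> run(List<String> input, int interval, int failAt) {
        Snapshot last = new Snapshot(0, new HashMap<>());
        Map<String, Integer> counts = new HashMap<>();
        boolean failed = false;
        int offset = 0;
        while (offset < input.size()) {
            counts.merge(input.get(offset), 1, Integer::sum);
            offset++;
            if (!failed && offset - 1 == failAt) {
                // Crash: working state is lost. Restore state and source position together.
                failed = true;
                counts = new HashMap<>(last.counts);
                offset = last.offset;
                continue;
            }
            if (offset % interval == 0) {
                last = new Snapshot(offset, counts);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList("a", "b", "a", "c", "a");
        System.out.println(run(input, 2, -1)); // no failure
        System.out.println(run(input, 2, 2));  // crash after the third record
    }
}
```

Both runs end with the same counts: records processed after the last snapshot are replayed, but because the state is rolled back to the same point, each record affects the final state exactly once.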
Stateful streaming applications
ETL style operations
Filter incoming data,
Log analysis
High throughput, connectors, at-least-once processing
Window aggregations
Trending tweets,
User sessions, Stream joins
Window abstractions
Input
Process/Enrich
Stateful streaming applications
Machine learning
Fitting trends to the evolving
stream, Stream clustering
Model state, cyclic flows
Pattern recognition
Fraud detection, Triggering signals
based on activity
Exactly-once processing
Statefulness in 0.9.1
Stateful dataflow operators (conceptually similar to Samza)
Two state access patterns
Local (Task) state
Partitioned (Key) state
Proper API integration
Java: OperatorState interface
Scala: mapWithState, flatMapWithState…
Exactly-once semantics by checkpointing
Stateful API
words.keyBy(x => x).mapWithState {
  (word, count: Option[Int]) =>
    {
      val newCount = count.getOrElse(0) + 1
      val output = (word, newCount)
      (output, Some(newCount))
    }
}
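For readers more at home in Java, the logic of the Scala snippet above can be sketched without Flink as follows. This is only a model of partitioned (key) state, with a `HashMap` standing in for the state that the runtime would manage and checkpoint per key; `KeyedCounter` and `map` are illustrative names, not the Java `OperatorState` API.

```java
import java.util.HashMap;
import java.util.Map;

final class KeyedCounter {
    // Partitioned (key) state: one counter per key.
    private final Map<String, Integer> countPerKey = new HashMap<>();

    /** Same steps as the mapWithState body: read state (default 0), update, emit. */
    String map(String word) {
        int newCount = countPerKey.getOrDefault(word, 0) + 1;
        countPerKey.put(word, newCount);
        return "(" + word + "," + newCount + ")";
    }

    public static void main(String[] args) {
        KeyedCounter counter = new KeyedCounter();
        for (String word : new String[] {"flink", "storm", "flink"}) {
            System.out.println(counter.map(word));
        }
    }
}
```

Each key sees only its own counter, so the second "flink" yields a count of 2 while "storm" stays at 1.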
Local state example (Java)
public class MySource extends RichParallelSourceFunction {
    // Omitted details
    private OperatorState<Long> offset;

    @Override
    public void run(SourceContext ctx) {
        Object checkpointLock = ctx.getCheckpointLock();
        isRunning = true;
        while (isRunning) {
            synchronized (checkpointLock) {
                offset.update(offset.value() + 1);
                //ctx.collect(next);
            }
        }
    }
}
Statefulness in 0.10
Internal operators are checkpointed
Aggregations
Window operators
…
KeyValue state
Easing common access patterns
Flexible state backend interface
Removes non-partitioned operator state
Improved monitoring
Batch and streaming
Integration (not complete)
Summary - Streaming in Flink 0.10
• Operational readiness
High Availability
Monitoring
Integration with other systems
• First-class support for event-time
• Hardened statefulness support
• Redefined API
Thanks for the slides
• Material borrowed from:
flink.apache.org
Stephan Ewen
Aljoscha Krettek
Gyula Fóra

 
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdfONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
ONLINE FOOD ORDER SYSTEM PROJECT REPORT.pdf
 
University management System project report..pdf
University management System project report..pdfUniversity management System project report..pdf
University management System project report..pdf
 
Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)
 
Roadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and RoutesRoadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and Routes
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
 
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...
 
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
The Most Attractive Pune Call Girls Manchar 8250192130 Will You Miss This Cha...
 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghly
 
Coefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptxCoefficient of Thermal Expansion and their Importance.pptx
Coefficient of Thermal Expansion and their Importance.pptx
 
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
Call for Papers - African Journal of Biological Sciences, E-ISSN: 2663-2187, ...
 

Graduating Flink Streaming - Chicago meetup

  • 1. Flink 0.10 Graduating streaming Márton Balassi mbalassi@apache.org / @MartonBalassi Hungarian Academy of Sciences
  • 2. Streaming in Flink 0.10 • Operational readiness High Availability Monitoring Integration with other systems • First-class support for event-time • Hardened statefulness support • Redefined API
  • 3. Streaming in Flink 0.10 • Some breaking changes GroupBy -> KeyBy Windowing API completely changed DataStream and alike naming Internal rewrite The goal is to harden for 1.0
  • 4. API in one take
  • 5. Windowing • Why put your data into windows? • That is why:
  • 6. Streaming data never stops Window (5 min) Count #Hashtags Just saw #Trump on #CNN, super cool. :D Trump: 2394 Cheese: 12984 Money: 42
  • 7. What I didn’t mention • tweets have a timestamp, their event time • tweets from across the globe arrive with delay => tweets with different timestamps arrive out-of-order
  • 8. Window (5 min) Count #Hashtags 12:34 (13.10.2015): Just saw #Trump on #CNN, super cool. :D Trump: 2394 Cheese: 12984 Money: 42 These arrive with 3 minutes slack. Form windows based on processing time of the machine. Processing Time != Event Time
  • 9. Why do people use this? • easy to implement • low latency • this is what systems give you (Spark Streaming, Apex, Samza, Storm)* *not Google Cloud Dataflow
  • 10. Let's look at a more complex example.
  • 11. Window (5 min) Correlate Tweets and News something... These still have 3 min slack. These have 8 min slack. 12:33 (13.10.2015): Donald Trump speaks at Cheese conference. Processing Time != Event Time
  • 12. Processing Time != Event Time => Mismatch in the timespace continuum
  • 13. Use cases • out-of-order elements • sources with delay • recovery/fault-tolerance • “catching up” with a stream Who does it? • Google Cloud Dataflow • Apache Flink
  • 14. How can we do this?
  • 15. We need a Global Clock that runs on event time instead of processing time.
  • 16. [Diagram: a source feeding a window operator, with elements and operators annotated with event-time values 0, 1 and 2.] This is a source. This is our window operator. This is the current event time. This is a watermark.
  • 17. Now, show me the API!
  • 18. Processing Time StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); env.setStreamTimeCharacteristic(ProcessingTime); DataStream<Tweet> text = env.addSource(new TwitterSrc()); DataStream<Tuple2<String, Integer>> counts = text .flatMap(new ExtractHashtags()) .keyBy("name") .timeWindow(Time.of(5, MINUTES)) .apply(new HashtagCounter());
  • 19. Event Time StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment(); env.setStreamTimeCharacteristic(EventTime); DataStream<Tweet> text = env.addSource(new TwitterSrc()); text = text.assignTimestamps(new MyTimestampExtractor()); DataStream<Tuple2<String, Integer>> counts = text .flatMap(new ExtractHashtags()) .keyBy("name") .timeWindow(Time.of(5, MINUTES)) .apply(new HashtagCounter());
  • 20. Fault tolerance in streaming Fault-tolerance in streaming systems is inherently harder than in batch • Can’t just restart computation • State is a problem • Fast recovery is crucial • Streaming topologies run 24/7 for a long period Fault-tolerance is a complex issue • No single point of failure is allowed • Guaranteeing input processing • Consistent operator state • Fast recovery • At-least-once vs Exactly-once semantics
  • 22. Consistency - Flink distributed snapshots Based on consistent global snapshots Algorithm designed for stateful dataflows (minimal runtime overhead) Exactly-once semantics
  • 23. Stateful streaming applications ETL style operations Filter incoming data, Log analysis High throughput, connectors, at-least-once processing Window aggregations Trending tweets, User sessions, Stream joins Window abstractions [Diagram: several Input sources feeding a Process/Enrich stage.]
  • 24. Stateful streaming applications Machine learning Fitting trends to the evolving stream, Stream clustering Model state, cyclic flows Pattern recognition Fraud detection, Triggering signals based on activity Exactly-once processing
  • 25. Statefulness in 0.9.1 Stateful dataflow operators (conceptually similar to Samza) Two state access patterns Local (Task) state Partitioned (Key) state Proper API integration Java: OperatorState interface Scala: mapWithState, flatMapWithState… Exactly-once semantics by checkpointing
  • 26. Stateful API words.keyBy(x => x).mapWithState { (word, count: Option[Int]) => { val newCount = count.getOrElse(0) + 1 val output = (word, newCount) (output, Some(newCount)) } }
  • 27. Local state example (Java) public class MySource extends RichParallelSourceFunction { // Omitted details private OperatorState<Long> offset; @Override public void run(SourceContext ctx) { Object checkpointLock = ctx.getCheckpointLock(); isRunning = true; while (isRunning) { synchronized (checkpointLock) { offset.update(offset.value() + 1); //ctx.collect(next); } } } }
  • 28. Statefulness in 0.10 Internal operators are checkpointed Aggregations Window operators … KeyValue state Easing common access patterns Flexible state backend interface Removes non-partitioned operator state
  • 33. Summary - Streaming in Flink 0.10 • Operational readiness High Availability Monitoring Integration with other systems • First-class support for event-time • Hardened statefulness support • Redefined API
  • 34. Thanks for the slides • Material borrowed from: flink.apache.org Stephan Ewen Aljoscha Krettek Gyula Fóra
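
The "global clock that runs on event time" and the watermark from slides 15 and 16 can be illustrated without Flink at all. The following is a minimal plain-Java sketch, not Flink's implementation: the class `EventTimeWindowSketch` and its methods are hypothetical names, a watermark `w` is assumed to mean "no element with a timestamp below `w` will follow", and late elements are simply dropped.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Plain-Java sketch (no Flink dependency) of event-time tumbling windows driven by watermarks. */
final class EventTimeWindowSketch {

    private final long windowSizeMs;
    // Open windows: window start timestamp -> per-hashtag counts (sorted for stable output).
    private final TreeMap<Long, Map<String, Integer>> openWindows = new TreeMap<>();
    private long currentWatermark = Long.MIN_VALUE;

    EventTimeWindowSketch(long windowSizeMs) {
        this.windowSizeMs = windowSizeMs;
    }

    /** Start of the tumbling window that contains the given event timestamp. */
    static long windowStart(long eventTimeMs, long windowSizeMs) {
        return eventTimeMs - Math.floorMod(eventTimeMs, windowSizeMs);
    }

    /** Adds an element to the window chosen by its event time, regardless of arrival order. */
    void onElement(String hashtag, long eventTimeMs) {
        long start = windowStart(eventTimeMs, windowSizeMs);
        if (start + windowSizeMs <= currentWatermark) {
            return; // the window was already emitted; this sketch drops late elements
        }
        openWindows.computeIfAbsent(start, k -> new TreeMap<>()).merge(hashtag, 1, Integer::sum);
    }

    /** Advances the event-time clock and emits every window that ends at or before the watermark. */
    List<String> onWatermark(long watermarkMs) {
        currentWatermark = Math.max(currentWatermark, watermarkMs);
        List<String> fired = new ArrayList<>();
        while (!openWindows.isEmpty()
                && openWindows.firstKey() + windowSizeMs <= currentWatermark) {
            Map.Entry<Long, Map<String, Integer>> window = openWindows.pollFirstEntry();
            fired.add(window.getKey() + "=" + window.getValue());
        }
        return fired;
    }

    public static void main(String[] args) {
        EventTimeWindowSketch windows = new EventTimeWindowSketch(300_000L); // 5 minutes
        windows.onElement("Trump", 60_000L);
        windows.onElement("Cheese", 250_000L);
        windows.onElement("Trump", 310_000L); // belongs to the next window
        windows.onElement("Trump", 120_000L); // out of order, still counted in the first window
        System.out.println(windows.onWatermark(300_000L));
        System.out.println(windows.onWatermark(600_000L));
    }
}
```

Because windows fire on watermarks rather than on the machine's clock, the out-of-order element with timestamp 120 000 is still counted in the first window. Replaying the same elements faster or slower would produce the same window contents.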

Editor's Notes

  1. Slack is the amount of time by which elements arrive late.
  2. When catching up with a stream, for example with elements already stored in Kafka, you still want correct windows based on the timestamps in the elements.
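
One common way to turn the notion of slack into a watermark is to trail the highest timestamp seen so far by a fixed bound. The sketch below assumes that approach. It is plain Java with hypothetical names (`SlackWatermarkSketch`, `maxSlackMs`), not a Flink interface, and it assumes the slack of the source is bounded and known in advance.

```java
/** Plain-Java sketch: derive watermarks from element timestamps and an assumed maximum slack. */
final class SlackWatermarkSketch {

    private final long maxSlackMs;
    private long maxSeenTimestamp = Long.MIN_VALUE;

    SlackWatermarkSketch(long maxSlackMs) {
        this.maxSlackMs = maxSlackMs;
    }

    /** Slack of a single element: how long after its event time it arrived. */
    static long slack(long eventTimeMs, long arrivalTimeMs) {
        return arrivalTimeMs - eventTimeMs;
    }

    /** Records an element's event time and returns the watermark implied so far. */
    long onElement(long eventTimeMs) {
        maxSeenTimestamp = Math.max(maxSeenTimestamp, eventTimeMs);
        // Elements may still arrive up to maxSlackMs behind the newest timestamp seen.
        return maxSeenTimestamp - maxSlackMs;
    }

    public static void main(String[] args) {
        SlackWatermarkSketch generator = new SlackWatermarkSketch(180_000L); // 3 minutes of slack
        System.out.println(generator.onElement(600_000L));
        System.out.println(generator.onElement(500_000L)); // out of order: watermark does not move back
        System.out.println(generator.onElement(900_000L));
    }
}
```

The watermark depends only on the timestamps carried by the elements, not on when they are read, so the same sequence of watermarks results whether the elements are consumed live or replayed while catching up.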