SlideShare una empresa de Scribd logo
1 de 41
1
Kostas Tzoumas
@kostas_tzoumas
Big Data Ldn
November 4, 2016
Stream Processing with Apache
Flink®
2
Kostas Tzoumas
@kostas_tzoumas
Big Data Ldn
November 4, 2016
Debunking Some Common Myths in
Stream Processing
3
Original creators of Apache
Flink®
Providers of the
dA Platform, a supported
Flink distribution
Outline
 What is data streaming
 Myth 1: The throughput/latency tradeoff
 Myth 2: Exactly once not possible
 Myth 3: Streaming is for (near) real-time
 Myth 4: Streaming is hard
4
The streaming architecture
5
6
Reconsideration of data architecture
 Better app isolation
 More real-time reaction to events
 Robust continuous applications
 Process both real-time and historical data
7
app state
app state
app state
event log
Query
service
What is (distributed) streaming
 Computations on never-
ending “streams” of data
records (“events”)
 Stream processor
distributes the
computation in a cluster
8
Your
code
Your
code
Your
code
Your
code
What is stateful streaming
 Computation and state
• E.g., counters, windows of past
events, state machines, trained ML
models
 Result depends on history of
stream
 Stateful stream processor gives
the tools to manage state
• Recover, roll back, version,
upgrade, etc
9
Your
code
state
What is event-time streaming
 Data records associated with
timestamps (time series data)
 Processing depends on timestamps
 Event-time stream processor gives
you the tools to reason about time
• E.g., handle streams that are out of
order
• Core feature is watermarks – a clock
to measure event time
10
Your
code
state
t3 t1 t2t4 t1-t2 t3-t4
What is streaming
 Continuous processing on data that is
continuously generated
 I.e., pretty much all “big” data
 It’s all about state and time
11
Debunking some common stream
processing myths
12
Myth 1: Throughput/latency tradeoff
 Myth 1: you need to choose between high
throughput or low latency
 Physical limits
• In reality, network determines both the achievable
throughput and latency
• A well-engineered system achieves these limits
13
Flink performance
 10s of millions events per seconds in 10s of nodes
 scaled to 1000s of nodes
 with latency in single-digit milliseconds
14
Myth 2: Exactly once not possible
 Exactly once: under failures, system computes result
as if there was no failure
 In contrast to:
• At most once: no guarantees
• At least once: duplicates possible
 Exactly once state versus exactly once delivery
 Myth 2: Exactly once state not possible/too costly
15
Transactions
 “Exactly once” is transactions: either all
actions succeed or none succeed
 Transactions are possible
 Transactions are useful
 Let’s not start eventual consistency all over
again…
16
Flink checkpoints
 Periodic asynchronous consistent snapshots of
application state
 Provide exactly-once state guarantees under failures
17
9/2/2016 stream_barriers.svg
checkpoint
barrier n­1
data stream
stream record
(event)
checkpoint
barrier n
newer records
part of
checkpoint n­1
part of
checkpoint n
part of
checkpoint n+1
older records
End-to-end exactly once
 Checkpoints double as transaction coordination mechanism
 Source and sink operators can take part in checkpoints
 Exactly once internally, "effectively once" end to end: e.g.,
Flink + Cassandra with idempotent updates
18
transactional sinks
State management
 Checkpoints triple as state
versioning mechanism
(savepoints)
 Go back and forth in time while
maintaining state consistency
 Ease code upgrades (Flink or
app), maintenance, migration,
and debugging, what-if
simulations, A/B tests
19
Myth 3: Streaming and real time
 Myth 3: streaming and real-time are
synonymous
 Streaming is a new model
• Essentially, state and time
• Low latency/real time is the icing on the cake
20
Low latency and high latency streams
21
2016-3-1
12:00 am
2016-3-1
1:00 am
2016-3-1
2:00 am
2016-3-11
11:00pm
2016-3-12
12:00am
2016-3-12
1:00am
2016-3-11
10:00pm
2016-3-12
2:00am
2016-3-12
3:00am…
partition
partition
Stream (low latency)
Batch
(bounded stream)
Stream (high latency)
Robust continuous applications
22
Accurate computation
 Batch processing is not an accurate
computation model for continuous data
• Misses the right concepts and primitives
• Time handling, state across batch boundaries
 Stateful stream processing a better model
• Real-time/low-latency is the icing on the cake
23
Myth 4: How hard is streaming?
 Myth 4: streaming is too hard to learn
 You are already doing streaming, just in an
ad hoc way
 Most data is unbounded and the code
changes slower than the data
• This is a streaming problem
24
It's about your data and code
 What's the form of your data?
• Unbounded (e.g., clicks, sensors, logs), or
• Bounded (e.g., ???*)
 What changes more often?
• My code changes faster than my data
• My data changes faster than my code
25
* Please help me find a great example of naturally bounded data
It's about your data and code
 If your data changes faster than your code
you have a streaming problem
• You may be solving it with hourly batch jobs
depending on someone else to create the
hourly batches
• You are probably living with inaccurate results
without knowing it
26
It's about your data and code
 If your code changes faster than your data
you have an exploration problem
• Using notebooks or other tools for quick data
exploration is a good idea
• Once your code stabilizes you will have a
streaming problem, so you might as well think
of it as such from the beginning
27
Flink in the real world
28
Flink community
 > 240 contributors, 95 contributors in Flink 1.1
 42 meetups around the world with > 15,000 members
 2x-3x growth in 2015, similar in 2016
29
Powered by Flink
30
Zalando, one of the largest ecommerce
companies in Europe, uses Flink for real-
time business process monitoring.
King, the creators of Candy Crush Saga,
uses Flink to provide data science teams
with real-time analytics.
Bouygues Telecom uses Flink for real-time
event processing over billions of Kafka
messages per day.
Alibaba, the world's largest retailer, built a
Flink-based system (Blink) to optimize
search rankings in real time.
See more at flink.apache.org/poweredby.html
30 Flink applications in production for more than one
year. 10 billion events (2TB) processed daily
Complex jobs of > 30 operators running 24/7,
processing 30 billion events daily, maintaining state
of 100s of GB with exactly-once guarantees
Largest job has > 20 operators, runs on > 5000
vCores in 1000-node cluster, processes millions of
events per second
31
32
Flink Forward 2016
Current work in Flink
34
Ongoing Flink development
35
Connectors
Session
Windows
(Stream) SQL
Library
enhancements
Metric
System
Operations
Ecosystem
Application
Features
Metrics &
Visualization
Dynamic Scaling
Savepoint
compatibility Checkpoints
to savepoints
More connectors Stream SQL
Windows
Large state
Maintenance
Fine grained
recovery
Side in-/outputs
Window DSL
Broader
Audience
Security
Mesos &
others
Dynamic Resource
Management
Authentication
Queryable State
A longer-term vision for Flink
36
Streaming use cases
Application
(Near) real-time apps
Continuous apps
Analytics on historical
data
Request/response apps
Technology
Low-latency streaming
High-latency streaming
Batch as special case of
streaming
Large queryable state
37
Request/response applications
 Queryable state: query Flink state directly instead
of pushing results in a database
 Large state support and query API coming in Flink
38
queries
In summary
 The need for streaming comes from a rethinking of
data infra architecture
• Stream processing then just becomes natural
 Debunking 4 common myths
• Myth 1: The throughput/latency tradeoff
• Myth 2: Exactly once not possible
• Myth 3: Streaming is for (near) real-time
• Myth 4: Streaming is hard
39
4
Thank you!
@kostas_tzoumas
@ApacheFlink
@dataArtisans
4
We are hiring!
data-artisans.com/careers

Más contenido relacionado

La actualidad más candente

Aljoscha Krettek - Apache Flink for IoT: How Event-Time Processing Enables Ea...
Aljoscha Krettek - Apache Flink for IoT: How Event-Time Processing Enables Ea...Aljoscha Krettek - Apache Flink for IoT: How Event-Time Processing Enables Ea...
Aljoscha Krettek - Apache Flink for IoT: How Event-Time Processing Enables Ea...
Ververica
 
Fabian Hueske - Stream Analytics with SQL on Apache Flink
Fabian Hueske - Stream Analytics with SQL on Apache FlinkFabian Hueske - Stream Analytics with SQL on Apache Flink
Fabian Hueske - Stream Analytics with SQL on Apache Flink
Ververica
 
Flink Streaming Hadoop Summit San Jose
Flink Streaming Hadoop Summit San JoseFlink Streaming Hadoop Summit San Jose
Flink Streaming Hadoop Summit San Jose
Kostas Tzoumas
 
Kostas Tzoumas - Apache Flink®: State of the Union and What's Next
Kostas Tzoumas - Apache Flink®: State of the Union and What's NextKostas Tzoumas - Apache Flink®: State of the Union and What's Next
Kostas Tzoumas - Apache Flink®: State of the Union and What's Next
Ververica
 

La actualidad más candente (20)

2018-04 Kafka Summit London: Stephan Ewen - "Apache Flink and Apache Kafka fo...
2018-04 Kafka Summit London: Stephan Ewen - "Apache Flink and Apache Kafka fo...2018-04 Kafka Summit London: Stephan Ewen - "Apache Flink and Apache Kafka fo...
2018-04 Kafka Summit London: Stephan Ewen - "Apache Flink and Apache Kafka fo...
 
Streaming in the Wild with Apache Flink
Streaming in the Wild with Apache FlinkStreaming in the Wild with Apache Flink
Streaming in the Wild with Apache Flink
 
Debunking Six Common Myths in Stream Processing
Debunking Six Common Myths in Stream ProcessingDebunking Six Common Myths in Stream Processing
Debunking Six Common Myths in Stream Processing
 
Aljoscha Krettek - Apache Flink for IoT: How Event-Time Processing Enables Ea...
Aljoscha Krettek - Apache Flink for IoT: How Event-Time Processing Enables Ea...Aljoscha Krettek - Apache Flink for IoT: How Event-Time Processing Enables Ea...
Aljoscha Krettek - Apache Flink for IoT: How Event-Time Processing Enables Ea...
 
Fabian Hueske - Stream Analytics with SQL on Apache Flink
Fabian Hueske - Stream Analytics with SQL on Apache FlinkFabian Hueske - Stream Analytics with SQL on Apache Flink
Fabian Hueske - Stream Analytics with SQL on Apache Flink
 
Stream Processing with Apache Flink (Flink.tw Meetup 2016/07/19)
Stream Processing with Apache Flink (Flink.tw Meetup 2016/07/19)Stream Processing with Apache Flink (Flink.tw Meetup 2016/07/19)
Stream Processing with Apache Flink (Flink.tw Meetup 2016/07/19)
 
Tzu-Li (Gordon) Tai - Stateful Stream Processing with Apache Flink
Tzu-Li (Gordon) Tai - Stateful Stream Processing with Apache FlinkTzu-Li (Gordon) Tai - Stateful Stream Processing with Apache Flink
Tzu-Li (Gordon) Tai - Stateful Stream Processing with Apache Flink
 
Flink Streaming Hadoop Summit San Jose
Flink Streaming Hadoop Summit San JoseFlink Streaming Hadoop Summit San Jose
Flink Streaming Hadoop Summit San Jose
 
Aljoscha Krettek - The Future of Apache Flink
Aljoscha Krettek - The Future of Apache FlinkAljoscha Krettek - The Future of Apache Flink
Aljoscha Krettek - The Future of Apache Flink
 
Stephan Ewen - Experiences running Flink at Very Large Scale
Stephan Ewen -  Experiences running Flink at Very Large ScaleStephan Ewen -  Experiences running Flink at Very Large Scale
Stephan Ewen - Experiences running Flink at Very Large Scale
 
Apache Flink at Strata San Jose 2016
Apache Flink at Strata San Jose 2016Apache Flink at Strata San Jose 2016
Apache Flink at Strata San Jose 2016
 
Big Data Warsaw
Big Data WarsawBig Data Warsaw
Big Data Warsaw
 
Architecture of Flink's Streaming Runtime @ ApacheCon EU 2015
Architecture of Flink's Streaming Runtime @ ApacheCon EU 2015Architecture of Flink's Streaming Runtime @ ApacheCon EU 2015
Architecture of Flink's Streaming Runtime @ ApacheCon EU 2015
 
Kostas Tzoumas - Apache Flink®: State of the Union and What's Next
Kostas Tzoumas - Apache Flink®: State of the Union and What's NextKostas Tzoumas - Apache Flink®: State of the Union and What's Next
Kostas Tzoumas - Apache Flink®: State of the Union and What's Next
 
Kostas Kloudas - Complex Event Processing with Flink: the state of FlinkCEP
Kostas Kloudas - Complex Event Processing with Flink: the state of FlinkCEP Kostas Kloudas - Complex Event Processing with Flink: the state of FlinkCEP
Kostas Kloudas - Complex Event Processing with Flink: the state of FlinkCEP
 
Debunking Common Myths in Stream Processing
Debunking Common Myths in Stream ProcessingDebunking Common Myths in Stream Processing
Debunking Common Myths in Stream Processing
 
A look at Flink 1.2
A look at Flink 1.2A look at Flink 1.2
A look at Flink 1.2
 
Apache Flink Berlin Meetup May 2016
Apache Flink Berlin Meetup May 2016Apache Flink Berlin Meetup May 2016
Apache Flink Berlin Meetup May 2016
 
Data Stream Processing with Apache Flink
Data Stream Processing with Apache FlinkData Stream Processing with Apache Flink
Data Stream Processing with Apache Flink
 
Flink Forward Berlin 2017: Fabian Hueske - Using Stream and Batch Processing ...
Flink Forward Berlin 2017: Fabian Hueske - Using Stream and Batch Processing ...Flink Forward Berlin 2017: Fabian Hueske - Using Stream and Batch Processing ...
Flink Forward Berlin 2017: Fabian Hueske - Using Stream and Batch Processing ...
 

Destacado

Destacado (20)

Stefan Richter - A look at Flink 1.2 and beyond @ Berlin Meetup
Stefan Richter - A look at Flink 1.2 and beyond @ Berlin Meetup Stefan Richter - A look at Flink 1.2 and beyond @ Berlin Meetup
Stefan Richter - A look at Flink 1.2 and beyond @ Berlin Meetup
 
Timo Walther - Table & SQL API - unified APIs for batch and stream processing
Timo Walther - Table & SQL API - unified APIs for batch and stream processingTimo Walther - Table & SQL API - unified APIs for batch and stream processing
Timo Walther - Table & SQL API - unified APIs for batch and stream processing
 
Jamie Grier - Robust Stream Processing with Apache Flink
Jamie Grier - Robust Stream Processing with Apache FlinkJamie Grier - Robust Stream Processing with Apache Flink
Jamie Grier - Robust Stream Processing with Apache Flink
 
Till Rohrmann - Dynamic Scaling - How Apache Flink adapts to changing workloads
Till Rohrmann - Dynamic Scaling - How Apache Flink adapts to changing workloadsTill Rohrmann - Dynamic Scaling - How Apache Flink adapts to changing workloads
Till Rohrmann - Dynamic Scaling - How Apache Flink adapts to changing workloads
 
Robert Metzger - Apache Flink Community Updates November 2016 @ Berlin Meetup
Robert Metzger - Apache Flink Community Updates November 2016 @ Berlin Meetup Robert Metzger - Apache Flink Community Updates November 2016 @ Berlin Meetup
Robert Metzger - Apache Flink Community Updates November 2016 @ Berlin Meetup
 
The Stream Processor as the Database - Apache Flink @ Berlin buzzwords
The Stream Processor as the Database - Apache Flink @ Berlin buzzwords   The Stream Processor as the Database - Apache Flink @ Berlin buzzwords
The Stream Processor as the Database - Apache Flink @ Berlin buzzwords
 
Apache Flink Community Updates November 2016 @ Berlin Meetup
Apache Flink Community Updates November 2016 @ Berlin MeetupApache Flink Community Updates November 2016 @ Berlin Meetup
Apache Flink Community Updates November 2016 @ Berlin Meetup
 
A Data Streaming Architecture with Apache Flink (berlin Buzzwords 2016)
A Data Streaming Architecture with Apache Flink (berlin Buzzwords 2016)A Data Streaming Architecture with Apache Flink (berlin Buzzwords 2016)
A Data Streaming Architecture with Apache Flink (berlin Buzzwords 2016)
 
Apache Flink - Overview and Use cases of a Distributed Dataflow System (at pr...
Apache Flink - Overview and Use cases of a Distributed Dataflow System (at pr...Apache Flink - Overview and Use cases of a Distributed Dataflow System (at pr...
Apache Flink - Overview and Use cases of a Distributed Dataflow System (at pr...
 
Kostas Tzoumas_Stephan Ewen - Keynote -The maturing data streaming ecosystem ...
Kostas Tzoumas_Stephan Ewen - Keynote -The maturing data streaming ecosystem ...Kostas Tzoumas_Stephan Ewen - Keynote -The maturing data streaming ecosystem ...
Kostas Tzoumas_Stephan Ewen - Keynote -The maturing data streaming ecosystem ...
 
Robert Metzger - Connecting Apache Flink to the World - Reviewing the streami...
Robert Metzger - Connecting Apache Flink to the World - Reviewing the streami...Robert Metzger - Connecting Apache Flink to the World - Reviewing the streami...
Robert Metzger - Connecting Apache Flink to the World - Reviewing the streami...
 
Stephan Ewen - Scaling to large State
Stephan Ewen - Scaling to large StateStephan Ewen - Scaling to large State
Stephan Ewen - Scaling to large State
 
Fundamentals of Stream Processing with Apache Beam, Tyler Akidau, Frances Perry
Fundamentals of Stream Processing with Apache Beam, Tyler Akidau, Frances Perry Fundamentals of Stream Processing with Apache Beam, Tyler Akidau, Frances Perry
Fundamentals of Stream Processing with Apache Beam, Tyler Akidau, Frances Perry
 
Apache Flink Meetup: Sanjar Akhmedov - Joining Infinity – Windowless Stream ...
Apache Flink Meetup:  Sanjar Akhmedov - Joining Infinity – Windowless Stream ...Apache Flink Meetup:  Sanjar Akhmedov - Joining Infinity – Windowless Stream ...
Apache Flink Meetup: Sanjar Akhmedov - Joining Infinity – Windowless Stream ...
 
Sessionization with Spark streaming
Sessionization with Spark streamingSessionization with Spark streaming
Sessionization with Spark streaming
 
Apache-Flink-What-How-Why-Who-Where-by-Slim-Baltagi
Apache-Flink-What-How-Why-Who-Where-by-Slim-BaltagiApache-Flink-What-How-Why-Who-Where-by-Slim-Baltagi
Apache-Flink-What-How-Why-Who-Where-by-Slim-Baltagi
 
Step-by-Step Introduction to Apache Flink
Step-by-Step Introduction to Apache Flink Step-by-Step Introduction to Apache Flink
Step-by-Step Introduction to Apache Flink
 
Apache Flink: Real-World Use Cases for Streaming Analytics
Apache Flink: Real-World Use Cases for Streaming AnalyticsApache Flink: Real-World Use Cases for Streaming Analytics
Apache Flink: Real-World Use Cases for Streaming Analytics
 
Analysis-of-Major-Trends-in-big-data-analytics-slim-baltagi-hadoop-summit
Analysis-of-Major-Trends-in-big-data-analytics-slim-baltagi-hadoop-summitAnalysis-of-Major-Trends-in-big-data-analytics-slim-baltagi-hadoop-summit
Analysis-of-Major-Trends-in-big-data-analytics-slim-baltagi-hadoop-summit
 
Flink vs. Spark
Flink vs. SparkFlink vs. Spark
Flink vs. Spark
 

Similar a Kostas Tzoumas - Stream Processing with Apache Flink®

Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
confluent
 
William Vambenepe – Google Cloud Dataflow and Flink , Stream Processing by De...
William Vambenepe – Google Cloud Dataflow and Flink , Stream Processing by De...William Vambenepe – Google Cloud Dataflow and Flink , Stream Processing by De...
William Vambenepe – Google Cloud Dataflow and Flink , Stream Processing by De...
Flink Forward
 

Similar a Kostas Tzoumas - Stream Processing with Apache Flink® (20)

Debunking Common Myths in Stream Processing
Debunking Common Myths in Stream ProcessingDebunking Common Myths in Stream Processing
Debunking Common Myths in Stream Processing
 
Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Ana...
Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Ana...Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Ana...
Apache Flink for IoT: How Event-Time Processing Enables Easy and Accurate Ana...
 
Counting Elements in Streams
Counting Elements in StreamsCounting Elements in Streams
Counting Elements in Streams
 
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
Event Driven Architecture with a RESTful Microservices Architecture (Kyle Ben...
 
Streaming in the Wild with Apache Flink
Streaming in the Wild with Apache FlinkStreaming in the Wild with Apache Flink
Streaming in the Wild with Apache Flink
 
Jay Kreps | Kafka Summit NYC 2019 Keynote (Events Everywhere) | CEO, Confluent
Jay Kreps | Kafka Summit NYC 2019 Keynote (Events Everywhere) | CEO, ConfluentJay Kreps | Kafka Summit NYC 2019 Keynote (Events Everywhere) | CEO, Confluent
Jay Kreps | Kafka Summit NYC 2019 Keynote (Events Everywhere) | CEO, Confluent
 
Trivento summercamp masterclass 9/9/2016
Trivento summercamp masterclass 9/9/2016Trivento summercamp masterclass 9/9/2016
Trivento summercamp masterclass 9/9/2016
 
Voxxed days thessaloniki 21/10/2016 - Streaming Engines for Big Data
Voxxed days thessaloniki 21/10/2016 - Streaming Engines for Big DataVoxxed days thessaloniki 21/10/2016 - Streaming Engines for Big Data
Voxxed days thessaloniki 21/10/2016 - Streaming Engines for Big Data
 
Voxxed Days Thesaloniki 2016 - Streaming Engines for Big Data
Voxxed Days Thesaloniki 2016 - Streaming Engines for Big DataVoxxed Days Thesaloniki 2016 - Streaming Engines for Big Data
Voxxed Days Thesaloniki 2016 - Streaming Engines for Big Data
 
Introduction to Data streaming - 05/12/2014
Introduction to Data streaming - 05/12/2014Introduction to Data streaming - 05/12/2014
Introduction to Data streaming - 05/12/2014
 
Flexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache FlinkFlexible and Real-Time Stream Processing with Apache Flink
Flexible and Real-Time Stream Processing with Apache Flink
 
Neha Narkhede | Kafka Summit London 2019 Keynote | Event Streaming: Our Cloud...
Neha Narkhede | Kafka Summit London 2019 Keynote | Event Streaming: Our Cloud...Neha Narkhede | Kafka Summit London 2019 Keynote | Event Streaming: Our Cloud...
Neha Narkhede | Kafka Summit London 2019 Keynote | Event Streaming: Our Cloud...
 
Assisting millions of active users in real-time - Alexey Brodovshuk, Kcell; K...
Assisting millions of active users in real-time - Alexey Brodovshuk, Kcell; K...Assisting millions of active users in real-time - Alexey Brodovshuk, Kcell; K...
Assisting millions of active users in real-time - Alexey Brodovshuk, Kcell; K...
 
Data analytics at scale implementing stateful stream processing - publish
Data analytics at scale implementing stateful stream processing - publishData analytics at scale implementing stateful stream processing - publish
Data analytics at scale implementing stateful stream processing - publish
 
Streaming analytics state of the art
Streaming analytics state of the artStreaming analytics state of the art
Streaming analytics state of the art
 
Stream Processing with Apache Flink
Stream Processing with Apache FlinkStream Processing with Apache Flink
Stream Processing with Apache Flink
 
Data Stream Processing - Concepts and Frameworks
Data Stream Processing - Concepts and FrameworksData Stream Processing - Concepts and Frameworks
Data Stream Processing - Concepts and Frameworks
 
William Vambenepe – Google Cloud Dataflow and Flink , Stream Processing by De...
William Vambenepe – Google Cloud Dataflow and Flink , Stream Processing by De...William Vambenepe – Google Cloud Dataflow and Flink , Stream Processing by De...
William Vambenepe – Google Cloud Dataflow and Flink , Stream Processing by De...
 
Reintroducing the Stream Processor: A universal tool for continuous data anal...
Reintroducing the Stream Processor: A universal tool for continuous data anal...Reintroducing the Stream Processor: A universal tool for continuous data anal...
Reintroducing the Stream Processor: A universal tool for continuous data anal...
 
Apache Flink 101 - the rise of stream processing and beyond
Apache Flink 101 - the rise of stream processing and beyondApache Flink 101 - the rise of stream processing and beyond
Apache Flink 101 - the rise of stream processing and beyond
 

Más de Ververica

Fabian Hueske - Stream Analytics with SQL on Apache Flink
Fabian Hueske - Stream Analytics with SQL on Apache FlinkFabian Hueske - Stream Analytics with SQL on Apache Flink
Fabian Hueske - Stream Analytics with SQL on Apache Flink
Ververica
 

Más de Ververica (9)

2020-05-06 Apache Flink Meetup London: The Easiest Way to Get Operational wit...
2020-05-06 Apache Flink Meetup London: The Easiest Way to Get Operational wit...2020-05-06 Apache Flink Meetup London: The Easiest Way to Get Operational wit...
2020-05-06 Apache Flink Meetup London: The Easiest Way to Get Operational wit...
 
Webinar: How to contribute to Apache Flink - Robert Metzger
Webinar:  How to contribute to Apache Flink - Robert MetzgerWebinar:  How to contribute to Apache Flink - Robert Metzger
Webinar: How to contribute to Apache Flink - Robert Metzger
 
Webinar: Deep Dive on Apache Flink State - Seth Wiesman
Webinar: Deep Dive on Apache Flink State - Seth WiesmanWebinar: Deep Dive on Apache Flink State - Seth Wiesman
Webinar: Deep Dive on Apache Flink State - Seth Wiesman
 
Webinar: 99 Ways to Enrich Streaming Data with Apache Flink - Konstantin Knauf
Webinar: 99 Ways to Enrich Streaming Data with Apache Flink - Konstantin KnaufWebinar: 99 Ways to Enrich Streaming Data with Apache Flink - Konstantin Knauf
Webinar: 99 Ways to Enrich Streaming Data with Apache Flink - Konstantin Knauf
 
Webinar: Detecting row patterns with Flink SQL - Dawid Wysakowicz
Webinar:  Detecting row patterns with Flink SQL - Dawid WysakowiczWebinar:  Detecting row patterns with Flink SQL - Dawid Wysakowicz
Webinar: Detecting row patterns with Flink SQL - Dawid Wysakowicz
 
Deploying Flink on Kubernetes - David Anderson
 Deploying Flink on Kubernetes - David Anderson Deploying Flink on Kubernetes - David Anderson
Deploying Flink on Kubernetes - David Anderson
 
Webinar: Flink SQL in Action - Fabian Hueske
 Webinar: Flink SQL in Action - Fabian Hueske Webinar: Flink SQL in Action - Fabian Hueske
Webinar: Flink SQL in Action - Fabian Hueske
 
2018-01 Seattle Apache Flink Meetup at OfferUp, Opening Remarks and Talk 2
2018-01 Seattle Apache Flink Meetup at OfferUp, Opening Remarks and Talk 22018-01 Seattle Apache Flink Meetup at OfferUp, Opening Remarks and Talk 2
2018-01 Seattle Apache Flink Meetup at OfferUp, Opening Remarks and Talk 2
 
Fabian Hueske - Stream Analytics with SQL on Apache Flink
Fabian Hueske - Stream Analytics with SQL on Apache FlinkFabian Hueske - Stream Analytics with SQL on Apache Flink
Fabian Hueske - Stream Analytics with SQL on Apache Flink
 

Último

Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
amitlee9823
 
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night StandCall Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
AroojKhan71
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
ZurliaSoop
 
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
amitlee9823
 
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
only4webmaster01
 
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
amitlee9823
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter Lessons
JoseMangaJr1
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Riyadh +966572737505 get cytotec
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
amitlee9823
 

Último (20)

Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
 
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night StandCall Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
Thane Call Girls 7091864438 Call Girls in Thane Escort service book now -
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Anomaly detection and data imputation within time series
Anomaly detection and data imputation within time seriesAnomaly detection and data imputation within time series
Anomaly detection and data imputation within time series
 
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
 
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
Call Girls Indiranagar Just Call 👗 9155563397 👗 Top Class Call Girl Service B...
 
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
 
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
(NEHA) Call Girls Katra Call Now 8617697112 Katra Escorts 24x7
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter Lessons
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
 
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get CytotecAbortion pills in Doha Qatar (+966572737505 ! Get Cytotec
Abortion pills in Doha Qatar (+966572737505 ! Get Cytotec
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 

Kostas Tzoumas - Stream Processing with Apache Flink®

  • 1. 1 Kostas Tzoumas @kostas_tzoumas Big Data Ldn November 4, 2016 Stream Processing with Apache Flink®
  • 2. 2 Kostas Tzoumas @kostas_tzoumas Big Data Ldn November 4, 2016 Debunking Some Common Myths in Stream Processing
  • 3. 3 Original creators of Apache Flink® Providers of the dA Platform, a supported Flink distribution
  • 4. Outline  What is data streaming  Myth 1: The throughput/latency tradeoff  Myth 2: Exactly once not possible  Myth 3: Streaming is for (near) real-time  Myth 4: Streaming is hard 4
  • 6. 6 Reconsideration of data architecture  Better app isolation  More real-time reaction to events  Robust continuous applications  Process both real-time and historical data
  • 7. 7 app state app state app state event log Query service
  • 8. What is (distributed) streaming  Computations on never- ending “streams” of data records (“events”)  Stream processor distributes the computation in a cluster 8 Your code Your code Your code Your code
  • 9. What is stateful streaming  Computation and state • E.g., counters, windows of past events, state machines, trained ML models  Result depends on history of stream  Stateful stream processor gives the tools to manage state • Recover, roll back, version, upgrade, etc 9 Your code state
  • 10. What is event-time streaming  Data records associated with timestamps (time series data)  Processing depends on timestamps  Event-time stream processor gives you the tools to reason about time • E.g., handle streams that are out of order • Core feature is watermarks – a clock to measure event time 10 Your code state t3 t1 t2t4 t1-t2 t3-t4
  • 11. What is streaming  Continuous processing on data that is continuously generated  I.e., pretty much all “big” data  It’s all about state and time 11
  • 12. Debunking some common stream processing myths 12
  • 13. Myth 1: Throughput/latency tradeoff  Myth 1: you need to choose between high throughput or low latency  Physical limits • In reality, network determines both the achievable throughput and latency • A well-engineered system achieves these limits 13
  • 14. Flink performance  10s of millions events per seconds in 10s of nodes  scaled to 1000s of nodes  with latency in single-digit milliseconds 14
  • 15. Myth 2: Exactly once not possible  Exactly once: under failures, system computes result as if there was no failure  In contrast to: • At most once: no guarantees • At least once: duplicates possible  Exactly once state versus exactly once delivery  Myth 2: Exactly once state not possible/too costly 15
  • 16. Transactions  “Exactly once” is transactions: either all actions succeed or none succeed  Transactions are possible  Transactions are useful  Let’s not start eventual consistency all over again… 16
  • 17. Flink checkpoints  Periodic asynchronous consistent snapshots of application state  Provide exactly-once state guarantees under failures 17 9/2/2016 stream_barriers.svg checkpoint barrier n­1 data stream stream record (event) checkpoint barrier n newer records part of checkpoint n­1 part of checkpoint n part of checkpoint n+1 older records
  • 18. End-to-end exactly once  Checkpoints double as transaction coordination mechanism  Source and sink operators can take part in checkpoints  Exactly once internally, "effectively once" end to end: e.g., Flink + Cassandra with idempotent updates 18 transactional sinks
  • 19. State management  Checkpoints triple as state versioning mechanism (savepoints)  Go back and forth in time while maintaining state consistency  Ease code upgrades (Flink or app), maintenance, migration, and debugging, what-if simulations, A/B tests 19
  • 20. Myth 3: Streaming and real time  Myth 3: streaming and real-time are synonymous  Streaming is a new model • Essentially, state and time • Low latency/real time is the icing on the cake 20
  • 21. Low latency and high latency streams 21 2016-3-1 12:00 am 2016-3-1 1:00 am 2016-3-1 2:00 am 2016-3-11 11:00pm 2016-3-12 12:00am 2016-3-12 1:00am 2016-3-11 10:00pm 2016-3-12 2:00am 2016-3-12 3:00am… partition partition Stream (low latency) Batch (bounded stream) Stream (high latency)
  • 23. Accurate computation  Batch processing is not an accurate computation model for continuous data • Misses the right concepts and primitives • Time handling, state across batch boundaries  Stateful stream processing a better model • Real-time/low-latency is the icing on the cake 23
  • 24. Myth 4: How hard is streaming?  Myth 4: streaming is too hard to learn  You are already doing streaming, just in an ad hoc way  Most data is unbounded and the code changes slower than the data • This is a streaming problem 24
  • 25. It's about your data and code  What's the form of your data? • Unbounded (e.g., clicks, sensors, logs), or • Bounded (e.g., ???*)  What changes more often? • My code changes faster than my data • My data changes faster than my code 25 * Please help me find a great example of naturally bounded data
  • 26. It's about your data and code  If your data changes faster than your code you have a streaming problem • You may be solving it with hourly batch jobs depending on someone else to create the hourly batches • You are probably living with inaccurate results without knowing it 26
  • 27. It's about your data and code  If your code changes faster than your data you have an exploration problem • Using notebooks or other tools for quick data exploration is a good idea • Once your code stabilizes you will have a streaming problem, so you might as well think of it as such from the beginning 27
  • 28. Flink in the real world 28
  • 29. Flink community  > 240 contributors, 95 contributors in Flink 1.1  42 meetups around the world with > 15,000 members  2x-3x growth in 2015, similar in 2016 29
  • 30. Powered by Flink 30 Zalando, one of the largest ecommerce companies in Europe, uses Flink for real- time business process monitoring. King, the creators of Candy Crush Saga, uses Flink to provide data science teams with real-time analytics. Bouygues Telecom uses Flink for real-time event processing over billions of Kafka messages per day. Alibaba, the world's largest retailer, built a Flink-based system (Blink) to optimize search rankings in real time. See more at flink.apache.org/poweredby.html
  • 31. 30 Flink applications in production for more than one year. 10 billion events (2TB) processed daily Complex jobs of > 30 operators running 24/7, processing 30 billion events daily, maintaining state of 100s of GB with exactly-once guarantees Largest job has > 20 operators, runs on > 5000 vCores in 1000-node cluster, processes millions of events per second 31
  • 32. 32
  • 34. Current work in Flink 34
  • 35. Ongoing Flink development 35 Connectors Session Windows (Stream) SQL Library enhancements Metric System Operations Ecosystem Application Features Metrics & Visualization Dynamic Scaling Savepoint compatibility Checkpoints to savepoints More connectors Stream SQL Windows Large state Maintenance Fine grained recovery Side in-/outputs Window DSL Broader Audience Security Mesos & others Dynamic Resource Management Authentication Queryable State
  • 36. A longer-term vision for Flink 36
  • 37. Streaming use cases Application (Near) real-time apps Continuous apps Analytics on historical data Request/response apps Technology Low-latency streaming High-latency streaming Batch as special case of streaming Large queryable state 37
  • 38. Request/response applications  Queryable state: query Flink state directly instead of pushing results in a database  Large state support and query API coming in Flink 38 queries
  • 39. In summary  The need for streaming comes from a rethinking of data infra architecture • Stream processing then just becomes natural  Debunking 4 common myths • Myth 1: The throughput/latency tradeoff • Myth 2: Exactly once not possible • Myth 3: Streaming is for (near) real-time • Myth 4: Streaming is hard 39