Apache Flink 1.0: A New Era for Real-World
Streaming Analytics
Chicago Apache Flink Meetup. April 19th, 2016
Slim Baltagi
Director, Enterprise Architecture
Capital One Financial Corporation
Agenda
1. Origin and evolution of streaming
capabilities in Flink
2. Why Flink is suitable for real-world
streaming analytics?
3. What are some streaming analytics use
cases suitable for Flink?
4. What are some streaming analytics use
cases from companies actually using Flink?
5. What are some novel use cases enabled by
Flink?
6. Where do you go from here?
1. Origin and evolution of data streaming
capabilities in Flink
2009
Apache Flink originated in a research project called Stratosphere,
conceived in 2009 by Professor Volker Markl of the Technische
Universität Berlin in Germany.
At its core, Flink has always been a distributed dataflow streaming
engine.
2012
Massively-Parallel Stream Processing under QoS Constraints with
Nephele, June 12th, 2012
http://stratosphere.eu/assets/papers/massivelyParallelStreamProcessing_12.pdf
2013
Nephele Streaming: Stream Processing under QoS Constraints at
Scale, August 5th, 2013
http://stratosphere.eu/assets/papers/nephele-streaming.pdf
2014
March 2014: Gyula Fora and Marton Balassi from the Hungarian Academy
of Sciences started work on the first prototype of an API demonstrating
the streaming capabilities of Stratosphere.
April 2014: Flink joined the Apache Incubator, and graduated as an
Apache Top-Level Project (TLP) in December 2014.
June 2014: First public mention of this prototype, on June 4th, 2014
http://2014.adattarhazforum.hu/letoltes/2014dwforum/mta_sztaki_balassi_marton.pdf
October 2014: Second public mention of this prototype, on October 7th,
2014 https://www.youtube.com/watch?v=k2AOqwm_7ts (at 10’37”)
http://data-artisans.com/apache-flink-new-kid-on-the-block/
November 2014: The first talk using ‘Flink Streaming’, at
ApacheCon on November 18th, 2014
http://events.linuxfoundation.org/sites/events/files/slides/flink_apachecon_small.pdf
2015
June 2015: “I would consider stream data analysis to be a major
unique selling proposition for Flink. Due to its pipelined architecture
Flink is a perfect match for big data stream processing in the Apache
stack.” – Volker Markl. Ref.: On Apache Flink. Interview with Volker
Markl, June 24th, 2015
http://www.odbms.org/blog/2015/06/on-apache-flink-interview-with-volker-markl/
June 2015: Flink 0.9 released on June 24th, 2015: DataStream API
in beta, exactly-once guarantees via checkpointing
November 2015: Flink 0.10 released on November 16th, 2015:
event time support, windowing mechanism based on the Dataflow/Beam
model, graduated DataStream API, high availability, state backends,
new/updated connectors (Kafka, NiFi, ...), improved monitoring, …
2016
The Google paper “The Dataflow Model: A Practical
Approach to Balancing Correctness, Latency, and Cost
in Massive-Scale, Unbounded, Out-of-Order Data
Processing” http://research.google.com/pubs/pub43864.html influenced
Flink’s rich windowing semantics.
March 2016: Flink 1.0 released on March 8th, 2016:
stable DataStream API, out-of-core state, savepoints,
CEP library, improved monitoring, Kafka 0.9 support, …
April 2016: Flink 1.0.1 released on April 6th, 2016.
Flink 1.0.2 is being voted on.
Post Flink 1.0 in 2016
Queryable state: query the state held inside Flink
instead of an external database. Querying the state that Flink
holds while it is doing its computation can effectively
replace a database for such lookups! Planned for Flink 1.1
SQL/StreamSQL and Table API
Dynamic Scaling: Runtime scaling for DataStream
programs
Managed memory for streaming operators
Security: Over-the-wire encryption of RPC (Akka) and
data transfers (Netty)
Expose more runtime metrics: backpressure monitoring,
spilling / out-of-core
Additional streaming connectors: Kinesis, Cassandra, …
Making YARN resource allocation dynamic
Support for Apache Mesos
https://issues.apache.org/jira/browse/FLINK-1984
Further reading:
• Apache Flink Roadmap Draft, December 2015
https://docs.google.com/document/d/1ExmtVpeVVT3TIhO1JoBpC5JKXm-778DAD7eqw5GANwE/edit
• What’s next? Roadmap 2016. Robert Metzger, January 26, 2016.
Berlin Apache Flink Meetup.
http://www.slideshare.net/robertmetzger1/january-2016-flink-community-update-roadmap-2016/9
2. Why Flink is suitable for real-world streaming
analytics?
Apache Flink 1.0, released on March 8th, 2016, comes with a
competitive set of streaming analytics features, some of which
are unique in the open source domain.
The combination of these features makes Apache Flink a unique
choice for real-world streaming analytics.
Let’s discuss some of Apache Flink’s features for real-world
streaming analytics.
2.1. Pipelined processing engine
2.2. Stream abstraction: DataStream as in the real world
2.3. Performance: Low latency and high throughput
2.4. Support for rich windowing semantics
2.5. Support for different notions of time
2.6. Stateful stream processing
2.7. Fault tolerance and correctness
2.8. High Availability
2.9. Backpressure handling
2.10. Expressive and easy-to-use APIs in Scala and
Java
2.11. Support for batch
2.12. Integration with the Hadoop ecosystem
2.1. Pipelined processing engine
• Flink is a pipelined (streaming) engine akin to parallel
database systems, rather than a batch engine like Spark.
• ‘Flink’s runtime is not designed around the idea that
operators wait for their predecessors to finish before
they start, but they can already consume partially
generated results.’
• ‘This is called pipeline parallelism and means that
several transformations in a Flink program are
actually executed concurrently with data being
passed between them through memory and network
channels.’
http://data-artisans.com/apache-flink-new-kid-on-the-block/
2.2. Stream abstraction: DataStream as in the real world
• Real-world data is a series of events that are
continuously produced by a variety of applications and
disparate systems inside and outside the enterprise.
• Flink, as a stream processing system, models streams
as what they are in the real world, a series of events,
and uses DataStream as an abstraction.
• Spark, as a batch processing system, approximates
these streams as micro-batches and uses DStream as
an abstraction. This adds artificial latency!
2.3. Performance: Low latency and high throughput
A pipelined processing engine enables true low-latency
streaming applications, with results in milliseconds.
High throughput: efficiently handles high volumes of
streams (millions of events per second).
Tunable latency / throughput trade-off: a tuning
knob lets you navigate the latency-throughput trade-off.
Yahoo! benchmarked Storm, Spark Streaming and Flink
with a production use case (counting ad impressions
grouped by campaign).
Full Yahoo! article (note: the benchmark stops at low write
throughput and the programs are not fault tolerant):
https://yahooeng.tumblr.com/post/135321837876/benchmarking-streaming-computation-engines-at
Full data Artisans article, which extends the Yahoo!
benchmark to high volumes and uses Flink’s built-in
state: http://data-artisans.com/extending-the-yahoo-streaming-benchmark/
Flink outperformed both Spark Streaming and Storm
in this benchmark modeled after a real-world
application:
• Flink achieved a throughput of 15 million messages/second on a
10-machine cluster. This is 35x higher throughput compared to
Storm (80x compared to Yahoo’s runs).
• Flink ran with exactly-once guarantees, Storm with at-least-once.
Ultimately, you need to test the performance of your
own streaming analytics application, as it depends on
your own logic and on the version of your preferred
stream processing tool!
2.4. Support for rich windowing semantics
Flink provides rich windowing semantics. A window is
a grouping of events based on some function of time
(all records of the last 5 minutes), count (the last 10
events) or session (all the events of a particular web
user).
Window types in Flink:
• Tumbling windows (no overlap)
• Sliding windows (with overlap)
• Session windows (separated by a gap of inactivity)
• Custom windows (with assigners, triggers and
evictors)
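To make the window types above concrete, here is a minimal plain-Java sketch of how a timestamp could be assigned to tumbling, sliding and session windows. It does not use Flink's API: the class WindowSketch and its methods are hypothetical names for illustration only, and non-negative millisecond timestamps are assumed.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class WindowSketch {
    /** Start of the tumbling window of the given size that contains the timestamp. */
    static long tumblingStart(long timestamp, long size) {
        return timestamp - (timestamp % size);
    }

    /** Starts of all sliding windows [start, start + size) that contain the timestamp. */
    static List<Long> slidingStarts(long timestamp, long size, long slide) {
        List<Long> starts = new ArrayList<>();
        long lastStart = timestamp - (timestamp % slide);
        for (long start = lastStart; start > timestamp - size; start -= slide) {
            starts.add(start);
        }
        return starts;
    }

    /** Splits sorted timestamps into sessions whenever two neighbours are more than gap apart. */
    static List<List<Long>> sessions(List<Long> sortedTimestamps, long gap) {
        List<List<Long>> result = new ArrayList<>();
        List<Long> current = new ArrayList<>();
        for (long ts : sortedTimestamps) {
            if (!current.isEmpty() && ts - current.get(current.size() - 1) > gap) {
                result.add(current);
                current = new ArrayList<>();
            }
            current.add(ts);
        }
        if (!current.isEmpty()) {
            result.add(current);
        }
        return result;
    }

    public static void main(String[] args) {
        // An event at t = 7 s with 5 s tumbling windows falls into the window starting at 5 s.
        System.out.println(tumblingStart(7_000, 5_000));
        // With 5 s windows sliding every 1 s, the same event belongs to five overlapping windows.
        System.out.println(slidingStarts(7_000, 5_000, 1_000));
        // A gap larger than 5 between 2 and 10 closes the first session.
        System.out.println(sessions(Arrays.asList(1L, 2L, 10L, 11L), 5));
    }
}
```

The overlap in sliding windows is why one event can contribute to several results, while a tumbling window counts each event exactly once.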
In many systems, these windows are hard-coded and
connected with the system’s internal checkpointing
mechanism. Flink is the first open source streaming
engine that completely decouples windowing from
fault tolerance, allowing for richer forms of windows,
such as sessions.
Further reading:
• http://flink.apache.org/news/2015/12/04/Introducing-windows.html
• http://beam.incubator.apache.org/beam/capability/2016/03/17/capability-matrix.html
• https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-101
• https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-102
2.5. Support for different notions of time
In a Flink streaming program, for example when defining
windows with respect to time, one can refer to different
notions of time:
• Event time: when an event actually happened in the real
world.
• Ingestion time: when data is loaded into Flink, from
Kafka for example.
• Processing time: when data is processed by Flink.
In the real world, streams of events rarely arrive in the
order in which they were produced, due to distributed sources,
non-synced clocks, network delays… They are said to be
“out of order” streams.
Flink is the first open source streaming engine that
supports out-of-order streams and is able to
consistently process events according to their event
time.
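As an illustration of event-time processing over an out-of-order stream, the plain-Java sketch below counts events per tumbling event-time window. It holds windows open until a "watermark" (the highest event time seen so far minus an assumed maximum delay) passes the window end. This is a simplified single-threaded model, not Flink's implementation; the class EventTimeSketch, the method run and the bounded-delay assumption are all illustrative.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class EventTimeSketch {
    /**
     * Counts events per tumbling event-time window, given event times in ARRIVAL order.
     * Returns "windowStart=count" strings in the order the windows fire.
     */
    static List<String> run(long[] eventTimesInArrivalOrder, long windowSize, long maxDelay) {
        TreeMap<Long, Integer> openWindows = new TreeMap<>(); // window start -> event count
        List<String> fired = new ArrayList<>();
        long watermark = Long.MIN_VALUE; // "no event older than this is expected any more"
        for (long eventTime : eventTimesInArrivalOrder) {
            long start = eventTime - (eventTime % windowSize);
            if (start + windowSize > watermark) {
                openWindows.merge(start, 1, Integer::sum);
            } // else: the window already fired; this sketch simply drops the late event
            watermark = Math.max(watermark, eventTime - maxDelay);
            // Fire every window whose end is at or before the watermark.
            while (!openWindows.isEmpty() && openWindows.firstKey() + windowSize <= watermark) {
                Map.Entry<Long, Integer> done = openWindows.pollFirstEntry();
                fired.add(done.getKey() + "=" + done.getValue());
            }
        }
        // End of stream: flush whatever is still open.
        for (Map.Entry<Long, Integer> rest : openWindows.entrySet()) {
            fired.add(rest.getKey() + "=" + rest.getValue());
        }
        return fired;
    }

    public static void main(String[] args) {
        // Events 2000 and 4000 arrive after later events, yet are counted in window [0, 5000).
        long[] arrivals = {1000, 3000, 2000, 6000, 4000, 12000};
        System.out.println(run(arrivals, 5000, 2000)); // prints [0=4, 5000=1, 10000=1]
    }
}
```

Grouping the same arrivals by processing time instead would place each event in whichever window was open when it arrived, so the result would depend on delays rather than on when the events happened.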
Further reading:
• http://beam.incubator.apache.org/beam/capability/2016/03/17/capability-matrix.html
• https://ci.apache.org/projects/flink/flink-docs-master/concepts/concepts.html#time
• https://ci.apache.org/projects/flink/flink-docs-master/apis/streaming/event_time.html
• http://data-artisans.com/how-apache-flink-enables-new-streaming-applications-part-1/
2.6. Stateful stream processing
Many operations in a dataflow simply look at one
individual event at a time, for example an event parser.
Other operations, called stateful operations, need to
remember data across events, for example storing data at the
end of a window for computations occurring in later windows.
Where, then, is the state of these stateful operations
maintained?
• The state can be stored in memory, in the file system,
or in RocksDB, which is an embedded key/value data
store and not an external database.
• Flink also supports state versioning through
savepoints, which are checkpoints of the state of a
running streaming job that the user can trigger manually
while the job is running.
Savepoints enable:
• Code upgrades: both application and framework
• Cluster maintenance and migration
• A/B testing and what-if scenarios
• Testing and debugging
• Restarting a job with adjusted parallelism
Further reading:
• http://data-artisans.com/how-apache-flink-enables-new-streaming-applications/
• https://ci.apache.org/projects/flink/flink-docs-master/apis/streaming/savepoints.html
2.7. Fault tolerance and correctness
How to ensure that the state is correct after failures?
Apache Flink offers a fault tolerance mechanism to
consistently recover the state of data streaming
applications.
This ensures that even in the presence of failures, the
operators do not perform duplicate updates to their
state (exactly once guarantees). This basically means
that the computed results are the same whether there
are failures along the way or not.
There is a switch to downgrade the guarantees to at
least once if the use case tolerates duplicate updates.
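The exactly-once idea above can be sketched in plain Java: snapshot the operator state together with the source position it reflects, and after a failure restore that state and replay the source from that position. This is a single-operator simplification for illustration, not Flink's distributed snapshot mechanism; the class CheckpointSketch, the in-memory log and the failAtOffset parameter are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

public class CheckpointSketch {
    /** A consistent snapshot: the operator state plus the source offset it corresponds to. */
    static final class Checkpoint {
        final int offset;
        final Map<String, Integer> counts;

        Checkpoint(int offset, Map<String, Integer> counts) {
            this.offset = offset;
            this.counts = new HashMap<>(counts); // copy, so later updates do not leak in
        }
    }

    /**
     * Counts words from a replayable log, checkpointing every checkpointEvery records.
     * A crash is simulated once, just before processing failAtOffset (use -1 for no crash).
     */
    static Map<String, Integer> run(String[] log, int checkpointEvery, int failAtOffset) {
        Map<String, Integer> counts = new HashMap<>();
        Checkpoint last = new Checkpoint(0, counts);
        boolean failed = false;
        int offset = 0;
        while (offset < log.length) {
            if (!failed && offset == failAtOffset) {
                // Simulated crash: in-flight state is lost. Restore the last snapshot
                // and rewind the source to the offset stored with it.
                failed = true;
                counts = new HashMap<>(last.counts);
                offset = last.offset;
                continue;
            }
            counts.merge(log[offset], 1, Integer::sum);
            offset++;
            if (offset % checkpointEvery == 0) {
                last = new Checkpoint(offset, counts);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        String[] log = {"a", "b", "a", "c", "a", "b", "c"};
        // Same counts with and without the failure: no record is applied to the state twice.
        System.out.println(run(log, 3, -1).equals(run(log, 3, 5))); // prints true
    }
}
```

Restoring the state without rewinding the source would lose records, and rewinding without restoring the state would count some records twice, which is the at-least-once behavior mentioned above.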
Further reading:
• High-throughput, low-latency, and exactly-once stream
processing with Apache Flink
http://data-artisans.com/high-throughput-low-latency-and-exactly-once-stream-processing-with-apache-flink/
• Data Streaming Fault Tolerance document:
http://ci.apache.org/projects/flink/flink-docs-master/internals/stream_checkpointing.html
• ‘Lightweight Asynchronous Snapshots for Distributed
Dataflows’, June 28, 2015 http://arxiv.org/pdf/1506.08603v1.pdf
• ‘Distributed Snapshots: Determining Global States of
Distributed Systems’, February 1985 (the Chandy-Lamport
algorithm)
http://research.microsoft.com/en-us/um/people/lamport/pubs/chandy.pdf
2.8. High Availability
In the real world, streaming analytics applications need
to be reliable, capable of running jobs for months
while remaining resilient in the event of failures.
The JobManager (master) is responsible for scheduling
and resource management. If it crashes, no new
programs can be submitted and running programs
fail.
Flink provides a High Availability (HA) mode to recover
from a JobManager crash, eliminating this Single Point
Of Failure (SPOF).
Further reading: JobManager High Availability
https://ci.apache.org/projects/flink/flink-docs-master/setup/jobmanager_high_availability.html
2.9. Backpressure handling
In the real world, there are situations where a system is
receiving data at a higher rate than it can normally
process. This is called backpressure.
Flink handles backpressure implicitly through its
architecture without user interaction while
backpressure handling in Spark is through manual
configuration: spark.streaming.backpressure.enabled.
Flink provides backpressure monitoring to allow users
to understand bottlenecks in streaming applications.
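One way to picture implicit backpressure is a bounded buffer between a fast producer and a slow consumer: once the buffer is full, the producer blocks and is slowed to the consumer's pace, with no configuration involved. The plain-Java sketch below illustrates only that idea. It is not Flink's network stack; the class BackpressureSketch, the buffer capacity and the 1 ms consumer delay are assumptions made for the example.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

public class BackpressureSketch {
    /**
     * Pushes events from a fast producer to a slow consumer through a bounded buffer.
     * Returns {number of events consumed, largest buffer size observed by the producer}.
     */
    static int[] run(int events, int capacity) {
        BlockingQueue<Integer> buffer = new ArrayBlockingQueue<>(capacity);
        AtomicInteger maxBuffered = new AtomicInteger();
        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < events; i++) {
                    buffer.put(i); // blocks while the buffer is full: this is the backpressure
                    maxBuffered.accumulateAndGet(buffer.size(), Math::max);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();
        int consumed = 0;
        try {
            for (int i = 0; i < events; i++) {
                buffer.take();
                Thread.sleep(1); // the consumer is slower than the producer
                consumed++;
            }
            producer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while consuming", e);
        }
        return new int[] {consumed, maxBuffered.get()};
    }

    public static void main(String[] args) {
        int[] result = run(50, 4);
        // All events arrive, and the buffer never grows beyond its capacity.
        System.out.println(result[0] + " events consumed, at most " + result[1] + " buffered");
    }
}
```

Nothing is dropped and memory stays bounded; the cost is that the producer's rate falls to the consumer's rate.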
Further reading:
• How Flink handles backpressure, by Ufuk Celebi, Kostas Tzoumas and
Stephan Ewen, August 31, 2015.
http://data-artisans.com/how-flink-handles-backpressure/
2.10. Expressive and easy-to-use APIs in Scala and Java
• The high-level, expressive and easy-to-use DataStream API,
with flexible window semantics, results in significantly
less custom application logic compared to other open
source stream processing solutions.
• Flink's DataStream API ports many operators from its
DataSet batch processing API, such as map, reduce, and
join, to the streaming world.
• In addition, it provides stream-specific operations such
as window, split, connect, …
• Its support for user-defined functions eases the
implementation of custom application behavior.
• The DataStream API is available in Scala and Java.
DataStream API (streaming): Window WordCount
import org.apache.flink.streaming.api.scala._
import org.apache.flink.streaming.api.windowing.time.Time

case class Word(word: String, frequency: Int)

val env = StreamExecutionEnvironment.getExecutionEnvironment
val lines: DataStream[String] = env.socketTextStream(...)
lines.flatMap { line => line.split(" ").map(word => Word(word, 1)) }
  .keyBy("word")
  // 5-second windows, evaluated every second (Flink 1.0 window API)
  .timeWindow(Time.seconds(5), Time.seconds(1))
  .sum("frequency")
  .print()
env.execute()

DataSet API (batch): WordCount
import org.apache.flink.api.scala._

val env = ExecutionEnvironment.getExecutionEnvironment
val lines: DataSet[String] = env.readTextFile(...)
lines.flatMap { line => line.split(" ").map(word => Word(word, 1)) }
  .groupBy("word").sum("frequency")
  // print() triggers execution of a DataSet program, so no env.execute() is needed
  .print()
2.11. Support for batch
• In Flink, batch processing is a special case of stream
processing, as finite data sources are just streams that
happen to end.
• Flink offers a full toolset for batch processing with a
dedicated DataSet API and libraries for machine learning
and graph processing.
• In addition, Flink contains several batch-specific
optimizations, such as for scheduling, memory
management, and query optimization.
• Flink outperforms dedicated batch processing engines
such as Spark and Hadoop MapReduce in batch use
cases.
2.12. Integration with the Hadoop ecosystem
[Figure: Flink's integration points with the Hadoop ecosystem. Surviving labels: POSIX, Java/Scala Collections]
3. What are some streaming analytics use cases
suitable for Flink?
1. Financial services
2. Telecommunications
3. Online gaming systems
4. Security & Intelligence
5. Advertisement serving
6. Sensor Networks
7. Social Media
8. Healthcare
9. Oil & Gas
10. Retail & eCommerce
11. Transportation and logistics
4. What are some streaming analytics use cases
from companies actually using Flink?
Who is using Apache Flink? Some companies using Flink for streaming analytics, by sector:
Telecommunications, Retail, Financial Services, Gaming, Security
Powered by Flink (companies, software projects,
universities/research institutes):
https://cwiki.apache.org/confluence/display/FLINK/Powered+by+Flink
• Bouygues Telecom is a full-service communication
operator (mobile, fixed telephony, TV, Internet, and
cloud computing) and one of the largest providers in
France, with over 11 million mobile subscribers, …
• Bouygues Telecom uses Flink for real-time event
processing and analytics over billions of Kafka
messages per day.
Stream processing at Bouygues Telecom
with Apache Flink, by Mohamed Amine Abdessemed
• Blog: http://data-artisans.com/flink-at-bouygues-html/ (June 1st, 2015)
• Slides: http://www.slideshare.net/FlinkForward/mohamed-amine-abdessemed-realtime-data-integration-with-apache-flink-kafka
• Video: https://www.youtube.com/watch?v=hjmgZfXSi3M
• Otto Group is the world’s second-largest online retailer in
the end-consumer (B2C) business and Europe’s largest
online retailer in the end-consumer (B2C) fashion and
lifestyle business.
• “A range of exciting projects at the
BI department were implemented with Apache Flink, e.g. a
crowd-sourced user-agent identification, and a search
session identifier.”
How we selected Apache Flink as our Stream Processing
Framework at the Otto Group Business Intelligence Department,
October 6, 2015
• Blog: http://data-artisans.com/how-we-selected-apache-flink-at-otto-group/
• Slides: http://www.slideshare.net/FlinkForward/christian-kreuzfeld-static-vs-dynamic-stream-processing
• Video: https://www.youtube.com/watch?v=cnqPyw_uQAQ
• At king.com, Flink is used to process more than 30
billion events daily and compute real-time player
statistics by leveraging Flink's stateful streaming
abstractions and Complex Event Processing.
References:
• Apache Software Foundation Blog, March 8th, 2016:
https://blogs.apache.org/foundation/entry/the_apache_software_foundation_announces88
• Hadoop Summit Dublin 2016, April 13, 2016
Slides: http://www.slideshare.net/GyulaFra/largescale-stream-processing-in-the-hadoop-ecosystem-hadoop-summit-2016-60887821/3
Video: https://www.youtube.com/watch?v=mRhCpp-p11E
• Zalando (zalando.com) is Europe’s
leading online fashion platform, doing business in 15
markets and attracting well over 100 million visits per
month.
• “Delivering first-class shopping experiences to our
+14 million customers requires moving fast and using
cutting-edge, open-source technologies.”
• Near-real-time business intelligence for the following
use cases: business process monitoring and
continuous ETL
• Apache Showdown: Flink vs. Spark, by Javier Lopez and
Mihail Vieru, 31 March 2016
https://tech.zalando.com/blog/apache-showdown-flink-vs.-spark/
Capital One is a top-10 consumer and commercial banking
institution doing business in the US, Canada and the UK.
Flink was used for real-time monitoring of
customer activity data (audit log event details,
failure and success data, …) to:
• proactively detect and resolve issues immediately
• prevent significant customer impact
• enable a flawless digital enterprise experience
Flink Case Study at Capital One, Flink Forward 2015
conference, Berlin, Germany, October 12th, 2015
http://www.slideshare.net/FlinkForward/flink-case-study-capital-one
[Figure: Real-Time Monitoring of Customer Activity]
• Twitter has its hack week, and the winner, announced
on December 18th, 2015, was a Flink-based streaming project!
Extending the Yahoo! Streaming Benchmark and Winning Twitter
Hack-Week with Apache Flink. Posted on February 2, 2016 by
Jamie Grier
http://data-artisans.com/extending-the-yahoo-streaming-benchmark/
http://www.slideshare.net/JamieGrier/stateful-stream-processing-at-inmemory-speed
• Yahoo! ran benchmarks comparing the performance of one of
their use cases, originally implemented on
Apache Storm, against Spark Streaming and Flink. Results posted
on December 18, 2015
http://yahooeng.tumblr.com/post/135321837876/benchmarking-streaming-computation-engines-at
http://data-artisans.com/extending-the-yahoo-streaming-benchmark/
https://github.com/dataArtisans/yahoo-streaming-benchmark
http://www.slideshare.net/JamieGrier/extending-the-yahoo-streaming-benchmark
Generic Streaming Analytics Architectural pattern:
Event Producers → Collector → Broker → Processor → Indexer → Visualizer/Search
• Event Producers: Apps, Devices, Sensors
• Collector: Flume, SpringXD, Logstash, NiFi, Fluentd
• Broker: Kafka, RabbitMQ, JMS, Amazon Kinesis, Google Cloud Pub/Sub, MapR Streams
• Processor: Flink, Spark, Storm, Samza, Kafka Streams
• Indexer: ElasticSearch, Solr, Cassandra, HBase, MapR DB, MongoDB, Apache Geode
• Visualizer/Search: Kibana, Custom GUI
This is changing with Flink’s alerts, StreamSQL, state
querying, FlinkCEP, …
5. What are some novel use cases enabled by
Flink?
5.1. Flink as an embedded key/value data store
5.2. Flink as a distributed CEP engine
5.1. Flink as an embedded key/value data store
• The stream processor as a database: a new design pattern for data
streaming applications, using Apache Flink and Apache Kafka:
building applications directly on top of the stream processor, rather
than on top of key/value databases populated by data streams.
• The stateful operator features in Flink allow a streaming application
to query state in the stream processor instead of in a key/value store,
which is often a bottleneck.
http://data-artisans.com/extending-the-yahoo-streaming-benchmark/
“State querying” feature is expected in upcoming Flink 1.1
http://www.slideshare.net/JamieGrier/stateful-stream-processing-at-inmemory-speed/38
5.2. Flink as a distributed CEP engine
Flink’s stream processor can act as a CEP (Complex Event
Processing) engine. Example: an application that
ingests network monitoring events, identifies access
patterns such as intrusion attempts using FlinkCEP, and
analyzes and aggregates the identified access patterns.
Upcoming talk: ‘Streaming analytics and CEP - Two sides of the
same coin’ by Till Rohrmann and Fabian Hueske at Berlin
Buzzwords, June 5-7, 2016.
http://berlinbuzzwords.de/session/streaming-analytics-and-cep-two-sides-same-coin
Further reading:
– Introducing Complex Event Processing (CEP) with Apache Flink,
Till Rohrmann, April 6, 2016
http://flink.apache.org/news/2016/04/06/cep-monitoring.html
– FlinkCEP - Complex event processing for Flink
https://ci.apache.org/projects/flink/flink-docs-master/apis/streaming/libs/cep.html
Pattern<MonitoringEvent, ?> warningPattern =
    Pattern.<MonitoringEvent>begin("First Event")
        .subtype(TemperatureEvent.class)
        .where(evt -> evt.getTemperature() >= THRESHOLD)
        .next("Second Event")
        .subtype(TemperatureEvent.class)
        .where(evt -> evt.getTemperature() >= THRESHOLD)
        .within(Time.seconds(10));
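To show what the pattern above is looking for, here is a plain-Java sketch that scans an ordered list of temperature readings for two directly consecutive readings at or above a threshold that occur within a time bound. It does not use FlinkCEP. The class WarningPatternSketch and the type Reading are hypothetical, and treating the time bound as exclusive is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class WarningPatternSketch {
    /** A temperature reading with the time it was taken, in milliseconds. */
    static final class Reading {
        final long timestampMs;
        final double temperature;

        Reading(long timestampMs, double temperature) {
            this.timestampMs = timestampMs;
            this.temperature = temperature;
        }
    }

    /** Returns {firstTimestamp, secondTimestamp} for each pair of neighbouring hot readings. */
    static List<long[]> findWarnings(List<Reading> readings, double threshold, long withinMs) {
        List<long[]> matches = new ArrayList<>();
        for (int i = 0; i + 1 < readings.size(); i++) {
            Reading first = readings.get(i);
            Reading second = readings.get(i + 1); // "next": the directly following event
            boolean bothHot = first.temperature >= threshold && second.temperature >= threshold;
            boolean inTime = second.timestampMs - first.timestampMs < withinMs;
            if (bothHot && inTime) {
                matches.add(new long[] {first.timestampMs, second.timestampMs});
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<Reading> readings = Arrays.asList(
            new Reading(0, 101), new Reading(3_000, 105),   // hot, hot, 3 s apart: match
            new Reading(20_000, 110),                       // hot, but 17 s after the previous one
            new Reading(21_000, 90), new Reading(22_000, 102));
        for (long[] match : findWarnings(readings, 100, 10_000)) {
            System.out.println("warning: " + match[0] + " ms -> " + match[1] + " ms");
        }
    }
}
```

A CEP library generalizes this hand-written loop: the pattern is declared once, and the engine tracks partial matches across a keyed, distributed stream.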
6. Where do you go from here?
A few resources for you:
• Overview of Apache Flink: the 4G of Big Data Analytics
Frameworks, Hadoop Summit Europe, April 13th, 2016
Slides: http://www.slideshare.net/SlimBaltagi/overview-of-apache-fink-the-4-g-of-big-data-analytics-frameworks
Video: https://www.youtube.com/watch?v=_BZURQn2EQI
• Flink Knowledge Base: one-stop for everything related
to Apache Flink. http://sparkbigdata.com/component/tags/tag/27-flink
• Flink at the Apache Software Foundation: flink.apache.org/
• Free Apache Flink training from data Artisans
http://dataartisans.github.io/flink-training
• Flink Forward Conference, 12-14 September 2016,
Berlin, Germany http://flink-forward.org/ (call for submissions announced
on April 13th, 2016)
• Free ebook from MapR: Streaming Architecture: New
Designs Using Apache Kafka and MapR Streams
https://www.mapr.com/streaming-architecture-using-apache-kafka-mapr-streams
• Free ebook from Confluent: Making Sense of Stream
Processing http://www.confluent.io/making-sense-of-stream-processing-ebook
A few takeaways:
• Apache Flink’s unique capabilities enable new and
sophisticated use cases, especially for real-world
streaming analytics.
• Customer demand will push major Hadoop
distributors to package Flink and support it.
• Apache Flink will enable innovations and disruptions in
many verticals with its capabilities in real-world
streaming analytics.
Thanks!
To all of you for attending!
Let’s keep in touch!
• sbaltagi@gmail.com
• @SlimBaltagi
• https://www.linkedin.com/in/slimbaltagi
Any questions?

QCon London - Stream Processing with Apache Flink
 
Deep Learning Streaming Platform with Kafka Streams, TensorFlow, DeepLearning...
Deep Learning Streaming Platform with Kafka Streams, TensorFlow, DeepLearning...Deep Learning Streaming Platform with Kafka Streams, TensorFlow, DeepLearning...
Deep Learning Streaming Platform with Kafka Streams, TensorFlow, DeepLearning...
 
Why apache Flink is the 4G of Big Data Analytics Frameworks
Why apache Flink is the 4G of Big Data Analytics FrameworksWhy apache Flink is the 4G of Big Data Analytics Frameworks
Why apache Flink is the 4G of Big Data Analytics Frameworks
 
The Evolution of (Open Source) Data Processing
The Evolution of (Open Source) Data ProcessingThe Evolution of (Open Source) Data Processing
The Evolution of (Open Source) Data Processing
 
Streaming in the Wild with Apache Flink
Streaming in the Wild with Apache FlinkStreaming in the Wild with Apache Flink
Streaming in the Wild with Apache Flink
 
Apache Flink: Past, Present and Future
Apache Flink: Past, Present and FutureApache Flink: Past, Present and Future
Apache Flink: Past, Present and Future
 
Maximilian Michels - Flink and Beam
Maximilian Michels - Flink and BeamMaximilian Michels - Flink and Beam
Maximilian Michels - Flink and Beam
 

Apache Flink 1.0: A New Era for Real-World Streaming Analytics

  • 1. Apache Flink 1.0: A New Era for Real-World Streaming Analytics Chicago Apache Flink Meetup. April 19th, 2016 Slim Baltagi Director, Enterprise Architecture Capital One Financial Corporation
  • 2. Agenda 1. Origin and evolution of streaming capabilities in Flink 2. Why Flink is suitable for real-world streaming analytics? 3. What are some streaming analytics use cases suitable for Flink? 4. What are some streaming analytics use cases from companies actually using Flink? 5. What are some novel use cases enabled by Flink? 6. Where do you go from here?
  • 3. 1. Origin and evolution of data streaming capabilities in Flink 2009 Apache Flink has its origins in a research project called Stratosphere, the idea for which was conceived in 2009 by Professor Volker Markl of the Technische Universität Berlin in Germany. At its core, Flink has always been a distributed dataflow streaming engine. 2012 Massively-Parallel Stream Processing under QoS Constraints with Nephele, June 12th, 2012 http://stratosphere.eu/assets/papers/massivelyParallelStreamProcessing_12.pdf 2013 Nephele Streaming: Stream Processing under QoS Constraints at Scale, August 5th, 2013 http://stratosphere.eu/assets/papers/nephele-streaming.pdf
  • 4. 1. Origin and evolution of data streaming capabilities in Flink 2014 March 2014: Work on the first prototype of an API demonstrating the streaming capabilities of Stratosphere was started by Gyula Fora and Marton Balassi from the Hungarian Academy of Sciences. April 2014: Flink joined the Apache Incubator in April 2014 and graduated as an Apache Top Level Project (TLP) in December 2014. June 2014: The first public mention of this prototype was on June 4th, 2014 http://2014.adattarhazforum.hu/letoltes/2014dwforum/mta_sztaki_balassi_marton.pdf October 2014: The second public mention of this prototype was on October 7th, 2014 https://www.youtube.com/watch?v=k2AOqwm_7ts at 10’37” http://data-artisans.com/apache-flink-new-kid-on-the-block/ November 2014: The first talk using ‘Flink Streaming’ was given at ApacheCon on November 18th, 2014 http://events.linuxfoundation.org/sites/events/files/slides/flink_apachecon_small.pdf
  • 5. 1. Origin and evolution of data streaming capabilities in Flink 2015 June 2015: “I would consider stream data analysis to be a major unique selling proposition for Flink. Due to its pipelined architecture Flink is a perfect match for big data stream processing in the Apache stack.” – Volker Markl. Ref.: On Apache Flink. Interview with Volker Markl, June 24th, 2015 http://www.odbms.org/blog/2015/06/on-apache-flink-interview-with-volker-markl/ June 2015: Flink 0.9 was released on June 24th, 2015: DataStream API in beta, exactly-once guarantees via checkpointing. November 2015: Flink 0.10 was released on November 16th, 2015: event time support, windowing mechanism based on the Dataflow/Beam model, graduated DataStream API, high availability, state backends, new/updated connectors (Kafka, NiFi, ...), improved monitoring, …
  • 6. 1. Origin and evolution of streaming capabilities in Flink 2016 The Google paper “The Dataflow Model: A Practical Approach to Balancing Correctness, Latency, and Cost in Massive-Scale, Unbounded, Out-of-Order Data Processing” http://research.google.com/pubs/pub43864.html influenced Flink's rich windowing semantics. March 2016: Flink 1.0 was released on March 8th, 2016: stable DataStream API, out-of-core state, savepoints, CEP library, improved monitoring, Kafka 0.9 support, … April 2016: Apache Flink 1.0.1 was released on April 6th, 2016. Flink 1.0.2 is being voted on.
  • 7. 1. Origin and evolution of streaming capabilities in Flink Post Flink 1.0 in 2016: Queryable state: query the state from within Flink instead of a database. Querying the state that Flink holds while it is doing its computation will effectively replace a database! Planned for Flink 1.1. SQL/StreamSQL and Table API. Dynamic scaling: runtime scaling for DataStream programs. Managed memory for streaming operators. Security: over-the-wire encryption of RPC (Akka) and data transfers (Netty).
  • 8. 1. Origin and evolution of streaming capabilities in Flink Expose more runtime metrics: backpressure monitoring, spilling / out of core. Additional streaming connectors: Kinesis, Cassandra, … Making YARN resources dynamic. Support for Apache Mesos https://issues.apache.org/jira/browse/FLINK-1984 Further reading: • Apache Flink Roadmap Draft, December 2015 https://docs.google.com/document/d/1ExmtVpeVVT3TIhO1JoBpC5JKXm-778DAD7eqw5GANwE/edit • What’s next? Roadmap 2016. Robert Metzger, January 26, 2016. Berlin Apache Flink Meetup. http://www.slideshare.net/robertmetzger1/january-2016-flink-community-update-roadmap-2016/9
  • 9. Agenda 1. Origin and evolution of streaming capabilities in Flink 2. Why Flink is suitable for real-world streaming analytics? 3. What are some streaming analytics use cases suitable for Flink? 4. What are some streaming analytics use cases from companies actually using Flink? 5. What are some novel use cases enabled by Flink? 6. Where do you go from here?
  • 10. 2. Why Flink is suitable for real-world streaming analytics? Apache Flink 1.0, which was released on March 8th, 2016, comes with a competitive set of streaming analytics features, some of which are unique in the open source domain. The combination of these features makes Apache Flink a unique choice for real-world streaming analytics. Let’s discuss some of Apache Flink's features for real-world streaming analytics.
  • 11. 2. Why Flink is suitable for real-world streaming analytics? 2.1. Pipelined processing engine 2.2. Stream abstraction: DataStream as in the real world 2.3. Performance: Low latency and high throughput 2.4. Support for rich windowing semantics 2.5. Support for different notions of time 2.6. Stateful stream processing 2.7. Fault tolerance and correctness 2.8. High Availability 2.9. Backpressure handling 2.10. Expressive and easy-to-use APIs in Scala and Java 2.11. Support for batch 2.12. Integration with the Hadoop ecosystem
  • 12. 2.1. Pipelined processing engine Flink is a pipelined (streaming) engine akin to parallel database systems, rather than a batch engine like Spark. ‘Flink’s runtime is not designed around the idea that operators wait for their predecessors to finish before they start, but they can already consume partially generated results.’ ‘This is called pipeline parallelism and means that several transformations in a Flink program are actually executed concurrently with data being passed between them through memory and network channels.’ http://data-artisans.com/apache-flink-new-kid-on-the-block/
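The idea that a downstream operator consumes partially generated results can be illustrated with a minimal Python sketch. It does not use Flink: the operator names `source`, `square` and `sink` are hypothetical, and Python generators run in a single thread, so this only shows the pipelined hand-off of events, not true concurrent execution across memory and network channels.

```python
from typing import Iterable, Iterator, List

# Records the order in which the (hypothetical) operators touch each event.
log: List[str] = []

def source(n: int) -> Iterator[int]:
    """Emit the events 0..n-1 one at a time."""
    for i in range(n):
        log.append(f"source emits {i}")
        yield i

def square(events: Iterable[int]) -> Iterator[int]:
    """A map-style operator: handles each event as soon as it arrives."""
    for e in events:
        log.append(f"map sees {e}")
        yield e * e

def sink(events: Iterable[int]) -> List[int]:
    """Collect results; it starts receiving before the source has finished."""
    out = []
    for e in events:
        log.append(f"sink receives {e}")
        out.append(e)
    return out

result = sink(square(source(3)))
```

In this sketch the log interleaves the three operators event by event: the first event reaches the sink before the source emits the second one. A batch-style execution would instead run each stage to completion before starting the next.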
  • 13. 2.2. Stream abstraction: DataStream as in the real world Real-world data is a series of events that are continuously produced by a variety of applications and disparate systems inside and outside the enterprise. Flink, as a stream processing system, models streams as what they are in the real world, a series of events, and uses DataStream as an abstraction. Spark, as a batch processing system, approximates these streams as micro-batches and uses DStream as an abstraction. This adds an artificial latency!
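To make the "artificial latency" of micro-batching concrete, here is a deliberately simplified model in Python. It assumes that an event can only be handed to the computation when the batch containing it closes, and it ignores scheduling and processing time. The function name and the sample arrival times are illustrative, not measurements of any system.

```python
from typing import List

def micro_batch_emit_time(arrival_ms: int, interval_ms: int) -> int:
    """Time at which the micro-batch containing this arrival closes."""
    return (arrival_ms // interval_ms + 1) * interval_ms

# Hypothetical arrival times (ms) and a 1-second batch interval.
arrivals_ms: List[int] = [100, 400, 900, 1200]
interval_ms = 1000

# Waiting time added purely by batching, before any processing happens.
added_latency_ms = [micro_batch_emit_time(t, interval_ms) - t for t in arrivals_ms]
```

Under these assumptions each event waits between just above 0 and a full batch interval, whereas an event-at-a-time model can start processing on arrival.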
  • 14. 2.3. Performance: Low latency and high throughput The pipelined processing engine enables true low-latency streaming applications with results in milliseconds. High throughput: efficiently handles high volumes of streams (millions of events per second). Tunable latency / throughput tradeoff: a tuning knob lets you navigate the latency-throughput tradeoff. Yahoo! benchmarked Storm, Spark Streaming and Flink with a production use case (counting ad impressions grouped by campaign). Full Yahoo! article (the benchmark stops at low write throughput and the programs are not fault tolerant): https://yahooeng.tumblr.com/post/135321837876/benchmarking-streaming-computation-engines-at
  • 15. 15 2.3. Performance: Low latency and high throughput Full data Artisans article, which extends the Yahoo! benchmark to high volumes and uses Flink’s built-in state: http://data-artisans.com/extending-the-yahoo-streaming-benchmark/ Flink outperformed both Spark Streaming and Storm in this benchmark modeled after a real-world application: • Flink achieves a throughput of 15 million messages/second on a 10-machine cluster. This is 35x higher throughput compared to Storm (80x compared to Yahoo’s runs). • Flink ran with exactly-once guarantees, Storm with at-least-once. Ultimately, you need to test the performance of your own streaming analytics application, as it depends on your own logic and the version of your preferred stream processing tool!
  • 16. 16 2.4. Support for rich windowing semantics Flink provides rich windowing semantics. A window is a grouping of events based on some function of time (all records of the last 5 minutes), count (the last 10 events) or session (all the events of a particular web user). Window types in Flink: • Tumbling windows (no overlap) • Sliding windows (with overlap) • Session windows (gap of activity) • Custom windows (with assigners, triggers and evictors)
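To make the difference between tumbling and sliding windows concrete, here is a minimal plain-Java sketch of how an event timestamp could be mapped to window start times. It is an illustration only, not Flink's API: the class `WindowSketch` and both method names are hypothetical, and timestamps are assumed to be non-negative milliseconds.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java illustration of window assignment; hypothetical names, not Flink code. */
public class WindowSketch {

    /** Start of the single tumbling window of the given size that contains the timestamp. */
    static long tumblingWindowStart(long timestampMs, long sizeMs) {
        return timestampMs - (timestampMs % sizeMs);
    }

    /** Starts of all sliding windows (size, slide) that contain the timestamp, latest first. */
    static List<Long> slidingWindowStarts(long timestampMs, long sizeMs, long slideMs) {
        List<Long> starts = new ArrayList<>();
        // The latest window containing the event starts at the last slide boundary.
        long lastStart = timestampMs - (timestampMs % slideMs);
        // Walk back one slide at a time while the window still covers the timestamp.
        for (long start = lastStart; start > timestampMs - sizeMs; start -= slideMs) {
            starts.add(start);
        }
        return starts;
    }

    public static void main(String[] args) {
        // An event at t = 7 s with 5 s windows.
        System.out.println(tumblingWindowStart(7_000, 5_000));
        System.out.println(slidingWindowStarts(7_000, 5_000, 1_000));
    }
}
```

A tumbling window assigns each event to exactly one window, while a sliding window of size 5 s and slide 1 s assigns the same event to five overlapping windows, which is why sliding aggregations cost more state.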
  • 17. 17 2.4. Support for rich windowing semantics In many systems, these windows are hard-coded and connected with the system’s internal checkpointing mechanism. Flink is the first open source streaming engine that completely decouples windowing from fault tolerance, allowing for richer forms of windows, such as sessions. Further reading: • http://flink.apache.org/news/2015/12/04/Introducing-windows.html • http://beam.incubator.apache.org/beam/capability/2016/03/17/capability-matrix.html • https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-101 • https://www.oreilly.com/ideas/the-world-beyond-batch-streaming-102
  • 18. 18 2.5. Support for different notions of time In a streaming program with Flink, for example when defining windows with respect to time, one can refer to different notions of time: • Event time: when an event actually happened in the real world. • Ingestion time: when data is loaded into Flink, from Kafka for example. • Processing time: when data is processed by Flink. In the real world, streams of events rarely arrive in the order in which they were produced, due to distributed sources, non-synced clocks, network delays… They are said to be ‘out of order’ streams. Flink is the first open source streaming engine that supports out-of-order streams and is able to consistently process events according to their event time.
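The idea of processing out-of-order events by event time can be sketched in plain Java. The example below buffers timestamps and releases them in event-time order once a watermark (the highest timestamp seen so far minus a tolerated delay) has passed them. This is a simplified illustration, not Flink's implementation: `EventTimeSketch`, `onEvent` and the fixed out-of-orderness bound are assumptions made for the example, and events arriving later than the bound are not treated specially.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

/** Plain-Java illustration of event-time reordering with a watermark; hypothetical names, not Flink code. */
public class EventTimeSketch {
    private final long maxOutOfOrdernessMs;
    // Min-heap, so the earliest buffered event time is always at the head.
    private final PriorityQueue<Long> buffer = new PriorityQueue<>();
    private long maxSeenTimestampMs = Long.MIN_VALUE;

    EventTimeSketch(long maxOutOfOrdernessMs) {
        this.maxOutOfOrdernessMs = maxOutOfOrdernessMs;
    }

    /** Accepts one event timestamp and returns the buffered events that are now safe to emit, in event-time order. */
    List<Long> onEvent(long eventTimestampMs) {
        buffer.add(eventTimestampMs);
        maxSeenTimestampMs = Math.max(maxSeenTimestampMs, eventTimestampMs);
        // Watermark: we assume no event older than this will still arrive.
        long watermark = maxSeenTimestampMs - maxOutOfOrdernessMs;
        List<Long> ready = new ArrayList<>();
        while (!buffer.isEmpty() && buffer.peek() <= watermark) {
            ready.add(buffer.poll());
        }
        return ready;
    }

    public static void main(String[] args) {
        EventTimeSketch sketch = new EventTimeSketch(2_000);
        // Events arrive in the order 1 s, 3 s, 2 s, 5 s (the 2 s event is out of order).
        for (long t : new long[] {1_000, 3_000, 2_000, 5_000}) {
            System.out.println("arrived " + t + " -> emitted " + sketch.onEvent(t));
        }
    }
}
```

The trade-off is visible in the single parameter: a larger tolerated delay handles more disorder but holds results back longer.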
  • 19. 19 2.5. Support for different notions of time http://beam.incubator.apache.org/beam/capability/2016/03/17/capability-matrix.html https://ci.apache.org/projects/flink/flink-docs-master/concepts/concepts.html#time https://ci.apache.org/projects/flink/flink-docs-master/apis/streaming/event_time.html http://data-artisans.com/how-apache-flink-enables-new-streaming-applications-part-1/
  • 20. 20 2.6. Stateful stream processing Many operations in a dataflow simply look at one individual event at a time, for example an event parser. Other operations, called stateful operations, need to remember data across events, for example data stored at the end of a window for computations occurring in later windows. Now, where is the state of these stateful operations maintained?
  • 21. 21 2.6. Stateful stream processing • The state can be stored in memory, in the file system, or in RocksDB, which is an embedded key/value data store and not an external database. • Flink also supports state versioning through savepoints, which are checkpoints of the state of a running streaming job that can be manually triggered by the user while the job is running. • Savepoints enable: • Code upgrades: both application and framework • Cluster maintenance and migration • A/B testing and what-if scenarios • Testing and debugging • Restarting a job with adjusted parallelism Further reading: • http://data-artisans.com/how-apache-flink-enables-new-streaming-applications/ • https://ci.apache.org/projects/flink/flink-docs-master/apis/streaming/savepoints.html
  • 22. 22 2.7. Fault tolerance and correctness How to ensure that the state is correct after failures? Apache Flink offers a fault tolerance mechanism to consistently recover the state of data streaming applications. This ensures that even in the presence of failures, the operators do not perform duplicate updates to their state (exactly once guarantees). This basically means that the computed results are the same whether there are failures along the way or not. There is a switch to downgrade the guarantees to at least once if the use case tolerates duplicate updates.
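The claim that results are the same with or without failures can be illustrated with a small plain-Java sketch. It snapshots the source position together with the operator state, and after a simulated failure restores both and replays the input from the snapshot. This is a single-threaded simplification under stated assumptions, not Flink's distributed snapshot mechanism: `CheckpointSketch`, `Checkpoint` and `run` are hypothetical names, and the source is assumed to be replayable (as Kafka is).

```java
import java.util.Arrays;
import java.util.List;

/** Plain-Java illustration of checkpoint, restore and replay; hypothetical names, not Flink code. */
public class CheckpointSketch {

    /** Consistent snapshot: the source position and the operator state taken together. */
    static final class Checkpoint {
        final int offset;
        final long sum;

        Checkpoint(int offset, long sum) {
            this.offset = offset;
            this.sum = sum;
        }
    }

    /**
     * Sums the input, taking a checkpoint every `interval` records.
     * Simulates one failure just before the record at index `failAt` (pass a negative value for no failure).
     */
    static long run(List<Integer> input, int interval, int failAt) {
        Checkpoint last = new Checkpoint(0, 0L);
        int offset = 0;
        long sum = 0L;
        boolean failed = false;
        while (offset < input.size()) {
            if (!failed && offset == failAt) {
                failed = true;
                // Recovery: roll back state AND source position, then replay from there.
                offset = last.offset;
                sum = last.sum;
                continue;
            }
            sum += input.get(offset);
            offset++;
            if (offset % interval == 0) {
                last = new Checkpoint(offset, sum);
            }
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Integer> input = Arrays.asList(1, 2, 3, 4, 5);
        System.out.println("no failure:   " + run(input, 2, -1));
        System.out.println("with failure: " + run(input, 2, 3));
    }
}
```

Because the state and the read position are rolled back together, records processed after the last checkpoint are replayed against the older state, so no record is counted twice in the final result.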
  • 23. 23 2.7. Fault tolerance and correctness Further reading: • High-throughput, low-latency, and exactly-once stream processing with Apache Flink: http://data-artisans.com/high-throughput-low-latency-and-exactly-once-stream-processing-with-apache-flink/ • Data Streaming Fault Tolerance document: http://ci.apache.org/projects/flink/flink-docs-master/internals/stream_checkpointing.html • ‘Lightweight Asynchronous Snapshots for Distributed Dataflows’, June 28, 2015: http://arxiv.org/pdf/1506.08603v1.pdf • ‘Distributed Snapshots: Determining Global States of Distributed Systems’, February 1985 (the Chandy-Lamport algorithm): http://research.microsoft.com/en-us/um/people/lamport/pubs/chandy.pdf
  • 24. 24 2.8. High Availability In the real world, streaming analytics applications need to be reliable, capable of running jobs for months, and resilient in the event of failures. The JobManager (Master) is responsible for scheduling and resource management. If it crashes, no new programs can be submitted and running programs will fail. Flink provides a High Availability (HA) mode to recover from a JobManager crash and eliminate this Single Point Of Failure (SPOF). Further reading: JobManager High Availability https://ci.apache.org/projects/flink/flink-docs-master/setup/jobmanager_high_availability.html
  • 25. 25 2.9. Backpressure handling In the real world, there are situations where a system receives data at a higher rate than it can normally process. This is called backpressure. Flink handles backpressure implicitly through its architecture, without user interaction, while backpressure handling in Spark requires manual configuration: spark.streaming.backpressure.enabled. Flink provides backpressure monitoring to help users understand bottlenecks in streaming applications. Further reading: • How Flink handles backpressure? by Ufuk Celebi, Kostas Tzoumas and Stephan Ewen, August 31, 2015. http://data-artisans.com/how-flink-handles-backpressure/
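One common way to get implicit backpressure is to connect producer and consumer through a bounded buffer: when the buffer is full the producer cannot hand over more records and is slowed to the consumer's pace. The plain-Java sketch below shows that effect with a standard `ArrayBlockingQueue`. It is a single-threaded illustration, not Flink's network stack; `BackpressureSketch` and its method names are hypothetical, and a real producer would block on `put()` rather than stop.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

/** Plain-Java illustration of backpressure via a bounded buffer; hypothetical names, not Flink code. */
public class BackpressureSketch {

    /** Returns how many of `produced` records fit before the bounded buffer pushes back. */
    static int acceptedBeforeFull(int bufferCapacity, int produced) {
        BlockingQueue<Integer> buffer = new ArrayBlockingQueue<>(bufferCapacity);
        int accepted = 0;
        for (int i = 0; i < produced; i++) {
            // offer() returns false when the buffer is full; a real producer would block here.
            if (!buffer.offer(i)) {
                break;
            }
            accepted++;
        }
        return accepted;
    }

    /** Shows that the producer can continue as soon as the consumer frees one slot. */
    static boolean resumesAfterConsume(int bufferCapacity) {
        BlockingQueue<Integer> buffer = new ArrayBlockingQueue<>(bufferCapacity);
        while (buffer.offer(0)) {
            // Fill the buffer completely.
        }
        boolean rejectedWhenFull = !buffer.offer(1);
        buffer.poll(); // The consumer takes one record.
        boolean acceptedAfterDrain = buffer.offer(1);
        return rejectedWhenFull && acceptedAfterDrain;
    }

    public static void main(String[] args) {
        System.out.println(acceptedBeforeFull(3, 10));
        System.out.println(resumesAfterConsume(3));
    }
}
```

No rate limit is configured anywhere in this sketch: the producer's speed is bounded only by the buffer size and by how fast the consumer drains it.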
  • 26. 26 2.10. Expressive and easy-to-use APIs in Scala and Java • The high-level, expressive and easy-to-use DataStream API with flexible window semantics results in significantly less custom application logic compared to other open source stream processing solutions. • Flink's DataStream API ports many operators from its DataSet batch processing API, such as map, reduce, and join, to the streaming world. • In addition, it provides stream-specific operations such as window, split, connect, … • Its support for user-defined functions eases the implementation of custom application behavior. • The DataStream API is available in Scala and Java.
  • 27. 27 2.10. Expressive and easy-to-use APIs in Scala and Java
    DataStream API (streaming): Window WordCount
      import org.apache.flink.streaming.api.scala._
      import org.apache.flink.streaming.api.windowing.time.Time

      case class Word(word: String, frequency: Int)

      val env = StreamExecutionEnvironment.getExecutionEnvironment
      val lines: DataStream[String] = env.socketTextStream(...)
      lines.flatMap { line => line.split(" ").map(word => Word(word, 1)) }
        .keyBy("word")
        // 5-second windows that slide every second (Flink 1.0 API)
        .timeWindow(Time.seconds(5), Time.seconds(1))
        .sum("frequency")
        .print()
      env.execute()
    DataSet API (batch): WordCount
      import org.apache.flink.api.scala._

      val env = ExecutionEnvironment.getExecutionEnvironment
      val lines: DataSet[String] = env.readTextFile(...)
      lines.flatMap { line => line.split(" ").map(word => Word(word, 1)) }
        .groupBy("word").sum("frequency")
        .print() // in the DataSet API, print() triggers execution itself
  • 28. 28 2.11. Support for batch • In Flink, batch processing is a special case of stream processing, as finite data sources are just streams that happen to end. • Flink offers a full toolset for batch processing with a dedicated DataSet API and libraries for machine learning and graph processing. • In addition, Flink contains several batch-specific optimizations, such as for scheduling, memory management, and query optimization. • Flink outperforms dedicated batch processing engines such as Spark and Hadoop MapReduce in batch use cases.
  • 29. 29 2.12. Integration with the Hadoop ecosystem [Figure: integration diagram; surviving labels: POSIX, Java/Scala Collections]
  • 31. 31 3. What are some streaming analytics use cases suitable for Flink? 1. Financial services 2. Telecommunications 3. Online gaming systems 4. Security & Intelligence 5. Advertisement serving 6. Sensor Networks 7. Social Media 8. Healthcare 9. Oil & Gas 10. Retail & eCommerce 11. Transportation and logistics
  • 33. 33 4. What are some streaming analytics use cases from companies actually using Flink? Who is using Apache Flink? Some companies using Flink for streaming analytics, by sector: [Telecommunications] [Retail] [Financial Services] [Gaming] [Security]. Powered by Flink [Companies, Software Projects, Universities/Research Institutes]: https://cwiki.apache.org/confluence/display/FLINK/Powered+by+Flink
  • 34. 34 4. What are some streaming analytics use cases from companies actually using Flink? • Bouygues Telecom is a full-service communication operator (mobile, fixed telephony, TV, Internet, and Cloud computing) and one of the largest providers in France, with over 11 million mobile subscribers, … • Bouygues Telecom uses Flink for real-time event processing and analytics over billions of Kafka messages per day. • Stream processing at Bouygues Telecom with Apache Flink, by Mohamed Amine Abdessemed • Blog: http://data-artisans.com/flink-at-bouygues-html/ June 1st, 2015 • Slides: http://www.slideshare.net/FlinkForward/mohamed-amine-abdessemed-realtime-data-integration-with-apache-flink-kafka • Video: https://www.youtube.com/watch?v=hjmgZfXSi3M
  • 35. 35 4. What are some streaming analytics use cases from companies actually using Flink? • Otto Group is the world’s second-largest online retailer in the end-consumer (B2C) business and Europe’s largest online retailer in the end-consumer (B2C) fashion and lifestyle business. • “A range of exciting projects at the BI department were implemented with Apache Flink, e.g. a crowd-sourced user-agent identification, and a search session identifier.” • How we selected Apache Flink as our Stream Processing Framework at the Otto Group Business Intelligence Department, October 6, 2015 • Blog: http://data-artisans.com/how-we-selected-apache-flink-at-otto-group/ • Slides: http://www.slideshare.net/FlinkForward/christian-kreuzfeld-static-vs-dynamic-stream-processing • Video: https://www.youtube.com/watch?v=cnqPyw_uQAQ
  • 36. 36 4. What are some streaming analytics use cases from companies actually using Flink? • At king.com, Flink is used to process more than 30 billion events daily and compute real-time player statistics by leveraging Flink's stateful streaming abstractions and Complex Event Processing. References: • Apache Software Foundation Blog, March 8th 2016 • Blog: https://blogs.apache.org/foundation/entry/the_apache_software_foundation_announces88 • Hadoop Summit Dublin 2016, April 13, 2016 • Slides: http://www.slideshare.net/GyulaFra/largescale-stream-processing-in-the-hadoop-ecosystem-hadoop-summit-2016-60887821/3 • Video: https://www.youtube.com/watch?v=mRhCpp-p11E
  • 37. 37 4. What are some streaming analytics use cases from companies actually using Flink? • Zalando(.com) is Europe’s leading online fashion platform, doing business in 15 markets and attracting well over 100 million visits per month. • “Delivering first-class shopping experiences to our +14 million customers requires moving fast and using cutting-edge, open-source technologies.” • Near-real-time business intelligence for the following use cases: business process monitoring and continuous ETL. • Apache Showdown: Flink vs. Spark, by Javier Lopez and Mihail Vieru, 31 March 2016: https://tech.zalando.com/blog/apache-showdown-flink-vs.-spark/
  • 38. 38 4. What are some streaming analytics use cases from companies actually using Flink? Capital One is a top-10 consumer and commercial banking institution that conducts business in the US, Canada and the UK. Flink was used for real-time monitoring of customer activity data (audit log event details, failure and success data, …) to: • proactively detect and resolve issues immediately • prevent significant customer impact • enable a flawless digital enterprise experience Flink Case study at Capital One, 2015 FlinkForward Conference, Berlin, Germany, October 12th 2015 http://www.slideshare.net/FlinkForward/flink-case-study-capital-one
  • 39. 39 Real-Time Monitoring of Customer Activity
  • 40. 40 4. What are some streaming analytics use cases from companies actually using Flink? • Twitter held its Hack Week, and the winner, announced on December 18th 2015, was a Flink-based streaming project! Extending the Yahoo! Streaming Benchmark and Winning Twitter Hack-Week with Apache Flink. Posted on February 2, 2016 by Jamie Grier http://data-artisans.com/extending-the-yahoo-streaming-benchmark/ http://www.slideshare.net/JamieGrier/stateful-stream-processing-at-inmemory-speed • Yahoo! ran benchmarks comparing the performance of one of its use cases, originally implemented on Apache Storm, against Spark Streaming and Flink. Results posted on December 18, 2015 • http://yahooeng.tumblr.com/post/135321837876/benchmarking-streaming-computation-engines-at • http://data-artisans.com/extending-the-yahoo-streaming-benchmark/ • https://github.com/dataArtisans/yahoo-streaming-benchmark • http://www.slideshare.net/JamieGrier/extending-the-yahoo-streaming-benchmark
  • 41. 41 Generic Streaming Analytics Architectural pattern: Event Producers (Apps, Devices, Sensors) → Collector (Flume, SpringXD, Logstash, NiFi, Fluentd) → Broker (Kafka, RabbitMQ, JMS, Amazon Kinesis, Google Cloud Pub/Sub, MapR Streams) → Processor (Flink, Spark, Storm, Samza, Kafka Streams) → Indexer (Elasticsearch, Solr, Cassandra, HBase, MapR DB, MongoDB, Apache Geode) → Visualizer/Search (Kibana, Custom GUI). Note on the slide: This is changing with Flink’s alerts, StreamSQL, state querying, FlinkCEP, …
  • 43. 43 5. What are some novel use cases enabled by Flink? 5.1. Flink as an embedded key/value data store 5.2. Flink as a distributed CEP engine
  • 44. 44 5.1. Flink as an embedded key/value data store • The stream processor as a database: a new design pattern for data streaming applications, using Apache Flink and Apache Kafka. Applications are built directly on top of the stream processor, rather than on top of key/value databases populated by data streams. • The stateful operator features in Flink allow a streaming application to query state in the stream processor instead of in a key/value store, which is often a bottleneck. http://data-artisans.com/extending-the-yahoo-streaming-benchmark/
  • 45. 45 “State querying” feature is expected in upcoming Flink 1.1 http://www.slideshare.net/JamieGrier/stateful-stream-processing-at-inmemory-speed/38
  • 46. 46 5.2. Flink as a distributed CEP engine Flink stream processor as a CEP (Complex Event Processing) engine. Example: an application that ingests network monitoring events, identifies access patterns such as intrusion attempts using FlinkCEP, and analyzes and aggregates the identified access patterns. Upcoming talk: ‘Streaming analytics and CEP - Two sides of the same coin’ by Till Rohrmann and Fabian Hueske at Berlin Buzzwords, June 5-7, 2016. http://berlinbuzzwords.de/session/streaming-analytics-and-cep-two-sides-same-coin Further reading: • Introducing Complex Event Processing (CEP) with Apache Flink, Till Rohrmann, April 6, 2016 http://flink.apache.org/news/2016/04/06/cep-monitoring.html • FlinkCEP - Complex event processing for Flink https://ci.apache.org/projects/flink/flink-docs-master/apis/streaming/libs/cep.html
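  • 47. 47 5.2. Flink as a distributed CEP engine
      import org.apache.flink.cep.pattern.Pattern;
      import org.apache.flink.streaming.api.windowing.time.Time;

      // Two consecutive temperature readings at or above THRESHOLD within 10 seconds.
      Pattern<MonitoringEvent, ?> warningPattern = Pattern.<MonitoringEvent>begin("First Event")
          .subtype(TemperatureEvent.class)
          .where(evt -> evt.getTemperature() >= THRESHOLD)
          .next("Second Event")
          .subtype(TemperatureEvent.class)
          .where(evt -> evt.getTemperature() >= THRESHOLD)
          .within(Time.seconds(10));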
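What the pattern on this slide asks for can be spelled out without any CEP library. The plain-Java sketch below scans an ordered list of temperature readings and reports every position where two directly consecutive readings are at or above a threshold and close enough in time. It only illustrates the matching rule and is not how FlinkCEP is implemented: `CepSketch`, its nested `TemperatureEvent` class and `findWarnings` are hypothetical, the input is assumed to be already ordered by time, and treating the time bound as inclusive is an assumption of this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Plain-Java illustration of a "two hot readings in a row" pattern; hypothetical names, not FlinkCEP. */
public class CepSketch {

    /** Minimal stand-in for the TemperatureEvent type used on the slide. */
    static final class TemperatureEvent {
        final long timestampMs;
        final double temperature;

        TemperatureEvent(long timestampMs, double temperature) {
            this.timestampMs = timestampMs;
            this.temperature = temperature;
        }
    }

    /** Returns each index i where events i and i+1 are both >= threshold and at most withinMs apart. */
    static List<Integer> findWarnings(List<TemperatureEvent> events, double threshold, long withinMs) {
        List<Integer> matches = new ArrayList<>();
        for (int i = 0; i + 1 < events.size(); i++) {
            TemperatureEvent first = events.get(i);
            TemperatureEvent second = events.get(i + 1);
            boolean bothHot = first.temperature >= threshold && second.temperature >= threshold;
            boolean closeInTime = second.timestampMs - first.timestampMs <= withinMs;
            if (bothHot && closeInTime) {
                matches.add(i);
            }
        }
        return matches;
    }

    public static void main(String[] args) {
        List<TemperatureEvent> events = Arrays.asList(
            new TemperatureEvent(0, 101.0),
            new TemperatureEvent(5_000, 102.0),
            new TemperatureEvent(20_000, 103.0),
            new TemperatureEvent(21_000, 90.0));
        System.out.println(findWarnings(events, 100.0, 10_000));
    }
}
```

A CEP engine adds what this loop lacks: it evaluates such patterns incrementally on an unbounded, partitioned stream and keeps only the partial matches that can still complete within the time bound.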
  • 49. 49 6. Where do you go from here? • A few resources for you: • Overview of Apache Flink: the 4G of Big Data Analytics Frameworks, Hadoop Summit Europe, April 13th 2016 • Slides: http://www.slideshare.net/SlimBaltagi/overview-of-apache-fink-the-4-g-of-big-data-analytics-frameworks • Video: https://www.youtube.com/watch?v=_BZURQn2EQI • Flink Knowledge Base: a one-stop resource for everything related to Apache Flink. http://sparkbigdata.com/component/tags/tag/27-flink • Flink at the Apache Software Foundation: flink.apache.org/ • Free Apache Flink training from data Artisans http://dataartisans.github.io/flink-training • Flink Forward Conference, 12-14 September 2016, Berlin, Germany http://flink-forward.org/ (call for submissions announced on April 13th, 2016)
  • 50. 50 6. Where do you go from here? • Free ebook from MapR: Streaming Architecture: New Designs Using Apache Kafka and MapR Streams https://www.mapr.com/streaming-architecture-using-apache-kafka-mapr-streams • Free ebook from Confluent: Making sense of stream processing http://www.confluent.io/making-sense-of-stream-processing-ebook • A few takeaways: • Apache Flink’s unique capabilities enable new and sophisticated use cases, especially for real-world streaming analytics. • Customer demand will push major Hadoop distributors to package Flink and support it. • Apache Flink will enable innovations and disruptions in many verticals with its capabilities in real-world streaming analytics.
  • 51. 51 Thanks! To all of you for attending! Let’s keep in touch! • sbaltagi@gmail.com • @SlimBaltagi • https://www.linkedin.com/in/slimbaltagi Any questions?