2. Streaming in Flink 0.10
• Operational readiness
  - High Availability
  - Monitoring
  - Integration with other systems
• First-class support for event-time
• Hardened statefulness support
• Redefined API
3. Streaming in Flink 0.10
• Some breaking changes
  - groupBy() -> keyBy()
  - Windowing API completely changed
  - DataStream and related classes renamed
  - Internal rewrite
  - The goal is to harden the streaming API for 1.0
6. Streaming data never stops
[Figure: a stream of tweets (e.g. "Just saw #Trump on #CNN, super cool. :D") flows into a 5-minute window that counts hashtags. Example output: Trump: 2394, Cheese: 12984, Money: 42.]
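To make the windowed count concrete, here is a minimal plain-Java sketch (not Flink code) of a 5-minute tumbling window that counts hashtags. It assumes each tweet comes with a millisecond timestamp and that a hashtag is `#` followed by word characters; `HashtagWindowSketch`, `windowStart` and `add` are hypothetical names. Which timestamp to use is exactly the question the following slides raise.

```java
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Plain-Java sketch, not Flink code: hashtag counts per 5-minute tumbling window
final class HashtagWindowSketch {
    static final long WINDOW_MS = 5 * 60 * 1000L;              // 5 minutes in ms
    static final Pattern HASHTAG = Pattern.compile("#(\\w+)"); // assumed hashtag syntax

    // Start of the tumbling window that contains timestamp ts (ms)
    static long windowStart(long ts) {
        return ts - Math.floorMod(ts, WINDOW_MS);
    }

    // Adds every hashtag of one tweet to the counts of the tweet's window
    static void add(Map<Long, Map<String, Integer>> counts, long ts, String text) {
        Map<String, Integer> window = counts.computeIfAbsent(windowStart(ts), k -> new TreeMap<>());
        Matcher m = HASHTAG.matcher(text);
        while (m.find()) {
            window.merge(m.group(1), 1, Integer::sum);
        }
    }

    public static void main(String[] args) {
        Map<Long, Map<String, Integer>> counts = new TreeMap<>();
        add(counts, 0L, "Just saw #Trump on #CNN, super cool. :D");
        add(counts, 60_000L, "#Trump again");  // 1 min later: same window
        add(counts, 360_000L, "#Cheese");       // 6 min: next window
        System.out.println(counts); // {0={CNN=1, Trump=2}, 300000={Cheese=1}}
    }
}
```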
7. What I didn’t mention
• tweets have a timestamp: their event time
• tweets from across the globe arrive with delay
=> tweets with different timestamps arrive out of order
8. Window (5 min): Count #Hashtags
[Figure: the hashtag-counting pipeline again, now showing the event time of the tweet "Just saw #Trump on #CNN, super cool. :D": 12:34 (13.10.2015). Tweets arrive with 3 minutes of slack. Output: Trump: 2394, Cheese: 12984, Money: 42.]
Windows are formed based on the processing time of the machine.
Processing Time != Event Time
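The effect of slack on window assignment can be worked through for the 12:34 tweet. This plain-Java sketch (hypothetical class `TimeDomainSketch`, not Flink code) assumes 5-minute tumbling windows aligned to the hour and a processing time equal to the event time plus the 3 minutes of slack.

```java
import java.time.LocalTime;

// Plain-Java sketch, not Flink code: window assignment by event vs. processing time
final class TimeDomainSketch {
    static final int WINDOW_MIN = 5;

    // Start of the 5-minute tumbling window containing t (seconds ignored)
    static LocalTime windowStart(LocalTime t) {
        return LocalTime.of(t.getHour(), t.getMinute() - t.getMinute() % WINDOW_MIN);
    }

    public static void main(String[] args) {
        LocalTime eventTime = LocalTime.of(12, 34);          // timestamp in the tweet
        LocalTime processingTime = eventTime.plusMinutes(3); // arrives with 3 min slack
        System.out.println("event-time window:      " + windowStart(eventTime));      // 12:30
        System.out.println("processing-time window: " + windowStart(processingTime)); // 12:35
    }
}
```

The same tweet is counted in the 12:30 window under event time but in the 12:35 window under processing time.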
9. Why do people use processing time?
• easy to implement
• low latency
• this is what systems give you (Spark Streaming, Apex, Samza, Storm)*
*not Google Cloud Dataflow
11. Window (5 min): Correlate Tweets and News
[Figure: a 5-minute window correlates tweets with news items. Tweets still arrive with 3 min slack; news items, such as "Donald Trump speaks at Cheese conference." from 12:33 (13.10.2015), arrive with 8 min slack.]
Processing Time != Event Time
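The correlation case can be checked the same way. In this plain-Java sketch (hypothetical names, not Flink code), a tweet and a news item are correlated when they fall into the same 5-minute tumbling window. It assumes the tweet is the 12:34 one from slide 8 and that arrival time is event time plus the slack given on the slide (3 min for tweets, 8 min for news).

```java
import java.time.LocalTime;

// Plain-Java sketch, not Flink code: correlate a tweet and a news item by window
final class CorrelationSketch {
    static final int WINDOW_MIN = 5;

    static LocalTime windowStart(LocalTime t) {
        return LocalTime.of(t.getHour(), t.getMinute() - t.getMinute() % WINDOW_MIN);
    }

    // Two items are correlated if they fall into the same 5-minute window
    static boolean sameWindow(LocalTime a, LocalTime b) {
        return windowStart(a).equals(windowStart(b));
    }

    public static void main(String[] args) {
        LocalTime tweetTime = LocalTime.of(12, 34);          // assumed: the tweet from slide 8
        LocalTime newsTime = LocalTime.of(12, 33);           // the news item on this slide
        LocalTime tweetArrival = tweetTime.plusMinutes(3);   // 3 min slack -> 12:37
        LocalTime newsArrival = newsTime.plusMinutes(8);     // 8 min slack -> 12:41
        System.out.println("by event time:      " + sameWindow(tweetTime, newsTime));       // true
        System.out.println("by processing time: " + sameWindow(tweetArrival, newsArrival)); // false
    }
}
```

With different slack per source, processing-time windows separate events that happened within a minute of each other.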
13. Use cases
• out-of-order elements
• sources with delay
• recovery/fault-tolerance
• “catching up” with a stream
Who supports event time?
• Google Cloud Dataflow
• Apache Flink
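Catching up shows why processing-time windows are not reproducible. This plain-Java sketch (hypothetical names, not Flink code) counts elements per 5-minute window; the timestamps are made up for illustration, in whole minutes. Replaying a backlog in one burst, as when a job reads old data from a log such as Kafka, puts every element into the same processing-time window, while the event-time counts stay the same no matter how fast the data is read.

```java
import java.util.Map;
import java.util.TreeMap;

// Plain-Java sketch, not Flink code: event-time vs. processing-time counts when catching up
final class CatchUpSketch {
    static final long WINDOW_MIN = 5;

    // Counts elements per 5-minute window, keyed by window start (non-negative minutes)
    static Map<Long, Integer> countPerWindow(long[] times) {
        Map<Long, Integer> counts = new TreeMap<>();
        for (long t : times) {
            counts.merge(t - t % WINDOW_MIN, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        long[] eventTimes = {0, 1, 2, 6, 7, 11};          // made-up timestamps stored in a log
        long[] catchUpArrival = {20, 20, 20, 20, 20, 20}; // whole backlog read at minute 20
        System.out.println(countPerWindow(eventTimes));     // {0=3, 5=2, 10=1}
        System.out.println(countPerWindow(catchUpArrival)); // {20=6}
    }
}
```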
19. Event Time
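import static java.util.concurrent.TimeUnit.MINUTES;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.TimeCharacteristic;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.windowing.time.Time;
StreamExecutionEnvironment env =
    StreamExecutionEnvironment.getExecutionEnvironment();
env.setStreamTimeCharacteristic(TimeCharacteristic.EventTime); // window by element timestamps
DataStream<Tweet> text = env.addSource(new TwitterSrc());
text = text.assignTimestamps(new MyTimestampExtractor()); // attach each tweet's event time
DataStream<Tuple2<String, Integer>> counts = text
    .flatMap(new ExtractHashtags())      // one element per hashtag
    .keyBy("name")                       // partition by the hashtag's "name" field
    .timeWindow(Time.of(5, MINUTES))     // 5-minute event-time windows
    .apply(new HashtagCounter());        // count per hashtag and window
counts.print();
env.execute("Hashtag count");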
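The slide does not show `MyTimestampExtractor`. Besides the timestamp itself, an event-time system has to decide when a window is complete despite out-of-order arrival. One common approach, sketched here in plain Java rather than with Flink's extractor interface, is a watermark that trails the largest timestamp seen so far by a fixed slack; a window fires once the watermark passes its end. The 3-minute slack, the minute-based timestamps and the names are assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java sketch, not Flink code: watermark = max timestamp seen - slack
final class WatermarkSketch {
    static final long WINDOW = 5; // window size in minutes
    static final long SLACK = 3;  // assumed maximum lateness in minutes

    // Feeds non-negative timestamps in arrival order; returns fired windows as "start:count"
    static List<String> run(long[] arrivals) {
        TreeMap<Long, Integer> open = new TreeMap<>(); // window start -> element count
        List<String> fired = new ArrayList<>();
        long maxTs = Long.MIN_VALUE;
        for (long ts : arrivals) {
            open.merge(ts - ts % WINDOW, 1, Integer::sum);
            maxTs = Math.max(maxTs, ts);
            long watermark = maxTs - SLACK; // no element older than this is expected
            // Fire every window that ends at or before the watermark
            while (!open.isEmpty() && open.firstKey() + WINDOW <= watermark) {
                Map.Entry<Long, Integer> w = open.pollFirstEntry();
                fired.add(w.getKey() + ":" + w.getValue());
            }
        }
        return fired;
    }

    public static void main(String[] args) {
        // Timestamp 2 arrives after 6, but within the 3-minute slack
        System.out.println(run(new long[] {1, 4, 3, 6, 2, 9})); // [0:4]
    }
}
```

The late element with timestamp 2 still lands in window [0, 5), because the watermark only passes 5 once the element with timestamp 9 arrives. In this simplified version, elements later than the slack would reopen an already-fired window; a real system has to decide how to handle them.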
20. Fault tolerance in streaming
Fault tolerance in streaming systems is inherently harder than in batch
• Can’t just restart the computation from scratch
• State is a problem: it must survive failures
• Fast recovery is crucial
• Streaming topologies run 24/7 for long periods
Fault tolerance is a complex issue
• No single point of failure is allowed
• Guaranteed processing of all input
• Consistent operator state
• Fast recovery
• At-least-once vs. exactly-once semantics
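The difference between the two guarantees shows up when input is replayed after a failure. This plain-Java sketch (hypothetical names, not Flink code) sums a stream, records a checkpoint of the input offset and the running sum, crashes, and replays from the checkpointed offset. Rolling the state back together with the offset gives the exactly-once result; replaying input without rolling back state counts some records twice, which is at-least-once behavior. It assumes `checkpointAt < crashAt`.

```java
// Plain-Java sketch, not Flink code: replay after a failure with and without state rollback
final class DeliverySemanticsSketch {
    // Sums input, checkpoints (offset, sum) before record checkpointAt, crashes after
    // crashAt records, then replays from the checkpointed offset. Assumes checkpointAt < crashAt.
    static long run(int[] input, int checkpointAt, int crashAt, boolean rollBackState) {
        long sum = 0, checkpointedSum = 0;
        int checkpointedOffset = 0;
        for (int i = 0; i < crashAt; i++) {
            if (i == checkpointAt) {   // checkpoint: source offset and operator state
                checkpointedOffset = i;
                checkpointedSum = sum;
            }
            sum += input[i];
        }
        // Failure: the source rewinds to the checkpointed offset
        if (rollBackState) {
            sum = checkpointedSum;     // exactly-once: state restored together with the offset
        }                              // otherwise replayed records are counted again
        for (int i = checkpointedOffset; i < input.length; i++) {
            sum += input[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] ones = {1, 1, 1, 1, 1, 1, 1, 1};
        System.out.println(run(ones, 3, 5, true));  // 8: each record counted once
        System.out.println(run(ones, 3, 5, false)); // 10: records 3 and 4 counted twice
    }
}
```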
22. Consistency - Flink distributed snapshots
Based on consistent global snapshots
Algorithm designed for stateful dataflows (minimal runtime overhead)
Exactly-once semantics
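Flink's snapshots inject checkpoint barriers into the input streams; an operator with several inputs takes its snapshot only once the barrier has arrived on all of them, holding back records from inputs whose barrier came early. The plain-Java simulation below (hypothetical names, a single checkpoint, round-robin input order, not Flink code) shows why this alignment matters: the snapshot of a running sum contains exactly the records that precede the barrier on every channel.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

// Plain-Java simulation, not Flink code: barrier alignment for one checkpoint
final class BarrierAlignmentSketch {
    static final int BARRIER = Integer.MIN_VALUE; // hypothetical barrier marker

    // Sums all records from the channels; returns {snapshotSum, finalSum}.
    // Channels are read round-robin; a channel whose barrier has arrived is blocked
    // (its remaining records wait in its queue) until barriers arrived on all channels.
    static long[] run(List<List<Integer>> channels) {
        int n = channels.size();
        List<Deque<Integer>> queues = new ArrayList<>();
        for (List<Integer> c : channels) {
            queues.add(new ArrayDeque<>(c));
        }
        boolean[] blocked = new boolean[n];
        int barriersSeen = 0;
        long sum = 0, snapshot = -1;
        boolean progress = true;
        while (progress) {
            progress = false;
            for (int i = 0; i < n; i++) {
                if (blocked[i] || queues.get(i).isEmpty()) continue;
                int e = queues.get(i).poll();
                progress = true;
                if (e == BARRIER) {
                    blocked[i] = true;
                    if (++barriersSeen == n) { // aligned: take the snapshot, unblock all
                        snapshot = sum;
                        Arrays.fill(blocked, false);
                    }
                } else {
                    sum += e;
                }
            }
        }
        return new long[] {snapshot, sum};
    }

    public static void main(String[] args) {
        long[] r = run(List.of(List.of(1, 2, BARRIER, 10), List.of(3, BARRIER, 20)));
        System.out.println(Arrays.toString(r)); // [6, 36]
    }
}
```

The snapshot is 1 + 2 + 3 = 6. Without blocking the second channel after its barrier, the 20 would be added before the first channel's barrier arrived, so the snapshot would include a record that is processed again after recovery.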
23. Stateful streaming applications
• ETL-style operations: filtering incoming data, log analysis
  Needs: high throughput, connectors, at-least-once processing
• Window aggregations: trending tweets, user sessions, stream joins
  Needs: window abstractions
[Figure: multiple input streams feeding a Process/Enrich step]
24. Stateful streaming applications
• Machine learning: fitting trends to the evolving stream, stream clustering
  Needs: model state, cyclic flows
• Pattern recognition: fraud detection, triggering signals based on activity
  Needs: exactly-once processing
25. Statefulness in 0.9.1
• Stateful dataflow operators (conceptually similar to Samza)
• Two state access patterns
  - Local (Task) state
  - Partitioned (Key) state
• Proper API integration
  - Java: OperatorState interface
  - Scala: mapWithState, flatMapWithState…
• Exactly-once semantics by checkpointing
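The two access patterns differ in what the state is scoped to. This plain-Java sketch (hypothetical names, not Flink's OperatorState API) counts words with a parallelism of 2 both ways: task-local state keeps one counter per parallel task, however the records were distributed, while partitioned state keeps one counter per key inside the task that the key is routed to. The round-robin distribution and the hash-modulo routing are simple stand-ins for Flink's actual partitioning.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Plain-Java sketch, not Flink's state API: task-local vs. key-partitioned state
final class StateScopesSketch {
    // Local (task) state: one counter per parallel task; records are spread round-robin
    static int[] countPerTask(List<String> words, int parallelism) {
        int[] perTask = new int[parallelism];
        for (int i = 0; i < words.size(); i++) {
            perTask[i % parallelism]++;
        }
        return perTask;
    }

    // Partitioned (key) state: each task holds counters only for the keys routed to it
    static List<Map<String, Integer>> countPerKey(List<String> words, int parallelism) {
        List<Map<String, Integer>> tasks = new ArrayList<>();
        for (int t = 0; t < parallelism; t++) {
            tasks.add(new HashMap<>());
        }
        for (String w : words) {
            int task = Math.floorMod(w.hashCode(), parallelism); // same key -> same task
            tasks.get(task).merge(w, 1, Integer::sum);
        }
        return tasks;
    }

    public static void main(String[] args) {
        List<String> words = List.of("a", "b", "a", "c", "a");
        System.out.println(Arrays.toString(countPerTask(words, 2))); // [3, 2]
        System.out.println(countPerKey(words, 2));
    }
}
```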
26. Stateful API
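// Running count per word; the count is keyed state that is checkpointed
words.keyBy(x => x).mapWithState {
  (word, count: Option[Int]) =>
    val newCount = count.getOrElse(0) + 1 // count is None the first time a word is seen
    val output = (word, newCount)
    (output, Some(newCount))              // emit output, store newCount as the new state
}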
27. Local state example (Java)
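import org.apache.flink.api.common.state.OperatorState;
import org.apache.flink.streaming.api.functions.source.RichParallelSourceFunction;

public class MySource extends RichParallelSourceFunction<Long> {
  // Omitted details (e.g. how `offset` is obtained from the runtime context)
  private OperatorState<Long> offset;
  private volatile boolean isRunning;

  @Override
  public void run(SourceContext<Long> ctx) throws Exception {
    // Updates made while holding the checkpoint lock cannot interleave with a checkpoint
    Object checkpointLock = ctx.getCheckpointLock();
    isRunning = true;
    while (isRunning) {
      synchronized (checkpointLock) {
        offset.update(offset.value() + 1);
        //ctx.collect(next);
      }
    }
  }

  @Override
  public void cancel() {
    isRunning = false;
  }
}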
28. Statefulness in 0.10
• Internal operators are checkpointed
  - Aggregations
  - Window operators
  - …
• KeyValue state
  - Eases common access patterns
• Flexible state backend interface
• Removes non-partitioned operator state
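Key/value state and a pluggable backend fit together roughly as follows. In this plain-Java sketch, every name (`StateBackendSketch`, `HeapBackendSketch`, `KvStateSketch`, `KvStateDemo`) is hypothetical and is not Flink's interface: user code only calls `value()` and `update()`, the runtime sets the current key before each record, and the backend that actually stores the per-key values can be swapped without touching user code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical backend interface (not Flink's): stores one value per key
interface StateBackendSketch<V> {
    V get(String key);
    void put(String key, V value);
    Map<String, V> snapshot(); // copy of all state, e.g. for a checkpoint
}

// One possible backend: everything on the JVM heap
final class HeapBackendSketch<V> implements StateBackendSketch<V> {
    private final Map<String, V> table = new HashMap<>();
    public V get(String key) { return table.get(key); }
    public void put(String key, V value) { table.put(key, value); }
    public Map<String, V> snapshot() { return new HashMap<>(table); }
}

// Hypothetical key/value state handle: the runtime sets the key, user code reads/writes
final class KvStateSketch<V> {
    private final StateBackendSketch<V> backend;
    private final V defaultValue;
    private String currentKey;

    KvStateSketch(StateBackendSketch<V> backend, V defaultValue) {
        this.backend = backend;
        this.defaultValue = defaultValue;
    }

    void setCurrentKey(String key) { currentKey = key; }

    V value() {
        V v = backend.get(currentKey);
        return v == null ? defaultValue : v; // default for keys not seen yet
    }

    void update(V v) { backend.put(currentKey, v); }
}

final class KvStateDemo {
    // Word count: user code never mentions the key; the backend is passed in
    static Map<String, Integer> countWords(StateBackendSketch<Integer> backend, String... words) {
        KvStateSketch<Integer> count = new KvStateSketch<>(backend, 0);
        for (String w : words) {
            count.setCurrentKey(w);          // done by the "runtime" per record
            count.update(count.value() + 1); // user code
        }
        return backend.snapshot();
    }

    public static void main(String[] args) {
        System.out.println(countWords(new HeapBackendSketch<>(), "a", "b", "a")); // {a=2, b=1}
    }
}
```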
33. Summary - Streaming in Flink 0.10
• Operational readiness
  - High Availability
  - Monitoring
  - Integration with other systems
• First-class support for event-time
• Hardened statefulness support
• Redefined API
34. Thanks for the slides
• Material borrowed from:
  - flink.apache.org
  - Stephan Ewen
  - Aljoscha Krettek
  - Gyula Fóra
Editor's Notes
Slack is the amount of time by which elements arrive late.
When catching up, for example by reading a backlog of elements from Kafka, you still want correct windows based on the timestamps in the elements.