Flink 
internals 
Kostas Tzoumas 
Flink committer & 
Co-founder, data Artisans 
ktzoumas@apache.org 
@kostas_tzoumas
Welcome 
 Last talk: how to program PageRank in Flink, 
and Flink programming model 
 This talk: how Flink works internally 
 Again, a big bravo to the Flink community 
2
Recap: 
Using Flink 
3
DataSet and transformations 
[Diagram: Input → Operator X → First → Operator Y → Second]
ExecutionEnvironment env =
ExecutionEnvironment.getExecutionEnvironment();
DataSet<String> input = env.readTextFile(inputPath);
DataSet<String> first = input
.filter (str -> str.contains("Apache Flink"));
DataSet<String> second = first
.filter (str -> str.length() > 40);
second.print();
env.execute();
4
Available transformations 
 map 
 flatMap 
 filter 
 reduce 
 reduceGroup 
 join 
 coGroup 
 aggregate 
 cross 
 project 
 distinct 
 union 
 iterate 
 iterateDelta 
 repartition 
 … 
5
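For example, the set-style transformations compose like any other (a minimal sketch with made-up inputs):

DataSet<String> a = env.fromElements("flink", "spark");
DataSet<String> b = env.fromElements("flink", "tez");
a.union(b).distinct().print(); // flink, spark, tez (in some order)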
Other API elements & tools 
 Accumulators and counters 
• Int, Long, Double counters 
• Histogram accumulator 
• Define your own 
 Broadcast variables 
 Plan visualization 
 Local debugging/testing mode 
6
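A sketch of how counters and broadcast variables combine in a rich function (GrepFilter and the "terms" name are illustrative, not from the talk):

import java.util.List;
import org.apache.flink.api.common.accumulators.IntCounter;
import org.apache.flink.api.common.functions.RichFilterFunction;
import org.apache.flink.configuration.Configuration;

public class GrepFilter extends RichFilterFunction<String> {
    private final IntCounter rejected = new IntCounter();
    private List<String> terms;

    @Override
    public void open(Configuration parameters) {
        // register the counter; its value is reported with the job result
        getRuntimeContext().addAccumulator("rejected-lines", rejected);
        // read the small data set that was broadcast to every parallel task
        terms = getRuntimeContext().getBroadcastVariable("terms");
    }

    @Override
    public boolean filter(String line) {
        for (String t : terms) {
            if (line.contains(t)) return true;
        }
        rejected.add(1); // count lines that match no term
        return false;
    }
}

// usage: lines.filter(new GrepFilter()).withBroadcastSet(termsDataSet, "terms");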
Data types and grouping 
public static class Access { 
public int userId; 
public String url; 
... 
} 
public static class User { 
public int userId; 
public int region; 
public Date customerSince; 
... 
} 
DataSet<Tuple2<Access,User>> campaign = access.join(users)
.where("userId").equalTo("userId");
DataSet<Tuple3<Integer,String,String>> someLog;
someLog.groupBy(0,1).reduceGroup(...);
 Bean-style Java classes & field names 
 Tuples and position addressing 
 Any data type with key selector function 
7
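The key selector variant works for any data type (a minimal sketch; Access is the class above):

access.groupBy(new KeySelector<Access, Integer>() {
    public Integer getKey(Access a) { return a.userId; }
}).reduceGroup(...);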
Other API elements 
 Hadoop compatibility 
• Supports all Hadoop data types, input/output 
formats, Hadoop mappers and reducers 
 Data streaming API 
• DataStream instead of DataSet 
• Similar set of operators 
• Currently in alpha but moving very fast 
 Scala and Java APIs (mirrored) 
 Graph API (Spargel) 
8
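The streaming flavor of the earlier grep example looks roughly like this (a sketch only – the API is in alpha, so exact signatures may differ):

StreamExecutionEnvironment env =
    StreamExecutionEnvironment.getExecutionEnvironment();
DataStream<String> lines = env.readTextFile(inputPath);
lines.filter(str -> str.contains("Apache Flink")).print();
env.execute();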
Intro to 
internals 
9
DataSet<String> text = env.readTextFile(input);
DataSet<Tuple2<String, Integer>> result = text
.flatMap((str, out) -> {
for (String token : str.split("\\W")) {
out.collect(new Tuple2<>(token, 1));
}
})
.groupBy(0)
.aggregate(SUM, 1);
[Diagram: the Flink Client & Optimizer submits the job to the Job Manager,
which coordinates the Task Managers. Word-count example: "O Romeo, Romeo,
wherefore art thou Romeo?" → (O, 1), (Romeo, 3), (wherefore, 1), (art, 1),
(thou, 1); "Nor arm, nor face, nor any other part" → (nor, 3), (arm, 1),
(face, 1), (any, 1), (other, 1), (part, 1)]
Apache Flink
10
If you want to know one
thing about Flink, it is that
you don’t need to know
the internals of Flink.
11
Philosophy 
 Flink “hides” its internal workings from the 
user 
 This is good 
• User does not worry about how jobs are executed 
• Internals can be changed without breaking
user programs
 … and bad 
• Execution model more complicated to explain 
compared to MapReduce or Spark RDD 
12
Recap: DataSet 
[Diagram: Input → Operator X → First → Operator Y → Second]
13 
ExecutionEnvironment env =
ExecutionEnvironment.getExecutionEnvironment();
DataSet<String> input = env.readTextFile(inputPath);
DataSet<String> first = input
.filter (str -> str.contains("Apache Flink"));
DataSet<String> second = first
.filter (str -> str.length() > 40);
second.print();
env.execute();
Common misconception 
[Diagram: Input → Operator X → First → Operator Y → Second]
 Programs are not executed eagerly 
 Instead, system compiles program to an 
execution plan and executes that plan 
14
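Concretely, nothing touches the data until execute() is called (a sketch; inputPath and outputPath are assumed):

DataSet<String> data = env.readTextFile(inputPath); // builds plan only
DataSet<String> second = data.filter(str -> str.length() > 40); // still no I/O
second.writeAsText(outputPath); // adds a sink to the plan
env.execute(); // only now is the plan compiled, optimized, and run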
DataSet<String> 
 Think of it as a PCollection<String>, or a 
Spark RDD[String] 
 With a major difference: it can be 
produced/recovered in several ways 
• … like a Java collection 
• … like an RDD 
• … perhaps it is never fully materialized (because
the program does not need it to be)
• … implicitly updated in an iteration 
 And this is transparent to the user 
15
Example: grep 
Romeo, 
Romeo, 
where art 
thou Romeo? 
Load Log 
Search 
for str1 
Search 
for str2 
Search
for str3
Grep 1
Grep 2
Grep 3
16
Staged (batch) execution 
Romeo, 
Romeo, 
where art 
thou Romeo? 
Load Log 
Search 
for str1 
Search 
for str2 
Search
for str3
Grep 1
Grep 2
Grep 3
Stage 1:
Create/cache Log
Subsequent stages:
Grep log for matches 
Caching in-memory 
and disk if needed 
17
Pipelined execution 
Romeo, 
Romeo, 
where art 
thou Romeo? 
Load Log 
Search 
for str1 
Search 
for str2 
Search
for str3
Grep 1
Grep 2
Grep 3
00110011 
Stage 1: 
Deploy and start operators 
Data transfer in-memory
and on disk if needed
Note: the Log DataSet is
never “created”!
18
Benefits of pipelining 
 25 node cluster 
 Grep log for 3 
terms 
 Scale data size 
from 100GB to 
1TB 
[Chart: time to complete grep (sec, y-axis 0–2500) vs. data size (GB, x-axis
0–1000); an annotation marks the point where the data size exceeds cluster
memory]
19
20
Drawbacks of pipelining 
 Long pipelines may be active at the same time leading to 
memory fragmentation 
• FLINK-1101: Changes memory allocation from static to adaptive 
 Fault-tolerance harder to get right 
• FLINK-986: Adds intermediate data sets (similar to RDDs) as
first-class citizens to the Flink runtime. Will lead to fine-grained
fault-tolerance among other features.
21
Example: Iterative processing 
DataSet<Page> pages = ...
DataSet<Neighborhood> edges = ...
DataSet<Page> oldRanks = pages; DataSet<Page> newRanks;
for (int i = 0; i < maxIterations; i++) {
newRanks = update(oldRanks, edges);
oldRanks = newRanks;
}
DataSet<Page> result = newRanks;
DataSet<Page> update (DataSet<Page> ranks, DataSet<Neighborhood> adjacency) {
return ranks
.join(adjacency)
.where("id").equalTo("id")
.with ( (page, adj, out) -> {
for (long n : adj.neighbors)
out.collect(new Page(n, df * page.rank / adj.neighbors.length));
})
.groupBy("id")
.reduce ( (a, b) -> new Page(a.id, a.rank + b.rank) );
}
22
Iterate by unrolling 
Client 
Step Step Step Step Step 
 for/while loop in client submits one job per iteration 
step 
 Data reuse by caching in memory and/or disk 
23
Iterate natively 
DataSet<Page> pages = ... 
DataSet<Neighborhood> edges = ... 
IterativeDataSet<Page> pagesIter = pages.iterate(maxIterations); 
DataSet<Page> newRanks = update (pagesIter, edges); 
DataSet<Page> result = pagesIter.closeWith(newRanks);
24 
[Diagram: each iteration’s step function takes the current partial solution
plus other data sets X and Y and produces a new partial solution, which
replaces the old one; the initial solution feeds in and the final iteration
result comes out]
Iterate natively with deltas 
[Diagram: each iteration consumes the current workset and partial solution,
joins in other data sets X and Y (operators A, B), and produces a delta set
plus a new workset; deltas are merged into the partial solution. The initial
solution and initial workset feed in and the merged result comes out]
DeltaIteration<...> pagesIter = pages.iterateDelta(initialDeltas, maxIterations, 0);
DataSet<...> newRanks = update (pagesIter, edges);
DataSet<...> deltas = ...
DataSet<...> result = pagesIter.closeWith(newRanks, deltas);
See http://data-artisans.com/data-analysis-with-flink.html 25
Native, unrolling, and delta 
26
Dissecting 
Flink 
27
28
Flink stack 
[Stack diagram, top to bottom:]
APIs: Scala API (batch), Java API (batch), Java API (streaming),
Python API (upcoming), Graph API, Apache MRQL
Common API
Flink Optimizer / Flink Stream Builder
Flink Execution Engine
Runtimes: embedded execution (Java collections), local execution,
Flink Runtime on YARN / EC2, Apache Tez
Data storage: Files, HDFS, S3, JDBC, Kafka, Rabbit MQ, Redis, Azure
tables, …
29
Flink stack 
30 
[Stack diagram, top to bottom:]
APIs: Scala API (batch), Java API (batch), Java API (streaming),
Python API (upcoming), Graph API, Apache MRQL
Common API
Flink Optimizer / Flink Stream Builder
Flink Local Runtime
Environments: embedded environment (Java collections), local environment
(for debugging), remote environment (regular cluster execution)
Execution: single node execution, Flink cluster, YARN, Apache Tez
Data storage: Files, HDFS, S3, JDBC, Kafka, Rabbit MQ, Redis, Azure
tables, …
Program lifecycle 
31 
val source1 = … 
val source2 = … 
val maxed = source1
.map(v => (v._1, v._2,
math.max(v._1, v._2)))
val filtered = source2
.filter(v => (v._1 > 4))
val result = maxed
.join(filtered).where(0).equalTo(0)
.filter(_._1 > 3)
.groupBy(0)
.reduceGroup {……}
1
 The optimizer is the 
component that selects 
an execution plan for a 
Common API program 
 Think of an AI system
manipulating your
program for you
 But don’t be scared – it
works
• Relational databases have 
been doing this for 
decades – Flink ports the 
technology to API-based 
systems 
Flink Optimizer 
32
A simple program 
33 
DataSet<Tuple5<Integer, String, String, String, Integer>> orders = … 
DataSet<Tuple2<Integer, Double>> lineitems = … 
DataSet<Tuple2<Integer, Integer>> filteredOrders = orders 
.filter(...)
.project(0,4).types(Integer.class, Integer.class); 
DataSet<Tuple3<Integer, Integer, Double>> lineitemsOfOrders = filteredOrders 
.join(lineitems) 
.where(0).equalTo(0) 
.projectFirst(0,1).projectSecond(1) 
.types(Integer.class, Integer.class, Double.class); 
DataSet<Tuple3<Integer, Integer, Double>> priceSums = lineitemsOfOrders 
.groupBy(0,1).aggregate(Aggregations.SUM, 2); 
priceSums.writeAsCsv(outputPath);
Two execution plans 
34 
[Plan A – broadcast: DataSource(orders.tbl) → Filter → Map, broadcast to the
buildHT side of the Join (Hybrid Hash); DataSource(lineitem.tbl) forwarded to
the probe side; then Combine → GroupRed (sort)]
[Plan B – partitioned: both sources hash-partitioned on [0] into the Join
(Hybrid Hash, buildHT/probe); the result hash-partitioned on [0,1] into
GroupRed (sort)]
Best plan depends on
relative sizes
of input files
Flink Local Runtime 
35
 Local runtime, not the
distributed execution
engine
 Aka: what happens
inside every parallel
task
Flink runtime operators 
 Sorting and hashing data 
• Necessary for grouping, aggregation, reduce, 
join, cogroup, delta iterations 
 Flink contains tailored implementations of 
hybrid hashing and external sorting in 
Java 
• Scale well with both abundant and restricted 
memory sizes 
36
Internal data representation 
37 
[Diagram: a map task serializes records (“O Romeo, Romeo, wherefore art thou
Romeo?”) into byte pages on its JVM heap; the bytes are shipped over the
network; the reduce task sorts the serialized records locally before
processing ((art, 1), (O, 1), (Romeo, 1), (Romeo, 1))]
How is intermediate data internally represented?
Internal data representation 
 Two options: Java objects or raw bytes 
 Java objects 
• Easier to program 
• Can suffer from GC overhead 
• Hard to de-stage data to disk, may suffer from “out of 
memory exceptions” 
 Raw bytes 
• Harder to program (custom serialization stack, more
involved runtime operators)
• Solves most of memory and GC problems 
• Overhead from object (de)serialization 
 Flink follows the raw byte approach 
38
Memory in Flink 
public class WC {
public String word;
public int count;
}
[Diagram of the JVM heap: the unmanaged heap holds user code objects; the
managed heap is a pool of empty memory pages handed out for sorting, hashing,
and caching; network buffers serve shuffling and broadcasts]
39
Memory in Flink (2) 
 Internal memory management 
• Flink initially allocates 70% of the free heap as byte[] 
segments 
• Internal operators allocate() and release() these 
segments 
 Flink has its own serialization stack 
• All accepted data types serialized to data segments 
 Easy to reason about memory, (almost) no
OutOfMemory errors, reduced pressure on the
GC (smooth performance)
40
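The idea in miniature (a hypothetical sketch – names are illustrative, not Flink’s actual classes): allocate fixed-size pages once, then recycle them.

import java.util.ArrayDeque;
import java.util.Deque;

public class PagePool {
    private final Deque<byte[]> free = new ArrayDeque<>();

    public PagePool(long managedBytes, int pageSize) {
        for (long i = 0; i < managedBytes / pageSize; i++) {
            free.push(new byte[pageSize]); // allocated up front, e.g. 70% of free heap
        }
    }

    public byte[] allocate() {
        if (free.isEmpty()) {
            // a real operator would spill to disk here instead of failing
            throw new IllegalStateException("managed memory exhausted");
        }
        return free.pop();
    }

    public void release(byte[] page) {
        free.push(page); // pages are reused, so the GC sees almost no churn
    }
}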
Operating on serialized data 
Microbenchmark 
 Sorting 1GB worth of (long, double) tuples 
 67,108,864 elements 
 Simple quicksort 
41
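A minimal sketch of the serialized variant of that benchmark (assumed setup, not the original harness): the tuples live back-to-back in one ByteBuffer and the quicksort compares the 8-byte keys in place, creating no objects.

import java.nio.ByteBuffer;

public class SerializedSort {
    static final int REC = 16; // 8-byte long key + 8-byte double payload

    static void quicksort(ByteBuffer buf, int lo, int hi) {
        if (lo >= hi) return;
        long pivot = buf.getLong(((lo + hi) >>> 1) * REC);
        int i = lo, j = hi;
        while (i <= j) {
            while (buf.getLong(i * REC) < pivot) i++;
            while (buf.getLong(j * REC) > pivot) j--;
            if (i <= j) swap(buf, i++, j--);
        }
        quicksort(buf, lo, j);
        quicksort(buf, i, hi);
    }

    static void swap(ByteBuffer buf, int a, int b) { // swap two 16-byte records
        for (int k = 0; k < REC; k += 8) {
            long tmp = buf.getLong(a * REC + k);
            buf.putLong(a * REC + k, buf.getLong(b * REC + k));
            buf.putLong(b * REC + k, tmp);
        }
    }
}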
Flink Execution Engine 
42 
 The distributed 
execution engine 
 Pipelined 
• Same engine for Flink 
and Flink streaming 
 Pluggable 
• Local runtime can be 
executed on other 
engines 
• E.g., Java collections 
and Apache Tez
Closing 
43
Summary 
 Flink decouples API from execution 
• Same program can be executed in many different 
ways 
• Hopefully users do not need to care about this and 
still get very good performance 
 Unique Flink internal features 
• Pipelined execution, native iterations, optimizer, 
serialized data manipulation, good disk destaging 
 Very good performance 
• Known issues currently worked on actively 
44
Stay informed 
 flink.incubator.apache.org 
• Subscribe to the mailing lists! 
• http://flink.incubator.apache.org/community.html#mailing-lists 
 Blogs 
• flink.incubator.apache.org/blog 
• data-artisans.com/blog 
 Twitter 
• follow @ApacheFlink 
45
46
That’s it, time for beer 
47
Appendix 
48
Flink in context 
49
[Ecosystem diagram, by layer:]
Applications: Hive, Mahout, Pig, Cascading, …
Data processing engines: MapReduce, Flink, Spark, Storm, Tez
App and resource management: Yarn, Mesos
Storage, streams: HDFS, HBase, Kafka, …
Common API 
 Notion of “DataSet” is no 
longer present 
 Program is a DAG of 
operators 
DataSource 
DataSource 
MapOperator 
FilterOperator 
JoinOperator 
DataSink 
Operator 
50
Example: Joins in Flink 
DataSet<Order> large = ...
DataSet<Lineitem> medium = ...
DataSet<Customer> small = ...
DataSet<Tuple...> joined1 = large.join(medium).where(3).equalTo(1)
.with(new JoinFunction() { ... });
DataSet<Tuple...> joined2 = small.join(joined1).where(0).equalTo(2)
.with(new JoinFunction() { ... });
DataSet<Tuple...> result = joined2.groupBy(3).aggregate(MAX, 2);
[Diagram: join tree – γ over (small ⋈ (large ⋈ medium))]
51
Built-in strategies include partitioned join and replicated join with
local sort-merge or hybrid-hash algorithms.
Optimizer Example 
DataSet<Tuple...> large = env.readCsv(...); 
DataSet<Tuple...> medium = env.readCsv(...); 
DataSet<Tuple...> small = env.readCsv(...); 
DataSet<Tuple...> joined1 = large.join(medium).where(3).equalTo(1)
.with(new JoinFunction() { ... });
DataSet<Tuple...> joined2 = small.join(joined1).where(0).equalTo(2)
.with(new JoinFunction() { ... });
DataSet<Tuple...> result = joined2.groupBy(3).aggregate(MAX, 2); 
52 
1) Partitioned hash-join 
2) Broadcast hash-join 
3) Grouping/aggregation reuses the partitioning
from step (1) → no shuffle!
Partitioned ≈ Reduce-side 
Broadcast ≈ Map-side
Operating on serialized data 
53
[Comparison slide; the three system logos did not survive extraction. From
context:]
Hadoop MapReduce: serializes data every time; highly robust, never gives up
on you
Spark: works on objects; RDDs may be stored serialized; serialization
considered slow, used only when needed
Flink: makes serialization really cheap – partial deserialization, operates
on serialized form; efficient and robust!