Scotty: Efficient Window Aggregation with
General Stream Slicing
Berlin, October 7-9, 2019
Philipp M. Grulich
Research Associate (TU Berlin)
Jonas Traub
Research Associate (TU Berlin)
Jonas Traub (TU Berlin), Philipp M. Grulich (TU Berlin) - Efficient Window Aggregation with Stream Slicing
Aggregations in Stream Processing Pipelines
A stream processing pipeline is a series of concurrently running operators.
[Figure: stream processing pipeline with a Window Aggregation operator]
Motivation
Research Background
Cutty: Aggregate Sharing for User-Defined Windows
P. Carbone, J. Traub, A. Katsifodimos, S. Haridi, V. Markl
ACM International Conference on Information and Knowledge Management (CIKM 2016)
Scotty: Efficient Window Aggregation for out-of-order Stream Processing
J. Traub, P. M. Grulich, A. R. Cuéllar, S. Breß, A. Katsifodimos, T. Rabl, V. Markl
IEEE International Conference on Data Engineering (ICDE 2018)
Efficient Window Aggregation with General Stream Slicing
J. Traub, P. M. Grulich, A. R. Cuéllar, S. Breß, A. Katsifodimos, T. Rabl, V. Markl
International Conference on Extending Database Technology (EDBT 2019; Best Paper Award)
Scotty Window Processor:
Efficient Window Aggregations for Flink, Beam, and Storm
https://github.com/TU-Berlin-DIMA/scotty-window-processor
Stream Slicing Example
The number of slices depends on the workload.
We store partial aggregates instead of all tuples. => Small memory footprint.
We assign each tuple to exactly one slice. => O(1) per-tuple complexity.
We require just a few computation steps to calculate final aggregates. => Low latency.
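The three properties above (partial aggregates, one slice per tuple, few steps per final aggregate) can be illustrated with a minimal sketch. It is not Scotty's implementation: the class name `SlicedSum`, the restriction to in-order input, the sum aggregation, and the assumption that the window length is a multiple of the slide step are all choices made for this illustration.

```python
class SlicedSum:
    """Toy stream slicer: sum over sliding time windows (in-order input only)."""

    def __init__(self, window_length, slide):
        # Assumption of this sketch: slices are exactly one slide step wide.
        assert window_length % slide == 0
        self.length = window_length
        self.slide = slide
        self.slices = {}  # slice start time -> partial sum

    def add(self, timestamp, value):
        # Each tuple updates exactly one partial aggregate: O(1) per tuple.
        start = (timestamp // self.slide) * self.slide
        self.slices[start] = self.slices.get(start, 0) + value

    def window_sum(self, window_end):
        # Combine the few slices covering [window_end - length, window_end).
        window_start = window_end - self.length
        return sum(partial for start, partial in self.slices.items()
                   if window_start <= start < window_end)


slicer = SlicedSum(window_length=10, slide=5)
for ts, v in [(1, 2), (3, 1), (6, 4), (12, 3), (14, 1)]:
    slicer.add(ts, v)

first_window = slicer.window_sum(10)   # slices starting at 0 and 5
second_window = slicer.window_sum(15)  # slices starting at 5 and 10
```

Only three partial sums are stored for five tuples, and the slice starting at time 5 is shared by both overlapping windows.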
General Stream Slicing
Workload Characteristics:
● Window Types
○ Context Free
○ Forward Context Free
○ Forward Context Aware
● Stream Order
○ in-order
○ out-of-order
● Window Measures
○ time
○ tuple count
○ arbitrary
● Aggregation Functions
○ distributive, algebraic, holistic
○ associativity, commutativity, invertibility
General Stream Slicing combines generality and efficiency in a single solution.
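The aggregation function classes listed above determine what a slice has to store. The following sketch illustrates the usual textbook reading of these terms with sum (distributive), average (algebraic), and median (holistic); the two example slices and all variable names are assumptions made for this illustration, not taken from the talk.

```python
from statistics import median

slice_a = [1, 2, 1, 4, 3]
slice_b = [1, 5, 2, 2, 3]

# Distributive (sum): the partial aggregate has the same form as the final result.
sum_partials = [sum(slice_a), sum(slice_b)]
total = sum(sum_partials)

# Algebraic (average): a fixed-size partial (sum, count) per slice is enough.
avg_partials = [(sum(s), len(s)) for s in (slice_a, slice_b)]
average = sum(p[0] for p in avg_partials) / sum(p[1] for p in avg_partials)

# Holistic (median): no fixed-size partial exists, so slices must keep their tuples.
med = median(slice_a + slice_b)

# Invertibility: a value can be removed from a sum without recomputation.
# A function such as max offers no such inverse.
total_without_first = total - slice_a[0]
```

For distributive and algebraic functions a slice can discard its tuples once the partial aggregate is updated; for holistic functions it cannot.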
Impact of Workload Characteristics
Count-based tumbling window with a length of 5 tuples.
Values:      1 2 1 4 3 1 5 2 2 3 6 1 2 2 1
Tuple Count: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Event Time:  5 12 13 20 35 37 42 46 48 51 52 57 63 64 65
Window aggregates (sum): 11 13 12
What if the stream is out-of-order?
Out-of-order Tuple: value 5, event time 49
Updated window aggregates: 11, 13 + 5 - 3, 12 + 3 - 1
What if the aggregation function is not invertible?
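The out-of-order example above can be replayed with a short sketch. It assumes the aggregation is a sum, which matches the aggregates 11, 13, 12 on the slide, and repairs the affected windows incrementally by adding the tuple that enters each window and subtracting the one that is pushed out. The function names are hypothetical and the code is an illustration, not Scotty's algorithm.

```python
import bisect

WINDOW = 5  # count-based tumbling window length (tuples)

values = [1, 2, 1, 4, 3, 1, 5, 2, 2, 3, 6, 1, 2, 2, 1]
times = [5, 12, 13, 20, 35, 37, 42, 46, 48, 51, 52, 57, 63, 64, 65]


def window_sums(vals):
    """Sum of each tumbling window, including a trailing incomplete window."""
    return [sum(vals[i:i + WINDOW]) for i in range(0, len(vals), WINDOW)]


def insert_out_of_order(vals, ts, sums, value, time):
    """Insert a late tuple and repair the affected sums using invertibility."""
    pos = bisect.bisect_right(ts, time)  # position in event-time order
    vals.insert(pos, value)
    ts.insert(pos, time)
    w = pos // WINDOW
    carry = value                        # value entering window w
    while True:
        last = (w + 1) * WINDOW          # index of the tuple pushed out of window w
        if w == len(sums):
            sums.append(carry)           # the carry opens a new window
            break
        if last >= len(vals):
            sums[w] += carry             # window not full yet, nothing is pushed out
            break
        shifted = vals[last]
        sums[w] += carry - shifted       # add the newcomer, invert the evicted tuple
        carry = shifted
        w += 1
    return sums


sums = window_sums(values)               # in-order aggregates
sums = insert_out_of_order(values, times, sums, value=5, time=49)
```

The late tuple moves the tuple with event time 51 into the third window and the last tuple into a new, still incomplete window. With a non-invertible function such as max, the subtraction step is unavailable and the affected windows would have to be recomputed from stored tuples.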
Scotty Window Processor:
Efficient Window Aggregations
for Flink, Beam, and Storm
https://github.com/TU-Berlin-DIMA/scotty-window-processor
Key-Facts
Features:
● One window operator for many systems.
● High performance window aggregations with stream slicing.
● Scales to thousands of concurrent windows.
● Aggregate sharing among multiple window queries.
● Adapts to workload characteristics:
○ Window Types
○ Aggregation Functions
○ Window Measures
○ Stream Order
Connectors:
…more coming soon…
Scotty Core
Scotty adapts to workload characteristics
and combines generality and efficiency in a single solution.
Benchmark
Concurrent Windows with Built-in Window Operator:
● Flink performs well with a single window (no overlap; one bucket at a time).
● With overlapping concurrent windows, the throughput drops drastically.
[Figure: throughput (tuples/sec., 0 to 2,500,000) over the number of concurrent windows (1, 10, 20, 50, 100, 500, 1000) for Flink, Storm, and Flink on Beam]
Benchmark
Concurrent Windows with Scotty:
● With Scotty, the throughput is independent of the number of concurrent windows.
[Figure: throughput (tuples/sec., 0 to 2,500,000) over the number of concurrent windows (1, 10, 20, 50, 100, 500, 1000) for Flink+Scotty, Storm+Scotty, and Beam+Flink+Scotty]
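One way to see why the number of concurrent windows matters is to count aggregate updates per tuple. The sketch below is a simplified cost model, not a reproduction of the benchmark: it assumes an in-order stream, tumbling time windows of different lengths as the concurrent queries, and a sum aggregation. All function names are hypothetical.

```python
from collections import defaultdict


def run_buckets(stream, lengths):
    """One bucket per (query, window): each tuple updates every concurrent query."""
    buckets, updates = defaultdict(int), 0
    for ts, value in stream:
        for length in lengths:
            buckets[(length, ts // length)] += value
            updates += 1
    return buckets, updates


def run_slices(stream, lengths):
    """Shared slices: each tuple updates exactly one partial aggregate."""
    slices, updates = {}, 0            # slice start -> partial sum
    start, end = 0, min(lengths)
    for ts, value in stream:           # in-order stream assumed
        while ts >= end:               # advance to the slice containing ts
            start = end
            end = min((start // l + 1) * l for l in lengths)
        slices[start] = slices.get(start, 0) + value
        updates += 1
    return slices, updates


def window_sum(slices, length, index):
    """Final aggregate of one window, combined from the slices it covers."""
    lo, hi = index * length, (index + 1) * length
    return sum(p for s, p in slices.items() if lo <= s < hi)


stream = [(t, 1) for t in range(60)]
lengths = [2, 3, 5]

buckets, bucket_ops = run_buckets(stream, lengths)
slices, slice_ops = run_slices(stream, lengths)
same_results = all(window_sum(slices, l, i) == v for (l, i), v in buckets.items())
```

In this model the bucket approach performs one update per tuple and query, while slicing performs one update per tuple regardless of how many queries share the slices. The final combination step here scans all slices for simplicity; a real implementation would index them.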
Using Scotty on Flink
1. Clone Scotty and install it with Maven.
2. Add Scotty to your Flink project.
Using Scotty on Flink
1. Initialize Scotty Window Operator
2. Add Window Definitions
3. Add Scotty to your Flink Job
Acknowledgements: This talk is supported by the Berlin Big Data Center (01IS14013A), the Berlin Center for Machine Learning (01IS18037A), and Software Campus (1-3000473-18TP).
Scotty Window Processor
Scotty Features:
● One window operator for many systems.
● High performance with stream slicing.
● Scales to thousands of concurrent windows.
● Aggregate sharing among multiple window queries.
● Adapts to workload characteristics
Open Source Repository: tu-berlin-dima.github.io/scotty-window-processor
23

Más contenido relacionado

Similar a Scotty: Efficient Window Aggregation with General Stream Slicing - jonas Traub & Philipp Grulich, TU Berlin

Replacing Academic Journals
Replacing Academic JournalsReplacing Academic Journals
Replacing Academic JournalsBjörn Brembs
 
Costs of the French PWR
Costs of the French PWRCosts of the French PWR
Costs of the French PWRmyatom
 
Efficient Window Aggregation with General Stream Slicing (EDBT 2019, Best Paper)
Efficient Window Aggregation with General Stream Slicing (EDBT 2019, Best Paper)Efficient Window Aggregation with General Stream Slicing (EDBT 2019, Best Paper)
Efficient Window Aggregation with General Stream Slicing (EDBT 2019, Best Paper)Jonas Traub
 
Sps Conference Essen 2009 Wi Lettenmaier
Sps Conference Essen 2009 Wi LettenmaierSps Conference Essen 2009 Wi Lettenmaier
Sps Conference Essen 2009 Wi LettenmaierCSCP
 
OPAL-RT RT13: Real time simulation of distribution grids
OPAL-RT RT13: Real time simulation of distribution gridsOPAL-RT RT13: Real time simulation of distribution grids
OPAL-RT RT13: Real time simulation of distribution gridsOPAL-RT TECHNOLOGIES
 
From Cloud to Fog: the Tao of IT Infrastructure Decentralization
From Cloud to Fog: the Tao of IT Infrastructure DecentralizationFrom Cloud to Fog: the Tao of IT Infrastructure Decentralization
From Cloud to Fog: the Tao of IT Infrastructure DecentralizationFogGuru MSCA Project
 
Gridforum Juergen Knobloch Grids For Science 20080402
Gridforum Juergen Knobloch Grids For Science 20080402Gridforum Juergen Knobloch Grids For Science 20080402
Gridforum Juergen Knobloch Grids For Science 20080402vrij
 

Similar a Scotty: Efficient Window Aggregation with General Stream Slicing - jonas Traub & Philipp Grulich, TU Berlin (8)

Replacing Academic Journals
Replacing Academic JournalsReplacing Academic Journals
Replacing Academic Journals
 
Costs of the French PWR
Costs of the French PWRCosts of the French PWR
Costs of the French PWR
 
Efficient Window Aggregation with General Stream Slicing (EDBT 2019, Best Paper)
Efficient Window Aggregation with General Stream Slicing (EDBT 2019, Best Paper)Efficient Window Aggregation with General Stream Slicing (EDBT 2019, Best Paper)
Efficient Window Aggregation with General Stream Slicing (EDBT 2019, Best Paper)
 
Sps Conference Essen 2009 Wi Lettenmaier
Sps Conference Essen 2009 Wi LettenmaierSps Conference Essen 2009 Wi Lettenmaier
Sps Conference Essen 2009 Wi Lettenmaier
 
Evacuation Modelling in New Zealand the Result of An Online Survey_Crimson Pu...
Evacuation Modelling in New Zealand the Result of An Online Survey_Crimson Pu...Evacuation Modelling in New Zealand the Result of An Online Survey_Crimson Pu...
Evacuation Modelling in New Zealand the Result of An Online Survey_Crimson Pu...
 
OPAL-RT RT13: Real time simulation of distribution grids
OPAL-RT RT13: Real time simulation of distribution gridsOPAL-RT RT13: Real time simulation of distribution grids
OPAL-RT RT13: Real time simulation of distribution grids
 
From Cloud to Fog: the Tao of IT Infrastructure Decentralization
From Cloud to Fog: the Tao of IT Infrastructure DecentralizationFrom Cloud to Fog: the Tao of IT Infrastructure Decentralization
From Cloud to Fog: the Tao of IT Infrastructure Decentralization
 
Gridforum Juergen Knobloch Grids For Science 20080402
Gridforum Juergen Knobloch Grids For Science 20080402Gridforum Juergen Knobloch Grids For Science 20080402
Gridforum Juergen Knobloch Grids For Science 20080402
 

Más de Flink Forward

Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...Flink Forward
 
Evening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkEvening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkFlink Forward
 
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...Flink Forward
 
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...Flink Forward
 
Introducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes OperatorIntroducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes OperatorFlink Forward
 
Autoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive ModeAutoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive ModeFlink Forward
 
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...Flink Forward
 
One sink to rule them all: Introducing the new Async Sink
One sink to rule them all: Introducing the new Async SinkOne sink to rule them all: Introducing the new Async Sink
One sink to rule them all: Introducing the new Async SinkFlink Forward
 
Tuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptxTuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptxFlink Forward
 
Flink powered stream processing platform at Pinterest
Flink powered stream processing platform at PinterestFlink powered stream processing platform at Pinterest
Flink powered stream processing platform at PinterestFlink Forward
 
Apache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native EraApache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native EraFlink Forward
 
Where is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in FlinkWhere is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in FlinkFlink Forward
 
Using the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production DeploymentUsing the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production DeploymentFlink Forward
 
The Current State of Table API in 2022
The Current State of Table API in 2022The Current State of Table API in 2022
The Current State of Table API in 2022Flink Forward
 
Flink SQL on Pulsar made easy
Flink SQL on Pulsar made easyFlink SQL on Pulsar made easy
Flink SQL on Pulsar made easyFlink Forward
 
Dynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data AlertsDynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data AlertsFlink Forward
 
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and PinotExactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and PinotFlink Forward
 
Processing Semantically-Ordered Streams in Financial Services
Processing Semantically-Ordered Streams in Financial ServicesProcessing Semantically-Ordered Streams in Financial Services
Processing Semantically-Ordered Streams in Financial ServicesFlink Forward
 
Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...Flink Forward
 
Batch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & IcebergBatch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & IcebergFlink Forward
 

Más de Flink Forward (20)

Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...Building a fully managed stream processing platform on Flink at scale for Lin...
Building a fully managed stream processing platform on Flink at scale for Lin...
 
Evening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in FlinkEvening out the uneven: dealing with skew in Flink
Evening out the uneven: dealing with skew in Flink
 
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
“Alexa, be quiet!”: End-to-end near-real time model building and evaluation i...
 
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
Introducing BinarySortedMultiMap - A new Flink state primitive to boost your ...
 
Introducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes OperatorIntroducing the Apache Flink Kubernetes Operator
Introducing the Apache Flink Kubernetes Operator
 
Autoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive ModeAutoscaling Flink with Reactive Mode
Autoscaling Flink with Reactive Mode
 
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
Dynamically Scaling Data Streams across Multiple Kafka Clusters with Zero Fli...
 
One sink to rule them all: Introducing the new Async Sink
One sink to rule them all: Introducing the new Async SinkOne sink to rule them all: Introducing the new Async Sink
One sink to rule them all: Introducing the new Async Sink
 
Tuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptxTuning Apache Kafka Connectors for Flink.pptx
Tuning Apache Kafka Connectors for Flink.pptx
 
Flink powered stream processing platform at Pinterest
Flink powered stream processing platform at PinterestFlink powered stream processing platform at Pinterest
Flink powered stream processing platform at Pinterest
 
Apache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native EraApache Flink in the Cloud-Native Era
Apache Flink in the Cloud-Native Era
 
Where is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in FlinkWhere is my bottleneck? Performance troubleshooting in Flink
Where is my bottleneck? Performance troubleshooting in Flink
 
Using the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production DeploymentUsing the New Apache Flink Kubernetes Operator in a Production Deployment
Using the New Apache Flink Kubernetes Operator in a Production Deployment
 
The Current State of Table API in 2022
The Current State of Table API in 2022The Current State of Table API in 2022
The Current State of Table API in 2022
 
Flink SQL on Pulsar made easy
Flink SQL on Pulsar made easyFlink SQL on Pulsar made easy
Flink SQL on Pulsar made easy
 
Dynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data AlertsDynamic Rule-based Real-time Market Data Alerts
Dynamic Rule-based Real-time Market Data Alerts
 
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and PinotExactly-Once Financial Data Processing at Scale with Flink and Pinot
Exactly-Once Financial Data Processing at Scale with Flink and Pinot
 
Processing Semantically-Ordered Streams in Financial Services
Processing Semantically-Ordered Streams in Financial ServicesProcessing Semantically-Ordered Streams in Financial Services
Processing Semantically-Ordered Streams in Financial Services
 
Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...Tame the small files problem and optimize data layout for streaming ingestion...
Tame the small files problem and optimize data layout for streaming ingestion...
 
Batch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & IcebergBatch Processing at Scale with Flink & Iceberg
Batch Processing at Scale with Flink & Iceberg
 

Scotty: Efficient Window Aggregation with General Stream Slicing - Jonas Traub & Philipp Grulich, TU Berlin

  • 1. Scotty: Efficient Window Aggregation with General Stream Slicing. Berlin, October 7-9, 2019. Philipp M. Grulich, Research Associate (TU Berlin); Jonas Traub, Research Associate (TU Berlin).
  • 2-5. Jonas Traub (TU Berlin), Philipp M. Grulich (TU Berlin) - Efficient Window Aggregation with Stream Slicing. Aggregations in Stream Processing Pipelines: A stream processing pipeline is a series of concurrently running operators. [Figure label: Window Aggregation]
  • 6-7. Motivation
  • 8-11. Research Background: ● Cutty: Aggregate Sharing for User-Defined Windows. P. Carbone, J. Traub, A. Katsifodimos, S. Haridi, V. Markl. ACM International Conference on Information and Knowledge Management (CIKM 2016). ● Scotty: Efficient Window Aggregation for out-of-order Stream Processing. J. Traub, P. M. Grulich, A. R. Cuéllar, S. Breß, A. Katsifodimos, T. Rabl, V. Markl. IEEE International Conference on Data Engineering (ICDE 2018). ● Efficient Window Aggregation with General Stream Slicing. J. Traub, P. M. Grulich, A. R. Cuéllar, S. Breß, A. Katsifodimos, T. Rabl, V. Markl. International Conference on Extending Database Technology (EDBT 2019; Best Paper Award). ● Scotty Window Processor: Efficient Window Aggregations for Flink, Beam, and Storm. https://github.com/TU-Berlin-DIMA/scotty-window-processor
  • 13. Stream Slicing Example: The number of slices depends on the workload.
  • 17. Stream Slicing Example: We store partial aggregates instead of all tuples. => Small memory footprint.
  • 18. Stream Slicing Example: We assign each tuple to exactly one slice. => O(1) per-tuple complexity.
  • 20. Stream Slicing Example: We require just a few computation steps to calculate final aggregates. => Low latency.
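
The properties named on the stream slicing slides (partial aggregates instead of tuples, one slice per tuple, few steps per final aggregate) can be illustrated with a minimal Python sketch. This is not Scotty's implementation: the `SlicingSum` class, the fixed slice length, and the window boundaries are assumptions made for illustration, and the sketch assumes in-order tuples with integer timestamps and a sum aggregate.

```python
class SlicingSum:
    """Toy stream slicer for time-based windows over a sum aggregate.

    Hypothetical helper, not part of Scotty. Slices are the
    non-overlapping intervals [k * slice_len, (k + 1) * slice_len).
    """

    def __init__(self, slice_len):
        self.slice_len = slice_len
        self.slices = {}  # slice start -> partial sum

    def add(self, timestamp, value):
        # Each tuple updates exactly one partial aggregate: O(1) per tuple.
        start = (timestamp // self.slice_len) * self.slice_len
        self.slices[start] = self.slices.get(start, 0) + value

    def window_sum(self, start, end):
        # A window [start, end) aligned to slice edges is computed by
        # combining the partial aggregates of the slices it covers.
        assert start % self.slice_len == 0 and end % self.slice_len == 0
        return sum(self.slices.get(s, 0)
                   for s in range(start, end, self.slice_len))


slicer = SlicingSum(slice_len=5)
for ts, value in [(1, 3), (4, 2), (6, 7), (11, 1), (14, 6), (17, 5)]:
    slicer.add(ts, value)

# Two overlapping windows of length 10 share the slice [5, 10).
first = slicer.window_sum(0, 10)
second = slicer.window_sum(5, 15)
stored = len(slicer.slices)  # partial aggregates kept, not tuples
```

In this toy run, six tuples are reduced to four stored partial aggregates, and each window result needs only two additions, regardless of how many tuples fell into the window.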
  • 21-27. General Stream Slicing. Workload Characteristics: ● Window Types: Context Free, Forward Context Free, Forward Context Aware ● Stream Order: in-order, out-of-order ● Window Measures: time, tuple count, arbitrary ● Aggregation Functions: distributive, algebraic, holistic; associativity, commutativity, invertibility. General Stream Slicing combines generality and efficiency in a single solution.
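
To make the aggregation function classes concrete, the following Python sketch shows one distributive, one algebraic, and one holistic function expressed through per-slice partial aggregates. The `lift`/`combine`/`lower`/`invert` naming and the `aggregate` helper are assumptions chosen for illustration and are not taken from the slides or from Scotty's code.

```python
import statistics
from functools import reduce


class Sum:
    """Distributive and invertible: partials are plain numbers."""
    lift = staticmethod(lambda v: v)
    combine = staticmethod(lambda a, b: a + b)
    lower = staticmethod(lambda a: a)
    invert = staticmethod(lambda a, b: a - b)


class Average:
    """Algebraic: a fixed-size partial (sum, count) is sufficient."""
    lift = staticmethod(lambda v: (v, 1))
    combine = staticmethod(lambda a, b: (a[0] + b[0], a[1] + b[1]))
    lower = staticmethod(lambda a: a[0] / a[1])


class Median:
    """Holistic: the partial has to keep every value."""
    lift = staticmethod(lambda v: [v])
    combine = staticmethod(lambda a, b: a + b)
    lower = staticmethod(lambda a: statistics.median(a))


def aggregate(fn, slices):
    """Reduce each slice to a partial, then combine the partials."""
    partials = [reduce(fn.combine, map(fn.lift, s)) for s in slices]
    return fn.lower(reduce(fn.combine, partials))


slices = [[1, 2, 1, 4, 3], [1, 5, 2, 2, 3]]
total = aggregate(Sum, slices)
mean = aggregate(Average, slices)
median = aggregate(Median, slices)
```

The difference between the classes shows up in the size of the partial aggregate: a number for sum, a pair for average, and the full list of values for median, which is why holistic functions benefit least from storing partials.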
  • 28-43. Impact of Workload Characteristics. Example stream with 15 tuples (tuple count 1 to 15). Values: 1 2 1 4 3 1 5 2 2 3 6 1 2 2 1. Event times: 5 12 13 20 35 37 42 46 48 51 52 57 63 64 65. A count-based tumbling window with a length of 5 tuples yields the sums 11, 13, and 12. What if the stream is out-of-order? An out-of-order tuple with value 5 and event time 49 falls between the tuples with event times 48 and 51, so the last tuple of each affected window shifts into the next window: the second window becomes 13 + 5 - 3 = 15 and the third becomes 12 + 3 - 1 = 14. What if the aggregation function is not invertible?
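
The out-of-order example from these slides can be replayed in a few lines of Python. The sketch uses the values and event times shown on the slides; the function names and the incremental repair loop are assumptions made for illustration and do not claim to mirror Scotty's internals.

```python
import bisect

WINDOW_LEN = 5
values = [1, 2, 1, 4, 3, 1, 5, 2, 2, 3, 6, 1, 2, 2, 1]
times = [5, 12, 13, 20, 35, 37, 42, 46, 48, 51, 52, 57, 63, 64, 65]


def window_sums(vals):
    """Sum every complete count-based tumbling window of WINDOW_LEN tuples."""
    full = len(vals) - len(vals) % WINDOW_LEN
    return [sum(vals[i:i + WINDOW_LEN]) for i in range(0, full, WINDOW_LEN)]


def insert_out_of_order(sums, vals, ts_list, ts, value):
    """Insert a late tuple and repair the window sums incrementally.

    The late tuple enters one window and pushes that window's last tuple
    into the next window, and so on. Because sum is invertible, each
    affected window needs only one addition and one subtraction.
    """
    pos = bisect.bisect_right(ts_list, ts)
    sums = list(sums)
    carry = value
    for w in range(pos // WINDOW_LEN, len(sums)):
        last = vals[(w + 1) * WINDOW_LEN - 1]  # tuple shifted out of window w
        sums[w] += carry - last
        carry = last
    new_vals = vals[:pos] + [value] + vals[pos:]
    new_times = ts_list[:pos] + [ts] + ts_list[pos:]
    return sums, new_vals, new_times


before = window_sums(values)
after, new_values, _ = insert_out_of_order(before, values, times, ts=49, value=5)
```

For a function without an inverse, such as max, the subtraction step is not available, so the affected windows would have to be recomputed from stored tuples or from finer-grained partial aggregates.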
  • 44. Scotty Window Processor: Efficient Window Aggregations for Flink, Beam, and Storm. https://github.com/TU-Berlin-DIMA/scotty-window-processor
  • 45-51. Key-Facts. Features: ● One window operator for many systems. ● High-performance window aggregations with stream slicing. ● Scales to thousands of concurrent windows. ● Aggregate sharing among multiple window queries. ● Adapts to workload characteristics: ○ Window Types ○ Aggregation Functions ○ Window Measures ○ Stream Order Connectors: …more coming soon…
  • 52-56. Scotty Core: Scotty adapts to workload characteristics and combines generality and efficiency in a single solution.
  • 57-58. Benchmark: Concurrent Windows with Built-in Window Operator. ● Flink performs well with a single window (no overlap; one bucket at a time). ● With overlapping concurrent windows, the throughput drops drastically. [Plot: throughput (tuples/sec., axis from 0 to 2,500,000) over the number of concurrent windows (1, 10, 20, 50, 100, 500, 1000) for Flink, Storm, and Flink on Beam]
  • 59. Benchmark: Concurrent Windows with Scotty. ● With Scotty, the throughput is independent of the number of concurrent windows. [Plot: throughput (tuples/sec.) over the number of concurrent windows for Flink+Scotty, Storm+Scotty, and Beam+Flink+Scotty]
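
One way to see why throughput drops with overlapping windows is to count aggregate updates per tuple. The Python sketch below is a simplified cost model, not a reproduction of the benchmark: it assumes that a bucket-based operator updates one bucket for every window that contains the tuple, whereas stream slicing updates a single slice. The function names and window parameters are made up for illustration.

```python
def bucket_updates_per_tuple(timestamp, windows):
    """Count the windows (buckets) that contain the tuple.

    `windows` is a list of (length, slide) pairs describing sliding
    windows that start at non-negative multiples of `slide`.
    """
    updates = 0
    for length, slide in windows:
        # Latest window start at or before the tuple's timestamp.
        start = (timestamp // slide) * slide
        while start > timestamp - length and start >= 0:
            updates += 1
            start -= slide
    return updates


def slice_updates_per_tuple(timestamp, windows):
    """With slicing, a tuple updates one partial aggregate in this model."""
    return 1


# One sliding window query: length 100, slide 10.
sliding = [(100, 10)]
# Twenty concurrent tumbling window queries with lengths 1000, 2000, ...
concurrent = [(length, length) for length in range(1000, 21000, 1000)]

sliding_cost = bucket_updates_per_tuple(500, sliding)
concurrent_cost = bucket_updates_per_tuple(12345, concurrent)
slicing_cost = slice_updates_per_tuple(12345, concurrent)
```

Under these assumptions the per-tuple work of the bucket approach grows with the number of windows covering a tuple, while the per-tuple work of slicing stays constant, which is consistent with the trend the two benchmark slides describe.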
  • 60-61. Using Scotty on Flink: 1. Clone Scotty and install to Maven. 2. Add Scotty to your Flink project.
  • 62-64. Using Scotty on Flink: 1. Initialize the Scotty Window Operator. 2. Add window definitions. 3. Add Scotty to your Flink job.
  • 65. Scotty Window Processor. Scotty Features: ● One window operator for many systems. ● High performance with stream slicing. ● Scales to thousands of concurrent windows. ● Aggregate sharing among multiple window queries. ● Adapts to workload characteristics. Acknowledgements: This talk is supported by the Berlin Big Data Center (01IS14013A), the Berlin Center for Machine Learning (01IS18037A), and Software Campus (1-3000473-18TP). Open Source Repository: tu-berlin-dima.github.io/scotty-window-processor