In April 2019, we went on a USA excursion and presented selected publications of the TU Berlin DIMA and the DFKI IAM research groups. This slide set contains the four teaser talks that we presented on the tour:
1) Jonas Traub: Optimized On-Demand Data Streaming from Sensor Nodes
2) Sebastian Breß: Generating Custom Code for Efficient Query Execution on Heterogeneous Processors
3) Martin Kiefer: Estimating Join Selectivities using Bandwidth-Optimized Kernel Density Models
4) Andreas Kunft: BlockJoin: Efficient Matrix Partitioning Through Joins
Database Research at TU Berlin DIMA and DFKI IAM - USA Excursion Slides 2019
1. Database Research at TU Berlin
Today’s Talks:
• Jonas Traub: Optimized On-Demand Data Streaming from Sensor Nodes. ACM Symposium on Cloud Computing (SoCC), 2017.
• Sebastian Breß: Generating Custom Code for Efficient Query Execution on Heterogeneous Processors. The VLDB Journal, 27(6), 2018.
• Martin Kiefer: Estimating Join Selectivities using Bandwidth-Optimized Kernel Density Models. Proceedings of the VLDB Endowment (PVLDB), 2017.
• Andreas Kunft: BlockJoin: Efficient Matrix Partitioning Through Joins. Proceedings of the VLDB Endowment (PVLDB), 2017.
Database Systems and Information Management Group (DIMA) of Volker Markl
2. Traub et al., Optimized On-Demand Data Streaming from Sensor Nodes, SoCC ‘17
Optimized On-Demand Data
Streaming from Sensor Nodes
Jonas Traub, Sebastian Breß, Asterios Katsifodimos, Tilmann Rabl, Volker Markl
ACM Symposium on Cloud Computing (SoCC), 2017
4. Traub et al., Optimized On-Demand Data Streaming from Sensor Nodes, SoCC ‘17
The Sensor Cloud
Real-time
insights
Billions of sensor nodes form a sensor cloud
and provide data streams to analysis systems.
10. Traub et al., Optimized On-Demand Data Streaming from Sensor Nodes, SoCC ‘17
The Sensor Cloud – Problems
Billions of sensor nodes form a sensor cloud and provide data streams to analysis systems, which derive real-time insights.
• Streaming all data from billions of sensors to all applications at maximal frequencies is impossible.
• Increasing data rates require expensive system scale-out.
11. Traub et al., Optimized On-Demand Data Streaming from Sensor Nodes, SoCC ‘17
The Sensor Cloud – Solutions
Tailor Data Streams to the Demand of Applications
• User-Defined Sampling Functions (UDSFs): Provide an abstraction to define the data demand of applications.
• Read-Time Optimization: Optimize communication costs while maintaining the result accuracy.
• Multi-Query / Multi-User Optimization: Share sensor reads and data transfer among users and queries.
12. Traub et al., Optimized On-Demand Data Streaming from Sensor Nodes, SoCC ‘17
Architecture Overview
17. Traub et al., Optimized On-Demand Data Streaming from Sensor Nodes, SoCC ‘17
Sensor Read Scheduling
18. Traub et al., Optimized On-Demand Data Streaming from Sensor Nodes, SoCC ‘17
User-Defined Sampling Functions
Input: Sensor read time and value
Output: Next sensor read request
UDSFs enable adaptive sampling techniques to reduce data transmission, e.g., Adam [Trihinas ’15], FAST [Fan ’14], L-SIP [Gaura ’13].
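A user-defined sampling function can be pictured as a callback that receives the latest sensor read and returns the next read request. The sketch below is only an illustration in C and not the interface from the paper: the type names `SensorRead` and `ReadRequest`, the tolerance window around the desired read time, and the threshold-based policy are all assumptions made for this example.

```c
/* Hypothetical types for illustration; not the paper's API. */
typedef struct { double time; double value; } SensorRead;

/* A read request asks for one sensor read inside [earliest, latest]. */
typedef struct { double earliest; double latest; } ReadRequest;

/* Assumed adaptive policy: sample faster while the value changes quickly,
   and slower while it is stable. All constants are made up. */
ReadRequest adaptive_udsf(SensorRead prev, SensorRead cur) {
    double change = cur.value - prev.value;
    if (change < 0) change = -change;

    double interval = (change > 1.0) ? 1.0 : 10.0; /* seconds until next read */
    double slack = 0.2 * interval;                 /* tolerated time deviation */

    ReadRequest next = { cur.time + interval - slack,
                         cur.time + interval + slack };
    return next;
}
```

With a stable signal the function asks for the next read about ten seconds later; after a jump of more than one unit it asks again after about one second.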
21. Traub et al., Optimized On-Demand Data Streaming from Sensor Nodes, SoCC ‘17
Sensor Read Fusion
1) Minimize Sensor Reads and Data Transfer: Latest possible read time
2) Optimize Sensor Read Times: Check the paper for all details on the read time optimizer!
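The first step, minimizing reads by choosing the latest possible read time, can be sketched as a greedy pass over read requests. The sketch assumes that each request tolerates any read time inside a window `[earliest, latest]`; the `ReadRequest` type and the function `fuse_reads` are hypothetical names for this example, and the read time optimizer from the paper is not covered.

```c
#include <stdlib.h>

/* Hypothetical request type: one read anywhere in [earliest, latest]. */
typedef struct { double earliest; double latest; } ReadRequest;

static int by_latest(const void *a, const void *b) {
    double la = ((const ReadRequest *)a)->latest;
    double lb = ((const ReadRequest *)b)->latest;
    return (la > lb) - (la < lb);
}

/* Fuses requests into shared sensor reads. Sorts reqs in place, writes the
   chosen read times to out (capacity >= n), and returns the number of reads. */
size_t fuse_reads(ReadRequest *reqs, size_t n, double *out) {
    size_t reads = 0;
    size_t i = 0;
    qsort(reqs, n, sizeof *reqs, by_latest);
    while (i < n) {
        /* Latest possible read time of the most urgent open request. */
        double t = reqs[i].latest;
        out[reads++] = t;
        /* Requests are sorted by latest, so every request that has already
           opened (earliest <= t) contains t and shares this read. */
        while (i < n && reqs[i].earliest <= t) i++;
    }
    return reads;
}
```

For the windows [0, 5], [3, 8], [6, 7], and [20, 30], the sketch schedules three reads at times 5, 7, and 30 instead of four separate reads.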
24. Traub et al., Optimized On-Demand Data Streaming from Sensor Nodes, SoCC ‘17
Read Execution
26. Traub et al., Optimized On-Demand Data Streaming from Sensor Nodes, SoCC ‘17
Local Filtering
● Enable adaptive filtering in combination with adaptive sampling
● Enable model-driven data acquisition
27. Traub et al., Optimized On-Demand Data Streaming from Sensor Nodes, SoCC ‘17
Increasing the Number of Concurrent Queries
• On-demand scheduling reduces sensor reads and data transfer by up to 87%.
• The number of reads and transfers increases sub-linearly with the number of queries.
[Figure: independent queries vs. on-demand scheduling]
28. Traub et al., Optimized On-Demand Data Streaming from Sensor Nodes, SoCC ‘17
Further Publications on Data Streams and Sensor Data:
• Optimized On-Demand Data Streaming from Sensor Nodes. Jonas Traub, Sebastian Breß, Asterios Katsifodimos, Tilmann Rabl, Volker Markl. ACM Symposium on Cloud Computing (SoCC), 2017
• Efficient Window Aggregation with General Stream Slicing. EDBT 2019
• I²: Interactive Real-Time Visualization for Streaming Data. EDBT 2017
• Resense: Transparent Record and Replay of Sensor Data in the Internet of Things. EDBT 2019
30. Generating Custom Code for Efficient Query
Execution on Heterogeneous Processors
Sebastian Breß, Bastian Köcher, Henning Funke, Steffen Zeuch, Tilmann Rabl, Volker Markl
VLDB Journal, 27(6), 797-822, 2018
31. Heterogeneous Processors
CPUs, MICs, GPUs
Goal: Enable databases to automatically exploit heterogeneous processors
S. Breß et al.: Generating Custom Code for Efficient Query Execution on Heterogeneous Processors. In The VLDB Journal, 27(6), 797-822, 2018
36. Problem and Key Ideas
Problem: Writing efficient code for different processors is costly and error-prone
Key Idea 1: Generate custom code for each query and processor
Key Idea 2: Identify efficient code modifications and parameters automatically
39. Challenges
Intermediate Representation: Represent code modifications in query plan
Variant Optimization: Select efficient parameters and code modifications
Code Generation: Generate hardware-tailored code
43. Hawk Code Generator
Alternative Execution Engine: No changes to SQL parser and optimizer
Multi-Processor Support: Execute queries on CPUs/GPUs/MICs
Automatic Performance Optimization: Tunes code and parameters to processors
48. Step 1: Query Segmentation
[Figure: query segmentation, starting from the SQL query]
51. Step 2: Select Processor-Specific Code Variants
[Figure: a pipeline program is passed to the variant optimizer, which produces optimized pipeline programs]
61. Pipeline Program IR
SQL Query:
SELECT id, age
FROM person
WHERE age < 25;
62. Pipeline Program IR (2)
LOOP(person)
FILTER(age<25)
HASH_PUT(id)
PROJECT(id, age)
67. Pipeline Program IR: Modifications
LOOP(table): Memory Access Pattern
FILTER(age<25): Predication Mode
HASH_PUT(id): Hash Table Implementation
PROJECT(id, age): Parallelization Strategy
72. Pipeline Program IR: Modifications (2)
LOOP(table) → LOOP(table, sequential)
FILTER(age<25) → FILTER(age<25, branched)
HASH_PUT(id) → HASH_PUT(id, linear_probing)
PROJECT(id, age) → PROJECT(id, age, single-pass)
77. Generating Code: Sequential Memory Access
int thread_id = get_thread_id();
// Each thread scans one contiguous chunk of rows.
int start = start_idx(thread_id, num_rows);
int end = end_idx(thread_id, num_rows);
for (int tid = start; tid < end; tid += 1) {
  if (age[tid] < 25) {
    OUTPUT(id[tid], age[tid]);
  }
}
78. Memory Access Patterns
79. Pipeline Program IR: Rewrite
LOOP(table, sequential) → LOOP(table, coalesced)
FILTER(age<25, branched)
HASH_PUT(id, linear_probing)
PROJECT(id, age, single-pass)
81. Generating Code: Coalesced Memory Access
int thread_id = get_thread_id();
int num_threads = get_num_threads();
// Threads interleave: neighboring threads read neighboring rows.
for (int tid = thread_id; tid < num_rows; tid += num_threads) {
  if (age[tid] < 25) {
    OUTPUT(id[tid], age[tid]);
  }
}
Pipeline programs provide fine-grained control over generated code
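To make the difference between the two generated loops concrete, the sketch below computes which row a thread touches in a given loop iteration under each pattern. It is a single-threaded emulation in plain C, not Hawk output: the helper names `sequential_row`, `coalesced_row`, and `filter_coalesced` are hypothetical, and the sequential helper assumes that the number of rows is divisible by the number of threads.

```c
#include <stddef.h>

/* Sequential pattern: each thread owns one contiguous chunk of rows.
   Assumes num_rows is divisible by num_threads to keep the sketch short. */
size_t sequential_row(size_t thread, size_t step,
                      size_t num_rows, size_t num_threads) {
    size_t chunk = num_rows / num_threads;
    return thread * chunk + step;
}

/* Coalesced pattern: threads interleave with a stride of num_threads. */
size_t coalesced_row(size_t thread, size_t step, size_t num_threads) {
    return thread + step * num_threads;
}

/* Emulates the coalesced loop for all threads one after another and counts
   the rows that pass the filter age < 25. */
size_t filter_coalesced(const int *age, size_t num_rows, size_t num_threads) {
    size_t matches = 0;
    for (size_t t = 0; t < num_threads; t++) {
        for (size_t tid = t; tid < num_rows; tid += num_threads) {
            if (age[tid] < 25) matches++;
        }
    }
    return matches;
}
```

With 8 rows and 4 threads, the threads touch rows 0, 2, 4, 6 in their first iteration under the sequential pattern, but rows 0, 1, 2, 3 under the coalesced pattern. Adjacent threads reading adjacent rows is the layout that lets a GPU combine their memory requests, while a CPU thread usually benefits from scanning its own contiguous chunk.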
83. Performance: Memory Access Patterns
85. Terminology
Modification: Change to a pipeline program that preserves the semantics but changes the generated code
Variant configuration: Provides a value for each supported modification and defines the generated code
Code variant: Compilation result of a pipeline program
88. Variant Optimization
Derive an efficient code variant for each processor
Perform an offline calibration phase on a test workload
Explore the impact of each code modification separately
109. Search Algorithm
Finds an efficient variant with run-time linear in the number of dimensions
Code modifications are not strictly orthogonal (space not convex)
Perform multiple iterations of the algorithm to find the best code variant
113. Optimizing Search Time
Early Termination: Terminate the search if no faster variant is found during an iteration
Feature Ordering: Explore the parameter values of the most critical modifications first
Nested Modifications: Only include code modifications that change the code
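The search idea from the last two slides (tune one modification at a time, repeat, and stop once an iteration finds nothing faster) can be sketched as follows. This is a simplified illustration and not Hawk's implementation: `NUM_MODS`, `CostFn`, `search_variant`, and the cost function `toy_cost` with its made-up numbers are assumptions for this example. In a real setting the cost function would stand for running a code variant on the calibration workload.

```c
#include <stddef.h>

#define NUM_MODS 4 /* assumed number of modifications (dimensions) */

typedef double (*CostFn)(const int config[NUM_MODS]);

/* Made-up cost model with one interaction between modifications 0 and 2,
   so the dimensions are not fully independent. */
double toy_cost(const int c[NUM_MODS]) {
    double cost = 10.0;
    if (c[0] == 1) cost -= 2.0;
    if (c[1] == 2) cost -= 3.0;
    cost += c[2];
    if (c[0] == 1 && c[2] == 1) cost -= 4.0;
    cost += 0.5 * c[3];
    return cost;
}

/* Tunes one modification at a time while keeping the others fixed. Repeats
   until a full iteration finds no faster variant (early termination).
   config holds the start configuration and, on return, the best one found. */
double search_variant(int config[NUM_MODS], const int num_values[NUM_MODS],
                      CostFn measure) {
    double best = measure(config);
    int improved = 1;
    while (improved) {
        improved = 0;
        for (size_t m = 0; m < NUM_MODS; m++) {
            int best_value = config[m];
            for (int v = 0; v < num_values[m]; v++) {
                config[m] = v;
                double cost = measure(config);
                if (cost < best) {
                    best = cost;
                    best_value = v;
                    improved = 1;
                }
            }
            config[m] = best_value;
        }
    }
    return best;
}
```

Each iteration measures one variant per parameter value of each modification, so its cost grows linearly with the number of modifications rather than with the size of the full configuration space. Because the modifications interact, a single iteration may miss a better combination, which is why the loop runs again until nothing improves.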
117. Evaluation of Search Time
Our strategy outperforms backtracking by up to two orders of magnitude
[Figure: Variant exploration times for SSB Q4.1 on SF1]
119. Handling Query Dependencies
Reuse Variant Configurations: The variant configuration of a processor serves as the starting point for further tuning
Heuristic-Based Rewrites: Set a query-dependent modification to another parameter value when we expect a performance improvement
Example (Software Predication): Switch to software predication in FILTER when selectivity is 50%
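The two predication modes of FILTER can be illustrated with a plain C sketch. The function names `filter_branched` and `filter_predicated` are hypothetical, and the code is a generic illustration of the technique rather than code generated by Hawk. The general motivation is that a data-dependent branch is hardest to predict when about half of the rows qualify, which is the case the heuristic above targets.

```c
#include <stddef.h>

/* Branched filter: writes a row index only when the predicate holds. */
size_t filter_branched(const int *age, size_t n, size_t *out) {
    size_t count = 0;
    for (size_t tid = 0; tid < n; tid++) {
        if (age[tid] < 25) {
            out[count++] = tid;
        }
    }
    return count;
}

/* Software predication: always writes the row index, then advances the
   output cursor by the predicate result (0 or 1). The loop body contains
   no data-dependent branch. out must have room for n entries. */
size_t filter_predicated(const int *age, size_t n, size_t *out) {
    size_t count = 0;
    for (size_t tid = 0; tid < n; tid++) {
        out[count] = tid;
        count += (size_t)(age[tid] < 25);
    }
    return count;
}
```

Both functions produce the same result. The predicated version trades extra writes for a branch-free loop body, so it tends to pay off only when the branch is hard to predict.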
123. Query Compilation Times
Compilation times of OpenCL are in the order of hundreds of milliseconds
Compilation times grow linearly with the number of pipelines in a query
127. Evaluation Results
[Figure: evaluation results; plotted values not recoverable]
Performance difference among variants up to two orders of magnitude
Hawk reliably identifies efficient code variants for CPUs, GPUs, MICs
Best code depends on query characteristics
131. Conclusion
Hawk: A hardware-tailored code generator
Code Variant Generation: Produce custom code variants for each processor
Variant Optimization: No manual tuning for a specific processor
https://github.com/TU-Berlin-DIMA/Hawk-VLDBJ
136. Further Publications on Data Management on Modern Hardware:
• Generating Custom Code for Efficient Query Execution on Heterogeneous Processors. Sebastian Breß, Bastian Köcher, Henning Funke, Steffen Zeuch, Tilmann Rabl, Volker Markl. VLDB Journal, 27(6), 797-822, 2018
• Pipelined Query Processing in Coprocessor Environments. SIGMOD 2018
• Efficient and Scalable k-Means on GPUs. Datenbank-Spektrum 2018
• Analyzing Efficient Stream Processing on Modern Hardware. PVLDB 2019
138. GPU-Accelerated Join Selectivity Estimation using
KDE Models
Paper:
Estimating Join Selectivities using Bandwidth-Optimized Kernel Density Models,
Martin Kiefer, Max Heimel, Sebastian Breß, Volker Markl
PVLDB, Volume 10 Issue 13, September 2017
139. GPU-Accelerated Kernel Density Estimation for
Join Selectivity Estimation
[Figure: the query optimizer inside the database engine turns a query into a plan; it sends parameters to a statistical coprocessor and receives estimates]
Estimating Join Selectivities using Bandwidth-Optimized Kernel Density Models, Martin Kiefer et al., PVLDB, 2017
147. Background: Kernel Density Estimators
[Figure: four panels: Dataset, Sample $S$, Kernels $K_H$, Estimate $P_H$]
$P_H(\vec{x}) = \frac{1}{|S|} \sum_{i=1}^{|S|} K_H(s_i, \vec{x})$
Average over the kernel contributions
148. Background: Kernel Density Estimators
[Figure: panels Dataset, Sample $S$, Kernels $K_H$, Estimate $P_H$, with the query region $\Omega$ marked]
$\mathrm{sel}(\Omega) = \frac{1}{|S|} \sum_{i=1}^{|S|} \int_{\Omega} K_H(s_i, \vec{x}) \, d\vec{x}$
Average over the kernel contributions
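The selectivity formula above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' GPU implementation: it assumes a Gaussian product kernel, a diagonal bandwidth matrix (one standard deviation per dimension), and a hyper-rectangular query region, so that the integral has a closed form. The names `kde_selectivity` and `gaussian_cdf` are hypothetical.

```python
import math
import numpy as np


def gaussian_cdf(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def kde_selectivity(sample, bandwidth, lower, upper):
    """Average, over all sample points, of the kernel mass inside the box.

    Assumption: Gaussian product kernel with per-dimension bandwidth
    (a diagonal H); the query region is the box [lower, upper].
    """
    sample = np.asarray(sample, dtype=float)
    total = 0.0
    for point in sample:
        contribution = 1.0
        for d in range(sample.shape[1]):
            hi = gaussian_cdf((upper[d] - point[d]) / bandwidth[d])
            lo = gaussian_cdf((lower[d] - point[d]) / bandwidth[d])
            contribution *= hi - lo  # kernel mass of this point in dimension d
        total += contribution
    return total / len(sample)


sample = [[0.0, 0.0], [10.0, 10.0]]
print(kde_selectivity(sample, [0.01, 0.01], [-1.0, -1.0], [1.0, 1.0]))
```

With a very small bandwidth, each sample point counts either fully or not at all, so the estimate degenerates to plain sample evaluation; larger bandwidths spread each point's mass over its neighborhood.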
149. Background: Kernel Density Estimators for Multi-Dimensional
Selectivity Estimation [1]
[Figure: three panels: Good fit, Overfit, Underfit]
The bandwidth matrix $H$ controls the smoothing applied to the sample
• Range selections over base tables
• Bandwidth optimization based on the estimation error
• Easy model maintenance
[1] Self-Tuning, GPU-Accelerated Kernel Density Models for Multidimensional Selectivity Estimation, SIGMOD’15
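To illustrate what bandwidth optimization based on the estimation error can look like, the following sketch picks, from a list of candidate bandwidths, the one with the smallest squared error on a set of training range queries. It is a deliberately simplified stand-in: it is one-dimensional, uses a Gaussian kernel, and replaces numerical optimization with a plain grid search. All function names are hypothetical.

```python
import math


def cdf(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def estimate(sample, h, lo, hi):
    """1-D KDE selectivity estimate for the range [lo, hi] with bandwidth h."""
    return sum(cdf((hi - s) / h) - cdf((lo - s) / h) for s in sample) / len(sample)


def true_selectivity(data, lo, hi):
    """Exact selectivity of the range [lo, hi] on the full data."""
    return sum(lo <= x <= hi for x in data) / len(data)


def squared_error(sample, data, queries, h):
    """Sum of squared estimation errors over the training queries."""
    return sum(
        (estimate(sample, h, lo, hi) - true_selectivity(data, lo, hi)) ** 2
        for lo, hi in queries
    )


def optimize_bandwidth(sample, data, queries, candidates):
    """Grid search (an assumption of this sketch): keep the best candidate."""
    return min(candidates, key=lambda h: squared_error(sample, data, queries, h))


data = list(range(100))
sample = data[::10]
queries = [(0, 49), (25, 74), (10, 19)]
print(optimize_bandwidth(sample, data, queries, [0.01, 3.0, 1000.0]))
```

In a running system the true selectivities would come from query feedback rather than from a scan of the data, which is what makes the model self-tuning.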
150. The Problem:
Multi-Dimensional Join Selectivity Estimation
• and generalization to multiple joins
• Databases: Independence Assumption
• Often violated
• Introduces large errors and potentially bad query plans
• Research: Various Methods (e.g. Sampling, Sketches)
• Our Approach: Kernel Density Estimators
151. Why KDEs for Join Selectivities?
• Multivariate Estimator
• No independence assumption
• Hybrid between samples and histograms
• Small bandwidth: Sample evaluation
• Increasing bandwidth: More smoothing, increasing bucket sizes
• Bandwidth optimization selects proper bandwidth
160. The Approach: Join and Base Table Models
Join KDE Model ($P$): sample from $R_1 \bowtie_{R_1.A_1 = R_2.A_1} R_2$, bandwidth $H$
Compute: $P(c_1 \wedge c_2)$
Base Table KDE Models ($P_1$, $P_2$): sample from $R_1$ with bandwidth $H$, sample from $R_2$ with bandwidth $H$
Compute: $\sum_{v \in A} P_1(A_1 = v \wedge c_1) \cdot P_2(A_2 = v \wedge c_2)$
Easy to evaluate, better estimates
Support for base table and join selectivities
Easy to construct and to maintain
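The base table formula (a sum over the join attribute values of the product of two per-table estimates) can be sketched as follows. This is an illustration under stated assumptions, not the paper's estimator: rows are `(join_value, attribute)` pairs, the join attribute is an integer whose kernel mass is taken over `[v - 0.5, v + 0.5]`, each selection predicate is a range on the second attribute, and the kernels are Gaussian. All names and default bandwidths are hypothetical.

```python
import math


def cdf(z):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def mass(center, h, lo, hi):
    """Mass of a Gaussian kernel centered at `center` inside [lo, hi]."""
    return cdf((hi - center) / h) - cdf((lo - center) / h)


def base_table_prob(sample, v, pred, h_join, h_attr):
    """Estimate P_k(A = v AND c) from a sample of (join_value, attribute) rows."""
    lo, hi = pred
    total = sum(
        mass(a, h_join, v - 0.5, v + 0.5) * mass(x, h_attr, lo, hi)
        for a, x in sample
    )
    return total / len(sample)


def join_selectivity(sample1, sample2, pred1, pred2, domain, h_join=0.1, h_attr=0.1):
    """Sum over join values v of P_1(A = v AND c_1) * P_2(A = v AND c_2)."""
    return sum(
        base_table_prob(sample1, v, pred1, h_join, h_attr)
        * base_table_prob(sample2, v, pred2, h_join, h_attr)
        for v in domain
    )


s1 = [(1, 5.0), (2, 5.0)]
s2 = [(1, 7.0), (1, 7.0), (3, 7.0)]
print(join_selectivity(s1, s2, (0.0, 10.0), (0.0, 10.0), [1, 2, 3], 1e-3, 1e-3))
```

The result is a selectivity relative to the cross product of the two tables. Only the two base table samples are needed, which is why this variant avoids sampling from the join itself.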
164. Table Model: Computation Components
Selectivity: sum over the cross product of the two samples
Invariant Contributions: contribution of sample points wrt. selection predicate
Cross Contribution: distance function on join attributes of sample points
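The decomposition into invariant and cross contributions can be sketched with NumPy. The sketch assumes that the invariant contribution of each sample point has already been computed once from its selection predicate, and that the cross contribution is a function of the distance between two join attribute values. The function names and the exact-match cross function are hypothetical choices for illustration.

```python
import numpy as np


def table_model_selectivity(join1, inv1, join2, inv2, cross_fn):
    """Sum over the cross product of two samples.

    join1, join2: join attribute values of the sample points.
    inv1, inv2:   invariant contributions (one per sample point, computed
                  once from the selection predicate, not once per pair).
    cross_fn:     maps a matrix of join-attribute distances to cross contributions.
    """
    join1 = np.asarray(join1, dtype=float)
    join2 = np.asarray(join2, dtype=float)
    inv1 = np.asarray(inv1, dtype=float)
    inv2 = np.asarray(inv2, dtype=float)
    dist = np.abs(join1[:, None] - join2[None, :])  # |S1| x |S2| distances
    cross = cross_fn(dist)
    return float(inv1 @ cross @ inv2) / (len(join1) * len(join2))


def exact_match(dist):
    """Cross contribution in the zero-bandwidth limit: 1 if the keys are equal."""
    return (dist < 0.5).astype(float)


print(table_model_selectivity([1, 2], [1.0, 1.0], [1, 1, 3], [1.0, 1.0, 1.0], exact_match))
```

Because the invariant contributions factor out of the double sum, only the cross contribution has to be evaluated per pair of sample points, which is the part that benefits most from a massively parallel device.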
174. Evaluation: Scaling the Model Size (Postgres)
Dataset: DMV
Query: Q1U
175. Evaluation: Scaling the Model Size (Table Sample)
Dataset: DMV
Query: Q1U
176. Evaluation: Scaling the Model Size (Correlated Sample)
Dataset: DMV
Query: Q1U
177. Evaluation: Scaling the Model Size (AGMS Sketch)
Dataset: DMV
Query: Q1U
178. Evaluation: Scaling the Model Size (Join Sample)
Dataset: DMV
Query: Q1U
179. Evaluation: Scaling the Model Size (Join Sample + KDE)
Dataset: DMV
Query: Q1U
180. Evaluation: Scaling the Model Size (Table Sample + KDE)
Dataset: DMV
Query: Q1U
181. Runtime: CPU vs GPU
Dataset: IMDB
Workload: Q1U
GPU: Tesla V100
CPU: Intel Xeon Gold 5115
TS+KDE: 4x faster
JS+KDE: 5x faster
[Figure: average estimation time (ms, axis ticks 0.1 to 100) over sample size relative to base table size (1%, 2%, 4%, 8%, 16%) for TS+KDE (GPU), TS+KDE (CPU), JS+KDE (GPU), JS+KDE (CPU)]
182. Conclusion
• KDE models for join selectivity estimation
• “Getting most out of your sample”
• Based on join or base table KDE models
• Learning hybrid between histograms and samples
• GPU-acceleration possible
• Experiments, data, and code online
github.com/martinkiefer/join-kde
“Estimating Join Selectivities using Bandwidth-Optimized Kernel Density Models”, PVLDB 2017
183. Further Publications on GPU-Accelerated Kernel Density Estimation:
Estimating Join Selectivities using Bandwidth-Optimized Kernel Density Models
Martin Kiefer, Max Heimel, Sebastian Breß, Volker Markl
Proceedings of the VLDB Endowment, 10(13), 2017
Demonstrating Transfer-Efficient Sample Maintenance on Graphics Cards. EDBT 2015
Self-Tuning, GPU-Accelerated Kernel Density Models for Multidimensional Selectivity Estimation. SIGMOD 2015
185. BlockJoin: Efficient Matrix Partitioning Through Joins
Andreas Kunft, Asterios Katsifodimos, Sebastian Schelter, Tilmann Rabl, Volker Markl
PVLDB, Volume 10 Issue 13, September 2017
186. INTRODUCTION
Common pattern in end-to-end machine learning pipelines:
1. Relational operators, e.g., join and filter the input data
2. User-defined functions, e.g., feature transformation and vectorization
3. Linear algebra operators, e.g., model training and cross-validation
[Figure: pipeline of join (⋈), user-defined function ($f$), and ML]
BlockJoin: Efficient Matrix Partitioning Through Joins, Andreas Kunft et al. PVLDB, 2017
188. INTRODUCTION
Parallel dataflow engines implement
• Relational operators on row-partitioned datasets
• Linear algebra operators on block-partitioned matrices
>> Pipelines combining both require expensive re-partitioning (shuffle) steps
[Figure: pipeline of join (⋈), user-defined function ($f$), and ML]
192. STANDARD WORKFLOW - PROBLEMS
[Figure: Products (PK) and Reviews (FK) go through a distributed join into a row-wise join result, which receives a global row-index (row-wise) and is then re-partitioned into a block-partitioned matrix]
Materializes the join result just to apply a sequential row-index:
• Shuffles data for row-wise partitioning, which is split up immediately
• Puts heavy load on a few machines in case of skewed keys
• Forces early matrix block materialization
193. HOW CAN WE IMPROVE?
• We propose specialized operators at the intersection of linear and relational algebra
• Here, we focus on efficient creation of block-partitioned results from normalized data
195. OUR APPROACH
Creates block-partitioned results from normalized data
JOIN KERNEL: Local TID-Join on driver to create block-index meta-data
1. Meta-data provides mapping of TID to row-index for both relations
2. Row-index is applied independently: no materialization of join result
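The join kernel idea above can be sketched on a single machine. The sketch joins only the keys to build a TID-to-row-index mapping for both relations, then lets each relation place its values independently, so the joined rows are never materialized as a whole. It is a simplified, hypothetical illustration: `join_kernel` and `fill_rows` are made-up names, the row order is simply the FK order, and the distribution and block splitting of the real operator are omitted.

```python
from collections import defaultdict


def join_kernel(pk_keys, fk_keys):
    """Local TID-join on keys only.

    Returns, for both relations, a mapping TID -> row indices in the result,
    plus the number of result rows.
    """
    pk_tid_of_key = {key: tid for tid, key in enumerate(pk_keys)}
    pk_rows = defaultdict(list)
    fk_rows = {}
    next_row = 0
    for fk_tid, key in enumerate(fk_keys):
        pk_tid = pk_tid_of_key.get(key)
        if pk_tid is None:
            continue  # FK row without a join partner
        pk_rows[pk_tid].append(next_row)
        fk_rows[fk_tid] = [next_row]
        next_row += 1
    return dict(pk_rows), fk_rows, next_row


def fill_rows(values, rows_of_tid, n_rows):
    """Apply the row index to one relation, independently of the other."""
    out = [None] * n_rows
    for tid, rows in rows_of_tid.items():
        for row in rows:
            out[row] = list(values[tid])
    return out


pk_keys, pk_vals = ["P1", "P2", "P3"], [[1], [2], [3]]
fk_keys = ["P1", "P2", "P1", "P1"]
fk_vals = [[1, 1, 1], [2, 2, 2], [3, 3, 3], [4, 4, 4]]
pk_rows, fk_rows, n = join_kernel(pk_keys, fk_keys)
left = fill_rows(pk_vals, pk_rows, n)    # PK columns of the matrix
right = fill_rows(fk_vals, fk_rows, n)   # FK columns of the matrix
print([l + r for l, r in zip(left, right)])
```

Since the PK and FK column parts are produced separately, each side could cut its part into column blocks on its own before anything is shipped, which is the property the fetch kernel builds on.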
196. OUR APPROACH
Creates block-partitioned results from normalized data
JOIN KERNEL: Local TID-Join on driver to create block-index meta-data
FETCH KERNEL: Materialization strategy of matrix blocks based on matrix shape:
• Late materialization: Blocks are materialized on the receiver node (|PK columns| >> |FK columns|)
• Early materialization: Blocks are materialized on the sender node (|PK columns| << |FK columns|)
198. PK – FK JOIN
PK Table: 100k rows, scaling columns
FK Table: 1m rows, 5k columns
[Figure: two panels: a. Uniformly distributed FKs, b. Power-law distributed FKs]
up to 2.5x speedup
skew resistant, while the baseline fails
200. RECAP
BlockJoin is a logically fused operator pipeline
• Separation of matrix index creation and matrix materialization
> No materialization of join result
> Skew resistant
• Cost-based block materialization based on data shape
> Late materialization
> Early materialization
201. Further Publications:
BlockJoin: Efficient Matrix Partitioning Through Joins
Andreas Kunft, Asterios Katsifodimos, Sebastian Schelter, Tilmann Rabl, and Volker Markl.
PVLDB 10(13), 2017
Bridging the gap: towards optimization across linear and relational algebra. BeyondMR 2016
Implicit Parallelism through Deep Language Embedding. SIGMOD 2015
ScootR: Scaling R Dataframes on Dataflow Systems. SoCC 2018
202. Database Research at TU Berlin
Today‘s Talks:
Jonas Traub: Optimized On-Demand Data Streaming from Sensor Nodes. ACM Symposium on Cloud Computing (SoCC), 2017.
Sebastian Breß: Generating Custom Code for Efficient Query Execution on Heterogeneous Processors. The VLDB Journal, 27(6), 2018.
Martin Kiefer: Estimating Join Selectivities using Bandwidth-Optimized Kernel Density Models. Proceedings of the VLDB Endowment (PVLDB), 2017.
Andreas Kunft: BlockJoin: Efficient Matrix Partitioning Through Joins. Proceedings of the VLDB Endowment (PVLDB), 2017.
Database Systems and Information Management Group (DIMA) of Volker Markl