Efficient spatial queries
on vanilla databases
Julian Hyde • Apache Calcite online meetup • 2021/01/20
Relational Spatial
A spatial query
Find all restaurants within 1.5 distance units of my location (6, 7).

restaurant       x  y
Zachary’s pizza  3  1
King Yen         7  7
Filippo’s        7  4
Station burger   5  6

[Figure: the four restaurants plotted by (x, y)]

Using OpenGIS SQL extensions:

SELECT *
FROM Restaurants AS r
WHERE ST_Distance(
    ST_MakePoint(r.x, r.y),
    ST_MakePoint(6, 7)) < 1.5
Simple implementation
Using ESRI’s geometry-api-java library,
almost all ST_ functions were easy to
implement in Calcite.
Slow: the filter is evaluated one row at a time.
package org.apache.calcite.runtime;

import com.esri.core.geometry.*;

/** Simple implementations of built-in geospatial functions. */
public class GeoFunctions {
  /** Returns the distance between g1 and g2. */
  public static double ST_Distance(Geom g1, Geom g2) {
    return GeometryEngine.distance(g1.g(), g2.g(), g1.sr());
  }

  /** Constructs a 2D point from coordinates. */
  public static Geom ST_MakePoint(double x, double y) {
    final Geometry g = new Point(x, y);
    return new SimpleGeom(g);
  }

  /** Geometry. It may or may not have a spatial reference
   * associated with it. */
  public interface Geom {
    Geometry g();
    SpatialReference sr();
    Geom transform(int srid);
    Geom wrap(Geometry g);
  }

  static class SimpleGeom implements Geom { … }
}
What do relational databases do well?
● Sorted & partitioned data
● Range queries
● Algebraic query rewrite
[Figure: a root node routing to three region servers by sorted key range: [A - Gh], [Gi - Ts], [Tr - Z]]
● WHERE name = 'King Yen'
● WHERE name IN ('King Yen', 'Station burger')
● WHERE name = 'King Yen' OR name LIKE 'S%' OR name IS NULL
Search arguments (‘Sargs’)
A sarg is a sorted set of disjoint ranges for
a given data type
Data type must have a ‘<’ operator
Sargs are values, and can be combined
during query optimization: union, intersect
Easily implemented by a table/index scan
Equivalent predicates

WHERE x < 2
   OR x IN (5, 8)
   OR x BETWEEN 10 AND 20

WHERE x < 2
   OR x = 5
   OR x = 8
   OR x >= 10 AND x <= 20

WHERE $SEARCH(x,
    $SARG '(-∞, 2), [5, 5],
           [8, 8], [10, 20]') *

* pseudo-syntax

[Figure: number line from -∞ to +∞ marking the ranges (-∞, 2), [5, 5], [8, 8] and [10, 20]]
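A sarg can be modelled as a sorted list of disjoint ranges. The sketch below is only an illustration of that idea, not Calcite's implementation: the names Range, make_sarg, sarg_contains and sarg_union are hypothetical, and values are assumed to be numbers.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Range:
    """One range of a sarg; each bound is either closed or open."""
    lo: float
    hi: float
    lo_closed: bool = True
    hi_closed: bool = True

    def contains(self, v):
        above = v > self.lo or (self.lo_closed and v == self.lo)
        below = v < self.hi or (self.hi_closed and v == self.hi)
        return above and below


def make_sarg(ranges):
    """Sorts ranges and merges those that overlap or touch."""
    # At equal lower bounds, a closed bound sorts before an open one.
    ordered = sorted(ranges, key=lambda r: (r.lo, not r.lo_closed))
    merged = []
    for r in ordered:
        if merged:
            last = merged[-1]
            touches = r.lo < last.hi or (
                r.lo == last.hi and (r.lo_closed or last.hi_closed))
            if touches:
                # Extend the previous range only if r reaches further right.
                if r.hi > last.hi or (
                        r.hi == last.hi and r.hi_closed and not last.hi_closed):
                    merged[-1] = Range(last.lo, r.hi, last.lo_closed, r.hi_closed)
                continue
        merged.append(r)
    return merged


def sarg_contains(sarg, v):
    return any(r.contains(v) for r in sarg)


def sarg_union(a, b):
    return make_sarg(list(a) + list(b))


# x < 2 OR x IN (5, 8) OR x BETWEEN 10 AND 20
sarg = make_sarg([
    Range(10, 20),
    Range(5, 5),
    Range(-math.inf, 2, lo_closed=False, hi_closed=False),
    Range(8, 8),
])
```

Because the ranges are sorted and disjoint, each one maps directly onto a range scan of a sorted table or index, and a union with another predicate (for example `sarg_union(sarg, [Range(2, 5)])`) yields another sarg.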
Traditional DB indexing techniques don’t work

Sort:
CREATE /* b-tree */ INDEX I_Restaurants
  ON Restaurants(x, y);

Hash:
CREATE TABLE Restaurants(
  restaurant VARCHAR(20),
  x INTEGER,
  y INTEGER)
PARTITION BY (MOD(x + 5279 * y, 1024));

A scan over a two-dimensional index only has locality in one dimension
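The loss of locality can be seen on a small grid. This sketch assumes an 8 x 8 grid of cells and an index sorted by (x, y); the helper name index_gap is hypothetical.

```python
# Positions of the cells of an 8 x 8 grid in an index sorted by (x, y).
N = 8
cells = sorted((x, y) for x in range(N) for y in range(N))
position = {cell: i for i, cell in enumerate(cells)}


def index_gap(a, b):
    """How far apart two cells are in the sorted index."""
    return abs(position[a] - position[b])


gap_y = index_gap((3, 3), (3, 4))  # neighbours in the y direction
gap_x = index_gap((3, 3), (4, 3))  # neighbours in the x direction
```

Neighbours in y sit next to each other in the index, but neighbours in x are a whole column (N entries) apart, and that gap grows with the size of the grid.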
Spatial data structures and algorithms
The challenge: Reduce dimensionality while preserving locality
● Reduce dimensionality – We want to warp the information space so that
we can access on one composite attribute rather than several
● Preserve locality – If two items are close in 2D, we want them to be close
in the information space (and in the same cache line or disk block)
Two main approaches to spatial data structures:
● Data-oriented
● Space-oriented
A spatially skewed data set
2000 population distribution in
continental United States. Each
dot = 7,500 people.
R-tree (a data-oriented structure)
R-tree (split vertically into 2)
R-tree (split horizontally into 4)
R-tree (split vertically into 8)
R-tree (split horizontally into 16)
R-tree (split vertically into 32)
Grid (a space-oriented structure)
Hilbert space-filling curve
● A space-filling curve invented by mathematician David Hilbert
● Every (x, y) point has a unique position on the curve
● Points near to each other typically have Hilbert indexes close together
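As a rough sketch of how a Hilbert index can be computed, the function below uses a common iterative formulation for an n x n grid, where n is a power of two. The name hilbert_index is an assumption, and the curve orientation it produces is not necessarily the one used for the h values on these slides, so its numbers may differ from them.

```python
def hilbert_index(n, x, y):
    """Distance of cell (x, y) along a Hilbert curve over an n x n grid.

    n must be a power of two; x and y must be in range(n).
    """
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate or flip the quadrant so the sub-curve has a standard orientation.
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d


N = 8
# All cells of the grid, in the order in which the curve visits them.
curve = sorted(((x, y) for x in range(N) for y in range(N)),
               key=lambda cell: hilbert_index(N, *cell))
```

Every cell gets a distinct index, and consecutive cells on the curve are always adjacent in the grid. The converse is only typical, not guaranteed: two adjacent cells can have indexes that are far apart.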
Using Hilbert index
Add restriction based on h, a restaurant’s distance along the Hilbert curve
Must keep original restriction due to false positives

restaurant       x  y  h
Zachary’s pizza  3  1  5
King Yen         7  7  41
Filippo’s        7  4  52
Station burger   5  6  36

SELECT *
FROM Restaurants AS r
WHERE (r.h BETWEEN 35 AND 42
    OR r.h BETWEEN 46 AND 46)
  AND ST_Distance(
    ST_MakePoint(r.x, r.y),
    ST_MakePoint(6, 7)) < 1.5
The ranges on h in the WHERE clause form a sarg.
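The query above is a filter-and-refine pattern: range scans on h produce candidates, and the exact distance test removes any false positives. A minimal sketch, using the rows and h values from the table and a sorted in-memory list as a stand-in for a sorted table (the names range_scan and within are hypothetical):

```python
import bisect
import math

# (h, restaurant, x, y), sorted by h as a sorted table would store them.
rows = [
    (5, "Zachary's pizza", 3, 1),
    (36, "Station burger", 5, 6),
    (41, "King Yen", 7, 7),
    (52, "Filippo's", 7, 4),
]
keys = [row[0] for row in rows]


def range_scan(lo, hi):
    """Rows with lo <= h <= hi, found by binary search rather than a full scan."""
    start = bisect.bisect_left(keys, lo)
    end = bisect.bisect_right(keys, hi)
    return rows[start:end]


def within(center, radius, h_ranges):
    """Coarse filter on h, then the exact distance test on the candidates."""
    hits = []
    for lo, hi in h_ranges:
        for _h, name, x, y in range_scan(lo, hi):
            if math.hypot(x - center[0], y - center[1]) < radius:
                hits.append(name)
    return hits


result = within((6, 7), 1.5, [(35, 42), (46, 46)])
```

With this data the range scans touch only two of the four rows, and both pass the exact test. The exact test still cannot be dropped, because in general a range on h also covers cells that lie outside the circle.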
Telling the optimizer
1. Declare h as a generated column
2. Sort table by h
Planner can now convert spatial range queries into a range scan
Does not require a specialized spatial index such as an R-tree
Very efficient on a sorted table such as HBase

CREATE TABLE Restaurants (
  restaurant VARCHAR(20),
  x DOUBLE,
  y DOUBLE,
  h DOUBLE GENERATED ALWAYS AS
    (ST_Hilbert(x, y)) STORED)
SORT KEY (h);

restaurant       x  y  h
Zachary’s pizza  3  1  5
Station burger   5  6  36
King Yen         7  7  41
Filippo’s        7  4  52
Algebraic rewrite
FilterHilbertRule

Before:
  Filter [ST_Distance(
      ST_Point(T.X, T.Y),
      ST_Point(x, y)) < d]
    Scan [T]

After:
  Filter [$SEARCH(T.H, sarg)
      AND ST_Distance(
        ST_Point(T.X, T.Y),
        ST_Point(x, y)) < d]
    Scan [T]

Constraint: Table T has a column H such that: H = Hilbert(X, Y)

x, y, d, sarg – constants
T – table
T.X, T.Y, T.H – columns
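The sarg constant in the rewritten filter has to be derived from x, y and d at planning time. One simple way to do that, shown here only as a sketch, is to take every grid cell in the bounding box of the circle, compute its Hilbert index, and coalesce consecutive indexes into ranges. The function names and the 8 x 8 integer grid are assumptions; the curve orientation may differ from the one behind the h values on the slides, so the resulting ranges are not expected to match them.

```python
import math


def hilbert_index(n, x, y):
    """Distance of cell (x, y) along a Hilbert curve over an n x n grid."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s //= 2
    return d


def hilbert_ranges(n, cx, cy, radius):
    """Sorted, disjoint ranges of Hilbert indexes covering the circle's bounding box."""
    x_lo, x_hi = max(0, math.ceil(cx - radius)), min(n - 1, math.floor(cx + radius))
    y_lo, y_hi = max(0, math.ceil(cy - radius)), min(n - 1, math.floor(cy + radius))
    indexes = sorted(hilbert_index(n, x, y)
                     for x in range(x_lo, x_hi + 1)
                     for y in range(y_lo, y_hi + 1))
    ranges = []
    for h in indexes:
        if ranges and h == ranges[-1][1] + 1:
            ranges[-1] = (ranges[-1][0], h)  # extend the current range
        else:
            ranges.append((h, h))            # start a new range
    return ranges


sarg = hilbert_ranges(8, 6, 7, 1.5)
```

The bounding box over-approximates the circle, which is why the rewritten plan keeps the original ST_Distance predicate alongside $SEARCH.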
Variations on a theme
Several ways to say the same thing using OpenGIS functions:
● ST_Distance(ST_Point(X, Y), ST_Point(x, y)) < d
● ST_Distance(ST_Point(x, y), ST_Point(X, Y)) < d
● ST_DWithin(ST_Point(x, y), ST_Point(X, Y), d)
● ST_Contains(ST_Buffer(ST_Point(x, y), d), ST_Point(X, Y))
Other patterns can use Hilbert functions:
● ST_DWithin(ST_MakeLine(ST_Point(x1, y1), ST_Point(x2, y2)),
    ST_Point(X, Y), d)
● ST_Contains(ST_PolyFromText('POLYGON((0 0,20 0,20 20,0 20,0 0))'),
    ST_Point(X, Y))
More spatial queries
What state am I in? (1-point-to-1-polygon)
Which states does Yellowstone NP intersect?
(1-polygon-to-many-polygons)
Which US national park intersects with the most
states? (many-polygons-to-many-polygons,
followed by sort/limit)
More spatial queries
What state am I in? (point-to-polygon)

SELECT *
FROM States AS s
WHERE ST_Intersects(s.geometry,
    ST_MakePoint(6, 7))

Which states does Yellowstone NP intersect? (polygon-to-polygon)

SELECT *
FROM States AS s
WHERE ST_Intersects(s.geometry,
    ST_GeomFromText('LINESTRING(...)'))
Tile index
[Figure: map of Idaho, Montana, Nevada, Utah, Colorado and Wyoming]
We cannot use space-filling curves, because each region (state or park) is a set of points and is not known at planning time.
Divide regions into a (coarse) set of tiles. Two regions can intersect only if some of their tiles intersect.
Tile index
[Figure: the same map overlaid with a 3 x 3 grid of numbered tiles]

6 7 8
3 4 5
0 1 2

tileId  state
0       Nevada
0       Utah
1       Utah
2       Colorado
2       Utah
3       Idaho
3       Nevada
3       Utah
4       Idaho
4       Utah
4       Wyoming
5       Wyoming
6       Idaho
6       Montana
7       Montana
7       Wyoming
8       Montana
8       Wyoming
Aside: Materialized views
A materialized view is a table that is defined by a query
The planner knows about the mapping and can transparently rewrite queries to use it

CREATE MATERIALIZED VIEW EmpSummary AS
SELECT deptno, COUNT(*) AS c
FROM Emp
GROUP BY deptno;

Emp:
empno  name    deptno
100    Fred    20
110    Barney  10
120    Wilma   30
130    Dino    10

EmpSummary:
deptno  c
10      2
20      1
30      1

Equivalent plans:
  Aggregate [deptno, count(*)]
    Scan [Emp]
and
  Scan [EmpSummary]
Building the tile index
Use the ST_MakeGrid function to decompose each
state into a series of tiles
Store the results in a table, sorted by tile id
A materialized view is a table that remembers how
it was computed, so the planner can rewrite queries
to use it
CREATE MATERIALIZED VIEW StateTiles AS
SELECT s.stateId, t.tileId
FROM States AS s,
LATERAL TABLE(ST_MakeGrid(s.geometry, 4, 4)) AS t
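To make the decomposition concrete, here is a small sketch that builds a tile table for three states. It does not reproduce ST_MakeGrid: the states are approximated by hypothetical axis-aligned rectangles (xmin, ymin, xmax, ymax) on a 9 x 9 plane, the grid is the 3 x 3 numbering from the tile index slide, and tiles_for is an assumed helper.

```python
import math

TILE = 3.0  # side length of one tile (assumed)
COLS = 3    # tiles per row; tile id = row * COLS + column

# Hypothetical rectangles standing in for the real state polygons.
states = {
    "Colorado": (6.5, 0.5, 9.0, 3.0),
    "Montana": (1.5, 7.5, 9.0, 9.0),
    "Wyoming": (4.0, 4.0, 9.0, 7.5),
}


def tiles_for(rect):
    """Ids of the tiles that a rectangle overlaps."""
    xmin, ymin, xmax, ymax = rect
    c0, c1 = int(xmin // TILE), math.ceil(xmax / TILE) - 1
    r0, r1 = int(ymin // TILE), math.ceil(ymax / TILE) - 1
    return [r * COLS + c for r in range(r0, r1 + 1) for c in range(c0, c1 + 1)]


# The contents of the materialized view, sorted by tile id.
state_tiles = sorted((tile, name)
                     for name, rect in states.items()
                     for tile in tiles_for(rect))
```

Sorting the rows by tile id means that all states touching a given tile are stored together, so a lookup by tile is a short range scan.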
Point-to-polygon query
What state am I in? (point-to-polygon)
1. Divide the plane into tiles, and pre-compute
the state-tile intersections
2. Use this ‘tile index’ to narrow list of states
SELECT s.*
FROM States AS s
WHERE s.stateId IN (SELECT stateId
FROM StateTiles AS t
WHERE t.tileId = 8)
AND ST_Intersects(s.geometry, ST_MakePoint(6, 7))
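The two steps of the query can be sketched as follows. The geometry is hypothetical: three states are approximated by axis-aligned rectangles on a 9 x 9 plane with 3 x 3 tiles, chosen so that the point (6, 7) falls in tile 8. The tile rows listed are the ones those rectangles would produce, and all function names are assumptions.

```python
# Hypothetical rectangles (xmin, ymin, xmax, ymax) standing in for state polygons.
STATES = {
    "Colorado": (6.5, 0.5, 9.0, 3.0),
    "Montana": (1.5, 7.5, 9.0, 9.0),
    "Wyoming": (4.0, 4.0, 9.0, 7.5),
}

# (tileId, state) rows of the tile index for those rectangles.
STATE_TILES = [
    (2, "Colorado"), (4, "Wyoming"), (5, "Wyoming"), (6, "Montana"),
    (7, "Montana"), (7, "Wyoming"), (8, "Montana"), (8, "Wyoming"),
]


def tile_of(x, y):
    """Tile id of a point, for 3 x 3 tiles of side 3 numbered row by row."""
    return int(y // 3) * 3 + int(x // 3)


def candidates(tile):
    """Step 1: states whose tiles include the given tile (the semi-join)."""
    return [state for t, state in STATE_TILES if t == tile]


def contains(rect, x, y):
    xmin, ymin, xmax, ymax = rect
    return xmin <= x <= xmax and ymin <= y <= ymax


def state_at(x, y):
    """Step 2: exact test, run only on the candidates from the tile index."""
    return [s for s in candidates(tile_of(x, y)) if contains(STATES[s], x, y)]


answer = state_at(6, 7)
```

Here the tile index narrows the search to Montana and Wyoming, and the exact test then rejects Montana. The exact ST_Intersects predicate stays in the query for that reason.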
Algebraic rewrite
TileSemiJoinRule

Before:
  Filter [ST_Intersects(
      S.geometry,
      ST_Point(x, y))]
    Scan [S]

After:
  SemiJoin [S.stateId = T.stateId]
    Filter [ST_Intersects(
        S.geometry,
        ST_Point(x, y))]
      Scan [S]
    Filter [T.tileId = 8]
      Scan [T]

Constraint #1: There is a table “Tiles” defined by
  SELECT s.stateId, t.tileId FROM States AS s,
  LATERAL TABLE(ST_MakeGrid(s.geometry, x, y)) AS t
Constraint #2: stateId is primary key of S
Summary
Traditional DB techniques (sort, hash) don’t work for 2-dimensional data
Spatial presents tough design choices:
● Space-oriented vs data-oriented algorithms
● General-purpose vs specialized data structures
Relational algebra unifies traditional and spatial:
● Use general-purpose structures
● Compose techniques (transactions, analytics, spatial, streaming)
● Must use space-oriented algorithms, because their dimensionality-reducing
mapping is known at planning time
Thank you!
Questions?
https://calcite.apache.org
@ApacheCalcite
@julianhyde
Resources & credits
● [CALCITE-1870] Lattice suggester
● [CALCITE-1861] Spatial indexes
● [CALCITE-1968] OpenGIS
● [CALCITE-1991] Generated columns
● [CALCITE-4173] Add internal SEARCH operator and Sarg literal
● Talk: “SQL on everything, in memory” (Strata, 2014)
● Zhang, Qi, Stradling, Huang (2014). “Towards a Painless Index for Spatial Objects”
● https://www.census.gov/geo/maps-data/maps/2000popdistribution.html
● https://www.nasa.gov/mission_pages/NPP/news/earth-at-night.html

Intelligent Home Wi-Fi Solutions | ThinkPalmSujith Sukumaran
 
How To Manage Restaurant Staff -BTRESTRO
How To Manage Restaurant Staff -BTRESTROHow To Manage Restaurant Staff -BTRESTRO
How To Manage Restaurant Staff -BTRESTROmotivationalword821
 

Efficient spatial queries on vanilla databases

  • 1. Efficient spatial queries on vanilla databases Julian Hyde • Apache Calcite online meetup • 2021/01/20
  • 3. A spatial query Find all restaurants within 1.5 distance units of my location (6, 7) restaurant x y Zachary’s pizza 3 1 King Yen 7 7 Filippo’s 7 4 Station burger 5 6 • • • • Zachary’s pizza Filippo’s King Yen Station burger
  • 4. A spatial query Find all restaurants within 1.5 distance units of my location (6, 7) Using OpenGIS SQL extensions: restaurant x y Zachary’s pizza 3 1 King Yen 7 7 Filippo’s 7 4 Station burger 5 6 • • • • Zachary’s pizza Filippo’s King Yen Station burger SELECT * FROM Restaurants AS r WHERE ST_Distance( ST_MakePoint(r.x, r.y), ST_MakePoint(6, 7)) < 1.5
  • 5. Simple implementation Using ESRI’s geometry-api-java library, almost all ST_ functions were easy to implement in Calcite. Slow – one row at a time. SELECT * FROM Restaurants AS r WHERE ST_Distance( ST_MakePoint(r.x, r.y), ST_MakePoint(6, 7)) < 1.5 package org.apache.calcite.runtime; import com.esri.core.geometry.*; /** Simple implementations of built-in geospatial functions. */ public class GeoFunctions { /** Returns the distance between g1 and g2. */ public static double ST_Distance(Geom g1, Geom g2) { return GeometryEngine.distance(g1.g(), g2.g(), g1.sr()); } /** Constructs a 2D point from coordinates. */ public static Geom ST_MakePoint(double x, double y) { final Geometry g = new Point(x, y); return new SimpleGeom(g); } /** Geometry. It may or may not have a spatial reference * associated with it. */ public interface Geom { Geometry g(); SpatialReference sr(); Geom transform(int srid); Geom wrap(Geometry g); } static class SimpleGeom implements Geom { … } }
  • 6. What do relational databases do well? ● Sorted & partitioned data ● Range queries ● Algebraic query rewrite Root Region server [A - Gh] Region server [Gi - Ts] Region server [Tr - Z] ● WHERE name = 'King Yen' ● WHERE name IN ('King Yen', 'Station burger') ● WHERE name = 'King Yen' OR name LIKE 'S%' OR name IS NULL
  • 7. Search arguments (‘Sargs’) A sarg is a sorted set of disjoint ranges for a given data type. The data type must have a ‘<’ operator. Sargs are values, and can be combined during query optimization (union, intersect). They are easily implemented by a table/index scan. Equivalent predicates: (1) WHERE x < 2 OR x IN (5, 8) OR x BETWEEN 10 AND 20; (2) WHERE x < 2 OR x = 5 OR x = 8 OR x >= 10 AND x <= 20; (3) WHERE $SEARCH(x, $SARG '(-∞, 2), [5, 5], [8, 8], [10, 20]') (pseudo-syntax). [Figure: number line from -∞ to +∞ with marks at 2, 5, 8, 10 and 20]
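A minimal Java sketch of the sarg idea on this slide, under simplifying assumptions: values are `long`, every range is closed, and `Long.MIN_VALUE` / `Long.MAX_VALUE` stand in for -∞ / +∞ (so `x < 2` becomes the range up to 1). The class `SargSketch` and its method names are hypothetical and are not Calcite's API.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Hypothetical sketch of a sarg: sorted, disjoint, closed ranges of long values. */
final class SargSketch {
  /** Each element is {lo, hi}; the list is sorted by lo and ranges never overlap. */
  final List<long[]> ranges;

  private SargSketch(List<long[]> ranges) {
    this.ranges = ranges;
  }

  /** Convenience constructor for a closed range [lo, hi]. */
  static long[] range(long lo, long hi) {
    return new long[] {lo, hi};
  }

  /** Builds a sarg from arbitrary ranges by sorting them and merging overlaps. */
  static SargSketch of(long[]... input) {
    List<long[]> sorted = new ArrayList<>(Arrays.asList(input));
    sorted.sort(Comparator.comparingLong((long[] r) -> r[0]));
    List<long[]> merged = new ArrayList<>();
    for (long[] r : sorted) {
      long[] last = merged.isEmpty() ? null : merged.get(merged.size() - 1);
      if (last != null && last[1] >= r[0]) {
        last[1] = Math.max(last[1], r[1]); // overlap: extend the previous range
      } else {
        merged.add(new long[] {r[0], r[1]}); // copy, so inputs are never mutated
      }
    }
    return new SargSketch(merged);
  }

  /** Union: pool both range lists and re-normalize. */
  SargSketch union(SargSketch other) {
    List<long[]> all = new ArrayList<>(ranges);
    all.addAll(other.ranges);
    return of(all.toArray(new long[0][]));
  }

  /** Intersection: sweep both sorted lists in one pass. */
  SargSketch intersect(SargSketch other) {
    List<long[]> out = new ArrayList<>();
    int i = 0;
    int j = 0;
    while (i < ranges.size() && j < other.ranges.size()) {
      long[] a = ranges.get(i);
      long[] b = other.ranges.get(j);
      long lo = Math.max(a[0], b[0]);
      long hi = Math.min(a[1], b[1]);
      if (lo <= hi) {
        out.add(new long[] {lo, hi});
      }
      if (a[1] < b[1]) {
        i++; // the range that ends first cannot match anything later
      } else {
        j++;
      }
    }
    return new SargSketch(out);
  }

  /** Returns whether x falls inside any range. */
  boolean contains(long x) {
    for (long[] r : ranges) {
      if (r[0] <= x && x <= r[1]) {
        return true;
      }
    }
    return false;
  }
}
```

Because the ranges stay sorted and disjoint, each one maps directly onto a range scan of a sorted table or index, which is what makes a sarg cheap to evaluate.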
  • 8. Traditional DB indexing techniques don’t work Sort Hash CREATE /* b-tree */ INDEX I_Restaurants ON Restaurants(x, y); CREATE TABLE Restaurants( restaurant VARCHAR(20), x INTEGER, y INTEGER) PARTITION BY (MOD(x + 5279 * y, 1024)); • • • • A scan over a two-dimensional index only has locality in one dimension
  • 9. Spatial data structures and algorithms The challenge: Reduce dimensionality while preserving locality ● Reduce dimensionality – We want to warp the information space so that we can access on one composite attribute rather than several ● Preserve locality – If two items are close in 2D, we want them to be close in the information space (and in the same cache line or disk block) Two main approaches to spatial data structures: ● Data-oriented ● Space-oriented
  • 10. A spatially skewed data set 2000 population distribution in continental United States. Each dot = 7,500 people.
  • 19. Hilbert space-filling curve ● A space-filling curve invented by mathematician David Hilbert ● Every (x, y) point has a unique position on the curve ● Points near to each other typically have Hilbert indexes close together
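To make the mapping from (x, y) to a position on the curve concrete, here is a small Java sketch of the widely used iterative Hilbert-index computation on an n-by-n integer grid. It is an illustration only: the class name `HilbertSketch` is hypothetical, the grid size must be a power of two, and the curve orientation chosen here is an assumption, so the indexes it produces need not match the h values shown elsewhere in this deck.

```java
/** Hypothetical helper: maps a cell (x, y) of an n-by-n grid to its Hilbert index. */
final class HilbertSketch {
  /**
   * Returns the position of cell (x, y) along the curve, in [0, n * n).
   * Assumes n is a power of two and 0 <= x, y < n.
   */
  static int index(int n, int x, int y) {
    int d = 0;
    for (int s = n / 2; s > 0; s /= 2) {
      // Which quadrant of the current square does the point fall in?
      int rx = (x & s) > 0 ? 1 : 0;
      int ry = (y & s) > 0 ? 1 : 0;
      // Quadrants are visited in the order (0,0), (0,1), (1,1), (1,0).
      d += s * s * ((3 * rx) ^ ry);
      // Rotate/flip so the sub-square is traversed in the standard orientation.
      if (ry == 0) {
        if (rx == 1) {
          x = n - 1 - x;
          y = n - 1 - y;
        }
        int t = x;
        x = y;
        y = t;
      }
    }
    return d;
  }
}
```

On a 2-by-2 grid this visits (0, 0), (0, 1), (1, 1), (1, 0) in that order. Consecutive indexes always belong to neighbouring cells, which is the locality property the slide relies on; the converse does not hold, so neighbouring cells can have distant indexes.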
  • 20. Spatial query Find all restaurants within 1.5 distance units of my location (6, 7): restaurant x y Zachary’s pizza 3 1 King Yen 7 7 Filippo’s 7 4 Station burger 5 6 • • • • Zachary’s pizza Filippo’s King Yen Station burger SELECT * FROM Restaurants AS r WHERE ST_Distance( ST_MakePoint(r.x, r.y), ST_MakePoint(6, 7)) < 1.5
  • 21. Using Hilbert index Add restriction based on h, a restaurant’s distance along the Hilbert curve. Must keep original restriction due to false positives. Table (restaurant, x, y, h): (Zachary’s pizza, 3, 1, 5), (King Yen, 7, 7, 41), (Filippo’s, 7, 4, 52), (Station burger, 5, 6, 36). SELECT * FROM Restaurants AS r WHERE (r.h BETWEEN 35 AND 42 OR r.h BETWEEN 46 AND 46) AND ST_Distance( ST_MakePoint(r.x, r.y), ST_MakePoint(6, 7)) < 1.5
  • 22. Using Hilbert index Add restriction based on h, a restaurant’s distance along the Hilbert curve. Must keep original restriction due to false positives. Table (restaurant, x, y, h): (Zachary’s pizza, 3, 1, 5), (King Yen, 7, 7, 41), (Filippo’s, 7, 4, 52), (Station burger, 5, 6, 36). SELECT * FROM Restaurants AS r WHERE (r.h BETWEEN 35 AND 42 OR r.h BETWEEN 46 AND 46) AND ST_Distance( ST_MakePoint(r.x, r.y), ST_MakePoint(6, 7)) < 1.5 (The restriction on r.h is a sarg.)
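The two-stage predicate on this slide can be mimicked in a few lines of Java. This is a sketch, not how Calcite evaluates the query: the class `HilbertFilterSketch` is hypothetical, the rows and h values are copied from the slide's table, and the h ranges are the ones in the slide's WHERE clause. In a database the range test would be a range scan over the table sorted by h rather than a per-row check.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: Hilbert range pre-filter followed by the exact distance test. */
final class HilbertFilterSketch {
  // Rows from the slide: restaurant, (x, y), h.
  static final String[] NAMES = {"Zachary's pizza", "King Yen", "Filippo's", "Station burger"};
  static final double[][] XY = {{3, 1}, {7, 7}, {7, 4}, {5, 6}};
  static final int[] H = {5, 41, 52, 36};

  // The slide's sarg for the query point (6, 7) and distance 1.5.
  static final int[][] SLIDE_RANGES = {{35, 42}, {46, 46}};

  /** Cheap test: is h inside any of the closed ranges? */
  static boolean inRanges(int[][] hRanges, int h) {
    for (int[] r : hRanges) {
      if (r[0] <= h && h <= r[1]) {
        return true;
      }
    }
    return false;
  }

  /** Returns names of restaurants strictly within radius of (qx, qy). */
  static List<String> within(int[][] hRanges, double qx, double qy, double radius) {
    List<String> result = new ArrayList<>();
    for (int i = 0; i < NAMES.length; i++) {
      if (!inRanges(hRanges, H[i])) {
        continue; // pruned by the Hilbert ranges; no distance computed
      }
      // The ranges can admit false positives, so the exact test must still run.
      double distance = Math.hypot(XY[i][0] - qx, XY[i][1] - qy);
      if (distance < radius) {
        result.add(NAMES[i]);
      }
    }
    return result;
  }
}
```

Calling `HilbertFilterSketch.within(HilbertFilterSketch.SLIDE_RANGES, 6, 7, 1.5)` keeps King Yen and Station burger; the other two rows are discarded by the range test alone.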
  • 23. Telling the optimizer 1. Declare h as a generated column. 2. Sort table by h. The planner can now convert spatial range queries into a range scan. This does not require a specialized spatial index such as an r-tree, and is very efficient on a sorted table such as HBase. CREATE TABLE Restaurants ( restaurant VARCHAR(20), x DOUBLE, y DOUBLE, h DOUBLE GENERATED ALWAYS AS ST_Hilbert(x, y) STORED) SORT KEY (h); Table sorted by h (restaurant, x, y, h): (Zachary’s pizza, 3, 1, 5), (Station burger, 5, 6, 36), (King Yen, 7, 7, 41), (Filippo’s, 7, 4, 52)
  • 24. Algebraic rewrite FilterHilbertRule rewrites Filter [ST_Distance(ST_Point(T.X, T.Y), ST_Point(x, y)) < d] over Scan [T] into Filter [$SEARCH(T.H, sarg) AND ST_Distance(ST_Point(T.X, T.Y), ST_Point(x, y)) < d] over Scan [T]. Constraint: Table T has a column H such that H = Hilbert(X, Y). Legend: x, y, d, sarg are constants; T is a table; T.X, T.Y, T.H are columns.
  • 25. Variations on a theme Several ways to say the same thing using OpenGIS functions: ● ST_Distance(ST_Point(X, Y), ST_Point(x, y)) < d ● ST_Distance(ST_Point(x, y), ST_Point(X, Y)) < d ● ST_DWithin(ST_Point(x, y), ST_Point(X, Y), d) ● ST_Contains(ST_Buffer(ST_Point(x, y), d), ST_Point(X, Y)) Other patterns can use Hilbert functions: ● ST_DWithin(ST_MakeLine(ST_Point(x1, y1), ST_Point(x2, y2)), ST_Point(X, Y), d) ● ST_Contains(ST_PolyFromText('POLYGON((0 0,20 0,20 20,0 20,0 0))'), ST_Point(X, Y))
  • 26. More spatial queries What state am I in? (1-point-to-1-polygon) Which states does Yellowstone NP intersect? (1-polygon-to-many-polygons) Which US national park intersects with the most states? (many-polygons-to-many-polygons, followed by sort/limit)
  • 27. More spatial queries What state am I in? (point-to-polygon) Which states does Yellowstone NP intersect? (polygon-to-polygon) SELECT * FROM States AS s WHERE ST_Intersects(s.geometry, ST_MakePoint(6, 7)) SELECT * FROM States AS s WHERE ST_Intersects(s.geometry, ST_GeomFromText('LINESTRING(...)'))
  • 28. Tile index We cannot use space-filling curves, because each region (state or park) is a set of points and is not known at planning time. Divide regions into a (coarse) set of tiles. Two regions intersect only if some of their tiles intersect. [Figure: map of Idaho, Montana, Nevada, Utah, Colorado, Wyoming]
  • 29. Tile index [Figure: 3 × 3 tile grid over Idaho, Montana, Nevada, Utah, Colorado, Wyoming; tile ids by row, top to bottom: 6 7 8, 3 4 5, 0 1 2] tileId → state: 0: Nevada, Utah; 1: Utah; 2: Colorado, Utah; 3: Idaho, Nevada, Utah; 4: Idaho, Utah, Wyoming; 5: Wyoming; 6: Idaho, Montana; 7: Montana, Wyoming; 8: Montana, Wyoming
  • 30. Aside: Materialized views A materialized view is a table that is defined by a query. The planner knows about the mapping and can transparently rewrite queries to use it. CREATE MATERIALIZED VIEW EmpSummary AS SELECT deptno, COUNT(*) AS c FROM Emps GROUP BY deptno; Rewrite: Aggregate [deptno, count(*)] over Scan [Emps] becomes Scan [EmpSummary]. Emps (empno, name, deptno): (100, Fred, 20), (110, Barney, 10), (120, Wilma, 30), (130, Dino, 10). EmpSummary (deptno, c): (10, 2), (20, 1), (30, 1)
  • 31. Building the tile index Use the ST_MakeGrid function to decompose each state into a series of tiles. Store the results in a table, sorted by tile id. A materialized view is a table that remembers how it was computed, so the planner can rewrite queries to use it. CREATE MATERIALIZED VIEW StateTiles AS SELECT s.stateId, t.tileId FROM States AS s, LATERAL TABLE(ST_MakeGrid(s.geometry, 4, 4)) AS t [Figure: 3 × 3 tile grid over Idaho, Montana, Nevada, Utah, Colorado, Wyoming; tile ids by row, top to bottom: 6 7 8, 3 4 5, 0 1 2]
  • 32. Point-to-polygon query What state am I in? (point-to-polygon) 1. Divide the plane into tiles, and pre-compute the state-tile intersections. 2. Use this ‘tile index’ to narrow the list of states. SELECT s.* FROM States AS s WHERE s.stateId IN (SELECT stateId FROM StateTiles AS t WHERE t.tileId = 8) AND ST_Intersects(s.geometry, ST_MakePoint(6, 7)) [Figure: 3 × 3 tile grid over Idaho, Montana, Nevada, Utah, Colorado, Wyoming; tile ids by row, top to bottom: 6 7 8, 3 4 5, 0 1 2]
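The tile lookup in step 2 can be sketched in Java as follows. The tile-to-state table is copied from the deck's tile index slide; everything else is an assumption made for illustration: the class name `TileIndexSketch`, a grid origin at (0, 0), and tiles of 3 × 3 units (chosen so that the point (6, 7) lands in tile 8, as in the query). The exact geometry test (ST_Intersects) still has to run on the surviving candidates and is not shown.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of a tile index used to shortlist states for a point. */
final class TileIndexSketch {
  static final int GRID = 3;         // 3 x 3 tiles, numbered 0..8 from the bottom-left
  static final double TILE_SIZE = 3; // assumption: 3 x 3 units per tile, origin at (0, 0)

  /** Pre-computed state-tile intersections (the 'tile index'). */
  static final Map<Integer, List<String>> TILE_TO_STATES = new TreeMap<>();

  static {
    TILE_TO_STATES.put(0, Arrays.asList("Nevada", "Utah"));
    TILE_TO_STATES.put(1, Arrays.asList("Utah"));
    TILE_TO_STATES.put(2, Arrays.asList("Colorado", "Utah"));
    TILE_TO_STATES.put(3, Arrays.asList("Idaho", "Nevada", "Utah"));
    TILE_TO_STATES.put(4, Arrays.asList("Idaho", "Utah", "Wyoming"));
    TILE_TO_STATES.put(5, Arrays.asList("Wyoming"));
    TILE_TO_STATES.put(6, Arrays.asList("Idaho", "Montana"));
    TILE_TO_STATES.put(7, Arrays.asList("Montana", "Wyoming"));
    TILE_TO_STATES.put(8, Arrays.asList("Montana", "Wyoming"));
  }

  /** Maps a point to its tile id; only meaningful for points inside the grid. */
  static int tileId(double x, double y) {
    int col = (int) Math.floor(x / TILE_SIZE);
    int row = (int) Math.floor(y / TILE_SIZE);
    return row * GRID + col;
  }

  /** States whose tiles include the point's tile; a superset of the true answer. */
  static List<String> candidates(double x, double y) {
    List<String> states = TILE_TO_STATES.get(tileId(x, y));
    return states == null ? Collections.<String>emptyList() : states;
  }
}
```

For the point (6, 7) the lookup returns Montana and Wyoming, so only two of the six states need the expensive geometry test.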
  • 33. Algebraic rewrite TileSemiJoinRule rewrites Filter [ST_Intersects(S.geometry, ST_Point(x, y))] over Scan [S] into SemiJoin [S.stateId = T.stateId], joining Filter [ST_Intersects(S.geometry, ST_Point(x, y))] over Scan [S] with Filter [T.tileId = 8] over Scan [T]. Constraint #1: There is a table “Tiles” defined by SELECT s.stateId, t.tileId FROM States AS s, LATERAL TABLE(ST_MakeGrid(s.geometry, x, y)) AS t. Constraint #2: stateId is primary key of S.
  • 34. Summary Traditional DB techniques (sort, hash) don’t work for 2-dimensional data. Spatial data presents tough design choices: ● Space-oriented vs data-oriented algorithms ● General-purpose vs specialized data structures. Relational algebra unifies traditional and spatial: ● Use general-purpose structures ● Compose techniques (transactions, analytics, spatial, streaming) ● Must use space-oriented algorithms, because their dimensionality-reducing mapping is known at planning time
  • 35. Thank you! Questions? https://calcite.apache.org @ApacheCalcite @julianhyde Resources & credits ● [CALCITE-1870] Lattice suggester ● [CALCITE-1861] Spatial indexes ● [CALCITE-1968] OpenGIS ● [CALCITE-1991] Generated columns ● [CALCITE-4173] Add internal SEARCH operator and Sarg literal ● Talk: “SQL on everything, in memory” (Strata, 2014) ● Zhang, Qi, Stradling, Huang (2014). “Towards a Painless Index for Spatial Objects” ● https://www.census.gov/geo/maps-data/maps/2000popdistribution.html ● https://www.nasa.gov/mission_pages/NPP/news/earth-at-night.html