Datomic R-trees

James Sofra
@sofra
https://github.com/jsofra/datomic-rtree
Summary
● Motivations
● Datomic overview
● Datomic R-tree implementation
● Hilbert Curves
● Bulk loading (via Hilbert Curves)
● Future plans
Motivations
● I have an interest in geospatial applications
  – e.g. Thunderstorm probability application (THESPA)
● Datomic is an interesting database that makes different trade-offs to other databases
  – Wonder how far we can take the ability to describe arbitrary structures in Datomic
Why don't we have both?
Datomic Overview
● Immutable database
● Time-based facts (stored as entities)
● ACID transactions
● Expressive queries using Datalog
● Pluggable storage
● Flexible enough to act as a row, column or graph database
● Schema that describes attributes that can be attached to entities
  – Attributes have a type: String, Long, Double, Inst, Ref etc.
● Database functions
  – Stored in the database; they run inside the transaction and see the in-transaction database value
Datomic Overview Architecture
Datomic Motivations
● Things that make Datomic appealing for spatial data
  – The time-based nature of Datomic is useful for time series data, which we often have
  – No need to add spatial operations (union, intersection, etc.) to the database; they can be handled by libraries in the peers
  – Spatial indexes can be stored as regular data, which allows a lot of freedom over the choice of index and over handling multiple indexes over subsets of the data in space and time
  – Flexible entity structures are useful because spatial data frequently does not fit nicely in a table
  – Immutability is surprisingly useful in lots of different applications!
R-trees
● Efficient query of multi-dimensional data
● Groups nearby objects
● Balanced (all leaf nodes at the same level)
● Aims for nodes that minimise empty space coverage and overlap
● Designed for storage on disk (as used in databases)
● "R-Trees: A Dynamic Index Structure for Spatial Searching"
  – Guttman, A (1984)
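To make the query idea concrete, here is a minimal Python sketch of an intersection search over an R-tree. It is not the datomic-rtree code (which is Clojure); the `Node` class, the `(min_x, min_y, max_x, max_y)` tuple layout and the function names are assumptions chosen for illustration. The point it shows is that whole subtrees are skipped as soon as their bounding box misses the search box.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Hypothetical layout: (min_x, min_y, max_x, max_y)
BBox = Tuple[float, float, float, float]


def intersects(a: BBox, b: BBox) -> bool:
    """True when the two boxes overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


@dataclass
class Node:
    bbox: BBox
    children: List["Node"] = field(default_factory=list)
    entry: Optional[str] = None  # only set on the entries stored under a leaf


def intersecting(node: Node, search_box: BBox) -> List[str]:
    """Collect every entry whose bounding box intersects search_box."""
    if not intersects(node.bbox, search_box):
        return []  # prune: nothing below this node can match
    if node.entry is not None:
        return [node.entry]
    found: List[str] = []
    for child in node.children:
        found.extend(intersecting(child, search_box))
    return found
```

A naive search would test every entry; here the number of boxes tested depends on how many nodes overlap the search box.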
R-trees - Insertions
● Choose a leaf node to insert into
● Insert the entry into the leaf node and enlarge the node
● If the node has more than the max number of children, split the node and propagate enlargement and splits up the tree
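The first two insertion steps can be sketched in Python as follows. This follows the least-enlargement rule from Guttman's paper (ties broken by smaller area) and is not the library's Clojure implementation; the dict keys (`is_leaf`, `bbox`, `children`) are assumptions that loosely mirror the schema attribute names, and splitting is left out.

```python
def area(b):
    """Area of a (min_x, min_y, max_x, max_y) box."""
    return (b[2] - b[0]) * (b[3] - b[1])


def union(a, b):
    """Smallest box covering both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def enlargement(node_box, entry_box):
    """Extra area node_box needs in order to cover entry_box."""
    return area(union(node_box, entry_box)) - area(node_box)


def choose_leaf_path(root, entry_box):
    """Walk down from the root, always taking the child that needs the
    least enlargement (ties broken by smaller area). Returns the path."""
    path = [root]
    while not path[-1]["is_leaf"]:
        best = min(
            path[-1]["children"],
            key=lambda c: (enlargement(c["bbox"], entry_box), area(c["bbox"])),
        )
        path.append(best)
    return path


def insert_entry(root, entry, max_children):
    """Add entry to the chosen leaf and enlarge every box on the path.
    Returns True when the leaf now overflows and would need a split."""
    path = choose_leaf_path(root, entry["bbox"])
    path[-1]["children"].append(entry)
    for node in path:
        node["bbox"] = union(node["bbox"], entry["bbox"])
    return len(path[-1]["children"]) > max_children
```

Returning the whole path rather than just the leaf keeps the enlargement step simple, since every ancestor's box has to grow too.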
Datomic R-tree - Schema
:rtree/root           :db.type/ref
:rtree/max-children   :db.type/long
:rtree/min-children   :db.type/long
:node/children        :db.type/ref
:node/is-leaf?        :db.type/boolean
:node/entry           :db.type/ref
:bbox/min-x           :db.type/double
:bbox/min-y           :db.type/double
:bbox/max-x           :db.type/double
:bbox/max-y           :db.type/double
Datomic R-tree - choose-leaf
Datomic R-tree - split-node
Datomic R-tree - pick-seeds
Datomic R-tree - pick-next
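The code shown on these slides is not reproduced in this transcript. As a rough illustration only, the Python sketch below implements the quadratic-split versions of PickSeeds and PickNext as described in Guttman's paper; whether the library uses exactly this variant is an assumption, and the function signatures are hypothetical.

```python
from itertools import combinations


def area(b):
    """Area of a (min_x, min_y, max_x, max_y) box."""
    return (b[2] - b[0]) * (b[3] - b[1])


def union(a, b):
    """Smallest box covering both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def enlargement(group_box, box):
    """Extra area group_box needs in order to cover box."""
    return area(union(group_box, box)) - area(group_box)


def pick_seeds(boxes):
    """Return the index pair that would waste the most area if the two
    boxes were put in the same node; they seed the two new nodes."""
    def waste(pair):
        i, j = pair
        return area(union(boxes[i], boxes[j])) - area(boxes[i]) - area(boxes[j])
    return max(combinations(range(len(boxes)), 2), key=waste)


def pick_next(group_a_box, group_b_box, remaining):
    """Return the remaining box with the strongest preference for one
    group, i.e. the largest difference between the two enlargements."""
    def preference(box):
        return abs(enlargement(group_a_box, box) - enlargement(group_b_box, box))
    return max(remaining, key=preference)
```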
Datomic R-tree – regular transaction
● Transaction for adding new entry, calls database function
● Database function
● New entry with new ID
● Add new entry as child to leaf node
Datomic R-tree – split transaction
● New entry
● Remove root
● Create new leaf nodes
● Add new root
Bulk loading
● Issues with single insertion loading of an R-tree
  – Becomes slow with many insertions
  – The resulting tree is not always as efficient as it could be
● Bulk loading builds a tree once from a number of entities
● Two basic approaches: top-down and bottom-up
● Bulk loading does not imply bulk insertion
Bulk loading – sort based loading
● Aims for better R-tree performance
● Bottom-up approach
● Sorts all entities in an order that aims to preserve locality
● Partitions the entities into clusters that are (hopefully) spatially collocated
● Recursively applies partitioning to build up the tree
● “Sort-based Query-adaptive Loading of R-trees”
  – D. Achakeev; B. Seeger; P. Widmayer (2012)
● “Sort-based parallel loading of R-trees”
  – D. Achakeev; M. Seidemann; M. Schmidt; B. Seeger (2012)
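The bottom-up idea can be sketched in a few lines of Python. This version assumes the boxes are already sorted in a locality-preserving order and simply cuts them into fixed-size runs of `max_children`, which is the simplest possible partitioning and not the cost-based one from the cited papers; the node dicts and function names are hypothetical.

```python
def bbox_of(boxes):
    """Smallest (min_x, min_y, max_x, max_y) box covering all boxes."""
    return (
        min(b[0] for b in boxes),
        min(b[1] for b in boxes),
        max(b[2] for b in boxes),
        max(b[3] for b in boxes),
    )


def chunk(items, size):
    """Split items into consecutive runs of at most `size` elements."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def build_bottom_up(sorted_boxes, max_children):
    """Pack already-sorted entry boxes into leaves, then pack each level
    of nodes into parents until a single root remains."""
    if not sorted_boxes:
        raise ValueError("need at least one entry")
    level = [
        {"bbox": bbox_of(group), "children": group, "is_leaf": True}
        for group in chunk(sorted_boxes, max_children)
    ]
    while len(level) > 1:
        level = [
            {"bbox": bbox_of([n["bbox"] for n in group]),
             "children": group, "is_leaf": False}
            for group in chunk(level, max_children)
        ]
    return level[0]
```

Note that fixed-size chunking can leave the last node on a level with fewer than min-children entries; a real loader would need to rebalance that tail.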
Hilbert Curves
● A continuous fractal space-filling curve
● First described by mathematician David Hilbert in 1891
● Useful because it enables mapping from 2D to 1D while preserving some notion of locality
● Other options are: Peano curve, Z-order curve (aka Morton curve)
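As an illustration of the 2D to 1D mapping, the following Python sketch converts a grid cell to its distance along the Hilbert curve using the widely published iterative bit-manipulation algorithm. It is not taken from the talk or the library; the function name and the assumption that coordinates are integers on an n by n grid (n a power of two) are mine.

```python
def hilbert_index(n, x, y):
    """Distance along the Hilbert curve of cell (x, y) on an n x n grid.

    n must be a power of two and 0 <= x, y < n.
    """
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has the right orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s //= 2
    return d
```

Cells that are consecutive along the curve are always neighbours on the grid, which is the locality property that makes the value useful as a sort key. The reverse does not hold: some neighbouring cells end up far apart on the curve.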
Bulk loading – Hilbert sort based
● Better Hilbert partitioning
Bulk loading via Hilbert curves
● Insert all entities into Datomic (or use existing entities)
● Entities include an indexed Hilbert value attribute
● Obtain a seq of the entities using the :avet index with the Hilbert value
● Perform partitioning
Bulk - hilbert-ents
● Takes advantage of the Datomic index API to get direct access to the Hilbert index
Bulk - min-cost-index
● List of options for the next partition point
● Must be at least min-children in the partition
Bulk - cost-partition
Bulk - p-cost-partition
Bulk - dyn-cost-partition
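The partitioning code on these slides is not included in the transcript. As a hedged illustration of what a dynamic-programming cost partition could look like, the Python sketch below splits a sorted sequence into consecutive runs of between `min_children` and `max_children` items so that the summed cost is minimal. The function name only echoes the slide title; the signature, the use of points as entries and bounding-box area as the cost are all assumptions.

```python
def bbox_area(points):
    """Area of the bounding box around a run of (x, y) points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (max(xs) - min(xs)) * (max(ys) - min(ys))


def dyn_cost_partition(entries, min_children, max_children, cost=bbox_area):
    """Split entries into consecutive partitions whose sizes lie in
    [min_children, max_children], minimising the total cost."""
    n = len(entries)
    inf = float("inf")
    best = [inf] * (n + 1)   # best[i]: cheapest way to partition entries[:i]
    cut = [None] * (n + 1)   # cut[i]: start index of the last partition
    best[0] = 0.0
    for end in range(1, n + 1):
        for size in range(min_children, max_children + 1):
            start = end - size
            if start < 0:
                break
            if best[start] == inf:
                continue  # entries[:start] cannot be partitioned legally
            total = best[start] + cost(entries[start:end])
            if total < best[end]:
                best[end] = total
                cut[end] = start
    if best[n] == inf:
        raise ValueError("no partitioning satisfies the size limits")
    parts = []
    end = n
    while end > 0:
        start = cut[end]
        parts.append(entries[start:end])
        end = start
    parts.reverse()
    return parts
```

With two well-separated clusters of three points each and limits of 2 to 4, the cheapest split falls on the gap between the clusters rather than at a fixed size.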
Conclusions
● It works!
(install-single-insertions conn 50000 20 10)
  – "Elapsed time: 119114.342783 msecs"
(install-and-bulk-load conn 50000 20 10)
  – "Elapsed time: 6511.543299 msecs"
(time (naive-intersecting all-entries search-box))
  – "Elapsed time: 870.575802 msecs"
(time (intersecting root search-box))
  – "Elapsed time: 2.927883 msecs"
* Note: these times should be regarded with suspicion since they only use the in-memory database
Future plans
● Retractions and updates
● Bulk insertions
● More search and query support
● Schema for supporting Meridian Shapes and Features
● Investigate other R-trees: R* tree, R+ tree
Questions?

Thank you! Any questions?
James Sofra
@sofra
Other Interesting Resources
● "The R*-tree: an efficient and robust access method for points and rectangles"
● “OMT: Overlap Minimizing Top-down Bulk Loading Algorithm for R-tree”
  – T. Lee; S. Lee (2003)
● “The Priority R-Tree: A Practically Efficient and Worst-Case Optimal R-Tree”
  – L. Arge; M. de Berg; K. Yi (2004)
● “Compact Hilbert Indices”
  – Hamilton. C (2006)
● “R-Trees: Theory and Applications”
  – Manolopoulos. Y; Nanopoulos. A; Papadopoulos. A. N; Theodoridis. Y (2006)

 
Vertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering TipsVertex AI Gemini Prompt Engineering Tips
Vertex AI Gemini Prompt Engineering Tips
 

Datomic R-trees implement spatial indexing with Hilbert curves

  • 2. Summary
      ● Motivations
      ● Datomic overview
      ● Datomic R-tree implementation
      ● Hilbert Curves
      ● Bulk loading (via Hilbert Curves)
      ● Future plans
  • 3. Motivations
      ● I have an interest in geospatial applications
          – e.g. Thunderstorm probability application (THESPA)
      ● Datomic is an interesting database that makes different trade-offs to other databases
          – Wonder how far we can take the ability to describe arbitrary structures in Datomic
  • 4. Why don't we have both?
  • 5. Datomic Overview
      ● Immutable database
      ● Time-based facts (stored as entities)
      ● ACID transactions
      ● Expressive queries using Datalog
      ● Pluggable storage
      ● Flexible enough to act as row, column or graph database
      ● Schema that describes attributes that can be attached to entities
          – Attributes have a type: String, Long, Double, Inst, Ref etc.
      ● Database functions
          – Stored in the database, see the in-transaction value
  • 7. Datomic Motivations
      ● Things that make Datomic appealing for spatial data
          – The time-based nature of Datomic is useful for time series data, which we often have
          – No need to add spatial operations (union, intersection, etc.) to the database; they can be handled by libraries in the peers
          – Spatial indexes can be stored as regular data, which allows a lot of freedom over the choice of index and over handling multiple indexes on subsets of the data in space and time
          – Flexible entity structures are useful because spatial data frequently does not fit nicely in a table
          – Immutability is surprisingly useful in lots of different applications!
  • 8. R-trees
      ● Efficient query of multi-dimensional data
      ● Groups nearby objects
      ● Balanced (all leaf nodes at same level)
      ● Aims for nodes that minimise empty-space coverage and overlap
      ● Designed for storage on disk (as used in databases)
      ● "R-Trees: A Dynamic Index Structure for Spatial Searching"
          – Guttman, A (1984)
  • 9. R-trees - Insertions
      ● Choose a leaf node to insert into
      ● Insert the entry into the leaf node and enlarge the node
      ● If the node has more than the max number of children, split the node and propagate enlargement and splits up the tree
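The insertion steps above can be sketched in memory as follows. This is a minimal illustration in Python rather than the talk's Clojure and Datomic code; the `Node` class and the `choose_leaf` and `insert` helpers are hypothetical names, and choosing the child by least area enlargement follows Guttman's paper, not anything confirmed by the slides. Splitting is only signalled here, not performed.

```python
from dataclasses import dataclass, field

# A bounding box is a tuple (min_x, min_y, max_x, max_y).

@dataclass
class Node:
    bbox: tuple
    is_leaf: bool = True
    # Child Nodes for an inner node, entry bounding boxes for a leaf.
    children: list = field(default_factory=list)

def union(a, b):
    """Smallest box covering both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))

def area(b):
    return (b[2] - b[0]) * (b[3] - b[1])

def enlargement(bbox, entry):
    """Extra area bbox would need in order to cover entry."""
    return area(union(bbox, entry)) - area(bbox)

def choose_leaf(root, entry):
    """Walk down from the root, taking the child that needs the least
    enlargement (ties broken by smaller area). Returns the path root..leaf."""
    path = [root]
    node = root
    while not node.is_leaf:
        node = min(node.children,
                   key=lambda c: (enlargement(c.bbox, entry), area(c.bbox)))
        path.append(node)
    return path

def insert(root, entry, max_children):
    """Add entry to the chosen leaf and enlarge every node on the path.
    Returns the leaf if it now overflows and needs a split, else None."""
    path = choose_leaf(root, entry)
    leaf = path[-1]
    leaf.children.append(entry)
    for node in path:
        node.bbox = union(node.bbox, entry)
    return leaf if len(leaf.children) > max_children else None
```

In the Datomic version described by the talk, the equivalent work happens inside a database function, so the enlargement and any split are part of one transaction.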
  • 10. Datomic R-tree - Schema
      :rtree/root           :db.type/ref
      :rtree/max-children   :db.type/long
      :rtree/min-children   :db.type/long
      :node/children        :db.type/ref
      :node/is-leaf?        :db.type/boolean
      :node/entry           :db.type/ref
      :bbox/min-x           :db.type/double
      :bbox/min-y           :db.type/double
      :bbox/max-x           :db.type/double
      :bbox/max-y           :db.type/double
  • 14. Datomic R-tree - pick-next
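The slide text gives only the name pick-next. Assuming it corresponds to the PickNext step of the quadratic split in Guttman's paper (an assumption, since the transcript does not show the code), the idea can be sketched in Python as below. The function and parameter names are hypothetical.

```python
def union(a, b):
    """Smallest box covering both a and b; boxes are (min_x, min_y, max_x, max_y)."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))

def area(b):
    return (b[2] - b[0]) * (b[3] - b[1])

def pick_next(remaining, group1_bbox, group2_bbox):
    """Return the index of the unassigned entry with the strongest
    preference for one group: the largest |d1 - d2|, where d1 and d2 are
    the area increases needed for each group to cover the entry."""
    def preference(entry):
        d1 = area(union(group1_bbox, entry)) - area(group1_bbox)
        d2 = area(union(group2_bbox, entry)) - area(group2_bbox)
        return abs(d1 - d2)
    return max(range(len(remaining)), key=lambda i: preference(remaining[i]))
```

During a split, the chosen entry is then assigned to whichever group needs the smaller enlargement, and the step repeats until every entry is placed.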
  • 15. Datomic R-tree – regular transaction
      Transaction for adding new entry, calls database function
      Database function
      New entry with new ID
      Add new entry as child to leaf node
  • 16. Datomic R-tree – split transaction
      New entry
      Remove root
      Create new leaf nodes
      Add new root
  • 17. Bulk loading
      ● Issues with single-insertion loading of an R-tree
          – Becomes slow with many insertions
          – The resulting tree is not always as efficient as it could be
      ● Bulk loading builds a tree once from a number of entities
      ● Two basic approaches: top-down and bottom-up
      ● Bulk loading does not imply bulk insertion
  • 18. Bulk loading – sort based loading
      ● Aims for better R-tree performance
      ● Bottom-up approach
      ● Sorts all entities in an order that aims to preserve locality
      ● Partitions the entities into clusters that are (hopefully) spatially collocated
      ● Recursively applies partitioning to build up the tree
      ● “Sort-based Query-adaptive Loading of R-trees”
          – D. Achakeev; B. Seeger; P. Widmayer (2012)
      ● “Sort-based parallel loading of R-trees”
          – D. Achakeev; M. Seidemann; M. Schmidt; B. Seeger (2012)
  • 19. Hilbert Curves
      ● A continuous fractal space-filling curve
      ● First described by mathematician David Hilbert in 1891
      ● Useful because it enables mapping from 2D to 1D while preserving some notion of locality
      ● Other options are: Peano curve, Z-order curve (aka Morton curve)
  • 20. Hilbert Curves
      ● A continuous fractal space-filling curve
      ● First described by mathematician David Hilbert in 1891
      ● Useful because it enables mapping from 2D to 1D while preserving some notion of locality
      ● Other options are: Peano curve, Z-order curve (aka Morton curve)
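To make the 2D-to-1D mapping concrete, here is a minimal Python sketch of the widely used iterative conversion from grid cell to distance along the Hilbert curve. It is not taken from the talk's code: the name `hilbert_index` and the `order` parameter (the grid is `2**order` cells per side) are assumptions for illustration, and real coordinates would first have to be scaled onto that integer grid.

```python
def hilbert_index(order, x, y):
    """Distance along a Hilbert curve covering a 2**order by 2**order grid,
    for the integer cell (x, y) with 0 <= x, y < 2**order."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        # Each quadrant at this scale contributes s*s cells to the distance.
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has the standard orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s >>= 1
    return d
```

Cells that are consecutive along the curve are always neighbours in the grid, which is the locality property that makes sorting by this value a reasonable basis for grouping nearby entries.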
  • 21. Bulk loading – hilbert sort based ● Better Hilbert partitioning
  • 22. Bulk loading via Hilbert curves
      ● Insert all entities into Datomic (or use existing entities)
      ● Entities include an indexed Hilbert value attribute
      ● Obtain a seq of the entities using the :avet index with the Hilbert value
      ● Perform partitioning
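The bottom-up procedure above can be sketched in memory as follows, in Python rather than the talk's Clojure. Two simplifications are assumed: a plain `sorted` call stands in for reading the entities in order from Datomic's :avet index, and each level is cut into fixed-size chunks of `max_children` instead of using the cost-based partitioning the talk describes. The `bulk_load` name and the dict-based node shape are hypothetical.

```python
def union_all(boxes):
    """Smallest box covering every box; boxes are (min_x, min_y, max_x, max_y)."""
    min_xs, min_ys, max_xs, max_ys = zip(*boxes)
    return (min(min_xs), min(min_ys), max(max_xs), max(max_ys))

def bulk_load(entries, max_children):
    """Build a tree bottom-up from (hilbert_value, bbox) pairs.
    Returns the root node as a dict with bbox, is_leaf and children."""
    if not entries:
        raise ValueError("cannot bulk load an empty set of entries")
    # Stand-in for walking the indexed Hilbert attribute in order.
    ordered = sorted(entries, key=lambda e: e[0])
    level = [{"bbox": bbox} for _, bbox in ordered]
    is_leaf = True
    while True:
        parents = []
        # Simplification: fixed-size runs of neighbours along the curve.
        for i in range(0, len(level), max_children):
            chunk = level[i:i + max_children]
            parents.append({
                "bbox": union_all([c["bbox"] for c in chunk]),
                "is_leaf": is_leaf,
                "children": chunk,
            })
        if len(parents) == 1:
            return parents[0]
        level = parents
        is_leaf = False
```

With fixed-size chunks the last node on a level can hold fewer than min-children entries, which is one reason a partitioning step that respects that minimum is preferable.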
  • 23. Bulk - hilbert-ents
      Takes advantage of the Datomic index API to get direct access to the Hilbert index
  • 24. Bulk - min-cost-index
      List of options for the next partition point
      Must be at least min-children in the partition
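One way to read the two annotations above: given the Hilbert-sorted entries, every cut position from min-children up to max-children is a candidate for the end of the next partition, and the cheapest candidate wins. The Python sketch below illustrates that reading. The cost function (covering area per entry) is an assumption chosen for illustration, since the transcript does not show the cost the talk actually uses, and the name `min_cost_index` is borrowed from the slide title only.

```python
def union_all(boxes):
    """Smallest box covering every box; boxes are (min_x, min_y, max_x, max_y)."""
    min_xs, min_ys, max_xs, max_ys = zip(*boxes)
    return (min(min_xs), min(min_ys), max(max_xs), max(max_ys))

def area(b):
    return (b[2] - b[0]) * (b[3] - b[1])

def min_cost_index(boxes, min_children, max_children):
    """Return k so that boxes[:k] is the next partition.
    Candidates are min_children..max_children, capped at len(boxes)."""
    upper = min(max_children, len(boxes))
    if upper <= min_children:
        return upper  # not enough entries left to offer a choice
    def cost(k):
        # Assumed cost: covering area per entry in the partition.
        return area(union_all(boxes[:k])) / k
    return min(range(min_children, upper + 1), key=cost)
```

Calling this repeatedly on the remaining entries yields the partitions for one level of the tree.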
  • 28. Conclusions
      ● It works!
      (install-single-insertions conn 50000 20 10)
          – "Elapsed time: 119114.342783 msecs"
      (install-and-bulk-load conn 50000 20 10)
          – "Elapsed time: 6511.543299 msecs"
      (time (naive-intersecting all-entries search-box))
          – "Elapsed time: 870.575802 msecs"
      (time (intersecting root search-box))
          – "Elapsed time: 2.927883 msecs"
      * Note: these times should be regarded with suspicion since they only use the in-memory database
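The gap between the last two timings comes from pruning: the naive query tests every entry, while the tree query skips any subtree whose bounding box misses the search box. A minimal Python sketch of both follows. The function names echo the Clojure calls above, but the bodies and the dict-based node shape are assumptions for illustration, not the talk's implementation.

```python
def intersects(a, b):
    """True if boxes a and b overlap; boxes are (min_x, min_y, max_x, max_y)."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def naive_intersecting(all_entries, search_box):
    """Test every entry against the search box."""
    return [e for e in all_entries if intersects(e, search_box)]

def intersecting(node, search_box):
    """Descend only into nodes whose bounding box overlaps the search box.
    A leaf's children are entry boxes; an inner node's children are nodes."""
    if not intersects(node["bbox"], search_box):
        return []
    if node["is_leaf"]:
        return [e for e in node["children"] if intersects(e, search_box)]
    found = []
    for child in node["children"]:
        found.extend(intersecting(child, search_box))
    return found
```

Both functions return the same entries; the tree version simply examines far fewer boxes when the search box is small relative to the data.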
  • 29. Future plans
      ● Retractions and updates
      ● Bulk insertions
      ● More search and query support
      ● Schema for supporting Meridian Shapes and Features
      ● Investigate other R-trees: R* tree, R+ tree
  • 30. Questions?
      Thank you! Any questions?
      James Sofra @sofra
  • 31. Other Interesting Resources
      ● “The R*-tree: an efficient and robust access method for points and rectangles”
      ● “OMT: Overlap Minimizing Top-down Bulk Loading Algorithm for R-tree”
          – T. Lee; S. Lee (2003)
      ● “The Priority R-Tree: A Practically Efficient and Worst-Case Optimal R-Tree”
          – L. Arge; M. de Berg; K. Yi (2004)
      ● “Compact Hilbert Indices”
          – Hamilton. C (2006)
      ● “R-Trees: Theory and Applications”
          – Manolopoulos. Y; Nanopoulos. A; Papadopoulos. A. N; Theodoridis. Y (2006)