GeoMesa presentation from the LocationTech Tour - DC - November 14th, 2013. Presented by Anthony Fox (@algoriffic) of CCRi.
GeoMesa is an open source project that adds spatio-temporal indexing, querying, and visualization capabilities to Accumulo. Learn more at http://geomesa.github.io/
4. Why?
● Volume of spatio-temporal data is increasing exponentially
● Traditional multi-dimensional indexing techniques are straining to keep up
5. How?
• Storage - leverage distributed databases like Accumulo.
• Compute - parallelize spatio-temporal queries and analytics using MapReduce.
GeoMesa enables geospatial analytics within the Hadoop ecosystem.
6. What is GeoMesa?
• A flexible spatio-temporal index built on Accumulo.
• An implementation of GeoTools interfaces to make integration seamless.
• A set of GeoServer plugins for OGC compliant access to data.
8. What is Accumulo?
“The Accumulo sorted distributed key/value store is a
robust, high performance data storage and retrieval
system”
http://accumulo.apache.org
• Based on Google BigTable
• Adds cell-level security and a server-side programming model in the form of composable iterators
http://accumulo.apache.org/1.4/user_manual/Accumulo_Design.html
12. How Do We Store Multi-Dimensional Data in a Dictionary?
• Space Filling Curves project multiple dimensions into a single dimension
• Base32 encoding induces an Accumulo friendly lexicographic ordering
• Recursive nesting facilitates storing different resolutions of data
• GeoHashes are common in web services
http://blog.notdot.net/2009/11/Damn-Cool-Algorithms-Spatial-indexing-with-Quadtrees-and-Hilbert-Curves
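The points on this slide can be illustrated with a minimal sketch of the standard GeoHash encoding, written here in plain Python. It is not GeoMesa code; the function name `geohash` and its parameters are chosen for illustration only. It interleaves longitude and latitude bits (a Z-order space filling curve) and Base32-encodes every five bits, so a coarser hash is always a prefix of a finer one.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # GeoHash alphabet (omits a, i, l, o)

def geohash(lat, lon, precision=7):
    """Encode a point as a GeoHash string of `precision` characters."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    bits, bit_count = 0, 0
    use_lon = True  # the encoding starts with a longitude bit
    while len(chars) < precision:
        rng, value = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if value >= mid:
            bits = (bits << 1) | 1  # point is in the upper half
            rng[0] = mid
        else:
            bits = bits << 1        # point is in the lower half
            rng[1] = mid
        use_lon = not use_lon       # alternate dimensions: this is the interleaving
        bit_count += 1
        if bit_count == 5:          # five bits make one Base32 character
            chars.append(BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)

coarse = geohash(57.64911, 10.40744, 5)
fine = geohash(57.64911, 10.40744, 8)
print(coarse, fine)
```

Because each extra character only subdivides the current cell, `fine` starts with `coarse`. Under lexicographic ordering, every point in a cell therefore sorts into one contiguous run of keys, which is the property a sorted key/value store can exploit.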
13. How Does GeoMesa’s Index Work?
Constructs a key beginning with a shard id for horizontal scalability.
Uses Space Filling Curves to encode spatio-temporal data in Accumulo keys.
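As a rough illustration of a key that starts with a shard id and then carries space and time, here is a minimal sketch in plain Python. The layout `shard~geohash~yyyyMMddHH`, the shard count, and the helper names `row_key` and `scan_ranges` are all assumptions made for this example. They are not GeoMesa's actual key schema or API.

```python
import zlib
from datetime import datetime, timezone

NUM_SHARDS = 4  # hypothetical shard count

def row_key(feature_id, geohash, when, num_shards=NUM_SHARDS):
    """Build a sortable row key: shard id, then GeoHash, then hour bucket."""
    # A stable hash of the feature id spreads writes evenly across shards.
    shard = zlib.crc32(feature_id.encode("utf-8")) % num_shards
    return "{:02d}~{}~{}".format(shard, geohash, when.strftime("%Y%m%d%H"))

def scan_ranges(geohash_prefix, num_shards=NUM_SHARDS):
    """Return one (start, end) row range per shard for a GeoHash prefix."""
    ranges = []
    for shard in range(num_shards):
        start = "{:02d}~{}".format(shard, geohash_prefix)
        # "\uffff" sorts after any character used in the keys above.
        ranges.append((start, start + "\uffff"))
    return ranges

key = row_key("event-1", "u4pruyd", datetime(2013, 11, 14, 9, tzinfo=timezone.utc))
matching = [r for r in scan_ranges("u4p") if r[0] <= key < r[1]]
print(key, matching)
```

The trade-off this sketch shows: the shard prefix spreads nearby points across servers, which avoids a single hot tablet, but a spatial query must then issue one range scan per shard.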
Stacks server side iterators to apply (E)CQL standard queries in parallel at scan time.
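The idea of stacking iterators can be sketched conceptually with Python generators. This is not the Accumulo iterator API: the stage functions, the sample keys, and the feature dictionaries are all made up for illustration. Each stage consumes the sorted stream from the stage below it, roughly the way a query such as `BBOX(geom, 10, 57, 11, 58) AND code = '14'` might be split into a spatial stage and an attribute stage, and each shard is scanned independently.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial

def bbox_filter(entries, min_lon, min_lat, max_lon, max_lat):
    """Pass through only entries whose point lies inside the bounding box."""
    for key, feature in entries:
        if min_lon <= feature["lon"] <= max_lon and min_lat <= feature["lat"] <= max_lat:
            yield key, feature

def attribute_filter(entries, name, value):
    """Pass through only entries whose attribute equals the given value."""
    for key, feature in entries:
        if feature.get(name) == value:
            yield key, feature

# Two hypothetical shards; keys and values are illustrative only.
shards = [
    {"00~u4pru~a": {"lon": 10.4, "lat": 57.6, "code": "14"},
     "00~9q8yy~b": {"lon": -122.4, "lat": 37.8, "code": "14"}},
    {"01~u4prv~c": {"lon": 10.5, "lat": 57.7, "code": "01"},
     "01~u4prw~d": {"lon": 10.6, "lat": 57.7, "code": "14"}},
]

# The stack: each stage wraps the stream produced by the previous one.
stages = [
    partial(bbox_filter, min_lon=10.0, min_lat=57.0, max_lon=11.0, max_lat=58.0),
    partial(attribute_filter, name="code", value="14"),
]

def scan_shard(entries):
    """Scan one shard in key order and apply the stacked stages."""
    stream = iter(sorted(entries.items()))
    for stage in stages:
        stream = stage(stream)
    return list(stream)

with ThreadPoolExecutor() as pool:
    results = [kv for part in pool.map(scan_shard, shards) for kv in part]

print([key for key, _ in results])
```

Filtering where the data lives means only matching entries travel back to the client, and each shard does its share of the work at the same time.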
19. How Does GeoMesa Perform?
GDELT - Global Database of Events, Language, and Tone
Leetaru, Kalev and Schrodt, Philip. (2013). GDELT: Global Data on Events, Language, and Tone, 1979-2012. International Studies Association Annual Conference, April 2013. San Diego, CA. See more at: http://gdelt.utdallas.edu/about.html
220 million geocoded events from 1979 to the present.
Exhibits pathologies common in spatio-temporal data sets:
• Hot spots
• Bad geocoding
20. GDELT
GDELT assigns an Event Code to each event.
Codes are based on CAMEO (Conflict and Mediation Event Observations).
There are 20 top level CAMEO codes.
John Beieler developed a visualization of every protest (one of the top level categories) on the planet since 1979.
http://www.foreignpolicy.com/articles/2013/08/22/mapped_what_every_protest_in_the_last_34_years_looks_like
23. Distributed Spatial Computations
● Scalding greatly simplifies Map/Reduce
● AccumuloSource is an implementation of a Scalding source/sink
● GeoMesa allows developers to work with SimpleFeatures in a Map/Reduce job
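To make the Map/Reduce pattern concrete, here is a minimal sketch in plain Python of a density count over features. It does not use Scalding, AccumuloSource, or the GeoTools SimpleFeature type. The dictionaries stand in for features, and `map_feature` and `reduce_counts` are hypothetical names. The map step emits a grid cell for each feature and the reduce step sums the counts per cell.

```python
import math
from collections import defaultdict

def map_feature(feature, cell_degrees=1.0):
    """Map step: emit (grid cell, 1) for one feature."""
    cell = (math.floor(feature["lon"] / cell_degrees),
            math.floor(feature["lat"] / cell_degrees))
    return cell, 1

def reduce_counts(pairs):
    """Reduce step: sum the counts emitted for each grid cell."""
    totals = defaultdict(int)
    for cell, count in pairs:
        totals[cell] += count
    return dict(totals)

# Stand-ins for features read from the store; the values are made up.
features = [
    {"id": "a", "lon": 10.4, "lat": 57.6},
    {"id": "b", "lon": 10.9, "lat": 57.1},
    {"id": "c", "lon": -77.03, "lat": 38.9},
]

density = reduce_counts(map_feature(f) for f in features)
print(density)
```

In a real job the map calls would run across many workers and the framework would group pairs by key before reducing. The same per-cell counts are also a simple way to spot hot spots in a data set.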
25. Roadmap
• Enhance integration with cell level security
• Build statistical index and query optimization
  o Bring Your Own Space Filling Curve
  o “VACUUM ANALYZE”
• Integrate GeoWebCache and Hadoop
• Ease developer on-ramping
• Grow community through LocationTech