Working with OpenStreetMap using Apache Spark & GeoTrellis
What we’ll cover
● OpenStreetMap (OSM) and its data model
● A Missing Maps use case that needed big data tooling to
process OSM History
● OSMesa, what it is, and what it can do
● The future of distributed OSM processing, and what it will
enable
What is OpenStreetMap?
OSM Data Model
The OSM data model consists mainly of 3 elements:
● Nodes - Points
● Ways - LineStrings, Polygons
● Relations - GeometryCollections, Polygon with holes,
MultiPolygons
As well as the tag-based metadata that applies to each
element, and changesets that group edits
OSM Data Model: Relations
OSM Data Model: Changesets
● Edits are grouped into changesets, which have their own
metadata such as user comments (for developers, think
commit messages)
● Adding hashtags to user comments allows downstream
processing to group changes - for example, #HOTLunch
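A minimal sketch of how downstream processing might pull hashtags out of a changeset comment. The helper name and the normalization to lowercase are assumptions for illustration, not something the leaderboard is known to do.

```python
import re
from typing import List

# Hypothetical helper: a hashtag is "#" followed by word characters.
HASHTAG = re.compile(r"#(\w+)")


def extract_hashtags(comment: str) -> List[str]:
    """Return lowercased hashtags, de-duplicated, in order of first appearance."""
    seen: List[str] = []
    for tag in HASHTAG.findall(comment):
        tag = tag.lower()  # assumption: treat #HOTLunch and #hotlunch as one campaign
        if tag not in seen:
            seen.append(tag)
    return seen


print(extract_hashtags("Adding buildings #HOTLunch #missingmaps"))
```

Grouping on the normalized tag is one design choice; keeping the original casing would work equally well if campaigns are case-sensitive.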
Backfilling missing maps
● The Missing Maps leaderboard processes OSM change files to
increment user and campaign statistics
● The statistics were correct from the point the streaming
calculation started, but edits made before then did not count
towards users’ totals.
● So, there was a need to “backfill” the statistics based on
OSM history.
● Through the Red Cross and a grant
from Microsoft Philanthropies, Seth
Fitzsimmons of Pacific Atlas was
hired to solve the backfilling problem.
● Seth was previously involved with
releasing OSM data as a public
dataset on AWS and early work on
distributed processing of OSM data
Reducing the “time to first question”
Source: Seth’s blog post about processing OSM with Athena
Backfill: Athena approach
● Seth first tried to use Athena to calculate the backfill
statistics. This approach didn’t work.
● The complexity of the queries made the jobs blow up or
never finish
● Also, Athena's geospatial support hadn't been announced
yet, and once it was, it still didn’t work with the complicated
set of queries
● Seth started showing interest in a set of tools that Azavea
was building at the time that used Apache Spark and
GeoTrellis to calculate similar statistics
● He ported his complicated SQL queries for Athena to
SparkSQL and started contributing to that effort
Backfill: New approach
Leaderboard 2.0 blog post
What is OSMesa?
● It's a loose term for a workflow for OSM data processing
● Still being defined - useful, but amorphous
● More a group of tools and techniques than, say, a library
● Uses Spark, GeoTrellis and AWS to process OSM data into
geometries, vector tiles, and statistics
Apache Spark
● A distributed computation engine.
● An API that lets you work with distributed data as a
collection, including a DataFrames API
● Written in Scala, with language bindings for use with Java,
Python, and R.
● Spark DataFrames provide an API that is similar to R or
Pandas DataFrames; allows working with data in a SQL-like
manner
● Very powerful, and can express complicated queries
● (partially) Abstracts away the complexities of distributed
computing
GeoTrellis
● Core geospatial library in Scala
● Enables Spark with geospatial types and operations
● Generally focused on Raster data, wraps JTS for vector
support
● Vector Tile module for reading and writing vector tiles
OSMesa workflow
[Workflow diagram; labels: ORC files, AWS S3, AWS EMR Cluster, ORC, Statistics, Vector Tiles]
● With OSMesa, we can create full historical geometries.
● To do this, we needed to create a concept of “minor
versions” of geometries
Creating features from History
[Diagram: way v1 (highway=unclassified) references nodes that are all at v1; way v2 (highway=primary) references two nodes at v1 and two at v2. Because two nodes changed to v2 while the way was still at v1, an intermediate way v1.1 (highway=unclassified, two nodes at v1 and two at v2) is introduced between way v1 and way v2. This is the minor version change.]
● With minor versions, we can bake new ORC files that
contain geometries of every element in OSM history, with
ways/relations representing every edit to the element as well
as elements that they contain
● Then, we compute statistics per changeset based on
geometries, and roll up the statistics per user and hashtag
Full historical geometries
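The roll-up of per-changeset statistics by user and by hashtag can be sketched in plain Python. The field names and numbers below are made up for illustration; the real pipeline runs on Spark and its schema is not shown in these slides.

```python
from collections import defaultdict

# Hypothetical per-changeset statistics (field names are assumptions).
changeset_stats = [
    {"changeset": 1, "user": "alice", "hashtags": ["hotlunch"],
     "buildings_added": 12, "road_km_added": 0.0},
    {"changeset": 2, "user": "alice", "hashtags": ["hotlunch", "missingmaps"],
     "buildings_added": 3, "road_km_added": 1.5},
    {"changeset": 3, "user": "bob", "hashtags": ["missingmaps"],
     "buildings_added": 5, "road_km_added": 2.0},
]


def roll_up(stats, key):
    """Sum the numeric statistics per value of `key` ("user" or "hashtags")."""
    totals = defaultdict(lambda: {"buildings_added": 0, "road_km_added": 0.0})
    for row in stats:
        value = row[key]
        # A changeset has one user but may carry several hashtags.
        groups = value if isinstance(value, list) else [value]
        for group in groups:
            totals[group]["buildings_added"] += row["buildings_added"]
            totals[group]["road_km_added"] += row["road_km_added"]
    return dict(totals)


by_user = roll_up(changeset_stats, "user")
by_hashtag = roll_up(changeset_stats, "hashtags")
print(by_user)
print(by_hashtag)
```

Note that a changeset with two hashtags counts towards both campaigns, so hashtag totals can add up to more than the user totals.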
● Processing of full history into features in under 40 minutes
(cluster of 255 m3.2xlarge nodes)
● This is not a small cluster (≈$65/hour). YMMV with smaller
clusters.
● We are building update mechanisms to avoid refreshing the
entire dataset
Processing OSM data at scale
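A rough back-of-the-envelope on those numbers, taking the quoted ≈$65/hour and treating 40 minutes as an upper bound. Billing granularity and EMR surcharges are ignored, so this is only an estimate.

```python
nodes = 255                    # m3.2xlarge nodes, from the slide
cluster_cost_per_hour = 65.0   # USD, approximate, from the slide
run_minutes = 40               # "under 40 minutes", used as an upper bound

cost_per_run = cluster_cost_per_hour * run_minutes / 60
cost_per_node_hour = cluster_cost_per_hour / nodes
node_hours = nodes * run_minutes / 60

print(f"cost per full-history run: about ${cost_per_run:.2f}")
print(f"implied cost per node-hour: about ${cost_per_node_hour:.3f}")
print(f"node-hours per run: {node_hours:.0f}")
```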
Some data created by OSMesa...
Viewing time slices of Rhode Island OSM
Historical edits for several hashtag campaigns
Global friction surface for cost distance calculations using elevation (SRTM) and OSM roads + water bodies
● Building matching between OSM and other vector datasets
● Generating vector tiles for URCHN containing a subset of
historical data to front-end analytics
OSMesa: Other current uses
This is just the beginning
The Future: Validation workflows, Reputation scores
● Better validation workflows are a big question in the OSM
community right now (according to SOTM US 2017)
● HOT Tasking Manager does some; we can do better
● One way to improve validation workflows is to suggest that
validation be done by veteran mappers, and that the work of
more junior mappers be prioritized for validation (a
“reputation score”)
● Development Seed, who contribute to and use OSMesa work,
have great ideas in this space.
The Future: Data Science notebooks, production workflows
● We are aiming to create a Python notebook environment for
doing data science on OSM, in combination with raster data
● By building on Spark and projects like GeoMesa’s
“JTSFrames”, RasterFrames, and GeoTrellis, we’re creating
a platform that works both for data scientists poking around
in a Jupyter notebook and production systems.
The Future: Machine Learning pre- and post-processing
● Pre-processing geospatial imagery and OSM into training
chips - a distributed label-maker
● Managing data into and out of Raster Vision
● Post-processing by cleaning the model output, matching to
OSM or other vector data to remove duplicates, conflation
workflows
● Matching OSM to imagery dates: e.g. pre- and post-disaster.
Join in the fun
● There are a lot of interesting development challenges that
need to be met in the OSM world
● OSM has many different voices in the room, but they all
have one goal: building a better map
● Join the effort to build a better map
If you could ask OpenStreetMap any
question, at any scale, what would you ask it?
THANKS!
Rob Emanuele, Azavea
@lossyrob (Twitter, GitHub)
www.azavea.com
Seth Fitzsimmons, Pacific Atlas
@mojodna (Twitter, GitHub)
www.pacatlas.com
github.com/azavea/osmesa
OSM Data Model: Nodes
● Single location; only OSM element with geospatial data
● Can represent points of interest, or be solely for inclusion in
ways
● Represents a Point geometry
OSM Data Model: Ways
● References a sequence of ordered nodes
● Represents a LineString geometry
● Closed ways can represent Polygon geometries
OSM Data Model: Relations
● Group of nodes, ways, and other relations
● Used for representing a Polygon with holes,
MultiPolygons, and more generally GeometryCollections
OSM Data Model: Tags
● Each Node, Way and Relation can have a sequence of
tags, which are string-based keys and values. These
describe the role of each element on the map, e.g.
○ highway=residential
○ landuse=grass
○ amenity=fast_food
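The element types above can be sketched as plain Python dataclasses. This is an illustrative model only, not an OSMesa or GeoTrellis API. Relation member roles are a standard part of OSM; the class and function names are assumptions.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Node:
    id: int
    lon: float
    lat: float
    tags: Dict[str, str] = field(default_factory=dict)


@dataclass
class Way:
    id: int
    node_ids: List[int]  # ordered references to nodes
    tags: Dict[str, str] = field(default_factory=dict)

    def is_closed(self) -> bool:
        # A ring needs at least three distinct nodes plus the repeated first node.
        return len(self.node_ids) >= 4 and self.node_ids[0] == self.node_ids[-1]


@dataclass
class Member:
    type: str  # "node", "way" or "relation"
    ref: int
    role: str = ""  # e.g. "outer" / "inner" for multipolygons


@dataclass
class Relation:
    id: int
    members: List[Member]
    tags: Dict[str, str] = field(default_factory=dict)


def way_coordinates(way: Way, nodes: Dict[int, Node]) -> List[Tuple[float, float]]:
    """Resolve a way's node references into (lon, lat) pairs."""
    return [(nodes[i].lon, nodes[i].lat) for i in way.node_ids]


nodes = {
    1: Node(1, 0.0, 0.0),
    2: Node(2, 1.0, 0.0),
    3: Node(3, 1.0, 1.0),
}
lawn = Way(10, [1, 2, 3, 1], {"landuse": "grass"})
print(lawn.is_closed(), way_coordinates(lawn, nodes))
```

Whether a closed way is really a Polygon also depends on its tags, which this sketch does not attempt to decide.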
Source: Dongpo Deng, https://www.slideshare.net/dongpo/the-one-and-many-maps-participatory-and-temporal-diversities-in-openstreetmap
https://planet.openstreetmap.org/
Ways to work with OSM snapshots
● Import OSM data into PostGIS
○ osm2pgsql
○ imposm3
● Render into raster tiles or vector tiles
○ Mapnik
○ Tegola
● Utilize for routing software
○ pgRouting
Ways to work with OSM history
● Clip it using osmium, and import a subset into PostGIS
● After that … not a lot of mature tooling available
Why is OSM history useful
● Calculating user history statistics
● Calculating campaign history statistics
● Calculating complete answers to the question, “what has
changed?”
● Taking a snapshot of OSM at any point in history
● Analytics for research
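Taking a snapshot at a point in history amounts to keeping, for every element, its latest version at or before that time and dropping elements whose latest version is a deletion. A minimal in-memory sketch follows; the record layout is an assumption, and timestamps are plain integers for brevity.

```python
from typing import Dict, Iterable, NamedTuple


class ElementVersion(NamedTuple):
    id: int
    version: int
    timestamp: int   # assumption: seconds since some epoch
    visible: bool    # False when this version deletes the element
    tags: dict


def snapshot_at(history: Iterable[ElementVersion], when: int) -> Dict[int, ElementVersion]:
    """Return the live elements as of `when`, keyed by element id."""
    latest: Dict[int, ElementVersion] = {}
    for v in history:
        if v.timestamp <= when:
            current = latest.get(v.id)
            if current is None or v.version > current.version:
                latest[v.id] = v
    # Elements whose most recent version is a deletion are not in the snapshot.
    return {i: v for i, v in latest.items() if v.visible}


history = [
    ElementVersion(1, 1, 100, True, {"highway": "unclassified"}),
    ElementVersion(1, 2, 200, True, {"highway": "primary"}),
    ElementVersion(1, 3, 300, False, {}),
    ElementVersion(2, 1, 250, True, {"amenity": "fast_food"}),
]

print(snapshot_at(history, 260))
```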
Why ORC?
● On-demand querying + predicate push-down is possible if
OSM data is in a format that is well understood by the
Hadoop ecosystem
● Bespoke formats have their place, especially when size or
other considerations are all-consuming, but it's really
frustrating to see people continually implementing OSM PBF
parsers to be slightly faster when those parsers are typically
single-use (for a specific application). I wanted to sidestep
the whole process and use a well-known, well-supported
The Approach: Features from OSM data
● Join element data to the other elements that contain them;
for example, join each node to the way(s) it belongs to.
● Assign a minor version to ways and relations modified
because the underlying elements change; e.g. a minor
version increments for a way if someone moves the nodes
belonging to it.
● Create Points, Lines, Polygons, and MultiPolygons for each
major and minor version of the element.
ProcessOSM.scala on GitHub
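The minor-version idea can be sketched as follows. This is a simplified, in-memory illustration and not the logic in ProcessOSM.scala: every distinct timestamp at which a member node changes, while a given major version of the way is current, bumps the minor version. All names are hypothetical.

```python
from typing import Dict, List, NamedTuple, Tuple


class WayVersion(NamedTuple):
    version: int
    timestamp: int
    node_ids: Tuple[int, ...]


def minor_versions(
    way_versions: List[WayVersion],
    node_edits: Dict[int, List[int]],
) -> List[Tuple[int, int, int]]:
    """Return (major, minor, timestamp) for every state of the way.

    node_edits maps a node id to the timestamps at which that node changed.
    """
    result = []
    ordered = sorted(way_versions, key=lambda w: w.timestamp)
    for i, way in enumerate(ordered):
        # This major version is current until the next one appears (if any).
        end = ordered[i + 1].timestamp if i + 1 < len(ordered) else None
        result.append((way.version, 0, way.timestamp))
        changes = sorted({
            t
            for node_id in way.node_ids
            for t in node_edits.get(node_id, [])
            if t > way.timestamp and (end is None or t < end)
        })
        for minor, t in enumerate(changes, start=1):
            result.append((way.version, minor, t))
    return result


# Way v1 at t=100, two of its nodes moved at t=150, way v2 (retagged) at t=200.
way_history = [
    WayVersion(1, 100, (1, 2, 3, 4)),
    WayVersion(2, 200, (1, 2, 3, 4)),
]
node_history = {1: [90], 2: [90], 3: [90, 150], 4: [90, 150]}

print(minor_versions(way_history, node_history))
```

Here the two node moves share a timestamp, so they produce a single way v1.1 between v1 and v2.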
The Approach: Vector Tile Generation
Analytic Vector Tiles
● The name we’ve been using for Vector Tiles that contain
information for analysis not (necessarily) for display
● OSMesa/VectorPipe can create sets of Analytic Vector Tiles
from arbitrary subsets of OSM History and publish them to
AWS S3
● Think custom Mapbox QA Tiles, containing relations and
historical elements
● We are creating streaming update workflows to keep
Analytic Vector Tile sets up-to-the-minute (almost).
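For orientation, a sketch of how a longitude/latitude maps to a tile address, assuming the tile sets follow the common z/x/y Web Mercator grid (an assumption; the slides do not spell out the tiling scheme). The function name is hypothetical.

```python
import math
from typing import Tuple


def tile_for(lon: float, lat: float, zoom: int) -> Tuple[int, int, int]:
    """Return (z, x, y) of the Web Mercator tile containing the point."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    lat_rad = math.radians(lat)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so points on the far edge still fall inside the grid.
    x = min(max(x, 0), n - 1)
    y = min(max(y, 0), n - 1)
    return zoom, x, y


print(tile_for(0.0, 0.0, 1))
print(tile_for(-71.4, 41.8, 10))  # roughly Providence, Rhode Island
```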
Other work in this space
● Mapbox’s Jennings Anderson gave a talk at SOTM and
wrote a blog post around quarterly QA tiles
● Uses a work-in-progress project called osm-wayback to
create the historical QA tiles. Goal of project is “...to create
historic geometries for each intermediate version of an OSM
feature.”
● RocksDB on the backend, which creates a ≈ 600GB index
● We have been collaborating and are looking to collaborate
further; the work is awesome
Animation of Rhode Island OSM edits over time
Global friction surface for cost distance calculations using elevation (SRTM) and OSM roads + water bodies
How to get started with OSMesa
● GitHub
● Gitter
● Docs are a TODO
An Aside - “Push vs Pull” models for AI tooling for OSM (and in general)

Data Cloud, More than a CDP by Matt Robison
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
Web Form Automation for Bonterra Impact Management (fka Social Solutions Apri...
 
Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024Manulife - Insurer Transformation Award 2024
Manulife - Insurer Transformation Award 2024
 
Real Time Object Detection Using Open CV
Real Time Object Detection Using Open CVReal Time Object Detection Using Open CV
Real Time Object Detection Using Open CV
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemkeProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
ProductAnonymous-April2024-WinProductDiscovery-MelissaKlemke
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
DBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor PresentationDBX First Quarter 2024 Investor Presentation
DBX First Quarter 2024 Investor Presentation
 
AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024AXA XL - Insurer Innovation Award Americas 2024
AXA XL - Insurer Innovation Award Americas 2024
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)Powerful Google developer tools for immediate impact! (2023-24 C)
Powerful Google developer tools for immediate impact! (2023-24 C)
 
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu SubbuApidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
Apidays Singapore 2024 - Modernizing Securities Finance by Madhu Subbu
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 

Working with OpenStreetMap using Apache Spark and GeoTrellis

  • 2. What we’ll cover ● OpenStreetMap (OSM) and its data model ● A Missing Maps use case that needed big data tooling to process OSM History ● OSMesa, what it is, and what it can do ● The future of distributed OSM processing, and what it will enable
  • 5. OSM Data Model The OSM data model consists mainly of 3 elements: ● Nodes - Points ● Ways - LineStrings, Polygons ● Relations - GeometryCollections, Polygons with holes, MultiPolygons As well as the tag-based metadata that applies to each element, and changesets grouping edits
  • 6. OSM Data Model: Relations
  • 7. OSM Data Model: Changesets ● Edits are grouped into changesets, which have their own metadata such as user comments (for developers, think commit messages) ● Adding hashtags to user comments allows downstream processing to group changes - for example, #HOTLunch
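As a rough illustration of grouping changes by hashtag, the sketch below pulls hashtags out of changeset comments and groups changeset ids under them. The regular expression, the function names, and the sample comments are assumptions made for this example; they are not taken from the Missing Maps or OSMesa code.

```python
import re
from collections import defaultdict

# Assumed pattern: '#' followed by word characters or hyphens.
HASHTAG_RE = re.compile(r"#([\w-]+)")


def extract_hashtags(comment):
    """Return the lowercased hashtags found in a changeset comment."""
    return [tag.lower() for tag in HASHTAG_RE.findall(comment or "")]


def group_by_hashtag(changesets):
    """Map each hashtag to the ids of the changesets that mention it.

    `changesets` is an iterable of (changeset_id, comment) pairs.
    """
    groups = defaultdict(list)
    for changeset_id, comment in changesets:
        # set() so a hashtag repeated in one comment counts once
        for tag in set(extract_hashtags(comment)):
            groups[tag].append(changeset_id)
    return dict(groups)


# Made-up sample data for illustration only.
changesets = [
    (1, "Added buildings #HOTLunch #missingmaps"),
    (2, "Fixed road alignment"),
    (3, "More buildings #hotlunch"),
]
groups = group_by_hashtag(changesets)
print(groups)
```

Lowercasing the tags is a design choice of this sketch so that #HOTLunch and #hotlunch land in the same group.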
  • 12. Backfilling missing maps ● The Missing Maps leaderboard processes OSM change files to increment user and campaign statistics ● The statistics were correct from the point when the streaming calculation started, but edits made before that point did not count towards users’ totals. ● So, there was a need to “backfill” the statistics based on OSM history.
  • 13. ● Through the Red Cross and a grant from Microsoft Philanthropies, Seth Fitzsimmons of Pacific Atlas was hired to solve the backfilling problem. ● Seth was previously involved with releasing OSM data as a public dataset on AWS and early work on distributed processing of OSM data
  • 15. Reducing the “time to first question”
  • 17. Source: Seth’s blog post about processing OSM with Athena
  • 18. Backfill: Athena approach ● Seth first tried to use Athena to calculate the backfill statistics. This approach didn’t work ● The complexity of the queries made the jobs blow up or never finish ● Also, Athena's geospatial support hadn't been announced yet, and once it was, it still didn’t work with the complicated set of queries
  • 19. Backfill: New approach ● Seth became interested in a set of tools that Azavea was building at the time, which used Apache Spark and GeoTrellis to calculate similar statistics ● He ported his complicated SQL queries for Athena to SparkSQL and started contributing to that effort
  • 21. What is OSMesa? ● It's a loose term for a workflow for OSM data processing ● Still being defined - useful, but amorphous ● More a group of tools and techniques than, say, a library ● Uses Spark, GeoTrellis and AWS to process OSM data into geometries, vector tiles, and statistics
  • 22. Apache Spark ● A distributed computation engine. ● An API that lets you work with distributed data as a collection, including a DataFrames API ● Written in Scala, with language bindings for use with Java, Python, and R.
  • 23. ● Spark DataFrames provide an API that is similar to R or Pandas DataFrames; allows working with data in a SQL-like manner ● Very powerful, and can express complicated queries ● (partially) Abstracts away the complexities of distributed computing
  • 24. GeoTrellis ● Core geospatial library in Scala ● Enables Spark with geospatial types and operations ● Generally focused on raster data, wraps JTS for vector support ● Vector Tile module for reading and writing vector tiles
  • 25. OSMesa workflow (diagram; labels: AWS EMR Cluster, AWS S3, ORC, Statistics, Vector Tiles, ORC files)
  • 26. Creating features from History ● With OSMesa, we can create full historical geometries. ● To do this, we needed to create a concept of “minor versions” of geometries
  • 27. (Diagram) Way v1, highway=unclassified: nodes v1, v1, v1, v1. Two of the nodes change (nodes v1, v1, v2, v2) while the way stays at v1, highway=unclassified. Way v2, highway=primary: nodes v1, v1, v2, v2.
  • 28. (Diagram) Way v1, highway=unclassified: nodes v1, v1, v1, v1. Way v1.1, highway=unclassified: nodes v1, v1, v2, v2 (minor version change). Way v2, highway=primary: nodes v1, v1, v2, v2.
  • 29. Full historical geometries ● With minor versions, we can bake new ORC files that contain geometries of every element in OSM history, with ways/relations representing every edit to the element as well as to the elements that they contain ● Then, we compute statistics per changeset based on geometries, and roll up the statistics per user and hashtag
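The roll-up step can be pictured with a small, single-machine sketch. It assumes per-changeset statistics have already been computed; the record layout (`user`, `hashtags`, `stats`) and the statistic names are hypothetical, and the real workflow runs on Spark rather than plain Python.

```python
from collections import Counter, defaultdict


def roll_up(changeset_stats):
    """Sum per-changeset statistics into per-user and per-hashtag totals."""
    by_user = defaultdict(Counter)
    by_hashtag = defaultdict(Counter)
    for cs in changeset_stats:
        # Counter.update adds the values key by key
        by_user[cs["user"]].update(cs["stats"])
        for tag in cs["hashtags"]:
            by_hashtag[tag].update(cs["stats"])
    return by_user, by_hashtag


# Made-up per-changeset statistics for illustration only.
changeset_stats = [
    {"user": "alice", "hashtags": ["hotlunch"],
     "stats": {"buildings_added": 3, "road_m_added": 120}},
    {"user": "bob", "hashtags": ["hotlunch", "missingmaps"],
     "stats": {"buildings_added": 1}},
    {"user": "alice", "hashtags": [],
     "stats": {"road_m_added": 80}},
]

by_user, by_hashtag = roll_up(changeset_stats)
print(dict(by_user["alice"]))
print(dict(by_hashtag["hotlunch"]))
```

Note that a changeset carrying two hashtags contributes its statistics to both campaigns, which is one possible convention, not necessarily the one the leaderboard uses.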
  • 30. Processing OSM data at scale ● Processing of full history into features in under 40 minutes (cluster of 255 m3.2xlarge nodes) ● This is not a small cluster (≈$65/hour). YMMV with smaller clusters. ● We are building update mechanisms to avoid refreshing the entire dataset
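To put the cluster figures in perspective, here is a back-of-the-envelope calculation using only the numbers on the slide (≈$65/hour, 255 nodes, under 40 minutes). It assumes pro rata billing and ignores cluster startup time, storage, and data transfer, so treat the result as a rough upper bound for one run rather than an actual bill.

```python
def run_cost_usd(hourly_rate_usd, minutes):
    """Cost of keeping the cluster up for `minutes`, assuming pro rata billing."""
    return hourly_rate_usd * minutes / 60.0


CLUSTER_RATE_USD_PER_HOUR = 65.0  # approximate figure from the slide
NODES = 255                       # m3.2xlarge nodes, from the slide
RUN_MINUTES = 40                  # "under 40 minutes", so an upper bound

full_run = run_cost_usd(CLUSTER_RATE_USD_PER_HOUR, RUN_MINUTES)
per_node_hour = CLUSTER_RATE_USD_PER_HOUR / NODES

print(f"upper bound per full-history run: ${full_run:.2f}")
print(f"implied rate per node-hour: ${per_node_hour:.3f}")
```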
  • 31. Some data created by OSMesa...
  • 32. Viewing time slices of Rhode Island OSM
  • 33. Historical edits for several hashtag campaigns
  • 34. Global friction surface for cost distance calculations using elevation (SRTM) and OSM roads + water bodies
  • 35. OSMesa: Other current uses ● Building matching between OSM and other vector datasets ● Generating vector tiles for URCHN containing a subset of historical data to front-end analytics
  • 36. This is just the beginning
  • 37. The Future: Validation workflows, Reputation scores ● Better validation workflows are a big question in the OSM community right now (according to SOTM US 2017) ● HOT Tasking Manager does some; we can do better ● One way to improve validation workflows is to suggest that validation be done by veteran mappers, and that the work of more junior mappers be suggested for validation (“reputation score”) ● Development Seed, who contribute to and use OSMesa work, have great ideas in this space.
  • 38. The Future: Data Science notebooks, production workflows ● We are aiming to create a Python notebook environment for doing data science on OSM, in combination with raster data ● By building on Spark and projects like GeoMesa’s “JTSFrames”, RasterFrames, and GeoTrellis, we’re creating a platform that works both for data scientists poking around in a Jupyter notebook and for production systems.
  • 39. The Future: Machine Learning pre- and post- processing ● Pre-processing geospatial imagery and OSM into training chips - a distributed label-maker ● Managing data into and out of Raster Vision ● Post-processing by cleaning the model output, matching to OSM or other vector data to remove duplicates, conflation workflows ● Matching OSM to imagery dates: e.g. pre- and post- disaster.
  • 40. Join in the fun ● There are a lot of interesting development challenges that need to be met in the OSM world ● OSM has many different voices in the room, but they all have one goal: building a better map ● Join the effort to build a better map
  • 41. If you could ask OpenStreetMap any question, at any scale, what would you ask it?
  • 42. THANKS! Rob Emanuele, Azavea @lossyrob (Twitter, GitHub) www.azavea.com Seth Fitzsimmons, Pacific Atlas @mojodna (Twitter, GitHub) www.pacatlas.com github.com/azavea/osmesa
  • 43. OSM Data Model: Nodes ● Single location; only OSM element with geospatial data ● Can represent points of interest, or be solely for inclusion in ways ● Represents a Point geometry
  • 44. OSM Data Model: Ways ● References a sequence of ordered nodes ● Represents a LineString geometry ● Closed ways can represent Polygon geometries
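A minimal sketch of how a way's ordered node references map to a geometry type, following the slide above. It is a deliberate simplification: it looks only at the node list, whereas real tools generally also consult tags to decide whether a closed way is an area. The function name and thresholds are choices made for this example.

```python
def way_geometry_type(node_refs):
    """Guess the geometry type of a way from its ordered node references.

    Returns "Polygon" for a closed ring, "LineString" for an open way,
    and None when there are too few nodes to form a geometry.
    """
    if len(node_refs) < 2:
        return None
    # A closed ring needs at least three distinct points plus the closing node.
    if len(node_refs) >= 4 and node_refs[0] == node_refs[-1]:
        return "Polygon"
    return "LineString"


print(way_geometry_type([10, 11, 12]))          # open way
print(way_geometry_type([10, 11, 12, 13, 10]))  # closed way
```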
  • 45. OSM Data Model: Relations ● Group of nodes, ways, and other relations ● Used for representing a Polygon with holes, MultiPolygons, and more generally GeometryCollections
  • 46. OSM Data Model: Tags ● Each Node, Way and Relation can have a sequence of tags, which are string-based keys and values. This describes the role of each element on the map, e.g. ○ highway=residential ○ landuse=grass ○ amenity=fast_food
  • 47. Source: Dongpo Deng, https://www.slideshare.net/dongpo/the-one-and-many-maps-participatory-and-temporal-diversities-in-openstreetmap
  • 49. Ways to work with OSM snapshots ● Import OSM data into PostGIS ○ osm2pgsql ○ imposm3 ● Render into raster tiles or vector tiles ○ Mapnik ○ Tegola ● Utilize for routing software ○ pgRouting
  • 50. Ways to work with OSM history ● Clip it using osmium, and import a subset into PostGIS ● After that … not a lot of mature tooling available
  • 51. Why is OSM history useful ● Calculating user history statistics ● Calculating campaign history statistics ● Calculating complete answers to the question, “what has changed?” ● Taking a snapshot of OSM at any point in history ● Analytics for research
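Taking a snapshot at a point in history boils down to picking, for each element, the latest version that existed at that time and dropping elements that were deleted by then. The sketch below shows this for a handful of made-up node versions; the dictionary layout is an assumption for illustration, with `visible` standing in for OSM's deleted/not-deleted flag.

```python
from datetime import datetime


def snapshot(history, at):
    """Return {element_id: version_record} as the data looked at time `at`."""
    latest = {}
    for el in history:
        if el["timestamp"] <= at:
            current = latest.get(el["id"])
            if current is None or el["version"] > current["version"]:
                latest[el["id"]] = el
    # Elements whose latest version is a deletion are not part of the snapshot.
    return {i: el for i, el in latest.items() if el["visible"]}


# Made-up node history for illustration only.
history = [
    {"id": 1, "version": 1, "timestamp": datetime(2015, 1, 1), "visible": True},
    {"id": 1, "version": 2, "timestamp": datetime(2016, 6, 1), "visible": True},
    {"id": 1, "version": 3, "timestamp": datetime(2017, 3, 1), "visible": False},
    {"id": 2, "version": 1, "timestamp": datetime(2016, 1, 1), "visible": True},
]

snap_2016 = snapshot(history, datetime(2016, 12, 1))
snap_2018 = snapshot(history, datetime(2018, 1, 1))
print(sorted(snap_2016), sorted(snap_2018))
```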
  • 52. Why ORC? ● On-demand querying + predicate push-down is possible if OSM data is in a format that is well understood by the Hadoop ecosystem ● Bespoke formats have their place, especially when size or other considerations are all-consuming, but it's really frustrating to see people continually implementing OSM PBF parsers to be slightly faster when those parsers are typically single-use (for a specific application). I wanted to sidestep the whole process and use a well-known, well-supported
  • 53. The Approach: Features from OSM data ● Join element data to the other elements that contain them; for example, join each node to the way(s) it belongs to. ● Assign a minor version to ways and relations that are modified because their underlying elements change; e.g. a minor version increments for a way if someone moves the nodes belonging to it. ● Create Points, Lines, Polygons, and MultiPolygons for each major and minor version of the element. (ProcessOSM.scala on GitHub)
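The minor-version idea can be sketched on a single way as follows. This is a simplified, single-machine illustration and not the logic in ProcessOSM.scala: timestamps are plain integers, node changes are grouped by timestamp (an assumption; grouping by changeset would be another option), and all names are hypothetical.

```python
def minor_versions(way_versions, node_versions):
    """Assign (major, minor) versions to one way.

    way_versions:  list of (version, timestamp, node_refs), sorted by version.
    node_versions: list of (node_id, version, timestamp) for all nodes.
    Returns a list of (major, minor, timestamp) tuples in time order.
    """
    result = []
    for i, (major, start, refs) in enumerate(way_versions):
        # This major version is current until the next one begins.
        has_next = i + 1 < len(way_versions)
        end = way_versions[i + 1][1] if has_next else float("inf")
        result.append((major, 0, start))
        # Each distinct time at which a referenced node changed, strictly
        # inside this major version's lifetime, yields one minor version.
        change_times = sorted({
            ts for node_id, _version, ts in node_versions
            if node_id in refs and start < ts < end
        })
        for minor, ts in enumerate(change_times, start=1):
            result.append((major, minor, ts))
    return result


# Mirrors the diagram: two nodes change between way v1 and way v2.
way_versions = [(1, 10, [1, 2, 3, 4]), (2, 30, [1, 2, 3, 4])]
node_versions = [
    (1, 1, 5), (2, 1, 5), (3, 1, 5), (4, 1, 5),
    (3, 2, 20), (4, 2, 20),
]
versions = minor_versions(way_versions, node_versions)
print(versions)
```

With this input the way gets versions 1.0, 1.1, and 2.0; node changes that share a timestamp with a new major version of the way are treated as part of that major version.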
  • 54. The Approach: Vector Tile Generation
  • 55. Analytic Vector Tiles ● The name we’ve been using for Vector Tiles that contain information for analysis not (necessarily) for display ● OSMesa/VectorPipe can create sets of Analytic Vector Tiles from arbitrary subsets of OSM History and publish them to AWS S3 ● Think custom Mapbox QA Tiles, containing relations and historical elements ● We are creating streaming update workflows to keep Analytic Vector Tile sets up-to-the-minute (almost).
  • 56. Other work in this space ● Mapbox’s Jennings Anderson gave a talk at SOTM and wrote a blog post around quarterly QA tiles ● Uses a work-in-progress project called osm-wayback to create the historical QA tiles. Goal of project is “...to create historic geometries for each intermediate version of an OSM feature.” ● RocksDB on the backend, which creates a ≈ 600GB index ● We have been collaborating and are looking to collaborate further; the work is awesome
  • 57. Animation of Rhode Island OSM edits over time
  • 58. Global friction surface for cost distance calculations using elevation (SRTM) and OSM roads + water bodies
  • 60. How to get started with OSMesa ● GitHub ● Gitter ● Docs are a TODO
  • 61. An Aside - “Push vs Pull” models for AI tooling for OSM (and in general)