SlideShare una empresa de Scribd logo
1 de 36
Descargar para leer sin conexión
Large Scale Geo Processing
on Hadoop
Christoph Koerner
Slides available
About me
● Data Scientist at T-Mobile Austria
● Visual Computing at Vienna University of Technology
● Author of Data Visualizations with D3 and AngularJS
● Author of Learning Responsive Data Visualization
● Organizer of Vienna Kaggle Meetup
● LinkedIn: linkedin.com/in/christophkoerner
● Twitter: @ChrisiKrnr
● Google+: +ChrisiHififm
● Github: github.com/chaosmail
Overview
1. Introduction
2. Pre-processing
3. Large scale geo processing on Hadoop
4. Some use-cases
Introduction
What is Geo Processing?
● Operations to manipulate spatial data
● Operations include geographic feature
overlay, feature selection and analysis,
topology processing, raster processing, and
data conversion
Source: Wikipedia
Spatial Data
● Contains data for a spatial reference
● Mostly 1D or 2D Geometries such as
Points, Lines, Polygons, etc.
● Usually in latitude and longitude
(or x and y) coordinates
3D or 2D
What is Latitude and Longitude?
Source: imgur.com
3D Coordinates
● Earth is not a perfect sphere!
● Can be approximated by a biaxial ellipsoid
● 3D coordinates need a reference ellipsoid
● Most widely used is the World Geodetic
System (WGS84) used by GPS
● Minimal positioning error on the surface
3D or 2D
Going to 2 Dimensions
2D Projections
● The earth cannot be displayed on a 2D map
without distortion
● Mapping to the surface of other 3D Volumes
○ Cylindrical
○ Conical
○ Azimuthal
2D Projections
● The earth cannot be displayed on a 2D map
without distortion
● Every mapping has its tradeoff
○ Length Preserving (Equidistant)
○ Area Preserving (Equal Area)
○ Angle Preserving (Conformal)
2D Projections
● Commonly used in Austria: MGI Austria
Lambert (equal area)
● Commonly used in the US: Albers USA
projection (equal area)
3D or 2D
Pre-processing
Geo Processing on Hadoop
● Acquire spatial dataset
● Pre-process the dataset
● Load the dataset into HDFS
● Perform topological processing and analysis
using Hive
● Visualize the results
Data Sources
● https://www.data.gv.at
Free Austrian data, demographics, health, tourism, public
transport, etc.
● GIP Graphenintegrations-Plattform
Austrian traffic graph, public transport, streets, etc.
● GADM Database of Global Administrative Areas
Country shapes
● Many more..
Vector Formats
● Shapefile
● GeoJSON
● WKT Well Known Text
● Points
GDAL - Geospatial Data Abstraction Library
● Translator library for raster and vector
geospatial data formats
● Converts spatial data between file formats,
reference systems and projections
● SQL query syntax
● Command-line tool (MIT license)
Source: gdal.org
GDAL - Transform Shapefiles to CSV
ogr2ogr -f CSV output.csv 
input.shp 
-lco GEOMETRY=AS_WKT 
-lco SEPARATOR=SEMICOLON 
-oo ENCODING=UTF-8
GDAL - Use spatial queries
ogr2ogr -sql "SELECT A.* FROM shape1 A,
shape2 B WHERE ST_Intersects(A.geo, B.geo)" 
-dialect SQLITE 
data input_dir 
-nln output.shp
More complex pre-processing
● Fiona for loading Shapefiles
● Shapely for geo processing
● Complex pre-processing, extraction,
transformations, area and length
computations, etc.
ESRI Tools on Hadoop
Source: ESRI Github
ESRI Tools on Hadoop
● Open Source Tools from ESRI
● Provided under Apache-License 2.0
● Geo processing tools for Hadoop + ArcGis
● Active development and feedback (on Github)
ESRI Tools on Hadoop
● Esri Geometry API for Java
Java library for geo processing
● Spatial Framework for Hadoop
Hive SerDe and UDFs based on the geometry API
● Geoprocessing Tools for Hadoop
Tools for data exchange between ArcGis and HDFS
● GIS Tools for Hadoop
Sample application and demos
Spatial Framework for Hadoop (Hive UDFs)
● Geometry Construction
Create geometries from WKT, from binary, or manually
● Relationship Tests
Contains, intersects, overlaps, touches, etc.
● Operations
Boundary, envelop, union, convex hull, etc.
● Accessors
Length, Area, Centroid, Distance, etc.
Spatial Framework for Hadoop
ADD JAR libs/esri-geometry-api.jar;
ADD JAR libs/spatial-sdk-hadoop.jar;
CREATE TEMPORARY FUNCTION ST_Point AS
‘com.esri.hadoop.hive.ST_Point’;
CREATE TEMPORARY FUNCTION ST_LineString AS
‘com.esri.hadoop.hive.ST_LineString’;
CREATE TEMPORARY FUNCTION ST_Polygon AS
‘com.esri.hadoop.hive.ST_Polygon’;
Spatial Framework for Hadoop
SELECT ST_Area(
ST_Polygon(0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 0.0, 1.0)
) FROM src LIMIT 1;
SELECT ST_AsText(
ST_Centroid(
ST_GeomFromText(geo_as_wkt)
)
) FROM src LIMIT 1;
Problems
● No persistent spatial indices
● No projections - length/area!
● Binary output by default
● Doesn’t work with vectorization
● No visualization
● Not feature complete (but most things work)
Use Cases for Geo Processing
Use Case: Geo Processing @ T-Mobile Austria
● Network traffic analysis and optimization
● Signal performance analog railway tracks
● Better analysis of network coverage
● Many more..
Use Case: Trips Analysis @ Uber
● What do trips look like?
● How can we reduce wait time and make more
trips?
● Are there new products we should introduce?
Source: slideshare.net
Use Case: Traffic Jam Prediction based on GPS/FCD
● Estimate average speed of cars on road
● Compare to the max speed on each street
● Use public traffic jam data as ground truth
● Train a model to predict traffic jams
Thank you.
Christoph Koerner

Más contenido relacionado

La actualidad más candente

Mapreduce introduction
Mapreduce introductionMapreduce introduction
Mapreduce introductionYogender Singh
 
Geoprocessing(Building Your Own Tool) and Geostatistical Analysis(An Introdu...
Geoprocessing(Building Your Own Tool)  and Geostatistical Analysis(An Introdu...Geoprocessing(Building Your Own Tool)  and Geostatistical Analysis(An Introdu...
Geoprocessing(Building Your Own Tool) and Geostatistical Analysis(An Introdu...Nepal Flying Labs
 
Managing Data Synchronization Between ArcSDE and POSTGIS using FME
Managing Data Synchronization Between ArcSDE and POSTGIS using FMEManaging Data Synchronization Between ArcSDE and POSTGIS using FME
Managing Data Synchronization Between ArcSDE and POSTGIS using FMESafe Software
 
SEPA - Esri UK Annual Conference 2016
SEPA - Esri UK Annual Conference 2016SEPA - Esri UK Annual Conference 2016
SEPA - Esri UK Annual Conference 2016Esri UK
 
Arc gis desktop_and_geoprocessing
Arc gis desktop_and_geoprocessingArc gis desktop_and_geoprocessing
Arc gis desktop_and_geoprocessingEsri
 
Break on Through (To The Java(Script) Side) - Smart Development - Esri UK Ann...
Break on Through (To The Java(Script) Side) - Smart Development - Esri UK Ann...Break on Through (To The Java(Script) Side) - Smart Development - Esri UK Ann...
Break on Through (To The Java(Script) Side) - Smart Development - Esri UK Ann...Esri UK
 
GIS on the Web
GIS on the WebGIS on the Web
GIS on the WebRuss White
 
Karnataka Geospatial Experience FME World Tour 2017 India
Karnataka Geospatial Experience FME World Tour 2017 IndiaKarnataka Geospatial Experience FME World Tour 2017 India
Karnataka Geospatial Experience FME World Tour 2017 IndiaRaghavendran S
 
2015 FOSS4G Track: Analyzing Aspen's Community Forest with Lidar, Object-Base...
2015 FOSS4G Track: Analyzing Aspen's Community Forest with Lidar, Object-Base...2015 FOSS4G Track: Analyzing Aspen's Community Forest with Lidar, Object-Base...
2015 FOSS4G Track: Analyzing Aspen's Community Forest with Lidar, Object-Base...GIS in the Rockies
 
Plugins in QGIS and its uses
Plugins in QGIS and its usesPlugins in QGIS and its uses
Plugins in QGIS and its usesMayuresh Padalkar
 
Scalable Data Analytics and Visualization with Cloud Optimized Services
Scalable Data Analytics and Visualization with Cloud Optimized ServicesScalable Data Analytics and Visualization with Cloud Optimized Services
Scalable Data Analytics and Visualization with Cloud Optimized ServicesGlobus
 
Movie Recommendations using Map Reduce and Python
Movie Recommendations using Map Reduce and PythonMovie Recommendations using Map Reduce and Python
Movie Recommendations using Map Reduce and PythonAvinash Pandu
 
NDGISUC2017 - Mobile Data Collection & Reporting System
NDGISUC2017 - Mobile Data Collection & Reporting SystemNDGISUC2017 - Mobile Data Collection & Reporting System
NDGISUC2017 - Mobile Data Collection & Reporting SystemNorth Dakota GIS Hub
 
GIS software
GIS softwareGIS software
GIS softwareSwetha A
 
Data Appliance for ArcGIS
Data Appliance for ArcGISData Appliance for ArcGIS
Data Appliance for ArcGISEsri
 
Our vision: The 3D Shared Earth Model - Oil and Gas seminar October 10th
Our vision: The 3D Shared Earth Model - Oil and Gas seminar October 10thOur vision: The 3D Shared Earth Model - Oil and Gas seminar October 10th
Our vision: The 3D Shared Earth Model - Oil and Gas seminar October 10thGeodata AS
 
Geocap Water Column and Seafloor for ArcGIS - Oil and Gas seminar October 10th
Geocap Water Column and Seafloor for ArcGIS - Oil and Gas seminar October 10th Geocap Water Column and Seafloor for ArcGIS - Oil and Gas seminar October 10th
Geocap Water Column and Seafloor for ArcGIS - Oil and Gas seminar October 10th Geodata AS
 
LIDAR and Drone Data - Datamine Discover3D
LIDAR and Drone Data - Datamine Discover3DLIDAR and Drone Data - Datamine Discover3D
LIDAR and Drone Data - Datamine Discover3DPrakher Hajela Saxena
 
Final Project Presentation
Final Project PresentationFinal Project Presentation
Final Project PresentationM Zubair Iqbal
 

La actualidad más candente (20)

Mapreduce introduction
Mapreduce introductionMapreduce introduction
Mapreduce introduction
 
Geoprocessing(Building Your Own Tool) and Geostatistical Analysis(An Introdu...
Geoprocessing(Building Your Own Tool)  and Geostatistical Analysis(An Introdu...Geoprocessing(Building Your Own Tool)  and Geostatistical Analysis(An Introdu...
Geoprocessing(Building Your Own Tool) and Geostatistical Analysis(An Introdu...
 
Managing Data Synchronization Between ArcSDE and POSTGIS using FME
Managing Data Synchronization Between ArcSDE and POSTGIS using FMEManaging Data Synchronization Between ArcSDE and POSTGIS using FME
Managing Data Synchronization Between ArcSDE and POSTGIS using FME
 
SEPA - Esri UK Annual Conference 2016
SEPA - Esri UK Annual Conference 2016SEPA - Esri UK Annual Conference 2016
SEPA - Esri UK Annual Conference 2016
 
Arc gis desktop_and_geoprocessing
Arc gis desktop_and_geoprocessingArc gis desktop_and_geoprocessing
Arc gis desktop_and_geoprocessing
 
Break on Through (To The Java(Script) Side) - Smart Development - Esri UK Ann...
Break on Through (To The Java(Script) Side) - Smart Development - Esri UK Ann...Break on Through (To The Java(Script) Side) - Smart Development - Esri UK Ann...
Break on Through (To The Java(Script) Side) - Smart Development - Esri UK Ann...
 
GIS on the Web
GIS on the WebGIS on the Web
GIS on the Web
 
Karnataka Geospatial Experience FME World Tour 2017 India
Karnataka Geospatial Experience FME World Tour 2017 IndiaKarnataka Geospatial Experience FME World Tour 2017 India
Karnataka Geospatial Experience FME World Tour 2017 India
 
2015 FOSS4G Track: Analyzing Aspen's Community Forest with Lidar, Object-Base...
2015 FOSS4G Track: Analyzing Aspen's Community Forest with Lidar, Object-Base...2015 FOSS4G Track: Analyzing Aspen's Community Forest with Lidar, Object-Base...
2015 FOSS4G Track: Analyzing Aspen's Community Forest with Lidar, Object-Base...
 
Plugins in QGIS and its uses
Plugins in QGIS and its usesPlugins in QGIS and its uses
Plugins in QGIS and its uses
 
Scalable Data Analytics and Visualization with Cloud Optimized Services
Scalable Data Analytics and Visualization with Cloud Optimized ServicesScalable Data Analytics and Visualization with Cloud Optimized Services
Scalable Data Analytics and Visualization with Cloud Optimized Services
 
Movie Recommendations using Map Reduce and Python
Movie Recommendations using Map Reduce and PythonMovie Recommendations using Map Reduce and Python
Movie Recommendations using Map Reduce and Python
 
NDGISUC2017 - Mobile Data Collection & Reporting System
NDGISUC2017 - Mobile Data Collection & Reporting SystemNDGISUC2017 - Mobile Data Collection & Reporting System
NDGISUC2017 - Mobile Data Collection & Reporting System
 
GIS software
GIS softwareGIS software
GIS software
 
Data Appliance for ArcGIS
Data Appliance for ArcGISData Appliance for ArcGIS
Data Appliance for ArcGIS
 
Our vision: The 3D Shared Earth Model - Oil and Gas seminar October 10th
Our vision: The 3D Shared Earth Model - Oil and Gas seminar October 10thOur vision: The 3D Shared Earth Model - Oil and Gas seminar October 10th
Our vision: The 3D Shared Earth Model - Oil and Gas seminar October 10th
 
Presentation may30th
Presentation may30thPresentation may30th
Presentation may30th
 
Geocap Water Column and Seafloor for ArcGIS - Oil and Gas seminar October 10th
Geocap Water Column and Seafloor for ArcGIS - Oil and Gas seminar October 10th Geocap Water Column and Seafloor for ArcGIS - Oil and Gas seminar October 10th
Geocap Water Column and Seafloor for ArcGIS - Oil and Gas seminar October 10th
 
LIDAR and Drone Data - Datamine Discover3D
LIDAR and Drone Data - Datamine Discover3DLIDAR and Drone Data - Datamine Discover3D
LIDAR and Drone Data - Datamine Discover3D
 
Final Project Presentation
Final Project PresentationFinal Project Presentation
Final Project Presentation
 

Similar a Large Scale Geo Processing on Hadoop

Using python to analyze spatial data
Using python to analyze spatial dataUsing python to analyze spatial data
Using python to analyze spatial dataKudos S.A.S
 
LocationTech Projects
LocationTech ProjectsLocationTech Projects
LocationTech ProjectsJody Garnett
 
Mapping, GIS and geolocating data in Java @ JAX London
Mapping, GIS and geolocating data in Java @ JAX LondonMapping, GIS and geolocating data in Java @ JAX London
Mapping, GIS and geolocating data in Java @ JAX LondonJoachim Van der Auwera
 
Java Tech & Tools | Mapping, GIS and Geolocating Data in Java | Joachim Van d...
Java Tech & Tools | Mapping, GIS and Geolocating Data in Java | Joachim Van d...Java Tech & Tools | Mapping, GIS and Geolocating Data in Java | Joachim Van d...
Java Tech & Tools | Mapping, GIS and Geolocating Data in Java | Joachim Van d...JAX London
 
Spatial Data processing with Hadoop
Spatial Data processing with HadoopSpatial Data processing with Hadoop
Spatial Data processing with HadoopVisionGEOMATIQUE2014
 
Understanding Hadoop
Understanding HadoopUnderstanding Hadoop
Understanding HadoopAhmed Ossama
 
Mapping, GIS and geolocating data in Java
Mapping, GIS and geolocating data in JavaMapping, GIS and geolocating data in Java
Mapping, GIS and geolocating data in JavaJoachim Van der Auwera
 
GIS User to Web-GIS Developer Journey
GIS User to Web-GIS Developer JourneyGIS User to Web-GIS Developer Journey
GIS User to Web-GIS Developer JourneyTek Kshetri
 
150810 ilts workshop_handson_presentation
150810 ilts workshop_handson_presentation150810 ilts workshop_handson_presentation
150810 ilts workshop_handson_presentationTakayuki Nuimura
 
[FOSS4G KOREA 2014]Hadoop 상에서 MapReduce를 이용한 Spatial Big Data 집계와 시스템 구축
[FOSS4G KOREA 2014]Hadoop 상에서 MapReduce를 이용한 Spatial Big Data 집계와 시스템 구축[FOSS4G KOREA 2014]Hadoop 상에서 MapReduce를 이용한 Spatial Big Data 집계와 시스템 구축
[FOSS4G KOREA 2014]Hadoop 상에서 MapReduce를 이용한 Spatial Big Data 집계와 시스템 구축Kwang Woo NAM
 
How to empower community by using GIS lecture 1
How to empower community by using GIS lecture 1How to empower community by using GIS lecture 1
How to empower community by using GIS lecture 1wang yaohui
 
Saving Money with Open Source GIS
Saving Money with Open Source GISSaving Money with Open Source GIS
Saving Money with Open Source GISbryanluman
 
Apache Hadoop - Big Data Engineering
Apache Hadoop - Big Data EngineeringApache Hadoop - Big Data Engineering
Apache Hadoop - Big Data EngineeringBADR
 
Using Hadoop and Hive to Optimize Travel Search , WindyCityDB 2010
Using Hadoop and Hive to Optimize Travel Search, WindyCityDB 2010Using Hadoop and Hive to Optimize Travel Search, WindyCityDB 2010
Using Hadoop and Hive to Optimize Travel Search , WindyCityDB 2010Jonathan Seidman
 
Apache spark - History and market overview
Apache spark - History and market overviewApache spark - History and market overview
Apache spark - History and market overviewMartin Zapletal
 
ITResumeKlonWaldrip
ITResumeKlonWaldripITResumeKlonWaldrip
ITResumeKlonWaldripKlon Waldrip
 
GeoNode intro and demo
GeoNode intro and demoGeoNode intro and demo
GeoNode intro and demoPaolo Corti
 
OpenStreetMap in 3D - current developments
OpenStreetMap in 3D - current developmentsOpenStreetMap in 3D - current developments
OpenStreetMap in 3D - current developmentsvirtualcitySYSTEMS GmbH
 

Similar a Large Scale Geo Processing on Hadoop (20)

Using python to analyze spatial data
Using python to analyze spatial dataUsing python to analyze spatial data
Using python to analyze spatial data
 
Gdal introduction
Gdal introductionGdal introduction
Gdal introduction
 
LocationTech Projects
LocationTech ProjectsLocationTech Projects
LocationTech Projects
 
Mapping, GIS and geolocating data in Java @ JAX London
Mapping, GIS and geolocating data in Java @ JAX LondonMapping, GIS and geolocating data in Java @ JAX London
Mapping, GIS and geolocating data in Java @ JAX London
 
Java Tech & Tools | Mapping, GIS and Geolocating Data in Java | Joachim Van d...
Java Tech & Tools | Mapping, GIS and Geolocating Data in Java | Joachim Van d...Java Tech & Tools | Mapping, GIS and Geolocating Data in Java | Joachim Van d...
Java Tech & Tools | Mapping, GIS and Geolocating Data in Java | Joachim Van d...
 
Spatial Data processing with Hadoop
Spatial Data processing with HadoopSpatial Data processing with Hadoop
Spatial Data processing with Hadoop
 
Understanding Hadoop
Understanding HadoopUnderstanding Hadoop
Understanding Hadoop
 
Mapping, GIS and geolocating data in Java
Mapping, GIS and geolocating data in JavaMapping, GIS and geolocating data in Java
Mapping, GIS and geolocating data in Java
 
GIS User to Web-GIS Developer Journey
GIS User to Web-GIS Developer JourneyGIS User to Web-GIS Developer Journey
GIS User to Web-GIS Developer Journey
 
150810 ilts workshop_handson_presentation
150810 ilts workshop_handson_presentation150810 ilts workshop_handson_presentation
150810 ilts workshop_handson_presentation
 
[FOSS4G KOREA 2014]Hadoop 상에서 MapReduce를 이용한 Spatial Big Data 집계와 시스템 구축
[FOSS4G KOREA 2014]Hadoop 상에서 MapReduce를 이용한 Spatial Big Data 집계와 시스템 구축[FOSS4G KOREA 2014]Hadoop 상에서 MapReduce를 이용한 Spatial Big Data 집계와 시스템 구축
[FOSS4G KOREA 2014]Hadoop 상에서 MapReduce를 이용한 Spatial Big Data 집계와 시스템 구축
 
How to empower community by using GIS lecture 1
How to empower community by using GIS lecture 1How to empower community by using GIS lecture 1
How to empower community by using GIS lecture 1
 
Saving Money with Open Source GIS
Saving Money with Open Source GISSaving Money with Open Source GIS
Saving Money with Open Source GIS
 
Apache Hadoop - Big Data Engineering
Apache Hadoop - Big Data EngineeringApache Hadoop - Big Data Engineering
Apache Hadoop - Big Data Engineering
 
Using Hadoop and Hive to Optimize Travel Search , WindyCityDB 2010
Using Hadoop and Hive to Optimize Travel Search, WindyCityDB 2010Using Hadoop and Hive to Optimize Travel Search, WindyCityDB 2010
Using Hadoop and Hive to Optimize Travel Search , WindyCityDB 2010
 
Geospatial Data Abstraction Library (GDAL) Enhancement for ESDIS (GEE)
Geospatial Data Abstraction Library (GDAL) Enhancement for ESDIS (GEE)Geospatial Data Abstraction Library (GDAL) Enhancement for ESDIS (GEE)
Geospatial Data Abstraction Library (GDAL) Enhancement for ESDIS (GEE)
 
Apache spark - History and market overview
Apache spark - History and market overviewApache spark - History and market overview
Apache spark - History and market overview
 
ITResumeKlonWaldrip
ITResumeKlonWaldripITResumeKlonWaldrip
ITResumeKlonWaldrip
 
GeoNode intro and demo
GeoNode intro and demoGeoNode intro and demo
GeoNode intro and demo
 
OpenStreetMap in 3D - current developments
OpenStreetMap in 3D - current developmentsOpenStreetMap in 3D - current developments
OpenStreetMap in 3D - current developments
 

Último

2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...Martijn de Jong
 
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelDeepika Singh
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Scriptwesley chun
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native ApplicationsWSO2
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobeapidays
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWERMadyBayot
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdflior mazor
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsNanddeep Nachan
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAndrey Devyatkin
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FMESafe Software
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...DianaGray10
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MIND CTI
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...Zilliz
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024The Digital Insurer
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Zilliz
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century educationjfdjdjcjdnsjd
 

Último (20)

2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot ModelNavi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
Navi Mumbai Call Girls 🥰 8617370543 Service Offer VIP Hot Model
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
Architecting Cloud Native Applications
Architecting Cloud Native ApplicationsArchitecting Cloud Native Applications
Architecting Cloud Native Applications
 
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, AdobeApidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
Apidays New York 2024 - Scaling API-first by Ian Reasor and Radu Cotescu, Adobe
 
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWEREMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
EMPOWERMENT TECHNOLOGY GRADE 11 QUARTER 2 REVIEWER
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
MS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectorsMS Copilot expands with MS Graph connectors
MS Copilot expands with MS Graph connectors
 
AWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of TerraformAWS Community Day CPH - Three problems of Terraform
AWS Community Day CPH - Three problems of Terraform
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
Connector Corner: Accelerate revenue generation using UiPath API-centric busi...
 
MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024MINDCTI Revenue Release Quarter One 2024
MINDCTI Revenue Release Quarter One 2024
 
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data DiscoveryTrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
TrustArc Webinar - Unlock the Power of AI-Driven Data Discovery
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ..."I see eyes in my soup": How Delivery Hero implemented the safety system for ...
"I see eyes in my soup": How Delivery Hero implemented the safety system for ...
 
FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024FWD Group - Insurer Innovation Award 2024
FWD Group - Insurer Innovation Award 2024
 
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
Emergent Methods: Multi-lingual narrative tracking in the news - real-time ex...
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 

Large Scale Geo Processing on Hadoop

  • 1. Large Scale Geo Processing on Hadoop Christoph Koerner Slides available
  • 2. About me ● Data Scientist at T-Mobile Austria ● Visual Computing at Vienna University of Technology ● Author of Data Visualizations with D3 and AngularJS ● Author of Learning Responsive Data Visualization ● Organizer of Vienna Kaggle Meetup ● LinkedIn: linkedin.com/in/christophkoerner ● Twitter: @ChrisiKrnr ● Google+: +ChrisiHififm ● Github: github.com/chaosmail
  • 3. Overview 1. Introduction 2. Pre-processing 3. Large scale geo processing on Hadoop 4. Some use-cases
  • 5. What is Geo Processing? ● Operations to manipulate spatial data ● Operations include geographic feature overlay, feature selection and analysis, topology processing, raster processing, and data conversion Source: Wikipedia
  • 6. Spatial Data ● Contains data for a spatial reference ● Mostly 1D or 2D Geometries such as Points, Lines, Polygons, etc. ● Usually in latitude and longitude (or x and y) coordinates
  • 8. What is Latitude and Longitude?
  • 10. 3D Coordinates ● Earth is not a perfect sphere! ● Can be approximated by a biaxial ellipsoid ● 3D coordinates need a reference ellipsoid ● Most widely used is the World Geodetic System (WGS84) used by GPS ● Minimal positioning error on the surface
  • 12. Going to 2 Dimensions
  • 13. 2D Projections ● The earth cannot be displayed on a 2D map without distortion ● Mapping to the surface of other 3D Volumes ○ Cylindrical ○ Conical ○ Azimuthal
  • 14. 2D Projections ● The earth cannot be displayed on a 2D map without distortion ● Every mapping has its tradeoff ○ Length Preserving (Equidistant) ○ Area Preserving (Equal Area) ○ Angle Preserving (Conformal)
  • 15. 2D Projections ● Commonly used in Austria: MGI Austria Lambert (equal area) ● Commonly used in the US: Albers USA projection (equal area)
  • 18. Geo Processing on Hadoop ● Acquire spatial dataset ● Pre-process the dataset ● Load the dataset into HDFS ● Perform topological processing and analysis using Hive ● Visualize the results
  • 19. Data Sources ● https://www.data.gv.at Free Austrian data, demographics, health, tourism, public transport, etc. ● GIP Graphenintegrations-Plattform Austrian traffic graph, public transport, streets, etc. ● GADM Database of Global Administrative Areas Country shapes ● Many more..
  • 20. Vector Formats ● Shapefile ● GeoJSON ● WKT Well Known Text ● Points
  • 21. GDAL - Geospatial Data Abstraction Library ● Translator library for raster and vector geospatial data formats ● Converts spatial data between file formats, reference systems and projections ● SQL query syntax ● Command-line tool (MIT license) Source: gdal.org
  • 22. GDAL - Transform Shapefiles to CSV ogr2ogr -f CSV output.csv input.shp -lco GEOMETRY=AS_WKT -lco SEPARATOR=SEMICOLON -oo ENCODING=UTF-8
  • 23. GDAL - Use spatial queries ogr2ogr -sql "SELECT A.* FROM shape1 A, shape2 B WHERE ST_Intersects(A.geo, B.geo)" -dialect SQLITE data input_dir -nln output.shp
  • 24. More complex pre-processing ● Fiona for loading Shapefiles ● Shapely for geo processing ● Complex pre-processing, extraction, transformations, area and length computations, etc.
  • 25. ESRI Tools on Hadoop Source: ESRI Github
  • 26. ESRI Tools on Hadoop ● Open Source Tools from ESRI ● Provided under Apache-License 2.0 ● Geo processing tools for Hadoop + ArcGis ● Active development and feedback (on Github)
  • 27. ESRI Tools on Hadoop ● Esri Geometry API for Java Java library for geo processing ● Spatial Framework for Hadoop Hive SerDe and UDFs based on the geometry API ● Geoprocessing Tools for Hadoop Tools for data exchange between ArcGis and HDFS ● GIS Tools for Hadoop Sample application and demos
  • 28. Spatial Framework for Hadoop (Hive UDFs) ● Geometry Construction Create geometries from WKT, from binary, or manually ● Relationship Tests Contains, intersects, overlaps, touches, etc. ● Operations Boundary, envelop, union, convex hull, etc. ● Accessors Length, Area, Centroid, Distance, etc.
  • 29. Spatial Framework for Hadoop ADD JAR libs/esri-geometry-api.jar; ADD JAR libs/spatial-sdk-hadoop.jar; CREATE TEMPORARY FUNCTION ST_Point AS ‘com.esri.hadoop.hive.ST_Point’; CREATE TEMPORARY FUNCTION ST_LineString AS ‘com.esri.hadoop.hive.ST_LineString’; CREATE TEMPORARY FUNCTION ST_Polygon AS ‘com.esri.hadoop.hive.ST_Polygon’;
  • 30. Spatial Framework for Hadoop SELECT ST_Area( ST_Polygon(0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 0.0, 1.0) ) FROM src LIMIT 1; SELECT ST_AsText( ST_Centroid( ST_GeomFromText(geo_as_wkt) ) ) FROM src LIMIT 1;
  • 31. Problems ● No persistent spatial indices ● No projections - length/area! ● Binary output by default ● Doesn’t work with vectorization ● No visualization ● Not feature complete (but most things work)
  • 32. Use Cases for Geo Processing
  • 33. Use Case: Geo Processing @ T-Mobile Austria ● Network traffic analysis and optimization ● Signal performance analog railway tracks ● Better analysis of network coverage ● Many more..
  • 34. Use Case: Trips Analysis @ Uber ● What do trips look like? ● How can we reduce wait time and make more trips? ● Are there new products we should introduce? Source: slideshare.net
  • 35. Use Case: Traffic Jam Prediction based on GPS/FCD ● Estimate average speed of cars on road ● Compare to the max speed on each street ● Use public traffic jam data as ground truth ● Train a model to predict traffic jams