Geo data analytics

  1. Geo Data Analytics
  2. @dmarcous
     ● DBA (@IDF)
     ● Big Data Professional (@IDF)
     ● Data Wizard - Magic with Data (@Google - Waze)
  3. ● Pure professional
     ● Best practices
     ● Tools
     ● Tips & Tricks
     ● Free Advice!
  4. Agenda
     ● Why?
     ● Common Language
     ● Problems at scale
     ● Solutions at scale
     ● Tips & Tricks for scientists (/Wizards)
     ● Art
     ● Keep an eye out for…
     ● Dog Pictures
  5. Why Does Geo Data Matter?
  6. ● C/C++, GEOS: http://trac.osgeo.org/geos
     ● C#, NTS: http://code.google.com/p/nettopologysuite/
     ● Java, JTS:
       ○ http://tsusiatsoftware.net/jts/main.html
       ○ http://www.vividsolutions.com/jts/JTSHome.htm
     ● Python, shapely: https://github.com/Toblerity/Shapely
     ● Ruby, ffi-geos: https://github.com/dark-panda/ffi-geos
     ● Javascript, JSTS: http://github.com/bjornharrtell/jsts
  7. Geometry Object Model
  8. Geospatial Operations
  9. Formats
     ● WKT / WKB - Well-Known Text / Well-Known Binary geometry representations
       ○ POLYGON((34.807841777801514 32.164333053441936,34.81168270111084 32.164859820966136,34.81337785720825 32.1613540349589,34.80865716934204 32.16046394346568,34.807841777801514 32.164333053441936))
       ○ http://arthur-e.github.io/Wicket/sandbox-gmaps3.html
     ● GeoJSON
       ○ { "type": "FeatureCollection", "features": [{ "type": "Feature", "properties": { "Name": "Verint", "Guest": "dmarcous", "Accomodations": "Beer; Pizza" }, "geometry": { "type": "Polygon", "coordinates": [ [ [ 34.807841777801514, 32.164333053441936 ], [ 34.81168270111084, 32.164859820966136 ], [ 34.81337785720825, 32.1613540349589 ], [ 34.80865716934204, 32.16046394346568 ], [ 34.807841777801514, 32.164333053441936 ]]]}}]}
       ○ http://geojson.io/#map=17/32.16267/34.81061
     ● Shape Files - ESRI vector format
     ● GML - Geography Markup Language, an XML grammar for expressing geographical features
     ● Raster - Image grid of cells (pixels) tied to coordinates
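The WKT and GeoJSON samples on this slide describe the same polygon, so converting between the two is mostly a matter of re-nesting coordinates. The sketch below is a minimal illustration using only the standard library; the function name `wkt_polygon_to_geojson` is made up for this example, and it assumes a single-ring `POLYGON` with no holes. In practice a library such as shapely would handle the full grammar.

```python
import json
import re


def wkt_polygon_to_geojson(wkt: str) -> dict:
    """Convert a single-ring WKT POLYGON string into a GeoJSON geometry dict."""
    match = re.fullmatch(r"\s*POLYGON\s*\(\(\s*([^()]*?)\s*\)\)\s*", wkt, flags=re.IGNORECASE)
    if match is None:
        raise ValueError("only single-ring POLYGON((...)) input is handled here")
    ring = []
    for pair in match.group(1).split(","):
        x, y = pair.split()  # WKT order is "x y", i.e. longitude then latitude
        ring.append([float(x), float(y)])
    if ring[0] != ring[-1]:
        raise ValueError("polygon ring must be closed (first point == last point)")
    # GeoJSON nests rings inside a list: the first ring is the outer boundary
    return {"type": "Polygon", "coordinates": [ring]}


wkt = ("POLYGON((34.807841777801514 32.164333053441936,"
       "34.81168270111084 32.164859820966136,"
       "34.81337785720825 32.1613540349589,"
       "34.80865716934204 32.16046394346568,"
       "34.807841777801514 32.164333053441936))")
geometry = wkt_polygon_to_geojson(wkt)
print(json.dumps(geometry))
```

Both formats put longitude before latitude, which is the opposite of the "lat, lon" order many people type by habit.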
  10. Databases
      ● RDBMS
        ○ Postgres (PostGIS)
        ○ MS-SQL / DB2 / Oracle
      ● NoSQL
        ○ MongoDB
        ○ IBM Cloudant
        ○ Lucene spatial module (Elasticsearch / Solr)
      ● Pure Geospatial Database
        ○ CartoDB (Open Source / Hosted)
        ○ GeoMesa (Accumulo)
          ■ GeoTrellis - Scala framework for processing raster data
  11. GIS Systems
      ● List of most popular ones - http://en.wikipedia.org/wiki/List_of_geographic_information_systems_software
      ● QGIS
      ● TileMill
      ● GRASS
  12. Problem?
      ● Non-scalar data types
        ○ Aggregating
        ○ Sharding
        ○ Unordered
      ● Speed & Accuracy
        ○ The physical world is non-Euclidean
      ● http://www.jandrewrogers.com/2015/03/02/geospatial-databases-are-hard/
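One way to see the "non-Euclidean" problem is to compare a great-circle distance with a naive flat-plane distance computed directly on degrees. The sketch below is an illustration only: it assumes a spherical Earth with a mean radius of 6371.0088 km (the real Earth is an ellipsoid), and both function names are invented for this example.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius; a spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def naive_planar_km(lat1, lon1, lat2, lon2):
    """Treat degrees as flat x/y, scaled by the length of one degree of latitude."""
    km_per_degree = 2 * math.pi * EARTH_RADIUS_KM / 360
    return math.hypot(lat2 - lat1, lon2 - lon1) * km_per_degree


# One degree of longitude, measured at the equator and at 60 degrees north.
for lat in (0.0, 60.0):
    print(lat, round(haversine_km(lat, 0.0, lat, 1.0), 1), round(naive_planar_km(lat, 0.0, lat, 1.0), 1))
```

The planar version reports the same length for both rows, while the great-circle distance at 60 degrees north is roughly half of that at the equator, so a fixed "radius in degrees" means very different things at different latitudes.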
  13. Solution
  14. Data Structures
      ● R-Tree (PostGIS, actually R+Tree)
      ● Quad Tree (DB2)
      ● Hyperdimensional Hashing
      ● Space Filling Curves
        ○ Z Order Curve (MS-SQL)
        ○ Hilbert Curve
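As a small illustration of a space-filling curve, the sketch below computes a Z-order (Morton) key by interleaving the bits of two grid coordinates. It is a generic textbook construction, not the scheme any of the databases above is confirmed to use; the 16-bit grid resolution and the function names are assumptions made for this example.

```python
def interleave_bits(x: int, y: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of x and y into one Z-order (Morton) code."""
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)      # x bits land on even positions
        code |= ((y >> i) & 1) << (2 * i + 1)  # y bits land on odd positions
    return code


def z_order_key(lat: float, lon: float, bits: int = 16) -> int:
    """Quantize a lat/lon pair onto a 2^bits x 2^bits grid and return its Z-order key."""
    max_cell = (1 << bits) - 1
    x = int((lon + 180.0) / 360.0 * max_cell)
    y = int((lat + 90.0) / 180.0 * max_cell)
    return interleave_bits(x, y, bits)


print(interleave_bits(3, 5, bits=3))   # x=0b011, y=0b101
print(z_order_key(32.1643, 34.8078))
```

The resulting integer is a one-dimensional key that can be sorted and range-scanned like any other scalar, and points that are close on the grid often, though not always, end up close in key order.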
  15. The Curse of Dimensionality
  16. Dimension Reduction
      ● GeoHash - The mainstream way
        ○ Linear (non-tangent), up to 5x difference in cell area
        ○ Same prefix - close areas (sort of…)
        ○ http://geohash.org/
        ○ https://github.com/google/open-location-code/blob/master/docs/comparison.adoc
      ● S2 - The Google way
        ○ Quadratic, cells at the same level have similar area
        ○ Faces of a projected cube, subdivided by quad-trees into levels; each cell is referenced by its position on the face along a Hilbert curve
        ○ https://code.google.com/p/s2-geometry-library/
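To make the GeoHash idea concrete, here is a minimal encoder written from the public description of the algorithm: alternately bisect the longitude and latitude ranges, collect one bit per step, and emit a base-32 character for every five bits. It is a sketch for illustration rather than a replacement for a maintained library, and the function name is chosen for this example.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def geohash_encode(lat: float, lon: float, precision: int = 9) -> str:
    """Encode a latitude/longitude pair into a geohash of `precision` characters."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    value, bit_count = 0, 0
    use_lon = True  # bits alternate, starting with longitude
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            bit = lon >= mid
            lon_lo, lon_hi = (mid, lon_hi) if bit else (lon_lo, mid)
        else:
            mid = (lat_lo + lat_hi) / 2
            bit = lat >= mid
            lat_lo, lat_hi = (mid, lat_hi) if bit else (lat_lo, mid)
        value = (value << 1) | int(bit)
        bit_count += 1
        use_lon = not use_lon
        if bit_count == 5:  # five bits make one base-32 character
            chars.append(BASE32[value])
            value, bit_count = 0, 0
    return "".join(chars)


print(geohash_encode(57.64911, 10.40744, precision=11))
print(geohash_encode(57.64911, 10.40744, precision=5))
```

A shorter hash of the same point is always a prefix of the longer one, which is what makes prefix queries possible. The "sort of…" caveat still applies: two points on opposite sides of a cell boundary can be neighbors on the ground and share no prefix at all.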
  17. Near Real Time Answers
      ● MongoDB Geospatial Indexing
      ● Elasticsearch / Solr spatial indexing
      ● GeoMesa
      ● Build your own - Store the bytes in a fast key-value store with reduced keys (HBase / Cassandra)
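The "build your own" option can be sketched with an in-memory stand-in for an ordered key-value store: if the row key starts with a reduced cell identifier (a geohash here), "everything in this area" becomes a prefix range scan. The class `SortedKeyStore`, the key layout `cell/id` and the sample keys are all hypothetical; a real deployment would rely on the range scans of HBase or a similar store.

```python
import bisect


class SortedKeyStore:
    """Toy ordered key-value store: keys are kept sorted so prefixes map to ranges."""

    def __init__(self):
        self._keys = []
        self._values = {}

    def put(self, key: str, value) -> None:
        if key not in self._values:
            bisect.insort(self._keys, key)
        self._values[key] = value

    def prefix_scan(self, prefix: str):
        """Return all (key, value) pairs whose key starts with `prefix`."""
        start = bisect.bisect_left(self._keys, prefix)
        # "\uffff" sorts after any character used in these keys
        end = bisect.bisect_left(self._keys, prefix + "\uffff")
        return [(k, self._values[k]) for k in self._keys[start:end]]


store = SortedKeyStore()
store.put("u4pruy/a", {"speed": 40})   # hypothetical rows keyed by geohash cell
store.put("u4pruz/b", {"speed": 55})
store.put("u4prv0/c", {"speed": 30})
store.put("sv8wrq/d", {"speed": 70})

print(store.prefix_scan("u4pru"))
```

Because geohash neighbors do not always share a prefix, a complete lookup would normally scan the target cell plus its adjacent cells rather than a single prefix.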
  18. Big Processing - It’s a UDF World
      ● ESRI - Hive UDFs - https://github.com/Esri/spatial-framework-for-hadoop/wiki/UDF-Documentation
      ● Pigeon - Pig UDFs - https://github.com/aseldawy/pigeon
      ● Spark
        ○ SpatialSpark
        ○ GeoTrellis
  19. Graph Representation
      ● Use Cases
        ○ Routing
        ○ Supply Chains
        ○ User Networks
      ● Tools
        ○ GraphX (Spark!) / Giraph (MR)
        ○ Dato SGraph (formerly known as GraphLab)
        ○ Gephi (on small parts, for exploration)
      ● Algorithms
        ○ Shortest Path - Dijkstra / A*
        ○ Communities - Triangle Counting
        ○ Importance - Centrality / PageRank
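For the routing use case, the shortest-path idea can be shown with a compact Dijkstra implementation over an adjacency dictionary. This is a single-machine sketch with a made-up toy road graph, not how GraphX or Giraph distribute the computation; edge weights are assumed to be non-negative.

```python
import heapq


def dijkstra(graph, source):
    """Return (distance, predecessor) maps for shortest paths from `source`."""
    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry, a shorter route was already found
        for neighbor, weight in graph.get(node, {}).items():
            candidate = d + weight
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                prev[neighbor] = node
                heapq.heappush(heap, (candidate, neighbor))
    return dist, prev


def path_to(prev, source, target):
    """Rebuild the route by walking predecessors back from `target`."""
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]


# Hypothetical road graph: junction -> {neighbor: travel time in minutes}
roads = {"A": {"B": 4, "C": 1}, "C": {"B": 2, "D": 7}, "B": {"D": 3}, "D": {}}
dist, prev = dijkstra(roads, "A")
print(dist["D"], path_to(prev, "A", "D"))
```

A* follows the same loop but orders the heap by distance so far plus a heuristic, for which a great-circle distance to the target is a common choice on road networks.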
  20. Tips & Tricks
  21. Approximation
  22. Timezones
      ● tz_world
        ○ http://efele.net/maps/tz/world/
        ○ What do we do with shapefiles?
      ● APIs
        ○ Geonames
        ○ http://www.earthtools.org/
        ○ Google Timezone API
      ● UDFs?
        ○ Hive - from_utc_timestamp(timestamp, string timezone)
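One answer to "what do we do with shapefiles?" is to load the timezone polygons and run a point-in-polygon test for each coordinate. The sketch below shows the classic ray-casting test on plain coordinate rings. The two rectangular "zones" and their names are invented for illustration and are not real tz_world data, and the planar test ignores complications such as polygons crossing the antimeridian.

```python
def point_in_polygon(lon, lat, ring):
    """Ray casting: count how many polygon edges a ray going east from the point crosses."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > lat) != (y2 > lat):  # the edge straddles the point's latitude
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


def lookup_timezone(lon, lat, zones):
    """Return the id of the first zone polygon containing the point, or None."""
    for tzid, ring in zones:
        if point_in_polygon(lon, lat, ring):
            return tzid
    return None


# Hypothetical zones as (id, ring of (lon, lat) vertices)
zones = [
    ("Toy/East", [(0, 0), (10, 0), (10, 10), (0, 10)]),
    ("Toy/West", [(-10, 0), (0, 0), (0, 10), (-10, 10)]),
]
print(lookup_timezone(5, 5, zones), lookup_timezone(-5, 5, zones), lookup_timezone(20, 20, zones))
```

The returned zone id is the kind of string a function such as Hive's `from_utc_timestamp` expects as its timezone argument.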
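  23. Good Old Word Count
      ```scala
      import com.google.common.geometry.{S2CellId, S2LatLng}

      // Word Count (`spark` is assumed to be a SparkContext)
      val textFile = spark.textFile("hdfs://...")
      val counts = textFile.flatMap(line => line.split(" "))
        .map(word => (word, 1))
        .reduceByKey(_ + _)
      counts.saveAsTextFile("hdfs://...")

      // Modified Word Count: count points per S2 cell instead of occurrences per word
      // (columns 1 and 2 of each CSV line are taken as longitude and latitude)
      val points = spark.textFile("hdfs://...")
      val cellCounts = points.map(line => line.split(","))
        .map(point => (coord2S2Cell(point(1).toDouble, point(2).toDouble), 1))
        .reduceByKey(_ + _)
      cellCounts.saveAsTextFile("hdfs://...")

      // Take that from a library! Here: the S2 Java library, assumed to be on the classpath.
      // Cell ids are 64-bit, so the result is a Long rather than an Int.
      def coord2S2Cell(longitude: Double, latitude: Double, lvl: Int = 14): Long =
        S2CellId.fromLatLng(S2LatLng.fromDegrees(latitude, longitude)).parent(lvl).id()
      ```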
  24. Advanced - Precision is of the Essence
      ● Density Based Clustering
        ○ DBSCAN
          ■ Minimum cluster size (> Noise)
          ■ Epsilon (Spatial Radius)
        ○ R - MASS - kde2d
          ■ RGoogleMaps for the map
          ■ http://www.everydayanalytics.ca/2014/04/heatmap-of-toronto-traffic-signals.html
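The two DBSCAN parameters named on this slide, the minimum cluster size and epsilon, are easiest to understand from a small implementation. The sketch below is a plain, quadratic-time version for illustration; it assumes points in a planar coordinate system with Euclidean distance (for raw lat/lon a great-circle distance would be more appropriate), and the sample points are invented.

```python
import math

NOISE = -1


def region_query(points, index, eps):
    """Indices of all points within `eps` of points[index], including itself."""
    px, py = points[index]
    return [j for j, (qx, qy) in enumerate(points) if math.hypot(px - qx, py - qy) <= eps]


def dbscan(points, eps, min_pts):
    """Label each point with a cluster id (0, 1, ...) or NOISE."""
    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        neighbors = region_query(points, i, eps)
        if len(neighbors) < min_pts:
            labels[i] = NOISE  # may be relabeled later as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in neighbors if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: reachable, but not dense itself
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_neighbors = region_query(points, j, eps)
            if len(j_neighbors) >= min_pts:  # core point: keep expanding the cluster
                queue.extend(j_neighbors)
    return labels


points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11), (50, 50)]
labels = dbscan(points, eps=1.5, min_pts=3)
print(labels)
```

Raising `min_pts` makes the algorithm treat sparse groups as noise, while raising `eps` merges nearby groups, which is why both values are usually tuned against the spatial scale of the data.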
  25. rJava
      ● Wrap geospatial functions of your choice
      ● Call them from R
      ● Use apply on an entire Dataframe!
      ● Use as features!
      ● Visualize??? (in 5 minutes)
  26. R Packs for Geospatial Analysis
      ● geonames
        ○ Timezone
        ○ Weather
        ○ Nearby places
      ● RGoogleMaps
        ○ download + paint maps
        ○ getGeoCode
      ● sp / maps / maptools
        ○ OGC object abstractions
        ○ Manipulate / display geo data
      ● rgdal - spTransform
        ○ Convert formats / coordinate systems
      ● geosphere - distances / circles / centroids
      ● fpc - DBSCAN
      ● Coverage
        ○ http://cran.r-project.org/web/views/Spatial.html
  27. Engineered Geo features
      ● LOCAL
        ○ time
        ○ is_early / is_late
        ○ day of week
        ○ is_workday / is_weekend
        ○ is_day_light (sunrise / sunset, tz_world)
      ● Weather
        ○ Temperature
        ○ is_rain / is_fog / is_hail / is_snow
      ● Squared (s2cell / geohash) statistics
        ○ Probability of users in square to predict X
      ● Address - is_residence / is_business
      ● News - GDELT
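The LOCAL features above all derive from a timestamp that has already been converted to the event's local timezone. A minimal sketch of that derivation follows; the function name, the hour thresholds for "early" and "late", and the default weekend days are assumptions for illustration, and holidays are ignored.

```python
from datetime import datetime


def local_time_features(local_dt, weekend_days=(5, 6), early_before=7, late_after=22):
    """Derive simple calendar features from a datetime already in local time."""
    hour = local_dt.hour
    weekday = local_dt.weekday()  # Monday == 0 ... Sunday == 6
    is_weekend = weekday in weekend_days
    return {
        "hour": hour,
        "day_of_week": weekday,
        "is_early": hour < early_before,
        "is_late": hour >= late_after,
        "is_weekend": is_weekend,
        "is_workday": not is_weekend,  # public holidays are not considered
    }


# Saturday night, with a Friday/Saturday weekend passed explicitly
print(local_time_features(datetime(2015, 6, 6, 23, 30), weekend_days=(4, 5)))
# Monday, early morning
print(local_time_features(datetime(2015, 6, 1, 6, 0)))
```

Keeping the weekend days as a parameter matters for geo data in particular, since the working week differs between countries.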
  28. WOW!
  29. Data Art
  30. Google Sheets
  31. Frontend = Javascript?
      ● Google Maps API
        ○ https://developers.google.com/maps/documentation/javascript/examples/layer-heatmap
      ● Leaflet
  32. R for Visualisation
      ● ggplot2 + geospatial packs
        ○ http://uce.uniovi.es/mundor/howtoplotashapemap.html
        ○ http://stackoverflow.com/questions/9558040/ggplot-map-with-l
        ○ http://spatial.ly/2012/02/great-maps-ggplot2/
      ● RGoogleMaps
        ○ http://rforwork.info/tag/rgooglemaps/
  33. R For Interactive
      ● Shiny
        ○ Leaflet
          ■ http://rstudio.github.io/leaflet/
          ■ http://shiny.rstudio.com/gallery/superzip-example.html
          ■ http://shiny.rstudio.com/gallery/bus-dashboard.html
        ○ Globe
          ■ https://github.com/trestletech/shinyGlobe
  34. R Animation
      ● http://rmaps.github.io/blog/posts/animated-choropleths/
  35. @aaronkoblin
  36. Keep an Eye Out!
      ● https://locationtech.org/list-of-projects
  37. Contact
      ● Daniel Marcous
      ● dmarcous@gmail.com
