Feature Geo Analytics and Big Data Processing: Hybrid Approaches for Earth Science and Real-Time Decision Support


Invited talk for the 2016 AGU Fall Meeting, Session IN12A, Big Data Analytics I.

Introduced is a new approach for processing spatiotemporal big data by leveraging distributed analytics and storage. A suite of temporally aware analysis tools summarizes data nearby or within variable windows, aggregates points (e.g., for various sensor observations or vessel positions), reconstructs time-enabled points into tracks (e.g., for mapping and visualizing storm tracks), joins features (e.g., to find associations between features based on attributes, spatial relationships, temporal relationships, or all three simultaneously), calculates point densities, finds hot spots (e.g., in species distributions), and creates space-time slices and cubes (e.g., in microweather applications with temperature, humidity, and pressure, or within human mobility studies). These “feature geo analytics” tools run in both batch and streaming spatial analysis modes as distributed computations across a cluster of servers on typical “big” data sets, where static data exist in traditional geospatial formats (e.g., shapefile) locally on a disk or file share, are attached as static spatiotemporal big data stores, or are streamed in near-real-time. In other words, the approach registers large datasets or data stores with ArcGIS Server, then distributes analysis across a cluster of machines for parallel processing. Several brief use cases will be highlighted based on a 16-node server cluster with 14 GB of RAM per node, allowing, for example, the buffering of over 8 million points or thousands of polygons in ~1 minute. The approach is “hybrid” in that ArcGIS Server integrates open-source big data frameworks such as Apache Hadoop and Apache Spark on the cluster in order to run the analytics. In addition, the user may devise and connect custom open-source interfaces and tools developed in Python or Python Notebooks, the common denominator being the familiar REST API.


Feature Geo Analytics and
Big Data Processing:
Hybrid Approaches for Earth Science
and Real-Time Decision Making
Mansour Raad, Erik Hoel, Michael Park, Adam
Mollenkopf, Dawn J. Wright
Environmental Systems Research Institute (aka Esri)
IN12A-01 (Invited)
AGU Fall Meeting, 12 December 2016

What is Feature Geo Analytics?
A new way of processing spatiotemporal data designed for WEB-BASED big data by leveraging distributed analytics and storage
• Works with existing GIS data and tabular data
• Designed to perform both spatial and temporal analysis
• Uses familiar workflows to complete complex analyses
• “Hybridity” - integrating open-source frameworks on clusters to run analytics

Feature Geo Analytics
[Diagram labels: Geoprocessing; Distributed analytics and storage; Feature Geo Analytics; Portal; Web GIS Layers; new; more; extends]

Solve New Problems
Run analytics:
• against data too big for a single desktop machine
- Buffer 8.2 million points or thousands of polygons in a little over a minute
- billions of observations of ship movements ingested via GeoEvent
• designed to gain insight into both spatial and temporal patterns
• against massive collections in a scalable manner
• and meet time constraints
months weeks days hours minutes

Geo Analytics Architectural Overview
Register large data stores, then distribute spatial analysis across cluster of machines for parallel processing
Store and/or deploy to web
Web GIS layers via Pro, Portal, Python Notebooks, or the REST API
[Diagram labels: Portal; Web GIS Layers; New Web GIS Layers; Un-Managed Data; Managed Data; Relational Data Store; Spatiotemporal Data Store; Files; Delimited Files; Shapefiles; Enterprise; Big Data Stores; Server Cluster]

Rich Collection of (Web) Analysis Tools
• Summarize Data
- Aggregate Points
- Summarize Nearby
- Summarize Within
- Reconstruct Tracks
- Join Features
• Find Locations
- Find Existing Locations
- Find Similar Locations
• Analyze Patterns
- Calculate Density
- Find Hot Spots
- Create Space Time Cube
• Use Proximity
- Create Buffers
• Manage Data
- Extract Data
* Temporally aware tools
Aggregate Points
Summarize Nearby
Summarize Within
Find Existing Locations
Find Similar Locations
Calculate Density
Find Hot Spots
Create Buffers
Extract Data

Analytical Overview: Aggregating and Summarizing
• Spatial Joins
• Space-time slices
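
Space-time slicing can be pictured as assigning each timestamped observation to a fixed-length time step. The following is a minimal sketch under stated assumptions: the function name `time_slice_index`, the one-hour step, the reference time, and the sample observations are all hypothetical and are not taken from the talk or from ArcGIS.

```python
from collections import Counter
from datetime import datetime, timedelta


def time_slice_index(t, origin, step):
    """Return the zero-based index of the time slice that contains t."""
    # Floor division of two timedeltas yields an integer slice number.
    return (t - origin) // step


# Hypothetical reference time and slice length.
origin = datetime(2016, 12, 12, 0, 0)
step = timedelta(hours=1)

# Made-up observation timestamps.
observations = [
    datetime(2016, 12, 12, 0, 15),
    datetime(2016, 12, 12, 0, 45),
    datetime(2016, 12, 12, 2, 5),
]

# Count how many observations fall into each slice.
counts = Counter(time_slice_index(t, origin, step) for t in observations)
```

Combining this index with a spatial cell index would give the (x, y, time) bins of a space-time cube.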

Analytical Overview: Aggregating and Summarizing
• Spatiotemporal joins
[Figure panels: Target Features; Join Features; Intermediate Result; Final Result]

Temporal Relationships on Intervals
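
A spatiotemporal join keeps only those pairs of features that are related in both space and time. The sketch below illustrates one possible reading, "within a distance and within a time window", on planar coordinates. The function name `spatiotemporal_join`, the tuple layout, and the sample data are assumptions made for illustration. This is not the ArcGIS Join Features implementation, which presumably avoids the all-pairs loop by partitioning or indexing the data.

```python
from datetime import datetime, timedelta
from math import hypot


def spatiotemporal_join(targets, joins, max_dist, max_dt):
    """Pair each target with every join feature near it in space and time.

    Features are (id, x, y, timestamp) tuples in planar coordinates.
    """
    pairs = []
    for t_id, tx, ty, tt in targets:
        for j_id, jx, jy, jt in joins:
            near_in_space = hypot(tx - jx, ty - jy) <= max_dist
            near_in_time = abs(tt - jt) <= max_dt
            if near_in_space and near_in_time:
                pairs.append((t_id, j_id))
    return pairs


# Made-up features: one target and three candidates.
targets = [("t1", 0.0, 0.0, datetime(2016, 12, 12, 10, 0))]
joins = [
    ("a", 3.0, 4.0, datetime(2016, 12, 12, 10, 30)),   # near in space and time
    ("b", 3.0, 4.0, datetime(2016, 12, 12, 14, 0)),    # near in space only
    ("c", 30.0, 40.0, datetime(2016, 12, 12, 10, 5)),  # near in time only
]

pairs = spatiotemporal_join(targets, joins, max_dist=5.0, max_dt=timedelta(hours=1))
```

Only feature "a" satisfies both conditions, so it is the only one joined to the target.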

Analytical Overview: Aggregating and Summarizing
• Points into Bins

Aggregation – Polygons vs Cells
Aggregation By Polygons Aggregation By Cells
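
Aggregation by cells can be sketched as mapping each point to the square bin that contains it and counting points per bin. This is an illustrative sketch only: the helper `cell_of`, the cell size of 1.0, and the sample points are hypothetical, and square cells on planar coordinates are an assumption.

```python
from collections import Counter
from math import floor


def cell_of(x, y, cell_size):
    """Return the (column, row) index of the square cell containing (x, y)."""
    # floor() rather than int() so that negative coordinates bin correctly.
    return (floor(x / cell_size), floor(y / cell_size))


# Made-up planar points.
points = [(0.5, 0.5), (0.9, 0.1), (1.2, 0.3), (-0.1, 2.7)]

# Count points per cell.
counts = Counter(cell_of(x, y, 1.0) for x, y in points)
```

Aggregation by polygons follows the same counting pattern, with a point-in-polygon test replacing the arithmetic cell lookup.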

Analytical Overview: Aggregating and Summarizing
• Reconstruct Tracks
- Summarize time-enabled points into tracks

Use Case: Hurricane Tracks
• Hurricane dataset
- 120,000 points, ~100 years
- Each point has:
  - ID number
  - Location
  - Date
  - Wind speed and pressure attributes
- Problems?
  - Difficult to visualize that many points
  - Difficult to visualize hurricane path
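
Turning points like these into tracks amounts to grouping by ID and ordering each group by date. The sketch below shows the idea on a handful of made-up records. The function name `reconstruct_tracks`, the tuple layout, and the coordinates are hypothetical, and this is not the Reconstruct Tracks tool itself.

```python
from collections import defaultdict


def reconstruct_tracks(points):
    """Group (id, date, x, y) points by id and order each group by date.

    Dates are ISO 8601 strings, which sort chronologically as text.
    Returns {id: [(x, y), ...]} with vertices in time order.
    """
    groups = defaultdict(list)
    for storm_id, when, x, y in points:
        groups[storm_id].append((when, x, y))
    return {
        storm_id: [(x, y) for _, x, y in sorted(observations)]
        for storm_id, observations in groups.items()
    }


# Made-up observations for two storms, deliberately out of order.
points = [
    ("A", "1999-09-02", -75.0, 25.0),
    ("B", "2005-08-24", -76.5, 24.5),
    ("A", "1999-09-01", -74.0, 24.0),
    ("B", "2005-08-23", -75.5, 23.5),
]

tracks = reconstruct_tracks(points)
```

Each storm collapses from many points into one ordered polyline, which addresses both visualization problems listed above.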

“Hybridity” for Distributed Computation
See also www.esri.com/software/open

Real-Time GIS Performance
ArcGIS 10.4
• Ingest high-velocity real-time data
• Observations in a Big Data Store
• Visualize high-velocity, high-volume data
- as an AGGREGATION,
- as discrete FEATURES,
- live & HISTORICALLY
• Visualizations CAN scale
[Diagram labels, in extraction order: 10s of thousands of e/s; ArcGIS Spatiotemporal Big Data Store; Desktop; Web; Device; ArcGIS Server; 4,000 e/s; Ingestion; GeoEvent; 4,000 e/s; Visualization; Live and Historic Aggregates & Features; Enhanced Map and Feature Service; Stream Service; Stream Layer; 3,000 e/s; Live Features; Geo Analytics Performance; Spatiotemporal Big Data Store]

For Questions/Discussion
Discussion groups at geonet.esri.com
Step 1. Click orange “Join in” button to create your account.
Step 2. Join the Big Data or Sciences groups
Step 3. Contribute to AGU conversations!
Mansour Raad, Esri Big Data Team
mraad@esri.com
thunderheadxpler.blogspot.com
github.com/mraad
@mraad

Editor's notes

The approach is “hybrid” in that ArcGIS Server integrates open-source big data frameworks such as Apache Hadoop and Apache Spark on the cluster in order to run the analytics.
Building blocks of this approach.
Buffer 8.2 million points or thousands of polygons in a little over a minute. Meet time constraints, especially against the next NSF proposal deadlines.
These “feature geo analytics” tools run in both batch and streaming spatial analysis modes as distributed computations across a cluster of servers on typical “big” data sets, where static data exist in traditional geospatial formats (e.g., shapefile) locally on a disk or file share, are attached as static spatiotemporal big data stores, or are streamed in near-real-time. In other words, the approach registers large datasets or data stores with ArcGIS Enterprise (Server), then distributes analysis across a cluster of machines for parallel processing. We aim to register large data stores or data sets with ArcGIS Server, then distribute analysis across a cluster of machines for parallel processing. Many frameworks and technologies exist for distributing computation, e.g., Hadoop, MapReduce, Spark. Spark processes distributed data in memory, supports the MapReduce programming model, and includes additional framework-level distributed algorithms. ArcGIS Server integrates these technologies on a cluster to solve analytic problems.
Due to lack of time, the talk will focus on aggregation and summarizing.
Many frameworks and technologies exist for distributing computation, e.g., Hadoop, MapReduce, Spark. Spark processes distributed data in memory, supports the MapReduce programming model, and includes additional framework-level distributed algorithms. ArcGIS Server integrates these technologies on a cluster to solve analytic problems.
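
The MapReduce programming model mentioned in this note can be illustrated without any cluster software. In the sketch below, each partition stands in for the data held by one machine, the map step summarizes a partition locally, and the reduce step merges the partial summaries. The names `map_partition` and `merge` and the sample wind speeds are hypothetical. The code runs sequentially in plain Python and uses neither Hadoop, Spark, nor ArcGIS Server.

```python
from functools import reduce


def map_partition(rows):
    """Map step: summarize one partition as {key: (sum, count)}."""
    partial = {}
    for key, value in rows:
        total, count = partial.get(key, (0.0, 0))
        partial[key] = (total + value, count + 1)
    return partial


def merge(a, b):
    """Reduce step: combine two partial summaries into one."""
    merged = dict(a)
    for key, (total, count) in b.items():
        total0, count0 = merged.get(key, (0.0, 0))
        merged[key] = (total0 + total, count0 + count)
    return merged


# Made-up (storm id, wind speed) records, split across two partitions.
partitions = [
    [("A", 40.0), ("B", 60.0)],
    [("A", 50.0), ("B", 80.0), ("A", 60.0)],
]

# Map every partition, then fold the partial results together.
totals = reduce(merge, map(map_partition, partitions), {})
means = {key: total / count for key, (total, count) in totals.items()}
```

Carrying (sum, count) pairs rather than means is the design choice that makes the merge step correct regardless of how the records are partitioned.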
For fast, dynamic queries, integrate Cloudera Impala, an open-source query engine that runs on Apache Hadoop and delivers fast SQL processing on the Hadoop Distributed File System (HDFS). Read and write data in HDFS using Impala. Write code in Python, Java, or Scala (like C, “scalable language”). ArcPy helps you perform geographic data analysis in Python. By the way, you’ll need at least 8 CPU cores, 16 GB of RAM (32 GB is better), and a 512 GB solid-state drive (1 TB is better).
e/s = events per second. We aim to register large data stores or data sets with ArcGIS Server, then distribute analysis across a cluster of machines for parallel processing. Performance example: buffer 8.2 million points or thousands of polygons in a little over a minute. Coming: ~250,000 writes to disk per second across 5 nodes. Many frameworks and technologies exist for distributing computation, e.g., Hadoop, MapReduce, Spark. Spark processes distributed data in memory, supports the MapReduce programming model, and includes additional framework-level distributed algorithms. ArcGIS Server integrates these technologies on a cluster to solve analytic problems.
