Geo Analytics Canada Overview - May 2020
• Satellite EO data is now too big to analyze using traditional desktop analytic tools
• It is impossible to analyze satellite EO data over wide areas and deep timeseries using traditional tools
Chart: NASA EO archive (EOSDIS) growth, approaching 246 PB in 2025
• Bring your algorithm to the data, not the other way around
• Embrace big data tools and systems used in other areas
• Transition away from desktop analytics to cloud-native analytics
• This new era requires partnerships between IT and satellite EO experts
• Demonstration proof-of-concept platform: www.geoanalytics.ca
www.geoanalytics.ca
• We can help you with your Big Geospatial Data Analytic problems
• Work with us to build & host your own platform
• Hatfield can provide embedded geospatial analytic experts to support your project or initiative
For example: Wetland classification; Ecosystem disturbance monitoring and recovery assessment; Wildfire mapping; Forest extent and biomass; Water dynamics, including river and lake ice; Leaf Area Index (LAI) for water balance studies; Land use and land cover change
• We did not want to build a closed platform that requires all data and tools to be centralized in one place
• Instead, we want to develop an ecosystem of open-architecture systems that assume data and processing resources will be distributed
• This platform demonstrates a starting point towards this open-architecture, distributed ecosystem approach
• Cloud native
• Our solution is built from the ground up to harness the power of cloud computing, rather than simply migrating desktop apps to the cloud
• Developer friendly
• Develop your own algorithms and systems in Python, and scale them dynamically and massively
• Desktop app friendly
• Take your Linux desktop geospatial analytic apps to the cloud, accessible through a browser
• Demonstrates the latest user- and machine-friendly OGC protocols
• OGC API - Features and STAC
• Infrastructure vendor agnostic
• All tools and systems can be installed on a wide variety of cloud computing providers. This allows us to pursue hybrid and multi-cloud architectures that exploit pre-existing distributed data stores
• Supports open science
• All tools and systems support the key tenets of open science: “openness, transparency, scrutiny and traceability of results, access to large volume of complex data, and the availability of community open tools”
• Canadian focused
• Uses only Canadian data storage and compute resources. This supports Canadian organizations that must comply with Canadian privacy laws requiring data to be kept in Canada.
• Based on Hatfield’s direct experience with ESA big data analytic platforms:
• European Space Agency Thematic Exploitation Platforms (TEPs), Copernicus Data and Information Access Services (DIAS), etc.
• Informed by competitive analysis of other internationally known platforms:
• Open Data Cube, Google Earth Engine, Hexagon's M.appX, CS-SI’s GeoStorm, FAO’s Sepal, EarthServer’s Rasdaman, Terradue’s Ellip, EOS’s Platform, DigitalGlobe’s GBDX, and Radiant.Earth’s platform
Architecture diagram (legend: IAAS = Infrastructure as a Service; SAAS = Software as a Service). Components shown:
• Object Storage: EO, ARD + project shared data storage; Docker image storage
• NFS Storage: user secure data storage
• Kubernetes Compute Cluster: core system nodes; per-user private interactive compute nodes; on-demand scalable compute nodes
• System functions: STAC indexing of EO assets; ground truth (GT) data store with OGC API-Features; OpenLDAP + DEX authentication; KubeFlow batch processing and machine learning
• Web Portal: GitLab private code repository + Docker registry; Jupyter-Lab model development environment; system documentation + examples; desktops + tools (QGIS, SNAP, etc.) in a browser; GT data upload and management functions; EO data query + discovery; user + cost management
• Also shown: web-map tile generation; EO data pre-processing functions; File Browser; cost accounting
• Key requirements for the cloud provider:
• Provides managed Kubernetes clusters: dynamically scheduled and scaled containerized workloads
• Availability of pre-emptible nodes: large-scale computations done in a cost-effective manner
• Has a Canadian data center: to comply with Canadian data residency requirements
• Selected: Google Cloud
• Meets all the above requirements
• Already hosts the Landsat 4-8 and Sentinel-2 collections, so there is no need to duplicate them
• Vendor neutrality:
• GEO Analytics Canada uses technologies available on all major cloud hosting providers
• APIs and layers of abstraction have been used to ensure neutrality
• Vendor neutrality allows us to pursue multi-cloud integrations
• For example: distributed machine learning, with compute done close to pre-existing data stores
• Entirely based on Kubernetes (K8s)
• An open-source system for automating deployment, scaling, and management of containerized applications
• Analytics is done in parallel on many worker nodes, so big data analytics run efficiently
• Pre-emptible nodes make on-demand compute very inexpensive
• Applications and users request compute resources (# of CPUs & GBs of RAM), which are provided on-demand within seconds
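As a rough illustration of how an application requests compute from Kubernetes, the sketch below builds a Pod manifest as a Python dict. The resource names and quantity syntax follow standard Kubernetes conventions; the pod name, the image name, and the choice to steer work onto pre-emptible nodes through GKE's `cloud.google.com/gke-preemptible` node label are assumptions for illustration, not the platform's actual configuration. The 1 vCPU / 5 GB figures mirror the per-pod allocation quoted in the NDVI runtime example.

```python
import json


def build_pod_manifest(name, image, cpu="1", memory="5G", preemptible=True):
    """Build a Kubernetes Pod manifest (as a dict) that requests CPU and RAM.

    "5G" is a decimal quantity (5 x 10^9 bytes); use "5Gi" for 5 x 2^30 bytes.
    The image name passed in is hypothetical.
    """
    container = {
        "name": name,
        "image": image,
        "resources": {
            # The scheduler only places the pod on a node with this much free capacity.
            "requests": {"cpu": cpu, "memory": memory},
            # Limits cap what the container may consume once it is running.
            "limits": {"cpu": cpu, "memory": memory},
        },
    }
    spec = {"containers": [container], "restartPolicy": "Never"}
    if preemptible:
        # Label that GKE applies to pre-emptible nodes; pins the pod to cheaper capacity.
        spec["nodeSelector"] = {"cloud.google.com/gke-preemptible": "true"}
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": spec,
    }


if __name__ == "__main__":
    manifest = build_pod_manifest("ndvi-2018-01", "registry.example.com/eo/ndvi:latest")
    print(json.dumps(manifest, indent=2))
```

The printed JSON could be applied with `kubectl apply -f pod.json`. If the pre-emptible node pool is also tainted, a matching toleration would be needed as well.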
• Object storage
• Highly durable, with built-in redundancy
• Scales to exabytes of data
• Lowest-cost storage option
• On the Demonstration Platform, the following are stored in object storage:
• Raw satellite EO data, including all downloaded MODIS products
• Analysis ready satellite data (ARD)
• User and project team shared files
• Docker container images
• NFS storage service
• Compatible with all Linux-based systems used on the demonstration platform
• Used to store users' personal home directories
• Secure: only available to a specific user (cannot be shared)
• Transfer files to the project team storage area (on the object store) if sharing is required
• Back-end storage is a standard SATA disk
• All applications and APIs require users to be authenticated
• User management and profiles through LDAP
• Single Sign-On
• Uses the industry-standard OAuth 2 protocol
• Users only need to log in once to gain access to all applications
• APIs require a token to authenticate
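A minimal sketch of how a client might present an OAuth 2 access token to a platform API, using the standard `Authorization: Bearer` header defined in RFC 6750. How the token is obtained (for example through the single sign-on flow) is not covered here, and the endpoint URL and token below are hypothetical placeholders.

```python
import urllib.request


def build_authenticated_request(url, access_token):
    """Attach an OAuth 2 access token as a Bearer credential (RFC 6750)."""
    return urllib.request.Request(
        url,
        headers={
            "Authorization": f"Bearer {access_token}",
            "Accept": "application/json",
        },
    )


if __name__ == "__main__":
    # Hypothetical endpoint and placeholder token; replace with real values.
    req = build_authenticated_request("https://api.example.com/stac/collections", "YOUR_TOKEN")
    print(req.get_header("Authorization"))
    # To actually send it: urllib.request.urlopen(req)
```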
• Web-browser-based browse and search interfaces
• Browse and search all datasets
• Query and view collections by time and location
• SpatioTemporal Asset Catalog (STAC) API of all EO datasets
• OGC API-Features (WFS3) compliant metadata server
• API documented at www.stacspec.org
• Current EO data collections (collection name: description, time period available):
• landsat-8-l1: Landsat-8 images over the eastern Canadian landmass (Manitoba east), 2013-2020
• modis.MCD12Q1: MODIS Land Cover, 2000-2020
• modis.MOD09GQ: Terra Surface Reflectance, 2000-2020
• modis.MOD09Q1: Terra Surface Reflectance, 2000-2020
• modis.MOD11A1: Terra Land Surface Temperature and Emissivity, 2000-2020
• modis.MOD11A2: Terra Land Surface Temperature and Emissivity, 2000-2020
• modis.MOD13Q1: Terra Vegetation Indices, 2000-2020
• modis.mod09gq.veg.ndvi: NDVI derived from Terra Surface Reflectance, 2000-2020
• modis.mod09gq.veg.evi2: EVI2 derived from Terra Surface Reflectance, 2000-2020
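To make the STAC query interface concrete, here is a sketch of an item-search request built with the standard library. The body fields (`collections`, `bbox`, `datetime` interval, `limit`) and the `POST {root}/search` endpoint follow the current STAC API Item Search specification; older pre-1.0 deployments may have used a different path. The API root is hypothetical, and the bounding box is only an illustrative area near Saint-Hyacinthe, QC. The request is built but not sent.

```python
import json
import urllib.request


def build_stac_search(api_root, collections, bbox, start, end, limit=100):
    """Build a STAC API item-search request: POST {api_root}/search.

    bbox is [min_lon, min_lat, max_lon, max_lat]; start/end are RFC 3339
    timestamps combined into a closed datetime interval.
    """
    body = {
        "collections": collections,
        "bbox": bbox,
        "datetime": f"{start}/{end}",
        "limit": limit,
    }
    return urllib.request.Request(
        api_root.rstrip("/") + "/search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical API root; collection id taken from the table above.
    req = build_stac_search(
        "https://stac.example.com",
        ["landsat-8-l1"],
        [-73.1, 45.5, -72.8, 45.7],
        "2018-01-01T00:00:00Z",
        "2018-12-31T23:59:59Z",
    )
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # To actually send it: urllib.request.urlopen(req)
```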
• Fully uses the computing power and scalability of the IAAS tier
• Multi-stage data processing pipelines
• Enables containerized applications to be put into a processing chain that can be scaled massively
• Implemented using KubeFlow
• Primarily designed to enable machine learning (ML) workflows
• The same ML workflow constructs are repurposed for EO data ingestion and pre-processing
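The idea of chaining containerized steps can be illustrated without Kubeflow itself. The sketch below is not the Kubeflow Pipelines API; it uses the standard library's `graphlib` (Python 3.9+) to show the underlying concept: a pipeline is a dependency graph, and steps whose inputs are ready form a "wave" that could run as parallel pods. The step names are hypothetical.

```python
from graphlib import TopologicalSorter  # Python 3.9+

# Hypothetical pipeline: each key is a step (a containerized task in a real
# pipeline); each value is the set of steps it depends on.
pipeline = {
    "download_l1c": set(),
    "sen2cor_tile_a": {"download_l1c"},
    "sen2cor_tile_b": {"download_l1c"},
    "index_in_stac": {"sen2cor_tile_a", "sen2cor_tile_b"},
}


def execution_waves(graph):
    """Group steps into waves; steps in the same wave have no mutual
    dependencies and could be scheduled as parallel pods."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)  # mark the whole wave finished before asking for more
    return waves


if __name__ == "__main__":
    for i, wave in enumerate(execution_waves(pipeline)):
        print(i, wave)
```

Scaling "massively" then amounts to fanning out more independent steps per wave (for example one per tile or per month), which the cluster can run side by side.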
• Proof-of-concept EO data pipelines created:
• Generates Level-2A Sentinel-2 products using Sen2Cor
• Runs any set of commands available through ESA’s Sentinel Application Platform (SNAP) software
• Downloads MODIS products to the object store and adds each product to the EO metadata system
• Adds Landsat-8 images over the eastern Canadian landmass (i.e. Manitoba east) to the EO metadata system
• Creates NDVI and EVI2 products from Terra Surface Reflectance products
• Creates a daily thermal average product from Terra Land Surface Temperature products
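For reference, the two vegetation indices produced by the surface reflectance pipeline are simple band ratios. The sketch below shows the standard NDVI formula and the two-band EVI2 formula (Jiang et al., 2008) applied per pixel. It assumes MOD09GQ conventions of band 1 = red, band 2 = NIR, a scale factor of 0.0001 and a fill value of -28672; these values are assumptions to verify against the product documentation, and the platform's actual pipeline code is not shown in the source.

```python
def scale_reflectance(dn, scale=0.0001, fill=-28672):
    """Convert a MOD09GQ integer surface-reflectance value to reflectance.
    Returns None for the (assumed) fill value."""
    return None if dn == fill else dn * scale


def ndvi(nir, red):
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red)."""
    denom = nir + red
    return None if denom == 0 else (nir - red) / denom


def evi2(nir, red):
    """Two-band Enhanced Vegetation Index:
    2.5 * (NIR - Red) / (NIR + 2.4 * Red + 1)."""
    denom = nir + 2.4 * red + 1.0
    return None if denom == 0 else 2.5 * (nir - red) / denom


if __name__ == "__main__":
    # Hypothetical pixel: band 2 (NIR) = 5000 and band 1 (Red) = 1000 before scaling.
    nir = scale_reflectance(5000)
    red = scale_reflectance(1000)
    print(round(ndvi(nir, red), 4), round(evi2(nir, red), 4))
```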
• NDVI and EVI2 derived from Terra Surface Reflectance pipeline:
• Processing completed for all products available between 2000 and 2020
• Results stored in object storage and indexed in the EO data query system
• Results available through all platform systems, including the EO data query and discovery system, File Browser, desktop in a browser, etc.
• Runtime example:
• 3 years of data (3 TB) processed in 13 hours
• 36 processing pods (1 per month); each pod is allocated 1 vCPU and 5 GB RAM
• Total cluster resources: 36 vCPU, 180 GB RAM
Figure: Viewing an NDVI product using QGIS through the ‘desktop in a browser’ system
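The runtime figures above are internally consistent, and a small sketch makes the arithmetic explicit: three years split one pod per month gives 36 work units, 36 pods at 1 vCPU / 5 GB gives the quoted cluster totals, and 3 TB in 13 hours works out to roughly 231 GB/h overall. The start year and the even split of data across pods are assumptions for illustration; the source gives neither.

```python
from datetime import date


def monthly_partitions(start_year, n_years):
    """One work unit per calendar month, mirroring '1 pod per month'."""
    return [date(start_year + y, m, 1) for y in range(n_years) for m in range(1, 13)]


def cluster_totals(n_pods, vcpu_per_pod, gb_ram_per_pod):
    """Total vCPU and RAM (GB) when every pod gets the same allocation."""
    return n_pods * vcpu_per_pod, n_pods * gb_ram_per_pod


def throughput_gb_per_hour(data_tb, hours, n_pods):
    """Aggregate and average per-pod throughput, using 1 TB = 1000 GB
    and assuming the data is split evenly across pods."""
    total = data_tb * 1000 / hours
    return total, total / n_pods


if __name__ == "__main__":
    months = monthly_partitions(2018, 3)  # start year is hypothetical
    vcpus, ram_gb = cluster_totals(len(months), 1, 5)
    total, per_pod = throughput_gb_per_hour(3, 13, len(months))
    print(len(months), vcpus, ram_gb)          # 36 36 180
    print(round(total, 1), round(per_pod, 1))  # 230.8 6.4
```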
• 10 Sentinel-2 L1C tiles to L2A conversion
• Typically ~3-4 hours
• GEOAnalytics: ~28 minutes
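The two timings can be turned into a rough speedup figure. Taking 3 to 4 hours as 180 to 240 minutes gives the range below; this ratio is derived from the slide's numbers and is not stated in the deck itself:

```latex
\text{speedup} = \frac{t_{\text{typical}}}{t_{\text{GEOAnalytics}}}, \qquad
\frac{180\ \text{min}}{28\ \text{min}} \approx 6.4, \qquad
\frac{240\ \text{min}}{28\ \text{min}} \approx 8.6
```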
• Python-based scalable data analytics
• Interacts with Kubernetes to provide on-demand scalable compute
• Core software systems:
• Jupyter-Lab: provides the web application framework for interactive analytics
• Xarray: provides an N-dimensional array interface and toolset
• Iris: provides methods for analysing and visualising meteorological and oceanographic data sets
• Dask: provides flexible parallel computing for analytics
• Zarr: a next-generation, cloud-native file format for gridded datasets
• To conduct scalable data analytics:
• Use Zarr as your on-disk data storage format
• Use Xarray as your in-memory data interface
• Use Dask to execute your code in parallel, with Kubernetes providing on-demand scalable compute
• Use lazy loading and execution throughout (the default for Xarray and Dask)
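The lazy, chunked, parallel pattern recommended above can be illustrated with the standard library alone. This is not Dask or Xarray code: it is a sketch of the same idea, where chunks are described without being loaded, each worker materializes and reduces only its own chunk, and partial results are combined at the end. In the real stack this corresponds roughly to opening a store with `xarray.open_zarr`, reducing with `.mean(dim="time")`, and triggering execution with `.compute()` on a Dask cluster. The data here is synthetic.

```python
from concurrent.futures import ThreadPoolExecutor


def lazy_chunks(n_values, chunk_size):
    """Describe chunks without loading data: each chunk is a zero-argument
    function that produces its values only when called (lazy loading)."""
    def make_loader(start, stop):
        # Stand-in for reading one chunk from object storage.
        return lambda: [float(v) for v in range(start, stop)]
    return [make_loader(s, min(s + chunk_size, n_values))
            for s in range(0, n_values, chunk_size)]


def partial_sum_count(load_chunk):
    values = load_chunk()  # data is materialized here, inside the worker
    return sum(values), len(values)


def chunked_mean(chunks, max_workers=4):
    """Map over chunks in parallel, then combine partial results (map-reduce)."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partials = list(pool.map(partial_sum_count, chunks))
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count


if __name__ == "__main__":
    graph = lazy_chunks(1_000_000, 100_000)  # nothing loaded yet
    print(chunked_mean(graph))               # 499999.5
```

Returning (sum, count) pairs rather than per-chunk means keeps the combine step exact when chunks have different sizes, as the last chunk often does.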
• Xarray and Dask
• Used in both Australia’s Open Data Cube and the Euro Data Cube’s xcube core library
Figure: analytics stack showing Jupyter-Lab, Xarray (Python N-dimensional array library), Dask (Python library for distributed computing), EO & GT data storage, and the Kubernetes compute cluster
• Implements a “Pangeo” environment
• www.pangeo.io
• Supports both HPC and cloud infrastructure
• Similar in nature to the European Joint Research Centre’s “Earth Observation Data and Processing Platform” (JEODPP)
• https://jeodpp.jrc.ec.europa.eu/home/
• Hatfield has started a library of example notebooks on how to use the Jupyter-Lab environment:
• Access Landsat data through the STAC API and process/analyze it to create an NDVI timeseries
• Query EO data hosted on GEOAnalytics.ca using OWSLib
https://github.com/geoanalytics-ca/example-notebooks
• NDVI Landsat-8 example notebook:
• 30 nodes, 210 GB RAM, 60 CPUs
• Random location close to Saint-Hyacinthe, QC
Figures: NDVI of 2018 acquisitions; mean NDVI
• Collaboration and sharing of source code with Git
• Private and shared repositories available
• The container registry is backed by the object store system
• Cost-effective storage of large container images
• Images in the registry can be used in scalable workflows in the platform’s EO data ingestion and pre-processing systems
• Provides users with their own personal Ubuntu desktop environment
• Accessible through a browser
• Enables data exploration directly on the platform, reducing the need to download data
• Users can select the amount of RAM + CPU on startup:
• From 1 to 31 CPUs
• From 1 to 116 GB RAM
• Pre-installed software (SNAP, QGIS, Firefox, etc.)
• Users can install their own software and customize the desktop environment
• EO data stores are mounted in the desktop environment for easy access:
• All Sentinel-2 data
• All Landsat 4-8 data
• Pre-processed data products
Figure: Viewing a Sentinel-2 product using QGIS through the ‘desktop in a browser’ system
• Enables browsing and downloading of all data stored on the platform for use in external systems
• Users can view and download data from:
• All EO data stores
• Data shared between users of the platform
• Their own personal data
• Vector ground truth data can be uploaded, viewed and deleted
• Users upload a shapefile (SHP), which is imported into the system
• Organized into collections that contain features
• Each uploaded shapefile becomes one “collection”
• Features can be browsed and searched interactively
• A web map displays features
• API endpoints implement the OGC API - Features specification (previously referred to as WFS3)
• Implemented using pygeoapi
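As a sketch of how a client might query the ground truth API, the function below builds an OGC API - Features Part 1 items URL, `{root}/collections/{collectionId}/items`, with the standard `bbox` and `limit` query parameters. The server root and the collection id are hypothetical; on this platform each uploaded shapefile would correspond to one collection.

```python
from urllib.parse import parse_qs, quote, urlencode, urlsplit


def features_items_url(api_root, collection_id, bbox=None, limit=None):
    """Build an OGC API - Features 'items' URL.

    bbox is [min_lon, min_lat, max_lon, max_lat], interpreted in WGS 84
    longitude/latitude (CRS84) unless the server says otherwise.
    """
    url = f"{api_root.rstrip('/')}/collections/{quote(collection_id, safe='')}/items"
    params = {}
    if bbox is not None:
        params["bbox"] = ",".join(str(v) for v in bbox)
    if limit is not None:
        params["limit"] = str(limit)
    return url + ("?" + urlencode(params) if params else "")


if __name__ == "__main__":
    # Hypothetical server root and collection id.
    url = features_items_url("https://features.example.com", "wetland_ground_truth",
                             bbox=[-73.1, 45.5, -72.8, 45.7], limit=50)
    print(url)
    print(parse_qs(urlsplit(url).query))
```

`urlencode` percent-encodes the commas in `bbox`; servers decode them back, as the `parse_qs` round-trip shows.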
• We want to help you with your Big Geospatial Data Analytic problems
• Not a closed platform. Instead, let's create open-architecture systems that assume data and processing are distributed
• What makes this platform different:
• Cloud native
• Developer friendly
• Desktop app friendly
• Latest OGC protocols
• Infrastructure vendor agnostic
• Supports open science
• Canadian focused
• The proof-of-concept platform demonstrates how (part 1 of 2):
• Existing stores of satellite EO data can be analyzed in place using cloud-computing resources, rather than requiring download
• New modular and user-friendly metadata protocols, particularly SpatioTemporal Asset Catalogs (STAC), can be used to provide a search interface for satellite EO dataset discovery
• The proof-of-concept platform demonstrates how (part 2 of 2):
• The new OGC API - Features (WFS 3) standard can be used to manage and make available ground truth and other in-situ datasets
• Satellite EO analytic programs in Python can be created interactively, and then scaled to analyze large areas and deep timeseries using the Xarray and Dask libraries
• Ingestion, machine learning, analytical and pre-processing applications (both binary and Python-based) can be linked to form scalable satellite EO data processing chains
• Bring your algorithm to the data, not the other way around
Email contacts:
info@geoanalytics.ca
jsuwala@hatfieldgroup.com

 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
What is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdfWhat is DBT - The Ultimate Data Build Tool.pdf
What is DBT - The Ultimate Data Build Tool.pdf
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024TeamStation AI System Report LATAM IT Salaries 2024
TeamStation AI System Report LATAM IT Salaries 2024
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 

Geo Analytics Canada Overview - May 2020

  • 1.
  • 2. • Satellite EO data is now too big to analyze using traditional desktop analytic tools • It is impossible to analyze satellite EO data over wide areas and deep time series using traditional tools • NASA EO archive (EOSDIS) growth: approaching 246 PB in 2025
  • 3. • Bring your algorithm to the data, not the other way around • Embrace big data tools and systems used in other areas • Transition away from desktop analytics to cloud-native analytics • This new era requires partnerships between IT and satellite EO experts • Demonstration proof-of-concept platform: www.geoanalytics.ca
  • 4. • We can help you with your big geospatial data analytic problems • Work with us to build and host your own platform • Hatfield can provide embedded geospatial analytic experts to support your project or initiative, for example: wetland classification; ecosystem disturbance monitoring and recovery assessment; wildfire mapping; forest extent and biomass; water dynamics, including river and lake ice; Leaf Area Index (LAI) for water balance studies; land use and land cover change
  • 5. • We did not want to build a closed platform that requires all data and tools to be centralized in one place • Instead, we want to develop an ecosystem of open-architecture systems that assume data and processing resources will be distributed • This platform demonstrates a starting point toward this open, distributed ecosystem
  • 6. • Cloud native • Our solution is built from the ground up to exploit the power of cloud computing, rather than simply migrating desktop apps to the cloud • Developer friendly • Develop your own algorithms and systems in Python, and scale them dynamically and massively • Desktop app friendly • Take your Linux desktop geospatial analytic apps • Demonstrates the latest user- and machine-friendly OGC protocols • OGC API-Features and STAC
  • 7. • Infrastructure vendor agnostic • All tools and systems can be installed on a wide variety of cloud computing providers. This allows us to pursue hybrid and multi-cloud architectures that exploit pre-existing distributed data stores • Supports open science • All tools and systems support the key tenets of open science: “openness, transparency, scrutiny and traceability of results, access to large volume of complex data, and the availability of community open tools” • Canadian focused • Uses only Canadian data storage and compute resources. This supports Canadian organizations that must comply with Canadian privacy laws requiring data to be kept in Canada.
  • 9. • Based on Hatfield’s direct experience with ESA big data analytic platforms: • European Space Agency Thematic Exploitation Platforms (TEPs), Copernicus Data and Information Access Services (DIAS), etc. • Informed by competitive analysis of other internationally known platforms: • OpenDataCube, Google Earth Engine, Hexagon’s M.appX, CS-SI’s GeoStorm, FAO’s Sepal, EarthServer’s Rasdaman, Terradue’s Ellip, EOS’s Platform, DigitalGlobe’s GBDX, and Radiant.Earth’s platform
  • 10. Platform architecture (diagram), spanning an Infrastructure as a Service tier and a Software as a Service tier:
    • Object Storage: EO, ARD + project shared data storage; Docker image storage
    • NFS Storage: user secure data storage
    • Kubernetes Compute Cluster (on-demand compute): core system nodes; per-user private interactive compute nodes; on-demand scalable compute nodes
    • System Functions: STAC indexing of EO assets; GT data store with OGC API-Features; OpenLDAP + DEX authentication; KubeFlow batch processing and machine learning; web-map tile generation; EO data pre-processing functions; cost accounting
    • Web Portal: GitLab private code repository + Docker registry; Jupyter-Lab model development environment; system documentation + examples; desktops + tools (QGIS, SNAP, etc.) in a browser; GT data upload and management functions; EO data query + discovery; user + cost management; File Browser
  • 11. • IAAS: Infrastructure as a Service • SAAS: Software as a Service
  • 13. • Key requirements: • Managed Kubernetes clusters: dynamically scheduled and scaled containerized workloads • Availability of pre-emptible nodes: large-scale computations done cost-effectively • A Canadian data center: to comply with Canadian data residency requirements • Selected: Google Cloud • Meets all of the above requirements • Already hosts the Landsat 4-8 and Sentinel-2 collections, so there is no need to duplicate them
  • 14. • Vendor neutrality: • GEO Analytics Canada uses technologies available on all major cloud hosting providers • APIs and layers of abstraction are used to ensure neutrality • Vendor neutrality allows us to pursue multi-cloud integrations • For example: distributed machine learning, with compute done close to pre-existing data stores
  • 15. • Entirely based on Kubernetes (K8s) • An open-source system for automating deployment, scaling, and management of containerized applications • Analytics run in parallel on many worker nodes, so big data analytics perform well • Pre-emptible nodes make on-demand compute very inexpensive • Applications and users request compute resources (number of CPUs and GB of RAM), which are provided on demand within seconds
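To make the "request CPUs and GB of RAM" idea concrete, here is a minimal sketch of a standard Kubernetes Pod manifest built as a Python dict. The pod name, image, and 1 vCPU / 5 GiB figures are hypothetical (the sizes are borrowed from the NDVI pipeline slide), and the `cloud.google.com/gke-preemptible` node label is GKE-specific; the platform's actual manifests are not shown in the deck.

```python
import json

# GKE labels pre-emptible nodes with this key; other providers use different labels.
PREEMPTIBLE_LABEL = "cloud.google.com/gke-preemptible"


def pod_manifest(name: str, image: str, cpus: int, ram_gib: int,
                 preemptible: bool = True) -> dict:
    """Build a Kubernetes Pod manifest requesting a fixed amount of CPU and RAM."""
    resources = {"cpu": str(cpus), "memory": f"{ram_gib}Gi"}
    spec = {
        "restartPolicy": "Never",  # batch-style work: do not restart after it exits
        "containers": [{
            "name": name,
            "image": image,
            # requests: capacity the scheduler reserves on a node;
            # limits: the ceiling enforced while the container runs
            "resources": {"requests": dict(resources), "limits": dict(resources)},
        }],
    }
    if preemptible:
        # Ask the scheduler to place the pod only on pre-emptible (cheaper) nodes
        spec["nodeSelector"] = {PREEMPTIBLE_LABEL: "true"}
    return {"apiVersion": "v1", "kind": "Pod",
            "metadata": {"name": name}, "spec": spec}


# Hypothetical worker image; 1 vCPU and 5 GiB RAM per pod
manifest = pod_manifest("ndvi-worker-2018-01", "registry.example.org/ndvi:latest", 1, 5)
print(json.dumps(manifest, indent=2))
```

Submitting the manifest (for example with `kubectl apply -f`) is left out so the sketch runs without a cluster. Setting requests equal to limits is one conservative choice; it keeps each worker's footprint predictable when many pods share pre-emptible nodes.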
  • 16. • Object storage • Highly durable, with built-in redundancy • Scales to exabytes of data • Lowest cost • On the demonstration platform, the following are stored in object storage: • Raw satellite EO data, including all downloaded MODIS products • Analysis-ready satellite data (ARD) • User and project team shared files • Docker container images
  • 17. • NFS storage service • Compatible with all Linux-based systems used on the demonstration platform • Used to store users’ personal home directories • Secure: only available to a specific user (cannot be shared) • Transfer files to the project team storage area (on the object store) if sharing is required • Back-end storage is a standard SATA disk
  • 19.
  • 20. • All applications and APIs require users to be authenticated • User management and profiles through LDAP • Single sign-on • Uses the industry-standard OAuth 2.0 protocol • Users only need to log in once to gain access to all applications • APIs require a token to authenticate
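Under OAuth 2.0, an API token is usually presented as a bearer token in the `Authorization` header (RFC 6750). A minimal sketch with the standard library follows; the endpoint URL and token value are hypothetical, and how the platform actually issues tokens through its single sign-on flow is not covered in the deck.

```python
import urllib.request


def authorized_request(url: str, access_token: str) -> urllib.request.Request:
    """Attach an OAuth 2.0 bearer token (RFC 6750) to a GET request."""
    req = urllib.request.Request(url, method="GET")
    req.add_header("Authorization", f"Bearer {access_token}")
    req.add_header("Accept", "application/json")
    return req


# Hypothetical endpoint and token; a real token would come from the SSO login flow
req = authorized_request("https://api.example.org/stac/collections", "EXAMPLE_TOKEN")
# urllib.request.urlopen(req) would send it; omitted so the sketch runs offline
```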
  • 21.
  • 22. • Web-browser-based browse and search interfaces • Browse and search all datasets • Query and view collections by time and location • SpatioTemporal Asset Catalog (STAC) API for all EO datasets • OGC API-Features (WFS3) compliant metadata server • API documented at www.stacspec.org
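Querying collections by time and location maps onto the STAC API item search, which accepts `collections`, `bbox`, `datetime` (an RFC 3339 interval) and `limit`. The sketch below only builds the POST request; the `/search` URL is hypothetical, the collection ID is taken from the collections table, and the bounding box near Saint-Hyacinthe, QC is illustrative.

```python
import json
import urllib.request

STAC_SEARCH_URL = "https://stac.example.org/search"  # hypothetical endpoint


def stac_search_request(collections, bbox, start, end, limit=100):
    """Build a STAC API item-search POST request filtered by collection, area and time."""
    body = {
        "collections": list(collections),
        "bbox": list(bbox),            # [min_lon, min_lat, max_lon, max_lat]
        "datetime": f"{start}/{end}",  # closed RFC 3339 interval
        "limit": limit,                # page size; further pages follow "next" links
    }
    return urllib.request.Request(
        STAC_SEARCH_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = stac_search_request(
    ["landsat-8-l1"],
    (-73.2, 45.5, -72.8, 45.8),  # illustrative box around Saint-Hyacinthe, QC
    "2018-01-01T00:00:00Z",
    "2018-12-31T23:59:59Z",
)
```

The response would be a GeoJSON FeatureCollection of STAC Items whose assets link to the underlying imagery.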
  • 23. • Current EO data collections:
    Collection Name | Description | Time Period Available
    landsat-8-l1 | Landsat-8 images over the eastern Canadian landmass (Manitoba east) | 2013-2020
    modis.MCD12Q1 | MODIS Land Cover | 2000-2020
    modis.MOD09GQ | Terra Surface Reflectance | 2000-2020
    modis.MOD09Q1 | Terra Surface Reflectance | 2000-2020
    modis.MOD11A1 | Terra Land Surface Temperature and Emissivity | 2000-2020
    modis.MOD11A2 | Terra Land Surface Temperature and Emissivity | 2000-2020
    modis.MOD13Q1 | Terra Vegetation Indices | 2000-2020
    modis.mod09gq.veg.ndvi | NDVI derived from Terra Surface Reflectance | 2000-2020
    modis.mod09gq.veg.evi2 | EVI2 derived from Terra Surface Reflectance | 2000-2020
  • 24.
  • 25. • Fully uses the computing power and scalability of the IAAS tier • Multi-stage data processing pipelines • Enables containerized applications to be chained into a processing pipeline that can be scaled massively • Implemented using KubeFlow • Primarily designed to enable machine learning (ML) workflows • The same ML workflow constructs are repurposed for EO data ingestion and pre-processing
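A processing chain of this kind is a directed acyclic graph (DAG) of containerized steps. The sketch below does not use the KubeFlow SDK; it uses Python's standard `graphlib` (3.9+) to show how a DAG of hypothetical step names resolves into stages, where steps in the same stage have no mutual dependency and could be scheduled as parallel containers.

```python
from graphlib import TopologicalSorter

# Hypothetical ingestion chain; each key is a step (a container in practice)
# and each value is the set of steps it depends on.
pipeline = {
    "download_modis": set(),
    "index_in_stac": {"download_modis"},
    "compute_ndvi": {"download_modis"},
    "compute_evi2": {"download_modis"},
    "index_products": {"compute_ndvi", "compute_evi2"},
}


def run_order(dag):
    """Group steps into stages; every step in a stage depends only on earlier stages."""
    ts = TopologicalSorter(dag)
    ts.prepare()  # also raises CycleError if the graph is not acyclic
    stages = []
    while ts.is_active():
        ready = sorted(ts.get_ready())
        stages.append(ready)
        ts.done(*ready)  # pretend the whole stage finished successfully
    return stages


for i, stage in enumerate(run_order(pipeline)):
    print(i, stage)
```

A real orchestrator marks steps done as each container exits rather than a whole stage at once, which lets fast branches proceed without waiting for slow siblings.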
  • 26. • Proof-of-concept EO data pipelines created: • Creates Level-2A Sentinel-2 products using Sen2Cor • Runs any set of commands available through ESA’s Sentinel Application Platform (SNAP) software • Downloads MODIS products to the object store and adds each product to the EO metadata system • Adds Landsat-8 images over the eastern Canadian landmass (i.e. Manitoba east) to the EO metadata system • Creates NDVI and EVI2 products from Terra Surface Reflectance products • Creates a daily thermal average product from Terra Land Surface Temperature products
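The deck does not show the pipeline's formulas. As a sketch, the standard definitions are NDVI = (NIR - Red) / (NIR + Red) and the two-band EVI, EVI2 = 2.5 (NIR - Red) / (NIR + 2.4 Red + 1). MOD09GQ stores red (band 1) and NIR (band 2) as scaled integers with a 0.0001 scale factor; the pixel values below are made up, and real products also carry fill and QA values that should be masked first.

```python
import numpy as np

SCALE = 0.0001  # MOD09GQ surface reflectance scale factor


def ndvi(red, nir):
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red)."""
    return (nir - red) / (nir + red)


def evi2(red, nir):
    """Two-band Enhanced Vegetation Index: 2.5 * (NIR - Red) / (NIR + 2.4 * Red + 1)."""
    return 2.5 * (nir - red) / (nir + 2.4 * red + 1.0)


# Illustrative stored integers for MOD09GQ band 1 (red) and band 2 (NIR)
red_dn = np.array([[500, 800], [1200, 300]], dtype=np.int16)
nir_dn = np.array([[3000, 2400], [1500, 3300]], dtype=np.int16)

red = red_dn * SCALE  # convert to reflectance (float64, roughly 0-1)
nir = nir_dn * SCALE
print(np.round(ndvi(red, nir), 3))
print(np.round(evi2(red, nir), 3))
```

Both functions are element-wise, so the same code applies unchanged to whole rasters or to lazily evaluated arrays.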
  • 27. • NDVI and EVI2 derived from Terra Surface Reflectance pipeline: • Processing completed for all products available between 2000 and 2020 • Results stored in object storage and indexed in the EO data query system • Results available through all platform systems, including the EO data query and discovery system, File Browser, desktop in a browser, etc. • Runtime example: • 3 years of data (3 TB) processed in 13 hours • 36 processing pods (1 per month), each allocated 1 vCPU and 5 GB RAM • Total cluster resources: 36 vCPU, 180 GB RAM • (Figure: viewing an NDVI product using QGIS through the ‘desktop in a browser’ system)
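The runtime figures follow from a one-pod-per-month fan-out. This sketch splits a hypothetical 3-year window (the deck does not say which years) into monthly jobs and checks the totals; the throughput line assumes decimal terabytes.

```python
from datetime import date

VCPU_PER_POD = 1
RAM_GB_PER_POD = 5


def month_start(first_year: int, k: int) -> date:
    """First day of the k-th month counted from January of first_year."""
    y, m = divmod(k, 12)
    return date(first_year + y, m + 1, 1)


def monthly_jobs(first_year: int, n_years: int):
    """One (start, end_exclusive) date range per month: one processing pod each."""
    return [(month_start(first_year, k), month_start(first_year, k + 1))
            for k in range(n_years * 12)]


jobs = monthly_jobs(2017, 3)  # hypothetical 3-year window
total_vcpu = len(jobs) * VCPU_PER_POD
total_ram_gb = len(jobs) * RAM_GB_PER_POD
# 3 TB in 13 hours, expressed as aggregate MB/s across all pods
throughput_mb_s = 3e12 / (13 * 3600) / 1e6
print(len(jobs), "pods,", total_vcpu, "vCPU,", total_ram_gb, "GB RAM")
print(round(throughput_mb_s, 1), "MB/s aggregate")
```

With 36 equal jobs, each pod handles roughly 1/36 of the volume, which is why adding pods (within quota) shortens wall-clock time nearly linearly for this kind of independent per-month work.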
  • 28. • Conversion of 10 Sentinel-2 L1C tiles to L2A • Typically ~3-4 hours • GEOAnalytics: ~28 minutes
  • 29.
  • 30. • Python-based scalable data analytics • Interacts with Kubernetes to provide on-demand scalable compute • Core software systems: • Jupyter-Lab: provides the web application framework for interactive analytics • Xarray: provides an N-dimensional array interface and toolset • Iris: provides methods for analysing and visualising meteorological and oceanographic data sets • Dask: provides flexible parallel computing for analytics • Zarr: the next-generation, cloud-native file format for gridded datasets
  • 31. • To conduct scalable data analytics: • Use Zarr as your on-disk data storage format • Use Xarray as your in-memory data interface • Use Dask to execute your code in parallel, with Kubernetes providing on-demand scalable compute • Use lazy loading/execution throughout (the default for Xarray and Dask)
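The lazy Xarray + Dask pattern can be sketched on synthetic data. Here constant Dask arrays stand in for a chunked red/NIR stack; on the platform the inputs would presumably come from `xr.open_zarr(...)`, and the Dask scheduler would be backed by Kubernetes workers (for example through a cluster manager such as Dask Gateway, an assumption about this deployment). The sketch uses Dask's default local scheduler.

```python
import dask.array as da
import numpy as np
import xarray as xr

# Synthetic stand-in for a (time, y, x) stack, chunked 10 time steps at a time
shape, chunks = (60, 200, 200), (10, 200, 200)
red = xr.DataArray(da.full(shape, 0.1, chunks=chunks), dims=("time", "y", "x"))
nir = xr.DataArray(da.full(shape, 0.5, chunks=chunks), dims=("time", "y", "x"))

# These lines only build a task graph; no pixels are processed yet (lazy execution)
ndvi = (nir - red) / (nir + red)
mean_ndvi = ndvi.mean(dim="time")
print(type(mean_ndvi.data))  # still a Dask array, not a NumPy array

# .compute() executes the graph chunk by chunk with the active Dask scheduler
result = mean_ndvi.compute()
print(result.shape, float(result[0, 0]))
```

Because each chunk is processed independently, the same code scales from a laptop to a cluster by changing only the scheduler, not the analysis.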
  • 32. • Xarray and Dask • Used in both Australia’s Open Data Cube and the Euro Data Cube’s xcube core library
  • 33. (Diagram: Jupyter-Lab; Xarray, a Python N-dimensional array library; Dask, a Python library for distributed computing; EO & GT data storage; Kubernetes compute cluster)
  • 34.
  • 35. • Implements a “Pangeo” environment • www.pangeo.io • Supports both HPC and cloud infrastructure • Similar in nature to the European Joint Research Centre’s “Earth Observation Data and Processing Platform” (JEODPP) • https://jeodpp.jrc.ec.europa.eu/home/
  • 36. • Hatfield has started a library of example notebooks on how to use the Jupyter-Lab environment • Access Landsat data through the STAC API and process/analyze it to create an NDVI time series • Query EO data hosted on GEOAnalytics.ca using OWSLib • https://github.com/geoanalytics-ca/example-notebooks
  • 37. • NDVI Landsat-8 example notebook: • 30 nodes, 210 GB RAM, 60 CPUs • Random location close to Saint-Hyacinthe, QC • (Figures: NDVI of 2018 acquisitions; mean NDVI)
  • 38.
  • 39. • Collaboration and sharing of source code with Git • Private and shared repositories available
  • 40. • The container registry is backed by the object store system • Cost-effective storage of large container images • Images in the registry can be used in scalable workflows in the platform’s EO data ingestion and pre-processing systems
  • 41.
  • 42. • Provides users with their own personal Ubuntu desktop environment • Accessible through a browser • Enables data exploration directly on the platform, reducing the need to download data • Users can select the amount of RAM and CPU on startup: • From 1 to 31 CPUs • From 1 to 116 GB RAM
  • 43. • Pre-installed software (SNAP, QGIS, Firefox, etc.) • Users can install their own software and customize the desktop environment • EO data stores are mounted in the desktop environment for easy access: • All Sentinel-2 data • All Landsat 4-8 data • Pre-processed data products • (Figure: viewing a Sentinel-2 product using QGIS through the ‘desktop in a browser’ system)
  • 44.
  • 45. • Enables browsing and downloading of all data stored on the platform for use in external systems • Users can view and download data from: • All EO data stores • Data shared between users of the platform • Their own personal data
  • 46.
  • 47. • Vector ground truth data can be uploaded, viewed and deleted • Users upload a SHP file, which is imported into the system • Data are organized into collections that contain features • Each SHP file becomes a “collection”
  • 48. • Features can be browsed and searched interactively • A web map displays features • API endpoints implement the OGC API-Features specification (previously referred to as WFS3) • Implemented using pygeoapi
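In OGC API - Features, the features of one collection are fetched from `/collections/{collectionId}/items`, with `bbox` and `limit` among the core query parameters. A minimal sketch that only builds the URL; the base URL and the `wetland_ground_truth` collection (standing in for an uploaded SHP file) are hypothetical.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://features.example.org"  # hypothetical pygeoapi deployment


def items_url(collection_id, bbox=None, limit=10):
    """Build an OGC API - Features 'items' request URL for one collection."""
    params = {"limit": limit}
    if bbox is not None:
        # bbox is min_lon,min_lat,max_lon,max_lat in WGS 84 by default
        params["bbox"] = ",".join(str(v) for v in bbox)
    return f"{BASE_URL}/collections/{quote(collection_id)}/items?{urlencode(params)}"


url = items_url("wetland_ground_truth", bbox=(-73.2, 45.5, -72.8, 45.8), limit=50)
print(url)
```

A GET on this URL would return a GeoJSON FeatureCollection; larger result sets are paged through the `next` links in the response rather than by raising `limit` indefinitely.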
  • 49.
  • 50. What makes this platform different: • We want to help you with your big geospatial data analytic problems • Not a closed platform: instead, let’s create open-architecture systems that assume data and processing are distributed • Cloud native • Developer friendly • Desktop app friendly • Latest OGC protocols • Infrastructure vendor agnostic • Supports open science • Canadian focused
  • 51. • The proof-of-concept platform demonstrates how [1]: • Existing stores of satellite EO data can be analyzed in place using cloud computing resources, rather than requiring download • New modular and user-friendly metadata protocols, particularly SpatioTemporal Asset Catalogs (STAC), can be used to provide a search interface for satellite EO dataset discovery
  • 52. • The proof-of-concept platform demonstrates how [2]: • The new OGC API-Features (WFS 3) standard can be used to manage and make available ground truth and other in-situ datasets • Satellite EO analytic programs in Python can be created interactively, and then scaled to analyze large areas and deep time series using the Xarray and Dask libraries • Ingestion, machine learning, analytical and pre-processing applications (both binary and Python-based) can be linked to form scalable satellite EO data processing chains
  • 53. • Bring your algorithm to the data, not the other way around • Email contacts: info@geoanalytics.ca, jsuwala@hatfieldgroup.com

Speaker notes

  1. HPC deployments: NCAR Cheyenne Cluster, NASA Pleiades Cluster, Columbia Habanero Cluster, CNES HAL, USGS Yeti, UW Hyak, DOD HPC at AFRL, Princeton Tiger, Pawsey Supercomputer, University of Miami Pegasus
  5. NFS image from https://medium.com/platformer-blog/nfs-persistent-volumes-with-kubernetes-a-case-study-ce1ed6e2c266
  8. DAG from https://www.slideshare.net/VictorZabalza/lens-data-exploration-with-dask-and-jupyter-widgets?from_action=save