Bill Howe
UW eScience Institute
Computer Science & Engineering
University of Washington
Big Ocean Data: Numerically
Conservative Parallel Query Processing
3/18/2022 Bill Howe, UW 1
Scott Moe
Applied Math
University of Washington
http://escience.washington.edu
eScience Scalable Data Analytics Group
Bill Howe, PhD (databases, cloud, data-intensive scalable computing, visualization)
Postdocs (past and present)
– Dan Halperin, postdoc (distributed systems, algorithms)
– Seung-Hee Bae, postdoc (scalable machine learning algorithms)
– (alumni) Marianne Shaw, PhD (Hadoop, graph query, health informatics)
Staff (past and present)
– Sagar Chitnis (cloud computing, databases, web services)
– (alumni) Alicia Key (web applications, visualization)
– (alumni) Garret Cole (databases, cloud computing, web services)
Students
– Scott Moe (2nd yr PhD, Applied Math)
Partners
– CSE DB Faculty: Magda Balazinska, Dan Suciu
– CSE students: Paris Koutris, Prasang Upadhyaya, Shengliang Xu, Jinjing Wang
– UW-IT (web applications, QA/support)
– Cecilia Aragon, PhD, Associate Professor, HCDE (visualization, scientific applications)
Summary of Big Data in the Ocean Sciences
• In transition from expedition-based to observatory-based science
• Enormous investments in infrastructure for linking sensors
• Ad hoc, on-demand integration of large, heterogeneous,
distributed datasets is the universal requirement in this regime
• “Integration” means “regridding”
– mesh to pixels, mesh to mesh, trajectory to mesh
– satellites to models, models to models, observations to models
• Regridding is hard
– Must be easy, tolerant of unusual grids, numerically conservative, efficient
Summary of our work
• We have the beginnings of a “universal regridding” operator with
nice algebraic properties
• We’re using it to implement efficient distributed data sharing
applications, parallel algorithms, and more
slide: John Delaney, UW
Regional Scale Nodes
John Delaney
10s of Gigabits/second from the ocean floor
Cyberinfrastructure
• “Integration of numerical ocean models and their output as derived data products”
• “Generation and distribution of qualified science data products in (near) real time”
• “Access to, syntactic transformation, semantic mediation, analysis, and visualization of
science data, data products, and derived data products”
Matthew Arrot
17 federal organizations named as partners
11 Regional Associations
“a strategy for incorporating observation systems from …
near shore waters as part of … a network of observatories.”
Center for Coastal Margin
Observation and Prediction (CMOP)
Antonio Baptista
Virtual Mekong Basin
img src: Mark Stoermer, UW Center for Environmental Visualization
Jeff Richey
[Figure: # of bytes vs. # of data sources, comparing Astronomy and Ocean Sciences]
Projects:
– LSST (~100PB; images, spectra)
– PanSTARRS (~40PB; images, trajectories)
– SDSS (~400TB; images, spectra, catalogs)
– OOI (~50TB/year; sims, RSN)
– IOOS (~50TB/year; sims, satellite, gliders, AUVs, vessels, more)
– CMOP (~10TB/year; sims, stations, gliders, AUVs, vessels, more)
Data source labels: telescopes, spectra, n-body sims, models, AUVs, stations, cruises, CTDs, flow cytometry, gliders, ADCP, satellites
3 V’s of Big Data: Volume, Variety, Velocity
SQLShare: Ad Hoc Databases for Science
Problem
– Data is captured and manipulated in ad hoc files
– Made sense five years ago; the data volumes were
manageable
– But now: 50k rows, 100s of files, sharing
Why not put everything into a database?
– A huge amount of up-front effort
– Hard to design for a moving target
– Running a database system is a huge drain
Solution: SQLShare
– Upload data through your browser: no setup, no installation
– Log in to browse science questions in English
– Click a question, see the SQL that answers it
– Edit the SQL to answer an "adjacent" question, even if you
wouldn’t know how to write it from scratch
https://sqlshare.escience.washington.edu/
Summary of Big Data in the Ocean Sciences
• Transition from expedition-based to observatory-based science
• Enormous investments in integrating infrastructure for data
acquisition and interoperability
• Ad hoc, on-demand integration of large, heterogeneous,
distributed datasets is the universal requirement in this regime
• “Integration” means “regridding”
– mesh to pixels, mesh to mesh, trajectory to mesh
– satellites to models, models to models, observations to models
• Regridding is hard
– Must be easy, tolerant of unusual grids, numerically conservative, efficient
Summary of our work
• We have the beginnings of a “universal regridding” operator with
nice algebraic properties
• We’re using it to implement efficient distributed data sharing
applications, parallel algorithms, and more
Status Quo
• “FTP + MATLAB”
• “Nascent Databases”
– File-based, format-specific API
– UniData’s NetCDF, HDF5
– Some IO optimization, some indexing
• Data Servers
– Same as file-based systems, but with RPC support
Hyrax
None of this scales
- up with data volumes
- up with number of sources
- down with developer expertise
What do we really need here?
• Claim: Everything reduces to regridding
• Model-data comparison (skill assessment)?
Regrid observations onto model mesh
• Model-model comparison?
Regrid one model’s mesh onto the other’s
• Model coupling?
Regrid a meso-scale atmospheric model onto your regional ocean model
• Visualization?
Regrid onto a 3D mesh, or regrid onto a 2D array of pixels
Why is this so hard?
• Needs to handle Unstructured Grids
• Needs to be Numerically Conservative
• Needs to be Lightweight and Efficient
[Figure: Columbia River Estuary (Washington, Oregon)]
SciDB
Hyrax
GridFields
ESMF
VTK/Paraview
Structured grids are easy
• The data model
(Cartesian products of coordinate variables)
• immediately implies a representation,
(multidimensional arrays)
• an API,
(reading and writing subslabs)
• and an efficient implementation
(address calculation using array “shape”)
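The chain from data model to implementation can be sketched in a few lines. This is an illustration only, assuming row-major storage in a flat list; the function names `strides`, `address`, and `subslab` are hypothetical and do not come from any library named in the talk.

```python
from itertools import product

def strides(shape):
    """Row-major strides: how far to step in the flat array per unit index."""
    out = [1] * len(shape)
    for axis in range(len(shape) - 2, -1, -1):
        out[axis] = out[axis + 1] * shape[axis + 1]
    return out

def address(index, shape):
    """Flat offset of a multidimensional index, computed from the shape alone."""
    return sum(i * s for i, s in zip(index, strides(shape)))

def subslab(flat, shape, starts, counts):
    """Read a rectangular subslab: `counts[k]` entries from `starts[k]` on axis k."""
    ranges = [range(s, s + c) for s, c in zip(starts, counts)]
    return [flat[address(idx, shape)] for idx in product(*ranges)]

# A 4 x 3 x 5 array stored as 60 consecutive values.
shape = (4, 3, 5)
flat = list(range(60))
slab = subslab(flat, shape, starts=(1, 0, 2), counts=(2, 1, 2))
```

No lookup structure is needed: the shape alone locates every cell. An unstructured grid has no such shape, so its connectivity has to be stored and searched explicitly.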
Why is this so hard?
• Needs to handle Unstructured Grids
• Needs to be Numerically Conservative
• Needs to be Lightweight and Efficient
Naïve Method: Interpolation (Spatial Join)
For each vertex in the target grid,
Find containing cell in the source grid,
Evaluate the basis functions to interpolate
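The three steps above can be sketched for a triangular source mesh with linear basis functions, where the basis functions evaluated at a point are its barycentric coordinates. This is a minimal sketch, not the implementation from the talk: the names are hypothetical, and the brute-force search over triangles stands in for the spatial index a real system would presumably use.

```python
def barycentric(p, a, b, c):
    """Barycentric coordinates of point p in triangle (a, b, c)."""
    (x, y), (x1, y1), (x2, y2), (x3, y3) = p, a, b, c
    det = (y2 - y3) * (x1 - x3) + (x3 - x2) * (y1 - y3)
    w1 = ((y2 - y3) * (x - x3) + (x3 - x2) * (y - y3)) / det
    w2 = ((y3 - y1) * (x - x3) + (x1 - x3) * (y - y3)) / det
    return w1, w2, 1.0 - w1 - w2

def interpolate(targets, vertices, triangles, values, eps=1e-12):
    """For each target vertex: find the containing source cell, then interpolate."""
    out = []
    for p in targets:
        result = None  # stays None when p lies outside the source mesh
        for tri in triangles:
            weights = barycentric(p, *(vertices[i] for i in tri))
            if all(w >= -eps for w in weights):
                result = sum(w * values[i] for w, i in zip(weights, tri))
                break
        out.append(result)
    return out

# Unit square split into two triangles, carrying f(x, y) = x + 2y at the vertices.
vertices = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
triangles = [(0, 1, 2), (0, 2, 3)]
values = [0.0, 1.0, 3.0, 2.0]
result = interpolate([(0.5, 0.25), (0.25, 0.75), (2.0, 2.0)], vertices, triangles, values)
```

Each target value depends only on the one source cell that contains the target vertex, so nothing in this procedure forces the integral of the field over the target grid to match the integral over the source grid.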
Supermeshing [Farrell 10]
For each cell in the target grid,
Find overlapping cells in the source grid,
Compute their intersections
Derive new coefficients to minimize L2 norm
* Guaranteed Conservative
* Minimizes Error
But:
Domains must match exactly
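The intersection-based idea is easiest to see in one dimension. The sketch below is a simplification under stated assumptions, not the method of [Farrell 10]: it uses 1D cells with piecewise-constant values, for which the L2-optimal value on a target cell is the overlap-weighted average of the source values. All names are hypothetical.

```python
def conservative_regrid_1d(src_edges, src_values, tgt_edges):
    """Overlap-weighted average of piecewise-constant source values per target cell."""
    src_cells = list(zip(src_edges[:-1], src_edges[1:], src_values))
    out = []
    for t_lo, t_hi in zip(tgt_edges[:-1], tgt_edges[1:]):
        mass = 0.0
        for s_lo, s_hi, value in src_cells:
            overlap = min(t_hi, s_hi) - max(t_lo, s_lo)  # length of the intersection
            if overlap > 0:
                mass += value * overlap
        out.append(mass / (t_hi - t_lo))
    return out

def integral(edges, values):
    """Total 'mass' of a piecewise-constant field."""
    return sum(v * (hi - lo) for v, lo, hi in zip(values, edges[:-1], edges[1:]))

src_edges = [0.0, 1.0, 2.0, 3.0, 4.0]
src_values = [1.0, 3.0, 5.0, 7.0]
tgt_edges = [0.0, 1.5, 4.0]  # same domain [0, 4], different cells
tgt_values = conservative_regrid_1d(src_edges, src_values, tgt_edges)
```

The integral is preserved here only because both grids cover exactly the same interval, which is the 1D counterpart of the "domains must match exactly" restriction.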
Why is this so hard?
• Needs to handle Unstructured Grids
• Needs to be Numerically Conservative
• Needs to be Lightweight and Efficient
Digression: Relational Database History
Pre-Relational: if your data changed, your application broke.
Early RDBMS were buggy and slow (and often reviled), but
required only 5% of the application code.
“Activities of users at terminals and most application programs should
remain unaffected when the internal representation of data is changed and
even when some aspects of the external representation are changed.”
-- Codd 1970
Key Idea: Programs that manipulate tabular data exhibit an algebraic
structure allowing reasoning and manipulation independently of physical
data representation
Key Idea: An Algebra of Tables
Operators shown: select, project, join
Other operators: aggregate, union, difference, cross product
Review: Algebraic Optimization
N = ((4*2)+((4*3)+0))/1
Algebraic Laws:
1. (+) identity: x+0 = x
2. (/) identity: x/1 = x
3. (*) distributes: (n*x+n*y) = n*(x+y)
4. (*) commutes: x*y = y*x
Apply rules 1, 3, 4, 2: N = (2+3)*4
two operations instead of five, no division operator
Same idea works with very large tables, but the payoff is much higher
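The rewrite above can be reproduced mechanically. The following is a toy rewriter written for this example only; the class and function names are hypothetical, and it knows just the four laws listed on the slide.

```python
import operator
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Num:
    value: int

@dataclass(frozen=True)
class Op:
    sym: str
    left: "Expr"
    right: "Expr"

Expr = Union[Num, Op]
OPS = {"+": operator.add, "*": operator.mul, "/": operator.truediv}

def optimize(e):
    """Apply the laws bottom-up until the expression is in its reduced form."""
    if isinstance(e, Num):
        return e
    l, r = optimize(e.left), optimize(e.right)
    if e.sym == "+" and r == Num(0):        # law 1: x+0 = x
        return l
    if e.sym == "/" and r == Num(1):        # law 2: x/1 = x
        return l
    if (e.sym == "+" and isinstance(l, Op) and isinstance(r, Op)
            and l.sym == "*" and r.sym == "*" and l.left == r.left):
        # law 3: n*x + n*y = n*(x+y); law 4 lets us write the factor last
        return Op("*", Op("+", l.right, r.right), l.left)
    return Op(e.sym, l, r)

def count_ops(e):
    return 0 if isinstance(e, Num) else 1 + count_ops(e.left) + count_ops(e.right)

def evaluate(e):
    return e.value if isinstance(e, Num) else OPS[e.sym](evaluate(e.left), evaluate(e.right))

def show(e):
    return str(e.value) if isinstance(e, Num) else f"({show(e.left)}{e.sym}{show(e.right)})"

# N = ((4*2)+((4*3)+0))/1
N = Op("/", Op("+", Op("*", Num(4), Num(2)),
               Op("+", Op("*", Num(4), Num(3)), Num(0))), Num(1))
best = optimize(N)
```

Both forms evaluate to the same number; only the amount of work differs. A query optimizer does the same thing with operators over tables instead of integers.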
[Figure: gridfield expressions built from a horizontal grid H0 and a vertical grid V0]
A: H0 : (x,y,b) ⊗ V0 : (z)
B: restrict(0, z > b); color is depth
C: H0 : (x,y,b) ⊗ V0 : (σ); bind(0, surf); apply(0, z = (surf − b) * σ); color is salinity
Algebraic Manipulation of Scientific Datasets,
B. Howe, D. Maier, VLDBJ 2005
GridFields: An Algebra of Meshes
Example (1)
H = Scan(context, "H")
rH = Restrict("(326<x) & (x<345) & (287<y) & (y<302)", 0, H)
[Figure: H and rH, color: bathymetry]
Restrict arguments: the string is the predicate; 0 is the dimension.
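What a restriction at dimension 0 might do to a mesh can be sketched in plain Python. This does not use the GridFields API; the data layout and the function name are hypothetical. The sketch assumes that a cell is kept only when all of its vertices satisfy the predicate, so that the result is still a well-formed mesh.

```python
def restrict_vertices(vertices, cells, predicate):
    """Keep vertices satisfying `predicate`, plus cells whose vertices all survive."""
    keep = [i for i, v in enumerate(vertices) if predicate(v)]
    renumber = {old: new for new, old in enumerate(keep)}
    new_vertices = [vertices[i] for i in keep]
    new_cells = [tuple(renumber[i] for i in cell)
                 for cell in cells
                 if all(i in renumber for i in cell)]
    return new_vertices, new_cells

# Four vertices carrying (x, y, b) attributes, and two triangles over them.
vertices = [
    {"x": 330, "y": 290, "b": 12.0},
    {"x": 340, "y": 290, "b": 15.0},
    {"x": 335, "y": 300, "b": 9.0},
    {"x": 350, "y": 295, "b": 20.0},  # outside the x range
]
cells = [(0, 1, 2), (1, 3, 2)]

def in_box(v):
    # Same predicate as in the example above
    return 326 < v["x"] < 345 and 287 < v["y"] < 302

r_vertices, r_cells = restrict_vertices(vertices, cells, in_box)
```

The second triangle is dropped because one of its vertices fails the predicate, even though its other two vertices are kept.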
3/18/2022 howeb@stccmop.org
Example: Transect
[Figure: transect P]
Transect: Bad Query Plan
[Plan diagram; operators: H(x,y,b) ⊗ V(z), r(z>b), b(s), P ⊗ V, regrid]
1) Construct full-size 3D grid
2) Construct 2D transect grid
3) Join 1) with 2)
Transect: Optimized Plan
P  V
V(z)
P
H(x,y,b)
regrid b(s)
 regrid

1) Find 2D cells containing points
2) Create “stacks” of 2D cells carrying data
3) Create 2D transect grid
4) Join 2) with 3)
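The difference between the two plans is how much of the 3D grid gets materialized. The toy sketch below makes that concrete under simplifying assumptions: the horizontal grid is a set of unit-square cells (so point location is a truncation), and the restrict and bind steps are left out. The function names are hypothetical.

```python
from itertools import product

def locate(point):
    """Id of the unit-square cell containing a 2D point (toy horizontal grid)."""
    return (int(point[0]), int(point[1]))

def bad_plan(h_cells, v_levels, points):
    grid3d = set(product(h_cells, v_levels))           # 1) full-size 3D grid
    transect = list(product(points, v_levels))         # 2) 2D transect grid
    joined = [(p, z, locate(p)) for p, z in transect   # 3) join 1) with 2)
              if (locate(p), z) in grid3d]
    return joined, len(grid3d)

def optimized_plan(h_cells, v_levels, points):
    h_set = set(h_cells)
    hits = {locate(p) for p in points if locate(p) in h_set}  # 1) cells containing points
    stacks = set(product(hits, v_levels))                     # 2) stacks of those cells
    transect = list(product(points, v_levels))                # 3) 2D transect grid
    joined = [(p, z, locate(p)) for p, z in transect          # 4) join 2) with 3)
              if (locate(p), z) in stacks]
    return joined, len(stacks)

h_cells = [(i, j) for i in range(100) for j in range(100)]
v_levels = list(range(10))
points = [(0.5 + 10 * k, 3.2) for k in range(5)]

slow_result, slow_size = bad_plan(h_cells, v_levels, points)
fast_result, fast_size = optimized_plan(h_cells, v_levels, points)
```

Both plans return the same transect, but the first builds every 3D cell while the second builds only the stacks under the transect points.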
[Figure: steps of the optimized plan shown on the mesh]
1) Find cells containing points in P
2) Construct “stacks” of cells
4) Join 2) with 3)
Transect: Results
[Figure: bar chart of running time in secs (axis 0 to 45) for vtk(3D), interpolate, simple, interp_o, simple_o; dataset: 800 MB (1 timestep)]
• Screenshot of OPeNDAP demo
http://ec2-174-129-186-110.compute-1.amazonaws.com:8088/nc/test4.nc.nc?ugrid_restrict(0,"Y>41.5&Y<42.75&X>-68.0&X<-66.0")
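A request of this shape can be assembled programmatically. The sketch below is an assumption-laden illustration: the helper name and the `example.org` host are hypothetical, and whether a given server expects the function call percent-encoded is not stated in the slides. Encoding is shown because a raw `&` or `"` in a query string is commonly mangled by HTTP clients and proxies.

```python
from urllib.parse import quote

def server_function_url(dataset_url, function, *args):
    """Append a server-side function call to a dataset URL as its query string."""
    call = f"{function}({','.join(args)})"
    # Keep the call syntax readable; encode quotes, comparison signs and '&'.
    return f"{dataset_url}?{quote(call, safe='(),')}"

predicate = '"Y>41.5&Y<42.75&X>-68.0&X<-66.0"'
url = server_function_url(
    "http://example.org/nc/test4.nc.nc",  # placeholder host, not the demo server
    "ugrid_restrict",
    "0",          # dimension
    predicate,    # restriction predicate
)
```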
Commutative Law of Regrid and Restrict:
Restrict(Regrid(X,Y)) = Regrid(Restrict(X), Restrict(Y))
G0 = Regrid(Restrict0(X), Restrict0(Y))
G1 = Regrid(Restrict1(X), Restrict1(Y))
:
GN = Regrid(RestrictN(X), RestrictN(Y))
R = Stitch(G0, G1, ..., GN)
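The restrict, regrid, stitch pattern can be tried on a 1D toy problem. This is a sketch under stated assumptions, not the GridFields implementation: cells are intervals with piecewise-constant values, regridding is overlap-weighted averaging, and the tile boundaries are assumed to coincide with target cell edges so that every target cell falls in exactly one tile. All names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def regrid(src_cells, tgt_cells):
    """src_cells: [(lo, hi, value)], tgt_cells: [(lo, hi)] -> [(lo, hi, value)]."""
    out = []
    for t_lo, t_hi in tgt_cells:
        mass = sum(v * max(0.0, min(t_hi, s_hi) - max(t_lo, s_lo))
                   for s_lo, s_hi, v in src_cells)
        out.append((t_lo, t_hi, mass / (t_hi - t_lo)))
    return out

def restrict(cells, lo, hi):
    """Keep the cells that overlap the tile (lo, hi)."""
    return [c for c in cells if c[0] < hi and c[1] > lo]

def stitch(parts):
    """Concatenate per-tile results in tile order."""
    return [cell for part in parts for cell in part]

def regrid_by_tiles(src_cells, tgt_cells, tiles):
    def one_tile(tile):
        lo, hi = tile
        return regrid(restrict(src_cells, lo, hi), restrict(tgt_cells, lo, hi))
    with ThreadPoolExecutor() as pool:   # each tile is an independent task
        return stitch(pool.map(one_tile, tiles))

src = [(0.0, 1.5, 1.0), (1.5, 2.5, 4.0), (2.5, 4.0, 2.0)]
tgt = [(0.0, 1.0), (1.0, 2.0), (2.0, 3.0), (3.0, 4.0)]
tiles = [(0.0, 2.0), (2.0, 4.0)]

whole = regrid(src, tgt)
tiled = regrid_by_tiles(src, tgt, tiles)
```

Each tile needs only the source cells that overlap it, so the tiles can be processed on separate workers and the stitched result matches the single global regrid.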
• Globally conservative
• Parallelizable
• Commutes with user-selected restrictions
• Masking to handle mismatched domains
Todos:
• Characterize the error relative to plain supermeshing
• Universal Regridding-as-a-Service
Outreach and Usage
• Code released
– Search “gridfields” on Google Code
– http://code.google.com/p/gridfields/
– C++ with Python bindings
• Integrated into the Hyrax Data Server
– OPULS project funded by NOAA
– Server-side processing of unstructured grids
• Other users
– US Geological Survey
– NOAA
Conservative Regridding and Algebraic Manipulation of Meshes with GridFields

  • 1. Bill Howe UW eScience Institute Computer Science & Engineering University of Washington Big Ocean Data: Numerically Conservative Parallel Query Processing 3/18/2022 Bill Howe, UW 1 Scott Moe Applied Math University of Washington
  • 4. eScience Scalable Data Analytics Group Bill Howe, PhD (databases, cloud, data-intensive scalable computing, visualization) Postdocs (past and present) – Dan Halperin, postdoc (distributed systems, algorithms) – Seung-Hee Bae, postdoc (scalable machine learning algorithms) – (alumni) Marianne Shaw, PhD (Hadoop, graph query, health informatics) Staff (past and present) – Sagar Chitnis (cloud computing, databases, web services) – (alumni) Alicia Key (web applications, visualization) – (alumni) Garret Cole (databases, cloud computing, web services) Students – Scott Moe (2nd yr PhD, Applied Math) Partners – CSE DB Faculty: Magda Balazinska, Dan Suciu – CSE students: Paris Koutris, Prasang Upadhyaya, Shengliang Xu, Jinjing Wang – UW-IT (web applications, QA/support) – Cecilia Aragon, PhD, Associate Professor, HCDE (visualization, scientific applications)
  • 5. Summary of Big Data in the Ocean Sciences • In transition from expedition-based to observatory-based science • Enormous investments in infrastructure for linking sensors • Ad hoc, on-demand integration of large, heterogeneous, distributed datasets is the universal requirement in this regime • “Integration” means “regridding” – mesh to pixels, mesh to mesh, trajectory to mesh – satellites to models, models to models, observations to models • Regridding is hard – Must be easy, tolerant of unusual grids, numerically conservative, efficient Summary of our work • We have the beginnings of a “universal regridding” operator with nice algebraic properties • We’re using it to implement efficient distributed data sharing applications, parallel algorithms, and more 3/18/2022 Bill Howe, UW 5
  • 7. Regional Scale Nodes 3/18/2022 Bill Howe, UW 7 John Delaney 10s of Gigabits/second from the ocean floor
  • 8. Cyberinfrastructure 3/18/2022 Bill Howe, UW 8 • “Integration of numerical ocean models and their output as derived data products” • “Generation and distribution of qualified science data products in (near) real time” • “Access to, syntactic transformation, semantic mediation, analysis, and visualization of science data, data products, and derived data products” Matthew Arrot
  • 9. 3/18/2022 Bill Howe, UW 9 17 federal organizations named as partners 11 Regional Associations “a strategy for incorporating observation systems from … near shore waters as part of … a network of observatories.”
  • 10. Center for Coastal Margin Observation and Prediction (CMOP) 3/18/2022 Bill Howe, UW 10 Antonio Baptista
  • 11. Virtual Mekong Basin 3/18/2022 Bill Howe, UW 11 img src: Mark Stoermer, UW Center for Environmental Visualization Jeff Richey
  • 12. 3/18/2022 Bill Howe, UW 12 # of bytes # of data sources telescopes spectra LSST (~100PB; images, spectra) PanSTARRS (~40PB; images, trajectories) OOI (~50TB/year; sims, RSN) IOOS (~50TB/year; sims, satellite, gliders, AUVs, vessels, more) CMOP (~10TB/year; sims, stations, gliders, AUVs, vessels, more) SDSS (~400TB; images, spectra, catalogs) n-body sims models AUVs stations cruises, CTDs flow cytometry gliders ADCP satellites Astronomy Ocean Sciences 3 V’s of Big Data Volume Variety Velocity
  • 13. SQLShare: Ad Hoc Databases for Science Problem – Data is captured and manipulated in ad hoc files – Made sense five years ago; the data volumes were manageable – But now: 50k rows, 100s of files, sharing Why not put everything into a database? – A huge amount of up-front effort – Hard to design for a moving target – Running a database system is a huge drain Solution: SQLShare – Upload data through your browser: no setup, no installation – Log in to browse science questions in English – Click a question, see the SQL that answers it – Edit the SQL to answer an "adjacent" question, even if you wouldn’t know how to write it from scratch https://sqlshare.escience.washington.edu/
  • 14. Summary of Big Data in the Ocean Sciences • Transition from expedition-based to observatory-based science • Enormous investments in integrating infrastructure for data acquisition and interoperability • Ad hoc, on-demand integration of large, heterogeneous, distributed datasets is the universal requirement in this regime • “Integration” means “regridding” – mesh to pixels, mesh to mesh, trajectory to mesh – satellites to models, models to models, observations to models • Regridding is hard – Must be easy, tolerant of unusual grids, numerically conservative, efficient Summary of our work • We have the beginnings of a “universal regridding” operator with nice algebraic properties • We’re using it to implement efficient distributed data sharing applications, parallel algorithms, and more 3/18/2022 Bill Howe, UW 14
  • 15. Status Quo • “FTP + MATLAB” • “Nascent Databases” – File-based, format-specific API – UniData’s NetCDF, HDF5 – Some IO optimization, some indexing • Data Servers (Hyrax) – Same as file-based systems, but with RPC support None of this scales: – up with data volumes – up with number of sources – down with developer expertise
  • 16. What do we really need here? • Claim: Everything reduces to regridding • Model-data comparisons skill assessment? Regrid observations onto model mesh • Model-model comparison? Regrid one model’s mesh onto the other’s • Model coupling? Regrid a meso-scale atmospheric model onto your regional ocean model • Visualization? Regrid onto a 3D mesh, or regrid onto a 2D array of pixels 3/18/2022 Bill Howe, UW 16
  • 17. Why is this so hard? • Needs to handle Unstructured Grids • Needs to be Numerically Conservative • Needs to be Lightweight and Efficient 3/18/2022 Bill Howe, UW 17
  • 18. 3/18/2022 Bill Howe, UW 18 Washington Oregon Columbia River Estuary
  • 21. Structured grids are easy 3/18/2022 Bill Howe, eScience Institute 21  The data model (Cartesian products of coordinate variables)  immediately implies a representation, (multidimensional arrays)  an API, (reading and writing subslabs)  and an efficient implementation (address calculation using array “shape”)
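The last point on slide 21, address calculation from the array "shape", can be illustrated with a minimal sketch. It assumes row-major (C-order) storage; the helper name `flat_offset` and the example shape are hypothetical and not taken from the slides.

```python
import numpy as np

def flat_offset(index, shape):
    """Row-major offset of a multidimensional index, computed from the shape alone."""
    offset = 0
    for i, n in zip(index, shape):
        if not 0 <= i < n:
            raise IndexError(f"index {i} out of range for axis of size {n}")
        offset = offset * n + i   # Horner-style accumulation over the axes
    return offset

shape = (4, 5, 6)                 # hypothetical (time, y, x) structured grid
data = np.arange(np.prod(shape)).reshape(shape)

k = flat_offset((2, 3, 1), shape)
value = data.ravel()[k]           # same element as data[2, 3, 1]
```

No lookup structure is needed: the cell's location follows from arithmetic on the shape, which is what unstructured grids lack.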
  • 22. Why is this so hard? • Needs to handle Unstructured Grids • Needs to be Numerically Conservative • Needs to be Lightweight and Efficient 3/18/2022 Bill Howe, UW 22
  • 23. Naïve Method: Interpolation (Spatial Join) 3/18/2022 Bill Howe, UW 23 For each vertex in the target grid, Find containing cell in the source grid, Evaluate the basis functions to interpolate
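The naïve method on slide 23 can be sketched in one dimension, where "find the containing cell and evaluate the basis functions" is what `np.interp` does for linear elements. The grids and the field below are made-up data, and the trapezoid integral is only one possible way to measure the total; the point is that the total changes after interpolation.

```python
import numpy as np

def trapezoid(x, f):
    """Integral of the piecewise-linear function through the points (x, f)."""
    return float(np.sum(np.diff(x) * (f[:-1] + f[1:]) / 2.0))

# Source grid: 11 vertices carrying a field sampled from x**2 (made-up data).
src_x = np.linspace(0.0, 1.0, 11)
src_f = src_x ** 2

# Target grid: 4 vertices that do not line up with the source vertices.
dst_x = np.linspace(0.0, 1.0, 4)

# np.interp locates the source cell containing each target vertex and
# evaluates the linear basis functions there.
dst_f = np.interp(dst_x, src_x, src_f)

src_total = trapezoid(src_x, src_f)   # integral of the source field
dst_total = trapezoid(dst_x, dst_f)   # integral after interpolation
```

Here `src_total` is about 0.335 while `dst_total` is about 0.353, so the integrated quantity is not preserved.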
  • 25. Supermeshing [Farrell 10] For each cell in the target grid, Find overlapping cells in the source grid, Compute their intersections, Derive new coefficients to minimize L2 norm * Guaranteed Conservative * Minimizes Error But: Domains must match exactly
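The intersection step on slide 25 can be illustrated with a much simplified 1D sketch. It assumes piecewise-constant (cell-mean) fields, for which the L2-optimal coefficient on each target cell is the overlap-weighted average of the source cells. This is not Farrell's algorithm, and the function name and data are hypothetical.

```python
import numpy as np

def conservative_regrid(src_edges, src_vals, dst_edges):
    """Overlap-weighted average of source cell means onto target cells (1D)."""
    lo = np.maximum(dst_edges[:-1, None], src_edges[None, :-1])
    hi = np.minimum(dst_edges[1:, None], src_edges[None, 1:])
    overlap = np.clip(hi - lo, 0.0, None)      # (n_dst, n_src) intersection lengths
    return overlap @ src_vals / np.diff(dst_edges)

src_edges = np.linspace(0.0, 1.0, 11)          # 10 source cells
src_vals = (src_edges[:-1] + 0.05) ** 2        # made-up cell means
dst_edges = np.array([0.0, 0.33, 0.5, 1.0])    # 3 target cells, same domain

dst_vals = conservative_regrid(src_edges, src_vals, dst_edges)

# Totals (value times cell length) agree because the two domains match exactly.
src_total = float(np.sum(src_vals * np.diff(src_edges)))
dst_total = float(np.sum(dst_vals * np.diff(dst_edges)))
```

If the target domain extended beyond the source domain, the totals would no longer match, which is the "domains must match exactly" caveat from the slide.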
  • 27. Why is this so hard? • Needs to handle Unstructured Grids • Needs to be Numerically Conservative • Needs to be Lightweight and Efficient 3/18/2022 Bill Howe, UW 27
  • 28. Pre-Relational: if your data changed, your application broke. Early RDBMS were buggy and slow (and often reviled), but required only 5% of the application code. “Activities of users at terminals and most application programs should remain unaffected when the internal representation of data is changed and even when some aspects of the external representation are changed.” Key Idea: Programs that manipulate tabular data exhibit an algebraic structure allowing reasoning and manipulation independently of physical data representation Digression: Relational Database History -- Codd 1979
  • 29. Key Idea: An Algebra of Tables select project join join Other operators: aggregate, union, difference, cross product
  • 30. Review: Algebraic Optimization. N = ((4*2)+((4*3)+0))/1. Algebraic Laws: 1. (+) identity: x+0 = x; 2. (/) identity: x/1 = x; 3. (*) distributes: n*x+n*y = n*(x+y); 4. (*) commutes: x*y = y*x. Apply rules 1, 3, 4, 2: N = (2+3)*4. Two operations instead of five, no division operator. Same idea works with very large tables, but the payoff is much higher.
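The rewrite on slide 30 can be reproduced with a small sketch. Expressions are nested tuples `(op, left, right)`; this representation and the function names are assumptions made for illustration, and only the four laws from the slide are implemented.

```python
import operator

OPS = {"+": operator.add, "*": operator.mul, "/": operator.truediv}

def evaluate(e):
    """Evaluate an expression tree of the form (op, left, right) or a number."""
    if not isinstance(e, tuple):
        return e
    op, a, b = e
    return OPS[op](evaluate(a), evaluate(b))

def count_ops(e):
    """Number of operators in the tree."""
    if not isinstance(e, tuple):
        return 0
    _, a, b = e
    return 1 + count_ops(a) + count_ops(b)

def simplify(e):
    """Apply the slide's algebraic laws bottom-up."""
    if not isinstance(e, tuple):
        return e
    op, a, b = e
    a, b = simplify(a), simplify(b)
    if op == "+" and b == 0:            # law 1: x+0 = x
        return a
    if op == "/" and b == 1:            # law 2: x/1 = x
        return a
    if (op == "+" and isinstance(a, tuple) and isinstance(b, tuple)
            and a[0] == b[0] == "*" and a[1] == b[1]):
        # law 3: n*x + n*y = n*(x+y), then law 4 to write it as (x+y)*n
        return ("*", ("+", a[2], b[2]), a[1])
    return (op, a, b)

original = ("/", ("+", ("*", 4, 2), ("+", ("*", 4, 3), 0)), 1)
optimized = simplify(original)          # ("*", ("+", 2, 3), 4)
```

Both trees evaluate to 20, but the optimized one has two operators instead of five. A query optimizer applies the same kind of law-driven rewriting to operators over tables or grids rather than numbers.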
  • 31. 3/12/09 Bill Howe, eScience Institute 31  H0 : (x,y,b) V0 : (z) A restrict(0, z >b) B color is depth Algebraic Manipulation of Scientific Datasets, B. Howe, D. Maier, VLDBJ 2005  H0 : (x,y,b) V0 : ( ) apply(0, z=(surf  b) *  ) bind(0, surf) C color is salinity GridFields: An Algebra of Meshes
  • 32. Example (1) H = Scan(context, "H") rH = Restrict("(326<x) & (x<345) & (287<y) & (y<302)", 0, H) H = rH = dimension predicate color: bathymetry
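The effect of the `Restrict` call on slide 32 can be approximated without the GridFields library. The sketch below is plain NumPy, not the GridFields API. It assumes that restricting dimension 0 drops the vertices that fail the predicate together with any triangle touching a dropped vertex, and the small mesh is made up.

```python
import numpy as np

# Hypothetical 2D mesh: vertex coordinates plus triangles given as vertex indices.
x = np.array([320.0, 330.0, 340.0, 350.0, 330.0, 340.0])
y = np.array([290.0, 290.0, 290.0, 290.0, 300.0, 300.0])
triangles = np.array([[0, 1, 4], [1, 2, 4], [2, 5, 4], [2, 3, 5]])

def restrict_vertices(x, y, triangles, keep):
    """Keep vertices where `keep` is True and only triangles whose vertices all survive."""
    tri_keep = keep[triangles].all(axis=1)
    new_index = np.full(keep.shape, -1)
    new_index[keep] = np.arange(keep.sum())      # renumber surviving vertices
    return x[keep], y[keep], new_index[triangles[tri_keep]]

# Same predicate as the slide's Restrict call, applied to the 0-cells (vertices).
keep = (326 < x) & (x < 345) & (287 < y) & (y < 302)
rx, ry, rtri = restrict_vertices(x, y, triangles, keep)
```

Two of the four triangles survive, with their vertex indices renumbered against the reduced vertex arrays.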
  • 34. 3/18/2022 howeb@stccmop.org Transect: Bad Query Plan  H(x,y,b) V(z) r(z>b) b(s) regrid  P P  V 1) Construct full-size 3D grid 2) Construct 2D transect grid 3) Join 1) with 2)
  • 35. 3/18/2022 howeb@stccmop.org Transect: Optimized Plan P  V V(z) P H(x,y,b) regrid b(s)  regrid  1) Find 2D cells containing points 2) Create “stacks” of 2D cells carrying data 3) Create 2D transect grid 4) Join 2) with 3)
  • 36. 3/18/2022 howeb@stccmop.org 1) Find cells containing points in P
  • 37. 3/18/2022 howeb@stccmop.org 1) 4) 2) 1) Find cells containing points in P 2) Construct “stacks” of cells 4) Join 2) with 3)
  • 38. Transect: Results [Bar chart: running time in secs (axis 0 to 45) for vtk(3D), interpolate, simple, interp_o, simple_o; 800 MB (1 timestep)]
  • 39. • Screenshot of OPeNDAP demo 3/18/2022 Bill Howe, UW 39 http://ec2-174-129-186-110.compute-1.amazonaws.com:8088/nc/test4.nc.nc? ugrid_restrict(0,"Y>41.5&Y<42.75&X>-68.0&X<-66.0")
  • 41. Commutative Law of Regrid and Restrict: Restrict(Regrid(X,Y)) = Regrid(Restrict(X), Restrict(Y)). G0 = Regrid(Restrict0(X), Restrict0(Y)); G1 = Regrid(Restrict1(X), Restrict1(Y)); ... ; GN = Regrid(RestrictN(X), RestrictN(Y)); R = Stitch(G0, G1, ..., GN)
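The restrict, regrid, stitch pattern on slide 41 can be checked on a toy 1D case. The sketch assumes a first-order overlap-weighted regrid and a partition whose boundary coincides with a target cell edge. The function names are hypothetical, and the masking and lumping used in the actual work are not modeled.

```python
import numpy as np

def regrid(src_edges, src_vals, dst_edges):
    """First-order conservative 1D regrid: overlap-weighted average of source cell means."""
    lo = np.maximum(dst_edges[:-1, None], src_edges[None, :-1])
    hi = np.minimum(dst_edges[1:, None], src_edges[None, 1:])
    overlap = np.clip(hi - lo, 0.0, None)
    return overlap @ src_vals / np.diff(dst_edges)

def overlapping_cells(edges, a, b):
    """Slice selecting the cells that overlap the open interval (a, b)."""
    idx = np.flatnonzero((edges[1:] > a) & (edges[:-1] < b))
    return slice(idx[0], idx[-1] + 1)

src_edges = np.linspace(0.0, 12.0, 13)             # 12 source cells
src_vals = np.sin(src_edges[:-1]) + 2.0            # made-up cell means
dst_edges = np.array([0.0, 2.5, 6.0, 8.5, 12.0])   # 4 target cells

# Regrid everything at once.
full = regrid(src_edges, src_vals, dst_edges)

# Restrict both grids to each region, regrid the pieces independently, then stitch.
pieces = []
for a, b in [(0.0, 6.0), (6.0, 12.0)]:
    s = overlapping_cells(src_edges, a, b)
    d = overlapping_cells(dst_edges, a, b)
    pieces.append(regrid(src_edges[s.start:s.stop + 1], src_vals[s],
                         dst_edges[d.start:d.stop + 1]))
stitched = np.concatenate(pieces)
```

Each loop iteration depends only on its own restricted inputs, so the pieces could be computed on separate workers before stitching.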
  • 46. • Globally conservative • Parallelizable • Commutes with user-selected restrictions • Masking to handle mismatched domains Todos: • Characterize the error relative to plain supermeshing • Universal Regridding-as-a-Service
  • 47. Outreach and Usage • Code released – Search “gridfields” on google code – http://code.google.com/p/gridfields/ – C++ with Python bindings • Integrated into the Hyrax Data Server – OPULS project funded by NOAA – Server-side processing of unstructured grids • Other users – US Geological Survey – NOAA 3/18/2022 Bill Howe, UW 47

Notas del editor

  1. OOI: ~$100M, ~$40, ~$60, ~$100, ~$60 in 2009 – 2013
  2. IOOS: $37M in 2012, up from 2010, down from 2004-2007
  3. $17M
  4. Data Integration
     State of the art: download all the files you need and write your own MATLAB.
     The Database Game: bring the computation to the data rather than the data to the computation. This works great for simple things like tables, and even arrays. What about meshes?
     They’re harder:
     – Standards-resistant: a standards group has been working off and on for 7 years. (Show lots of different kinds of grids? AMR, warped, finite element, finite difference.)
     – Numerical conservation is important: “I don’t trust your processing; I need access to the raw data.” “Raw” may still be grossly downsampled for convenience purposes.
     Fundamental operation is regridding:
     – Visualization? Regrid onto a smooth mesh, or regrid onto pixels.
     – Data integration? Regrid one model’s results onto another’s for comparison.
     – Model coupling? Regrid one model’s results onto another’s for comparison.
     – Model-data comparisons? Induce a “grid” on the data if necessary, then regrid onto the model.
     Fundamental operation is restriction:
     – “I only care about this region.”
     – Use a meso-scale atmospheric model to force a local estuary model.
     – Parallel processing: break a large mesh into pieces and process them in parallel.
     Fundamental Law of Efficient Ocean Data Integration: Restrict * Regrid = Regrid * Restrict
     Example: server-side query: “I need the data around the mouth of the estuary, regridded to a regular grid.” Option 1: Restrict * Regrid. Option 2: Regrid * Restrict.
     Naïve method: interpolate. It commutes, but it is not conservative and the error is significant.
     Alternative for conservative regridding: supermeshing [Farrell]. Supermeshing requires identical domains, and so does not commute with restrict.
     Our approach: enhance supermeshing with masking and lumping to make it commute with restrict.
     Results: