Margie Smith
Full Webinar: https://youtu.be/EDhJTCm9RN8
Transcript: https://www.slideshare.net/AustralianNationalDataService/transcript-4-fair-r-for-reusable
Other webinars in the series: http://www.ands.org.au/news-and-events/events/fair-webinar-series
#4 FAIR - Provenance as an element of FAIR data principles - 20-09-17
1. Provenance as an element of
FAIR data principles
Enabling data reuse
Margie Smith
Science Data Governance & Policy
Science Data Section
2. Data governance and policy
Data Governance Committee
Data Strategy
Data Management Policy
Data Archive Policy
⁞
Product Management Plans
Data Management Plans
Source catalogue
Standardised vocabularies
Publishing schemas
⁞
3. Why GA cares about data re-use
Understanding the provenance of data that GA creates and consumes enables the organisation to adhere to its Science principles and underpins the organisation’s vision to ‘maximise our data potential’.
http://www.ga.gov.au/about/corporate-plan
4. What does provenance information look like?
As part of a metadata record
Information can be brief free-text or structured free-text, for example:
Pilbara Block 1:100 000 Landsat-5-TM image maps. Image files in BIL format
5. What does provenance information look like?
It can be discursive text:
The ANUGA hydrodynamic model (https://anuga.anu.edu.au/) was run based on a Digital
Elevation Model (DEM) and inputs from a regional storm surge model (GEMS GCOM2D).
The maximum inundation depth and momentum values were identified in ArcGIS post processing. DEM
used within ANUGA: Triangular mesh created by/within ANUGA from a regular grid (1 m horizontal
resolution). The input grid was based on elevation data with varying accuracy: onshore and
offshore LiDAR, Navy soundings and 1 second SRTM DEM. The derived triangular mesh consisted
of smaller triangles (max 5m^2) around the man-made drainage channels and larger triangles around
the remainder of the study region (max 350m^2)
Regional storm input: temporal inputs (i.e. storm characteristics through the simulation time) were
extracted from the regional storm modelling (GEMS GCOM2D model) results for point locations
along the Busselton-Dunsborough coastline.
ANUGA model variables: some key variables set within the Python code were:
minimum_storable_height = 0.10 m, Manning's coefficient of friction = 0.03, 12 minute modelling time
steps, 64 CPUs were used (variations were identified between the results depending on the number of
CPUs specified).
The 64 CPU results were in the middle of the field (range from 8 to 128 CPUs). Broader detail of the
methods applied within this project is in the technical methodology document.
Also see the GA Professional Opinion (Coastal inundation modelling for Busselton, Western
Australia, under current and future climate)
(http://pid.geoscience.gov.au/dataset/78873)
6. Why we need provenance
Scenario: advice to the public was generated based on a collection of sensor data at a point in time.
[Diagram: in response to an advice request, an Agent applies Models and Algorithms (a specific software version) to a temporal subset of Dataset A; the generated Advice is stored in HPRM and catalogued in eCat.]
7. Nick Car gave a presentation previously
https://youtu.be/elPcKqWoOPg
8. Provenance for data re-use
[Diagram: a PROV view of the scenario. The Advice and Report outputs (prov:Entity, stored in HPRM and catalogued in eCat) wasGeneratedBy a Process (prov:Activity) that used Dataset A via a Temporal DB event code / query held in GitHub (the prov:Plan), together with information about the data's acquisition.]
9. FAIR principles
TO BE RE-USABLE:
R1. meta(data) have a plurality of accurate and relevant attributes.
• R1.1. (meta)data are released with a clear and accessible data usage license.
• R1.2. (meta)data are associated with their provenance.
• R1.3. (meta)data meet domain-relevant community standards.
https://www.force11.org/fairprinciples
10. What else we are doing at GA
• We have moved from an Oracle-based ‘GeoCat’ catalogue to our current ‘eCat’, which was made public last month.
• It was released as a minimum viable product; improvements are now being backlogged and prioritised alongside the business-as-usual (BAU) release of products.
• We are currently cataloguing our (300+) services and linking the services to the data record in eCat where they exist (i.e. some services are based on aggregated datasets or non-GA datasets).
• Catalogue schema and codelists will be published next month.
• The processes for releasing/publishing data products are well described and generally well known in the organisation.
11. GA Data and Publications Catalogue - eCat
12. GA Data and Publications Catalogue - eCat
13. GA Data and Publications Catalogue - eCat
http://pid.geoscience.gov.au/id/dataset/ga/72759
14. GA Data and Publications Catalogue - eCat
15. How to support provenance and data reuse
• A ‘source catalogue’ for the data acquisition phase
• eCat for publishing the data products
• Software and Object catalogues in the future
16. Standards on provenance
“Machine readable” could be:
- An ISO19115 metadata statement per dataset contributing to a PROV-DM provenance graph
[Diagram: each source Dataset has a record (1..n) in the Source Catalogue; products / subsets of data, services, reports and data products each have records (1..n) in eCat, linked back to their sources via derivedFrom.]
17. Standards on provenance
[Diagram: Dataset A (CC-By), Dataset B (Commercial) and Software C (CC-By) are the ancestors of Dataset D (Commercial); a derived / aggregated dataset will inherit a licence (CC-By, Commercial, CiC, …) from its ancestors, e.g. through a licence-aggregating WMS.]
19. Thank you.
Margie.smith@ga.gov.au
Editor's notes
Hi there!
My name is Margie Smith and I have worked at Geoscience Australia since November 2016 in the Science Data Governance and Policy team… a team of two.
I came across to help GA meet its obligations under the National Archives of Australia’s Digital Continuity 2020 Policy, to bring some external policy knowledge into the organisation and to provide governance guidance around science data management.
In response to the National Archives Digital Continuity 2020 Policy and other Australian Government Open Data policies, government organisations have been tasked with making their data holdings visible and available.
Making data open is not new to GA but there is most definitely now a whole of government push for access to all data domains.
I have produced several documents to meet the DC2020 data governance milestones, but as you can see from this diagram, there has to be a balance of both oversight and execution across the data lifecycle – having one without the other will either produce a pile of documents that nobody reads, or a plethora of silos of excellence generating portals, datasets and services that only those in the know can find and use.
Whilst there are a series of external drivers for data management, use and re-use, there are also strong drivers currently within the organisation.
For example:
the cost of collecting or acquiring the data
the cost of not finding data previously acquired or
finding data and not being the person who ‘knows’ all about it
succession planning
analogue collections – diaries or paper products that have yet to be digitised
general public servant obligations like the Archives Act
and, of course, GA’s Science Principles and vision.
Provenance will support the organisation through enabling data re-use (as you can now find it) and allow for transparent science and advice through understanding the data supply chain.
At the moment, our metadata records indicate provenance of the data through the lineage statement or in the abstract.
As shown in these examples, the provenance of a dataset or product is usually free-text and can be semi-structured or unstructured.
Very concise or…
… not exactly concise.
Here the abstract includes everything you need to know about the Coastal inundation modelling for Busselton, Western Australia, under current and future climate.
Whilst this provenance information is very useful, it is not particularly useable; and by useable*, I mean its ability to be located, retrieved, presented and interpreted – by person or ideally, by machine search.
*from the ISO 15489-1:2016 Information and documentation -- Records management -- Part 1: Concepts and principles
As an example of why we need provenance for data reuse, I have made up a scenario.
In this scenario, the advice was generated from the complete dataset at the time.
A scientist generated a model using algorithms and provided advice based on the output of the model.
The advice, assuming it was of a general nature, is then made available through the catalogue – generally as a PDF document.
The metadata for the advice gives the name of the dataset used, the area that the advice covers, the organisation as author of the report, and perhaps some of the methodology used in the generation of the report.
In most cases, you could link the advice to the name of the dataset that was used to generate the advice, but not easily to the scientist or team and the models used to generate the advice.
So this provenance model of a data product could work well as a highly structured PROV system.
My colleague Nick Car gave a presentation on GA’s PROV model to ANDS in March and I suggest you watch that for specific information about the model at Geoscience Australia.
Adapting Nick’s model, I have tried to replicate my previous scenario – modelling what we are working towards at GA.
This is currently happening through lineage and association with digital objects rather than a true PROV model of digital objects.
Working from right to left, the Advice would have a metadata record in eCat, our electronic catalogue, that indicates the process used to generate the advice: the temporal subset of the dataset the advice is based on, the software or models applied to the data, information around that data's acquisition, and the reason the advice was required.
If the data is to be re-used in future advice, it might also be helpful to know what models were tried previously that didn’t work.
For our catalogue-like things, we need to gradually add the ability to link Entities, Agents, Activities etc. to be able to use graph-structured provenance (PROV-DM) across multiple types of objects and across multiple systems in the future.
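As a minimal sketch of what that graph-structured provenance could look like, here is the advice scenario expressed with the open-source 'prov' Python package (pip install prov). This is an illustration only, not GA's actual tooling, and all of the ex: identifiers are invented:

from prov.model import ProvDocument

# Build a small PROV-DM graph for the advice scenario.
doc = ProvDocument()
doc.add_namespace('ex', 'http://example.org/')

subset = doc.entity('ex:dataset-a-temporal-subset')  # prov:Entity: the data used
advice = doc.entity('ex:advice-report')              # prov:Entity: the output
plan = doc.entity('ex:event-code-query')             # the code/query acting as a prov:Plan
scientist = doc.agent('ex:scientist')                # prov:Agent
process = doc.activity('ex:advice-generation')       # prov:Activity

doc.used(process, subset)                        # the activity used the temporal subset
doc.wasAssociatedWith(process, scientist, plan)  # who ran it, and to what plan
doc.wasGeneratedBy(advice, process)              # the advice wasGeneratedBy the activity
doc.wasDerivedFrom(advice, subset)               # entity-to-entity derivation link

print(doc.get_provn())  # human-readable PROV-N serialisation

A graph like this could then be linked from the eCat metadata record so that a person, or a machine, can follow the advice back to the data, models and people behind it.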
In my role I am particularly interested in the repeatability of advice given by any government entity. Per the Archives Act, advice of this type given by government must be stored for a period of years and include the models, algorithms, software and data used to generate the advice. It is a safety net for the entity and the public servants that generated the advice at that point in time.
This is currently a manual process, heavily reliant on the individual generating the advice and storing it appropriately.
It would be excellent if the work we are currently undertaking would make it a lot easier for scientists to generate and catalogue this advice in the future.
Prior to sorting out what I wanted to include in this presentation, I had another look at the FAIR principles for data reuse.
Looking at these principles, I was feeling a lot better about what has been achieved at GA in the last 18 months.
We have a public catalogue, it has a clear and accessible data usage license and the standards used for cataloguing are in the spatial domain.
The lineage in a metadata record has been the de facto ‘data provenance’ to date.
We are currently working on multi-domain metadata retrieval from our catalogue; for example, we will be able to export records in AGRIF for Records Management, ISO19115 for spatial and DCAT for the National Archives.
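As a rough illustration of what multi-domain retrieval involves, a crosswalk re-keys one internal record into each target schema. The mappings below are invented for the sketch and are not the real AGRIF, ISO 19115 or DCAT element names:

# Hypothetical crosswalk sketch: one internal record, several export schemas.
# Field names on both sides are illustrative only.
RECORD = {
    "title": "Coastal inundation modelling for Busselton, WA",
    "abstract": "ANUGA hydrodynamic model outputs ...",
    "lineage": "Derived from LiDAR, Navy soundings and 1 second SRTM DEM.",
    "licence": "CC-BY 4.0",
}

CROSSWALKS = {
    "iso19115": {"title": "citation.title", "abstract": "abstract",
                 "lineage": "resourceLineage.statement"},
    "dcat": {"title": "dct:title", "abstract": "dct:description",
             "licence": "dct:license"},
}

def export(record: dict, schema: str) -> dict:
    """Re-key an internal record for one target schema; unmapped fields are dropped."""
    mapping = CROSSWALKS[schema]
    return {target: record[source] for source, target in mapping.items() if source in record}

print(export(RECORD, "dcat"))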
The Google search is already enabled in the search panel on the ga.gov.au splash page – this enables a search of both the website and the catalogue for content.
In June, I was fortunate to attend a technical meeting of the Open Geospatial Consortium, an international spatial standards organisation. It was evident in discussions there that many other countries are also working towards delivering their catalogues in formats other than spatial, to enable searching by other domains.
We have a new catalogue, our eCat, where metadata records will have:
a persistent identifier
a clear license for data re-use
direct access to the data or product from the metadata record
and links from data records to the services and portals that use them, and vice versa.
At the moment, we are working to publish the 19115-3 catalogue schema and codelists that are used by GA in the catalogue.
In terms of oversight, we have data product plans, roles and responsibilities, and workflows for the release of products from GA through eCat which is a longstanding and well understood process.
For the past month, my area has been undertaking work to highlight the need for science areas to focus on a data-first rather than product-first view. This data-first process will echo the data product publishing workflows and have a dedicated internal catalogue we are calling SourceCat.
SourceCat is a clone of the eCat software and is being trialled within two areas of GA before being released across the organisation.
Once we have this in place, being able to show provenance from the product to the data will be made easier, as we start the process at the beginning rather than trying to remediate at the product-publishing end of a project.
This is a view of our new eCat – the electronic catalogue for products generated at GA.
We have moved to the newer metadata standard for Australian Spatial Data, the ISO 19115-1:2014 which you can see indicated on the page.
There are also Keyword lists, which have been somewhat free-form to date. We have now selected well-defined vocabularies where they exist and are working with the custodians to publish them, whilst at the same time wrapping a governance structure around their maintenance and future extension.
There is a persistent id and data download is indicated.
When you go into the actual metadata record from the search, the information and links are clearly itemised.
Here is an example where the link to the portal and the associated services is shown but, as stated in the record, access to the data isn't available:
“Please note: As these data are stored on a Corporate system, we are only able to supply the web services (see download links).”
In the scenario I gave before, I pictured how the provenance of a data product would work well as part of a highly structured PROV model.
The structure required supports data provenance and re-use even if it doesn’t become a PROV system immediately.
The Source Catalogue is currently being built as a proof of concept for two science areas in the organisation with the intention of making it an agency tool for all data that is acquired or created.
In the future we intend to have a Software Catalogue and Objects Catalogue so that the software or models used in data curation or data products can be included as per PROV models. These are all clones of the eCat software.
With this comes the need to support the organisation with tools and documented procedures that in the future will become automagic processes to bring data into the building. This support is more of the oversight and execution balance that I spoke of earlier.
We are also using the catalogue standard to introduce elements that will align with a future PROV model.
We will be including the element ‘derivedFrom’ in the metadata record.
In the future, if a product does not have a ‘derivedFrom’ element, it will not be published.
Further into the future we will include the element ‘haveProv’, which is different to lineage, as it is forward facing – linking the data to all products that have used it.
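Purely as an illustration (the catalogue is not, of course, driven by Python dictionaries), the publish gate on 'derivedFrom' and a forward-facing 'haveProv' index might behave like this sketch:

# Illustrative sketch: 'derivedFrom' and 'haveProv' are the element names
# from the talk; the record structure and helpers are hypothetical.
def can_publish(record: dict) -> bool:
    """A product with no 'derivedFrom' links cannot be published."""
    return len(record.get("derivedFrom", [])) > 0

assert can_publish({"title": "Inundation product", "derivedFrom": ["source-dataset-1"]})
assert not can_publish({"title": "Orphan product"})

def register_derivation(registry: dict, product_id: str, source_ids: list) -> None:
    """Maintain 'haveProv' as the forward-facing inverse of 'derivedFrom'."""
    for source_id in source_ids:
        registry.setdefault(source_id, {}).setdefault("haveProv", []).append(product_id)

registry = {}
register_derivation(registry, "inundation-product", ["source-dataset-1"])
print(registry)  # {'source-dataset-1': {'haveProv': ['inundation-product']}}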
By having all these links embedded, Nick explained that this will allow a machine-readable PROV record to link to a metadata record to indicate provenance exists. He then started talking about PROV bundles and lost me, but hopefully all these steps will lead to the working PROV model of the future GA.
I was also thinking about the next talk on licensing frameworks. In this future machine-to-machine scenario, the licenses of aggregated products may be determined through an automated rule set depending on the way the data product is delivered.
In this example a dataset and its associated web service have differing licences. For third-party aggregated data use this process is currently determined through extensive written agreements for each product.
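One plausible shape for such an automated rule set is "the most restrictive ancestor licence wins". A minimal sketch, with an invented restrictiveness ordering that is illustrative rather than a legal determination:

# Sketch of licence inheritance for a derived/aggregated product.
# The ordering below is made up for illustration.
RESTRICTIVENESS = {"CC0": 0, "CC-By": 1, "CiC": 2, "Commercial": 3}

def inherited_licence(ancestor_licences):
    """The derived product inherits the most restrictive ancestor licence."""
    return max(ancestor_licences, key=lambda licence: RESTRICTIVENESS[licence])

# Dataset A (CC-By) + Dataset B (Commercial) + Software C (CC-By), as on slide 17:
print(inherited_licence(["CC-By", "Commercial", "CC-By"]))  # -> Commercial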
Finally, it takes a lot of work to remediate legacy metadata records.
Are we going to remediate every single one of our legacy data records? NO – or at least not straight away. Not all data is high value nor does all data have to be highly useable, but all data acquired and data products created should be FAIR.
To re-use data, it is necessary to understand its provenance to assess if it is fit for purpose. In working towards a PROV model and implementing tools like the SourceCat, we are further along the path to achieving GA's vision to fully maximise our data potential.