1. FAIR Principle – Data Accessibility Practice at NCI
Jingbo Wang
ANDS webinar FAIR principles
6 Sep 2017
Canberra
2. nci.org.au© NCI Australia 2017
Overview of datasets at NCI
• climate and weather models
• satellite images
• bathymetry and elevation
• hydrology
• geophysics
• Also: optical astronomy, genomics and social sciences
NCI makes national reference datasets available, especially those produced
by government agencies. The datasets are brought together at NCI and organised for
both high performance computation and high performance data analysis, and
are also made available more broadly to the research community.
3.
NCI provides users with Data-as-a-Service
What do we do?
[Workflow diagram. Labels: user generates/transfers data; fast data storage; data management portal; web-time analytics software; HPC; data curation, publishing, citation; data catalogue; supercomputer users; paper and data are published; data visualisation; visualisation tools; data sharing and re-use.]
4.
• Control Data Access: license, data access controls
• Persistently Access Data
• Access large Data through data services
• Scalable and distributed Data Access (advanced data services)
• Provenance implementation to Access versioned Data
• Access quality Data
5.
1. Control Data Access
License: CC BY 4.0, CC BY-NC-ND, CC BY-NC-SA, ECMWF, others
Access Control List (ACL): set to manage read and write permissions
Embargo period, dev-ops time delay, publication dependency
See also: Poblet et al. 2016. Assigning Creative Commons Licenses to Research Metadata: Issues and Cases.
https://arxiv.org/abs/1609.05700
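The three controls on this slide (license, ACL, embargo) can be combined into one access decision. The following is a minimal sketch, not NCI's implementation: the `Dataset` fields, the `can_read` helper, the rule that writers may read during an embargo, and all sample values are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional, Set


@dataclass
class Dataset:
    """Hypothetical record combining the three controls named on the slide."""
    name: str
    license: str                                     # e.g. "CC BY 4.0"
    readers: Set[str] = field(default_factory=set)   # ACL: read permission
    writers: Set[str] = field(default_factory=set)   # ACL: write permission
    embargo_until: Optional[date] = None             # None means no embargo


def can_read(ds: Dataset, user: str, today: date) -> bool:
    """Allow reads for users on the ACL; during an embargo only writers may read."""
    if ds.embargo_until is not None and today < ds.embargo_until:
        return user in ds.writers
    return user in ds.readers or user in ds.writers


# Sample values are invented for illustration.
ds = Dataset(
    name="example-collection",
    license="CC BY 4.0",
    readers={"alice"},
    writers={"bob"},
    embargo_until=date(2017, 12, 31),
)
print(can_read(ds, "alice", date(2017, 9, 6)))   # embargo still active
print(can_read(ds, "alice", date(2018, 1, 1)))   # embargo lifted
```

Keeping the embargo check ahead of the ACL check means a time-based restriction cannot be bypassed by read permission alone.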
6.
2. Persistently Access Data
Reference: Wang et al. 2017. Persistent Identifier Practice for Big Data Management at NCI.
Data Science Journal. https://datascience.codata.org/articles/10.5334/dsj-2017-020/
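A persistent identifier such as a DOI stays stable even if the landing page behind it moves. As a minimal sketch, a DOI can be turned into a resolvable link through the public doi.org resolver. The DOI used here is the one embedded in the article URL above; the `doi_to_url` helper is hypothetical.

```python
from urllib.parse import quote

DOI_RESOLVER = "https://doi.org/"


def doi_to_url(doi: str) -> str:
    """Return the resolver URL for a DOI, accepting an optional 'doi:' prefix."""
    doi = doi.strip()
    if doi.lower().startswith("doi:"):
        doi = doi[4:].strip()
    if not doi.startswith("10."):
        raise ValueError(f"not a DOI: {doi!r}")
    # Keep '/' literal because it separates the DOI prefix from the suffix.
    return DOI_RESOLVER + quote(doi, safe="/")


print(doi_to_url("doi:10.5334/dsj-2017-020"))
```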
7.
3. Access Data through NCI’s data services
Open Geospatial Consortium Services
• Web mapping services
• Web coverage services
• Web feature services
• Web processing services
• Web coverage-processing services
Other NCI data services
• THREDDS data subsetting
• GSKY
• ESGF (services used for international Climate Model Intercomparison Projects (MIPs))
• ASVO
• ERDDAP, GeoServer, Rasdaman
TDS has become the most broad-scale, general-purpose server because of the range of data services
and protocols it supports.
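OGC services are called through plain HTTP requests with standard query parameters. As a minimal sketch, a WMS 1.3.0 GetMap request can be assembled as follows. The endpoint, layer name and extent are hypothetical; real values come from a server's GetCapabilities response.

```python
from urllib.parse import urlencode


def build_getmap_url(endpoint, layer, bbox, width, height,
                     crs="EPSG:4326", image_format="image/png"):
    """Build a WMS 1.3.0 GetMap URL.

    bbox is (min_lat, min_lon, max_lat, max_lon): WMS 1.3.0 uses
    latitude-first axis order for EPSG:4326.
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",            # empty value requests the default style
        "CRS": crs,
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": image_format,
    }
    return f"{endpoint}?{urlencode(params)}"


# Hypothetical endpoint and layer name, for illustration only.
url = build_getmap_url(
    "https://example.org/wms",
    layer="elevation",
    bbox=(-44.0, 112.0, -10.0, 154.0),
    width=840,
    height=680,
)
print(url)
```

The image size is chosen to match the aspect ratio of the bounding box (42 by 34 degrees), which avoids a stretched map.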
10.
3. Access Data through NCI’s data services - examples
When searching and browsing data,
published collections will have a direct
link to NCI’s Data Services.
https://datacatalogue.nci.org.au
https://geonetwork.nci.org.au
11.
THREDDS data services
What is THREDDS?
The THREDDS (Thematic Realtime Environmental Distributed Data Services) Data Server (TDS), developed by
Unidata (UCAR), allows browsing and accessing data as well as metadata.
Name                          Description
OPeNDAP (DAP2)                Protocol enabling data access and subsetting through the web
NetCDF Subset Service (NCSS)  Web service for subsetting files that can be read by the netCDF-Java library
Web Map Service (WMS)         OGC web service for requesting static images of data
Web Coverage Service (WCS)    OGC web service for requesting data in some output format
Godiva Data Viewer            Tool for simple visualisation of data
HTTP File Download            Direct downloading
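Of the services in the table, NCSS is the one driven most directly by query parameters: a variable list, a bounding box and a time range. As a minimal sketch, such a request could be built as follows. The dataset URL and variable names are hypothetical, and the exact path layout depends on the TDS installation.

```python
from urllib.parse import urlencode


def build_ncss_url(dataset_url, variables, north, south, east, west,
                   time_start, time_end, accept="netcdf"):
    """Build a NetCDF Subset Service query for a lat/lon box and a time range."""
    params = [
        ("var", ",".join(variables)),
        ("north", north), ("south", south),
        ("east", east), ("west", west),
        ("time_start", time_start), ("time_end", time_end),
        ("accept", accept),
    ]
    # Leave ',' and ':' unescaped so the URL stays readable.
    return f"{dataset_url}?{urlencode(params, safe=',:')}"


# Hypothetical dataset path and variable names, for illustration only.
ncss_url = build_ncss_url(
    "https://example.org/thredds/ncss/some/dataset.nc",
    variables=["tasmax", "tasmin"],
    north=-10.0, south=-44.0, east=154.0, west=112.0,
    time_start="2010-01-01T00:00:00Z",
    time_end="2010-01-31T00:00:00Z",
)
print(ncss_url)
```

The server does the subsetting, so only the requested box and time range are transferred rather than the whole file.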
12.
4. Scalable and Distributed Data Access - GSKY
Features
• Distributed
• Scalable
• Concurrent
• Multi Cloud
[Diagram: INPUT is an OGC request (WMS, WCS or WPS request); OUTPUT is OGC output returned to the user’s browser or a WMS client.]
GSKY http://ceur-ws.org/Vol-1913/RL17_paper_14.pdf
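One way a map client can benefit from a distributed, concurrent server is to split a large extent into tiles and request each tile independently. The following minimal sketch covers the tiling arithmetic only. It does not reflect GSKY internals, and `split_bbox` is a hypothetical helper.

```python
def split_bbox(min_x, min_y, max_x, max_y, cols, rows):
    """Split a bounding box into cols x rows equal tiles, row by row from min_y."""
    dx = (max_x - min_x) / cols
    dy = (max_y - min_y) / rows
    tiles = []
    for r in range(rows):
        for c in range(cols):
            tiles.append((
                min_x + c * dx, min_y + r * dy,
                min_x + (c + 1) * dx, min_y + (r + 1) * dy,
            ))
    return tiles


# Example extent in (min_lon, min_lat, max_lon, max_lat) order.
tiles = split_bbox(112.0, -44.0, 154.0, -10.0, cols=2, rows=2)
for tile in tiles:
    print(tile)
```

Each tile can then be sent as a separate OGC request, so the requests can be served in parallel.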
13.
5. Provenance implementation to Access versioned Data
[Diagram: chain of URIs from the final version back to the raw data; boxes labelled Data and Metadata.]
Publication with data extract reference
→ URI points to data extract
→ URI points to an earlier version x of data extract
→ URI points to an even earlier version 1 of data extract
→ URI points to the original source used to generate a series of data extracts
Reference: Wang et al. 2015. Enabling dynamic access to dynamic petascale Earth Systems and Environmental data collections is easy:
citing and reproducing the actual data extracts used in research publications is NOT. American Geophysical Union Fall Meeting.
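The chain on this slide behaves like a linked list: each data extract has its own URI and points to the version it was derived from, ending at the original source. The following is a minimal sketch of that structure; the `DataExtract` class and all URIs are hypothetical and are not NCI's provenance model.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class DataExtract:
    """One link in the chain: a URI plus a pointer to what it was derived from."""
    uri: str
    label: str
    derived_from: Optional["DataExtract"] = None


def provenance_chain(extract: DataExtract) -> List[str]:
    """Walk from a cited extract back to the original source, returning URIs."""
    chain = []
    node = extract
    while node is not None:
        chain.append(node.uri)
        node = node.derived_from
    return chain


# Hypothetical URIs, for illustration only.
source = DataExtract("https://example.org/data/source", "original source")
v1 = DataExtract("https://example.org/data/extract/v1", "version 1", source)
vx = DataExtract("https://example.org/data/extract/vx", "earlier version x", v1)
final = DataExtract("https://example.org/data/extract/final", "cited extract", vx)

for uri in provenance_chain(final):
    print(uri)
```

A publication only needs to cite the URI of the final extract; the earlier versions and the source are reachable by following the links.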
14.
6. Access quality Data
NCI Data Quality Strategy. Reference: Evans et al. 2017 (invited). A data quality strategy for
programmatic access to large collections of diverse datasets on an integrated high performance
platform. MDPI Informatics.
• Data Quality Control (QC) and Quality Assurance (QA) reports and benchmarking use cases
should be available to the community.
• When users access the data, they can also access the data quality
report.