Data for Science Service Portfolio

DATA FOR SCIENCE
SERVICE PORTFOLIO
VERSION 1
Editor: Zhiming Zhao
Contributors: Data for Science theme development teams
H2020 Project Number: 654182

The “data for science” service portfolio includes selected development results from all development and use case teams.
 The development results (services) include:
   the software/tools/methodology developed and customized by the theme;
   the software/tools/methodology reviewed and recommended by the theme.
 Each service serves certain phases in the data lifecycle, and is often developed in the context of one or more service pillars defined in the theme.
 Each service is described by:
   Technical specification
   How to use
   Success stories
   Community support
   Sustainability plan
INTRODUCTION

[Figure: Research data lifecycle in ENVRI RM. Phases of research data: Acquisition, Curation, Publication, Processing, Use.]

[Figure: Service pillars defined in the Data for Science theme for RIs. Cross-cutting mechanisms: Common vocabulary (Reference model); Meta information linking (Linking model); RI development (Architecture design). Service pillars: Identification/Citation, Processing, Curation, Optimization, Provenance, Cataloguing. Base layer: storage, computing, networking and other technologies provided by underlying e-Infrastructures (EGI, EUDAT, etc.). The figure also carries the label SIOS.]

 2017-Oct-1 Draft created
 2017-Nov-3 Services are grouped into four categories
 2017-Nov-7 Version 1 revised, with input from ANAEE C2-C4
 2017-Nov-8 Some contacts are refined
 2017-Nov-8 A new service from LifeWatch was included
LOGS

A. Reference model related
A1: Reference model training service - CU
A2: Open information linking for ENV-RIs - UvA
A3: ENVRI knowledge base - UvA
A4: RI architecture design - NERC
B. Theme 2 service pillar
B1: Linked open data ingestion and metadata service - ICOS/LU
B2: D4Science data analytics - CNR
B3: Dynamic real-time infrastructure planner - UvA
B4: Curation - NERC
B5: Flagship cataloguing - IFREMER
B6: Provenance - EAA
C. Reusable solution from use cases/RIs
C1: Data subscription service - EUDAT
C2: Pipeline for semantic annotation of relational DB - ANAEE/INRA
C3: Data / metadata generation from semantic annotations - ANAEE/INRA
C4: Dynamic ecological information management system (DEIMS) - LTER/EAA
C5: Biodiversity Community Portal (LifeWatch/LTER) - EAA
D. Software quality check and testbed
D1: ENVRIplus service test bed - EGI
INDEX


A1. RM Training: Practical Introduction to the ENVRI RM
Relation to the data lifecycle: covers all phases
Data for Science crosscutting mechanisms: Common Vocabulary: Reference model and RI development: Architecture Design
URL: https://training.envri.eu/course/search.php?search=ENVRI+RM
Available Date: 2017/10
Main contributor: Cardiff University
Role of contributor: developer
Contact: Abraham Nieva/Alex Hardisty

Highlighted features:
 Provides practical examples of the use of the Reference Model
 Based on a real life modelling case (DASSH use case)
 9 lessons covering the science, information and technology viewpoints
Target users:
 Environmental Research Infrastructure personnel who have little or no experience designing and modelling complex distributed information systems.
Technology readiness level:
 Level 7: prototype demonstration in operational environment
Accessibility:
 Open Content, licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 2.0 Licence
 https://training.envri.eu/course/search.php?search=ENVRI+RM
Supported standards:
 SCORM 1.2
Required platform:
 Firefox V.56, Chrome V.61
Known bugs:
 Not tested on other browsers or earlier versions
Technical specification

How to use?
Access the Course on Practical Introduction to the ENVRI RM
Target users: RI service developers, e-Science application developers
Goal: use the ENVRI RM to model technical solutions which can be understood by all stakeholders
Requirement: knowledge of the research domain, technical solutions available, and modelling at different levels for different audiences
Solution: use the Course on Practical Introduction to the ENVRI RM to learn about designing models to be shared with RI stakeholders
Decision flow:
 Skilled in modelling complex systems? If no: the course presents an easy to follow modelling process.
 Aware of the research data life cycle and the systems that support it? If no: the course shows how the data lifecycle is supported by RI systems.
 Already developed and deployed RI systems? If no: the course provides an illustrated walkthrough of the modelling.
 Systems are documented and easy to maintain? If no: the course shows how to document existing systems.
 If yes to all four questions: looks like you cannot benefit from the course.

Use case name:
 The Archive for Marine Species and Habitats Data (DASSH)
Key contributions:
 Modelled the existing system
 Illustrate use of the ENVRI RM
 Train other people needing to use the ENVRI RM
 Used internally to train new staff
Research infrastructure:
 DASSH
Deployment environment:
 ENVRI Community training platform
 SCORM packages deployed in Moodle
Results:
 First example using the ENVRI RM in context
 Modelling of all three viewpoints SV, IV, CV
 Structured modelling process
Success story
https://training.envri.eu/course/search.php?search=ENVRI+RM

Current Support:
 Practical Introduction to the ENVRI RM (Cardiff): provides a structured introduction to the main concepts of the ENVRI Reference Model in its three logical viewpoints. Nine lessons, starting with a use case description and the research data lifecycle, incrementally introduce more details about the Science, Information and Computational Viewpoints.
 Work in progress
 Engineering and Technology Viewpoints (Cardiff): Four lessons
planned to describe the mapping of computational objects to
engineering and technology objects (Complementary lessons to
Practical Introduction course)
 ENVRI RM Introduction (Edinburgh) Course for RI Professionals:
Reference Model approach and the value of modelling. Ten Lessons
planned. This course will provide a clear, concise and understandable
introduction, free of technical jargon. The lessons will emphasise the
concrete benefits and value of systems-oriented modelling and the
role an RM plays in that.
Relation to the ENVRI RM

Current status:
 Beta version
 Technical support:
 Abraham Nieva, Alex Hardisty, Aurora Constantin, Malcolm Atkinson
 Support type:
 email support: Abraham Nieva/Alex Hardisty
(NievadelaHidalgaA@cardiff.ac.uk)
 open for feedback and requests
 planning of webinars and structured courses
Community support

A2. Open Information Linking for Environmental science research infrastructures (OIL-E)
Relation to the data lifecycle: models all phases
Data for Science service pillar: semantic linking (cross-cutting)
URL: http://www.oil-e.net/
Available Date: 2017
Main contributor: University of Amsterdam
Contact: Paul Martin

Highlighted features:
 Captures ENVRI RM as a multi-viewpoint OWL ontology for RI architecture.
 Permits analysis and comparison of RI characteristics as the foundation for an ENVRI Knowledge Base.
 Provides a linking framework for describing the different metadata schemes and technologies used by RIs, as well as for identifying any semantic mappings available to convert between schemes.
Target users:
 RI architects/developers,
 Semantic modellers,
 Tool developers.
Technology readiness level:
 3–4, development ongoing.
Accessibility:
 Specifications: http://www.oil-e.net/
 Published RDF
Supported standards:
 Semantic Web standards: RDF (representation), OWL (inference), SHACL (validation).
Required platform:
 Any RDF knowledge graph framework.

Current Support:
 OIL-E captures all archetypes from ENVRI RM v2.1, with updates planned
for future releases of ENVRI RM during ENVRIplus.
 Work in progress
 OIL-E needs to be extended to capture the design patterns used for
hosting services (engineering view) and to further classify the technologies
used by environmental science RIs (technology view).
 Greater extensibility: OIL-E will be restructured to better capture both the
architectural/schematic view of RI designs and the instance view of actual
RI service deployments and research asset collections classified in
accordance with ENVRI RM.
 Greater internal validation: taking into account new recommendations
(such as W3C SHACL), better validation of data instances can be
embedded within OIL-E to e.g. validate models constructed in ENVRI RM.
 More mappings: new semantic mappings will be created between selected standards in order to demonstrate OIL-E’s viability as a semantic hub between the different standards used to describe data, resources and services in environmental science.

How to use?
Target users: RI architects, semantic modellers, tool developers
Questions (answered Yes or No):
 Modelled your RI using ENVRI RM?
 Want to formally publish your RI specification for discovery, query and/or comparison?
 Want to compare your RI design against a standard model for environmental RIs?
 Want to augment a knowledge graph with ENVRI RM concepts?
 Want to describe a semantic mapping from your controlled vocabulary to ENVRI RM?
 Want to extend ENVRI RM with classifiers and rules from an existing ontology?
 Want to develop tools for building RI specifications that are programmatically verifiable from a formal specification?
Possible outcomes:
 Use OIL-E to structure your RI reference architecture.
 Map data into OIL-E framework.
 Use OIL-E as internal data model.
 No particular benefit from encoding in OIL-E.
 No need to map to OIL-E.
 OIL-E will not benefit your tool implementation.

 Wish to augment a knowledge graph with ENVRI RM concepts?
 Want to formally publish an RI specification (in ENVRI RM) for discovery, query and comparison?
 Want to develop a tool for building RI specifications with a formally verifiable model behind it?
 Want to describe a semantic mapping from your controlled vocabulary to the ENVRI concept space?
 Want to extend ENVRI RM with classifiers and rules from an existing ontology?
HOW TO USE

Success story
Use case name:
 ENVRI Knowledge Base
Key contributions:
 Encode sample RI data based on ENVRI RM using OIL-E.
 Data accessible via a public SPARQL endpoint: http://oil-e.vlan400.uvalight.net/rm/sparql?query=…
Research infrastructure:
 Sample data from ENVRI+ RIs (EPOS, LTER, Euro-Argo, etc.), strictly for demo purposes (i.e. un-validated) at this time.
Deployment environment:
 Apache Jena Fuseki
Results:
 Example queries can be tested by visiting http://oil-e.vlan400.uvalight.net/.
 Paper: under development.

Current status:
 Beta version
Technical support:
 Paul Martin (p.w.martin@uva.nl), Zhiming Zhao (z.zhao@uva.nl)
Support type:
 online accessible documentation via http://www.oil-e.net/
 email support
 open for new test data and test queries
Community support

Sustainability plan
 The sustainability of the ENVRI RM ontology in OIL-E is partially tied to
the sustainability of ENVRI RM itself.
 Application of OIL-E, e.g. for the ENVRI Knowledge Base, creates a
community of use for ontologies.
 Publication of OIL-E in ontology repositories also increases exposure
and provides limited curation tied to lifespan of repository.
 Role in EOSC
 The capturing of RI design wisdom filtered through the controlled
vocabulary of ENVRI RM guides and directs discussion and
comparison between RI, e-Is and other autonomous agents within
EOSC.
 The bridging of semantic standards via OIL-E for semantic linking provides a navigational aid for (meta)data interoperability.

A3. ENVRI Knowledge Base
Relation to the data lifecycle: all phases
Data for Science service pillar: semantic linking (cross-cutting)
URL: http://oil-e.vlan400.uvalight.net/
Contact: Paul Martin

Highlighted features:
 Uses Open Information Linking for environmental science research infrastructures (OIL-E).
 Captures information about research infrastructure characteristics and design (“RI design wisdom”), structured according to ENVRI RM.
 Captures information about technologies and standards used by RIs for key services.
 Provided as an (RDF) knowledge graph, accessible via SPARQL requests over HTTP.
 Permits analysis and comparison of RI characteristics.
Target users:
 RI architects/developers,
 Investigators into RI design or current RI assets and technologies.
Technology readiness level:
 6, live demonstrator.
Accessibility:
 SPARQL end-point: http://oil-e.vlan400.uvalight.net/rm/sparql?format=<format>&query=<query>
 Notebook (example queries): http://oil-e.vlan400.uvalight.net
Supported standards:
 Semantic Web standards: RDF (representation), OWL (inference), SPARQL (query), TTL (data import/export).
Required platform:
 Any HTTP client.
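The end-point pattern above takes a format and a query as URL parameters. A minimal Python sketch of building such a request URL follows. The end-point address is the one listed above; the value "json" for the format parameter and the generic query are assumptions for illustration, not documented options of this service.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# End-point address as listed in the technical specification.
ENDPOINT = "http://oil-e.vlan400.uvalight.net/rm/sparql"

# A generic query that assumes nothing about the ontology's terms.
EXAMPLE_QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"


def build_sparql_url(query: str, result_format: str = "json") -> str:
    """Return a GET URL with the SPARQL query URL-encoded.

    The default value "json" for `format` is an assumption.
    """
    params = {"format": result_format, "query": query}
    return ENDPOINT + "?" + urlencode(params)


if __name__ == "__main__":
    # Any HTTP client can then fetch this URL.
    print(build_sparql_url(EXAMPLE_QUERY))
```

The URL can be passed to any HTTP client, for example `urllib.request.urlopen`; the sketch stops before the network call because the availability and accepted formats of the demonstrator are not stated here.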

Current Support:
 All information ingested into the Knowledge Base complies with the
OIL-E ontologies, which capture all archetypes from ENVRI RM v2.1.
 Work in progress
 For all cases where ENVRI RM has been used to model RIs in
ENVRIplus, it is possible to encode that information using OIL-E. It is
intended to upload all such instances into the Knowledge Base by the
end of the project.
 New developments in ENVRI RM v2.2, and in particular extensions
such as the engineering viewpoint, will be reflected in the next
version of OIL-E; this may affect the structure of information already in
the Knowledge Base (though most changes to OIL-E will extend rather
than restructure).
 The Knowledge Base contains additional information about specific
technologies used by RI components modelled using ENVRI RM. In
this respect the Knowledge Base captures technology viewpoint
concepts not formally prescribed by ENVRI RM at present.

How to use?
Target users: RI architects, researchers, data providers
Questions (answered Yes or No):
 Specified RI using ENVRI RM?
 Want to compare with other RIs’ designs?
 Need a machine-actionable reference to RI design?
 Want to understand the relationship between RIs and e-Is?
 Wish to compare RI technologies/standards?
 Want to study ENVRI RM examples?
 Want to improve visibility of data collections?
 Want to use ENVRI RM vocabulary for semantic search?
 Need semantic relations between concepts?
Possible outcomes:
 Contribute to ENVRI KB
 Link data with ENVRI KB
 Refer to ENVRI KB
 Consider modelling your RI using ENVRI RM
 No need for KB
 Your call.

Current status:
 Beta version
Technical support:
 Paul Martin (p.w.martin@uva.nl), Zhiming Zhao (z.zhao@uva.nl)
Support type:
 online accessible documentation via http://oil-e.vlan400.uvalight.net/
 email support
 open for new case studies
Community support

Sustainability plan
 The ENVRI Knowledge Base should be maintained as part of the ENVRI
community portal.
 At end of project, the usefulness of aggregating design wisdom and
technology landscape for RI should be evaluated and, if positively
received, a recipe for provisioning new knowledge bases for similar cluster
initiatives should be compiled and published.
 Role in EOSC
 Knowledge-driven services will be critical to the fulfilment of the EOSC
vision.
 The ENVRI Knowledge Base provides an example of how architecture/
design level knowledge could be aggregated and made available to
services using OIL-E as the ontological basis.
 A successor service (or cluster of services joined by a single knowledge
bus) can potentially provide great benefit to EOSC by providing a basis for
individual services to self-optimise based on available data.

A4. Architecture Design
Relation to the data lifecycle: all
Data for Science service pillar: all
URL:
Main contributor: NERC
Role of contributor: consultant
Contact: Keith Jeffery

 Recommendations to RIs for reference architecture
 Derived from D5.1 (requirements and State of the Art)
 Assumes RIs have local e-I capability and access to European e-Is
 Target users:
 RI data service operators (provider)
 e-Infrastructure operators (provider)
 RI researchers (users)
 Technology readiness level:
 Architectural components expected to be TRL6-8
 Accessibility:
 Supported standards:
 All relevant standards defined in WP6,7,8,9
 In general ISO and W3C
 Required platform:
 Known bugs:

Current Support:
 Science Viewpoint provides view of business requirements
 Intensive work to align RM with architecture derived from D5.1 in line
with D5.4, D5.5
 Information Viewpoint: information objects defined but may change with
requirements
 Computation Viewpoint: services required defined but may change with
requirements
 Work in progress
 Engineering Viewpoint: working now on relationships and
dependencies between Information and Computation viewpoints
 NOTE: this work is very time-consuming

As a reference for implementation by RIs
 Overall architectural intent
 Components: catalog, common and cross-cutting services
HOW TO USE?

Use case name:
 all
 Key contributions:
 Provision of Architecture
Recommendations
 Research infrastructure:
 all
 Deployment environment:
 Local RI e-I
 European e-I (e.g. EOSC)
 Results:
 Recommendations
 Depends on:
 Rich metadata catalog covering services, data, software, workflows, computing resources including sensors
 Discovery
 Contextualisation
 Curation
 Provenance including versioning
 action
 Common services
 Cross-cutting services
Success story 1

Current status:
 D5.5
Technical support:
 Keith Jeffery (Keith.Jeffery@keithgjefferyconsultants.co.uk)
Support type:
 D5.5
 email support
Community support

Sustainability plan
 It is assumed each RI will implement the architecture
 catalog
 Common services
 Cross-cutting services
 Role in EOSC
 The ENVRIplus architecture provides a blueprint for RI
architectures in EOSC
 At present the EOSC-Hub project appears to be working with multiple catalogs, which will make it difficult to implement the architecture in an integrated fashion


B1. Linked open data ingestion and metadata service
Relation to the data lifecycle: data identification and citation
Data for Science service pillar: provenance, cataloguing,
identification/citation
URL: https://meta.icos-cp.eu/edit/cpmeta
Main contributor: Lund University/ICOS Carbon Portal
Contact: Alex Vermeulen

Highlighted features:
 Machine-to-machine ingestion of data objects based on a simple metadata profile
 Minting of ePIC PIDs and DOIs
 Streaming to trusted repository (iRODS, B2SAFE)
 Creates dynamic landing pages based on ontology
Target users:
 RI data service operators,
 data application developers,
 e-Infrastructure operators.
Technology readiness level:
 7, operational in ICOS Carbon Portal
Accessibility:
 GitHub (https://github.com/ICOS-Carbon-Portal/meta)
 GPL v3 license
Supported standards:
 W3C Semantic Web
 ISO 19115
Required platform:
 Linux environment.
Known bugs:

Current Support:
 Science Viewpoint:
 Information Viewpoint:
 Engineering Viewpoint:

Current status:
 Operational at ICOS Carbon Portal
Technical support:
 Oleg Mirzov (oleg.mirzov@nateko.lu.se), Jonathan Thiry
Support type:
 online accessible documentation: https://github.com/ICOS-Carbon-Portal/meta
 email support
 open for implementation at other portals
Community support

Sustainability plan
 Integral part of ICOS data portal, will last until at least 2015
 Role in EOSC
 Will be connected to EOSC Hub Competence Center on station
metadata system
 Connected to CDI services

B2. D4Science Data Analytics
Relation to the data lifecycle: all (processing)
Data for Science service pillar: processing
URL: https://wiki.gcube-system.org/gcube/Data_Mining_Facilities
Main contributor: National Research Council of Italy
Role of contributor: developer, customizer, service provider
Contact: L. Candela, G. Coro, P. Pagano

Highlighted features:
 Extensibility with respect to supported algorithms, programming “languages” and models, and enactment platforms (hybrid model)
 VRE and Open Science friendliness
 Multi-tenancy of the service to deal with VRE designated communities
 Easy publication of available algorithms and executed processes
 Reproducibility-orientation
Target users:
 Scientists (including data scientists, algorithm developers and providers);
 Service providers (including VRE providers, RI service providers).
Technology readiness level:
 8, exploited in several domains and contexts (biological sciences, earth and environmental sciences, agricultural sciences, social sciences and humanities)
Accessibility:
 Via several VREs, e.g. https://services.d4science.org/group/envriplus
Supported standards:
 OGC WPS
 W3C PROV-O
Required platform:
 None
 a plain web browser is sufficient to exploit it
 the service can be invoked by any WPS client (including WFMS)
 Algorithms can be developed in Java, R, Python, etc.
Known bugs:
 No major ones; issues are collected on a dedicated platform: https://support.d4science.org/
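Because the service can be invoked by any WPS client, a client only needs to build standard OGC WPS requests. The sketch below builds the key-value GET URLs for the WPS 1.0.0 GetCapabilities and DescribeProcess operations. The end-point address and the process identifier are hypothetical placeholders, and any authentication the platform may require is deliberately left out.

```python
from urllib.parse import urlencode

# Hypothetical end-point: replace with the WPS address published by your VRE.
WPS_ENDPOINT = "https://example.org/wps/WebProcessingService"


def get_capabilities_url(endpoint: str = WPS_ENDPOINT) -> str:
    """URL asking the server which processes (algorithms) it offers."""
    params = {"service": "WPS", "request": "GetCapabilities"}
    return endpoint + "?" + urlencode(params)


def describe_process_url(identifier: str, endpoint: str = WPS_ENDPOINT) -> str:
    """URL asking for the inputs and outputs of a single process."""
    params = {
        "service": "WPS",
        "version": "1.0.0",
        "request": "DescribeProcess",
        "identifier": identifier,
    }
    return endpoint + "?" + urlencode(params)


if __name__ == "__main__":
    print(get_capabilities_url())
    # "my.algorithm" is a made-up identifier for illustration only.
    print(describe_process_url("my.algorithm"))
```

A client would typically call GetCapabilities first to discover the available algorithms, then DescribeProcess for the one it intends to execute.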

How to use?
Target users: scientists, service providers
Questions (answered Y or N):
 Data analytics / processing task (Open Science settings)?
 Already have the algorithm(s) you need?
 Already have the computing power you need?
 Already have a user-friendly analytics / processing env.?
Actions and outcomes:
 Use D4Science Data Analytics
 Develop an analytics / processing algorithm
 Deploy the analytics / processing algorithm
 Promote algorithm availability and make it (re-)usable
 Use an algorithm and publish the results
 Develop and operate a user-friendly analytics / processing env.
 Develop the processing infrastructure
 Data analytics is not in your current agenda
What the service enables:
 Provide scientists with VREs with data analytics capabilities
 Provide scientists with algorithms as-a-Service
 Provide scientists with processing as-a-Service
 Execute data analytics tasks by VREs

Use case name:
 EISCAT [ / EddyCovariance / LifeWatch ]
Key contributions:
 Integration of the processes (Octave based)
 Added value to the original offline processes
 Repeatability, reusability, reproducibility
 Easy-to-use interface for new analyses
 Enhanced automation of the analyses (can also be invoked from the website)
Research infrastructure:
 EISCAT 3D [ / ICOS / LifeWatch ]
Deployment environment:
 D4Science (EGI FedCloud)
Results:
 WebTG as a new algorithm of DataMiner
 … TBC
Success story

 Coro, G., Pagano, P., & Ellenbroek, A. (2014). Comparing heterogeneous distribution maps for marine species. GIScience & Remote Sensing, 51(5), 593-611.
 Coro, G., Magliozzi, C., Ellenbroek, A., Kaschner, K., & Pagano, P. (2016). Automatic classification of climate change effects on marine species distributions in 2050 using the AquaMaps model. Environmental and Ecological Statistics, 23(1), 155-180.
 Coro, G., Pagano, P., & Napolitano, U. (2016). Bridging environmental data providers and SeaDataNet DIVA service within a collaborative and distributed e-Infrastructure. Bollettino di Geofisica Teorica ed Applicata, 57, 23-25.
External success stories

Current status:
 Production (by D4Science.org)
https://support.d4science.org/
 Email: leonardo.candela@isti.cnr.it
 Support type:
 Documentation
 Features https://wiki.gcube-system.org/gcube/Data_Mining_Facilities
 Developer’s Guide https://dev.d4science.org/
 Algorithms integration
https://wiki.gcube-system.org/gcube/Statistical_Algorithms_Importer
 Coro G, Panichi G, Scarponi P, Pagano P. Cloud computing in a
distributed e-infrastructure using the web processing service
standard. Concurrency Computat: Pract Exper. 2017;29:e4219.
https://doi.org/10.1002/cpe.4219
 Tickets for requests for enhancements, algorithms integration, use
Community support

B3. Dynamic real-time infrastructure planner (DRIP)
Relation to the data lifecycle: data processing and data use
Data for Science service pillar: optimization
URL: https://staff.fnwi.uva.nl/z.zhao/software/drip/
Contact: Zhiming Zhao

Highlighted features:
 Customizes networked virtual machines for applications based on the QoS of data services or applications.
 Automated parallel provisioning for large virtual infrastructures with transparent network configuration.
 Automated deployment of Docker containers with time-critical scheduling.
 Application programmable/controllable interfaces (wrapped from infrastructures).
Target users:
 e-Infrastructure operators.
Technology readiness level:
 6–7, demonstrated on a small-scale EGI/EUDAT environment (EGI FedCloud)
Accessibility:
 GitHub (https://github.com/QCAPI-DRIP/DRIP-integration/wiki)
 Apache license
Supported standards:
 TOSCA
 OCCI
Known bugs:
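To make the idea of QoS-driven customization of virtual machines concrete, here is a deliberately simple Python sketch: given a task with a deadline, it picks the cheapest virtual machine type that still finishes in time. This is a toy illustration only and not DRIP's planning algorithm; the `VMType` class, the flavour names, speed-ups and prices are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class VMType:
    name: str             # hypothetical flavour name
    speedup: float        # speed relative to a baseline machine
    cost_per_hour: float  # arbitrary currency units


def cheapest_vm_meeting_deadline(
    baseline_hours: float,
    deadline_hours: float,
    catalogue: Sequence[VMType],
) -> Optional[VMType]:
    """Return the cheapest VM type that finishes before the deadline, or None."""
    best, best_cost = None, float("inf")
    for vm in catalogue:
        runtime = baseline_hours / vm.speedup
        if runtime > deadline_hours:
            continue  # this flavour would miss the deadline
        cost = runtime * vm.cost_per_hour
        if cost < best_cost:
            best, best_cost = vm, cost
    return best


# Made-up catalogue used only to exercise the function.
CATALOGUE = [
    VMType("small", 1.0, 1.0),
    VMType("medium", 2.0, 2.5),
    VMType("large", 4.0, 6.0),
]

if __name__ == "__main__":
    choice = cheapest_vm_meeting_deadline(8.0, 5.0, CATALOGUE)
    print(choice.name if choice else "no flavour meets the deadline")
```

A real planner also has to consider network topology, parallel provisioning and multi-task workflows, which this sketch ignores.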

Current Support:
 Science Viewpoint: Processing Environment Planner (SV processing
community role) A role adopted by an agent that plans how to optimally
manage and execute a data processing activity using RI services and
the underlying e-infrastructure resources (handling sub-activities such
as data staging, data analysis/mining and result retrieval). DRIP
implements this role.
 Information Viewpoint: DRIP is described in a service catalogue using
the service description information object.
 Computational Viewpoint: a custom configuration integrating the coordination service and process controller can be used to represent DRIP.
 Work in progress
 Engineering Viewpoint: The type of service provided by DRIP is an
example of the type of Processing Service suggested for the EPOS
use case

How to use?
Target users: RI service developers, e-Science application developers, e-Infrastructure operators
Questions (answered Y or N):
 Data/computing intensive?
 High QoS/QoE requirements?
 Require Cloud resources?
 Offering cloud resources?
 Already have full software solution?
 Already have provisioning engine for large Virtual Infrastructure?
 Already have Cloud engine for time critical deployment?
 Already have time critical planning?
 Want an automated Cloud resource solution?
 Application defined control?
 Complement with time critical planning?
 Complement with time critical deployment?
 Complement with parallel provisioning?
 Complement with smart resource control?
Possible outcomes:
 Use the DRIP solution
 DRIP can help you move towards Cloud.
 Looks like you can do all DRIP can!
 DRIP will not be a direct choice for you.

Use case name:
 Euro-Argo data subscription service
Key contributions:
 Automating the infrastructure for computing tailored data products subscribed to by users
 Scheduling subscription tasks based on possible time constraints
Research infrastructure:
 Euro-Argo
Deployment environment:
 EGI FedCloud
 EUDAT
Results:
 Demo: https://www.youtube.com/watch?v=PKU_JcmSskw
 Paper: presented at DataCloud 2017.
Success story 1

Current status:
 Beta version
Technical support:
 Spiros Koulouzis, Zhiming Zhao (z.zhao@uva.nl)
Support type:
 online accessible documentation: https://github.com/QCAPI-DRIP/DRIP-integration/wiki
 email support
 open for new case studies
Community support

Sustainability plan
 Release the code as open source and offer it in the e-infrastructure marketplace
 UvA will maintain it and look for other opportunities with RIs to maintain it
 Encourage RIs/e-Is to adopt it
 Role in EOSC
 Infrastructure programming and optimization facilitate auto-scalable and quality-critical computing
 Can be used to automatically bridge the gap between an application workflow and its execution on the Cloud

B4. Data Curation
Relation to the data lifecycle: data curation and all
Data for Science service pillar: curation and all
URL:
Main contributor: NERC
Role of contributor: consultant
Contact: Keith Jeffery

Highlighted features:
 Recommendations to RIs for curation
 Depends on DMP
 Depends on RI arrangements with local and EC e-Is
Target users:
 RI data service operators (provider)
 e-Infrastructure operators (provider)
 RI researchers (users)
Technology readiness level:
 local and specialised solutions should be TRL8 or 9
Accessibility:
Supported standards:
 DCC recommendations
 OAIS wherever applicable
Known bugs:

Current Support:
 Science Viewpoint requires curation
 Intensive work to align RM with curation architecture D8.1 derived from
D5.1 in line with D5.4, D5.5
 Information Viewpoint: information objects defined but may change with
requirements of provenance
 Computation Viewpoint: services required defined but may change with
requirements of provenance
 Work in progress
 Engineering Viewpoint: working now on relationships and dependencies between Information and Computation viewpoints for curation

How to use?
Target users: RI service developers, e-Science application developers, e-Infrastructure operators
Questions (answered Y or N):
 Data intensive?
 Require availability?
 Require relevance?
 Offering curation services?
 DMP?
 Local RI e-I?
 Match recommendations?
Possible outcomes:
 Recommended curation solution achieved
 Follow recommendations
 Do something else.

Use case name:
 all
Key contributions:
 Provision of Curation Recommendations
Research infrastructure:
 all
Deployment environment:
 Local RI e-I
 European e-I
Results:
 Recommendations
Depends on:
 DMP (DCC template)
 Local or European e-I provision, with appropriate partitioning/fragmenting, replication
 Catalog with rich metadata for:
   Discovery
   Contextualisation
   Including provenance, versioning
   action
Success story 1

Current status:
 D8.1
Technical support:
 Keith Jeffery (Keith.Jeffery@keithgjefferyconsultants.co.uk)
Support type:
 D8.1
 email support
Community support

Sustainability plan
 It is assumed each RI with their DMP has a sustainability plan for
 Information assets
 Software assets
 Service assets
 Role in EOSC
 Curation is of great importance in EOSC and links closely with cataloguing and provenance
 At present the EOSC-Hub project appears to be working with multiple catalogs, which will make it difficult to implement curation in an integrated fashion

B5. Flagship catalogue
Relation to the data lifecycle: curation/publication
Data for Science service pillar: cataloguing
URL: http://eudat6c.dkrz.de/group/envriplus
Available Date: September 2018
Main contributor: Ifremer
Contact: Erwann Quimbert

Highlighted features:
 Harvests catalogs from RIs
 Mapping and validation by producer
 Interface for discovering metadata
Target users:
 Users outside the RI, researching data science,
 Users inside the RI, such as data managers, coordinators, and operators,
 The stakeholders, decision makers
Technology readiness level:
 6–7, demonstrated on EUDAT/B2FIND environment
Accessibility:
 eudat6c.dkrz.de/group/envriplus
Supported standards:
 OAI-PMH
 CSW
 JSON-API
Required platform:
 All kinds of environment.
Known bugs:
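Harvesting over OAI-PMH, one of the protocols listed above, follows a simple request and response pattern. The Python sketch below builds a ListRecords request and extracts record identifiers and the resumption token from a response. The base URL and the sample response are made up for illustration; only the verb, argument names and XML namespace come from the OAI-PMH 2.0 protocol.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request; a resumption token replaces other arguments."""
    if resumption_token:
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (record identifiers, resumption token or None)."""
    root = ET.fromstring(xml_text)
    ids = [e.text for e in root.findall(".//oai:header/oai:identifier", OAI_NS)]
    token_el = root.find(".//oai:resumptionToken", OAI_NS)
    token = token_el.text if token_el is not None and token_el.text else None
    return ids, token


# Made-up response used only to exercise the parser.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record><header><identifier>oai:example.org:1</identifier></header></record>
    <record><header><identifier>oai:example.org:2</identifier></header></record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    print(list_records_url("https://example.org/oai"))
    print(parse_list_records(SAMPLE_RESPONSE))
```

A harvester would repeat the request with each returned resumption token until none is returned.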

Current Support:
 Science Viewpoint: As defined in the Reference Model, maintenance of
a catalog is a strategic component of the curation process and the
descriptions maintained in the catalog support the acquisition,
publication and use of data. Flagship catalog implements this role.
 Information Viewpoint: The reference model defines metadata as “Data
about data, in scientific applications is used to describe, explain, locate,
or make it easier to retrieve, use, or manage an information resource.”
 Computational Viewpoint:
 Work in progress
 Engineering Viewpoint: The type of service provided by Flagship
catalogue is an example of the type of Cataloguing Service suggested
by RM

Use case name:
 ANAEE metadata catalog
Key contributions:
 Automate metadata collection from RI (harvesting in OAI-PMH and CSW protocols)
 Dedicated CSW for ENVRIplus: http://w3.avignon.inra.fr/geoentwork_anaee/csw-envriplus
Research infrastructure:
 ANAEE
Deployment environment:
 EUDAT/B2FIND
Results:
 Demo: http://eudat6c.dkrz.de/group/envriplus
Success story 1
[Figure: AnaEE metadata catalogue and dedicated CSW for ENVRIplus]

Current status:
 Beta version
Technical support:
 Heinrich Widmann, DKRZ
 Erwann Quimbert (Erwann.Quimbert@ifremer.fr)
Support type:
 B2FIND User Documentation: https://eudat.eu/services/userdoc/b2find
 B2FIND Training presentations: https://www.eudat.eu/b2find-training-suite
 B2FIND hands-on training: https://github.com/EUDAT-Training/B2FIND-Training
 email support
Community support

Sustainability plan
 Be adopted by RIs
 Role in EOSC
 Overarching data catalogue, which will contribute to the EOSC
catalogue

B6. Provenance
Data for Science service pillar: provenance (reference model, semantic linking)
URL: https://wiki.envri.eu/display/EC/WIKI+for+Semantics+and+Provenance+services
Available Date: October 2018
Main contributor: EAA
Role of contributor: modeller/recommender
Contact: Barbara Magagna

Highlighted features:
 Provenance integrated in each viewpoint and life cycle phase of ENVRI RM
 OIL-E extended by PROV (model family) and mappings to other standards like CERIF; provenance patterns integrated in the ENVRI knowledge base
 Wiki for use cases, provenance patterns, recommended tools for provenance
 Implementation case demonstrating a combination of provenance-related services
Target users:
 RI data service operators
 data application developers
 e-Infrastructure operators
 researchers
Technology readiness level:
 6–7, demonstrated on EUDAT/B2FIND environment
Accessibility:
 https://wiki.envri.eu/display/EC/WIKI+for+Semantics+and+Provenance+services (not yet publicly accessible)
Supported standards:
 W3C PROV-O
 CERIF
Required platform:
 All kinds of environment
Known bugs:

Work in progress:
 Science Viewpoint: Roles and Behaviours including specific activity
diagrams for all provenance patterns by defining steps and artefacts for
these processes
 Information Viewpoint: Information Objects and Action Types for
modelling data and workflow provenance
 Computational Viewpoint: Computational Objects with operational
interfaces for providing or invoking provenance functionalities and with
stream interfaces.
 Engineering Viewpoint: This will provide a whole provenance
framework description with specific services such as provenance
collecting/tracking services, annotation service, storing service,
visualization service, provenance query service
 Technology Viewpoint: Technologies and standards in use
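The W3C PROV-O standard listed above can be illustrated with a small sketch. The snippet below is not part of the ENVRI provenance services; it only shows how one "dataset generated by an activity" statement might be written as N-Triples. The example.org identifiers and the function names are hypothetical; the prov: terms come from the PROV-O vocabulary.

```python
# Minimal sketch: express one provenance statement with W3C PROV-O terms.
# All example.org identifiers are hypothetical placeholders.
PROV = "http://www.w3.org/ns/prov#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/"  # hypothetical namespace


def triple(subject: str, predicate: str, obj: str) -> str:
    """Serialise one triple of IRIs as an N-Triples line."""
    return f"<{subject}> <{predicate}> <{obj}> ."


def dataset_provenance(dataset: str, activity: str, agent: str, source: str) -> list[str]:
    """Describe a dataset produced by an activity that used a source entity."""
    return [
        triple(EX + dataset, RDF_TYPE, PROV + "Entity"),
        triple(EX + source, RDF_TYPE, PROV + "Entity"),
        triple(EX + activity, RDF_TYPE, PROV + "Activity"),
        triple(EX + agent, RDF_TYPE, PROV + "Agent"),
        triple(EX + dataset, PROV + "wasGeneratedBy", EX + activity),
        triple(EX + activity, PROV + "used", EX + source),
        triple(EX + activity, PROV + "wasAssociatedWith", EX + agent),
        triple(EX + dataset, PROV + "wasAttributedTo", EX + agent),
    ]


lines = dataset_provenance("dataset-v2", "quality-control-run", "curator-1", "dataset-v1")
print("\n".join(lines))
```

The resulting lines can be loaded into any RDF store; a real deployment would more likely use an RDF library and persistent identifiers instead of string formatting.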

How to use?
[Decision flowchart; the connections between nodes are not recoverable from the extracted text]
 Roles: RI architects, semantic modellers, tool developers
 Decision questions:
 RI requirements matchable to provenance use cases?
 Want to contribute to a new use case description?
 Want to extend ENVRI RM activity diagrams with your approach?
 Having a specific provenance solution?
 Check related provenance pattern. Solution applicable in RI context?
 Want to augment provenance patterns with your solution?
 Want to develop tools for provenance management?
 Actions (Yes/No branches):
 Check provenance patterns
 Add new provenance use cases
 Add new provenance pattern
 Feed ENVRI knowledge base
 Use/extend recommended provenance tools
 Do something else

Current status:
 Work in progress
 Barbara Magagna (Barbara.Magagna@umweltbundesamt.at)
 Support type:
 Wiki descriptions:
https://wiki.envri.eu/display/EC/WIKI+for+Semantics+and+Provenance+services
- Generalised RI requirements modelled as use cases
- Provenance patterns (contributing to RDA WG on prov patterns)
- Recommended tools and provenance frameworks (workflow
management systems supporting provenance collection)
- Description of implementation case involving amongst others EUDAT
services (B2Share/B2Note), ORCID and existing provenance
collection tools
 email support
 open for new use case and provenance patterns
Community support

Sustainability plan
 Involved in RDA WG on Provenance Patterns to avoid duplicate efforts
and ensure up to date research
 Role in EOSC
 Providing comprehensive provenance management insights on the
whole data life cycle with recommendation on specific tools and
services at different granularities, which will be of great benefit for
EOSC

A. Reference model based approaches
B1: Identification and citation– ICOS/LU
B4: Curation - NERC
C2: Pipeline for semantic annotation of relational DB – ANAEE/INRA
C3: Data / metadata generation from semantic annotations- ANAEE/INRA
C4: Dynamic ecological information management system (DEIMS)- LTER/EAA
C5: Biodiversity Community Portal (LifeWatch/LTER)- EAA
INDEX

C1. Data Subscription
Service (DSS)
Relation to the data lifecycle: data processing and data use
Data for Science service pillar: Processing and Optimization
URL: /
Main contributor: CSC (EUDAT)
Role of contributor: e-infrastructure
Contact: Chris Ariyo

 Interface for subscribing to and notifying of identified research data
objects
 Automated processing of queried data on any cloud system
 Target users:
 e-Infrastructure operators,
 researchers.
 TRL 6–7, demonstrated on small scale environment
 Accessibility:
OpenAPI v2/3
 Known bugs:

Current Support:
 Science Viewpoint: Data Use Subsystem. A role supporting the
access of users to an infrastructure. DSS implements this role.
 Information Viewpoint: DSS is described in a service catalogue using
the service description information object. In addition, the
subscription actions and objects are described respectively by IV
actions and IV information objects. The objects and actions are
identified by IV object identifiers.
 Computational Viewpoint: DSS is a custom configuration integrating the
data broker and coordination service.
 Work in progress
 Engineering Viewpoint: The type of service provided by DSS is
engineered in an agile approach with EuroArgo RI.

How to use?
[Decision flowchart; the connections between nodes are not recoverable from the extracted text]
 Roles: RI service developers, e-Science application developers, e-Infrastructure operators
 Decision questions (Y/N):
 Automating frequent actions on data (previously) requiring human monitoring of results?
 (Near) real-time result requirements?
 Research data objects and actions uniquely identified and resolvable?
 Resources available to integrate a UI to DSS?
 Required service portfolio integration feasible?
 Outcomes:
 Use DSS
 DSS might not be a direct choice for you.

Use case name:
 Euro-Argo data subscription service
 Check and notify when new data
matching the subscription found
 Initiate processing on a cloud
 Euro-Argo
 EUDAT,
 University of Amsterdam,
 EGI FedCloud
 Results:
 Demo:
https://www.youtube.com/watch?v=PKU_JcmSskw
 Paper: presented in DataCloud 2017.
Success story 1
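The "check and notify, then initiate processing" behaviour described in this success story can be sketched in a few lines. This is an illustrative model only, not the DSS implementation or its OpenAPI interface: the class names, the metadata keys and the identifiers are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Subscription:
    subscriber: str                  # who gets notified (hypothetical address)
    criteria: dict                   # metadata key/value pairs that must all match
    action: Callable[[dict], str]    # processing to initiate on a match


@dataclass
class SubscriptionService:
    subscriptions: list = field(default_factory=list)
    seen: set = field(default_factory=set)  # identifiers already handled

    def subscribe(self, sub: Subscription) -> None:
        self.subscriptions.append(sub)

    def check(self, data_objects: list) -> list:
        """Return (subscriber, action result) pairs for new matching objects."""
        notifications = []
        for obj in data_objects:
            if obj["pid"] in self.seen:
                continue  # only objects not seen before count as new data
            self.seen.add(obj["pid"])
            for sub in self.subscriptions:
                if all(obj.get(k) == v for k, v in sub.criteria.items()):
                    notifications.append((sub.subscriber, sub.action(obj)))
        return notifications


service = SubscriptionService()
service.subscribe(Subscription(
    subscriber="researcher@example.org",
    criteria={"parameter": "temperature", "region": "north-atlantic"},
    action=lambda obj: f"start processing for {obj['pid']}",
))

batch = [
    {"pid": "hdl:0000/a1", "parameter": "temperature", "region": "north-atlantic"},
    {"pid": "hdl:0000/a2", "parameter": "salinity", "region": "north-atlantic"},
]
first = service.check(batch)
second = service.check(batch)  # nothing new on the second pass
print(first, second)
```

Tracking already-seen identifiers is what makes repeated polling safe, which is why the flowchart asks whether research data objects are uniquely identified and resolvable.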

Current status:
 Tested in EGI/EUDAT
 Contact
 Support type:
 Email
Community support

Sustainability plan
Part of EUDAT services
Role in EOSC

C2. Pipeline for semantic
annotation of relational DB
and triples generation
Relation to the data lifecycle: data processing and data use
Data for Science service pillar: processing, provenance
Main contributor: INRA
Contact: Christian Pichot

  Pipeline for a) the semantic OBOE-based annotation of data managed in a (PostgreSQL) relational DB
and b) the generation of RDF triples.
  Steps: graph modeling (yEd), data annotation/triples generation (Ontop), triple inference (Corese),
SPARQL endpoint (Blazegraph)
  Genericity through RDB connection parameters and a variable pattern approach.
  Target users:
  RI data scientists and data managers,
  e-Infrastructure semantic operators for pipeline deployment
  TRL 6–7, demonstrated and operational on AnaEE-France environment (OBOE-based ontology & postgres
RDB)
  Accessibility:
  Still under development for genericity extension
  Open Source
  Semantic Web W3C
  Linux environment, java
  Known bugs:
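The annotation and triple generation steps can be illustrated with a toy example. The sketch below does not use Ontop, Corese or Blazegraph; it uses an in-memory SQLite table as a stand-in for the PostgreSQL RDB and a placeholder vocabulary namespace instead of the real OBOE IRIs. The table, column and pattern names are hypothetical.

```python
import sqlite3

EX = "http://example.org/anaee/"        # hypothetical data namespace
VOC = "http://example.org/oboe-like#"   # placeholder, not the real OBOE IRI
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
XSD_DOUBLE = "http://www.w3.org/2001/XMLSchema#double"

# A "variable pattern": how one RDB column is described semantically.
PATTERN = {"column": "soil_temp", "characteristic": "Temperature", "entity": "Soil"}


def row_to_triples(row_id: int, value: float, pattern: dict) -> list[str]:
    """Turn one table row into observation/measurement triples (N-Triples)."""
    obs = f"<{EX}observation/{row_id}>"
    meas = f"<{EX}measurement/{row_id}/{pattern['column']}>"
    return [
        f"{obs} <{RDF_TYPE}> <{VOC}Observation> .",
        f"{obs} <{VOC}ofEntity> <{VOC}{pattern['entity']}> .",
        f"{obs} <{VOC}hasMeasurement> {meas} .",
        f"{meas} <{VOC}ofCharacteristic> <{VOC}{pattern['characteristic']}> .",
        f'{meas} <{VOC}hasValue> "{value}"^^<{XSD_DOUBLE}> .',
    ]


# In-memory table standing in for the relational database of raw data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_data (id INTEGER PRIMARY KEY, soil_temp REAL)")
conn.executemany("INSERT INTO raw_data VALUES (?, ?)", [(1, 12.5), (2, 13.1)])

triples = []
for row_id, value in conn.execute("SELECT id, soil_temp FROM raw_data ORDER BY id"):
    triples.extend(row_to_triples(row_id, value, PATTERN))
conn.close()
print("\n".join(triples))
```

Keeping the semantic description in a separate pattern object mirrors the idea of genericity through connection parameters and variable patterns: a different column only needs a different pattern, not different code.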

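How to use?
[Pipeline diagram; only the labels are recoverable from the extracted text]
 Actors: data scientist, database manager
 Inputs: variable semantic description, ontology (OBOE-based), RDB raw data
 Processing: graph pattern (yEd-based processing), odba mapping
 Outputs: raw data, raw data with inferred triples, end point, semantic portals
 Follow-up services: metadata generation, data set generation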

C3. Data / metadata
generation from semantic
annotations
Relation to the data lifecycle: data publication
Data for Science service pillar: cataloguing,
identification/citation
Main contributor: INRA
Contact: Christian Pichot

  A- Generation of ISO 19139 metadata records from RDF triples.
  Steps: 1) conversion of OBOE-based triples to DCAT-AP and 2) from DCAT-AP to ISO. This second step can
be re-used alone.
  B- Generation/identification of datasets from raw data OBOE-based RDF triples.
  Steps : 1) data perimeter delimitation (from metadata), 2) identification of dataset dimensionalities 3) Data file
(NETCDF) generation and 4) DOI generation
  Target users:
  RI metadata and data managers and publishers
  e-Infrastructure semantic operators
  TRL 3–4, under development on AnaEE-France environment (OBOE-based ontology & postgres RDB)
  Accessibility:
  Development stage
  Open Source
  Supported standards and formats:
  Semantic Web W3C, ISO19115/19139, NetCDF, DataCite
  Linux environment, java
  Known bugs:
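The second step of part A (DCAT-AP to ISO) can be sketched as a simple field mapping. The snippet below is a Python stand-in and not the service's own transformation; it emits only a small, non-validating fragment of an ISO 19139 record. The input dictionary, its keys and its values are hypothetical; the gmd and gco namespaces are the standard ISO 19139 ones.

```python
import xml.etree.ElementTree as ET

GMD = "http://www.isotc211.org/2005/gmd"
GCO = "http://www.isotc211.org/2005/gco"
ET.register_namespace("gmd", GMD)
ET.register_namespace("gco", GCO)


def char_string(parent: ET.Element, tag: str, text: str) -> ET.Element:
    """Add <gmd:tag><gco:CharacterString>text</gco:CharacterString></gmd:tag>."""
    holder = ET.SubElement(parent, f"{{{GMD}}}{tag}")
    ET.SubElement(holder, f"{{{GCO}}}CharacterString").text = text
    return holder


def dcat_to_iso(dataset: dict) -> ET.Element:
    """Map a few DCAT-style fields to a partial ISO 19139 record."""
    root = ET.Element(f"{{{GMD}}}MD_Metadata")
    char_string(root, "fileIdentifier", dataset["dct:identifier"])
    ident = ET.SubElement(root, f"{{{GMD}}}identificationInfo")
    data_ident = ET.SubElement(ident, f"{{{GMD}}}MD_DataIdentification")
    citation = ET.SubElement(data_ident, f"{{{GMD}}}citation")
    ci_citation = ET.SubElement(citation, f"{{{GMD}}}CI_Citation")
    char_string(ci_citation, "title", dataset["dct:title"])
    char_string(data_ident, "abstract", dataset["dct:description"])
    return root


record = dcat_to_iso({
    "dct:identifier": "example-dataset-001",          # hypothetical values
    "dct:title": "Soil temperature, example site",
    "dct:description": "Illustrative record only.",
})
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

A complete record needs many more mandatory elements (contact, date stamp, language and so on), so this fragment only shows the shape of the mapping.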

How to use? (A: metadata generation)
[Pipeline diagram; only the labels are recoverable from the extracted text]
 Actors: metadata producer
 Ontology specific (OBOE for AnaEE): semantic annotation of resources, RDF OBOE metadata, UI application for perimeter delimitation, OBOE metadata record
 Generic: OBOE to DCAT, GeoDCAT, API (XSLT), UI application for metadata record selection
 Outputs: ISO 19139, ISO19119, EML?
 Related branch B: RDF raw data, datasets production/identification and publication (DOI)

How to use? (B: dataset generation)
[Pipeline diagram; only the labels are recoverable from the extracted text]
 Actors: data manager and publisher
 Ontology and pipeline specific: annotation pipeline, RDF data generation, API
 OBOE specific: semantic annotation of resources, RDF OBOE metadata, RDF OBOE raw data, UI application for perimeter delimitation, UI application for data set selection
 Outputs: datasets production/identification and publication (DOI)
 Related branch A: RDF metadata, ISO19119, EML?

C4. Dynamic ecological
information management
system (DEIMS)
Relation to the data lifecycle: data publication
Data for Science service pillar: cataloguing
URL: https://data.lter-europe.net/deims/
Main contributor: EAA
Contact: Christoph Wohner

 Standardised documentation of research sites, datasets, data products and sensors
 Integration with GEOSS
 Exposition of data through standardised services (CSW, WFS, WMS, …)
 Target users:
 (environmental) scientists
 RI data managers,
 Potentially also data application developers that build their services on top of
DEIMS-SDR
 TRL 8–9, deployed on dedicated LTER Europe infrastructure
 Accessibility:
 Code available GitHub (multiple repositories due to modular nature)
 For sites: Inspire EF,
 For datasets: ISO 19139, ISO 19115, EML, BDP.
 For sensors: sensorML (beta version)
 Known bugs:

Current Support:
 Science Viewpoint: Roles and Behaviours (data discovery) as well as
activity diagrams describing the process of inclusion and
documentation of observation facilities
 Information Viewpoint: Information objects such as metadata catalogue
and all information action types dealing with metadata registration
 Computational Viewpoint: catalogue service as computational object
and related interfaces
 Work in progress
 Engineering Viewpoint: different service components provided by the
portal

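How to use?
[Decision flowchart; the connections between nodes are not recoverable from the extracted text]
 Roles: RI user, researcher
 Decision questions (Y/N):
 Observation facilities documented?
 Datasets documented?
 Actions:
 Documentation of observation facility
 Get persistent identification
 Discover site
 Discover dataset
 Outcomes:
 Use DEIMS-SDR (DEIMS-SDR will be the choice for you)
 DEIMS-SDR not needed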

Use case name:
 DEIMS-SDR Catalogue Interoperability
Generic documentation of observation and
experimentation facilities and linking to
resulting datasets
 Dynamic EF XML generation: e.g.
https://data.lter-europe.net/deims/node/8611/emf
 CSW for datasets:
https://data.lter-europe.net/pycsw/csw.py?service=CSW&version=2.0.2&request=GetCapabilities
 LTER Europe / ILTER
 DEIMS-SDR (Drupal)
 Link to EUDAT/B2FIND under development
 Results:
https://data.lter-europe.net/deims/
Success story 1: DEIMS-SDR Site Catalogue
Exchange of site and dataset
metadata
Generates INSPIRE EF
XML Records
Usable in external applications
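For external applications that want to query the dataset catalogue, the CSW request URLs can be assembled programmatically. The sketch below only builds the URLs and sends nothing over the network; the endpoint is the one quoted in this success story and may have changed since. The helper name and the GetRecords parameter values are illustrative choices, using key-value parameters defined by CSW 2.0.2.

```python
from urllib.parse import urlencode

# Endpoint as quoted in the slide; availability is not guaranteed.
CSW_ENDPOINT = "https://data.lter-europe.net/pycsw/csw.py"


def csw_url(request: str, **extra: str) -> str:
    """Build a CSW 2.0.2 key-value-pair request URL."""
    params = {"service": "CSW", "version": "2.0.2", "request": request}
    params.update(extra)
    return f"{CSW_ENDPOINT}?{urlencode(params)}"


capabilities_url = csw_url("GetCapabilities")
records_url = csw_url(
    "GetRecords",
    typeNames="csw:Record",
    elementSetName="brief",
    resultType="results",
    maxRecords="5",
)
print(capabilities_url)
print(records_url)
```

The first URL reproduces the GetCapabilities link given above; the second shows how a harvesting client might ask for a handful of brief records.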

[Architecture diagram; only the labels are recoverable from the extracted text]
SITE AND DATASET DISCOVERY
 DEIMS-SDR content: site, dataset, data product
 Formats and services: ISO 19139, EML/BDP, CKAN, WMS, WFS, WCS, INSPIRE EF (service tbd), service (e.g. pyCSW)
 Export (XML, OAI-PMH, json), harvested by METACAT
 Discovery: geoportal / geonetwork, B2FIND / CKAN
 Visualisation (e.g. map)
INFORMATION EXCHANGE
 Persistent Site Identifier: DEOS ID (register, export)
 Exchanged entities: Site, Network, Person, Dataset, Data product
 Discovery: geoportal / geonetwork, B2FIND / CKAN
 Visualisation (e.g. map), service (e.g. pyCSW)

Current status:
 Production version
 Christoph Wohner (christoph.wohner@umweltbundesamt.at)
 Support type:
https://data.lter-europe.net/deims/tutorial
https://data.lter-europe.net/deims/documentation
 email support,
 Feedback and support system on DEIMS-SDR
Community support

DEIMS-SDR development institutionalised in LTER-Europe and
ILTER
 Additional funding and person months through projects (currently
H2020 project “eLTER” and H2020 project “EUDAT”)
 Role in EOSC
 This portal will help to foster collaboration and to share data
which is of great importance in EOSC and links closely with
cataloguing
Sustainability plan

A. Reference model based approaches
B1: Identification and citation– ICOS/LU
B4: Curation - NERC
INDEX

D1. ENVRIplus
Service Testbed
based on EGI Cloud Compute
Data for Science service pillar: Storage, Computing, Networking
and other e-Infrastructure services
URL: https://www.egi.eu/services/cloud-compute/
Main contributor: EGI Foundation
Role of contributor: e-Infrastructure Service Provider
Contact: Baptiste Grenier

  Execute compute- and data-intensive workloads (both batch and interactive)
  Host long-running services (e.g. web servers, databases or applications servers)
  Create disposable testing and development environments
  Configure Virtual Machines (VMs) according to requirements
  Resources: CPU, memory, disk
  Application environments
  Scale infrastructure and manage resources in a flexible way
  Integrated monitoring and accounting capabilities
  Target users:
ENVRIplus RI research communities
ENVRIplus individual researchers
ENVRIplus Service Providers
ENVRIplus related SME/Industry
  TRL 9
  Accessibility:
https://wiki.egi.eu/wiki/Federated_Cloud_user_support#Getting_started
  Open Standard interfaces: OCCI, CDMI
  OpenStack interfaces
  Supported deployment artefacts:
  Virtual Machine (VM) images, docker containers, packages, archives, scripts…

Work in progress
 Technology Viewpoint: the provision of the ENVRIplus service testbed
corresponds to the RM Technology Viewpoint, which provides a real-world
configuration to support testing and validation of ENVRIplus services

How to use?
[Decision flowchart; the connections between nodes are not recoverable from the extracted text]
 Roles: ENVRIplus individual researchers, ENVRIplus RIs, ENVRIplus service providers
 Decision questions:
 Have data/computing intensive solutions?
 Need online/archive storage?
 Need computing resources?
 Need resources (testbed)?
 Need online applications?
 Need service hosting?
 Support distributed users?
 Access channels: API, command line interface, web, Application on Demand service
 Outcome: EGI Cloud Compute Service (+ Container, HTC, Data and Storage services)

Current status:
 Production
 EGI Foundation Support Team: support@egi.eu
 Support type:
 Online user guide:
https://wiki.egi.eu/wiki/Federated_Cloud_user_support
 Helpdesk: EGI Helpdesk ticketing system
 Training: https://wiki.egi.eu/wiki/Training
 Request Service: https://www.egi.eu/request-service/
Community support

Sustainability plan
 Production service maintained by EGI Federation
 Role in EOSC
 Key e-Infrastructure services in EOSC-Hub
 EOSC-Hub workshop Wednesday morning with Tiziana Ferrari

C5. Biodiversity Community
Portal
Relation to the data lifecycle: data acquisition, curation,
publication
Data for Science service pillar: Identification/Citation, Curation,
Cataloguing & Provenance
URL: not yet publicly available
Available Date: following a consensus process
Main contributor: LifeWatch & LTER-Europe
Contact: Nicola Fiore & Barbara Magagna

  A central registry for semantic resources (e.g. ontologies, thesauri, reference lists codified in skos) used in the
ecological and biodiversity domain allowing users to identify and select them for specific tasks, as well as
offering generic services to exploit them in search, annotation or other scientific data management processes.
  functionalities such as browsing and different types of visualisation of the content, mapping between the
resources, automatic translation of labels if available, annotation services
  Target users:
  RI semantic modellers (providers),
  e-Infrastructure operators (providers),
  RI Researchers (users)
  TRL 6–7, demonstrated on small scale LIFEWATCH/LTER environment
  Accessibility:
http://193.204.79.100/
  SKOS
  OWL
  No one
  a plain web browser is sufficient to exploit it
  Known bugs:
  Not yet tested
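The "mapping between the resources" functionality can be pictured as following SKOS mapping links between vocabularies. The sketch below is not the portal's code or API; it treats a hypothetical list of term pairs as skos:exactMatch statements (a symmetric and transitive property) and collects all terms equivalent to a given one. The term identifiers are invented for illustration.

```python
from collections import defaultdict

# Hypothetical (term, term) pairs standing in for skos:exactMatch statements.
EXACT_MATCHES = [
    ("vocabA:soil-temperature", "vocabB:SoilTemperature"),
    ("vocabB:SoilTemperature", "vocabC:soil_temp"),
    ("vocabA:air-pressure", "vocabC:air_pressure"),
]


def equivalent_terms(term: str, matches: list) -> list:
    """Follow exactMatch links in both directions and return all equivalents."""
    graph = defaultdict(set)
    for left, right in matches:
        graph[left].add(right)   # exactMatch is symmetric,
        graph[right].add(left)   # so store both directions
    found, stack = set(), [term]
    while stack:
        current = stack.pop()
        for neighbour in graph[current]:
            if neighbour != term and neighbour not in found:
                found.add(neighbour)
                stack.append(neighbour)  # transitivity: keep walking
    return sorted(found)


print(equivalent_terms("vocabA:soil-temperature", EXACT_MATCHES))
```

The same traversal would apply to a user who is looking for equivalent terms across registered vocabularies, with the pairs coming from the registry instead of a hard-coded list.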

Current Support:
 Science viewpoint: Roles and Behaviours (semantic harmonisation,
select or build conceptual model) including specific activity diagrams
for all supplier/user interaction with the portal by defining steps and
artefacts for these processes
 Information viewpoint: concept and conceptual model, mapping rule as
information objects, annotate metadata, build concept models, do data
mining as information action types
 Computational viewpoint: semantic laboratory, semantic broker,
annotation service as computation object
 Work in progress
  Engineering viewpoint: different service components provided by the
portal

How to use?
[Decision flowchart; the connections between nodes are not recoverable from the extracted text]
 Roles: RI researcher, RI semantic modeller
 Decision questions (Y/N):
 Do you want to annotate ‘Experimental’ and ‘Observation’ data?
 Look for a vocabulary?
 Look for a term?
 Looking for equivalent terms?
 Do you want to learn from semantic resources but need help to understand?
 Want to share your semantic resources?
 Actions:
 Evaluate the content of the existing vocabulary
 Interact with semantic marketplace
 Outcomes:
 Use the Biodiversity Community Portal
 Do something else

Current status:
 Beta version
 LifeWatch Service Centre (nicola.fiore@unisalento.it; helpdesk
contacts coming soon)
 Support type:
 email support
 open to collect new semantic resources
 The portal offers a semantic marketplace for exchanging information
between suppliers and users of semantic resources
Community support

TECHNOLOGY READINESS LEVEL (TRL)
From EC website
•  TRL 9: actual system proven in operational environment (competitive manufacturing in the
case of key enabling technologies; or in space)
•  TRL 8: system complete and qualified
•  TRL 7: system prototype demonstration in operational environment
•  TRL 6: technology demonstrated in relevant environment (industrially relevant
environment in the case of key enabling technologies)
•  TRL 5: technology validated in relevant environment (industrially relevant environment in
the case of key enabling technologies)
•  TRL 4: technology validated in lab
•  TRL 3: experimental proof of concept
•  TRL 2: technology concept formulated
•  TRL 1: basic principles observed

Online service portfolio will be accessible via
 https://wiki.envri.eu/display/EC/ENVRIplus+Service+Portfolios
 Welcome to contact us:
 General comments: z.zhao@uva.nl
 Use case or technical questions: individual service contact
SUMMARY
