This document discusses provenance and the goals of the CSSEF Provenance Environment (ProvEn) services. ProvEn aims to capture provenance from native data sources, map it to domain ontologies, and provide finished products to help users understand the origin and processing of complex climate science datasets. It seeks to address questions from users about datasets by integrating provenance from multiple sources and making it accessible through standards-based searches.
Climate Science for a Sustainable Energy Future Provenance
1. Climate Science for a Sustainable Energy Future (CSSEF) Provenance
ERIC STEPHAN
Pacific Northwest National Laboratory
Richland, WA
December 26, 2012 1
2. Provenance Definitions
! Provenance is a record that describes the people, institutions, entities,
and activities, involved in producing, influencing, or delivering a piece
of data or a thing.
https://dvcs.w3.org/hg/prov/raw-file/tip/presentations/wg-overview/overview/index.html
! Metadata used to describe the origin of the data and any of its
modifications.
! A log of historical events describing the origin of data and any
subsequent changes.
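The three definitions above can be read as one small data structure: an entity, the agents involved, and an ordered log of events. The following is only a minimal sketch under that reading; the class names (`Agent`, `Event`, `ProvenanceRecord`) and all values are hypothetical illustrations, not part of ProvEn or of any provenance standard.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class Agent:
    """A person or institution involved in producing the data."""
    name: str
    institution: str


@dataclass(frozen=True)
class Event:
    """One activity that produced or modified the data."""
    activity: str
    agent: Agent


@dataclass
class ProvenanceRecord:
    """A log of events for one entity, oldest first."""
    entity: str
    events: List[Event] = field(default_factory=list)

    def log(self, event: Event) -> None:
        self.events.append(event)

    def origin(self) -> Optional[Event]:
        """The first logged event, i.e. how the data originated."""
        return self.events[0] if self.events else None

    def modifications(self) -> List[Event]:
        """Every event after the origin."""
        return self.events[1:]


# Illustrative values only; none of these names come from a real dataset.
analyst = Agent(name="example analyst", institution="example lab")
record = ProvenanceRecord(entity="example-dataset")
record.log(Event("created from instrument output", analyst))
record.log(Event("regridded", analyst))
```

Keeping the log ordered is what lets the same record answer both "where did this come from" and "what changed afterwards".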
3. Popular Provenance Vocabularies
! Dublin Core Provenance Task Force
! Open Provenance Model
! Proof Markup Language Ontology
! The Provenance Ontology (Prov-O)
See also: W3C Incubator Group, http://www.w3.org/2005/Incubator/prov/wiki/W3C_Provenance_Incubator_Group_Wiki
4. The Systems Science Challenge
! Studying complex systems typically has the following characteristics:
  ! Interdisciplinary studies involve multiple stakeholders
  ! Leverage multiple tools, algorithms, data products, and sensors
  ! Reliant on highly iterative and repetitive techniques
  ! Steps are difficult to document and are oftentimes committed to memory or notes.
! Sharing complex systems data between collaborators has the following inherent problems:
  ! To establish data confidence, scientists accessing data (consumers) need to know data origin and modification history (data provenance).
  ! Scientists producing the data need a consistent means to convey data provenance to targeted scientific communities
  ! The data provenance needs to be diverse enough to support any data.
  ! It must also be based on community standards to cross-reference searches
5. Example: Motivating User Questions About the CSSEFARMBE Diagnostics Dataset
[Figure: an Atmosphere Scientist and a CAM Modeler asking the following questions]
! How did both CSSEFARMBE and ARMBE originate?
! How do CAM output variables map to the CSSEFARMBE variables?
! What additional ancillary information is available about this dataset?
6. The Knowledge Gap: CSSEF Users Needing Additional Answers from Data Producers
[Diagram: the CSSEFARMBE Developers and the native sources they worked with (Test NCL Code, CF Terms, ARMBE Header, CSSEF ARMBE Header, CAM Web Page, Tech Report), connected by edges labeled "read", "wrote", and "compared". The CAM Modeler and the Atmosphere Scientist stand apart from the developers with their questions:]
! How do CAM output variables map to the CSSEFARMBE variables?
! How did both CSSEFARMBE and ARMBE originate?
! What additional ancillary information is available about this dataset?
7. Goals of CSSEF Provenance Environment (ProvEn) Services
! Identify the future user communities that will need provenance while the scientists producing the data are still generating it
  ! Knowledge products (e.g., reports, archivable provenance records)
! Create consumer-oriented provenance products by:
  ! Capturing historical information from any native source necessary to describe the origin of the dataset.
  ! Retaining, for user reference, a copy of the native source familiar to the domain community.
8. Goals of CSSEF Provenance Environment (ProvEn) Services
! Store this information in a cross-referenced knowledge model by mapping the domain ontology to foundational ontologies
  ! Domain ontologies are diverse and subject to constant change, since they are defined by the concepts extracted from native sources.
  ! Foundational ontologies are stable and seldom change.
  ! That stability implies that many methodologies, tools, and services are available to leverage.
! Use the composite knowledge model to provide finished products to different kinds of consumers
Foundational Ontology                    | Cross-Reference Capability
W3C Provenance Ontology (Prov-O)         | Core ontology describing data origin
Dublin Core Terms                        | Data citations and software
Friend of a Friend (FOAF)                | Description of scientist and collaborators
(Future) Proof Markup Language 3.0       | Description of justification and trust
(Future) Dublin Core to PROV-O Mapping   | Support integration of DC provenance and PROV-O
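Mapping a changeable domain ontology onto stable foundational ontologies can be pictured as a lookup from domain terms to foundational terms. The sketch below is an assumption-laden illustration, not ProvEn's actual model: the `atmos:` domain terms, the `ex:` identifiers, and the `align` helper are hypothetical, while the PROV-O, Dublin Core Terms, and FOAF IRIs are the published ones.

```python
# Published namespace IRIs of the foundational vocabularies.
PROV = "http://www.w3.org/ns/prov#"
DCTERMS = "http://purl.org/dc/terms/"
FOAF = "http://xmlns.com/foaf/0.1/"

# Hypothetical domain-ontology terms (left) aligned to foundational terms (right).
ALIGNMENT = {
    "atmos:derivedFromDataset": PROV + "wasDerivedFrom",
    "atmos:producedByCode": PROV + "wasGeneratedBy",
    "atmos:citation": DCTERMS + "bibliographicCitation",
    "atmos:developerName": FOAF + "name",
}


def align(triples, alignment=ALIGNMENT):
    """Return foundational-vocabulary copies of every triple whose predicate is aligned.

    Triples with unaligned predicates are skipped, not rewritten.
    """
    aligned = []
    for subject, predicate, obj in triples:
        if predicate in alignment:
            aligned.append((subject, alignment[predicate], obj))
    return aligned


# Illustrative (subject, predicate, object) triples in the domain vocabulary.
domain_triples = [
    ("ex:derived-dataset", "atmos:derivedFromDataset", "ex:source-dataset"),
    ("ex:derived-dataset", "atmos:localNote", "no foundational equivalent"),
]
aligned_triples = align(domain_triples)
```

Because only the `ALIGNMENT` table refers to domain terms, a change in the domain ontology would mean editing that table, while any tool written against the foundational vocabulary would keep working.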
9. Identifying a New Product with Native Sources, Domain Concepts and Terms for dataset
[Diagram: the native sources (CSSEF ARMBE Header, ARMBE Header, Tech Report, Test NCL Code, CF Terms, CAM Web Page), each annotated with what it contributes: "Observational Data Origin Concepts" or "Identified Variable Mapping Concepts and Terms".]
10. Creating and Maintaining Domain Ontologies (Knowledge Engineer)
[Diagram: the Atmosphere Diagnostics Dataset Origin/Mapping Terms and Concepts are used to add an Atmosphere Domain Ontology (Build Ontology), which is then registered with ProvEn Services (Align Ontologies). Inside ProvEn Services, the Aligned Knowledge Model For Atmosphere sits alongside the Foundational Ontologies.]
11. Creating new Product By Populating ProvEn Services with CSSEFARMBE Dataset Native Sources
[Diagram: native sources (CSSEF ARMBE Header, ARMBE Header, Tech Report, Test NCL Code, CF Terms, CAM Web Page) carry CSSEFARMBE knowledge relevant to the CAM Modeler and Atmosphere Scientist, contributed by the CSSEFARMBE Developers. Native Source Concept Extraction feeds ProvEn Services, which hold:]
! Native Provenance Mapped to Atmosphere Domain Ontology
! Copy of Corresponding Native Sources
! Native Source References
! Aligned Knowledge Model for Atmosphere
! Foundational Ontologies
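Concept extraction from a native source can be sketched as parsing the source, keeping a verbatim copy, and emitting statements in the domain vocabulary for the terms that are mapped. This is a rough sketch under stated assumptions: the header text, the `atmos:` terms, and the `extract_concepts` and `ingest` helpers are hypothetical, and a real header would likely need a more careful parser.

```python
def extract_concepts(source_id, header_text, term_map):
    """Turn 'name = value' header lines into (source, domain term, value) triples."""
    triples = []
    for line in header_text.splitlines():
        if "=" not in line:
            continue  # not an attribute line
        key, value = line.split("=", 1)
        key = key.strip().lstrip(":")
        value = value.strip().rstrip(";").strip().strip('"')
        if key in term_map:  # only mapped attributes become provenance
            triples.append((source_id, term_map[key], value))
    return triples


def ingest(store, source_id, header_text, term_map):
    """Keep a verbatim copy of the native source and add its extracted triples."""
    store["native_copies"][source_id] = header_text
    store["triples"].extend(extract_concepts(source_id, header_text, term_map))
    return store


# Hypothetical header text and term mapping, for illustration only.
EXAMPLE_HEADER = '''\
:title = "Example diagnostics dataset" ;
:source = "example-observations-v1" ;
:comment = "free text that has no mapped term" ;
'''
TERM_MAP = {"title": "atmos:title", "source": "atmos:originDataset"}

store = {"native_copies": {}, "triples": []}
ingest(store, "example-header", EXAMPLE_HEADER, TERM_MAP)
```

Storing the untouched header next to the triples mirrors the goal of retaining a copy of the native source that the domain community already recognizes.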
12. Producing ProvEn Services Product: CSSEFARMBE Dataset Origin Report
[Diagram: the ProvEn Services store holds the Native Provenance Mapped to Atmosphere Domain Ontology and the Aligned Knowledge Model for Atmosphere, together with the Foundational Ontologies. Standard Vocabulary Cross-Reference Searching and Reasoning over the store answers the users' questions:]
! CAM Modeler: What additional ancillary information is available about this dataset?
! Atmosphere Scientist: How did both CSSEFARMBE and ARMBE originate?
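An origin question of this kind can be answered by following derivation links expressed in a standard vocabulary. The sketch below assumes provenance is held as plain (subject, predicate, object) tuples and walks `prov:wasDerivedFrom` links transitively; the `trace_origin` helper and the `ex:` identifiers are hypothetical, and the traversal stands in for the store's own searching and reasoning, whose actual mechanism is not described here.

```python
PROV_DERIVED = "http://www.w3.org/ns/prov#wasDerivedFrom"


def trace_origin(triples, start, predicate=PROV_DERIVED):
    """List everything `start` was derived from, directly or indirectly."""
    parents = {}
    for subject, pred, obj in triples:
        if pred == predicate:
            parents.setdefault(subject, []).append(obj)

    seen = {start}  # guards against cycles in the data
    order = []
    stack = [start]
    while stack:
        node = stack.pop()
        for parent in parents.get(node, []):
            if parent not in seen:
                seen.add(parent)
                order.append(parent)
                stack.append(parent)
    return order


# Illustrative triples; the identifiers are placeholders, not real datasets.
example_triples = [
    ("ex:report-dataset", PROV_DERIVED, "ex:intermediate-dataset"),
    ("ex:intermediate-dataset", PROV_DERIVED, "ex:raw-observations"),
    ("ex:report-dataset", "http://purl.org/dc/terms/title", "Example title"),
]
lineage = trace_origin(example_triples, "ex:report-dataset")
```

Because the walk only looks at a foundational predicate, the same function would apply to provenance captured from any domain once it has been aligned to that vocabulary.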
13. ProvEn Services Architecture
[Diagram: architecture components]
! External systems: ESGF Node, UVCDAT
! Operations: Store Native Provenance; Query and Cross-Reference Provenance
! ProvEn (Jersey) REST Services
! AliBaba Object to RDF API
! Searching and Inferencing API
! Sesame Store
! Deployment labels: Local, Compute Cluster, Glassfish Server, Portable Jarfile Deploy