The periodic National Climate Assessment (NCA) of the US Global Change Research Program (USGCRP) [1] reports findings on global climate change and its impacts on the United States. Those findings are of great public and academic concern and are used in policy and management decisions, which makes the provenance information of findings in those reports especially important. The USGCRP is developing a Global Change Information System (GCIS), in which the NCA reports and associated provenance information are the primary records.
We modeled and developed Semantic Web applications for the GCIS. By applying a use case-driven iterative methodology [2], we developed an ontology [3] to represent the content structure of a report and the associated provenance information. We also mapped the classes and properties in our ontology to the W3C PROV-O ontology [4] to obtain a formal representation of provenance. We implemented the ontology in several pilot systems for a recent National Climate Assessment report (i.e., the NCA3). These systems let users browse and search provenance information by topic of interest. Applying the ontology has made the provenance information of the NCA3 structured and interoperable. Besides the pilot systems we developed, other tools and services can also interact with the data in the context of the “Web of data” and thus create added value.
Our research shows that the use case-driven iterative method bridges the gap between Semantic Web researchers and earth and environmental scientists and can be deployed rapidly for developing Semantic Web applications. Our work also provides first-hand experience of re-using the W3C PROV-O ontology in the field of earth and environmental sciences; PROV-O was only recently ratified by the W3C as a recommendation (on 04/30/2013), and relevant applications are still rare.
[1] http://www.globalchange.gov
[2] Fox, P., McGuinness, D.L., 2008. TWC Semantic Web Methodology. Accessible at: http://tw.rpi.edu/web/doc/TWC_SemanticWebMethodology
[3] https://scm.escience.rpi.edu/svn/public/projects/gcis/trunk/rdf/schema/GCISOntology.ttl
[4] http://www.w3.org/TR/prov-o/
Ontology Development for Provenance Tracing in National Climate Assessment of the US Global Change Research Program
1. TWC
AGU Fall Meeting 2013, San Francisco, CA
Ontology Development for Provenance Tracing in
National Climate Assessment of
the US Global Change Research Program
Xiaogang Ma a, Jin Guang Zheng a, Justin Goldstein b,c,
Linyun Fu a, Brian Duggan b,c, Patrick West a,
Jun Xu a, Chengcong Du a, Anusha Akkiraju a,
Steve Aulenbach b,c, Curt Tilmes c,d, Peter Fox a
a Tetherless World Constellation, Rensselaer Polytechnic Institute; b University
Corporation for Atmospheric Research; c U.S. Global Change Research
Program; d NASA Goddard Space Flight Center
2. TWC
Background
• United States Global Change Research Program (USGCRP): An interagency program that coordinates and integrates Federal research on changes in the global environment and their implications for society
• National Climate Assessment (NCA): An assessment conducted under the auspices of the Global Change Research Act of 1990, which requires a report to the President and the Congress every four years that evaluates, integrates and interprets the findings of the USGCRP with the intent to advance an inclusive and sustained process for assessing and communicating scientific knowledge of the impacts, risks and vulnerabilities associated with a changing global climate in support of decision making across the United States
• Global Change Information System (GCIS): An information system under development through the USGCRP that establishes data interfaces and interoperable repositories of climate and global change data which can be easily and efficiently accessed, integrated with other data sets, maintained over time and expanded as needed into the future
From: The National Global Change Research Plan 2012 - 2021
3. TWC
Collaborators
National Science and
Technology Council (NSTC)
Committee on Environment,
Natural Resources and
Sustainability (CENRC)
White House Office
of Science and
Technology Policy
(OSTP)
Subcommittee on Global
Change Research (SGCR)
U.S. Global Change Research
Program (USGCRP)
GCIS: Information Model
and Semantic Application
Prototypes (GCIS-IMSAP)
Global Change
Information
System (GCIS)
National Climate
Assessment
(NCA)
National Climate Assessment
Development Advisory
Committee (NCADAC)
4. TWC
What we do
• Ongoing: provenance* for the NCA3** report
• Future: provenance of publications, datasets,
models, organizations, instruments, experiments,
people, etc. eventually covering the entire scope
of global change
* Provenance - Information about entities, activities, people and
organizations involved in the production of the research findings and the
supporting datasets and methods (cf. Moreau and Missier, 2013)
** NCA3 - The National Climate Assessment Development Advisory
Committee (NCADAC) engaged more than 240 authors in the creation
of the third NCA (NCA3) report, which is to be released in early 2014
6. TWC
Remote sensing sensors, platforms, and
instruments are used in global change research
Image source: Yang et al., 2013.
Nature Climate Change
7. TWC
An example question of provenance tracing:
What are NASA contributions to Figure 1.2 in the draft NCA3?
“Figure 1.2: Sea Level Rise: Past, Present, and Future” in draft NCA3
8. TWC
Ontology Development for Provenance Tracing
in the third National Climate Assessment
The third National Climate Assessment Report (NCA3)
Provenance – Information about entities, activities, people and organizations involved in the production of the research findings and the supporting datasets and methods
Ontology – In this work the ontology (GCIS ontology) is a conceptual model of classes, properties and instances that can be used to capture provenance information in the NCA3
Image courtesy of nature.com
9. TWC
Method: a use case-driven
iterative approach
Source: Fox and McGuinness, 2008. http://tw.rpi.edu/web/doc/TWC_SemanticWebMethodology
10. TWC
Identifies:
• goals/objectives to be accomplished
• resources to be used to achieve these objectives
• methods to be used to produce the desired results
A template for documenting use cases:
http://tw.rpi.edu/media/2013/07/25/ae99/UseCase_Template_SeS.doc
Source: Fox and McGuinness, 2008. http://tw.rpi.edu/web/doc/TWC_SemanticWebMethodology
11. TWC
A facilitator:
• sets and monitors direction
• provides guidance for scoping the use case
• milestones for implementation
Team formation: domain experts, data and information producers, knowledge and information modelers, software engineers, and a scribe.
Source: Fox and McGuinness, 2008. http://tw.rpi.edu/web/doc/TWC_SemanticWebMethodology
12. TWC
In the GCIS-IMSAP work we used:
• Group meeting: Titanpad, Skype, GoToMeeting
• Conceptual modeler: CMapTools
• Ontology editor: Protege, Notepad++
• Ontology documentation: LODE, Parrot
• Evolution environments: TopBraid
• Validator/Browser: ELDA, S2S
Source: Fox and McGuinness, 2008. http://tw.rpi.edu/web/doc/TWC_SemanticWebMethodology
13. TWC
Provenance-explicit use cases
The first use case
• Title: Visit data center website of dataset used to generate a report figure
• Actor and system: a reader of the draft NCA3 on the GCIS website
• Flow of interactions: A reader wishes to identify the source of the data used to produce a particular figure in the draft NCA3. A reference to the paper in which the image contained in this figure was originally published appears in the figure caption. Clicking that reference displays a page of metadata information about the paper, including links to the datasets used in that paper. Pursuing each of those links presents a page of metadata information about the dataset, including a link back to the agency/data center web page describing the dataset in more detail and making the actual data available for order or download.
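The flow of interactions in the first use case amounts to following links between metadata records: figure, then paper, then dataset, then data center page. A minimal sketch in Python, assuming hypothetical record identifiers, field names and URLs (none of them are taken from the GCIS itself):

```python
# Hypothetical metadata records keyed by identifier. Every identifier,
# field name and URL below is invented for illustration only.
RECORDS = {
    "figure/example": {"type": "Figure", "cites": ["paper/example"]},
    "paper/example": {"type": "Publication", "uses": ["dataset/example"]},
    "dataset/example": {
        "type": "Dataset",
        "landing_page": "http://datacenter.example.org/dataset/example",
    },
}


def data_center_pages(figure_id, records=RECORDS):
    """Follow figure -> cited papers -> datasets -> data center pages."""
    pages = []
    for paper_id in records[figure_id].get("cites", []):
        for dataset_id in records[paper_id].get("uses", []):
            page = records[dataset_id].get("landing_page")
            if page is not None:
                pages.append(page)
    return pages


print(data_center_pages("figure/example"))
```

Each hop in the loop corresponds to one click in the use case; the classes and properties of an ontology would formalize the record types and the link names.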
15. TWC
An intuitive concept map of the use case
Classes and properties recognized from the use case
16. TWC
An intuitive concept map of the use case
Classes and properties recognized from the use case
From an intuitive model to an ontology:
(1) A defined class or property should be meaningful and robust enough to meet the requirements of various use cases
(2) An ontology can be extended by adding classes and properties recognized from new use cases through the iterative approach
17. TWC
The second use case
• Title: Identify roles of people in the generation of a chapter in the draft NCA3
• Actor and system: a viewer of the GCIS website
• Flow of interactions: A viewer sees that Chapter 6 (Agriculture) in the draft NCA3 was written by a group of authors mentioned in a list. On the title page of that chapter the reader can view the role of each author, e.g., convening lead author, lead author or contributing author, in the generation of this report chapter.
• We decided to use the PROV-O ontology to describe this use case
18. TWC
The three Starting Point classes
in PROV-O ontology and the
properties that relate them
Source: http://www.w3.org/TR/prov-o/
19. TWC
Mapping the use case into PROV-O
[Figure: “Author of Chapter 6”, “Chapter 6 in NCA3” and “Writing of Chapter 6 in NCA3”, each connected by an isA link to a PROV-O Starting Point class]
20. TWC
Roles of agents in an
activity in PROV-O
Source: http://www.w3.org/TR/prov-o/
21. TWC
Mapping roles of chapter authors into PROV-O
[Figure: nodes “Author of Chapter 6” and “Writing of Chapter 6 in NCA3”, and the roles “Convening lead author”, “Lead author” and “Contributing author”, connected to PROV-O classes by isA links]
22. TWC
Roles of people in
the activity ‘Writing
of Chapter 6’
Here only three of
the eight authors
of this chapter are
shown. Each
author had a
specific role for
this chapter.
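In PROV-O, the role of an agent in an activity is expressed through a qualified association: the activity points to a prov:Association node, which names the agent (prov:agent) and the role (prov:hadRole). The following is a minimal sketch in plain Python, with triples held as tuples rather than in an RDF store. The prov: terms are PROV-O terms; every ex: identifier, including the author and role names, is a hypothetical placeholder and not taken from the GCIS data.

```python
# Triples are plain (subject, predicate, object) tuples.
# prov: terms come from PROV-O; all ex: names are hypothetical placeholders.

def qualified_association(activity, agent, role, node):
    """Return the triples that attach `agent` to `activity` with `role`."""
    return [
        (activity, "prov:wasAssociatedWith", agent),
        (activity, "prov:qualifiedAssociation", node),
        (node, "rdf:type", "prov:Association"),
        (node, "prov:agent", agent),
        (node, "prov:hadRole", role),
    ]


def roles_in_activity(triples, activity):
    """Map each agent to the role it played in `activity`."""
    nodes = [o for s, p, o in triples
             if s == activity and p == "prov:qualifiedAssociation"]
    roles = {}
    for node in nodes:
        agent = next(o for s, p, o in triples
                     if s == node and p == "prov:agent")
        role = next(o for s, p, o in triples
                    if s == node and p == "prov:hadRole")
        roles[agent] = role
    return roles


# Three hypothetical authors, one per role named on the slides.
authors = [
    ("ex:person1", "ex:ConveningLeadAuthor"),
    ("ex:person2", "ex:LeadAuthor"),
    ("ex:person3", "ex:ContributingAuthor"),
]
graph = []
for i, (agent, role) in enumerate(authors, start=1):
    graph += qualified_association(
        "ex:WritingOfChapter6", agent, role, f"ex:association{i}")

print(roles_in_activity(graph, "ex:WritingOfChapter6"))
```

The intermediate node is what allows one person to hold different roles in different chapters, which a direct prov:wasAssociatedWith link alone cannot express.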
23. TWC
We used PROV-O for describing roles of agents in an activity
We can also describe roles of agents for an entity
24. TWC
Roles of people to
the entity ‘Chapter
6: Agriculture’
Here only three of
the eight authors
of this chapter are
shown. Each
author had a
specific role for
this chapter.
26. TWC
Re-using existing ontologies for the GCIS ontology
By such mappings we can use reasoners that are suitable for the PROV-O
ontology, and thus retrieve provenance graphs from the established GCIS
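One inference a reasoner can draw from such mappings is subclass entailment: an instance of a GCIS class that is declared a sub-class of a PROV-O class is also an instance of that PROV-O class. A minimal sketch of this step in plain Python follows. The specific sub-class assertions and the names gcis:Chapter, gcis:WritingActivity, ex:chapter6 and ex:writingOfChapter6 are assumptions made for illustration; they are not read from the GCIS ontology file.

```python
# Assumed sub-class assertions (illustrative only, not copied from the
# GCIS ontology). Each class maps to its directly asserted superclasses.
SUBCLASS_OF = {
    "gcis:Chapter": ["gcis:Publication"],
    "gcis:Publication": ["prov:Entity"],
    "gcis:WritingActivity": ["prov:Activity"],
}

# Hypothetical instances with their asserted types.
ASSERTED_TYPES = {
    "ex:chapter6": ["gcis:Chapter"],
    "ex:writingOfChapter6": ["gcis:WritingActivity"],
}


def superclasses(cls, subclass_of):
    """Return all direct and inferred superclasses of `cls`."""
    seen = set()
    stack = [cls]
    while stack:
        current = stack.pop()
        for parent in subclass_of.get(current, []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def inferred_types(instance, asserted_types, subclass_of):
    """Asserted types of `instance` plus everything they entail."""
    types = set(asserted_types[instance])
    for cls in asserted_types[instance]:
        types |= superclasses(cls, subclass_of)
    return types


print(sorted(inferred_types("ex:chapter6", ASSERTED_TYPES, SUBCLASS_OF)))
```

With this entailment in place, a query written only against prov:Entity and prov:Activity can also return GCIS-specific instances.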
27. TWC
The third use case
• Title: Provenance tracing of NASA contributions to Figure 1.2 in the draft NCA3
• Actor and system: a viewer of the GCIS website
• Flow of interactions: A viewer sees that the caption of Figure 1.2 “Sea Level Rise: Past, Present and Future” of the draft NCA3 cites four data sources. Selecting the third citation displays a page of information about the cited paper and a citation to the dataset used in that paper. Information about the dataset includes a formal description of its origin, that is, the dataset is derived from data produced by the TOPEX/Poseidon and Jason altimeter missions funded by NASA and CNES. Clicking a link to each of these missions presents a page about the platforms, instruments and sensors in that mission.
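The question behind the third use case can be read as a graph traversal: start at the figure, walk every provenance link, and keep the nodes funded by a given agency. A minimal sketch in Python, assuming hypothetical property names (cites, usesDataset, derivedFrom, fundedBy) and a placeholder dataset identifier. Only “paper/103”, the two missions and their funders are taken from the slides.

```python
from collections import deque

# Provenance links as (subject, property, object) tuples. Property names
# and "dataset/example" are hypothetical; they only illustrate the shape.
EDGES = [
    ("figure/1.2", "cites", "paper/103"),
    ("paper/103", "usesDataset", "dataset/example"),
    ("dataset/example", "derivedFrom", "mission/topex-poseidon"),
    ("dataset/example", "derivedFrom", "mission/jason"),
    ("mission/topex-poseidon", "fundedBy", "agency/nasa"),
    ("mission/topex-poseidon", "fundedBy", "agency/cnes"),
    ("mission/jason", "fundedBy", "agency/nasa"),
    ("mission/jason", "fundedBy", "agency/cnes"),
]


def trace(start, edges):
    """Breadth-first walk returning every node reachable from `start`."""
    reachable = []
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for subject, _prop, obj in edges:
            if subject == node and obj not in seen:
                seen.add(obj)
                reachable.append(obj)
                queue.append(obj)
    return reachable


def contributions(start, agency, edges):
    """Nodes in the provenance trace of `start` that `agency` funded."""
    funded = {s for s, p, o in edges if p == "fundedBy" and o == agency}
    return [node for node in trace(start, edges) if node in funded]


print(contributions("figure/1.2", "agency/nasa", EDGES))
```

In a deployed system the same walk would typically be a query over the RDF graph; the tuple list here only stands in for that store.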
29. TWC
(a) Instances of calibration, model and software underpinning “paper/103”. Here only the details of one paper (i.e., “paper/103”) cited by that figure are shown.
(b) Instances of sensor, instrument and platform underpinning that paper. Here only the details of the Topex-Poseidon mission are shown.
Provenance tracing of NASA contributions to Figure 1.2 in draft NCA3
33. TWC
Current result
• GCIS ontology version 1.1
– http://tw.rpi.edu/web/project/gcis-imsap/GCISOntology
– Ontology documentation
– Conceptual map
– Ontology RDF
gcis ontology rpi
• We have had and will have more use cases, and
• New versions of the GCIS ontology
34. TWC
Current result:
GCIS ontology version 1.1
(a) Classes and
properties
representing a
brief structure of
the draft NCA3
37. TWC
A few classes are asserted as
sub-classes of “prov:Entity” and
“prov:Activity”, respectively
38. TWC
Wrap up
• The use case-driven iterative method bridges the gap between
Semantic Web researchers and Earth and environmental
scientists
– It is capable of rapid deployment for Semantic Web application
developments
• First-hand experience for re-using the W3C PROV-O
ontology in the field of Earth and environmental sciences
• GCIS will enrich the GCIS ontology in its provenance
tracing capability, eventually for covering provenance
information for the entire scope of global change
• Collaboration for a PROV-ES ontology for Earth and
environmental sciences
USGCRP began as a presidential initiative in 1989 and was mandated by Congress in the Global Change Research Act of 1990 (P.L. 101-606). Thirteen departments and agencies participate in the USGCRP. The program is steered by the Subcommittee on Global Change Research under the Committee on Environment and Natural Resources, overseen by the Executive Office of the President, and facilitated by a National Coordination Office.
GCIS will collect and link records of publications, datasets, organizations, methods, people, etc. eventually covering
Let’s look at an example to see what provenance tracing is.
The figure on the previous slide was not drawn without foundation.
It is based on datasets and analyses; in turn, the datasets were collected by instruments and the analyses used methods and models, etc.
A reader may be interested in how the image in this figure was generated, from what data, who produced the image, etc.
How do we realize provenance tracing?
Details about a few steps or components in the approach
A use case describes an objective that a primary actor wants to accomplish and the sequence of interactions between the primary actor and a system such that the primary actor's objective is successfully achieved
Roles within the team include domain experts or stakeholders with background knowledge of the overlying topic pertaining to a use case, data and information producers familiar with where to access and/or produce the resources required, knowledge and information modelers who will analyze components and processes in the use case and draw conceptual schemas for them, and software engineers who will collaborate with the modelers to leverage existing capabilities and develop prototype applications.
Ontology engineering differs from database schema modeling in that ontologies do not pertain to specific use cases
Using “gcis:Publication” instead of “gcis:Paper” in the result of the first use case reflects the former aspect. For the latter, we developed and analyzed more use cases to enrich the ontology
By such mappings we can utilize reasoners (i.e., software tools able to complete logical inferences from a set of asserted facts) that are suitable for the PROV-O ontology, and thus retrieve provenance graphs from the established GCIS
We saw this figure at the beginning of this presentation. So what can we now do with provenance tracing?
(a) Instances of calibration, model and software underpinning “paper/103” and (b) Instances of sensor, instrument and platform underpinning that paper.