Climate Science for a
Sustainable Energy Future
(CSSEF) Provenance
ERIC STEPHAN
Pacific Northwest National Laboratory
Richland, WA



December 26, 2012
Provenance Definitions

!   Provenance is a record that describes the people, institutions, entities,
    and activities involved in producing, influencing, or delivering a piece
    of data or a thing.
     https://dvcs.w3.org/hg/prov/raw-file/tip/presentations/wg-overview/overview/index.html


!   Metadata used to describe the origin of the data and any of its
    modifications.

!   A log of historical events describing the origin of data and any
    subsequent changes.
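
As a rough illustration of the definitions above, a provenance record can be written as a small set of subject, predicate, object statements. The snippet below is a minimal sketch in plain Python, not ProvEn code. Every `ex:` identifier is a hypothetical placeholder; the `prov:` terms are borrowed from the W3C PROV vocabulary.

```python
# Minimal sketch of a provenance record as (subject, predicate, object) triples.
# All "ex:" names are hypothetical placeholders invented for this example;
# the "prov:" terms come from the W3C PROV vocabulary.
record = [
    ("ex:dataset_v2", "rdf:type", "prov:Entity"),
    ("ex:regrid_run", "rdf:type", "prov:Activity"),
    ("ex:scientist_a", "rdf:type", "prov:Agent"),
    ("ex:dataset_v2", "prov:wasGeneratedBy", "ex:regrid_run"),
    ("ex:regrid_run", "prov:used", "ex:dataset_v1"),
    ("ex:regrid_run", "prov:wasAssociatedWith", "ex:scientist_a"),
    ("ex:dataset_v2", "prov:wasDerivedFrom", "ex:dataset_v1"),
]


def describe(subject, triples):
    """Return every (predicate, object) pair recorded for a subject."""
    return [(p, o) for s, p, o in triples if s == subject]


if __name__ == "__main__":
    # Print what the record says about the hypothetical derived dataset.
    for predicate, obj in describe("ex:dataset_v2", record):
        print(predicate, obj)
```

The record covers the "who" (agent), the "how" (activity), and the "from what" (derivation) named in the definitions; a real deployment would presumably use an RDF store rather than a Python list.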




Popular Provenance Vocabularies

!   Dublin Core Provenance Task Force

!   Open Provenance Model

!   Proof Markup Language Ontology

!   The Provenance Ontology (PROV-O)

See also: W3C Incubator Group,
http://www.w3.org/2005/Incubator/prov/wiki/W3C_Provenance_Incubator_Group_Wiki
The Systems Science Challenge
!   Studying complex systems typically has the following characteristics:
     !    Interdisciplinary studies involve multiple stakeholders
     !    Leverage multiple tools, algorithms, data products, and sensors
     !    Reliant on highly iterative and repetitive techniques
     !    Steps are difficult to document and are often committed only to
          memory or notes
!   Sharing complex systems data between collaborators has the following
    inherent problems:
     !    To establish confidence in the data, scientists accessing it
          (consumers) need to know its origin and modification history
          (data provenance).
     !    Scientists producing the data need a consistent means to convey
          data provenance to targeted scientific communities
            !   The data provenance needs to be diverse enough to support
                any data.
            !   It must also be based on community standards to support
                cross-referenced searches
Example: Motivating User Questions About
the CSSEFARMBE Diagnostics Dataset



[Figure: two users, a CAM Modeler and an Atmosphere Scientist, pose questions about the dataset.]

!   How do CAM output variables map to the CSSEFARMBE variables?

!   What additional ancillary information is available about this dataset?

!   How did both CSSEFARMBE and ARMBE originate?
The Knowledge Gap: CSSEF Users Needing
 Additional Answers from Data Producers

[Figure: the CSSEFARMBE Developers are linked by "read", "wrote", and "compared" edges to the native sources: Test NCL Code, CF Terms, ARMBE Header, CSSEF ARMBE Header, CAM Web Page, and Tech Report. The CAM Modeler and Atmosphere Scientist stand outside this graph with their three questions still unanswered.]
Goals of CSSEF Provenance
    Environment (ProvEn) Services

!   Identify future user communities that will need provenance while the data
    is being generated by the scientists producing it

!   Knowledge products (e.g., reports, archivable provenance records)

!   Create consumer-oriented provenance products by:
    !   Capturing historical information from any native source necessary to describe
        the origin of the dataset.

    !   Retaining, for user reference, a copy of the native source familiar to
        the domain community.




Goals of CSSEF Provenance
 Environment (ProvEn) Services
!   Store this information in a cross-referenced knowledge model by mapping
    the domain ontology to a foundational ontology
     !   Domain ontologies are diverse and subject to constant change, driven by the
         concepts extracted from native sources.
     !   Foundational ontologies are stable and seldom change.

!   Use the composite knowledge model to provide finished products to different
    kinds of consumers
     !   Stability implies that many methodologies, tools, and services are available to
         leverage.

 Foundational Ontology                     Cross-Reference Capability
 W3C Provenance Ontology (PROV-O)          Core ontology describing data origin
 Dublin Core Terms                         Data citations and software
 Friend of a Friend (FOAF)                 Description of scientist and collaborators
 (Future) Proof Markup Language 3.0        Description of justification and trust
 (Future) Dublin Core to PROV-O Mapping    Support integration of DC provenance and PROV-O
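
One way to picture the mapping of a domain ontology onto foundational ontologies is as a set of `rdfs:subClassOf` statements. The following is a minimal sketch under stated assumptions, not the ProvEn implementation: the `atm` namespace and its class names are invented for illustration, while the PROV, FOAF, Dublin Core, and RDFS namespace IRIs are the published ones.

```python
# Sketch: align hypothetical domain-ontology classes with foundational ones.
# The "atm" namespace and its class names are invented for illustration.
PREFIXES = {
    "prov": "http://www.w3.org/ns/prov#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "dcterms": "http://purl.org/dc/terms/",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "atm": "http://example.org/atmosphere#",  # hypothetical domain namespace
}

# Domain class -> foundational class it specializes (assumed alignments).
ALIGNMENT = {
    "atm:DiagnosticsDataset": "prov:Entity",
    "atm:VariableMapping": "prov:Activity",
    "atm:DatasetDeveloper": "foaf:Person",
    "atm:TechReport": "dcterms:BibliographicResource",
}


def expand(curie):
    """Expand a prefixed name such as 'prov:Entity' into a full IRI."""
    prefix, local = curie.split(":", 1)
    return PREFIXES[prefix] + local


def to_ntriples(alignment):
    """Serialize each alignment as an rdfs:subClassOf statement in N-Triples."""
    sub_class_of = expand("rdfs:subClassOf")
    return [
        f"<{expand(domain)}> <{sub_class_of}> <{expand(foundation)}> ."
        for domain, foundation in sorted(alignment.items())
    ]


if __name__ == "__main__":
    print("\n".join(to_ntriples(ALIGNMENT)))
```

Because the domain classes hang off stable foundational classes, the domain side can change without breaking queries written against the foundational terms.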
Identifying a New Product with Native Sources,
Domain Concepts and Terms for dataset

[Figure: native sources grouped by the domain concepts they supply.]

!   Observational Data Origin Concepts: CSSEF ARMBE Header, ARMBE Header, Tech Report

!   Identified Variable Mapping Concepts and Terms: Test NCL Code, CF Terms, CAM Web Page
Creating and Maintaining Domain
Ontologies (Knowledge Engineer)
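[Figure: knowledge-engineering workflow feeding ProvEn Services.]

!   Build Ontology: Atmosphere Diagnostics Dataset origin/mapping terms and
    concepts are added to the Atmosphere Domain Ontology

!   Align Ontologies: the Atmosphere Domain Ontology and the Foundational
    Ontologies are combined into the Aligned Knowledge Model for Atmosphere

!   Register: the Aligned Knowledge Model for Atmosphere is registered with
    ProvEn Services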
Creating new Product By Populating ProvEn
Services with CSSEFARMBE Dataset Native Sources

[Figure: populating ProvEn Services with native sources.]

!   Native sources contributed by CSSEFARMBE Developers: CSSEF ARMBE Header,
    ARMBE Header, Tech Report, Test NCL Code, CAM Web Page, CF Terms

!   These sources hold CSSEFARMBE knowledge relevant to the CAM Modeler and
    Atmosphere Scientist

!   ProvEn Services performs native source concept extraction and stores:
     !   Native provenance mapped to the Atmosphere Domain Ontology, within the
         Aligned Knowledge Model for Atmosphere and the Foundational Ontologies
     !   Native source references, with a copy of the corresponding native sources
Producing ProvEn Services Product:
CSSEFARMBE Dataset Origin Report


[Figure: answering user questions from the ProvEn Services Store.]

!   ProvEn Services Store holds:
     !   Native provenance mapped to the Atmosphere Domain Ontology
     !   Aligned Knowledge Model for Atmosphere
     !   Foundational Ontologies

!   Standard vocabulary cross-reference searching and reasoning over the store
    answers:
     !   CAM Modeler: What additional ancillary information is available about
         this dataset?
     !   Atmosphere Scientist: How did both CSSEFARMBE and ARMBE originate?
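
An origin question of this kind can be pictured as a walk over derivation links in the stored provenance. The snippet below is a simplified sketch, assuming provenance is held as triples and that derivations are recorded with `prov:wasDerivedFrom`; the store contents and every `ex:` name are hypothetical, and the slides do not describe the actual query mechanism at this level of detail.

```python
# Sketch: answer "how did this dataset originate?" by following
# prov:wasDerivedFrom links. The store contents below are hypothetical.
from collections import deque

TRIPLES = [
    ("ex:product", "prov:wasDerivedFrom", "ex:intermediate"),
    ("ex:intermediate", "prov:wasDerivedFrom", "ex:observations"),
    ("ex:product", "prov:wasDerivedFrom", "ex:model_output"),
    ("ex:product", "prov:wasAttributedTo", "ex:developers"),
]


def origins(start, triples, predicate="prov:wasDerivedFrom"):
    """Breadth-first walk returning all direct and indirect sources of start."""
    seen = {start}
    queue = deque([start])
    found = []
    while queue:
        node = queue.popleft()
        for s, p, o in triples:
            # Follow only the chosen predicate and skip already-visited nodes.
            if s == node and p == predicate and o not in seen:
                seen.add(o)
                found.append(o)
                queue.append(o)
    return found


if __name__ == "__main__":
    print(origins("ex:product", TRIPLES))
```

Tracking visited nodes keeps the walk finite even if the recorded provenance contains a cycle by mistake.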
ProvEn Services Architecture
[Figure: ProvEn Services architecture.]

!   Client operations: Store Native Provenance; Query and Cross-Reference Provenance

!   ProvEn (Jersey) REST Services on a Glassfish Server:
     !   Ali Baba Object to RDF API
     !   Searching and Inferencing API
     !   Sesame Store

!   Portable Jarfile, deployed to: ESGF Node, Local Compute Cluster, UVCDAT
Questions?


!   Contact: eric.stephan@pnnl.gov




 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...Apidays New York 2024 - The value of a flexible API Management solution for O...
Apidays New York 2024 - The value of a flexible API Management solution for O...
 
Boost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivityBoost PC performance: How more available memory can improve productivity
Boost PC performance: How more available memory can improve productivity
 

Climate Science for a Sustainable Energy Future Provenance

  • 1. Climate Science for a Sustainable Energy Future (CSSEF) Provenance. Eric Stephan, Pacific Northwest National Laboratory, Richland, WA. December 26, 2012.
  • 2. Provenance Definitions: (1) Provenance is a record that describes the people, institutions, entities, and activities involved in producing, influencing, or delivering a piece of data or a thing (https://dvcs.w3.org/hg/prov/raw-file/tip/presentations/wg-overview/overview/index.html). (2) Metadata used to describe the origin of the data and any of its modifications. (3) A log of historical events describing the origin of data and any subsequent changes.
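The first definition on slide 2 (people, entities, and activities involved in producing data) can be sketched as a handful of subject/predicate/object statements. This is a minimal illustration, not the ProvEn implementation: the predicate names are real W3C PROV-O properties, while every `ex:` resource name and the `describe_dataset` helper are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """One provenance statement: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str


def describe_dataset(dataset, activity, agent, source):
    """Return the statements recording who and what produced `dataset`.

    Predicates are PROV-O property names; the arguments are
    hypothetical identifiers supplied by the caller.
    """
    return [
        Triple(dataset, "prov:wasGeneratedBy", activity),
        Triple(activity, "prov:wasAssociatedWith", agent),
        Triple(activity, "prov:used", source),
        Triple(dataset, "prov:wasDerivedFrom", source),
        Triple(dataset, "prov:wasAttributedTo", agent),
    ]


# Hypothetical example resources, not taken from the slides.
record = describe_dataset(
    dataset="ex:derived-dataset",
    activity="ex:processing-run",
    agent="ex:data-producer",
    source="ex:source-dataset",
)

for t in record:
    print(t.subject, t.predicate, t.obj)
```

Keeping each statement as an immutable triple makes the record easy to append to as a log of events, which matches the third definition on the slide.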
  • 3. Popular Provenance Vocabularies: Dublin Core Provenance Task Force; Open Provenance Model; Proof Markup Language Ontology; the Provenance Ontology (PROV-O). See also: W3C Incubator Group, http://www.w3.org/2005/Incubator/prov/wiki/W3C_Provenance_Incubator_Group_Wiki
  • 4. The Systems Science Challenge. Studying complex systems typically has the following characteristics: interdisciplinary studies involve multiple stakeholders; they leverage multiple tools, algorithms, data products, and sensors; they rely on highly iterative and repetitive techniques; and steps are difficult to document and are oftentimes committed to memory or notes. Sharing complex systems data between collaborators has the following inherent problems: to establish data confidence, scientists accessing data (consumers) need to know the data origin and modification history (data provenance); scientists producing the data need a consistent means to convey data provenance to targeted scientific communities. The data provenance needs to be diverse enough to support any data, and it must also be based on community standards to cross-reference searches.
  • 5. Example: Motivating User Questions About the CSSEFARMBE Diagnostics Dataset. Questions posed by the Atmosphere Scientist and the CAM Modeler: How did both CSSEFARMBE and ARMBE originate? How do CAM output variables map to the CSSEFARMBE variables? What additional ancillary information is available about this dataset?
  • 6. The Knowledge Gap: CSSEF Users Needing Additional Answers from Data Producers. [Diagram: the CSSEFARMBE Developers are linked by "read", "wrote", and "compared" relations to the native sources: Test NCL Code, CF Terms, ARMBE Header, CAM Web Page, CSSEF ARMBE Header, and Tech Report.] The CAM Modeler and the Atmosphere Scientist still ask: How do CAM output variables map to the CSSEFARMBE variables? How did both CSSEFARMBE and ARMBE originate? What additional ancillary information is available about this dataset?
  • 7. Goals of CSSEF Provenance Environment (ProvEn) Services. Identify future user communities that will need provenance while the data is being generated by the scientists producing it: knowledge products (e.g., reports, archivable provenance records). Create consumer-oriented provenance products by: capturing historical information from any native source necessary to describe the origin of the dataset; and, for user reference, retaining a copy of the native source familiar to the domain community.
  • 8. Goals of CSSEF Provenance Environment (ProvEn) Services. Store this information in a cross-referenced knowledge model by mapping the domain ontology to foundational ontologies: domain ontologies are diverse and subject to constant change, defined by the concepts extracted from native sources; foundational ontologies are stable and seldom change. Use the composite knowledge model to provide finished products to different kinds of consumers: stability implies that many methodologies, tools, and services are available to leverage.
      Foundational Ontology | Cross-Reference Capability
      W3C Provenance Ontology (PROV-O) | Core ontology describing data origin
      Dublin Core Terms | Data citations and software
      Friend of a Friend (FOAF) | Description of scientist and collaborators
      (Future) Proof Markup Language 3.0 | Description of justification and trust
      (Future) Dublin Core to PROV-O Mapping | Support integration of DC provenance and PROV-O
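The mapping from a changeable domain ontology to stable foundational ontologies described on slide 8 can be sketched as a lookup table plus a query that speaks only the foundational vocabulary. This is a rough illustration under stated assumptions: the `atmos:` terms and all `ex:` resources are hypothetical, while `prov:wasDerivedFrom`, `prov:wasAttributedTo`, and `dcterms:references` are real PROV-O and Dublin Core terms.

```python
# Hypothetical domain-ontology terms aligned to foundational terms.
ALIGNMENT = {
    "atmos:observedFrom": "prov:wasDerivedFrom",
    "atmos:mappedFromVariable": "prov:wasDerivedFrom",
    "atmos:documentedIn": "dcterms:references",
    "atmos:developedBy": "prov:wasAttributedTo",
}

# Facts as they might be extracted from native sources,
# expressed only in (hypothetical) domain terms.
FACTS = [
    ("ex:dataset-A", "atmos:observedFrom", "ex:dataset-B"),
    ("ex:dataset-A", "atmos:documentedIn", "ex:tech-report"),
    ("ex:dataset-A", "atmos:developedBy", "ex:developers"),
    ("ex:variable-x", "atmos:mappedFromVariable", "ex:model-variable-y"),
]


def query(foundational_term, facts=FACTS, alignment=ALIGNMENT):
    """Return facts whose domain predicate aligns to `foundational_term`."""
    return [f for f in facts if alignment.get(f[1]) == foundational_term]


# A consumer who knows only PROV-O can still find both derivation facts,
# even though they were recorded with two different domain terms.
for subject, predicate, obj in query("prov:wasDerivedFrom"):
    print(subject, predicate, obj)
```

If the domain ontology changes, only the `ALIGNMENT` table needs updating; queries written against the foundational terms keep working, which is the benefit of stability the slide points to.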
  • 9. Identifying a New Product with Native Sources, Domain Concepts and Terms for Dataset. [Diagram: the native sources (CSSEF ARMBE Header, ARMBE Header, Tech Report, Test NCL Code, CF Terms, CAM Web Page) are annotated with the concepts they yield: Observational Data Origin Concepts; Identified Variable Mapping Concepts and Terms.]
  • 10. Creating and Maintaining Domain Ontologies (Knowledge Engineer). [Diagram: add Atmosphere Diagnostics Dataset origin/mapping terms and concepts to the Atmosphere Domain Ontology (Build Ontology); register Foundational Ontologies; align the ontologies (Align Ontologies) to produce the Aligned Knowledge Model for Atmosphere within ProvEn Services.]
  • 11. Creating New Product by Populating ProvEn Services with CSSEFARMBE Dataset Native Sources. [Diagram: native sources contributed by the CSSEFARMBE Developers (CSSEF ARMBE Header, ARMBE Header, Tech Report, Test NCL Code, CAM Web Page, CF Terms) carry CSSEFARMBE knowledge relevant to the CAM Modeler and Atmosphere Scientist. Native Source Concept Extraction feeds ProvEn Services, which holds: Native Provenance Mapped to Atmosphere Domain Ontology; Copy of Corresponding Native Sources; Native Source References; Aligned Knowledge Model for Atmosphere; Foundational Ontologies.]
  • 12. Producing ProvEn Services Product: CSSEFARMBE Dataset Origin Report. [Diagram: the ProvEn Services Store holds Native Provenance Mapped to Atmosphere Domain Ontology, the Aligned Knowledge Model for Atmosphere, and Foundational Ontologies; Standard Vocabulary Cross-Reference Searching and Reasoning answers the user questions.] CAM Modeler: What additional ancillary information is available about this dataset? Atmosphere Scientist: How did both CSSEFARMBE and ARMBE originate?
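An origin report of the kind named on slide 12 can be sketched as a walk over derivation links, answering "how did this dataset originate?" by listing every upstream source. This is only an illustration: the derivation edges below, including the assumption that CSSEFARMBE was derived from ARMBE, and the `origin_report` helper are hypothetical and are not taken from ProvEn.

```python
from collections import deque

# Hypothetical derivation edges: dataset -> sources it was derived from.
DERIVED_FROM = {
    "ex:CSSEFARMBE": ["ex:ARMBE"],
    "ex:ARMBE": ["ex:site-observations"],
    "ex:site-observations": [],
}


def origin_report(dataset, derived_from=DERIVED_FROM):
    """Breadth-first walk listing each upstream source and its distance."""
    seen = {dataset}
    queue = deque([(dataset, 0)])
    report = []
    while queue:
        node, depth = queue.popleft()
        for source in derived_from.get(node, []):
            if source not in seen:  # guard against cycles and repeats
                seen.add(source)
                report.append((source, depth + 1))
                queue.append((source, depth + 1))
    return report


for source, steps in origin_report("ex:CSSEFARMBE"):
    print(f"{source} ({steps} step(s) upstream)")
```

A breadth-first walk reports the nearest sources first, which suits a reader who wants the immediate parent dataset before the deeper history.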
  • 13. ProvEn Services Architecture. [Diagram labels: Store Native Provenance; Query and Cross-Reference Provenance; ESGF Node; ProvEn (Jersey) REST Services; AliBaba Object to RDF API; Searching and Inferencing API; Local Compute Cluster; Glassfish Server; Portable Jarfile Deploy; Sesame Store; UVCDAT.]
  • 14. Questions? Contact: eric.stephan@pnnl.gov