SlideShare a Scribd company logo
1 of 27
Download to read offline
Janus
             Provenance

                                                    Janus:
                     from Workflows to Semantic Provenance
                                     and Linked Open Data

      Paolo Missier                  Jun Zhao                  Satya S. Sahoo
      Carole Goble                                               Amit Sheth
University of Manchester, UK   University of Oxford, UK   Wright State University, USA




                                    IPAW’ 10
                                     Troy, NY
                                 June 15-16, 2010
Key ideas
• Janus:
       – a semantic provenance model with domain-specific
         extensions
       – designed around the Taverna workflow model

• From domain-agnostic provenance graphs
• To domain-aware graphs through explicit
  annotations

• From local provenance graphs and queries
  scoped to the graph
• To
       – Graphs published as Linked Data
       – Queries extended into the Web of Data


                                                                2
Janus -- IPAW, Troy, NY, June 15-17, 2010
Example workflow (Taverna)


 chr: 17                                       QTL →
 start: 28500000
 end: 3000000                               Ensembl Genes



                               Ensembl Gene →           Ensembl Gene →
                                 Uniprot Gene             Entrez Gene



                                Uniprot Gene →           Entrez Gene →
                                  Kegg Gene               Kegg Gene


                                             merge gene IDs



                                            Gene → Pathway           path:mmu04210 Apoptosis,
                                                                     path:mmu04010 MAPK, ...



Janus -- IPAW, Troy, NY, June 15-17, 2010
Baseline provenance of a workflow run
                              QTL →                                mmu:12575
                           Ensembl Genes
                                                                               v1     ...   vn               w

        Ensembl Gene →                       Ensembl Gene →
          Uniprot Gene                         Entrez Gene
                                                                           path:mmu04012
                                                               exec
         Uniprot Gene →                      Entrez Gene →                     a1     ...   an         b1    ...   bm
           Kegg Gene                          Kegg Gene
                                                                                                       mmu:26416

                            merge gene IDs




                                                                      path:mmu04010   y11                    ymn
                          Gene → Pathway
                                                                                                 ...

                                                              path:mmu04010→derives_from→mmu:26416
                                                              path:mmu04012→derives_from→mmu:12575

              • The graph encodes all direct data dependency relations
              • Baseline query model: compute paths amongst sets of nodes
                • Transitive closure over data dependency relations
                                                                                                                   4
Janus -- IPAW, Troy, NY, June 15-17, 2010
Reference user questions

                           QTL →
                        Ensembl Genes                       Q0: Find all intermediate and initial input
                                                            values that contribute to the computation of a
                                                            certain output value.
     Ensembl Gene →                       Ensembl Gene →
       Uniprot Gene                         Entrez Gene

                                                            Q1. Find all those genes within the input QTL
      Uniprot Gene →                        Entrez Gene →
                                                            region that are involved in a given KEGG
        Kegg Gene                            Kegg Gene      pathway.
                         merge gene IDs
                                                            Q2: Find all Uniprot-sourced genes

                       Gene → Pathway




 Q3: Find all Entrez genes that encode proteins involved in ATP binding (go:
 0005524).

 Q4: List relevant PubMed publications for the pathways listed in the result set.



                                                                                                     5
Janus -- IPAW, Troy, NY, June 15-17, 2010
Query types and model capabilities

                       Query formulation effort       Annotation requirements        Query Scope

   Q0              - Requires knowledge of
                     process structure and
                     data values
                                                      No annotations required      Single run graph or
                                                                                    Multi-run graphs
                   - Graphical query
                   constructor may be
                   available
   Q1 Q2
                                                   Requires domain annotations
                   Use of domain terms                                             Single run graph or
                                                   on workflow tasks and on data
                   facilitates query formulation                                    Multi-run graphs
                                                   values

   Q3 Q4
                   - Use of domain terms           - Requires domain annotations
                     facilitates query               on workflow tasks and on
                     formulation.                    data values
                                                                                    The Web of Data
                   - Can be integrated with        - Relies on completeness of
                   browsers for LoD sources        Linked Data Sources


                                                                                               6
Janus -- IPAW, Troy, NY, June 15-17, 2010
Query types and model capabilities

                       Query formulation effort       Annotation requirements        Query Scope

   Q0              - Requires knowledge of
                     process structure and
                     data values
                                                      No annotations required      Single run graph or
                                                                                    Multi-run graphs
                   - Graphical query
                   constructor may be
                   available
   Q1 Q2
                                                   Requires domain annotations
                   Use of domain terms                                             Single run graph or
                                                   on workflow tasks and on data
                   facilitates query formulation                                    Multi-run graphs
                                                   values

   Q3 Q4
                   - Use of domain terms           - Requires domain annotations
                     facilitates query               on workflow tasks and on
                     formulation.                    data values
                                                                                    The Web of Data
                   - Can be integrated with        - Relies on completeness of
                   browsers for LoD sources        Linked Data Sources


                                                                                               6
Janus -- IPAW, Troy, NY, June 15-17, 2010
Janus: baseline model of provenance
• The semantic provenance model is an OWL ontology
       – defined for domain-agnostic provenance graphs
       – naturally extensible to domain concepts


• extends the Provenir upper ontology [*]
       – Itself an extension of the Basic Formal Ontology (BFO)
           • abstract concepts include data, process, and agent
       – Provenir adds 11 types of relationships:
           • partonomy relations
           • temporal information
           • precedence
           • causal relationships
           • ...



      [*] S. Sahoo and A. Sheth. Provenir ontology: Towards a Framework for eScience
      Provenance Management, Knoesis Center Tech Report, 2009.
                                                                                       7
Janus -- IPAW, Troy, NY, June 15-17, 2010
Janus structure




                                                      8
Janus -- IPAW, Troy, NY, June 15-17, 2010
Janus structure
                                                                                      port
                                                               v1     v2     v3      value

                                X1      X2    X3               X1    X2      X3
                                                                                      port
                                        P             exec          P_inst
                                Y1           Y2                Y1          Y2
                                                                                   processor
                                                   processor                         exec
                                                     spec      w1          w2




                                                                                               8
Janus -- IPAW, Troy, NY, June 15-17, 2010
Example Janus domain-agnostic fragment

<rdf:Description rdf:about="http://purl.org/net/taverna/janus/remove_Nulls">
<janus:has_execution rdf:resource="http://purl.org/net/taverna/janus/remove_Nulls"/>
<knoesis:has_parameter rdf:resource="http://purl.org/net/taverna/janus/remove_Nulls/output"/>
<knoesis:has_parameter rdf:resource="http://purl.org/net/taverna/janus/remove_Nulls/input"/>
<obo:part_of rdf:resource="http://purl.org/net/taverna/janus/e589d90b-01f2-4de6-..."/>
<rdf:type rdf:resource="http://purl.org/net/taverna/janus#processor_spec"/>


<rdf:Description rdf:about="http://purl.org/net/taverna/janus/remove_Nulls/input">
<janus:has_value_binding rdf:resource="http://purl.org/net/taverna/janus/test1625"/>
<janus:links_from rdf:resource="http://purl.org/net/taverna/janus/merge_entrez_genes/
concatenated"/>
<janus:is_processor_input rdf:datatype="http://www.w3.org/2001/
XMLSchema#boolean">true</janus:is_processor_input>
<rdf:type rdf:resource="http://purl.org/net/taverna/janus#port"/>


<rdf:Description rdf:about="http://purl.org/net/taverna/janus/test1625">
<janus:has_iteration rdf:datatype="http://www.w3.org/2001/XMLSchema#string">[]</
janus:has_iteration>
<rdf:type rdf:resource="http://purl.org/net/taverna/janus#port_value"/>
</rdf:Description>


                                                                                    9
Janus -- IPAW, Troy, NY, June 15-17, 2010
Annotated provenance

          Annotated workflow                                                 Annotated provenance graph


                             QTL →                                             Gene
                          Ensembl Genes




       Ensembl Gene →                       Ensembl Gene →
         Uniprot Gene                         Entrez Gene



                                                               exec                           ...
       Uniprot Gene →                       Entrez Gene →
         Kegg Gene                           Kegg Gene



                           merge gene IDs



                                                                      Kegg
                         Gene → Pathway




                                                             Q1 Q2


                                                                                                    10
Janus -- IPAW, Troy, NY, June 15-17, 2010
Desiderata: from structure to values annotations




                                                                          11
Janus -- IPAW, Troy, NY, June 15-17, 2010
Desiderata: from structure to values annotations




                                                                          11
Janus -- IPAW, Troy, NY, June 15-17, 2010
Desiderata: from structure to values annotations

                                                                                protein
                                                                               sequence

                                                            X1   X2    X3
                                                                                  interpro
                                                                 P                 match
                                                                                   report
                                                            Y1        Y2

                                                                            processor
                                                          interpro            spec
                                                            scan




                                                                                   11
Janus -- IPAW, Troy, NY, June 15-17, 2010
Desiderata: from structure to values annotations

                                                                                                       protein
                                                                                                      sequence

                                                                                 X1     X2    X3
                                                                                                         interpro
                                                                                        P                 match
                                                                                                          report
                                                                                 Y1          Y2

                                                                                                   processor
                                                                               interpro              spec
                                                                                 scan


                                             protein
                                            sequence
                                                                      port
                                                v1     v2     v3     value

                                                X1
                                                                                                   exec
                                                       X2     X3
                                                                      port
                                                     P_inst

       interpro                                 Y1          Y2
                                                                   processor          interpro
         scan                                                        exec
                                                w1          w2                         match
                                                                                       report             11
Janus -- IPAW, Troy, NY, June 15-17, 2010
Annotations propagation rules
                        X rdf:type Port    C = {c}    X has value type c
                             X has value v    v rdf:type PortValue
                                         v rdf:type C               denotes
                                                                               data type
                                                                               in the PL
                                             protein                             sense
         has_value_type                     sequence

        X1      X2       X3
                                              interpro
                P                              match
                                               report
        Y1           Y2

                                   processor
   interpro                          spec
     scan




                                                                                 12
Janus -- IPAW, Troy, NY, June 15-17, 2010
Annotations propagation rules
                        X rdf:type Port    C = {c}    X has value type c
                             X has value v    v rdf:type PortValue
                                         v rdf:type C               denotes
                                                                                             data type
                                                                                             in the PL
                                             protein                                           sense
         has_value_type                     sequence

        X1      X2       X3
                                               interpro
                P                               match
                                                report
        Y1           Y2

                                   processor
                                                                  ?
   interpro                          spec                                           port
     scan                                                    v1       v2    v3     value

                                                             X1       X2    X3
                                                                                    port
                                                                  P_inst

                                            interpro         Y1            Y2
                                                                                 processor    interpro
                                              scan                                 exec
                                                             w1            w2                  match
                                                                                               report
                                                                                               12
Janus -- IPAW, Troy, NY, June 15-17, 2010
Annotations propagation rules
                        X rdf:type Port    C = {c}    X has value type c
                             X has value v    v rdf:type PortValue
                                         v rdf:type C               denotes
                                                                                            data type
                                                                                            in the PL
                                             protein                                          sense
         has_value_type                     sequence

        X1      X2       X3
                                               interpro
                P                               match
                                                report
        Y1           Y2
                                                             protein
                                   processor                sequence
   interpro                          spec                                          port
     scan                                                    v1     v2     v3     value

                                                             X1    X2      X3
                                                                                   port
                                                                  P_inst

                                            interpro         Y1          Y2
                                                                                processor    interpro
                                              scan                                exec
                                                             w1          w2                   match
                                                                                              report
                                                                                              12
Janus -- IPAW, Troy, NY, June 15-17, 2010
Annotations as semantic overlay

                                                v1                                          w1




                                                                X3



                                                                            Y2
  Provenance graph                            has_port_value                      has_port_value




                                                                X2

                                                                      P
  fragment




                                                                            Y1
                                                                X1
                                                vn                                         wm




                                       has-input-type                Pathway                         has-output-type
                                                                      search
                                                                      service

                                                          instance-of
                            instance-of                                                               instance-of

       Gene                                     v1                                           w1                     Pathway
                                                                                                     has-source
                                                                 X3


                                has-source
                                               has_port_value                Y2    has_port_value
                                                                 X2

                                                                        P


                                instance-of
                                                                                                   instance-of
                                                                             Y1


       Kegg                                                                                  wm
                                                                 X1




                                                vn                                                                     Kegg
                          has-source                                                                   has-source


                                                                                                                       13
Janus -- IPAW, Troy, NY, June 15-17, 2010
Example Janus domain-aware fragment


       <rdf:Description rdf:about="http://purl.org/net/taverna/janus/test1625">
       <janus:has_iteration>[]</janus:has_iteration>
       <rdf:type rdf:resource="http://purl.org/net/taverna/janus#port_value"/>
       <rdf:type rdf:resource="http://purl.org/obo/owl/sequence#gene"/>
       <janus:has_source rdf:resource="http://purl.org/net/taverna/janus#KEGG"/>
       </rdf:Description>
                                              this is rule-
                                               defined, too




                                                                             14
Janus -- IPAW, Troy, NY, June 15-17, 2010
Extensions to Linked Data

          Annotated workflow                                            Annotated provenance graph


                             QTL →
                          Ensembl Genes




       Ensembl Gene →                       Ensembl Gene →
         Uniprot Gene                         Entrez Gene
                                                             exec                        ...
       Uniprot Gene →                       Entrez Gene →
         Kegg Gene                           Kegg Gene



                           merge gene IDs




                         Gene → Pathway




                                                                                               15
Janus -- IPAW, Troy, NY, June 15-17, 2010
Extensions to Linked Data

          Annotated workflow                                            Annotated provenance graph


                             QTL →
                          Ensembl Genes




       Ensembl Gene →                       Ensembl Gene →
         Uniprot Gene                         Entrez Gene
                                                             exec                        ...
       Uniprot Gene →                       Entrez Gene →
         Kegg Gene                           Kegg Gene



                           merge gene IDs




                         Gene → Pathway

                                                                                 - Publish
                                                                                 - I - Map IDs
                                                                                 - II - query



                                                                                               15
Janus -- IPAW, Troy, NY, June 15-17, 2010
I - Mapping data values to LoD URIs
   In our prototype we map data values to Bio2RDF as follows:
                                                                           Entrez Genes


                                                                           Uniprot Genes


                                                                           KEGG Genes


                                                                           KEGG Pathways


  <rdf:Description rdf:about="http://purl.org/net/taverna/janus/create_report/entrezGeneId">
    <janus:has_value_binding rdf:resource="http://purl.org/net/taverna/janus/test18"/>

   <rdf:Description rdf:about="http://purl.org/net/taverna/janus/test18">
     <rdf:type rdf:resource="http://purl.org/net/taverna/janus#port_value"/>
     <rdfs:comment>11835</rdfs:comment>
     <rdf:type rdf:resource="http://purl.org/obo/owl/sequence#gene"/>
     <janus:has_source rdf:resource="http://purl.org/net/taverna/janus#entrez_gene"/>

  <rdf:Description rdf:about="http://purl.org/net/taverna/janus/test18">
    <rdfs:seeAlso rdf:resource="http://bio2rdf.org/geneid:11835"/>
                                                                                        16
Janus -- IPAW, Troy, NY, June 15-17, 2010
II - extended queries in the LoD setting
    Strategy:
    - use the SQUIN LoD query engine to query multiple “Web of Data” sources
        - only Bio2RDF in our case
    - combine graph patterns on local provenance with conditions on remote LoD
      graphs


      Q5: Find all Entrez genes that encode proteins involved in ATP binding
                                  (GO:0005524).

   PREFIX uniprot: <http://purl.uniprot.org/core/>
   PREFIX : <http://www.taverna.org.uk/janus#>
   SELECT distinct ?entrezgene
   WHERE {
   ?protein uniprot:classifiedWith <http://bio2rdf.org/go:0005524> .              Bio2RDF
   ?entrezgene <http://bio2rdf.org/bio2rdf_resource:xPath> ?protein .
   ?gene rdfs:seeAlso ?entrezgene
   ?gene rdf:type :port_gene
   ?gene :has_source :entrez_gene . }
                                                             local provenance graph

                                                                                      17
Janus -- IPAW, Troy, NY, June 15-17, 2010
Current status
        Current Taverna provenance architecture:
          Lab prototype                                                      Production
     –    “Export as...” Janus RDF                                         – “native” (relational) graphs
     –    currently only queried using                                     – simple, efficient query language on
          SPARQL                                                             native provenance
e
     –    manually published
     –    manually annotated
                                                                                                                        Taverna
                                                        Taverna             <<Events stream>>                        Provenance
                                                        runtime                                 events capture



                                                query   <scope ... />            Lineage
                                                        <select .../>                                       relational
                                                                                  query
                                                        <focus .../>
                                                                                processor                      DB
                                         query response
                                            OPM graph

                                         complete Janus                            RDF
                                                  graph                          exporter

                                                                        Provenance
                                                                        API (Java)
                                                                                                                         18
    Janus -- IPAW, Troy, NY, June 15-17, 2010
Summary, and moving forward
• Janus: a semantic model for workflow provenance
       – OWL ontology, extension of Provenir
       – should include attribution + system level provenance
       – alignment with OPM?

• Domain-aware graphs through annotations:
       – automatically propagated from workflow annotations when
         possible
       – but in practice no real workflows are annotated

• LoD integration:
       – powerful provenance publishing and query broadening
       – mapping rules currently limited
       – no completeness guarantee -- all joins are outer joins!



                                                                   19
Janus -- IPAW, Troy, NY, June 15-17, 2010

More Related Content

Similar to Paper talk @ Ipaw 2010: Janus: from Workflows to Semantic Provenance and Linked Open Data

Experimentos de nubes científicas: Medical Genome Project
Experimentos de nubes científicas: Medical Genome ProjectExperimentos de nubes científicas: Medical Genome Project
Experimentos de nubes científicas: Medical Genome ProjectFundación Ramón Areces
 
eRNA_QTL_website.pptx
eRNA_QTL_website.pptxeRNA_QTL_website.pptx
eRNA_QTL_website.pptxxuelianma
 
Integrative analysis of transcriptomics and proteomics data with ArrayMining ...
Integrative analysis of transcriptomics and proteomics data with ArrayMining ...Integrative analysis of transcriptomics and proteomics data with ArrayMining ...
Integrative analysis of transcriptomics and proteomics data with ArrayMining ...Enrico Glaab
 

Similar to Paper talk @ Ipaw 2010: Janus: from Workflows to Semantic Provenance and Linked Open Data (6)

Experimentos de nubes científicas: Medical Genome Project
Experimentos de nubes científicas: Medical Genome ProjectExperimentos de nubes científicas: Medical Genome Project
Experimentos de nubes científicas: Medical Genome Project
 
Biological Network Inference via Gaussian Graphical Models
Biological Network Inference via Gaussian Graphical ModelsBiological Network Inference via Gaussian Graphical Models
Biological Network Inference via Gaussian Graphical Models
 
Introduction to Apollo for i5k
Introduction to Apollo for i5kIntroduction to Apollo for i5k
Introduction to Apollo for i5k
 
eRNA_QTL_website.pptx
eRNA_QTL_website.pptxeRNA_QTL_website.pptx
eRNA_QTL_website.pptx
 
Bio_Computing
Bio_ComputingBio_Computing
Bio_Computing
 
Integrative analysis of transcriptomics and proteomics data with ArrayMining ...
Integrative analysis of transcriptomics and proteomics data with ArrayMining ...Integrative analysis of transcriptomics and proteomics data with ArrayMining ...
Integrative analysis of transcriptomics and proteomics data with ArrayMining ...
 

More from Paolo Missier

Towards explanations for Data-Centric AI using provenance records
Towards explanations for Data-Centric AI using provenance recordsTowards explanations for Data-Centric AI using provenance records
Towards explanations for Data-Centric AI using provenance recordsPaolo Missier
 
Interpretable and robust hospital readmission predictions from Electronic Hea...
Interpretable and robust hospital readmission predictions from Electronic Hea...Interpretable and robust hospital readmission predictions from Electronic Hea...
Interpretable and robust hospital readmission predictions from Electronic Hea...Paolo Missier
 
Data-centric AI and the convergence of data and model engineering: opportunit...
Data-centric AI and the convergence of data and model engineering:opportunit...Data-centric AI and the convergence of data and model engineering:opportunit...
Data-centric AI and the convergence of data and model engineering: opportunit...Paolo Missier
 
Realising the potential of Health Data Science: opportunities and challenges ...
Realising the potential of Health Data Science:opportunities and challenges ...Realising the potential of Health Data Science:opportunities and challenges ...
Realising the potential of Health Data Science: opportunities and challenges ...Paolo Missier
 
Provenance Week 2023 talk on DP4DS (Data Provenance for Data Science)
Provenance Week 2023 talk on DP4DS (Data Provenance for Data Science)Provenance Week 2023 talk on DP4DS (Data Provenance for Data Science)
Provenance Week 2023 talk on DP4DS (Data Provenance for Data Science)Paolo Missier
 
A Data-centric perspective on Data-driven healthcare: a short overview
A Data-centric perspective on Data-driven healthcare: a short overviewA Data-centric perspective on Data-driven healthcare: a short overview
A Data-centric perspective on Data-driven healthcare: a short overviewPaolo Missier
 
Capturing and querying fine-grained provenance of preprocessing pipelines in ...
Capturing and querying fine-grained provenance of preprocessing pipelines in ...Capturing and querying fine-grained provenance of preprocessing pipelines in ...
Capturing and querying fine-grained provenance of preprocessing pipelines in ...Paolo Missier
 
Tracking trajectories of multiple long-term conditions using dynamic patient...
Tracking trajectories of  multiple long-term conditions using dynamic patient...Tracking trajectories of  multiple long-term conditions using dynamic patient...
Tracking trajectories of multiple long-term conditions using dynamic patient...Paolo Missier
 
Delivering on the promise of data-driven healthcare: trade-offs, challenges, ...
Delivering on the promise of data-driven healthcare: trade-offs, challenges, ...Delivering on the promise of data-driven healthcare: trade-offs, challenges, ...
Delivering on the promise of data-driven healthcare: trade-offs, challenges, ...Paolo Missier
 
Digital biomarkers for preventive personalised healthcare
Digital biomarkers for preventive personalised healthcareDigital biomarkers for preventive personalised healthcare
Digital biomarkers for preventive personalised healthcarePaolo Missier
 
Digital biomarkers for preventive personalised healthcare
Digital biomarkers for preventive personalised healthcareDigital biomarkers for preventive personalised healthcare
Digital biomarkers for preventive personalised healthcarePaolo Missier
 
Data Provenance for Data Science
Data Provenance for Data ScienceData Provenance for Data Science
Data Provenance for Data SciencePaolo Missier
 
Capturing and querying fine-grained provenance of preprocessing pipelines in ...
Capturing and querying fine-grained provenance of preprocessing pipelines in ...Capturing and querying fine-grained provenance of preprocessing pipelines in ...
Capturing and querying fine-grained provenance of preprocessing pipelines in ...Paolo Missier
 
Quo vadis, provenancer?  Cui prodest?  our own trajectory: provenance of data...
Quo vadis, provenancer? Cui prodest? our own trajectory: provenance of data...Quo vadis, provenancer? Cui prodest? our own trajectory: provenance of data...
Quo vadis, provenancer?  Cui prodest?  our own trajectory: provenance of data...Paolo Missier
 
Data Science for (Health) Science: tales from a challenging front line, and h...
Data Science for (Health) Science:tales from a challenging front line, and h...Data Science for (Health) Science:tales from a challenging front line, and h...
Data Science for (Health) Science: tales from a challenging front line, and h...Paolo Missier
 
Analytics of analytics pipelines: from optimising re-execution to general Dat...
Analytics of analytics pipelines:from optimising re-execution to general Dat...Analytics of analytics pipelines:from optimising re-execution to general Dat...
Analytics of analytics pipelines: from optimising re-execution to general Dat...Paolo Missier
 
ReComp: optimising the re-execution of analytics pipelines in response to cha...
ReComp: optimising the re-execution of analytics pipelines in response to cha...ReComp: optimising the re-execution of analytics pipelines in response to cha...
ReComp: optimising the re-execution of analytics pipelines in response to cha...Paolo Missier
 
ReComp, the complete story: an invited talk at Cardiff University
ReComp, the complete story:  an invited talk at Cardiff UniversityReComp, the complete story:  an invited talk at Cardiff University
ReComp, the complete story: an invited talk at Cardiff UniversityPaolo Missier
 
Efficient Re-computation of Big Data Analytics Processes in the Presence of C...
Efficient Re-computation of Big Data Analytics Processes in the Presence of C...Efficient Re-computation of Big Data Analytics Processes in the Presence of C...
Efficient Re-computation of Big Data Analytics Processes in the Presence of C...Paolo Missier
 
Decentralized, Trust-less Marketplace for Brokered IoT Data Trading using Blo...
Decentralized, Trust-less Marketplacefor Brokered IoT Data Tradingusing Blo...Decentralized, Trust-less Marketplacefor Brokered IoT Data Tradingusing Blo...
Decentralized, Trust-less Marketplace for Brokered IoT Data Trading using Blo...Paolo Missier
 

More from Paolo Missier (20)

Towards explanations for Data-Centric AI using provenance records
Towards explanations for Data-Centric AI using provenance recordsTowards explanations for Data-Centric AI using provenance records
Towards explanations for Data-Centric AI using provenance records
 
Interpretable and robust hospital readmission predictions from Electronic Hea...
Interpretable and robust hospital readmission predictions from Electronic Hea...Interpretable and robust hospital readmission predictions from Electronic Hea...
Interpretable and robust hospital readmission predictions from Electronic Hea...
 
Data-centric AI and the convergence of data and model engineering: opportunit...
Data-centric AI and the convergence of data and model engineering:opportunit...Data-centric AI and the convergence of data and model engineering:opportunit...
Data-centric AI and the convergence of data and model engineering: opportunit...
 
Realising the potential of Health Data Science: opportunities and challenges ...
Realising the potential of Health Data Science:opportunities and challenges ...Realising the potential of Health Data Science:opportunities and challenges ...
Realising the potential of Health Data Science: opportunities and challenges ...
 
Provenance Week 2023 talk on DP4DS (Data Provenance for Data Science)
Provenance Week 2023 talk on DP4DS (Data Provenance for Data Science)Provenance Week 2023 talk on DP4DS (Data Provenance for Data Science)
Provenance Week 2023 talk on DP4DS (Data Provenance for Data Science)
 
A Data-centric perspective on Data-driven healthcare: a short overview
A Data-centric perspective on Data-driven healthcare: a short overviewA Data-centric perspective on Data-driven healthcare: a short overview
A Data-centric perspective on Data-driven healthcare: a short overview
 
Capturing and querying fine-grained provenance of preprocessing pipelines in ...
Capturing and querying fine-grained provenance of preprocessing pipelines in ...Capturing and querying fine-grained provenance of preprocessing pipelines in ...
Capturing and querying fine-grained provenance of preprocessing pipelines in ...
 
Tracking trajectories of multiple long-term conditions using dynamic patient...
Tracking trajectories of  multiple long-term conditions using dynamic patient...Tracking trajectories of  multiple long-term conditions using dynamic patient...
Tracking trajectories of multiple long-term conditions using dynamic patient...
 
Delivering on the promise of data-driven healthcare: trade-offs, challenges, ...
Delivering on the promise of data-driven healthcare: trade-offs, challenges, ...Delivering on the promise of data-driven healthcare: trade-offs, challenges, ...
Delivering on the promise of data-driven healthcare: trade-offs, challenges, ...
 
Digital biomarkers for preventive personalised healthcare
Digital biomarkers for preventive personalised healthcareDigital biomarkers for preventive personalised healthcare
Digital biomarkers for preventive personalised healthcare
 
Digital biomarkers for preventive personalised healthcare
Digital biomarkers for preventive personalised healthcareDigital biomarkers for preventive personalised healthcare
Digital biomarkers for preventive personalised healthcare
 
Data Provenance for Data Science
Data Provenance for Data ScienceData Provenance for Data Science
Data Provenance for Data Science
 
Capturing and querying fine-grained provenance of preprocessing pipelines in ...
Capturing and querying fine-grained provenance of preprocessing pipelines in ...Capturing and querying fine-grained provenance of preprocessing pipelines in ...
Capturing and querying fine-grained provenance of preprocessing pipelines in ...
 
Quo vadis, provenancer?  Cui prodest?  our own trajectory: provenance of data...
Quo vadis, provenancer? Cui prodest? our own trajectory: provenance of data...Quo vadis, provenancer? Cui prodest? our own trajectory: provenance of data...
Quo vadis, provenancer?  Cui prodest?  our own trajectory: provenance of data...
 
Data Science for (Health) Science: tales from a challenging front line, and h...
Data Science for (Health) Science:tales from a challenging front line, and h...Data Science for (Health) Science:tales from a challenging front line, and h...
Data Science for (Health) Science: tales from a challenging front line, and h...
 
Analytics of analytics pipelines: from optimising re-execution to general Dat...
Analytics of analytics pipelines:from optimising re-execution to general Dat...Analytics of analytics pipelines:from optimising re-execution to general Dat...
Analytics of analytics pipelines: from optimising re-execution to general Dat...
 
ReComp: optimising the re-execution of analytics pipelines in response to cha...
ReComp: optimising the re-execution of analytics pipelines in response to cha...ReComp: optimising the re-execution of analytics pipelines in response to cha...
ReComp: optimising the re-execution of analytics pipelines in response to cha...
 
ReComp, the complete story: an invited talk at Cardiff University
ReComp, the complete story:  an invited talk at Cardiff UniversityReComp, the complete story:  an invited talk at Cardiff University
ReComp, the complete story: an invited talk at Cardiff University
 
Efficient Re-computation of Big Data Analytics Processes in the Presence of C...
Efficient Re-computation of Big Data Analytics Processes in the Presence of C...Efficient Re-computation of Big Data Analytics Processes in the Presence of C...
Efficient Re-computation of Big Data Analytics Processes in the Presence of C...
 
Decentralized, Trust-less Marketplace for Brokered IoT Data Trading using Blo...
Decentralized, Trust-less Marketplacefor Brokered IoT Data Tradingusing Blo...Decentralized, Trust-less Marketplacefor Brokered IoT Data Tradingusing Blo...
Decentralized, Trust-less Marketplace for Brokered IoT Data Trading using Blo...
 

Recently uploaded

Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxNavinnSomaal
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxLoriGlavin3
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxBkGupta21
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .Alan Dix
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek SchlawackFwdays
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenHervé Boutemy
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersRaghuram Pandurangan
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxLoriGlavin3
 

Recently uploaded (20)

Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
SAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptxSAP Build Work Zone - Overview L2-L3.pptx
SAP Build Work Zone - Overview L2-L3.pptx
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptxPasskey Providers and Enabling Portability: FIDO Paris Seminar.pptx
Passkey Providers and Enabling Portability: FIDO Paris Seminar.pptx
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
unit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptxunit 4 immunoblotting technique complete.pptx
unit 4 immunoblotting technique complete.pptx
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .From Family Reminiscence to Scholarly Archive .
From Family Reminiscence to Scholarly Archive .
 
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
"Subclassing and Composition – A Pythonic Tour of Trade-Offs", Hynek Schlawack
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
DevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache MavenDevoxxFR 2024 Reproducible Builds with Apache Maven
DevoxxFR 2024 Reproducible Builds with Apache Maven
 
Generative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information DevelopersGenerative AI for Technical Writer or Information Developers
Generative AI for Technical Writer or Information Developers
 
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptxMerck Moving Beyond Passwords: FIDO Paris Seminar.pptx
Merck Moving Beyond Passwords: FIDO Paris Seminar.pptx
 

Paper talk @ Ipaw 2010: Janus: from Workflows to Semantic Provenance and Linked Open Data

  • 1. Janus Provenance Janus: from Workflows to Semantic Provenance and Linked Open Data Paolo Missier Jun Zhao Satya S. Sahoo Carole Goble Amit Sheth University of Manchester, UK University of Oxford, UK Wright State University, USA IPAW’ 10 Troy, NY June 15-16, 2010
  • 2. Key ideas • Janus: – a semantic provenance model with domain-specific extensions – designed around the Taverna workflow model • From domain-agnostic provenance graphs • To domain-aware graphs through explicit annotations • From local provenance graphs and queries scoped to the graph • To – Graphs published as Linked Data – Queries extended into the Web of Data 2 Janus -- IPAW, Troy, NY, June 15-17, 2010
  • 3. Example workflow (Taverna) chr: 17 QTL → start: 28500000 end: 3000000 Ensembl Genes Ensembl Gene → Ensembl Gene → Uniprot Gene Entrez Gene Uniprot Gene → Entrez Gene → Kegg Gene Kegg Gene merge gene IDs Gene → Pathway path:mmu04210 Apoptosis, path:mmu04010 MAPK, ... Janus -- IPAW, Troy, NY, June 15-17, 2010
  • 4. Baseline provenance of a workflow run QTL → mmu:12575 Ensembl Genes v1 ... vn w Ensembl Gene → Ensembl Gene → Uniprot Gene Entrez Gene path:mmu04012 exec Uniprot Gene → Entrez Gene → a1 ... an b1 ... bm Kegg Gene Kegg Gene mmu:26416 merge gene IDs path:mmu04010 y11 ymn Gene → Pathway ... path:mmu04010→derives_from→mmu:26416 path:mmu04012→derives_from→mmu:12575 • The graph encodes all direct data dependency relations • Baseline query model: compute paths amongst sets of nodes • Transitive closure over data dependency relations 4 Janus -- IPAW, Troy, NY, June 15-17, 2010
  • 5. Reference user questions QTL → Ensembl Genes Q0: Find all intermediate and initial input values that contribute to the computation of a certain output value. Ensembl Gene → Ensembl Gene → Uniprot Gene Entrez Gene Q1. Find all those genes within the input QTL Uniprot Gene → Entrez Gene → region that are involved in a given KEGG Kegg Gene Kegg Gene pathway. merge gene IDs Q2: Find all Uniprot-sourced genes Gene → Pathway Q3: Find all Entrez genes that encode proteins involved in ATP binding (go: 0005524). Q4: List relevant PubMed publications for the pathways listed in the result set. 5 Janus -- IPAW, Troy, NY, June 15-17, 2010
  • 6. Query types and model capabilities Query formulation effort Annotation requirements Query Scope Q0 - Requires knowledge of process structure and data values No annotations required Single run graph or Multi-run graphs - Graphical query constructor may be available Q1 Q2 Requires domain annotations Use of domain terms Single run graph or on workflow tasks and on data facilitates query formulation Multi-run graphs values Q3 Q4 - Use of domain terms - Requires domain annotations facilitates query on workflow tasks and on formulation. data values The Web of Data - Can be integrated with - Relies on completeness of browsers for LoD sources Linked Data Sources 6 Janus -- IPAW, Troy, NY, June 15-17, 2010
  • 7. Query types and model capabilities Query formulation effort Annotation requirements Query Scope Q0 - Requires knowledge of process structure and data values No annotations required Single run graph or Multi-run graphs - Graphical query constructor may be available Q1 Q2 Requires domain annotations Use of domain terms Single run graph or on workflow tasks and on data facilitates query formulation Multi-run graphs values Q3 Q4 - Use of domain terms - Requires domain annotations facilitates query on workflow tasks and on formulation. data values The Web of Data - Can be integrated with - Relies on completeness of browsers for LoD sources Linked Data Sources 6 Janus -- IPAW, Troy, NY, June 15-17, 2010
  • 8. Janus: baseline model of provenance • The semantic provenance model is an OWL ontology – defined for domain-agnostic provenance graphs – naturally extensible to domain concepts • extends the Provenir upper ontology [*] – Itself an extension of the Basic Formal Ontology (BFO) • abstract concepts include data, process, and agent – Provenir adds 11 types of relationships: • partonomy relations • temporal information • precedence • causal relationships • ... [*] S. Sahoo and A. Sheth. Provenir ontology: Towards a Framework for eScience Provenance Management, Knoesis Center Tech Report, 2009. 7 Janus -- IPAW, Troy, NY, June 15-17, 2010
  • 9. Janus structure 8 Janus -- IPAW, Troy, NY, June 15-17, 2010
  • 10. Janus structure port v1 v2 v3 value X1 X2 X3 X1 X2 X3 port P exec P_inst Y1 Y2 Y1 Y2 processor processor exec spec w1 w2 8 Janus -- IPAW, Troy, NY, June 15-17, 2010
  • 11. Example Janus domain-agnostic fragment <rdf:Description rdf:about="http://purl.org/net/taverna/janus/remove_Nulls"> <janus:has_execution rdf:resource="http://purl.org/net/taverna/janus/remove_Nulls"/> <knoesis:has_parameter rdf:resource="http://purl.org/net/taverna/janus/remove_Nulls/output"/> <knoesis:has_parameter rdf:resource="http://purl.org/net/taverna/janus/remove_Nulls/input"/> <obo:part_of rdf:resource="http://purl.org/net/taverna/janus/e589d90b-01f2-4de6-..."/> <rdf:type rdf:resource="http://purl.org/net/taverna/janus#processor_spec"/> <rdf:Description rdf:about="http://purl.org/net/taverna/janus/remove_Nulls/input"> <janus:has_value_binding rdf:resource="http://purl.org/net/taverna/janus/test1625"/> <janus:links_from rdf:resource="http://purl.org/net/taverna/janus/merge_entrez_genes/ concatenated"/> <janus:is_processor_input rdf:datatype="http://www.w3.org/2001/ XMLSchema#boolean">true</janus:is_processor_input> <rdf:type rdf:resource="http://purl.org/net/taverna/janus#port"/> <rdf:Description rdf:about="http://purl.org/net/taverna/janus/test1625"> <janus:has_iteration rdf:datatype="http://www.w3.org/2001/XMLSchema#string">[]</ janus:has_iteration> <rdf:type rdf:resource="http://purl.org/net/taverna/janus#port_value"/> </rdf:Description> 9 Janus -- IPAW, Troy, NY, June 15-17, 2010
  • 12. Annotated provenance Annotated workflow Annotated provenance graph QTL → Gene Ensembl Genes Ensembl Gene → Ensembl Gene → Uniprot Gene Entrez Gene exec ... Uniprot Gene → Entrez Gene → Kegg Gene Kegg Gene merge gene IDs Kegg Gene → Pathway Q1 Q2 10 Janus -- IPAW, Troy, NY, June 15-17, 2010
  • 13. Desiderata: from structure to values annotations 11 Janus -- IPAW, Troy, NY, June 15-17, 2010
  • 14. Desiderata: from structure to values annotations 11 Janus -- IPAW, Troy, NY, June 15-17, 2010
  • 15. Desiderata: from structure to values annotations protein sequence X1 X2 X3 interpro P match report Y1 Y2 processor interpro spec scan 11 Janus -- IPAW, Troy, NY, June 15-17, 2010
  • 16. Desiderata: from structure to values annotations protein sequence X1 X2 X3 interpro P match report Y1 Y2 processor interpro spec scan protein sequence port v1 v2 v3 value X1 exec X2 X3 port P_inst interpro Y1 Y2 processor interpro scan exec w1 w2 match report 11 Janus -- IPAW, Troy, NY, June 15-17, 2010
  • 17. Annotations propagation rules X rdf:type Port C = {c} X has value type c X has value v v rdf:type PortValue v rdf:type C denotes data type in the PL protein sense has_value_type sequence X1 X2 X3 interpro P match report Y1 Y2 processor interpro spec scan 12 Janus -- IPAW, Troy, NY, June 15-17, 2010
  • 18. Annotations propagation rules X rdf:type Port C = {c} X has value type c X has value v v rdf:type PortValue v rdf:type C denotes data type in the PL protein sense has_value_type sequence X1 X2 X3 interpro P match report Y1 Y2 processor ? interpro spec port scan v1 v2 v3 value X1 X2 X3 port P_inst interpro Y1 Y2 processor interpro scan exec w1 w2 match report 12 Janus -- IPAW, Troy, NY, June 15-17, 2010
  • 19. Annotations propagation rules X rdf:type Port C = {c} X has value type c X has value v v rdf:type PortValue v rdf:type C denotes data type in the PL protein sense has_value_type sequence X1 X2 X3 interpro P match report Y1 Y2 protein processor sequence interpro spec port scan v1 v2 v3 value X1 X2 X3 port P_inst interpro Y1 Y2 processor interpro scan exec w1 w2 match report 12 Janus -- IPAW, Troy, NY, June 15-17, 2010
  • 20. Annotations as semantic overlay v1 w1 X3 Y2 Provenance graph has_port_value has_port_value X2 P fragment Y1 X1 vn wm has-input-type Pathway has-output-type search service instance-of instance-of instance-of Gene v1 w1 Pathway has-source X3 has-source has_port_value Y2 has_port_value X2 P instance-of instance-of Y1 Kegg wm X1 vn Kegg has-source has-source 13 Janus -- IPAW, Troy, NY, June 15-17, 2010
  • 21. Example Janus domain-aware fragment <rdf:Description rdf:about="http://purl.org/net/taverna/janus/test1625"> <janus:has_iteration>[]</janus:has_iteration> <rdf:type rdf:resource="http://purl.org/net/taverna/janus#port_value"/> <rdf:type rdf:resource="http://purl.org/obo/owl/sequence#gene"/> <janus:has_source rdf:resource="http://purl.org/net/taverna/janus#KEGG"/> </rdf:Description> this is rule- defined, too 14 Janus -- IPAW, Troy, NY, June 15-17, 2010
  • 22. Extensions to Linked Data Annotated workflow Annotated provenance graph QTL → Ensembl Genes Ensembl Gene → Ensembl Gene → Uniprot Gene Entrez Gene exec ... Uniprot Gene → Entrez Gene → Kegg Gene Kegg Gene merge gene IDs Gene → Pathway 15 Janus -- IPAW, Troy, NY, June 15-17, 2010
  • 23. Extensions to Linked Data Annotated workflow Annotated provenance graph QTL → Ensembl Genes Ensembl Gene → Ensembl Gene → Uniprot Gene Entrez Gene exec ... Uniprot Gene → Entrez Gene → Kegg Gene Kegg Gene merge gene IDs Gene → Pathway - Publish - I - Map IDs - II - query 15 Janus -- IPAW, Troy, NY, June 15-17, 2010
  • 24. I - Mapping data values to LoD URIs In our prototype we map data values to Bio2RDF as follows: Entrez Genes Uniprot Genes KEGG Genes KEGG Pathways <rdf:Description rdf:about="http://purl.org/net/taverna/janus/create_report/entrezGeneId"> <janus:has_value_binding rdf:resource="http://purl.org/net/taverna/janus/test18"/> <rdf:Description rdf:about="http://purl.org/net/taverna/janus/test18"> <rdf:type rdf:resource="http://purl.org/net/taverna/janus#port_value"/> <rdfs:comment>11835</rdfs:comment> <rdf:type rdf:resource="http://purl.org/obo/owl/sequence#gene"/> <janus:has_source rdf:resource="http://purl.org/net/taverna/janus#entrez_gene"/> <rdf:Description rdf:about="http://purl.org/net/taverna/janus/test18"> <rdfs:seeAlso rdf:resource="http://bio2rdf.org/geneid:11835"/> 16 Janus -- IPAW, Troy, NY, June 15-17, 2010
  • 25. II - extended queries in the LoD setting Strategy: - use the SQUIN LoD query engine to query multiple “Web of Data” sources - only Bio2RDF in our case - combine graph patterns on local provenance with conditions on remote LoD graphs Q5: Find all Entrez genes that encode proteins involved in ATP binding (GO:0005524). PREFIX uniprot: <http://purl.uniprot.org/core/> PREFIX : <http://www.taverna.org.uk/janus#> SELECT distinct ?entrezgene WHERE { ?protein uniprot:classifiedWith <http://bio2rdf.org/go:0005524> . Bio2RDF ?entrezgene <http://bio2rdf.org/bio2rdf_resource:xPath> ?protein . ?gene rdfs:seeAlso ?entrezgene ?gene rdf:type :port_gene ?gene :has_source :entrez_gene . } local provenance graph 17 Janus -- IPAW, Troy, NY, June 15-17, 2010
  • 26. Current status Current Taverna provenance architecture: Lab prototype Production – “Export as...” Janus RDF – “native” (relational) graphs – currently only queried using – simple, efficient query language on SPARQL native provenance e – manually published – manually annotated Taverna Taverna <<Events stream>> Provenance runtime events capture query <scope ... /> Lineage <select .../> relational query <focus .../> processor DB query response OPM graph complete Janus RDF graph exporter Provenance API (Java) 18 Janus -- IPAW, Troy, NY, June 15-17, 2010
  • 27. Summary, and moving forward • Janus: a semantic model for workflow provenance – OWL ontology, extension of Provenir – should include attribution + system level provenance – alignment with OPM? • Domain-aware graphs through annotations: – automatically propagated from workflow annotations when possible – but in practice no real workflows are annotated • LoD integration: – powerful provenance publishing and query broadening – mapping rules currently limited – no completeness guarantee -- all joins are outer joins! 19 Janus -- IPAW, Troy, NY, June 15-17, 2010