Towards Query Generation for PROV-O Data
Jun Zhao (1), Honghan Wu (2) and Jeff Z. Pan (2)
(1) Lancaster University
@junszhao | j.zhao5 at lancaster.ac.uk
(2) University of Aberdeen
honghan.wu | jeff.z.pan at abdn.ac.uk
Outline
• Motivation
• Profile-driven query generation
– K-Drive
– ProvQ
• Result discussion
• Future work
The Big Picture of PROV: A Motivation Scenario
http://www.w3.org/2005/Incubator/prov/wiki/images/3/38/Content-b.png
The Big Picture of PROV: A Motivation Scenario
Adapted from:
http://www.w3.org/2005/Incubator/prov/wiki/images/3/38/Content-b.png
Provenance information
The Big Picture of PROV: A Motivation Scenario
http://www.w3.org/2005/Incubator/prov/wiki/images/b/b8/Use-b.png
Provenance in the Wild vs. ProvBench
Datasets shown: Taverna-PROV, Vistrails PROV, Wings PROV, Wikipedia-PROV, Twitter-PROV, OBIAMA (social simulation)
Domains covered: workflow / scientific domain, Web resources, social domain
• 11 repositories so far
• Various representations
• Across different domains
• Openly accessible under different open licenses
https://github.com/provbench
https://sites.google.com/site/provbench/home
Next Step: Access PROV Datasets
Datasets: Taverna-PROV, Vistrails PROV, Wings PROV, Wikipedia-PROV, Twitter-PROV, OBIAMA (social simulation)
• Can we query across them?
• Can we learn something by querying across them?
• What can we do with them?
• ……
Query Generation: A Bottom-up Approach
Input datasets: Taverna-PROV, Wings PROV, Wikipedia-PROV, OBIAMA (social simulation)
Pipeline: Provenance Data Profile Generator → Provenance Query Builder → SPARQL queries for PROV-O datasets
Example profiles:
• Class associations
• Property associations
Query Generation: A First Step
Input: a PROV dataset
Pipeline: Provenance Data Profile Generator → Provenance Query Builder → SPARQL queries for the PROV-O dataset
Example profiles:
• Class associations
• Property associations
K-Drive Query Generation
• University of Aberdeen
• A generic query generation tool for semantic web data
• Finds key sub-graphs in the RDF data
– Big City: the most instantiated concepts in the data
– Big Road: the most frequent relations connecting those big cities
http://www.kdrive-project.eu EU FP7 Marie-Curie 286348
Pan et al. Query generation for semantic datasets. K-CAP 2013, pp. 113-116
Slide credit: Dr Wu at Scottish Linked Data Workshop 2014
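The "Big City" and "Big Road" idea can be illustrated with a small counting sketch. This is not the K-Drive implementation: the function name `big_cities_and_roads`, the `top_k` parameter, the in-memory triple list, and the sample data are all assumptions made for illustration. It only shows one plausible reading of "most instantiated concepts" (classes ranked by instance count) and "most frequent relations connecting them" (properties counted between instances of those classes).

```python
from collections import Counter

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
PROV = "http://www.w3.org/ns/prov#"


def big_cities_and_roads(triples, top_k=2):
    """Return (top_k classes by instance count, relation counts between them).

    Hypothetical helper: a plain counting pass, not the K-Drive algorithm.
    """
    triples = list(triples)
    # Map each resource to the classes it is declared an instance of.
    types = {}
    for s, p, o in triples:
        if p == RDF_TYPE:
            types.setdefault(s, set()).add(o)
    # "Big cities": the classes with the most instances.
    class_counts = Counter(c for classes in types.values() for c in classes)
    cities = [c for c, _ in class_counts.most_common(top_k)]
    city_set = set(cities)
    # "Big roads": (subject class, property, object class) between big cities.
    road_counts = Counter()
    for s, p, o in triples:
        if p == RDF_TYPE:
            continue
        for sc in types.get(s, ()):
            for oc in types.get(o, ()):
                if sc in city_set and oc in city_set:
                    road_counts[(sc, p, oc)] += 1
    return cities, road_counts.most_common()


# Made-up sample data: three entities, two activities, one agent.
sample = [
    ("e1", RDF_TYPE, PROV + "Entity"),
    ("e2", RDF_TYPE, PROV + "Entity"),
    ("e3", RDF_TYPE, PROV + "Entity"),
    ("a1", RDF_TYPE, PROV + "Activity"),
    ("a2", RDF_TYPE, PROV + "Activity"),
    ("ag1", RDF_TYPE, PROV + "Agent"),
    ("e1", PROV + "wasGeneratedBy", "a1"),
    ("e2", PROV + "wasGeneratedBy", "a1"),
    ("e3", PROV + "wasGeneratedBy", "a2"),
    ("a2", PROV + "used", "e1"),
    ("a1", PROV + "wasAssociatedWith", "ag1"),
]
cities, roads = big_cities_and_roads(sample, top_k=2)
```

With this sample, the two most populated classes are kept as cities, and the relation to the agent is dropped because its class falls outside the top two.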
K-Drive Generator
Live demo:
http://homepages.abdn.ac.uk/honghan.wu/pages/prov2/index.html
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT ?Generation ?x4_1 ?x3_1 ?x0_1
WHERE {
  ?Generation rdf:type <http://www.w3.org/ns/prov#Generation> .
  ?Generation <http://www.w3.org/ns/prov#activity> ?x4_1 .
  ?Generation <http://www.w3.org/ns/prov#hadRole> ?x3_1 .
  ?x0_1 <http://www.w3.org/ns/prov#qualifiedGeneration> ?Generation .
}
ProvQ: Property Association Mining
Pipeline: a PROV dataset → Provenance Data Profile Generator → Provenance Query Builder → SPARQL queries for the PROV-O dataset
• Profile generator: discover properties that are used together with each PROV-O property
• Query builder: expand a set of “seed” PROV-O queries using the discovered associated properties
https://github.com/junszhao/ProvQ
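A minimal sketch of the "properties used together" step, under stated assumptions: the score is taken to be the share of subjects carrying the seed property that also carry the other property. The slides do not define the score, and this is not the ProvQ code; `property_associations` and the sample triples are hypothetical.

```python
from collections import defaultdict

PROV = "http://www.w3.org/ns/prov#"
WFPROV = "http://purl.org/wf4ever/wfprov#"


def property_associations(triples, seed):
    """Map each co-occurring property to the share of seed subjects using it.

    Hypothetical helper; the co-occurrence ratio is an assumed score.
    """
    # Collect the set of properties used by each subject.
    props_by_subject = defaultdict(set)
    for s, p, _ in triples:
        props_by_subject[s].add(p)
    # Keep only subjects that use the seed property.
    seed_subjects = [s for s, props in props_by_subject.items() if seed in props]
    if not seed_subjects:
        return {}
    # Count how many seed subjects also use each other property.
    counts = defaultdict(int)
    for s in seed_subjects:
        for p in props_by_subject[s] - {seed}:
            counts[p] += 1
    return {p: n / len(seed_subjects) for p, n in counts.items()}


# Made-up sample: four runs use prov:used; three of them have a process link.
sample = []
for run in ("run1", "run2", "run3", "run4"):
    sample.append((run, PROV + "used", "input"))
    sample.append((run, PROV + "startedAtTime", "t0"))
for run in ("run1", "run2", "run3"):
    sample.append((run, WFPROV + "describedByProcess", "proc"))
# A subject without the seed property does not influence the scores.
sample.append(("e1", PROV + "wasGeneratedBy", "run1"))

scores = property_associations(sample, PROV + "used")
```

Restricting the count to subjects that already use a seed PROV-O property keeps the search space small, which is one way to read the performance advantage claimed for this approach.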
ProvQ: Property Association Mining
• Advantages
– Reduces the performance challenge usually faced in association rule mining
– Produces provenance-centric queries
• Disadvantages
– Could miss queries that are not related to PROV-O terms at all
Expanding Starting Queries
Approach Walk-Through
• Given a seed atomic query, we have a seed property
• We find all properties used together with it:
– http://purl.org/wf4ever/wfprov#describedByParameter
– http://purl.org/wf4ever/wfprov#wasOutputFrom
– http://www.w3.org/ns/prov#qualifiedGeneration
• Return the resulting conjunctive SPARQL query
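The walk-through can be sketched as a small query builder. The following is an illustration under assumptions, not the ProvQ implementation: `build_query`, the `required_at` threshold (properties scoring at or above it become required patterns, the rest become OPTIONAL), and the 0.9 scores are all hypothetical. The seed `prov:wasGeneratedBy` is an illustrative choice; the three associated properties are the ones listed above.

```python
PROV = "http://www.w3.org/ns/prov#"
WFPROV = "http://purl.org/wf4ever/wfprov#"


def build_query(seed, associations, required_at=1.0, limit=100):
    """Expand a seed triple pattern into a conjunctive SPARQL query string.

    Hypothetical helper: the required/OPTIONAL threshold is an assumption.
    """
    lines = ["SELECT DISTINCT * WHERE {", f"  ?s <{seed}> ?o ."]
    # Strongest associations first; ties are broken by property IRI.
    ranked = sorted(associations.items(), key=lambda kv: (-kv[1], kv[0]))
    for i, (prop, score) in enumerate(ranked, start=1):
        pattern = f"?s <{prop}> ?o{i} ."
        if score >= required_at:
            lines.append(f"  {pattern}")
        else:
            lines.append(f"  OPTIONAL {{ {pattern} }}")
    lines.append(f"}} LIMIT {limit}")
    return "\n".join(lines)


# Placeholder scores below 1, so every associated property becomes OPTIONAL.
associations = {
    WFPROV + "describedByParameter": 0.9,
    WFPROV + "wasOutputFrom": 0.9,
    PROV + "qualifiedGeneration": 0.9,
}
query = build_query(PROV + "wasGeneratedBy", associations)
print(query)
```

Full IRIs are used instead of prefixed names so the generated string needs no PREFIX declarations before it is sent to an endpoint.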
Results Comparison
• K-Drive Generator
– 7 queries
– 3 of them are not exactly provenance queries
– Probably easier to understand because classes are included in the queries
– But queries can be complex
• ProvQ
– 7 queries
– 1 not returned by K-Drive (prov:wasDerivedFrom)
– Only provenance queries are returned
– Queries are simple, based on property associations starting from “seed” PROV-O properties
https://github.com/junszhao/ProvQ/blob/master/results/query-analysis.txt
Future Work
• Define and evaluate usefulness
• Test against more datasets
• Experiment with reasoning
• Query generation across multiple datasets
Thank you!
These slides were created by Jun Zhao.
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported license.
http://creativecommons.org/licenses/by-nc-sa/3.0/


Editor's notes

  1. wasGeneratedBy, startedAtTime, endedAtTime, wasAssociatedWith, wasAttributedTo, actedOnBehalfOf, wasInformedBy
  2. 1. From prov:wasGeneratedBy:
        Select distinct * where {
          ?s prov:wasGeneratedBy ?o .
          optional {?s <http://purl.org/wf4ever/wfprov#describedByParameter> ?o1 .}
          optional {?s <http://purl.org/wf4ever/wfprov#wasOutputFrom> ?o3 .}
          optional {?s <http://www.w3.org/ns/prov#qualifiedGeneration> ?o4 .}
        } limit 100
     2. From prov:used
        <http://purl.org/wf4ever/wfprov#usedInput>; 1
        rdfs:label; 1
        prov:endedAtTime; 1
        prov:startedAtTime; 1
        prov:qualifiedAssociation; 1
        prov:qualifiedUsage; 1
        <http://purl.org/wf4ever/wfprov#describedByProcess>; 0.98
        <http://purl.org/wf4ever/wfprov#wasPartOfWorkflowRun>; 0.98
        Select distinct * where {
          ?s prov:used ?o .
          ?s <http://purl.org/wf4ever/wfprov#usedInput> ?o1 .
          ?s rdfs:label ?o2 .
          ?s prov:endedAtTime ?o3 .
          ?s prov:startedAtTime ?o4 .
          ?s prov:qualifiedAssociation ?o5 .
          ?s prov:qualifiedUsage ?o6 .
          optional {?s <http://purl.org/wf4ever/wfprov#describedByProcess> ?o7 .}
          optional {?s <http://purl.org/wf4ever/wfprov#wasPartOfWorkflowRun> ?o8 .}
        } limit 100
     3. From prov:wasDerivedFrom
        <http://ns.taverna.org.uk/2012/tavernaprov/errorMessage>; 1
        <http://ns.taverna.org.uk/2012/tavernaprov/stackTrace>; 1
        Select distinct * where {
          ?s prov:wasDerivedFrom ?o .
          ?s <http://ns.taverna.org.uk/2012/tavernaprov/errorMessage> ?o1 .
          ?s <http://ns.taverna.org.uk/2012/tavernaprov/stackTrace> ?o2 .
        } limit 100
     4. From prov:startedAtTime and prov:endedAtTime (produces results similar to query 2)
        rdfs:label; 1
        prov:endedAtTime; 1
        prov:qualifiedAssociation; 1
        <http://purl.org/wf4ever/wfprov#describedByProcess>; 0.97
        <http://purl.org/wf4ever/wfprov#wasPartOfWorkflowRun>; 0.97
        prov:qualifiedUsage; 0.90
        prov:used; 0.90
        <http://purl.org/wf4ever/wfprov#usedInput>; 0.90
        Select distinct * where {
          ?s prov:startedAtTime ?o .
          ?s rdfs:label ?o1 .
          ?s prov:endedAtTime ?o2 .
          ?s prov:qualifiedAssociation ?o3 .
          optional {?s <http://purl.org/wf4ever/wfprov#describedByProcess> ?o4 .}
          optional {?s <http://purl.org/wf4ever/wfprov#wasPartOfWorkflowRun> ?o5 .}
          optional {?s <http://purl.org/wf4ever/wfprov#usedInput> ?o6 .}
          optional {?s prov:qualifiedUsage ?o7 .}
          optional {?s prov:used ?o8 .}
        } limit 100
  3. 3 queries were largely the same, 3 queries were only returned by K-Drive, and the rest had different degrees of overlap. 1 query not returned