1. Towards Query Generation for PROV-O Data
Jun Zhao1, Honghan Wu2 and Jeff Z. Pan2
1Lancaster University
@junszhao | j.zhao5 at lancaster.ac.uk
2University of Aberdeen
honghan.wu | jeff.z.pan at abdn.ac.uk
3. The Big Picture of PROV: A Motivation Scenario
http://www.w3.org/2005/Incubator/prov/wiki/images/3/38/Content-b.png
4. The Big Picture of PROV: A Motivation Scenario
Adapted from:
http://www.w3.org/2005/Incubator/prov/wiki/images/3/38/Content-b.png
Provenance information
5. The Big Picture of PROV: A Motivation Scenario
http://www.w3.org/2005/Incubator/prov/wiki/images/b/b8/Use-b.png
6. Provenance in the Wild vs. ProvBench
Datasets: Taverna-PROV, Vistrails PROV, Wings PROV, Wikipedia-PROV, Twitter-PROV, OBIAMA (social simulation)
Domains: workflow / scientific domain, Web resources, social domain
• 11 repositories so far
• Various representations
• Across different domains
• Openly accessible under different open licenses
https://github.com/provbench
https://sites.google.com/site/provbench/home
7. Next Step: Access PROV Datasets
Datasets: Taverna-PROV, Vistrails PROV, Wings PROV, Wikipedia-PROV, Twitter-PROV, OBIAMA (social simulation)
Can we query across them?
Can we learn something by querying across them?
What can we do with them?
……
8. Query Generation: A Bottom-up Approach
Input datasets: Taverna-PROV, Wings PROV, Wikipedia-PROV, OBIAMA (social simulation)
Pipeline: PROV datasets → Provenance Data Profile Generator → Provenance Query Builder → SPARQL queries for PROV-O datasets
Example profiles:
• Class associations
• Property associations
9. Query Generation: A First Step
Pipeline: A PROV Dataset → Provenance Data Profile Generator → Provenance Query Builder → SPARQL queries for the PROV-O dataset
Example profiles:
• Class associations
• Property associations
10. K-Drive Query Generation
Slide credit: Dr Wu at Scottish Linked Data Workshop 2014
http://www.kdrive-project.eu EU FP7 Marie-Curie 286348
Pan et al. Query generation for semantic datasets. K-CAP 2013, pp. 113-116
• University of Aberdeen
• A generic query generation tool for semantic web data
• Find key sub-graphs in the RDF data
– Big City: the most instantiated concepts in the data
– Big Road: the most frequent relations connecting those big cities
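The Big City / Big Road idea can be sketched in a few lines of Python. This is only an illustration, not the K-Drive implementation: it assumes triples are plain (subject, predicate, object) tuples, that "most instantiated" means "most rdf:type assertions", and the names big_cities, big_roads and the toy data are hypothetical.

```python
from collections import Counter

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

def big_cities(triples, top_n=2):
    """Return the top_n classes ranked by their number of rdf:type assertions."""
    counts = Counter(o for _, p, o in triples if p == RDF_TYPE)
    return [cls for cls, _ in counts.most_common(top_n)]

def big_roads(triples, cities, top_n=3):
    """Return the most frequent (class, property, class) links between big cities."""
    types = {}
    for s, p, o in triples:
        if p == RDF_TYPE:
            types.setdefault(s, set()).add(o)
    roads = Counter()
    for s, p, o in triples:
        if p == RDF_TYPE:
            continue
        # Count the relation once per pair of big-city classes it connects.
        for s_cls in types.get(s, ()):
            for o_cls in types.get(o, ()):
                if s_cls in cities and o_cls in cities:
                    roads[(s_cls, p, o_cls)] += 1
    return [road for road, _ in roads.most_common(top_n)]

# Toy data (hypothetical), using prefixed names for brevity.
toy_triples = [
    ("e1", RDF_TYPE, "prov:Entity"),
    ("e2", RDF_TYPE, "prov:Entity"),
    ("e3", RDF_TYPE, "prov:Entity"),
    ("a1", RDF_TYPE, "prov:Activity"),
    ("a2", RDF_TYPE, "prov:Activity"),
    ("ag1", RDF_TYPE, "prov:Agent"),
    ("e1", "prov:wasGeneratedBy", "a1"),
    ("e2", "prov:wasGeneratedBy", "a1"),
    ("a2", "prov:used", "e1"),
    ("a1", "prov:wasAssociatedWith", "ag1"),
]

cities = big_cities(toy_triples, top_n=2)
roads = big_roads(toy_triples, cities)
```

On the toy data, prov:Entity and prov:Activity become the big cities, and the link through prov:wasAssociatedWith is dropped because prov:Agent is not among them.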
13. ProvQ: Property Association Mining
Pipeline: A PROV Dataset → Provenance Data Profile Generator → Provenance Query Builder → SPARQL queries for the PROV-O dataset
• Provenance Data Profile Generator: discover properties that are used together with each PROV-O property
• Provenance Query Builder: expand a set of “seed” PROV-O queries using the discovered associated properties
https://github.com/junszhao/ProvQ
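A minimal sketch of property association mining follows. It is not the ProvQ code: it assumes that two properties are "used together" when they appear on the same subject resource, that triples are plain (subject, predicate, object) tuples, and the function name mine_property_associations, the min_support threshold and the toy data are hypothetical.

```python
from collections import Counter, defaultdict

PROV_NS = "http://www.w3.org/ns/prov#"
WFPROV_NS = "http://purl.org/wf4ever/wfprov#"

def mine_property_associations(triples, min_support=1):
    """Map each PROV-O property to the properties that share a subject with it."""
    props_by_subject = defaultdict(set)
    for s, p, _ in triples:
        props_by_subject[s].add(p)

    co_occurrence = defaultdict(Counter)
    for props in props_by_subject.values():
        for seed in props:
            # Only PROV-O properties act as seeds; this keeps the search small.
            if not seed.startswith(PROV_NS):
                continue
            for other in props:
                if other != seed:
                    co_occurrence[seed][other] += 1

    return {
        seed: sorted(p for p, n in counts.items() if n >= min_support)
        for seed, counts in co_occurrence.items()
    }

# Toy data (hypothetical resources).
toy_triples = [
    ("ex:out1", PROV_NS + "wasGeneratedBy", "ex:run1"),
    ("ex:out1", WFPROV_NS + "wasOutputFrom", "ex:run1"),
    ("ex:out1", WFPROV_NS + "describedByParameter", "ex:param1"),
    ("ex:out2", PROV_NS + "wasGeneratedBy", "ex:run2"),
    ("ex:out2", WFPROV_NS + "wasOutputFrom", "ex:run2"),
]

associations = mine_property_associations(toy_triples, min_support=2)
```

Starting only from PROV-O properties, rather than from every property pair, is what keeps the number of candidate associations manageable in this sketch.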
14. ProvQ: Property Association Mining
• Advantages
– Reduce the performance challenge usually faced in association rule mining
– Produce provenance-centric queries
• Disadvantages
– Could miss queries that are not related to PROV-O terms at all
16. Approach Walk-Through
• Given a seed atomic query, we have a seed property:
• We find all properties used together with the seed property:
– http://purl.org/wf4ever/wfprov#describedByParameter
– http://purl.org/wf4ever/wfprov#wasOutputFrom
– http://www.w3.org/ns/prov#qualifiedGeneration
• Return the resulting conjunctive SPARQL query
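The final step of the walk-through can be sketched as below. This is an assumption-laden illustration, not the ProvQ query builder: the function name build_conjunctive_query is hypothetical, all triple patterns are assumed to share one subject variable, and the seed property used here (prov:wasGeneratedBy) is a placeholder, since the seed shown on the slide is not preserved in this text.

```python
def build_conjunctive_query(seed_property, associated_properties):
    """Build a SPARQL SELECT whose triple patterns all share the subject ?s."""
    properties = [seed_property, *associated_properties]
    variables = ["?s"] + [f"?o{i}" for i in range(len(properties))]
    patterns = [f"  ?s <{p}> ?o{i} ." for i, p in enumerate(properties)]
    return (
        "SELECT " + " ".join(variables) + "\n"
        "WHERE {\n" + "\n".join(patterns) + "\n}"
    )

# Placeholder seed (assumption); the associated properties are those listed above.
seed = "http://www.w3.org/ns/prov#wasGeneratedBy"
associated = [
    "http://purl.org/wf4ever/wfprov#describedByParameter",
    "http://purl.org/wf4ever/wfprov#wasOutputFrom",
    "http://www.w3.org/ns/prov#qualifiedGeneration",
]

query = build_conjunctive_query(seed, associated)
print(query)
```

The generated query has one triple pattern per property, so it only matches resources that carry the seed property and every associated property at once.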
17. Results Comparison
• K-Drive Generator
– 7 queries
– 3 of them are not exactly provenance queries
– Probably easier to understand because classes are included in the queries
– But queries can be complex
• ProvQ
– 7 queries
– 1 not returned by K-Drive (prov:wasDerivedFrom)
– Only provenance queries are returned
– Queries are simple, based on property associations starting from “seed” PROV-O properties
https://github.com/junszhao/ProvQ/blob/master/results/query-analysis.txt
18. Future Work
• Define and evaluate usefulness
• Test against more datasets
• Experiment with reasoning
• Query generation across multiple datasets
19. Thank you!
These slides have been created by Jun Zhao
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 Unported license
http://creativecommons.org/licenses/by-nc-sa/3.0/