The document discusses link traversal based query execution for querying linked data on the web. It describes an approach that alternates between evaluating parts of a query on a continuously augmented local dataset, and looking up URIs in solutions to retrieve more data and add it to the local dataset. This allows querying linked data as if it were a single large database, without needing to know all data sources in advance. A key issue is how to efficiently cache retrieved data to avoid redundant lookups.
The Impact of Data Caching on Query Execution for Linked Data
1. The Impact of
Data Caching on
Query Execution for Linked Data
Olaf Hartig
http://olafhartig.de/foaf.rdf#olaf
@olafhartig
Database and Information Systems Research Group
Humboldt-Universität zu Berlin
2. Can we query the Web of Data
as if it were a single,
giant database?
SELECT DISTINCT ?i ?label
WHERE {
  ?prof rdf:type <http://res ... data/dbprofs#DBProfessor> ;
        foaf:topic_interest ?i .
  OPTIONAL {
    ?i rdfs:label ?label
    FILTER( LANG(?label)="en" || LANG(?label)="")
  }
}
ORDER BY ?label
?
Our approach: Link Traversal Based Query Execution
[ISWC'09]
Olaf Hartig - The Impact of Data Caching on Query Execution for Linked Data 2
3. Main Idea
● Intertwine query evaluation with traversal of data links
● We alternate between:
● Evaluate parts of the query (triple patterns)
on a continuously augmented set of data
● Look up URIs in intermediate
solutions and add retrieved data
to the query-local dataset
4. Main Idea
[Figure: example query shown as a graph pattern next to the query-local dataset: http://bob.name knows ?acq . ?acq project ?prj . ?prj name ?prjName]
5. Main Idea
[Figure: the URI http://bob.name from the query is looked up on the Web; the retrieved data (labeled “Descriptor object” in the figure) is added to the query-local dataset]
10. Main Idea
[Figure: the query-local dataset now contains the triple ( http://bob.name , knows , http://alice.name ), which matches the triple pattern ( http://bob.name , knows , ?acq )]
11. Main Idea
[Figure: this match yields the intermediate solution ?acq = http://alice.name]
12. Main Idea
[Figure: the URI http://alice.name from the intermediate solution is looked up on the Web; the retrieved data is added to the query-local dataset]
17. Main Idea
[Figure: the query-local dataset now contains the triple ( http://alice.name , project , http://.../AlicesPrj ), which matches the triple pattern ( ?acq , project , ?prj )]
18. Main Idea
[Figure: intermediate solution ?acq = http://alice.name , ?prj = http://.../AlicesPrj]
20. Main Idea
[Figure: the triple pattern ( ?prj , name , ?prjName ) yields ?prj = http://.../AlicesPrj , ?prjName = “…“]
21. Main Idea
[Figure: resulting query solution ?acq = http://alice.name , ?prj = http://.../AlicesPrj , ?prjName = “…“]
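The alternation between evaluating triple patterns and looking up URIs can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation behind these slides: the `WEB` dictionary is a hypothetical stand-in for HTTP lookups, `http://example.org/AlicesPrj` and the project name are placeholders for the values elided on the slides, and patterns are evaluated strictly in the given order.

```python
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]

# Hypothetical stand-in for the Web: URI -> triples returned by a lookup.
# The project URI and the project name are placeholders (assumptions).
WEB: Dict[str, List[Triple]] = {
    "http://bob.name": [("http://bob.name", "knows", "http://alice.name")],
    "http://alice.name": [
        ("http://alice.name", "project", "http://example.org/AlicesPrj")
    ],
    "http://example.org/AlicesPrj": [
        ("http://example.org/AlicesPrj", "name", "placeholder project name")
    ],
}

QUERY: List[Triple] = [
    ("http://bob.name", "knows", "?acq"),
    ("?acq", "project", "?prj"),
    ("?prj", "name", "?prjName"),
]


def is_var(term: str) -> bool:
    return term.startswith("?")


def match(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    result = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            # Bind the variable, or check consistency with an earlier binding.
            if result.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            return None
    return result


def look_up(term: str, dataset: Set[Triple], seen: Set[str]) -> None:
    """Look up a URI at most once and add the retrieved triples to the dataset."""
    if term.startswith("http://") and term not in seen:
        seen.add(term)
        dataset.update(WEB.get(term, []))


def execute(patterns: List[Triple]) -> List[Binding]:
    dataset: Set[Triple] = set()  # the query-local dataset
    seen: Set[str] = set()
    # Seed the dataset by looking up the URIs mentioned in the query itself.
    for pattern in patterns:
        for term in pattern:
            look_up(term, dataset, seen)
    solutions: List[Binding] = [{}]
    for pattern in patterns:
        next_solutions: List[Binding] = []
        for binding in solutions:
            # Simplification: triples added during this pass are only
            # visible to the following triple patterns.
            for triple in list(dataset):
                extended = match(pattern, triple, binding)
                if extended is not None:
                    # Traversal step: look up URIs in the intermediate solution.
                    for value in extended.values():
                        look_up(value, dataset, seen)
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


print(execute(QUERY))
```

The only URI known in advance is `http://bob.name`; the other two sources are discovered through intermediate solutions during execution.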
22. Characteristics
● Link traversal based query execution:
● Evaluation on a continuously augmented dataset
● Discovery of potentially relevant data during execution
● Discovery driven by intermediate solutions
● Main advantage:
● No need to know all data sources in advance
● Limitations:
● Query has to contain a URI as a starting point
● Ignores data that is not reachable* by the query execution
* formal definition in [LDOW'11a]
23. The Issue
[Figure: a second query with its own query-local dataset: http://bob.name knows ?acq . ?acq interest ?i . ?i label ?iLabel]
24. The Issue
[Figure: the URI http://bob.name is looked up again for this query]
25. The Issue
[Figure: the triple ( http://bob.name , knows , http://alice.name ) is retrieved again and added to the query-local dataset]
26. The Issue
[Figure: the execution produces solutions for ?acq , ?i , ?iLabel]
27. The Issue
[Figure: both example queries side by side, each with a separate query-local dataset]
28. Reusing the Query-Local Dataset
[Figure: both example queries side by side; the query-local dataset of the first query is kept for the second query]
29. Reusing the Query-Local Dataset
[Figure: the reused dataset already contains the triple ( http://bob.name , knows , http://alice.name )]
30. Reusing the Query-Local Dataset
[Figure: the second query obtains the intermediate solution ?acq = http://alice.name from the reused dataset]
31. Hypothesis
Re-using the query-local dataset (a.k.a. data caching)
may benefit
query performance + result completeness
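To make the performance side of the hypothesis concrete, the following sketch keeps retrieved data across queries and counts how many lookups are answered from the cache. It is an illustration only: the class name `DataCache`, the `web` dictionary standing in for HTTP lookups, and the lists of URIs touched by each query are assumptions, not part of the original system.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]


class DataCache:
    """Keeps retrieved data (one descriptor object per URI) across queries."""

    def __init__(self, web: Dict[str, List[Triple]]) -> None:
        self.web = web  # hypothetical stand-in for HTTP lookups
        self.descriptor_objects: Dict[str, List[Triple]] = {}
        self.hits = 0
        self.misses = 0

    def look_up(self, uri: str) -> List[Triple]:
        if uri in self.descriptor_objects:
            self.hits += 1
        else:
            # A real system would issue an HTTP GET here.
            self.misses += 1
            self.descriptor_objects[uri] = list(self.web.get(uri, []))
        return self.descriptor_objects[uri]

    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


web = {
    "http://bob.name": [("http://bob.name", "knows", "http://alice.name")],
    "http://alice.name": [],
}
cache = DataCache(web)

# Assumed URIs touched by two queries that both start at http://bob.name.
first_query_uris = ["http://bob.name", "http://alice.name"]
second_query_uris = ["http://bob.name", "http://alice.name"]

for uri in first_query_uris + second_query_uris:
    cache.look_up(uri)

print(cache.misses, cache.hits, cache.hit_rate())
```

The sketch covers only the saved lookups. The effect on result completeness comes from data that earlier queries left in the dataset and that the current query would not have reached on its own.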
32. Contributions
● Systematic analysis of the impact of data caching
● Theoretical foundation*
● Conceptual analysis*
● Empirical evaluation of the potential impact
● Out of scope: Caching strategies (replacement, invalidation)
* see [LDOW'11a]
33. Experiment – Scenario
● Information about the distributed social network of FOAF profiles
● 5 types of queries
● Experiment Setup:
● 20 persons
● Sequential use
➔ 100 queries (5 query types × 20 persons)
34. Experiment – Single Query
● no reuse experiment:
● No data caching
● reuse per query experiment:
● Reuse of query-local dataset for 3 executions of each query
● Third execution measured
[Figure: two bar charts comparing “no reuse” and “reuse per query” for the queries ContactInfoDanBri (Query No. 61), UnsetPropsDanBri (Query No. 62), 2ndDegree1DanBri (Query No. 63), 2ndDegree2DanBri (Query No. 64) and IncomingDanBri (Query No. 65); left: number of query results (axis from 0 to 60); right: query execution time in seconds (logarithmic axis from 0.01 to 100)]
39. Experiment – Complete Sequence
● reuse all queries experiment:
● Reuse of the query-local dataset for the complete sequence of all 100 queries
[Figure: two bar charts comparing “no reuse”, “reuse per query” and “reuse all queries” for the queries ContactInfoDanBri (Query No. 61), UnsetPropsDanBri (Query No. 62), 2ndDegree1DanBri (Query No. 63), 2ndDegree2DanBri (Query No. 64) and IncomingDanBri (Query No. 65); left: number of query results (axis from 0 to 60); right: query execution time in seconds (logarithmic axis from 0.01 to 100)]
44. Outlook
● Requirements of a data cache:
● Replacement mechanism
● Coherency mechanism
45. Cache Replacement
● Cache full → remove descriptor objects
● Replacement strategy
● Primary goal: maximize hit rate
● Recency-based
● Frequency-based
● Function-based
● Randomized
● Replacement process
● Watermarks: high and low
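As one possible reading of the replacement process, the sketch below combines a recency-based strategy (LRU) with high and low watermarks: eviction starts once the cache grows beyond the high watermark and continues until it is at or below the low watermark. The class name, the choice to measure size in triples, and the concrete watermark values are assumptions made for illustration; the slides do not prescribe a particular strategy.

```python
from collections import OrderedDict
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]


class WatermarkLRUCache:
    """Recency-based cache of descriptor objects with high/low watermarks."""

    def __init__(self, high: int, low: int) -> None:
        if not 0 <= low <= high:
            raise ValueError("expected 0 <= low <= high")
        self.high = high  # eviction starts above this many triples
        self.low = low    # eviction stops at or below this many triples
        self.size = 0     # current number of cached triples
        self.objects: "OrderedDict[str, List[Triple]]" = OrderedDict()

    def get(self, uri: str) -> Optional[List[Triple]]:
        if uri not in self.objects:
            return None
        self.objects.move_to_end(uri)  # mark as most recently used
        return self.objects[uri]

    def put(self, uri: str, triples: List[Triple]) -> None:
        if uri in self.objects:
            self.size -= len(self.objects.pop(uri))
        self.objects[uri] = triples
        self.size += len(triples)
        if self.size > self.high:
            # Remove least recently used descriptor objects first.
            while self.size > self.low and self.objects:
                _, evicted = self.objects.popitem(last=False)
                self.size -= len(evicted)


cache = WatermarkLRUCache(high=3, low=2)
for name in ("a", "b", "c"):
    cache.put(name, [(name, "p", "o")])
cache.get("a")                   # "a" becomes the most recently used object
cache.put("d", [("d", "p", "o")])  # size 4 exceeds the high watermark
print(list(cache.objects))
```

In this run the objects "b" and "c" are removed, while "a" survives because it was accessed most recently before the insertion of "d".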
48. Studying Cache Replacement?
“Web cache replacement in its general
form seems to be a solved topic.”
S. Podlipnig and L. Böszörmenyi: Survey of
Web Cache Replacement Strategies, 2003
● 6 quad indexes* in main memory
● Size grows linearly in the number of quads
● Example (after reuse all queries experiment, 100 queries):
● 905 descriptor objects, overall number of 745,756 triples
● ca. 103 MB
➔ Available main memory is hardly a limiting factor
* as introduced in [LDOW'11b]
49. Cache Coherency
● Data items in the cache may become inconsistent
● Strong cache consistency
● Server validation
● Client validation
● Weak cache consistency
● Time to live (TTL)
● Adaptive TTL
51. Client Validation
● Polling every time
● Enables strong cache consistency
● Conditional GET
● Request with If-Modified-Since header
● Possible response: 304 Not Modified
● Not supported by most Linked Data servers
● Experiment based on the CKAN catalog of linked datasets
● Supported for only 41 out of 154 example resources (26.6%) from 110 datasets
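Client validation with a conditional GET can be sketched with Python's standard library. The example only builds the request and interprets a status code; it does not contact a server. The function names and the `Accept` value are illustrative assumptions.

```python
import urllib.request
from email.utils import formatdate


def build_conditional_get(uri: str, last_retrieved: float) -> urllib.request.Request:
    """Build a GET request that asks for data only if it changed.

    `last_retrieved` is a POSIX timestamp of the cached copy.
    """
    request = urllib.request.Request(uri, method="GET")
    # HTTP dates are expressed in GMT, e.g. "Thu, 01 Jan 1970 00:00:00 GMT".
    request.add_header("If-Modified-Since", formatdate(last_retrieved, usegmt=True))
    request.add_header("Accept", "application/rdf+xml")  # assumed media type
    return request


def cached_copy_is_valid(status_code: int) -> bool:
    """304 Not Modified means the cached descriptor object can be kept."""
    return status_code == 304


request = build_conditional_get("http://bob.name", 0.0)
print(request.get_header("If-modified-since"))
```

When such a request is sent with `urllib.request.urlopen`, a 304 response surfaces as an `HTTPError` whose `code` attribute is 304, so a caller would catch that exception and pass the code to `cached_copy_is_valid`.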
53. Time to Live (TTL)
● TTL field: lifetime estimation for each object
● Supported by HTTP response headers (share of example resources that provide them):
● Expires: 37.0%
● Cache-Control: max-age: 37.7%
● When TTL elapses, object is invalid
● Accessing an invalid object → re-retrieve the object
● Conditional GET (supported for 26.6%)
● Alternative (due to lack of support in Linked Data servers):
● Assume a default TTL for each object
● Ordinary GET
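The TTL rules above can be sketched as a function that derives an expiry time from the response headers and falls back to a default TTL when neither header is present. The default of one day and the function name are assumptions; header names are looked up exactly as written, so a real client would need case-insensitive header access.

```python
import re
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime
from typing import Mapping

DEFAULT_TTL = timedelta(days=1)  # assumed default, not taken from the slides


def expiry_time(
    headers: Mapping[str, str],
    retrieved_at: datetime,
    default_ttl: timedelta = DEFAULT_TTL,
) -> datetime:
    """Return the point in time at which a cached object becomes invalid."""
    # Cache-Control: max-age takes precedence over Expires in HTTP caching.
    found = re.search(r"max-age\s*=\s*(\d+)", headers.get("Cache-Control", ""))
    if found:
        return retrieved_at + timedelta(seconds=int(found.group(1)))
    expires = headers.get("Expires")
    if expires:
        try:
            return parsedate_to_datetime(expires)
        except (TypeError, ValueError):
            pass  # unparsable date: fall through to the default TTL
    return retrieved_at + default_ttl


def is_valid(expires_at: datetime, now: datetime) -> bool:
    return now < expires_at


t0 = datetime(2011, 12, 1, 12, 0, tzinfo=timezone.utc)
print(expiry_time({"Cache-Control": "public, max-age=3600"}, t0))
print(expiry_time({"Expires": "Thu, 01 Dec 2011 16:00:00 GMT"}, t0))
print(expiry_time({}, t0))
```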
55. Adaptive TTL
● Assumption:
● The older an object, the less likely it is to be modified
● TTL is a percentage of the age:
● Threshold = 10% ; age = 30 days → TTL = 3 days
● Last verification: yesterday → invalidation in 2 days
● HTTP-based implementation: 35.1%
● Calculation of age: use Last-Modified response header
● Verification with conditional GET
● Alternative (due to lack of support in Linked Data servers):
● Assume Last-Modified is time of first retrieval
● Verification by comparing a response to the current version
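The adaptive TTL arithmetic from this slide can be checked with a short sketch. The function name and the integer percentage parameter are assumptions; the numbers reproduce the example (threshold 10%, age 30 days, last verification one day ago).

```python
from datetime import datetime, timedelta


def invalidation_time(
    last_modified: datetime,
    verified_at: datetime,
    threshold_percent: int = 10,
) -> datetime:
    """Adaptive TTL: the TTL is a percentage of the object's age at verification."""
    age = verified_at - last_modified
    ttl = age * threshold_percent / 100
    return verified_at + ttl


verified_at = datetime(2011, 6, 1)                # last verification
last_modified = verified_at - timedelta(days=30)  # age = 30 days
now = verified_at + timedelta(days=1)             # verification was yesterday

expires_at = invalidation_time(last_modified, verified_at)
print(expires_at - verified_at)  # TTL
print(expires_at - now)          # time left until invalidation
```

With the alternative from the slide, `last_modified` would be set to the time of first retrieval whenever the server sends no Last-Modified header.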
56. Summary
● Systematic analysis of the impact of data caching
● Theoretical foundation
● Conceptual analysis
● Empirical evaluation
● Main findings:
● Additional results possible (for semantically similar queries)
● Impact on performance may be positive but also negative
● Future work:
● Analysis of caching strategies in our context
● Main issue: invalidation
58. Contributions
● Theoretical foundation (extension of the original definition)
● Reachability by a $D_{seed}$-initialized execution of a BGP query $b$
● $D_{seed}$-dependent solution for a BGP query $b$
● Reachability $R(B)$ for a serial execution of $B = b_1, \ldots, b_n$
➔ Each solution for $b_{cur}$ is also an $R(B)$-dependent solution for $b_{cur}$
● Conceptual analysis of the impact of data caching
● Performance factor: $p(b_{cur}, B) = c(b_{cur}, [\,]) - c(b_{cur}, B)$
● Serendipity factor: $s(b_{cur}, B) = b(b_{cur}, B) - b(b_{cur}, [\,])$
● Empirical verification of the potential impact
● Out of scope: Caching strategies (replacement, invalidation)
60. Query Template UnsetProps
SELECT DISTINCT ?result ?resultLabel WHERE
{
?result rdfs:isDefinedBy <http://xmlns.com/foaf/0.1/> .
?result rdfs:domain foaf:Person .
OPTIONAL { <PERSON> ?result ?var0 }
FILTER ( !bound(?var0) )
<PERSON> foaf:knows ?var2 .
?var2 ?result ?var3 .
?result rdfs:label ?resultLabel .
?result vs:term_status ?var1 .
}
ORDER BY ?var1
61. Query Template Incoming
SELECT DISTINCT ?result WHERE
{
?result foaf:knows <PERSON> .
OPTIONAL
{
?result foaf:knows ?var1 .
FILTER ( <PERSON> = ?var1 )
<PERSON> foaf:knows ?result .
}
FILTER ( !bound(?var1) )
}
62. Query Template 2ndDegree1
SELECT DISTINCT ?result WHERE
{
<PERSON> foaf:knows ?p1 .
<PERSON> foaf:knows ?p2 .
FILTER ( ?p1 != ?p2 )
?p1 foaf:knows ?result .
FILTER ( <PERSON> != ?result )
?p2 foaf:knows ?result .
OPTIONAL {
<PERSON> ?knows ?result .
FILTER ( ?knows = foaf:knows )
}
FILTER ( !bound(?knows) )
}
63. Query Template 2ndDegree2
SELECT DISTINCT ?result WHERE
{
<PERSON> foaf:knows ?p1 .
<PERSON> foaf:knows ?p2 .
FILTER ( ?p1 != ?p2 )
?result foaf:knows ?p1 .
FILTER ( <PERSON> != ?result )
?result foaf:knows ?p2 .
OPTIONAL {
<PERSON> ?knows ?result .
FILTER ( ?knows = foaf:knows )
}
FILTER ( !bound(?knows) )
}
64. Experiment – Single Query
Experiment         Avg.¹ number of query results   Average¹ hit rate   Avg.¹ query execution time
                   (std.dev.)                      (std.dev.)          (std.dev.)
no reuse           27.37 (140.49)                  0.849 (0.205)       64.95 s (124.50)
reuse per query    26.71 (148.77)                  1 (0)               0.02 s (0.07)
¹ Averaged over all 100 queries
● In the ideal case for $B_{upper} = [\, b_{cur}, b_{cur} \,]$:
● $p_{upper}(b_{cur}, B_{upper}) = c(b_{cur}, [\,]) - c(b_{cur}, B_{upper}) = c(b_{cur}, [\,])$
● $s_{upper}(b_{cur}, B_{upper}) = b(b_{cur}, B_{upper}) - b(b_{cur}, [\,]) = 0$
65. Experiment – Single Query
● Summary (measurement errors aside):
● Same number of query results
● Significant improvements in query performance
66. Experiment – Complete Sequence
Experiment          Avg.¹ number of query results   Average¹ hit rate   Avg.¹ query execution time
                    (std.dev.)                      (std.dev.)          (std.dev.)
no reuse            27.37 (140.49)                  0.849 (0.205)       64.95 s (124.50)
reuse per query     26.71 (148.77)                  1 (0)               0.02 s (0.07)
reuse all queries   44.87 (178.36)                  0.991 (0.053)       37.91 s (112.94)
¹ Averaged over all 100 queries
● Summary:
● Data cache may provide for additional query results
● Impact on performance may be positive but also negative
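As a rough illustration, the averaged values for "no reuse" and "reuse all queries" can be plugged into the performance factor and the serendipity factor. Both factors are defined per query, so the averages only indicate an overall tendency and say nothing about individual queries, for which the performance impact may also be negative.

```latex
\begin{align*}
\bar{p} &\approx 64.95\,\mathrm{s} - 37.91\,\mathrm{s} = 27.04\,\mathrm{s} \\
\bar{s} &\approx 44.87 - 27.37 = 17.50
\end{align*}
```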
67. Experiment – Complete Sequence
Experiment                          Avg.¹ number of query results   Average¹ hit rate   Avg.¹ query execution time
                                    (std.dev.)                      (std.dev.)          (std.dev.)
no reuse                            27.37 (140.49)                  0.849 (0.205)       64.95 s (124.50)
reuse per query                     26.71 (148.77)                  1 (0)               0.02 s (0.07)
reuse all queries                   44.87 (178.36)                  0.991 (0.053)       37.91 s (112.94)
reuse all queries (random orders)   118.18 (867.07)                 0.992 (0.016)       20.61 s (216.61)
¹ Averaged over all 100 queries
● Executing the query sequence in a random order results in measurements similar to the given order.
68. These slides have been created by
Olaf Hartig
http://olafhartig.de
This work is licensed under a
Creative Commons Attribution-Share Alike 3.0 License
(http://creativecommons.org/licenses/by-sa/3.0/)