These are the slides of my invited talk at the 5th Int. Workshop on Usage Analysis and the Web of Data (USEWOD 2015): http://usewod.org/usewod2015.html
The abstract of the talk is as follows:
To reduce user-perceived response time, many interactive Web applications visualize information in a dynamic, incremental manner. Such an incremental presentation can be particularly effective in cases in which the underlying data processing systems cannot answer the users' information needs completely and instantaneously. Examples of such systems are systems that support live querying of the Web of Data, in which case query execution times of several seconds, or even minutes, are an inherent consequence of these systems' ability to guarantee up-to-date results. However, support for incremental result visualization has received little attention in existing work on such systems. Therefore, the goal of this talk is to discuss approaches that enable query systems for the Web of Data to return query results incrementally.
Rethinking Online SPARQL Querying to Support Incremental Result Visualization
Olaf Hartig
http://olafhartig.de
@olafhartig
2. Rethinking Online SPARQL Querying to Support Incremental Result Visualization - Olaf Hartig 2
Prologue
Live Querying the Web of Data
● Federated query processing
– i.e., querying a federation of SPARQL endpoints
● Linked Data query processing
– i.e., querying Linked Data by relying only on the
Linked Data principles (interface: URI lookups)
– e.g., traversal-based query execution
● Querying other Linked Data fragment servers
– e.g., triple pattern fragments
Chapter 1
“Can the progress that has been made on (Read/Write) Linked Data change the way we interact with the Web […]?”
Information in Dynamic Web Pages
Support for such an incremental visualization
has not received much attention in existing
work on querying the Web of Data
“Can the progress that has been made on (Read/Write) Linked Data change the way we interact with the Web […]?”
I think we have not made enough progress to even enable well-understood interaction techniques that are widely applied in “traditional” Web applications.
Topics
Opportunities to Optimize the Response
Times of Traversal-based Query Executions
Making the Core Fragment of SPARQL
Suitable for the Task
Chapter 2
Implementation Approach
[Diagram: Data Retrieval Operator, Dispatcher, Triple Pattern Operators; example triple pattern: ( ?v1, knows, ?v2 )]
[Diagram: Data Retrieval Operator (GET http://example.org/...), Dispatcher, Triple Pattern Operators; example RDF triple: ( Bob, knows, Alice ); example triple pattern: ( ?v1, knows, ?v2 )]
Properties
[Diagram: Data Retrieval, Dispatcher, TP Operators, Output]
● Supports:
– any reachability-based query semantics
● Highly flexible
– routing of intermediate solutions
● Inspired by “Eddies”
– Avnur & Hellerstein, SIGMOD 2000
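The dispatcher idea can be illustrated with a minimal Python sketch. It is not the implementation behind the talk: the class `Dispatcher`, the function `match`, and the two routing policies are hypothetical names, and the sketch assumes a purely conjunctive query with set semantics. Each retrieved triple is pushed in, stored by every triple pattern operator it matches, and the resulting intermediate solutions are routed through the remaining operators, so that solutions are reported as soon as they are complete.

```python
def match(tp, triple):
    """Match one triple against a triple pattern; terms starting with '?' are variables."""
    mu = {}
    for term, value in zip(tp, triple):
        if term.startswith("?"):
            if mu.setdefault(term, value) != value:
                return None  # same variable bound to two different values
        elif term != value:
            return None
    return mu


def compatible(m1, m2):
    """Two solution mappings are compatible if they agree on shared variables."""
    return all(m2[v] == m1[v] for v in m1.keys() & m2.keys())


class Dispatcher:
    """Hypothetical dispatcher: one store per triple pattern operator."""

    def __init__(self, patterns, routing_policy):
        self.patterns = patterns
        self.routing_policy = routing_policy
        self.stores = [[] for _ in patterns]   # mappings seen by each operator
        self.reported = set()                  # avoids reporting a solution twice

    def push(self, triple):
        """Process one retrieved triple; return solutions that became available."""
        pending = []
        for i, tp in enumerate(self.patterns):
            mu = match(tp, triple)
            if mu is not None:
                self.stores[i].append(mu)
                pending.append((mu, frozenset([i])))
        new_solutions = []
        while pending:
            mu, done = pending.pop()
            if len(done) == len(self.patterns):
                key = frozenset(mu.items())
                if key not in self.reported:
                    self.reported.add(key)
                    new_solutions.append(mu)
                continue
            remaining = [j for j in range(len(self.patterns)) if j not in done]
            j = self.routing_policy(mu, remaining)  # choose the next operator
            for other in self.stores[j]:
                if compatible(mu, other):
                    pending.append(({**mu, **other}, done | {j}))
        return new_solutions


def route_first(mu, remaining):
    return remaining[0]


def route_last(mu, remaining):
    return remaining[-1]


PATTERNS = [("?v1", "knows", "?v2"), ("?v2", "knows", "?v3")]
```

Pushing ( Bob, knows, Alice ) and then ( Alice, knows, Carol ) into a `Dispatcher(PATTERNS, route_first)` reports one solution on the second call; swapping in `route_last` changes only the order in which operators are visited, not the result set.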
Hypothesis 1
Response times can be reduced
by applying a suitable routing policy.
Test of Different Routing Policies
Setup:
● Data retrieval operator simply appends to its lookup queue
● Web simulation environment (test Web: W-62-47, test query: Q1, details: [Hartig and Özsu 2014])
● Each bar represents geometric mean of 5 separate executions
[Charts: response time for the first reported solution and response time for the last reported solution, each relative to overall QET, per routing policy]
Routing policy has no impact!
Hypothesis 1
Response times can be reduced
by applying a suitable routing policy.
No!
Why?
Data Retrieval Dominates!!!
[Bar chart: avg. query exec. time (seconds), log scale from 0.1 to 100000, for Query 1, Query 4, Query 5, Query 9, and Query 10; series: 10 threads, 20 threads, cache]
5 queries of the FedBench benchmark suite, executed over real Linked Data on the WWW
● 10 threads / 20 threads: different number of lookup threads used by the data retrieval operator
● cache: data retrieval op. equipped with a cache
– Cache populated by a first execution
– Times measured for a 2nd, cache-only execution (i.e., data retrieval deactivated)
Hypothesis 2
Response times can be reduced
by choosing a “good” strategy
of prioritizing URI lookups.
Prioritizing Lookups Randomly
[Plot: time from begin of the query execution (in minutes, 0 to 35) against result elements (0 to 6), for five executions exec1 to exec5, with QET marked; annotations: ca. 25% of QET, ca. 58%]
Setup:
● LD10 of the FedBench benchmark suite, over real Linked Data on the WWW
Hypothesis 2
Response times can be reduced
by choosing a “good” strategy
of prioritizing URI lookups.
√
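To make "strategy of prioritizing URI lookups" concrete, here is a minimal Python sketch of a lookup queue with a pluggable priority function. The class `LookupQueue` and the functions `fifo` and `make_random_priority` are hypothetical names introduced for illustration; the talk does not prescribe this design, and no particular strategy is claimed to be "good".

```python
import heapq
import itertools
import random


class LookupQueue:
    """Queue of URIs awaiting lookup; the lowest priority value is served first."""

    def __init__(self, priority_fn):
        self._priority_fn = priority_fn
        self._heap = []
        self._seen = set()                 # each URI is looked up at most once
        self._counter = itertools.count()  # tie-breaker: insertion order

    def add(self, uri):
        """Enqueue a URI; return False if it was already queued before."""
        if uri in self._seen:
            return False
        self._seen.add(uri)
        entry = (self._priority_fn(uri), next(self._counter), uri)
        heapq.heappush(self._heap, entry)
        return True

    def pop(self):
        """Return the URI to look up next."""
        return heapq.heappop(self._heap)[2]

    def __len__(self):
        return len(self._heap)


def fifo(uri):
    """Baseline: equal priority, so the tie-breaker yields simple appending."""
    return 0


def make_random_priority(seed=None):
    """Random prioritization, as in the experiment described above."""
    rng = random.Random(seed)
    return lambda uri: rng.random()
```

With `fifo` the queue behaves like the plain append-only lookup queue; passing `make_random_priority()` or any other function from URI to number swaps the strategy without touching the data retrieval loop.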
Question
Response times can be reduced
by choosing a “good” strategy
of prioritizing URI lookups.
√
What is a “good” strategy?
Chapter 3
Topics
Opportunities to Optimize the Response
Times of Traversal-based Query Executions √
Making the Core Fragment of SPARQL
Suitable for the Task
(by making it monotonic)
Monotonicity?
● Query Q is monotonic if for every pair ( D1, D2 ) of
possible databases, it holds that:
D1 ⊆ D2 ⟹ Q( D1 ) ⊆ Q( D2 )
● Example: the SPARQL pattern
P = (a, p, ?x) OPT (?x, p, ?y)
is not monotonic
– G1 = { (a, p, b) }
– G2 = { (a, p, b), (b, p, c) }
– ⟦P⟧G1 = { μ }, where μ = { ?x → b }
– ⟦P⟧G2 = { μ' }, where μ' = { ?x → b, ?y → c } ≠ μ !
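The example can be checked mechanically. The following Python sketch evaluates the pattern under the standard set-based OPT semantics (join, plus the mappings of the left side that have no compatible partner) on G1 and G2. All function names (`match`, `join`, `minus`, `opt`, `evaluate`) are hypothetical helpers written for this illustration, not part of any SPARQL library.

```python
def match(tp, graph):
    """Evaluate a triple pattern over a graph; '?'-prefixed terms are variables.
    Solution mappings are represented as frozensets of (variable, value) pairs."""
    result = set()
    for triple in graph:
        mu = {}
        for term, value in zip(tp, triple):
            if term.startswith("?"):
                if mu.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            result.add(frozenset(mu.items()))
    return result


def compatible(m1, m2):
    """Mappings are compatible if they agree on all shared variables."""
    d1, d2 = dict(m1), dict(m2)
    return all(d1[v] == d2[v] for v in d1.keys() & d2.keys())


def join(o1, o2):
    return {m1 | m2 for m1 in o1 for m2 in o2 if compatible(m1, m2)}


def minus(o1, o2):
    """Mappings of o1 that have no compatible mapping in o2."""
    return {m1 for m1 in o1 if not any(compatible(m1, m2) for m2 in o2)}


def opt(o1, o2):
    return join(o1, o2) | minus(o1, o2)


TP1 = ("a", "p", "?x")
TP2 = ("?x", "p", "?y")
G1 = {("a", "p", "b")}
G2 = {("a", "p", "b"), ("b", "p", "c")}


def evaluate(graph):
    """Evaluate P = (a, p, ?x) OPT (?x, p, ?y) over the given graph."""
    return opt(match(TP1, graph), match(TP2, graph))


r1 = evaluate(G1)   # { {?x -> b} }
r2 = evaluate(G2)   # { {?x -> b, ?y -> c} }
print(G1 <= G2, r1 <= r2)
```

Although G1 ⊆ G2 holds, the result over G1 is not contained in the result over G2, which is exactly the violation of monotonicity described above.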
What is the Issue?
● For any non-monotonic query, elements of
the result set can be output only after we
have seen all query-relevant parts of the DB
– Hence, since we discover our DB (the Web of Data)
at runtime, we can output result elements only after
completing the discovery process
● Good news: the AND-UNION-FILTER fragment of
SPARQL is monotonic [Arenas and Perez 2011]
● Bad news: for the AND-UNION-FILTER-OPT fragment,
monotonicity is undecidable [Hartig 2014]
– i.e., queries with OPT may be non-monotonic
What is the Usage of OPT?
● DBpedia
– 46.4% of ca. 1.3M unique queries
(logs from Apr. – Jul. 2010)
Picalausa and Vansummeren, in SWIM 2011
– 16.6% (logs from USEWOD 2011 dataset)
Gallego et al., in USEWOD 2011
– 15% (logs from USEWOD 2011 dataset)
Elbedweihy et al., in COLD 2011
● Semantic Web conference corpus (SWDF)
– 0.4% (logs from USEWOD 2011 dataset)
Gallego et al., in USEWOD 2011
A Proposal: The OPT+ Operator
● Recall our example: the SPARQL pattern
P = (a, p, ?x) OPT (?x, p, ?y)
is not monotonic
● ⟦P1 OPT P2⟧G = ( ⟦P1⟧G ⋈ ⟦P2⟧G ) ∪ ( ⟦P1⟧G \ ⟦P2⟧G )
● ⟦P1 OPT+ P2⟧G = ( ⟦P1⟧G ⋈ ⟦P2⟧G ) ∪ ⟦P1⟧G
➔ P1 OPT+ P2 ≡ (P1 AND P2) UNION P1
A Proposal: The OPT+ Operator
● Recall: Query Q is monotonic if for every pair ( D1, D2 ) of
possible databases, it holds that:
D1 ⊆ D2 ⟹ Q( D1 ) ⊆ Q( D2 )
● Recall our example: with OPT+ instead of OPT, the SPARQL pattern
P' = (a, p, ?x) OPT+ (?x, p, ?y)
is monotonic √
– G1 = { (a, p, b) }, G2 = { (a, p, b), (b, p, c) }
– ⟦P'⟧G1 = { μ }, where μ = { ?x → b }
– ⟦P'⟧G2 = { μ, μ' }, where μ' = { ?x → b, ?y → c }
– hence ⟦P'⟧G1 ⊆ ⟦P'⟧G2
● ⟦P1 OPT P2⟧G = ( ⟦P1⟧G ⋈ ⟦P2⟧G ) ∪ ( ⟦P1⟧G \ ⟦P2⟧G )
● ⟦P1 OPT+ P2⟧G = ( ⟦P1⟧G ⋈ ⟦P2⟧G ) ∪ ⟦P1⟧G
➔ P1 OPT+ P2 ≡ (P1 AND P2) UNION P1 √
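The effect of OPT+ on the running example can be sketched in Python as follows. The helpers (`match`, `join`, `opt_plus`, `evaluate_opt_plus`, `evaluate_and_union`) are hypothetical names for this illustration and assume set semantics; the only difference to standard OPT is that the left-hand mappings are always kept, instead of only those without a compatible partner.

```python
def match(tp, graph):
    """Evaluate a triple pattern over a graph; '?'-prefixed terms are variables.
    Solution mappings are represented as frozensets of (variable, value) pairs."""
    result = set()
    for triple in graph:
        mu = {}
        for term, value in zip(tp, triple):
            if term.startswith("?"):
                if mu.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            result.add(frozenset(mu.items()))
    return result


def compatible(m1, m2):
    """Mappings are compatible if they agree on all shared variables."""
    d1, d2 = dict(m1), dict(m2)
    return all(d1[v] == d2[v] for v in d1.keys() & d2.keys())


def join(o1, o2):
    return {m1 | m2 for m1 in o1 for m2 in o2 if compatible(m1, m2)}


def opt_plus(o1, o2):
    """OPT+ as defined above: the join, united with ALL left-hand mappings."""
    return join(o1, o2) | o1


TP1 = ("a", "p", "?x")
TP2 = ("?x", "p", "?y")
G1 = {("a", "p", "b")}
G2 = {("a", "p", "b"), ("b", "p", "c")}


def evaluate_opt_plus(graph):
    """P' = (a, p, ?x) OPT+ (?x, p, ?y)"""
    return opt_plus(match(TP1, graph), match(TP2, graph))


def evaluate_and_union(graph):
    """((a, p, ?x) AND (?x, p, ?y)) UNION (a, p, ?x)"""
    o1, o2 = match(TP1, graph), match(TP2, graph)
    return join(o1, o2) | o1


s1 = evaluate_opt_plus(G1)   # { mu }
s2 = evaluate_opt_plus(G2)   # { mu, mu' }
print(s1 <= s2)
```

Because the mapping μ found over G1 is still part of the result over G2, it could be shown to the user as soon as it is found and later complemented by μ'.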
Epilogue
Conclusions
● Returning result elements early has not yet received sufficient attention in existing work on live querying the Web of Data
● Prioritizing data retrieval can reduce response times of traversal-based query executions
– What approaches are suitable and effective?
– Similar for federated query processing, LDFs?
● Language features have to be chosen with care
– Their impact has to be studied
– Dedicated optimization techniques are possible