The document discusses enabling web clients to directly query linked data through low-cost publishing infrastructures. It proposes that linked data APIs can be simplified by defining data fragments that clients can query, such as triple pattern fragments which allow querying matches of a triple pattern along with metadata on total matches and access to other fragments. This lower-cost approach of publishing linked data through triple pattern fragments aims to make linked data publishing more sustainable and democratic.
1. METADATA AND CONTROL FEATURES
FOR LOW-COST LINKED DATA
PUBLISHING INFRASTRUCTURES
Public defense Drs. Miel Vander Sande
DEPARTMENT ELIS
RESEARCH GROUP IDLAB
10. Web Client and Web Server communicate over the HTTP protocol
Request: “Give me http://wikipedia.org/wiki/Umberto_Eco”
Response: “Here’s the document http://wikipedia.org/wiki/Umberto_Eco”
18. [Figure: graph of the facts behind the Wikipedia page of Umberto Eco: Umberto Eco (name, birthplace Alessandria), Renate Ramge (name, married), The Name of The Rose (name, author), and images that display the two persons]
20. [Figure: the same graph, extended with a film about The Name of The Rose, its director Jean-Jacques Annaud and its star Sean Connery]
21. [Figure: the same graph, divided into a Book database and a Film database]
22. [Figure: the same graph; the links that connect the Book database and the Film database are Linked Data]
23. Web applications execute queries to yield answers
What actor stars in films based on books by Umberto Eco?
24. [Figure: the query follows author, about and stars through the graph and reaches the name Sean Connery]
27. Interfaces between client and server:
Website: human-readable and understandable
Web API: machine-readable
Linked Data API: machine-understandable
28. Around 10,000 published
Linked Open Datasets.
But not many are directly queryable.
Most datasets require download.
Unavailable for at least 1.5 days per month.
29. It is partly an architectural problem
with economic repercussions.
Many data publishers are under-resourced,
looking for “good-enough” solutions.
33. Can we enable Web clients to
query Linked Data directly,
while lowering infrastructure cost
by simplifying Linked Data APIs,
thus making Linked Data publishing
more democratic and more sustainable?
34. How do Web clients
1 query published Linked Data today
2 query a Linked Data API with lower server cost
3 discover and query multiple low-cost Linked Data APIs
4 reproduce query results
37. [Figure: graph of the book data: Umberto Eco (name, birthplace Alessandria), Renate Ramge (name, married), The Name of The Rose (name, author), and two images that display the persons]
38. umberto.jpg displays umberto-eco
umberto-eco name “Umberto Eco”
umberto-eco birthplace alessandria
renate.jpg displays renate-ramge
renate-ramge name “Renate Ramge”
renate-ramge married umberto-eco
the-name-of-the-rose name “The name of the rose”
alessandria name “Alessandria”
the-name-of-the-rose author umberto-eco
39. Together, these statements form a Linked Dataset/Graph.
40. Each statement is a Triple: Subject Predicate Object
41. In a Triple, subject and predicate are URIs; the object is a URI (or Value):
http://dbpedia.org/resource/Umberto_Eco http://dbpedia.org/ontology/birthPlace http://dbpedia.org/resource/Alessandria
42. With prefixes, the same Triple is written as:
dbr:Umberto_Eco dbo:birthPlace dbr:Alessandria
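The prefixed notation of a triple can be sketched in Python as follows. This is only an illustration: the `PREFIXES` table and the `expand` helper are hypothetical names, and the two namespaces are assumed from the full DBpedia URIs shown on the slides.

```python
# Hypothetical prefix table, assumed from the full URIs shown on the slides.
PREFIXES = {
    "dbr": "http://dbpedia.org/resource/",
    "dbo": "http://dbpedia.org/ontology/",
}

def expand(term: str) -> str:
    """Expand a prefixed name such as 'dbr:Umberto_Eco' into a full URI."""
    prefix, sep, local = term.partition(":")
    if sep and prefix in PREFIXES:
        return PREFIXES[prefix] + local
    return term  # already a full URI, or a plain value

# A triple is simply (subject, predicate, object).
triple = tuple(expand(t) for t in ("dbr:Umberto_Eco", "dbo:birthPlace", "dbr:Alessandria"))
print(triple)
```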
47. Queries over Linked Data are written in SPARQL
What book was written by Umberto Eco?
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?bookname                      # “I want to select a value”
WHERE {
  ?person dbo:name "Umberto Eco".     # “I’m looking for somebody named ‘Umberto Eco’”
  ?book dbo:author ?person;           # “Some book has that somebody as author”
        dbo:name ?bookname.           # “That book must have a name”
}
49. Triple pattern: Variable URI Value (or URI)
?person name “Umberto Eco”
?book author ?person
?book name ?bookname
50. The first triple pattern is matched against all triples of the dataset:
?person name “Umberto Eco”
53. ?person name “Umberto Eco”
Triples with the predicate name:
umberto-eco name “Umberto Eco”
renate-ramge name “Renate Ramge”
the-name-of-the-rose name “The name of the rose”
alessandria name “Alessandria”
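Matching one triple pattern against the example dataset can be sketched as below. This is a minimal illustration, not the implementation of any SPARQL engine: the tuple representation, the `?` convention for variables and the `match` helper are assumptions made for this example.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]

# Example dataset from the slides as (subject, predicate, object) tuples.
DATASET: List[Triple] = [
    ("umberto.jpg", "displays", "umberto-eco"),
    ("umberto-eco", "name", "Umberto Eco"),
    ("umberto-eco", "birthplace", "alessandria"),
    ("renate.jpg", "displays", "renate-ramge"),
    ("renate-ramge", "name", "Renate Ramge"),
    ("renate-ramge", "married", "umberto-eco"),
    ("the-name-of-the-rose", "name", "The name of the rose"),
    ("alessandria", "name", "Alessandria"),
    ("the-name-of-the-rose", "author", "umberto-eco"),
]

def match(pattern: Triple, triple: Triple) -> Optional[Dict[str, str]]:
    """Return variable bindings if `triple` matches `pattern`, else None."""
    bindings: Dict[str, str] = {}
    for p, t in zip(pattern, triple):
        if p.startswith("?"):                   # variable: bind it
            if bindings.setdefault(p, t) != t:  # same variable, other value
                return None
        elif p != t:                            # constant: must be equal
            return None
    return bindings

pattern = ("?person", "name", "Umberto Eco")
results = [b for t in DATASET if (b := match(pattern, t)) is not None]
print(results)
```

Only one of the four name triples has the value “Umberto Eco” as object, so a single binding for ?person remains.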
56. Binding found: ?person: umberto-eco
Next triple pattern: ?book author ?person
57. Fill in the binding: ?book author umberto-eco
59. Bindings found: ?person: umberto-eco, ?book: the-name-of-the-rose
Next triple pattern: ?book name ?bookname
60. Fill in the binding: the-name-of-the-rose name ?bookname
61. Matching triple: the-name-of-the-rose name “The name of the rose”
Result: ?bookname: “The name of the rose”
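The step-by-step evaluation on these slides (find a binding, fill it into the next pattern, repeat) can be sketched as a small recursive function. This is an illustration under assumptions, not the algorithm of a specific engine: `match` and `evaluate` are hypothetical helpers and variables are marked with a leading `?`.

```python
# Example dataset from the slides as (subject, predicate, object) tuples.
DATASET = [
    ("umberto.jpg", "displays", "umberto-eco"),
    ("umberto-eco", "name", "Umberto Eco"),
    ("umberto-eco", "birthplace", "alessandria"),
    ("renate.jpg", "displays", "renate-ramge"),
    ("renate-ramge", "name", "Renate Ramge"),
    ("renate-ramge", "married", "umberto-eco"),
    ("the-name-of-the-rose", "name", "The name of the rose"),
    ("alessandria", "name", "Alessandria"),
    ("the-name-of-the-rose", "author", "umberto-eco"),
]

def match(pattern, triple):
    """Return the bindings that make `pattern` equal to `triple`, or None."""
    bindings = {}
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            if bindings.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return bindings

def evaluate(patterns, bindings=None):
    """Yield one dict of bindings per solution, taking patterns left to right."""
    bindings = bindings or {}
    if not patterns:
        yield bindings
        return
    # Fill the bindings found so far into the next pattern.
    current = tuple(bindings.get(term, term) for term in patterns[0])
    for triple in DATASET:
        found = match(current, triple)
        if found is not None:
            yield from evaluate(patterns[1:], {**bindings, **found})

QUERY = [
    ("?person", "name", "Umberto Eco"),
    ("?book", "author", "?person"),
    ("?book", "name", "?bookname"),
]
solutions = list(evaluate(QUERY))
print(solutions)
```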
62. The same three triple patterns, evaluated in two different orders:
?person name “Umberto Eco”
?book author ?person
?book name ?bookname
Order A: 1 + 1 + 1 = 3 operations
Order B: 4 + 1 + 4 = 9 operations
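69. Three interfaces between Client and Server, each with a Request and a Response over the network:
Data dump: the request is a filename
Linked Data document: the request is a URI
SPARQL Endpoint: the request is a SPARQL Query, the response contains the results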
70. Hunting for trade-offs between client & server:
Linked Data Fragments
Each API offered by the server offers specific fragments of a Linked Dataset.
data dump: low server cost, high client cost, high availability, high network traffic, out-of-date data
SPARQL endpoint: high server cost, low client cost, low availability, low network traffic, live data
Linked Data documents: between these two extremes
71. Each type of Linked Data Fragment is defined by
3 characteristics
data: What triples does it contain?
metadata: What do we know about it?
controls: How to access more data?
72. A low-cost API that enables clients to query:
Triple Pattern Fragments
low server cost, high availability, live data
[Figure: triple pattern fragments placed on the axis alongside data dump, Linked Data documents and SPARQL endpoint]
73. A low-cost API that enables clients to query
Triple Pattern Fragments
data: matches of a triple pattern (in pages)
metadata: total number of matches
controls: access to all other fragments
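The three parts of a triple pattern fragment can be sketched as a small data structure. This is a simplified illustration under assumptions: the `Fragment` class, the `triple_pattern_fragment` function, the page size and the use of a page number as the only control are all hypothetical and stand in for whatever a real API exposes.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]

@dataclass
class Fragment:
    data: List[Triple]         # data: matches of the triple pattern on this page
    total_count: int           # metadata: total number of matches
    next_page: Optional[int]   # control: where to find more data (None on the last page)

def triple_pattern_fragment(dataset, pattern, page=1, page_size=2):
    """Return one page of the fragment for `pattern`; None acts as a wildcard."""
    matches = [t for t in dataset
               if all(p is None or p == v for p, v in zip(pattern, t))]
    start = (page - 1) * page_size
    has_more = start + page_size < len(matches)
    return Fragment(matches[start:start + page_size], len(matches),
                    page + 1 if has_more else None)

DATASET = [
    ("umberto-eco", "name", "Umberto Eco"),
    ("renate-ramge", "name", "Renate Ramge"),
    ("the-name-of-the-rose", "name", "The name of the rose"),
    ("alessandria", "name", "Alessandria"),
    ("the-name-of-the-rose", "author", "umberto-eco"),
]
first = triple_pattern_fragment(DATASET, (None, "name", None), page=1)
last = triple_pattern_fragment(DATASET, (None, "name", None), page=2)
print(first)
print(last)
```

The server only filters and pages; everything else needed to answer a query is left to the client.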
82. How clients evaluate SPARQL over
Triple Pattern Fragments APIs
Client: SPARQL Layer, Fragment Layer, HTTP Layer
Server: TPF API, which returns triple pattern fragments
Give the client a SPARQL query and any fragment URI.
Clients look inside the fragment to see how to access the API.
Clients issue a request to the server for each triple pattern
and use the count metadata to determine in which order.
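Using the count metadata to choose an order can be sketched as follows. This is a static simplification and not the actual client algorithm: `count` is a hypothetical stand-in that computes locally what a client would read from the fragment metadata, and a real client may re-check counts after filling in bindings.

```python
DATASET = [
    ("umberto-eco", "name", "Umberto Eco"),
    ("renate-ramge", "name", "Renate Ramge"),
    ("the-name-of-the-rose", "name", "The name of the rose"),
    ("alessandria", "name", "Alessandria"),
    ("the-name-of-the-rose", "author", "umberto-eco"),
]

def count(pattern):
    """Stand-in for a fragment's count metadata: the number of matching triples."""
    return sum(
        all(p.startswith("?") or p == v for p, v in zip(pattern, triple))
        for triple in DATASET
    )

def order_by_count(patterns):
    """Most selective pattern first; sorted() is stable, so ties keep input order."""
    return sorted(patterns, key=count)

QUERY = [
    ("?book", "name", "?bookname"),
    ("?book", "author", "?person"),
    ("?person", "name", "Umberto Eco"),
]
ordered = order_by_count(QUERY)
for pattern in ordered:
    print(count(pattern), pattern)
```

Starting with the patterns that have the fewest matches keeps the number of intermediate bindings, and therefore the number of follow-up requests, small.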
83. The query throughput is lower,
but resilient to high client numbers.
[Fig. 3.1: Server performance (log-log plot): executed SPARQL queries per hour (throughput, q/hr) against number of clients (1 to 100); legend includes Virtuoso 6, Fuseki–tdb and triple pattern fragments]
84. The server uses much less CPU,
lowering the cost of server infrastructure.
[Fig. 3.5: Server processor usage per core: CPU use (%) against number of clients (1 to 100)]
85. The server traffic is higher,
but requests are significantly lighter.
[Fig. 3.2: Server network traffic: data sent by server in MB against number of clients (1 to 100), for Virtuoso 6, Virtuoso 7, Fuseki–tdb, Fuseki–hdt and triple pattern fragments]
86. For some queries, many requests are of
type “is this triple in the dataset?”
[Figure: The fraction of membership requests (0% to 100%) for 20 queries: linear (L1 to L5), star (S1 to S7), snowflake-shaped (F1 to F5) and complex (C1 to C3)]
91. >50% of the queries have fewer requests,
<20% have more requests.
Percentage of queries per AMF/query algorithm combination:
Combination         Fewer requests   Equal   More requests
Original + Bloom    59%              35%     6%
Original + GCS      62%              33%     5%
Optimized + Bloom   49%              33%     18%
Optimized + GCS     50%              32%     17%
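Bloom is one of the two filter types named on this slide. As a rough illustration of how such a filter can answer “is this triple in the dataset?” without one request per triple, here is a minimal Bloom filter. The class, its bit size, the number of hash functions and the SHA-256 based hashing are assumptions made for this sketch; they are not taken from the presentation, and GCS is not shown.

```python
import hashlib

class BloomFilter:
    """Approximate set: answers 'definitely not present' or 'possibly present'."""

    def __init__(self, size_bits=256, num_hashes=3):
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, item):
        # Derive num_hashes bit positions from salted SHA-256 digests.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def might_contain(self, item):
        return all((self.bits >> pos) & 1 for pos in self._positions(item))

triples = [
    "the-name-of-the-rose author umberto-eco",
    "renate-ramge married umberto-eco",
]
amf = BloomFilter()
for t in triples:
    amf.add(t)

# Added items are always reported as possibly present (no false negatives).
print([amf.might_contain(t) for t in triples])
```

A negative answer is certain, while a positive answer can be a false positive, which is why the results on request counts and execution time have to be read together.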
92. No queries show a reduction in execution time;
about a third even show an increase.
Percentage of queries per AMF/query algorithm combination:
Combination         Equal   Lower execution time   Higher execution time
Original + Bloom    84%     0%                     16%
Original + GCS      69%     0%                     31%
Optimized + Bloom   67%     0%                     33%
Optimized + GCS     62%     0%                     38%
96. A Web of Linked Data
[Figure: a network of TPF APIs]
99. Are low-cost Triple Pattern Fragments APIs a good fit for
a sustainable Web of Linked Data?
How to query multiple TPF APIs
How to discover relevant TPF APIs
100. Fragment mediator
A mediator enables the client to abstract
multiple Triple Pattern Fragment APIs
Client: SPARQL Layer, Fragment mediator, Fragment Layers A and B, HTTP Layers A and B
Server: one TPF API for Dataset A and one for Dataset B
Merge multiple Triple Pattern Fragments as one
Sum the count metadata
Eliminate sources that have no results
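The three mediator tasks listed above can be sketched for a single triple pattern as follows. This is a minimal illustration under assumptions: the `mediate` function and the representation of a fragment as a `(triples, count)` pair per source are hypothetical.

```python
def mediate(fragments):
    """Combine the fragments that several sources return for one triple pattern.

    `fragments` maps a source name to a (triples, count) pair.
    """
    # Eliminate sources that have no results for this pattern.
    relevant = {src: frag for src, frag in fragments.items() if frag[1] > 0}
    # Merge multiple fragments as one.
    merged = [t for triples, _ in relevant.values() for t in triples]
    # Sum the count metadata.
    total = sum(count for _, count in relevant.values())
    return merged, total, sorted(relevant)

fragments = {
    "Dataset A": ([("the-name-of-the-rose", "author", "umberto-eco")], 1),
    "Dataset B": ([], 0),
}
merged, total, sources = mediate(fragments)
print(merged, total, sources)
```

Because the merged result has the same shape as a single fragment, the SPARQL layer above it does not need to know how many APIs are involved.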
101. Execution times on a public network
are in range of the state of the art on a local network.
[Figure: Average execution time per query group in seconds (logarithmic, 1 to 100); query groups LD, CD, LS, C; systems Triple Pattern Fragments, ANAPSID, ANAPSID EG, FedX, SPLENDID]
102. Compared to the other systems,
more queries retrieve >90% of the results.
[Figure: Percentage of queries per system (Triple Pattern Fragments, ANAPSID, ANAPSID EG, FedX (warm), SPLENDID), grouped by share of results retrieved: 100%, 90 - 100%, 10 - 90%, 0 - 10%, 0%]
104. Exploit the links in Linked Data to let APIs
discover each other and inform the client.
[Figure: a network of TPF APIs]
107. Each Triple Pattern Fragments API creates
a summary of the dataset.
Per predicate, list the first part of the subject and object URIs.
Keep a sample URI for each external domain.
Example for the TPF API of geonames.org:
http://dbpedia.org, … | located in | http://geonames.org, …
… | … | …
Sample URI: http://dbpedia.org/resource/Louvre
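Building such a summary can be sketched as below. This is an illustration under assumptions: the `authority` and `summarize` helpers are hypothetical, "first part of the URI" is interpreted as scheme plus host, and the object URI in the example data is a made-up placeholder.

```python
from collections import defaultdict
from urllib.parse import urlsplit

def authority(term):
    """First part of a URI (scheme and host), or None for a plain value."""
    parts = urlsplit(term)
    return f"{parts.scheme}://{parts.netloc}" if parts.netloc else None

def summarize(triples, own_domain):
    """Per predicate, collect the first parts of subject and object URIs,
    and keep one sample URI for each external domain."""
    summary = defaultdict(lambda: {"subjects": set(), "objects": set()})
    samples = {}
    for s, p, o in triples:
        for role, term in (("subjects", s), ("objects", o)):
            auth = authority(term)
            if auth is None:
                continue  # a literal value, not a URI
            summary[p][role].add(auth)
            if own_domain not in auth:  # external domain: remember a sample
                samples.setdefault(auth, term)
    return dict(summary), samples

triples = [
    # The object URI below is a hypothetical placeholder.
    ("http://dbpedia.org/resource/Louvre", "located in", "http://geonames.org/example-place"),
]
summary, samples = summarize(triples, own_domain="geonames.org")
print(summary)
print(samples)
```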
118. [Figure: Number of needed requests (0 to 800) and discovery process time in minutes (0 to 7) per dataset: DBPedia subset, NY Times, LinkedMDB, Jamendo, Geonames, Semantic Web Dog Food, Drugbank, Kegg-ChEBI]
119. The number of retrieved results is low and
highly depends on what dataset is queried.
[Figure: Percentage of queries per dataset (DBPedia, NYTimes, LinkedMDB, Jamendo, Geonames, SWDF, Drugbank, Kegg-chebi), grouped by share of results retrieved: 100%, 90 - 100%, 10 - 90%, 0 - 10%, 0%, Unknown]
120. Discovery reduces query time for most,
but causes substantial overhead for some.
[Figure: Execution time per query in milliseconds (logarithmic, 1 to 1,000,000), no discovery compared with discovery]
122. [Figure: the linked graph of the Book database and the Film database: Umberto Eco, Alessandria, Renate Ramge, The Name of The Rose, a film about it, its director Jean-Jacques Annaud and its star Sean Connery]
123. [Figure: the same graph, with both databases labeled 2017]
124. [Figure: the same graph, with the databases labeled 2017 and 2018; Tom Hanks now also appears under stars]
125. Linked Datasets drift & produce different answers later on
What actor stars in films based on books by Umberto Eco?
[Figure, 2017: the query follows author, about and stars to the name Sean Connery]
126. [Figure, 2018: the same query; Tom Hanks now appears under stars next to Sean Connery]
127. Ensuring the reproducibility of query results
over Linked Data.
Sustain the
validity of claims
Backwards-compatible
applications
Version 1.0 Version 2.0
128. A pragmatic DBpedia archive can store
14 versions with 12% of the original size.
[Figure: Original data size (GB) and archived size (GB), 0 to 160 GB, per DBpedia version: 2.0, 3.0, 3.1, 3.2, 3.3, 3.4, 3.5, 3.6, 3.7, 3.8, 3.9, 2014, 2015-04, 2015-10]
Archive’s space (↓50%) and time-to-publish (↓20h / version)
significantly decreased for twice the number of triples (6 billion).
129. Querying a Triple Pattern Fragments API
without knowing what versions exist.
Client: SPARQL Layer, Fragment Layer, HTTP Layer
Server: TPF API, with the dataset in 2017 and the dataset now
Query in 2017: “Dataset in 2017 please”
132. Fragment mediator
Multiple Triple Pattern Fragment APIs
can be synced to a certain point in time.
Client: SPARQL Layer, Fragment mediator, Fragment Layers A and B, Memento Layers A and B
Server: one TPF API for Dataset A and one for Dataset B
Query in 2017:
“Dataset A in 2017 please”
“Dataset B in 2017 please”
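Asking for a dataset “in 2017” can be sketched in two small functions. This is an illustration under assumptions: it takes the Memento layer to mean datetime negotiation with an `Accept-Datetime` request header, and it assumes the server answers with the latest version at or before the requested moment. The helper names and the list of versions are hypothetical.

```python
from bisect import bisect_right
from datetime import datetime, timezone
from email.utils import format_datetime

def accept_datetime_header(moment):
    """Client side: express the requested moment as an HTTP date header."""
    as_utc = moment.astimezone(timezone.utc)
    return {"Accept-Datetime": format_datetime(as_utc, usegmt=True)}

def select_version(versions, moment):
    """Server side (assumed policy): latest version at or before `moment`.

    `versions` must be sorted in ascending order.
    """
    index = bisect_right(versions, moment)
    return versions[index - 1] if index else None

# Hypothetical archive with one version per year.
versions = [datetime(year, 1, 1, tzinfo=timezone.utc) for year in (2016, 2017, 2018)]
wanted = datetime(2017, 6, 1, tzinfo=timezone.utc)

print(accept_datetime_header(wanted))
print(select_version(versions, wanted))
```

Sending the same moment to every API is what lets a mediator sync Dataset A and Dataset B to one point in time.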
134. When interpreting differences between facts,
consider why facts change.
Multiple sources: “What is the number of awards won by Belgian academics?”
Single source: “What is the number of triples describing professor Jacques-Joseph Haus of Ghent University?”
[Figure: one timeline from 2008 to 2016 for each query]
136. Embrace the Web
and the diversity in publishers.
Many queries are answered within acceptable time,
and the query algorithm can still improve.
Enable clients to be intelligent, not servers.
Triple Pattern Fragments trade bandwidth and time
for low and stable CPU usage.
137. Rethink Web querying.
“Fast” is defined by the application
and when it needs the results.
In a public Web setting, other query languages
besides SPARQL might be (more) appropriate.
Continue the quest for metadata and interfaces
to cover more query use cases.
138. From physical integration
to virtual integration.
Triple Pattern Fragments is competitive as
infrastructure for querying multiple APIs.
Publishing archives can ensure reproducibility,
but caution is needed when interpreting change.
Lightweight APIs enable more Linked Data
publishers with maintained control.
139. Blur the distinction between
querying one or more APIs.
Exploiting Linked Data for API discovery is promising,
but clients need to consume links more intelligently.
Selecting relevant sources is an open challenge,
which could involve machine learning.
Dedicated discovery hubs that gather metadata
will be necessary for scale.