Invited talk at USEWOD2014 (http://people.cs.kuleuven.be/~bettina.berendt/USEWOD2014/)
A tremendous amount of machine-interpretable information is available in the Linked Open Data Cloud. Unfortunately, much of this data remains underused, as machine clients struggle to use the Web. I believe this can be solved by giving machines interfaces similar to those we offer humans, instead of separate interfaces such as SPARQL endpoints. In this talk, I'll discuss the Linked Data Fragments vision on machine access to the Web of Data, and indicate how this impacts usage analysis of the LOD Cloud. We can all learn a lot from how humans access the Web, and those strategies can be applied to querying and analysis. In particular, we should first solve the use cases that humans can handle easily, and only then consider tackling the others.
12. Currently, there are three ways
to provide access to a Linked Data dataset.
SPARQL endpoint
data dump
Linked Data documents
13. Those three ways have one thing in common:
they offer fragments of a dataset.
SPARQL endpoint
data dump
Linked Data documents
14. Linked Data Fragments look
at all ways at the same time.
[Figure: LD document, data dump, and SPARQL result placed on a single axis that runs from specific queries (high server effort, low availability) to generic requests (high client effort, high availability).]
15. Each type of Linked Data Fragment
is defined by three characteristics.
selector: What data does it contain?
metadata: What do we know about it?
controls: What can we do next?
16. Each type of Linked Data Fragment
is defined by three characteristics.
Linked Data Document
selector: a specific entity
metadata: creator, maintainer, …
controls: links to other LD documents
17. Each type of Linked Data Fragment
is defined by three characteristics.
SPARQL CONSTRUCT result
selector: a SPARQL query
metadata: (none)
controls: (none)
18. Each type of Linked Data Fragment
is defined by three characteristics.
data dump
selector: everything
metadata: number of triples, file size
controls: (none)
19. Any API that provides triples
publishes Linked Data Fragments.
[Figure: LD document, data dump, and SPARQL result placed on a single axis that runs from specific queries (high server effort, low availability) to generic requests (high client effort, high availability).]
20. Can we define APIs that efficiently allow
SPARQL querying with high availability?
[Figure: LD document, data dump, SPARQL result, and basic LDFs placed on a single axis that runs from specific queries (high server effort, low availability) to generic requests (high client effort, high availability).]
21. A basic Linked Data Fragments API
offers triple-pattern-based access.
basic Linked Data Fragment
selector: a triple pattern
metadata: total number of matches
controls: access to all basic LDFs
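To make the three characteristics concrete, here is a minimal in-memory sketch of one basic Linked Data Fragment page. It assumes triples are plain Python tuples, a page size of 100, and a hypothetical URL template with `subject`, `predicate`, and `object` parameters (loosely modeled on the `/dbpedia?predicate=…` requests in the log slides); it is an illustration, not the interface of any existing server.

```python
from urllib.parse import urlencode

PAGE_SIZE = 100  # assumption: the server returns at most 100 triples per page

def matches(triple, pattern):
    """None in the pattern acts as a variable that matches any term."""
    return all(p is None or p == t for t, p in zip(triple, pattern))

def basic_ldf(store, pattern, page=1, base="/dbpedia"):
    """Return one page of the basic LDF for pattern = (subject, predicate, object)."""
    everything = [t for t in store if matches(t, pattern)]
    start = (page - 1) * PAGE_SIZE
    # Hypothetical parameter names; only the bound components become parameters.
    params = {name: term
              for name, term in zip(("subject", "predicate", "object"), pattern)
              if term is not None}
    has_next = start + PAGE_SIZE < len(everything)
    return {
        # Selector: every triple on this page matches the triple pattern.
        "data": everything[start:start + PAGE_SIZE],
        # Metadata: the total number of matches, not only those on this page.
        "metadata": {"count": len(everything)},
        # Controls: links built from the URL template, so any other basic LDF is reachable.
        "controls": {
            "next": base + "?" + urlencode({**params, "page": page + 1}) if has_next else None,
        },
    }
```

For example, `basic_ldf(store, (None, "foaf:name", '"York"@en'))` returns the first page of matches together with their total count, which is exactly the number a client reads before deciding which fragment to explore.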
23. Triple-pattern-based access to Linked Data
doesn't endanger a server's availability.
Easy to generate
Efficiently cacheable through HTTP
Low message entropy
compressed triple format HDT
24. The higher the message entropy,
the more interesting analysis becomes.
[Figure: LD document, data dump, SPARQL result, and basic LDFs placed on an axis from high message entropy (interesting for USEWOD?) to low message entropy (boring for USEWOD?).]
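"Message entropy" is used informally here. One way to make it measurable, which is our interpretation rather than a definition from the talk, is the Shannon entropy of the empirical distribution of request strings in an access log: a log where many clients repeat the same few triple-pattern URLs scores low, while a log of distinct SPARQL query strings scores high.

```python
import math
from collections import Counter

def message_entropy(requests):
    """Shannon entropy, in bits, of the empirical distribution of request strings."""
    counts = Counter(requests)
    total = len(requests)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

With this measure, a log that repeats one request has entropy 0, while four equally frequent distinct requests give 2 bits.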
28. SELECT ?person ?city WHERE {
?person a dbpedia-owl:Artist.
?person dbpedia-owl:birthPlace ?city.
?city foaf:name "York"@en.
}
How can we answer this query
using basic Linked Data Fragments?
29. Split the query based on
the available fragment types.
?person a dbpedia-owl:Artist.
?person dbpedia-owl:birthPlace ?city.
?city foaf:name "York"@en.
30. ?person a dbpedia-owl:Artist.
?person dbpedia-owl:birthPlace ?city.
?city foaf:name "York"@en.
dbpedia:York foaf:name "York"@en.
dbpedia:York,_Ontario foaf:name "York"@en.
…
dbpedia:Ganesh_Ghosh …:birthPlace dbpedia:Bengal_Presidency.
dbpedia:Jacques_L'enfant …:birthPlace dbpedia:Beauce.
…
dbpedia:Aamir_Zaki a dbpedia-owl:Artist.
dbpedia:Ahmad_Morid a dbpedia-owl:Artist.
…
Get the first page
of the corresponding fragments.
31. ?person a dbpedia-owl:Artist.
?person dbpedia-owl:birthPlace ?city.
?city foaf:name "York"@en.
dbpedia:York foaf:name "York"@en.
dbpedia:York,_Ontario foaf:name "York"@en.
…
dbpedia:Ganesh_Ghosh …:birthPlace dbpedia:Bengal_Presidency.
dbpedia:Jacques_L'enfant …:birthPlace dbpedia:Beauce.
…
dbpedia:Aamir_Zaki a dbpedia-owl:Artist.
dbpedia:Ahmad_Morid a dbpedia-owl:Artist.
…
Read the count metadata
of each fragment page.
±61,000
±470,000
12
32. ?person a dbpedia-owl:Artist.
?person dbpedia-owl:birthPlace ?city.
?city foaf:name "York"@en.
dbpedia:York foaf:name "York"@en.
dbpedia:York,_Ontario foaf:name "York"@en.
…
dbpedia:Ganesh_Ghosh …:birthPlace dbpedia:Bengal_Presidency.
dbpedia:Jacques_L'enfant …:birthPlace dbpedia:Beauce.
…
dbpedia:Aamir_Zaki a dbpedia-owl:Artist.
dbpedia:Ahmad_Morid a dbpedia-owl:Artist.
…
Take the smallest fragment,
start with its first match.
±61,000
±470,000
12
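The step on slides 31 and 32 can be sketched in a few lines: read the count of each pattern, pick the smallest, bind its first match, and substitute the bindings into the remaining patterns. This is a simplified illustration of the heuristic, assuming patterns are string tuples and that terms starting with `?` are variables; the helper names are hypothetical.

```python
def is_var(term):
    """Terms starting with '?' are variables; everything else is fixed."""
    return term.startswith("?")

def pick_smallest(counts):
    """counts: triple pattern -> count metadata read from its first fragment page."""
    return min(counts, key=counts.get)

def bind(pattern, triple):
    """Variable bindings that make pattern equal to triple, or None if impossible."""
    bindings = {}
    for p, t in zip(pattern, triple):
        if is_var(p):
            if bindings.setdefault(p, t) != t:
                return None  # the same variable would be bound to two different terms
        elif p != t:
            return None
    return bindings

def substitute(pattern, bindings):
    """Replace bound variables in a pattern by their values."""
    return tuple(bindings.get(term, term) for term in pattern)

# Counts as read on slide 31 (the first two are approximate).
artist = ("?person", "a", "dbpedia-owl:Artist")
born = ("?person", "dbpedia-owl:birthPlace", "?city")
named = ("?city", "foaf:name", '"York"@en')
counts = {artist: 61_000, born: 470_000, named: 12}

smallest = pick_smallest(counts)  # the fragment of things named "York"
first = bind(smallest, ("dbpedia:York", "foaf:name", '"York"@en'))
next_pattern = substitute(born, first)
```

Binding `?city` to `dbpedia:York` turns the large birth-place pattern into `?person dbpedia-owl:birthPlace dbpedia:York`, which is the query on the next slide.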
33. SELECT ?person WHERE {
?person a dbpedia-owl:Artist.
?person dbpedia-owl:birthPlace dbpedia:York.
dbpedia:York foaf:name "York"@en.
}
How can we answer this query
using basic Linked Data Fragments?
35. ?person a dbpedia-owl:Artist.
?person dbpedia-owl:birthPlace dbpedia:York.
dbpedia:John_Flaxman dbpo:birthPlace dbpedia:York.
dbpedia:Joseph_Hansom dbpo:birthPlace dbpedia:York.
…
dbpedia:Aamir_Zaki a dbpedia-owl:Artist.
dbpedia:Ahmad_Morid a dbpedia-owl:Artist.
…
Get the first page
of the corresponding fragments.
36. ?person a dbpedia-owl:Artist.
?person dbpedia-owl:birthPlace dbpedia:York.
dbpedia:John_Flaxman dbpo:birthPlace dbpedia:York.
dbpedia:Joseph_Hansom dbpo:birthPlace dbpedia:York.
…
dbpedia:Aamir_Zaki a dbpedia-owl:Artist.
dbpedia:Ahmad_Morid a dbpedia-owl:Artist.
…
Read the count metadata
of each fragment page.
±61,000
75
37. ?person a dbpedia-owl:Artist.
?person dbpedia-owl:birthPlace dbpedia:York.
dbpedia:John_Flaxman dbpo:birthPlace dbpedia:York.
dbpedia:Joseph_Hansom dbpo:birthPlace dbpedia:York.
…
dbpedia:Aamir_Zaki a dbpedia-owl:Artist.
dbpedia:Ahmad_Morid a dbpedia-owl:Artist.
…
±61,000
75
Take the smallest fragment,
start with its first match.
38. ASK {
dbpedia:John_Flaxman a dbpedia-owl:Artist.
dbpedia:John_Flaxman :birthPlace dbpedia:York.
dbpedia:York foaf:name "York"@en.
}
How can we answer this query
using basic Linked Data Fragments?
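The walkthrough on slides 30 to 38 can be folded into one recursive loop: bind variables from the smallest fragment, recurse on the remaining patterns, and treat a fully bound pattern as an ASK-style existence check. This is a simplified sketch of the idea, not the algorithm of any particular LDF client; it assumes an in-memory store instead of HTTP requests, uses `len` in place of the count metadata, and fetches whole fragments instead of first pages. The store below is toy data built from resource names on the slides.

```python
def is_var(term):
    return term.startswith("?")

def fragment(store, pattern):
    """Stand-in for downloading all pages of one basic LDF."""
    return [t for t in store
            if all(is_var(p) or p == x for p, x in zip(pattern, t))]

def substitute(pattern, bindings):
    return tuple(bindings.get(term, term) for term in pattern)

def extend(bindings, pattern, triple):
    """Bind the pattern's variables to the triple's terms; None on conflict."""
    new = dict(bindings)
    for p, x in zip(pattern, triple):
        if is_var(p) and new.setdefault(p, x) != x:
            return None
    return new

def evaluate(store, patterns, bindings=None):
    """Yield solution mappings, always expanding the pattern with the fewest matches."""
    bindings = {} if bindings is None else bindings
    if not patterns:
        yield bindings
        return
    bound = [substitute(p, bindings) for p in patterns]
    counts = [len(fragment(store, p)) for p in bound]  # plays the role of count metadata
    i = counts.index(min(counts))
    rest = patterns[:i] + patterns[i + 1:]
    # An empty fragment (for instance a failed ASK check) ends this branch.
    for triple in fragment(store, bound[i]):
        new = extend(bindings, bound[i], triple)
        if new is not None:
            yield from evaluate(store, rest, new)

store = [
    ("dbpedia:York", "foaf:name", '"York"@en'),
    ("dbpedia:York,_Ontario", "foaf:name", '"York"@en'),
    ("dbpedia:John_Flaxman", "dbpedia-owl:birthPlace", "dbpedia:York"),
    ("dbpedia:Joseph_Hansom", "dbpedia-owl:birthPlace", "dbpedia:York"),
    ("dbpedia:John_Flaxman", "a", "dbpedia-owl:Artist"),
    ("dbpedia:Aamir_Zaki", "a", "dbpedia-owl:Artist"),
]
query = [("?person", "a", "dbpedia-owl:Artist"),
         ("?person", "dbpedia-owl:birthPlace", "?city"),
         ("?city", "foaf:name", '"York"@en')]
```

Because `evaluate` is a generator, solutions come out one by one as soon as each branch succeeds; an ASK query is simply "does it yield at least once?".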
45. How can we analyze queries
from intelligent clients?
Client queries are different
Look at the logs
Treat machine clients as humans
47. Despite being on the Web, we use
public SPARQL endpoints like databases.
Ask a complex question.
Wait.
Process the answer.
48. When was the last time
you used the Web like that?
Ask a complex question.
Wait.
Process the answer.
49. On the Web, there are no final answers.
We ask questions and iteratively improve.
Ask simple questions.
Process answers as they arrive.
Create new questions.
50. Show a sorted list of names of Greek artists,
linked to their DBpedia page.
…
καλλιτέχνες (Greek for “artists”)
endpoint approach
fragment approach
52. endpoint approach
Show a sorted list of names of Greek artists,
linked to their DBpedia page.
SELECT DISTINCT ?person (MIN(?name) AS ?label)
WHERE {
?person a dbpedia-owl:Artist;
foaf:name ?name;
dbpedia-owl:birthPlace ?city.
?city dbpedia-owl:country dbpedia:Greece.
}
GROUP BY ?person
ORDER BY ?label
54. endpoint approach
Show a sorted list of names of Greek artists,
linked to their DBpedia page.
Consequences:
DISTINCT: keep all results in memory
MIN: keep all results in memory, blocking
ORDER BY: keep all results in memory, blocking
Doesn't matter; we're waiting anyway.
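The difference between "keep all results in memory" and "blocking" can be made concrete with Python generators. In the sketch below, a streaming DISTINCT emits each new result immediately but remembers everything it has seen, whereas ORDER BY cannot emit anything until the last result has arrived. The `source` generator is a hypothetical stand-in for results trickling in from a server.

```python
def distinct(results):
    """Streaming, but not free: yields new results at once, yet remembers all seen ones."""
    seen = set()
    for r in results:
        if r not in seen:
            seen.add(r)
            yield r

def order_by(results):
    """Blocking: sorted() must consume every result before the first one is emitted."""
    yield from sorted(results)

arrived = []  # records which results the upstream source has produced so far

def source(items):
    for item in items:
        arrived.append(item)
        yield item
```

Pulling the first result through `distinct(source(["c", "a", "c", "b"]))` only consumes one upstream item, while pulling the first result through `order_by(...)` on the same input consumes all four. With an endpoint, the client waits for the full answer anyway; a fragment-based client that wants to show results as they arrive has to avoid the blocking operators.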
55. fragment approach
Show a sorted list of names of Greek artists,
linked to their DBpedia page.
SELECT ?person ?name
WHERE {
?person a dbpedia-owl:Artist;
foaf:name ?name;
dbpedia-owl:birthPlace ?city.
?city dbpedia-owl:country dbpedia:Greece
}
No blocking operators; streaming is important.
57. Making the LOD cloud less lonesome
starts with embracing its open nature.
How meaningful is a sort anyway?
How meaningful is a single answer?
Build applications that react to data.
58. How can we analyze queries
from intelligent clients?
Client queries are different
Look at the logs
Treat machine clients as humans
59. Let's closely inspect the server logs
of the “Artists from York” query.
SPARQL:
http://dbpedia.org/sparql?query=SELECT+%3Fp+
%3Fc+WHERE+%7B%0D%0A++++%3Fp+a+
%3Chttp%3A%2F%2Fdbpedia.org%2Fontology
SELECT ?person ?city WHERE {
?person a dbpedia-owl:Artist.
?person dbpedia-owl:birthPlace ?city.
?city foaf:name "York"@en.
}
60. Let's closely inspect the server logs
of the “Artists from York” query.
basic Linked Data Fragments:
/dbpedia
/dbpedia?predicate=http%3A%2F%2Fxmlns.com%2Ffoa
/dbpedia?predicate=http%3A%2F%2Fwww.w3.org%2F1
/dbpedia?predicate=http%3A%2F%2Fdbpedia.org%2Fo
/dbpedia?predicate=http%3A%2F%2Fdbpedia.org%2Fo
/dbpedia?predicate=http%3A%2F%2Fdbpedia.org%2Fo
/dbpedia?predicate=http%3A%2F%2Fdbpedia.org%2Fo
/dbpedia?predicate=http%3A%2F%2Fdbpedia.org%2Fo
/dbpedia?predicate=http%3A%2F%2Fdbpedia.org%2Fo
/dbpedia?predicate=http%3A%2F%2Fdbpedia.org%2Fo
/dbpedia?predicate=http%3A%2F%2Fdbpedia.org%2Fo
61. Let's closely inspect the server logs
of the “Artists from York” query.
basic Linked Data Fragments:
?c foaf:name "York"@en.
?p rdf:type dbpedia-owl:Artist.
?p dbpedia-owl:birthPlace ?c.
?p dbpedia-owl:birthPlace dbpedia:York_(explorer).
?p dbpedia-owl:birthPlace dbpedia:York_railway_station.
?p dbpedia-owl:birthPlace dbpedia:28220_York.
?p dbpedia-owl:birthPlace dbpedia:York_(provincial_elect
?p dbpedia-owl:birthPlace dbpedia:York,_New_York.
dbpedia:Cornelius_R._Parsons rdf:type dbpedia-owl:Artist
dbpedia:John_R._McPherson rdf:type dbpedia-owl:Artist.
62. Access logs resulting from basic LDF clients
are hard to interpret.
Parallel requests, unclear dependencies
Full query hard to reconstruct
Was it SPARQL in the first place?
63. What would we do
if the users were humans?
Create a user profile
Use cookies
Check the Referer header
64. The Referer header tells us
the path the client has followed.
Interesting, still underused idea
Augmenting the Web of Data using Referers
by Hannes Mühleisen & Anja Jentzsch
It explains part of the “why”
Allows us to reconstruct dependencies
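Reconstructing dependencies from Referers amounts to building a tree: each request points to the page whose controls led to it. The sketch below assumes a log already parsed into `(requested_url, referer)` pairs, with shortened, hypothetical paths instead of absolute URLs. Even when a client fires requests in parallel, their Referers still reveal which fragment each one came from.

```python
from collections import defaultdict

def dependency_graph(log):
    """log: (requested_url, referer) pairs; referer is None for entry points."""
    roots, children = [], defaultdict(list)
    for url, referer in log:
        if referer is None:
            roots.append(url)
        else:
            children[referer].append(url)
    return roots, dict(children)

def referer_path(log, url):
    """Follow Referers back from url to the request that started the session."""
    parent = dict(log)  # if a URL occurs twice, its last Referer wins
    path = [url]
    # Stop at an entry point, or when a Referer would loop back into the path.
    while parent.get(path[-1]) is not None and parent[path[-1]] not in path:
        path.append(parent[path[-1]])
    return path[::-1]
```

For a log in which `/dbpedia` was visited first and a `foaf:name` fragment was reached from it, `referer_path` returns the chain from `/dbpedia` down to the requested fragment.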
66. These dependencies can help us
cache and prefetch.
Example observation:
After retrieving ?s <p> <o> patterns,
clients often ask for <s> rdfs:label ?l.
Example action:
Always add labels to concepts
in all responses.
cf. Caching and Prefetching Strategies for SPARQL queries
by Johannes Lorey and Felix Naumann
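The example action "always add labels to concepts in all responses" can be sketched as a post-processing step on a fragment page. This assumes the server has the whole dataset at hand as tuples, that literals start with a double quote, and that "concepts" means every IRI appearing as subject or object on the page; `with_labels` is a hypothetical helper, not part of any server.

```python
def with_labels(page, store):
    """Add rdfs:label triples for every IRI that occurs as subject or object on the page."""
    on_page = set(page)
    concepts = ({s for s, _, _ in page}
                | {o for _, _, o in page if not o.startswith('"')})  # skip literals
    labels = [t for t in store
              if t[1] == "rdfs:label" and t[0] in concepts and t not in on_page]
    return list(page) + labels
```

A client that would otherwise follow up with a `<s> rdfs:label ?l` request now finds the label in the response it already has, at the cost of slightly larger pages.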
67. However, the open-world assumption
can cause cardinality trouble.
SELECT ?person ?label WHERE {
?person a dbpedia-owl:Artist;
rdfs:label ?label.
}
fragment “?person a dbpedia-owl:Artist”:
dbpedia:Yannis_Markopoulos a dbpedia-owl:Artist;
rdfs:label "Yannis Markopoulos"@en.
dbpedia:Yanni a dbpedia-owl:Artist;
rdfs:label "Yanni"@en.
…
Are these all labels?
Should I ask for more?
68. The intent of this query
is probably different from its semantics.
SELECT ?person ?label WHERE {
?person a dbpedia-owl:Artist;
rdfs:label ?label.
}
With SPARQL endpoints, this doesn't matter.
Clients don't have to work more.
To optimize client usage patterns,
this difference is really important.
69. Referers only show part of the story.
Can we know more?
GET /dbpedia?o=dbpedia%3AGreece HTTP/1.1
User-Agent: curl/7.35.0
Host: data.linkeddatafragments.org
Accept: text/turtle
Referer: http://data.linkeddatafragments.org/dbpedia
X-Executed-Query: SELECT ?person ?label WHERE { ?person a dbpe
Inform the server what you're doing.
Then the server can help you better in the future.
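A client can attach this context with ordinary HTTP tooling. The sketch below only builds the request with Python's standard `urllib.request.Request`; it does not send it. `X-Executed-Query` is the non-standard header proposed on the slide, so a server that does not know it can simply ignore it. The query string in the usage example is the one from slide 67, chosen for illustration.

```python
from urllib.request import Request

def ldf_request(url, referer, executed_query):
    """Build (but do not send) a fragment request that tells the server what it is for."""
    return Request(url, headers={
        "Accept": "text/turtle",
        "Referer": referer,  # the fragment whose controls led to this request
        # Non-standard header from the slide; not defined by HTTP itself.
        "X-Executed-Query": executed_query,
    })
```

Note that `urllib` normalizes header names with `str.capitalize`, so the stored name is `X-executed-query`, and that is the spelling `get_header` expects.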
70. How can we analyze queries
from intelligent clients?
Client queries are different
Look at the logs
Treat machine clients as humans
71. My reflex when building machine clients
is to wonder: what would a human do?
I don't expect any server
to solve my queries;
I collect small pieces of information
to solve queries myself.
72. If you as a human use a website
and it doesn't work the way you want,
what would you do?
73. As a human, I would leave feedback.
I would comment, like, or upvote/downvote.
“I tried to find artists from Greece.
Finding out Greek citizens was easy,
but the artist checks went quite slow.
The total query took me 4 minutes,
whereas I would prefer 1 minute.”
(star rating)
Feedback is key
to improving a service.
74. Why don't we let machines
give feedback about their experience?
[ a f:ExperienceFeedback;
f:author _:agent;
f:subject _:query;
f:actualSituation [
f:duration "3m";
f:bandwidth "500KB"
];
f:desiredSituation [
f:duration "1m"
] ].
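A machine client could produce such a description automatically after running a query. The sketch below fills the slide's template and, as an added assumption, only does so when the query took longer than desired; durations are whole minutes. The `f:` vocabulary comes from the slide, which gives no namespace IRI, so a real client would still have to declare an `@prefix` for it.

```python
TEMPLATE = """[ a f:ExperienceFeedback;
f:author _:agent;
f:subject _:query;
f:actualSituation [
f:duration "{actual}m";
f:bandwidth "{bandwidth}"
];
f:desiredSituation [
f:duration "{desired}m"
] ]."""

def feedback(actual_minutes, desired_minutes, bandwidth):
    """Return a feedback description only when the client was not satisfied."""
    if actual_minutes <= desired_minutes:
        return None  # nothing to complain about
    return TEMPLATE.format(actual=actual_minutes, desired=desired_minutes,
                           bandwidth=bandwidth)
```

Calling `feedback(3, 1, "500KB")` reproduces the description on the slide, while a query that finished within the desired time produces no feedback at all.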
75. If clients are more intelligent than servers,
we have to analyze usage differently.
Enable clients to act smart
Creatively reuse human techniques
Learn from optimizations
feedback
77. Machine clients sending feedback?
What you say is total science fiction!
What's next, machine clients
that poke you on Facebook?
78. What I consider science fiction:
a public endpoint on the Web
that answers any question.
79. 99.9% of the time, a basic LDF client
solves this query in 3 seconds:
Which public SPARQL endpoints
could guarantee you that?
SELECT ?person ?city WHERE {
?person a dbpedia-owl:Artist.
?person dbpedia-owl:birthPlace ?city.
?city foaf:name "York"@en.
}
80. You cannot solve all queries
with basic Linked Data Fragments!
SELECT ?x ?l WHERE {
?x rdfs:label ?l.
FILTER REGEX(?l, "^A")
}
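Why this query is out of reach: a basic LDF only selects by triple pattern, so the regular expression cannot be sent to the server. The client has to download every page of the `?x rdfs:label ?l` fragment and filter locally. The sketch assumes 100 triples per page and labels given as their lexical forms; Python's `re` only approximates the XPath regex flavor that SPARQL's REGEX uses.

```python
import math
import re

PAGE_SIZE = 100  # assumption: triples per fragment page

def pages_to_download(total_matches):
    """Every page of ?x rdfs:label ?l must be fetched before filtering can finish."""
    return math.ceil(total_matches / PAGE_SIZE)

def regex_filter(label_triples, pattern="^A"):
    """Client-side FILTER REGEX(?l, pattern) over already downloaded triples."""
    regex = re.compile(pattern)
    return [(x, l) for x, _, l in label_triples if regex.search(l)]
```

The amount of work grows with the total number of labels in the dataset rather than with the number of answers, which is what makes such queries impractical with basic LDFs alone.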
81. The Semantic Web tried
to solve too much too fast.
The result is
a very lonesome LOD Cloud.
You can query anything,
but it never works.
82. Start with enabling tasks
humans could easily do.
SELECT ?x ?l WHERE {
?x rdfs:label ?l.
FILTER REGEX(?l, "^A")
}
83. Start with enabling tasks
humans could easily do.
SELECT ?person ?city WHERE {
?person a dbpedia-owl:Artist.
?person dbpedia-owl:birthPlace ?city.
?city foaf:name "York"@en.
}
84. Start with enabling tasks
humans could easily do.
“I tried to find artists from Greece.
Finding out Greek citizens was easy,
but the artist checks went quite slow.
The total query took me 4 minutes,
whereas I would prefer 1 minute.”
(star rating)
85. Start with enabling tasks
humans could easily do.
After that,
we'll talk about the rest.
86. The LOD usage community
can help create intelligent clients.
Put the intelligent servers aside,
enable clients to be intelligent.
Look at usage from the perspective
of intelligent clients.
87. The LOD Cloud is lonesome
because we gave
human and machine clients
different interfaces.
Let's make the simple things work.
Let's get the data used.