"Although the cloud of Linked Open Data has been growing continuously for several years, little is known about the particular features of linked data usage. Motivating why it is important to understand the usage of Linked Data, we describe typical linked data usage scenarios and contrast the so derived requirement with conventional server access analysis. Then, we report on usage patterns found through an in-depth analysis of access logs of four popular LOD datasets. Eventually, based on the usage patterns we found in the analysis, we propose metrics for assessing Linked Data usage from the human and the machine perspective, taking into account different agent types and resource representations."
Slides for a presentation at WebScience 2010. The paper is available for download at http://journal.webscience.org/302/.
Learning from Linked Open Data Usage
1. Copyright 2010 Knud Möller
Except where otherwise noted, this work is licensed under
http://creativecommons.org/licenses/by-sa/3.0/
Learning from Linked Open Data Usage:
Patterns & Metrics
Knud Möller, Michael Hausenblas, Richard Cyganiak,
Gunnar Grimnes, Siegfried Handschuh
WebScience 2010, Raleigh, NC, USA
26/04/2010
Copyright 2010 Digital Enterprise Research Institute. All rights reserved.
Monday 26 April 2010
2. What is Linked (Open) Data? (in <1 minute)
•Conventional “Eye-ball” Web:
– interlinked documents
– mainly people / Web browsers
•Web of Linked Data:
– interlinked items of data (URIs, RDF)
– mainly machine agents
3. What is Linked (Open) Data? (in <1 minute)
Linked Open Data cloud (the set of interlinked, Semantic
Web datasets)
[Figures: the Linked Open Data cloud in February 2008 and in July 2009]
Source: http://esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenData
4. Question: How is Linked Data being Used?
•plenty of research on conventional Web usage
•what about usage of linked data?
Why?
•how healthy is the Web of linked data?
•who is using the data and how? Is it useful? Are there
trends?
•providers: improve hosting
•... just curiosity!
5. Question: How is Linked Data being Used? (ctd.)
•webometrics?
6. Approach
•particular sites:
– a URI for each data item ➙ a request for each data item
(resource)
– content negotiation best practices
– redirection (HTTP 303)
7. Approach (ctd.)
•example (content negotiation with HTTP 303 redirection):
– plain resource URI: http://data.semanticweb.org/conference/www/2009
– RDF document URI: http://data.semanticweb.org/conference/www/2009/rdf
– HTML document URI: http://data.semanticweb.org/conference/www/2009/html
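The redirection from a plain resource URI to an RDF or HTML document URI can be sketched roughly as follows. This is a minimal illustration, not the code of any of the analysed servers: the function names, the list of RDF media types and the fallback to HTML are assumptions, and only the `/rdf` and `/html` suffixes are taken from the data.semanticweb.org example.

```python
# Hypothetical sketch of content negotiation with an HTTP 303 redirect.
# RDF_TYPES and the HTML fallback are assumptions made for illustration.
RDF_TYPES = {"application/rdf+xml", "text/turtle", "text/n3"}
HTML_TYPES = {"text/html", "application/xhtml+xml", "*/*"}


def parse_accept(header: str) -> list[tuple[str, float]]:
    """Return (media type, q) pairs sorted by descending q value."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in fields[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        prefs.append((media_type, q))
    # sorted() is stable, so equal q values keep their header order
    return sorted(prefs, key=lambda p: -p[1])


def negotiate(resource_uri: str, accept: str) -> tuple[int, str]:
    """Return (status, Location) for a request to a plain resource URI."""
    for media_type, q in parse_accept(accept):
        if q <= 0:
            continue
        if media_type in RDF_TYPES:
            return 303, resource_uri + "/rdf"
        if media_type in HTML_TYPES:
            return 303, resource_uri + "/html"
    # Assumed default: treat the client as an ordinary Web browser
    return 303, resource_uri + "/html"
```

For example, `negotiate("http://data.semanticweb.org/conference/www/2009", "application/rdf+xml")` yields a 303 redirect to the `/rdf` document URI, while a typical browser `Accept` header leads to the `/html` document URI.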
9. Source Data
Figure 1: The combined log format ([…] Response Code, Response Size, Referrer, User Agent)
Table 1: Overview of four LOD datasets (in parentheses: average per day)
              # triples     # days   total # hits   # plain hits   # RDF hits   # HTML hits   SPARQL
Dog Food      79,175        597      8,427,967      1,923,945      259,031      1,647,205     879,932
                                     (14,117)       (3,223)        (434)        (2,759)       (1,471)
DBpedia       109,750,000   118      87,203,310     22,821,475     7,008,310    22,999,237    20,972,630
                                     (739,011)      (193,402)      (59,392)     (194,909)     (177,734)
DBTune        74,209,000    61       7,467,125      1,952,185      1,135,509    677,904       3,055,493
                                     (122,412)      (32,003)       (18,615)     (11,113)      (50,090)
RKBExplorer   91,501,684    29       529,938        —              —            —             9,327
                                     (18,274)       (—)            (—)          (—)           (322)
[Figure: pie charts of request types (Plain, HTML, RDF, Semantic) for DBpedia, DBTune and Dog Food. Legible shares, assignment to datasets not recoverable: Plain 47.7%, 45%, 41.0%; HTML 46.5%, 39.9%, 51.1%; RDF 5.8%, 14.9%, 7.8%; Semantic 2.8%, 4.2%, 2.5%]
Excerpts from the paper shown on the slide (cut off at the margins; […] marks lost text):
"[…] are served. For our evaluation, we had access to log […] two periods: from 24/05/2009–21/06/2009 and from […]2009–29/10/2009, i.e., roughly two months."
"RKBExplorer [11] is another meta-dataset currently com[…] 44 sub-datasets covering various topics and sources […] the domain of academic research, as well as a Web application that allows users to access and browse its content […] integrated fashion. Both RDF and HTML documents […] the resources in all datasets are available. Apart from […] linked data, the site also features a module that […] co-reference resolution functionality [10]. For our evaluation, we had access to log files in the period from […]2009–21/06/2009, i.e., roughly one month. However, […]"
"[…]taining a SPARQL query, we assume that it is capable of handling the query result, i.e., either a […] bindings (in the case of a SELECT query), potentially containing URIs of RDF resources, or an RDF […] (in the case of a CONSTRUCT or DESCRIBE query)."
"• RDF requests: if an agent directly requests […] from a server, we assume that it knows how to process data in this format. Directly here means that the agent specified an RDF syntax such as rd[…] as an acceptable response in the header of its request. Merely requesting the URI of an RDF representation does not suffice to indicate semanticity, as this […] simply mean that the agent followed a link to that representation."
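As a rough illustration of how log entries in the combined log format could be turned into the hit types counted in Table 1, consider the sketch below. It is not the analysis code behind the slides: the regular expression, the function names and the rule of classifying by the `/rdf`, `/html` and `/sparql` path suffixes are assumptions. The sketch only assigns hit types; it does not attempt to decide whether a request is semantic.

```python
import re

# Assumed pattern for one line in the combined log format:
# host ident user [time] "request" status size "referrer" "user agent"
LOG_RE = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_line(line):
    """Parse one log line into a dict, or return None if it does not match."""
    match = LOG_RE.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    # The request field looks like: GET /some/path HTTP/1.1
    parts = entry["request"].split()
    entry["path"] = parts[1] if len(parts) >= 2 else ""
    return entry


def classify(path):
    """Map a request path to a hit type (suffix convention is an assumption)."""
    base = path.split("?", 1)[0].rstrip("/")
    if base.endswith("/sparql"):
        return "sparql"
    if base.endswith("/rdf"):
        return "rdf"
    if base.endswith("/html"):
        return "html"
    return "plain"


# Made-up example line, not taken from the analysed logs
sample = ('192.0.2.1 - - [24/May/2009:10:00:00 +0000] '
          '"GET /conference/www/2009/rdf HTTP/1.1" 200 5120 "-" "ExampleBot/1.0"')
entry = parse_line(sample)
hit_type = classify(entry["path"])
```

Counting `hit_type` values per day and per user agent would give figures of the kind shown in the table, under the stated assumptions about URI layout.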
10. Agents: Ordinary Traffic
http://data.semanticweb.org, 21/07/2008 - 20/06/2009
ordinary traffic: the usual suspects
[Figure: bar chart of hits per agent, SW Dog Food (21/07/2008 - 20/06/2009); x-axis: agents (0 to 30), y-axis: hits (0 to 500000); legible bar labels include Googlebot, Yahoo! Slurp, msnbot and SindiceFetcher]
12. Is Demand for LOD increasing?
[Figure: Dog Food Hits over Time (smoothing factor 0.05); series: plain, html, rdf, semantic; x-axis: 2008-07-01 to 2010-05-01; y-axis: hits, 0 to 6000]
•no increase for semantic requests
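The hits-over-time plots are labelled with a smoothing factor of 0.05, but the slides do not name the smoothing method. One common reading is simple exponential smoothing, sketched below purely as an assumption; the function name and the choice to start from the first observation are illustrative.

```python
def exponential_smoothing(values, alpha=0.05):
    """Smooth a series of daily hit counts.

    Each output value is alpha * current + (1 - alpha) * previous output;
    the first output equals the first input (an assumed initialisation).
    """
    smoothed = []
    for value in values:
        if not smoothed:
            smoothed.append(float(value))
        else:
            smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed
```

With a factor as small as 0.05, a single day with unusually many hits moves the curve only slightly, so only sustained changes in demand become visible.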
13. Is Demand for LOD increasing? (ctd.)
[Figure: DBpedia Hits over Time (smoothing factor 0.05); series: plain, html, rdf, semantic; x-axis: 2009-06-20 to 2009-11-07; y-axis: hits, 0 to 300000]
•no increase for semantic requests
14. Do Real-world Events have an Impact on LOD Usage?
[Figure: Demand for Events (smoothing factor 0.05); series: iswc2008, www2009, eswc2009, iswc2009; x-axis: 2008-07-01 to 2010-05-01; y-axis: 0 to 700]
•possible impact
15. Do Real-world Events have an Impact on LOD Usage?
[Figure: Irish Lisbon Treaty Referendum (smoothing factor 0.05); x-axis: 2009-06-20 to 2009-11-07; y-axis: 0 to 9; series:]
– http://dbpedia.org/resource/Republic_of_Ireland
– http://dbpedia.org/resource/European_Union
– http://dbpedia.org/resource/Treaty_of_Lisbon
•possible impact
16. Do Real-world Events have an Impact on LOD Usage?
[Figure: Michael Jackson Memorial Service (smoothing factor 0.05); x-axis: 2009-06-20 to 2009-11-07; y-axis: 0 to 4.5; series:]
– http://dbpedia.org/resource/Staples_Center
– http://dbpedia.org/resource/Michael_Jackson_memorial_service
– http://dbpedia.org/resource/Michael_Jackson
•possible impact
17. Conclusion (of sorts)
•Generic approach for analysing usage of LOD sites (but
see below), based on server log files
•Metric for semanticity of agents
•Did not notice a rising demand for LOD (no increase in semantic requests)
•However: real-world events do seem to have an effect
on LOD usage
•Restrictions:
– does not work well with embedded metadata (e.g., RDFa-based
sites)
– does not take into account usage through meta sites (indexes,
search engines, ...)