Finding and consuming (Linked) Open Data
1. Finding and consuming
(Linked) Open Data
Christophe Guéret (@cgueret)
March 8, 2012
http://latc-project.eu http://ehumanities.nl http://www.vu.nl
2. The next two hours
Open Data
What is it? Why open up data?
How to find Open Data
How to consume it
Hands-on session
Linked Data & Linked Open Data
What is it? Relation with Open Data?
How to get Linked Data
Ways to consume it
March 8, 2012 Finding and consuming (Linked) Open Data
3. Open Data
http://www.flickr.com/photos/-jvl-/4983920242
4. Open Data
“A piece of content or data is open if anyone is free to
use, reuse, and redistribute it — subject only, at
most, to the requirement to attribute and share-alike.”
http://opendefinition.org/
5. Why open up data?
Data has more value than applications
Data is used more when it is easier to use
Credit: Dorothea Salo, http://www.slideshare.net/cavlec/rdf-rda-and-other-tlas
6. Open Data for public institutions
Improve transparency
Active citizenship and data journalism
Create new opportunities
Develop need-focused applications almost for free
See all AppsforX challenges (Amsterdam, Nederland, …)
http://opendatachallenge.org/
Let businesses sell services around the data
Improve efficiency
Help share data within institutions
7. Open Data for researchers
Consider data as an asset
Like papers, data sets can be referenced
Like papers, open access increases usage
“Better” science
Reproducibility of experiments
Cross usage of data sets in different studies
Improve transparency (and decrease fraud?)
8. Data workflow
Search for the relevant data sets
Do data integration and clean up
Visualise and/or analyse the data
Re-publish integrated and curated data
9. Data workflow
Search for the relevant data sets
Do data integration and clean up
Visualise the data
Re-publish integrated and curated data
10. Three ways to search for data
Generic search engine with specific target
Use keywords or keywords + file type
Browse data archives
Focused around particular topic(s)
Explored by facets and keywords
Use data portals
“Yellow pages” for data archives, faceted search
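The "keywords + file type" approach can be sketched in a few lines. This is only an illustration: the function name `build_query` is hypothetical, and it assumes a search engine that understands the `filetype:` operator (Google does, for instance).

```python
def build_query(keywords, file_type=None):
    """Join keywords and optionally restrict results to one file type."""
    parts = list(keywords)
    if file_type:
        # Assumption: the target search engine supports the `filetype:` operator.
        parts.append(f"filetype:{file_type}")
    return " ".join(parts)


query = build_query(["literacy", "rate", "Tanzania"], "csv")
print(query)
```

The resulting string can be pasted into the search box as is; dropping the second argument gives a plain keyword search.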
11. Using a search engine
12. Data archive → Dryad
13. Data archive → Easy
14. Data portal → Overheid.nl
15. Data portal → Publicdata.eu
16. Data portal → Kasabi
17. Data catalogs
18. Data workflow
Search for the relevant data sets
Do data integration and clean up
Visualise the data
Re-publish integrated and curated data
19. Data integration
Unify the different data in a single format
XLS + PDF + CSV => CSV
Integrate the data
Connect the bits and pieces
Curate the data
Fix errors in the data
Process the data in preparation for its usage
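As a rough sketch of the "unify in a single format" and "curate" steps, the snippet below merges two small tables into one CSV. The two inputs are made-up examples (one comma-separated, one semicolon-separated with stray spaces and different header capitalisation), and `read_rows` is a hypothetical helper; real XLS or PDF sources would need dedicated tools first.

```python
import csv
import io

# Hypothetical inputs: the same kind of table exported by two different tools.
source_a = "city,country\nAmsterdam,Netherlands\n"
source_b = "City;Country\n Paris ;France\n"


def read_rows(text, delimiter):
    """Parse delimited text into dicts with lower-cased keys and trimmed values."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return [
        {key.strip().lower(): value.strip() for key, value in row.items()}
        for row in reader
    ]


# Unify: both sources now share the same column names.
rows = read_rows(source_a, ",") + read_rows(source_b, ";")

# Write everything back out as a single comma-separated table.
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=["city", "country"], lineterminator="\n")
writer.writeheader()
writer.writerows(rows)
unified = out.getvalue()
print(unified)
```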
20. Data integration
Unify the different data in a single format
XLS + PDF + CSV => CSV
Use Linked Data to save time there!
Integrate the data
Connect the bits and pieces
Curate the data
Fix errors in the data
Process the data in preparation for its usage
21. Data workflow
Search for the relevant data sets
Do data integration and clean up
Visualise the data
Re-publish integrated and curated data
22. Visualise data → DataMarket
23. Visualise data → Google explorer
24. Visualise data → Microsoft explorer
25. Visualise data → Wolfram Alpha
26. Data workflow
Search for the relevant data sets
Do data integration and clean up
Visualise the data
Re-publish integrated and curated data
27. Publish processed data
How?
Send to data archive
Publish on web sites
Why?
Re-usability
Community process (“if I do it, others will do it”)
Scientific process
28. Hands-on session
29. In 2001, what were the council election results in the county of Warwickshire (UK)?
30. How has the literacy rate in Tanzania evolved since 1988?
31. Can you make this plot of unemployment rates using the Google Public Data Explorer?
32. Linked Data & Linked Open Data
Linked Data
http://www.flickr.com/photos/erikcharlton/3337465138
33. What is the problem?
Frank and Christophe publish some open data
Roi wants to combine and enrich it
Frank's data (on the WWW):
Kennissen   Stad
Christophe  Amsterdam
Peter       Barcelona
David       Parijs
Christophe's data (on the WWW):
Ville      Pays
Barcelone  Espagne
Paris      France
Amsterdam  Pays-Bas
Marvel icons: mermer, DeviantArt
34. What is the problem?
Kennissen   Stad            Ville      Pays
Christophe  Amsterdam       Barcelone  Espagne
Peter       Barcelona   +   Paris      France     = ?
David       Parijs          Amsterdam  Pays-Bas
Data integration issue
“Kennissen”, “Stad”, “Ville”, “Pays” ?
“Paris” = “Parijs” ?
“Amsterdam” = “Amsterdam” ?
A lot of work for the data consumer
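The integration issue shows up as soon as the two tables are joined on the city name. The sketch below uses the values from the slide; the function `naive_join` is a hypothetical helper that matches city names as exact strings.

```python
# Frank's table (Kennissen, Stad) and Christophe's table (Ville, Pays).
franks_rows = [
    ("Christophe", "Amsterdam"),
    ("Peter", "Barcelona"),
    ("David", "Parijs"),
]
christophes_rows = {
    "Barcelone": "Espagne",
    "Paris": "France",
    "Amsterdam": "Pays-Bas",
}


def naive_join(people, countries):
    """Attach a country to each person by exact string match on the city name."""
    return [(name, city, countries.get(city)) for name, city in people]


joined = naive_join(franks_rows, christophes_rows)
for row in joined:
    print(row)
```

Only the Amsterdam row finds a match, and even that match rests on the unverified guess that both tables mean the same Amsterdam. "Barcelona"/"Barcelone" and "Parijs"/"Paris" stay unmatched until the consumer adds a translation table by hand.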
35. Why is this so problematic?
Uneven balance of information
Christophe and Frank have more of it than Roi
36. Solution: share more information
“Amsterdam” = “Amsterdam” ?
Replace “Amsterdam” with “Amsterdam, Netherlands”
“Kennissen”, “Stad”, “Ville”, “Pays” ?
Provide a description for the meaning of the columns as a separate document
“Paris” = “Parijs” ?
Use English names instead of local ones
37. But is that enough?
There could still be several “Amsterdam, Netherlands”
Keep adding precision until 100% certain of uniqueness
Documentation of columns is one more thing to consume in order to use the data
It's hard to enforce the usage of a single language to name things
38. Linked Data idea
Data integration at the data level
Define “things” in the data set
Use unambiguous identifiers for the things
Associate descriptions to the identifiers
Connect things together
1 -- Works in --> 2
1: Name is “Christophe”, ...
2: Name (fr) is “Paris”, Name (nl) is “Parijs”, ...
39. Linked Data and the Web
Proposal: use the Web as a platform
Identifiers = URIs
Descriptions = de-referenced documents
ex:Christophe -- ex:worksIn --> dbpedia:Amsterdam
The whole statement is a “triple”; each identified thing in it is a “resource”.
Use of compact URIs
dbpedia = http://dbpedia.org/resource/
ex = http://example.org/
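To make the compact URI notation concrete, here is a minimal sketch that expands the two prefixes from the slide into full URIs and stores one triple as a plain tuple. The names `PREFIXES` and `expand` are illustrative choices, not part of any library.

```python
PREFIXES = {
    "dbpedia": "http://dbpedia.org/resource/",
    "ex": "http://example.org/",
}


def expand(curie):
    """Turn a compact URI such as ex:worksIn into a full URI."""
    prefix, local_name = curie.split(":", 1)
    return PREFIXES[prefix] + local_name


# A triple is (subject, predicate, object); here all three are resources.
triple = tuple(
    expand(term)
    for term in ("ex:Christophe", "ex:worksIn", "dbpedia:Amsterdam")
)
print(triple)
```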
40. What is at dbpedia:Amsterdam?
41. Benefits of Linked Data
Data model of triples and resources:
Everything defined as described things and relations
Copes easily with heterogeneous descriptions
Easy to cross-reference things between data sets
The network contains both the data and its description
Uses the Web and other open standards (RDF, SPARQL, ...)
42. Frank publishes his data
Kennissen   Stad
Christophe  Amsterdam
Peter       Barcelona
David       Parijs
ex:Christophe  rdf:type    ex:Acquaintance
ex:Peter       rdf:type    ex:Acquaintance
ex:David       rdf:type    ex:Acquaintance
ex:Christophe  ex:worksIn  dbpedia:Amsterdam
ex:Peter       ex:worksIn  dbpedia:Barcelona
ex:David       ex:worksIn  dbpedia:Paris
43. Christophe re-uses part of Frank's data to publish his data
Ville      Pays
Barcelone  Espagne
Paris      France
Amsterdam  Pays-Bas
ex:Christophe      rdf:type    ex:Acquaintance
ex:Peter           rdf:type    ex:Acquaintance
ex:David           rdf:type    ex:Acquaintance
ex:Christophe      ex:worksIn  dbpedia:Amsterdam
ex:Peter           ex:worksIn  dbpedia:Barcelona
ex:David           ex:worksIn  dbpedia:Paris
dbpedia:Amsterdam  ex:isIn     dbpedia:Netherlands
dbpedia:Barcelona  ex:isIn     dbpedia:Spain
dbpedia:Paris      ex:isIn     dbpedia:France
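Because both publishers use the same identifiers for the cities, combining their data needs no translation step. The sketch below models each graph as a Python set of (subject, predicate, object) tuples, which is a simplification of a real RDF store; the helper names `objects` and `countries_of_work` are hypothetical.

```python
# Triples written as (subject, predicate, object), using the slides' compact URIs.
franks_graph = {
    ("ex:Christophe", "ex:worksIn", "dbpedia:Amsterdam"),
    ("ex:Peter", "ex:worksIn", "dbpedia:Barcelona"),
    ("ex:David", "ex:worksIn", "dbpedia:Paris"),
}
christophes_graph = {
    ("dbpedia:Amsterdam", "ex:isIn", "dbpedia:Netherlands"),
    ("dbpedia:Barcelona", "ex:isIn", "dbpedia:Spain"),
    ("dbpedia:Paris", "ex:isIn", "dbpedia:France"),
}

# Shared identifiers make integration a plain set union.
merged = franks_graph | christophes_graph


def objects(graph, subject, predicate):
    """Return every object o such that (subject, predicate, o) is in the graph."""
    return {o for s, p, o in graph if s == subject and p == predicate}


def countries_of_work(graph, person):
    """Follow ex:worksIn, then ex:isIn, starting from a person."""
    return {
        country
        for city in objects(graph, person, "ex:worksIn")
        for country in objects(graph, city, "ex:isIn")
    }


print(countries_of_work(merged, "ex:David"))
```

Neither graph alone can answer "in which country does David work?"; the merged one can, by following two links.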
44. Roi adds some more information
ex:Acquaintance      rdfs:label  “Conocido”@es
ex:Christophe        rdf:type    ex:Acquaintance
ex:Peter             rdf:type    ex:Acquaintance
ex:David             rdf:type    ex:Acquaintance
ex:Christophe        ex:worksIn  dbpedia:Amsterdam
ex:Peter             ex:worksIn  dbpedia:Barcelona
ex:David             ex:worksIn  dbpedia:Paris
dbpedia:Amsterdam    ex:isIn     dbpedia:Netherlands
dbpedia:Barcelona    ex:isIn     dbpedia:Spain
dbpedia:Paris        ex:isIn     dbpedia:France
dbpedia:Netherlands  ex:isIn     dbpedia:Europe
dbpedia:Spain        ex:isIn     dbpedia:Europe
dbpedia:France       ex:isIn     dbpedia:Europe
45. Reasoning with Semantics (Bonus!)
dbpedia:Amsterdam    ex:isIn  dbpedia:Netherlands
dbpedia:Netherlands  ex:isIn  dbpedia:Europe
+ ex:isIn rdf:type owl:TransitiveProperty
= dbpedia:Amsterdam  ex:isIn  dbpedia:Europe
Example usage
Materialize implicit information
Check for consistency
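"Materializing implicit information" for a transitive property can be illustrated with a small fixed-point loop. This is a hand-written sketch over tuples, not how an OWL reasoner is actually implemented, and `materialize_transitive` is a hypothetical name.

```python
def materialize_transitive(triples, predicate):
    """Add every triple implied by treating `predicate` as transitive."""
    closed = set(triples)
    changed = True
    while changed:
        changed = False
        # Snapshot of the current links so the set can grow safely below.
        links = [(s, o) for s, p, o in closed if p == predicate]
        for s, middle in links:
            for start, o in links:
                if middle == start and (s, predicate, o) not in closed:
                    closed.add((s, predicate, o))
                    changed = True
    return closed


graph = {
    ("dbpedia:Amsterdam", "ex:isIn", "dbpedia:Netherlands"),
    ("dbpedia:Netherlands", "ex:isIn", "dbpedia:Europe"),
}
inferred = materialize_transitive(graph, "ex:isIn")
print(inferred - graph)
```

The only new triple is the one from the slide: dbpedia:Amsterdam ex:isIn dbpedia:Europe.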
46. Linked Data vs Linked Open Data
Linked Data doesn't imply Open Data!
It is possible to apply Linked Data principles to closed data
Open Data doesn't imply Linked Data
Much open data is not yet published as linked data
Linked Data + Open Data = Linked Open Data
Global, web-scale data space of open data
47. Rough estimate of size
295 data sets, 31B facts in LOD Cloud
Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/
48. Everyone can enrich the cloud
ex:Acquaintance      rdfs:label  “Conocido”@es
ex:Christophe        rdf:type    ex:Acquaintance
ex:Peter             rdf:type    ex:Acquaintance
ex:David             rdf:type    ex:Acquaintance
ex:Christophe        ex:worksIn  dbpedia:Amsterdam
ex:Peter             ex:worksIn  dbpedia:Barcelona
ex:David             ex:worksIn  dbpedia:Paris
dbpedia:Amsterdam    ex:isIn     dbpedia:Netherlands
dbpedia:Barcelona    ex:isIn     dbpedia:Spain
dbpedia:Paris        ex:isIn     dbpedia:France
dbpedia:Netherlands  ex:isIn     dbpedia:Europe
dbpedia:Spain        ex:isIn     dbpedia:Europe
dbpedia:France       ex:isIn     dbpedia:Europe
49. Get Linked Open Data
Linked Data is a graph database on the Web
It can be consumed in two ways
As documents on the Web
Open the resources and ask for RDF content to get a graph
As a database
Query the data with SPARQL (the graph equivalent of SQL)
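"Asking for RDF content" is usually done through HTTP content negotiation: the client sends an Accept header naming an RDF media type. The sketch below only builds such a request with the standard library; `rdf_request` is a hypothetical helper, and whether a given server honours the header is an assumption to check per site.

```python
import urllib.request


def rdf_request(uri, media_type="application/rdf+xml"):
    """Build an HTTP GET request that asks the server for RDF instead of HTML."""
    return urllib.request.Request(uri, headers={"Accept": media_type})


request = rdf_request("http://dbpedia.org/resource/Amsterdam")
print(request.get_method(), request.full_url, request.get_header("Accept"))

# Sending the request needs network access, so it is left commented out:
# with urllib.request.urlopen(request) as response:
#     rdf_document = response.read().decode("utf-8")
```

Passing another media type, for example `text/turtle`, asks for a different RDF serialisation of the same graph.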
50. Search for RDF documents
51. Look for the RDF export
52. Look for the RDF export
53. Look for the RDF export
54. Sindice Web data inspector
55. Hands-on session
56. Get the RDF of a BestBuy product
57. Get RDF out of rottentomatoes
58. Use-case: building a social network of musicians
59. Goal
Make a network
Nodes = artists
Edges => play(ed) in the same band
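The goal can be pictured with a tiny in-memory example. The memberships below are placeholders rather than real Freebase data, and `build_network` is a hypothetical helper: it creates one node per artist and one edge for every pair of artists sharing a band.

```python
from itertools import combinations

# Hypothetical memberships (band -> members); placeholder names, not real data.
memberships = {
    "band_x": ["artist_a", "artist_b", "artist_c"],
    "band_y": ["artist_c", "artist_d"],
}


def build_network(bands):
    """Return (nodes, edges): an edge links two artists who share a band."""
    nodes = set()
    edges = set()
    for members in bands.values():
        nodes.update(members)
        # Sorting keeps each undirected edge in one canonical order.
        for a, b in combinations(sorted(set(members)), 2):
            edges.add((a, b))
    return nodes, edges


nodes, edges = build_network(memberships)
print(sorted(edges))
```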
60. Use Freebase as data source
61. Getting the data
First option:
Get all the pages for all the artists as RDF
Merge them
Filter the data to keep only the desired relations
Second option:
Extract a sub-graph out of the data graph of
Freebase
62. SPARQL query
PREFIX fb: <http://rdf.freebase.com/ns/>
SELECT DISTINCT ?name1 ?name2
WHERE {
  ?g1 fb:music.group_membership.group ?group .
  ?g1 fb:music.group_membership.member
63. Result
Use factforge.net
Contains a copy of the data from Freebase
Understands SPARQL queries
Results: http://bit.ly/music_sn
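A service that understands SPARQL typically accepts the query text as the `query` parameter of an HTTP request, as defined by the SPARQL Protocol. The sketch below only builds such a URL. The endpoint address is an assumption (it is not given on the slide), and the query is a generic placeholder rather than the musicians query.

```python
from urllib.parse import urlencode

# Assumed endpoint address; check the service's documentation for the real one.
ENDPOINT = "http://factforge.net/sparql"

# Placeholder query, not the one from the slide.
QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5"


def sparql_get_url(endpoint, query):
    """Build the URL for a SPARQL query sent as an HTTP GET `query` parameter."""
    return endpoint + "?" + urlencode({"query": query})


url = sparql_get_url(ENDPOINT, QUERY)
print(url)
```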
64. Hot line for Linked (Open) Data
Christophe Guéret
c.d.m.gueret@vu.nl
http://www.few.vu.nl/~cgueret
@cgueret
Rinke Hoekstra
rinke.hoekstra@vu.nl
http://www.rinkehoekstra.nl/
@rinkehoekstra