2. Intro
● LOD: Linked Open Data
● Linked Data + Open Data = “5-star” data
● The WWW is on its way to becoming an
immense database (Web of Data)
● What does this mean? What is it made
of? What is it for? Whom is it for?
3. Intro
Going from this...
Tartu is the second largest city of Estonia,
following Estonia's political and financial
capital Tallinn.
Tartu is often considered the intellectual
centre of the country, especially since it is
home to the nation's oldest and most
renowned university, the University of
Tartu.
In German, Swedish and Polish the town
has been known and is sometimes still
referred to as Dorpat, a variant of
Tarbatu.
The University of Tartu (Estonian: Tartu
Ülikool, Latin: Universitas Tartuensis) is a
classical university in the city of Tartu,
Estonia.
4. Intro
… to this!
[Figure: the same facts drawn as a graph. Nodes: Tartu, Estonia, Tallinn, University of Tartu, city, college town, Dorpat, Tarbatu, Tartu Ülikool, Universitas Tartuensis. Edge labels: hasCapital, is a, name, name (official), name (Latin), located in, in country.]
8. 5-star data: how
★ make your stuff available on the Web
(whatever format) under an open license
★★ make it available as structured data (e.g.,
Excel instead of image scan of a table)
★★★ make it available in a non-proprietary open
format (e.g., CSV rather than Excel)
★★★★ use URIs to denote things, so that people can
point at your stuff
★★★★★ link your data to other data to provide context
9. LD principles
● Original design rules
– Use URIs as unique identifiers for resources (not the
same as URL)
– Use the HTTP URI scheme (rather than other
schemes such as URN), so that URL = URI
– When an ID is dereferenced (= looked up), give
useful information using the standards (e.g. RDF)
– Provide links to other resources
● LOD = LD + open license
10. RDF model
● RDF (Resource Description Framework) is
a fundamental building block of LD
● It is built on the concept of the triple: a subject
linked to an object by means of a predicate
[Figure: RDF graph with nodes ns2:Ingredient1, ns2:Ingredient2 and ns2:Product1, two ns:product edges, and two ns:weight edges pointing to the values 10 and 20.]
ns = http://www.example.com/
ns2 = http://www.anotherexample.com/
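The triple idea can be sketched in a few lines of plain Python. This is only an illustration, not how an RDF library stores data: the "ex:" names are hypothetical identifiers invented here, loosely based on the Tartu facts from the intro, and match() is a made-up helper in which None acts as a wildcard.

```python
from typing import List, NamedTuple, Optional


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical "ex:" identifiers; real RDF would use full URIs.
GRAPH = [
    Triple("ex:Tartu", "ex:inCountry", "ex:Estonia"),
    Triple("ex:Estonia", "ex:hasCapital", "ex:Tallinn"),
    Triple("ex:UniversityOfTartu", "ex:locatedIn", "ex:Tartu"),
]


def match(graph: List[Triple],
          subject: Optional[str] = None,
          predicate: Optional[str] = None,
          obj: Optional[str] = None) -> List[Triple]:
    """Return the triples that fit the pattern; None matches anything."""
    return [
        t for t in graph
        if (subject is None or t.subject == subject)
        and (predicate is None or t.predicate == predicate)
        and (obj is None or t.obj == obj)
    ]
```

For instance, match(GRAPH, obj="ex:Tartu") returns every triple that points at Tartu, which is the kind of pattern a query language for RDF is built around.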
11. RDF: serialization
● It is possible to use content negotiation to get the
same file in different serialization formats
● Linux: use the curl command
– $ curl -L -H "Accept: application/rdf+xml" http://dbpedia.org/resource/Tartu
– $ curl -L -H "Accept: text/turtle" http://dbpedia.org/resource/Tartu
● There are also REST clients for Firefox and Chrome
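The same content negotiation can be sketched with Python's standard library. The function names build_rdf_request and fetch_rdf are made up for this example; the only idea taken from the slide is sending an Accept header with the wanted media type. urlopen follows redirects by default, which plays the role of curl's -L option.

```python
import urllib.request


def build_rdf_request(uri: str, mime_type: str) -> urllib.request.Request:
    """Prepare a GET request that asks for one specific serialization."""
    return urllib.request.Request(uri, headers={"Accept": mime_type})


def fetch_rdf(uri: str, mime_type: str, timeout: float = 10.0) -> str:
    """Dereference the URI and return the response body as text.

    Needs network access; whether the server honours the Accept
    header depends on the server.
    """
    request = build_rdf_request(uri, mime_type)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

Usage, assuming the DBpedia server is reachable: fetch_rdf("http://dbpedia.org/resource/Tartu", "text/turtle").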
16. RDF: data schema
● In a relational database, we have to look
for definitions in the data schema
● Using RDF, instead, we can fully describe
data and their schema!
● In order to do this, we need vocabularies
– Every term in a vocabulary has a
common base URI called namespace
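A namespace is what lets a short name such as rdf:type stand for a full URI. The sketch below shows the expansion under a small, hand-picked prefix table; expand() is a hypothetical helper written for this example, not part of any library.

```python
# Prefix table: the namespace URIs are the published ones for these
# vocabularies; the selection of prefixes is just for illustration.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "dbr": "http://dbpedia.org/resource/",
    "dbo": "http://dbpedia.org/ontology/",
}


def expand(prefixed_name: str) -> str:
    """Turn 'prefix:local' into a full URI using PREFIXES."""
    prefix, separator, local = prefixed_name.partition(":")
    if not separator or prefix not in PREFIXES:
        raise KeyError(f"unknown prefix in {prefixed_name!r}")
    return PREFIXES[prefix] + local
```

So expand("dbr:Tartu") gives the same URI that was dereferenced with curl in the serialization slide.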
17. Common vocabularies
● rdf, rdfs, owl – RDF “core” vocabularies
● dcterms – general properties for resources
● foaf – Friend of a Friend
● geo – geolocalization
● skos – description of schemas and taxonomies
● void, dcat – description of datasets
● doap – description of projects
18. Common vocabularies
● rdf and rdfs are used basically everywhere,
since they are used to define the data
schema
● Using rdf we can say that an entity
belongs to a class of entities
● Using rdfs we can define super- and
subclass relations
19. Examples
● “Tartu is a city”
– dbr:Tartu rdf:type dbo:City
– dbr:Tartu a dbo:City
● “Cities are settlements”
– dbo:City rdfs:subClassOf
dbo:Settlement
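Put together, the two statements above let software conclude that Tartu is also a settlement. The following is a minimal sketch of that reasoning step in plain Python, assuming triples are stored as tuples of prefixed names; it is not a full RDFS reasoner, and superclasses() and types_of() are names invented for this example.

```python
TRIPLES = {
    ("dbr:Tartu", "rdf:type", "dbo:City"),
    ("dbo:City", "rdfs:subClassOf", "dbo:Settlement"),
}


def superclasses(cls, triples):
    """All classes reachable from cls by following rdfs:subClassOf."""
    found = set()
    todo = [cls]
    while todo:
        current = todo.pop()
        for s, p, o in triples:
            if s == current and p == "rdfs:subClassOf" and o not in found:
                found.add(o)
                todo.append(o)
    return found


def types_of(entity, triples):
    """Stated rdf:type classes plus those implied by subclass relations."""
    direct = {o for s, p, o in triples if s == entity and p == "rdf:type"}
    inferred = set()
    for cls in direct:
        inferred |= superclasses(cls, triples)
    return direct | inferred
```

Calling types_of("dbr:Tartu", TRIPLES) returns both the stated class and the inherited one.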
20. Ontologies
● An ontology is a model used to describe a
domain
● Ontologies can be used to describe
complex, interesting concepts
● They may be hard to develop, because
logical and modelling decisions are not
always straightforward
21. Using LD
● Do we need to know all the details about RDF
to be able to use LD?
● “Follow your nose” approach thanks to links
– https://www.wikidata.org
– http://sameas.org
– https://datahub.io
23. Using LD
● SPARQL is to RDF what SQL is to databases:
a query language
● A SPARQL endpoint is a service to which
SPARQL queries can be sent and from which
the results can be retrieved
● Some SPARQL endpoints:
– https://query.wikidata.org/
– http://dbpedia.org/sparql
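Sending a query to an endpoint is an ordinary HTTP request. The sketch below builds such a request with the standard library, assuming the endpoint accepts the query in a "query" URL parameter (as the SPARQL protocol describes) and can return JSON results. The function name and the User-Agent string are placeholders chosen for this example.

```python
import urllib.parse
import urllib.request

WIKIDATA_ENDPOINT = "https://query.wikidata.org/sparql"


def build_sparql_request(endpoint: str, query: str,
                         user_agent: str = "ld-intro-example/0.1"):
    """Prepare a GET request carrying the query as a URL parameter."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    headers = {
        # Ask for the JSON results format rather than XML.
        "Accept": "application/sparql-results+json",
        # Placeholder; public endpoints may expect a descriptive value.
        "User-Agent": user_agent,
    }
    return urllib.request.Request(url, headers=headers)
```

The request can then be passed to urllib.request.urlopen() when network access is available.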
24. SPARQL queries
● I want to find some interesting facts about
Tartu
● Let’s go to https://www.wikidata.org and
search for Tartu again
● Let’s note the “Q number” (the item identifier, Q13972 for Tartu)
25. SPARQL queries
● Now let’s go to https://query.wikidata.org
● Let’s enter this query:
SELECT DISTINCT *
WHERE {
?person wdt:P19 wd:Q13972; rdfs:label ?personName .
FILTER (LANG(?personName) = "en")
}
ORDER BY ?personName
27. SPARQL queries
● Anyone born in Tartu whose name looks Italian?
SELECT DISTINCT *
WHERE {
?person wdt:P19 wd:Q13972; rdfs:label ?personName;
wdt:P735/wdt:P407 wd:Q652 .
FILTER (LANG(?personName) = "en")
}
ORDER BY ?personName
28. SPARQL queries
● Anyone born in Tartu who died somewhere in Italy?
SELECT DISTINCT *
WHERE {
?person wdt:P19 wd:Q13972; rdfs:label ?personName;
wdt:P20 ?place .
?place wdt:P17 wd:Q38; rdfs:label ?placeName .
FILTER (LANG(?personName) = "en" && LANG(?placeName) = "en")
}
ORDER BY ?personName
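When such queries are run from a program instead of the web form, the results usually come back in the SPARQL JSON results format: a "head" listing the variables and a list of "bindings". Below is a minimal sketch of turning that into plain Python dictionaries. The SAMPLE payload is made up for illustration (Q0 and "Example Person" are placeholders, not real query output), and rows_from_sparql_json is a name invented here.

```python
import json


def rows_from_sparql_json(payload: str):
    """Convert a SPARQL JSON result document into a list of dicts."""
    data = json.loads(payload)
    variables = data["head"]["vars"]
    rows = []
    for binding in data["results"]["bindings"]:
        # A variable may be unbound in a row, so check before reading it.
        rows.append({name: binding[name]["value"]
                     for name in variables if name in binding})
    return rows


# Made-up payload with placeholder values, shaped like an endpoint reply.
SAMPLE = json.dumps({
    "head": {"vars": ["person", "personName"]},
    "results": {"bindings": [
        {
            "person": {"type": "uri",
                       "value": "http://www.wikidata.org/entity/Q0"},
            "personName": {"type": "literal", "xml:lang": "en",
                           "value": "Example Person"},
        }
    ]},
})
```

Each row then maps a variable name from the SELECT clause (without the question mark) to its value as a string.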
29. Software
● There is plenty of software to play with
LOD
– Python: rdflib
(http://rdflib.readthedocs.io)
– Java: the Apache Jena project
(https://jena.apache.org/)
30. Advantages
● Easier interlinking of heterogeneous data
● Easier creation and maintenance of data
schemas
● Distributed “by default”
● Controlled definition of shared knowledge
31. Challenges
● Rather new topic
– Needs skill and experience
● As data size increases, performance may worsen
– However, this depends on the use case
● Extra care is necessary when using distributed data
sources
– Accessibility & availability issues
– Data quality
34. Wikidata
● Wikidata is a free and open knowledge base that can
be read and edited by both humans and machines.
● Wikidata acts as central storage for the structured
data of its Wikimedia sister projects including
Wikipedia, Wikivoyage, Wikisource, and others.
● Wikidata also provides support to many other sites and
services beyond just Wikimedia projects! The content
of Wikidata is available under a free license,
exported using standard formats, and can be
interlinked to other open data sets on the linked data
web.
35. Wikidata
● Centralized access: only
one resource to link data
belonging to (or created
from) several projects
● Management of
structured data: not just
text pages, but also data
designed according to a
schema and usable by
external software
[Figure: more Tartu identifiers]
37. Wikidata
● There is a playground to try these things out:
the sandbox element
● Go to
https://www.wikidata.org/wiki/Q4115189
● Start editing!
– Note: it is possible to edit without being
logged in, but (as for Wikipedia) it would
be nicer to have an account