Linked Data Basics Slot in WWW2012 Tutorial: Practical Cross-Dataset Queries on the Web of Data
http://latc-project.eu/events/www2012-tutorial-cross-dataset-queries
1. Linked Data Basics
Anja Jentzsch, Freie Universität Berlin
17 April 2012
Tutorial: Practical Cross-Dataset Queries on the Web of Data
WWW2012, Lyon, France
1
2. Architecture of the classic Web
Single global document space
Web Search
Browsers Engines
Small set of simple standards
1. HTML as document format
2. HTTP URLs as
HTML HTML HTML
• globally unique IDs hyper-
links
• retrieval mechanism
3. Hyperlinks to connect everything
A B C
2
3. Web 2.0 APIs and Mashups
No single global data space
Shortcomings
1. APIs have proprietary interfaces Mashup
2. Mashups are based on a fixed set of data
sources
3. No hyperlinks between data items within Web Web Web Web
API API API API
different APIs
A B C D
3
4. Web APIs slice the Web into Walled Gardens
Image: Bob Jagensdorf, http://flickr.com/photos/darwinbell/, CC-BY 4
5. Linked Data
Extend the Web with a single global data space
1. by using RDF to publish structured data on the Web
2. by setting links between data items within different data sources
RDF RDF RDF RDF RDF
RDF RDF RDF RDF RDF
RDF RDF RDF RDF
Links Links Links Links
A B C D E
5
6. Linked Data Principles
Set of best practices for publishing structured data on the Web in
accordance with the general architecture of the Web.
1. Use URIs as names for things.
2. Use HTTP URIs so that people can look up those names.
3. When someone looks up a URI, provide useful RDF information.
4. Include RDF statements that link to other URIs so that they can discover
related things.
Tim Berners-Lee, http://www.w3.org/DesignIssues/LinkedData.html, 2006
6
7. The RDF Data Model
rdf:type
pd:chris foaf:Person
foaf:name Chris Bizer
foaf:based_near
dbpedia:Berlin
7
8. Data Items are identified with HTTP URIs
rdf:type
pd:chris foaf:Person
foaf:name Chris Bizer
foaf:based_near
dbpedia:Berlin
pd:chris = http://www.bizer.de#chris
dbpedia:Berlin = http://dbpedia.org/resource/Berlin
8
9. Resolving URIs over the Web
rdf:type
pd:chris foaf:Person
foaf:name Chris Bizer 3.450.889
foaf:based_near dp:population
dbpedia:Berlin
skos:subject
dp:Cities_in_Germany
9
10. Dereferencing URIs over the Web
rdf:type
pd:chris foaf:Person
foaf:name Chris Bizer 3.450.889
foaf:based_near dp:population
dbpedia:Berlin
skos:subject
skos:subject
dbpedia:Hamburg dp:Cities_in_Germany
skos:subject
dbpedia:Muenchen
10
11. RDF
• RDF is just a data model, it requires a serialization format
• For transmission over the network
• For storage as files
• Multiple serialization formats have been defined
• RDF/XML
• Turtle
• N-Triples
• RDFa
• ...
• It’s all triples!
• Syntax doesn’t matter much and can be chosen case-by-case for
pragmatic reasons
11
12. Properties of the Web of Linked Data
• Global, distributed data space build on a simple set of standards
• RDF, URIs, HTTP
• Entities are connected by links
• creating a global data graph that spans data sources and
• enables the discovery of new data sources
• Provides for data-coexistence
• Everyone can publish data to the Web of Linked Data
• Everyone can express their personal view on things
• Everybody can use the vocabularies/schema that they like
12
13. W3C Linking Open Data Project
• Grassroots community effort to
• publish existing open license datasets as Linked Data on the Web
• interlink things between different data sources
13
14. LOD Data Sets on the Web: May 2007
• 12 data sets
• Over 500 million RDF triples
• Around 120,000 RDF links between data sources 14
15. LOD Data Sets on the Web: November 2007
• 28 data sets
15
16. LOD Data Sets on the Web: September 2008
• 45 data sets
• Over 2 billion RDF triples 16
17. LOD Data Sets on the Web: July 2009
• 95 data sets
• Over 6.5 billion RDF triples 17
18. LOD Data Sets on the Web: September 2010
• 203 data sets
• Over 24,7 billion RDF triples
• Over 436 million RDF links between data sources 18
19. LOD Data Sets on the Web: September 2011
• 295 data sets
• Over 31 billion RDF triples
• Over 504 million RDF links between data sources 19
20. LOD Data Set statistics as of 09/2011
LOD Cloud Data Catalog on CKAN
• http://www.ckan.net/group/lodcloud
More statistics
• http://lod-cloud.net/state/
20
21. Uptake in the Government Domain
• The EU is pushing Linked Data (LOD2, LATC, Eurostat)
• W3C Government Linked Data (GLD) Working Group
22. Uptake in the Libraries Community
• Institutions publishing Linked Data
• Library of Congress (subject headings)
• German National Library (PND dataset and subject headings)
• Swedish National Library (Libris - catalog)
• Hungarian National Library (OPAC and Digital Library)
• British National Library
• Europeana project
22
23. Uptake in the Libraries Community
• W3C Library Linked Data Incubator Group (2010)
• OKFN Working Group on Bibliographic Data (2010)
• Goals:
• Integrate Library Catalogs on global scale
• Interconnect resources between repositories (by topic, by location, by
historical period, by ...)
23
24. Uptake in the Media Industry
• Publish data as RDF or embed as
RDFa
• Goal: Drive traffic to websites via
search engines
24
25. schema.org
• jointly proposed vocabularies for embedding data into HTML pages (Microdata)
• available since June 2011 25
26. Linked Data Applications
Linked Data Linked Data Search
Browsers Mashups Engines
Thing Thing Thing Thing Thing
Thing Thing Thing Thing Thing
typed typed typed typed
links links links links
A B C D E
26
30. Lower Data Integration Costs
The overall data integration effort is split between
the data publisher, the data consumer and third parties.
• Data Publisher
• publishes data as RDF
• sets identity links
• reuses terms or publishes mappings
• Third Parties
• set identity links pointing at your data
• publish mappings to the Web
• Data Consumer
• has to do the rest
• using record linkage and schema matching techniques 30
31. Is your data 5 star?
★ Make your stuff available on the Web (whatever format) under
an open license.
★★ Make it available as structured data (e.g., Excel instead of image
scan of a table) so that it can be reused.
★★★ Use non-proprietary, open formats (e.g., CSV instead of Excel).
★★★★ Use URIs to identify things, so that people can point at your stuff
and serve RDF from it.
★ ★ ★ ★ ★ Link your data to other data to provide context.
Tim Berners-Lee, http://www.w3.org/DesignIssues/LinkedData.html, 2010
31
32. How to publish Linked Data
Tasks:
1. Make data available as RDF via HTTP
2. Set RDF links pointing at other data sources
3. Make your data self-descriptive
4. Reuse common vocabularies
Tom Heath, Christian Bizer: Linked Data: Evolving the Web into a Global Data
Space
http://linkeddatabook.com/
32
33. Make Data available as RDF via HTTP
•Ready to use tools (examples)
• D2R Server
• provides for mapping relational
databases into RDF and for
serving them as Linked Data
• Pubby
• Linked Data Frontend for
SPARQL Endpoints
• More tools
• http://esw.w3.org/TaskForces/
CommunityProjects/
LinkingOpenData/PublishingTools
33
34. Set RDF links to other data sources
• Examples of RDF links
<http://dbpedia.org/resource/Berlin> owl:sameAs <http://
sws.geonames.org/2950159> .
<http://richard.cyganiak.de/foaf.rdf#cygri> foaf:topic_interest
<http://dbpedia.org/resource/Semantic_Web> .
<http://example-bookshop.com/book006251587X> owl:sameAs <http://
www4.wiwiss.fu-berlin.de/bookmashup/books/006251587X> .
34
35. How to generate RDF links?
• Pattern-based approaches
• Exploit naming conventions within URIs (for instance ISBNs, ISINs, …)
• Similarity-based approaches
• Compare items within different data sources using various similarity metrics
• Ready to use tools (Examples)
• Silk Link Discovery Framework
• provides a declarative language for specifying link conditions
which may combine different similarity metrics
• More tools
• http://esw.w3.org/TaskForces/CommunityProjects/LinkingOpenData/
EquivalenceMining
35
36. Make your Data Self-Descriptive
• Increase the usefulness of your data and ease data integration
• Aspects of self-descriptiveness
• Enable clients to retrieve the schema
• Reuse terms from common vocabularies
• Publish schema mappings for proprietary terms
• Provide provenance metadata
• Provide licensing metadata
• Provide data-set-level metadata using voiD
• Refer to additional access methods using voiD
36
37. Enable Clients to retrieve the Schema
Clients can resolve the URIs that identify vocabulary terms in
order to get their RDFS or OWL definitions.
Some data on the Web
<http://richard.cyganiak.de/foaf.rdf#cygri>
foaf:name "Richard Cyganiak" ;
rdf:type <http://xmlns.com/foaf/0.1/Person> .
Resolve unknown term http://xmlns.com/foaf/0.1/Person
RDFS or OWL definition
<http://xmlns.com/foaf/0.1/Person>
rdf:type owl:Class ;
rdfs:label "Person";
rdfs:subClassOf <http://xmlns.com/foaf/0.1/Agent> ;
rdfs:subClassOf <http://xmlns.com/wordnet/1.6/Agent> .
37
38. Reuse Terms from Common Vocabularies
• Common Vocabularies
• Friend-of-a-Friend for describing people and their social network
• SIOC for describing forums and blogs
• SKOS for representing topic taxonomies
• Organization Ontology for describing the structure of organizations
• GoodRelations provides terms for describing products and business entities
• Music Ontology for describing artists, albums, and performances
• Review Vocabulary provides terms for representing reviews
• Common sources of identifiers (URIs) for real world objects
• LinkedGeoData and Geonames locations
• GeneID and UniProt life science identifiers 38
40. Conclusion
• Linked Data provides a standardized data access interface
• Linked Data allows for the development of a variety of tools to integrate,
enhance and and view the data
• The Web of Data is growing rapidly
• There are active deployment communities in different domains
• Web search is evolving into query answering
• Search engines will increasingly rely on structured data from the Web
40
41. Thanks
Questions?
Email: anja@anjeve.de
Twitter: @anjeve
References
• Tom Heath, Christian Bizer: Linked Data: Evolving the Web into a Global Data Space
http://linkeddatabook.com/
• Christian Bizer, Tom Heath, Tim Berners-Lee: Linked Data – The Story So Far
http://tomheath.com/papers/bizer-heath-berners-lee-ijswis-linked-data.pdf
• Linking Open Data Project Wiki
http://esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenData
41