American Art Collaborative Planning Grant Educational Briefings
Linked Data and Tools
Pedro Szekely - USC/Information Sciences Institute
September 30, 2014
1. Linked Data and Tools
Pedro Szekely
USC/Information Sciences Institute
pszekely@isi.edu, http://isi.edu/~szekely
September 2014
CC-By 2.0
2. Outline
• Introduction to linked open data
• RDF: the Resource Description Framework
• Tools to convert data to RDF
• Tools for linking/reconciliation/resolution
• Storing and maintaining the data
• Applications
Pedro Szekely CC-By 2.0 2
7. Problem
Web pages are machine processable,
but not machine understandable,
so they are impractical for building applications that use the data
9. What Is Linked Data?
A method of publishing structured data so that it can be interlinked and become more useful
Builds upon standard Web technologies such as HTTP and URIs to share information in a way that can be read automatically by computers
(from Wikipedia)
10. “Linked” Open Data
Crystal Bridges Museum of American Art
Dallas Museum of Art
Indianapolis Museum of Art
National Portrait Gallery
The Metropolitan Museum of Art
Smithsonian American Art Museum
11. “Linked” Open Data
Crystal Bridges Museum of American Art
Dallas Museum of Art
Indianapolis Museum of Art
National Portrait Gallery
The Metropolitan Museum of Art
Smithsonian American Art Museum
… data is public!
… in a common format!
… but we only have islands of data!
13. Linked Data Principles
• Use URIs as names for things
• Use HTTP URIs so that people can look up those names
• When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL)
• Include links to other URIs so that they can discover more things
http://youtu.be/OM6XIICm_qo
http://www.w3.org/DesignIssues/LinkedData.html
14. Principle 1
Use URIs as names for things
Principle 2
Use HTTP URIs so that people can look up those names
24. Principle 3
When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL)
http://szekelys.com/diego
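In practice, Principle 3 is usually implemented with HTTP content negotiation. The client names RDF media types in the Accept header, and the server answers with an RDF serialization instead of an HTML page. The sketch below only builds the request and does not contact any server. The media types and their q-values are assumptions chosen for illustration. The fragment after # is never sent over HTTP, so looking up http://szekelys.com/family#pedro means retrieving http://szekelys.com/family.

```python
import urllib.request

# Illustrative preference order for RDF serializations (an assumption, not prescribed by the slides).
RDF_ACCEPT = "text/turtle, application/ld+json;q=0.9, application/rdf+xml;q=0.8"

def build_lookup_request(uri: str) -> urllib.request.Request:
    """Build (but do not send) an HTTP GET asking the server for RDF about `uri`."""
    # Fragments are client-side only, so we request the containing document.
    document_uri = uri.split("#", 1)[0]
    return urllib.request.Request(document_uri, headers={"Accept": RDF_ACCEPT}, method="GET")

req = build_lookup_request("http://szekelys.com/family#pedro")
print(req.full_url)  # http://szekelys.com/family
print(req.get_header("Accept"))

# Actually dereferencing it requires network access and a server that supports RDF:
# with urllib.request.urlopen(req) as resp:
#     print(resp.headers.get("Content-Type"))
```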
25. Principle 4
Include links to other URIs so that they can discover more things
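Principle 4 is what lets a client discover data it was never told about. The following is a minimal sketch in which a hypothetical in-memory dictionary stands in for HTTP lookups. The example.org and example.net URIs and the records are placeholders. Starting from one URI, the sketch works breadth first, following every URI that appears as the object of a triple, and it caps the number of lookups.

```python
from collections import deque

# Hypothetical stand-in for the Web: each URI maps to the triples a server
# might return when that URI is looked up.
WEB = {
    "http://example.org/museum/artwork/1": [
        ("http://example.org/museum/artwork/1", "http://purl.org/dc/terms/creator",
         "http://example.org/museum/person/7"),
    ],
    "http://example.org/museum/person/7": [
        ("http://example.org/museum/person/7", "http://xmlns.com/foaf/0.1/name", "Jane Doe"),
        ("http://example.org/museum/person/7", "http://www.w3.org/2002/07/owl#sameAs",
         "http://example.net/hub/person/42"),
    ],
}

def is_uri(term):
    # Simplification: any http(s) string counts as a link; other strings are literals.
    return term.startswith(("http://", "https://"))

def crawl(start, lookup, max_lookups=100):
    """Breadth-first link following: look up a URI, then queue every new URI found as an object."""
    seen, queue, triples = {start}, deque([start]), []
    lookups = 0
    while queue and lookups < max_lookups:
        uri = queue.popleft()
        lookups += 1
        for s, p, o in lookup(uri):
            triples.append((s, p, o))
            if is_uri(o) and o not in seen:
                seen.add(o)
                queue.append(o)
    return seen, triples

seen, triples = crawl("http://example.org/museum/artwork/1", lambda uri: WEB.get(uri, []))
print(len(seen), "URIs discovered,", len(triples), "triples collected")  # 3 URIs discovered, 3 triples collected
```

If the lambda were replaced by a function that performs real HTTP lookups, this would become a very naive Linked Data crawler. A real crawler would also need politeness delays and error handling.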
32. Resource Description Framework
Intended for representing metadata about Web resources, such as the title, author, and modification date of a Web document
… it can also be used to represent information about things that can be identified on the Web, even when they cannot be directly retrieved on the Web
33. Represent Resources Using URIs
That guy has first name “Pedro”
http://szekelys.com/family#pedro
http://xmlns.com/foaf/0.1/firstName
“Pedro”
34. Represent Information as Triples
Subject: http://szekelys.com/family#pedro (the resource being described)
Predicate: http://xmlns.com/foaf/0.1/firstName (a property of the resource)
Object: “Pedro” (the value of the property)
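A triple can be modelled directly as a three-field record. The plain Python sketch below uses hypothetical class names IRI, Literal and Triple, which keep IRIs and literals as distinct types. It writes the slide's triple in N-Triples, the RDF serialization that puts one triple on each line.

```python
from typing import NamedTuple, Union

class IRI(NamedTuple):
    value: str

class Literal(NamedTuple):
    value: str

class Triple(NamedTuple):
    subject: IRI
    predicate: IRI
    object: Union[IRI, Literal]

def nt_term(term):
    """Render one term in N-Triples syntax."""
    if isinstance(term, IRI):
        return f"<{term.value}>"
    # Escape backslashes first, then quotes and line breaks.
    escaped = (term.value.replace("\\", "\\\\").replace('"', '\\"')
               .replace("\n", "\\n").replace("\r", "\\r"))
    return f'"{escaped}"'

def to_ntriples(triple):
    """Serialize subject, predicate and object in order, ending with ' .'."""
    return " ".join(nt_term(t) for t in triple) + " ."

triple = Triple(IRI("http://szekelys.com/family#pedro"),
                IRI("http://xmlns.com/foaf/0.1/firstName"),
                Literal("Pedro"))
print(to_ntriples(triple))
# <http://szekelys.com/family#pedro> <http://xmlns.com/foaf/0.1/firstName> "Pedro" .
```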
37. RDF Graphs
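http://szekelys.com/family#pedro  rdf:type  foaf:Person
http://szekelys.com/family#pedro  foaf:firstName  “Pedro”
http://szekelys.com/family#pedro  foaf:homepage  http://isi.edu/~szekely
Figure labels: real world objects (http://szekelys.com/family#pedro), kinds of things (foaf:Person), literals (“Pedro”), properties of things (the edges)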
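The graph on this slide is just a set of triples. The following minimal sketch assumes plain strings for all terms and a hand-written prefix table for rdf: and foaf:. The helper names expand and match are hypothetical. The match helper does a simple triple-pattern lookup in which None is a wildcard. This is roughly what a single SPARQL triple pattern does.

```python
# Prefix table for the two vocabularies on the slide.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "foaf": "http://xmlns.com/foaf/0.1/",
}

def expand(name):
    """Expand a prefixed name such as foaf:Person; leave anything else unchanged."""
    prefix, sep, local = name.partition(":")
    if sep and prefix in PREFIXES:
        return PREFIXES[prefix] + local
    return name

PEDRO = "http://szekelys.com/family#pedro"
graph = {
    (PEDRO, expand("rdf:type"), expand("foaf:Person")),
    (PEDRO, expand("foaf:firstName"), "Pedro"),
    (PEDRO, expand("foaf:homepage"), "http://isi.edu/~szekely"),
}

def match(graph, s=None, p=None, o=None):
    """Return the triples matching a pattern, sorted; None acts as a wildcard."""
    return sorted(
        t for t in graph
        if (s is None or t[0] == s) and (p is None or t[1] == p) and (o is None or t[2] == o)
    )

# Who is a foaf:Person?
print([t[0] for t in match(graph, p=expand("rdf:type"), o=expand("foaf:Person"))])
# What is Pedro's first name?
print(match(graph, s=PEDRO, p=expand("foaf:firstName"))[0][2])  # Pedro
```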
41. Steps to Create Linked Open Data
• Select ontologies
… that define classes and properties for our data
• Convert data to RDF
… from the museum database to the ontologies
• Identify links to other Linked Data datasets
… to other museums and Linked Data hubs
42. • Select ontologies
… that define classes and properties for our data
CIDOC CRM
http://www.cidoc-crm.org/
43. • Select ontologies
… that define classes and properties for our data
• Convert data to RDF
… from the museum database to the ontologies
44. RDF Mapping Tools
• Custom code. Shortcomings: labor intensive, error prone. Benefits: flexible
• R2RML. Shortcomings: difficult to learn, only for SQL databases. Benefits: W3C standard, good documentation, multiple vendors
• RDF Refine. Shortcomings: only for tabular data. Benefits: graphical user interface, support for reconciliation, open source
• Karma. Shortcomings: none listed. Benefits: semi-automatic, graphical user interface, supports tabular data, XML and JSON, multiple export formats, R2RML compatible, open source
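To make the "custom code" option concrete, the sketch below maps rows of a hypothetical museum CSV export to FOAF triples. The export has the columns artist_id, first_name, last_name and homepage. The URI pattern http://example.org/museum/person/{id} is an assumption. This approach is flexible, but every mapping decision lives in code, which is where the labor and the errors come from. R2RML expresses the same kind of column-to-property mapping declaratively.

```python
import csv
import io
from urllib.parse import quote

# Hypothetical export from a museum collection database.
CSV_DATA = """artist_id,first_name,last_name,homepage
1,Jane,Doe,http://example.org/jdoe
2,John,Roe,
"""

BASE = "http://example.org/museum/person/"  # assumed URI pattern, not a real namespace
FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Column-to-property mapping: the part a declarative mapping tool would capture.
COLUMN_TO_PROPERTY = {
    "first_name": FOAF + "firstName",
    "last_name": FOAF + "lastName",
    "homepage": FOAF + "homepage",
}

def row_to_triples(row):
    """Turn one CSV row into (subject, predicate, object) triples."""
    subject = BASE + quote(row["artist_id"].strip())
    triples = [(subject, RDF_TYPE, FOAF + "Person")]
    for column, prop in COLUMN_TO_PROPERTY.items():
        value = (row.get(column) or "").strip()
        if value:  # skip empty cells rather than emitting empty literals
            triples.append((subject, prop, value))
    return triples

triples = [t for row in csv.DictReader(io.StringIO(CSV_DATA)) for t in row_to_triples(row)]
for t in triples:
    print(t)
```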
53. Linking/Reconciliation Tools
• Custom code. Shortcomings: very difficult. Benefits: tuned to the data
• SILK, LIMES. Shortcomings: experimental, poor support. Benefits: work with RDF, efficient, relatively easy to use
• RDF Refine. Shortcomings: requires implementing a new reconciliation service. Benefits: integrated with RDF conversion, user interface for curation
• Karma: under development
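Linking tools compare records across datasets and emit links such as owl:sameAs. SILK and LIMES use their own link specifications. As a much simpler stand-in, the sketch below uses a different technique: normalized string similarity on person names, computed with Python's difflib, with an assumed threshold of 0.85. The URIs, names and threshold are all hypothetical. Real reconciliation would compare more fields, such as dates and places, and would send borderline candidates to a curator.

```python
import unicodedata
from difflib import SequenceMatcher

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

def normalize(name):
    """Lowercase, drop accents and collapse whitespace before comparing names."""
    decomposed = unicodedata.normalize("NFKD", name)
    without_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(without_accents.lower().split())

def link(local, external, threshold=0.85):
    """For each local resource, propose owl:sameAs to the most similar external name above threshold."""
    links = []
    for local_uri, local_name in local.items():
        best_uri, best_score = None, 0.0
        for ext_uri, ext_name in external.items():
            score = SequenceMatcher(None, normalize(local_name), normalize(ext_name)).ratio()
            if score > best_score:
                best_uri, best_score = ext_uri, score
        if best_uri is not None and best_score >= threshold:
            links.append((local_uri, OWL_SAME_AS, best_uri, round(best_score, 2)))
    return links

# Hypothetical records from a museum dataset and from a Linked Data hub.
local = {
    "http://example.org/museum/person/1": "José Pérez",
    "http://example.org/museum/person/2": "Jane Doe",
}
external = {
    "http://example.net/hub/p/100": "Jose Perez",
    "http://example.net/hub/p/200": "John Smith",
}

for candidate in link(local, external):
    print(candidate)
```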
57. Storage Options
• SPARQL endpoint. Shortcomings: low reliability, esoteric, slow. Benefits: sophisticated query language
• RDF dump. Shortcomings: no query capability, esoteric. Benefits: flexibility (clients can download and use in applications), easy to publish
• JSON-LD + ElasticSearch. Shortcomings: restricted query language. Benefits: very high performance, mainstream technology, easy to publish
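With the JSON-LD option, each resource is stored as an ordinary JSON document whose @context maps short keys back to RDF IRIs. The same record can then be indexed by a search engine and still be read as RDF. The following minimal sketch assumes a hand-written context and a hypothetical helper to_jsonld that flattens the triples about one subject. Sending the document to ElasticSearch is not shown.

```python
import json

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
FOAF = "http://xmlns.com/foaf/0.1/"

# Hand-written context: short JSON keys -> IRIs. "@type": "@id" marks homepage
# values as IRIs rather than plain strings.
CONTEXT = {
    "firstName": FOAF + "firstName",
    "homepage": {"@id": FOAF + "homepage", "@type": "@id"},
    "Person": FOAF + "Person",
}

def short_name(iri):
    """Return the context key for an IRI, or the IRI itself if the context has none."""
    for key, definition in CONTEXT.items():
        target = definition["@id"] if isinstance(definition, dict) else definition
        if target == iri:
            return key
    return iri

def to_jsonld(subject, triples):
    """Flatten the triples about one subject into a single JSON-LD document."""
    doc = {"@context": CONTEXT, "@id": subject}
    for s, p, o in triples:
        if s != subject:
            continue
        if p == RDF_TYPE:
            key, value = "@type", short_name(o)
        else:
            key, value = short_name(p), o
        if key in doc:  # repeated property: switch to a JSON array
            previous = doc[key]
            doc[key] = (previous if isinstance(previous, list) else [previous]) + [value]
        else:
            doc[key] = value
    return doc

PEDRO = "http://szekelys.com/family#pedro"
triples = [
    (PEDRO, RDF_TYPE, FOAF + "Person"),
    (PEDRO, FOAF + "firstName", "Pedro"),
    (PEDRO, FOAF + "homepage", "http://isi.edu/~szekely"),
]
doc = to_jsonld(PEDRO, triples)
print(json.dumps(doc, indent=2))
```

The output is plain JSON, so it can be indexed like any other document. A JSON-LD processor can expand it back into the three triples above.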
61. we have expanded the reach of linked data within the BBC to more audience facing products and presented our ambitions to using linked data as glue for the plethora of content the BBC produces
http://www.bbc.co.uk/blogs/internet/posts/Linked-Data-new-ontologies-website
http://www.bbc.co.uk/blogs/internet/posts/Linked-Data-Connecting-together-the-BBCs-Online-Content
http://www.bbc.co.uk/blogs/internet/posts/Opening-up-the-BBCs-Linked-Data