Providing open data is of interest for its societal and commercial value, for transparency, and because more people can do fun things with data. A growing number of initiatives provide open data, for example from the UK government and the World Bank. However, much of this data is provided in formats such as Excel files, or even PDF files. This raises several questions:
- How best to provide access to data so it can be most easily reused?
- How to enable the discovery of relevant data within the multitude of available data sets?
- How to enable applications to integrate data from large numbers of formerly unknown data sources?
One way to address these issues is to use the design principles of linked data (http://www.w3.org/DesignIssues/LinkedData.html), which suggest best practices for publishing and connecting structured data on the Web. This presentation gives an overview of linked data technologies (such as RDF and SPARQL), examples of how they can be used, and some starting points for people who want to provide and use linked data.
The presentation was given on August 8, at the Hacknight event (http://hacknight.se/) of Forskningsavdelningen (http://forskningsavd.se/) (Swedish: “Research Department”), a hackerspace in Malmö.
Open data and linked data
1. Open data & linked data
Hacknight, Malmö – 6 August 2011
Marie Gustafsson Friberger
marie.friberger@mah.se
2. Hans Rosling
http://www.flickr.com/photos/23176450@N08/2663925153/
3. Hans Rosling
"The database hugging in public institutions
is hampering innovation."
Hans Rosling at OECD World Forum in Istanbul, 2007
http://www.viddler.com/explore/JesseRobbins/videos/4/
http://www.flickr.com/photos/23176450@N08/2663925153/
4. Why open data?
• Transparency
• Value – for society as a whole and
commercially
• More people can do fun things with data!
10. • How best to provide access to data so it
can be most easily reused?
• How to enable the discovery of relevant
data within the multitude of available data
sets?
• How to enable applications to integrate
data from large numbers of formerly
unknown data sources?
12. ★ Available on the web (whatever format), but with an open licence
★★ Available as machine-readable structured data (e.g. Excel instead of an image scan of a table)
★★★ As above, plus a non-proprietary format (e.g. CSV instead of Excel)
★★★★ All of the above, plus: use open standards from W3C (RDF and SPARQL) to identify things, so that people can point at your stuff
★★★★★ All of the above, plus: link your data to other people’s data to provide context
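As a rough illustration of the step from CSV to data whose items are identified by URIs, the sketch below turns CSV rows into subject-predicate-object tuples. The CSV content, the `http://example.org/` base, and the `city/` and `property/` path segments are hypothetical and not from the slides; a real conversion would reuse existing vocabularies.

```python
import csv
import io

# Hypothetical CSV data and base URI, made up for illustration.
CSV_TEXT = "id,name,country\n1,Malmo,Sweden\n2,Lund,Sweden\n"
BASE = "http://example.org/"

def csv_to_triples(text, base):
    """Turn each CSV row into (subject, predicate, object) tuples."""
    triples = []
    for row in csv.DictReader(io.StringIO(text)):
        # Mint a URI for the row so that others can point at it.
        subject = base + "city/" + row["id"]
        for column, value in row.items():
            if column != "id":
                triples.append((subject, base + "property/" + column, value))
    return triples

triples = csv_to_triples(CSV_TEXT, BASE)
```

Each row now has a name (`http://example.org/city/1`) that other data sets could link to, which a bare CSV cell does not offer.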
17. Subject - Predicate - Object
Image from the book Semantic Web for the Working Ontologist by Allemang and Hendler.
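A subject-predicate-object statement can be sketched in Python as a plain three-field tuple. The `Triple` type and the `http://example.org/` URIs are assumptions for illustration only; they do not come from the slides or from any RDF library.

```python
from typing import NamedTuple

class Triple(NamedTuple):
    """One RDF-style statement: subject, predicate, object."""
    subject: str
    predicate: str
    object: str

# Hypothetical namespace, used only for this example.
EX = "http://example.org/"

statement = Triple(EX + "Malmo", EX + "locatedIn", EX + "Sweden")

# A graph is simply a set of such statements.
graph = {statement}
```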
18. Image from the book Semantic Web for the Working Ontologist by Allemang and Hendler.
19. Multiple Sources
Image from the book Semantic Web for the Working Ontologist by Allemang and Hendler.
20. One graph...
Image from the book Semantic Web for the Working Ontologist by Allemang and Hendler.
21. What is what?
• If two sources use the same terminology,
do they have the same thing in mind?
• URIs to the rescue!
• Two nodes are “the same” if they share the
same URI.
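The idea that nodes with the same URI are the same node is what makes merging sources cheap. A minimal sketch, assuming triples are plain tuples and using made-up `http://example.org/` URIs: merging two sources is a set union, and statements about a shared URI end up attached to one node.

```python
EX = "http://example.org/"  # hypothetical namespace

# Two independent sources that both talk about EX + "Malmo".
source_a = {(EX + "Malmo", EX + "locatedIn", EX + "Sweden")}
source_b = {
    (EX + "Malmo", EX + "label", "Malmö"),
    (EX + "Sweden", EX + "capital", EX + "Stockholm"),
}

# Merging graphs is a set union; no schema alignment step is needed.
merged = source_a | source_b

def describe(graph, node):
    """Return all (predicate, object) pairs whose subject is `node`."""
    return sorted((p, o) for s, p, o in graph if s == node)

about_malmo = describe(merged, EX + "Malmo")
```

After the merge, `about_malmo` holds statements that originated in both sources, because both used the same URI for the city.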
22. SPARQL
• SPARQL Protocol and RDF Query
Language (recursive acronym...)
• W3C recommendation
• A query contains a set of triple patterns
called a basic graph pattern.
• Triple patterns are like RDF triples except
that each of the subject, predicate and
object may be a variable.
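To make basic graph patterns concrete, here is a toy matcher in plain Python. It is a sketch of the idea only, not how a SPARQL engine is implemented: variables are assumed to be strings starting with `?`, triples are tuples, and the `http://example.org/` data is made up.

```python
def match_pattern(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`; return None on failure."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must keep the value it was bound to earlier.
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return result

def query(graph, patterns):
    """Return every variable binding that satisfies all triple patterns."""
    bindings = [{}]
    for pattern in patterns:
        extended = []
        for binding in bindings:
            for triple in graph:
                candidate = match_pattern(pattern, triple, binding)
                if candidate is not None:
                    extended.append(candidate)
        bindings = extended
    return bindings

EX = "http://example.org/"  # hypothetical namespace
graph = {
    (EX + "Malmo", EX + "locatedIn", EX + "Sweden"),
    (EX + "Lund", EX + "locatedIn", EX + "Sweden"),
    (EX + "Sweden", EX + "capital", EX + "Stockholm"),
}

# Basic graph pattern: cities, their country, and that country's capital.
patterns = [
    ("?city", EX + "locatedIn", "?country"),
    ("?country", EX + "capital", "?capital"),
]
results = query(graph, patterns)
```

The shared variable `?country` is what joins the two patterns: a binding survives only if both patterns agree on its value.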
23. Linked Data principles
1. Use URIs as names for things
2. Use HTTP URIs so that people can look up
those names.
3. When someone looks up a URI, provide
useful information, using the standards
(RDF*, SPARQL)
4. Include links to other URIs so that they can
discover more things.
http://www.w3.org/DesignIssues/LinkedData.html
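Principles 2 and 3 amount to an ordinary HTTP lookup in which the client asks for an RDF representation. The sketch below only builds such a request with Python's standard library and does not send it. The choice of `text/turtle` as media type and of a DBpedia resource URI as the target are assumptions; which formats a given server returns varies.

```python
from urllib.request import Request

def rdf_request(uri, media_type="text/turtle"):
    """Build (but do not send) a GET request asking for an RDF representation."""
    return Request(uri, headers={"Accept": media_type})

# Example target; any HTTP URI that names a thing would do.
req = rdf_request("http://dbpedia.org/resource/Sweden")

# To actually perform the lookup (needs network access):
#   from urllib.request import urlopen
#   with urlopen(req) as response:
#       data = response.read()
```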
25. Linked data
Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/
26. • DBpedia aims to extract structured information
from Wikipedia and to make this information
available on the Web.
• The DBpedia knowledge base currently
describes more than 3.4 million things,
out of which 1.5 million are classified in a
consistent ontology, including 312,000
persons, 413,000 places, 94,000 music
albums, 49,000 films, 15,000 video games,
140,000 organizations, 146,000 species
27. Possible Queries
• DBpedia allows you to find answers to
questions where the information is spread
across many different Wikipedia articles.
• For example...
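The example queries from the talk are not part of this transcript. As a stand-in, the sketch below builds a request URL for one illustrative question (people whose birthplace is in Sweden) using only the standard library. The query text, the use of `dbo:birthPlace` and `dbo:country`, and the endpoint URL are assumptions for illustration and are not taken from the slides.

```python
from urllib.parse import urlencode

# Public DBpedia SPARQL endpoint; its availability is assumed here.
ENDPOINT = "http://dbpedia.org/sparql"

# Illustrative query, not the presenter's: two triple patterns joined on ?place.
QUERY = """
PREFIX dbo: <http://dbpedia.org/ontology/>
SELECT ?person ?place WHERE {
  ?person dbo:birthPlace ?place .
  ?place dbo:country <http://dbpedia.org/resource/Sweden> .
}
LIMIT 10
""".strip()

def build_query_url(endpoint, query):
    """Encode a SPARQL query as a GET URL, using the protocol's `query` parameter."""
    return endpoint + "?" + urlencode({"query": query})

url = build_query_url(ENDPOINT, QUERY)
```

Fetching `url` would require network access; the answer draws on facts that sit in separate Wikipedia articles about the people and the places.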
43. Read more
• Heath and Bizer (2011) Linked Data: Evolving the
Web into a Global Data Space
http://linkeddatabook.com/editions/1.0/
• Allemang and Hendler (2011) Semantic Web for
the Working Ontologist
http://workingontologist.org/
• http://open.blogs.nytimes.com/2010/03/30/build-your-own-nyt-linked-data-application/
• http://www.w3.org/2001/sw/wiki/Tools