4. Linked data ...
publishing data on the web ...
... to enable integration, linking and reuse across silos
5. Linked data
Apply the principles of the web to the publication of data
The linked data web:
is a global network of things
each identified by a URI
fetching a URI gives a set of statements in RDF
things connected by typed links
open, anyone can say anything about anything else
Linked data is “data you can click on”
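A minimal sketch of "fetching a URI gives a set of statements in RDF", assuming the server supports HTTP content negotiation and can return Turtle or RDF/XML. The helper name build_rdf_request is hypothetical, and the request is only constructed here, not sent.

```python
from urllib.request import Request

# Media types for RDF serializations, most preferred first.
RDF_ACCEPT = "text/turtle, application/rdf+xml;q=0.8"


def build_rdf_request(uri: str) -> Request:
    """Build a GET request asking for an RDF description of the thing named by uri."""
    return Request(uri, headers={"Accept": RDF_ACCEPT}, method="GET")


# The school URI used in the example slides. Passing this request to
# urllib.request.urlopen would perform the actual lookup.
request = build_rdf_request("http://education.data.gov.uk/id/school/401874")
print(request.get_method(), request.full_url)
print("Accept:", request.get_header("Accept"))
```

Whether a given server honours the Accept header is up to the publisher; many linked data sites also redirect from the identifier URI to a document URI before returning data.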
7. Example schools information
http://education.data.gov.uk/id/school/401874 a School
  label “Cardiff High School”
  phase “Secondary”
  district “Cardiff”
8. Example schools information
http://education.data.gov.uk/id/school/401874 a school:School
  label “Cardiff High School”
  phase school:PhaseOfEducation_Secondary
  district http://statistics.data.gov.uk/id/local-authority-district/00PT
http://statistics.data.gov.uk/id/local-authority-district/00PT
  label “Cardiff”
9. Example schools information
http://education.data.gov.uk/id/school/401874
  rdf:type school:School
  rdfs:label “Cardiff High School”
  school:phase school:PhaseOfEducation_Secondary
  school:district http://statistics.data.gov.uk/id/local-authority-district/00PT
http://statistics.data.gov.uk/id/local-authority-district/00PT
  rdfs:label “Cardiff”
10. Example schools information
http://education.data.gov.uk/id/school/401874
  rdf:type school:School
  rdfs:label “Cardiff High School”
  school:phase school:PhaseOfEducation_Secondary
  school:district http://statistics.data.gov.uk/id/local-authority-district/00PT
http://statistics.data.gov.uk/id/local-authority-district/00PT
  label “Cardiff”
http://data.ordnancesurvey.co.uk/id/7000000000025484
  admingeo:ward (target not shown)
  admingeo:parish (target not shown)
  spatial:extent GML: 310499.4 184176.6 310476.5 ...
11. Example schools information
http://education.data.gov.uk/id/school/401874
  rdf:type school:School
  rdfs:label “Cardiff High School”
  school:phase school:PhaseOfEducation_Secondary
  school:district http://statistics.data.gov.uk/id/local-authority-district/00PT
http://statistics.data.gov.uk/id/local-authority-district/00PT
  label “Cardiff”
  owl:sameAs http://data.ordnancesurvey.co.uk/id/7000000000025484
http://data.ordnancesurvey.co.uk/id/7000000000025484
  admingeo:ward (target not shown)
  admingeo:parish (target not shown)
  spatial:extent GML: 310499.4 184176.6 310476.5 ...
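The example above can be sketched as plain (subject, predicate, object) tuples to show how typed links are followed from one resource to another. This is an illustration only: prefixed names are kept as strings because the slides do not give the namespace URIs, the direction of the owl:sameAs link is an assumption, and the helper name objects is hypothetical.

```python
# Statements from the example slides as (subject, predicate, object) tuples.
SCHOOL = "http://education.data.gov.uk/id/school/401874"
DISTRICT = "http://statistics.data.gov.uk/id/local-authority-district/00PT"
OS_AREA = "http://data.ordnancesurvey.co.uk/id/7000000000025484"

TRIPLES = [
    (SCHOOL, "rdf:type", "school:School"),
    (SCHOOL, "rdfs:label", "Cardiff High School"),
    (SCHOOL, "school:phase", "school:PhaseOfEducation_Secondary"),
    (SCHOOL, "school:district", DISTRICT),
    (DISTRICT, "rdfs:label", "Cardiff"),
    # Assumed direction: the statistics district points at the OS resource.
    (DISTRICT, "owl:sameAs", OS_AREA),
]


def objects(subject, predicate):
    """Return every object of the statements matching subject and predicate."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


# Follow typed links: school -> district -> label, then district -> sameAs.
district = objects(SCHOOL, "school:district")[0]
district_labels = objects(district, "rdfs:label")
same_as = objects(district, "owl:sameAs")
print(district_labels, same_as)
```

Because the district is named by a URI rather than the string “Cardiff”, any other data set that uses the same URI (or one declared owl:sameAs it) joins onto this graph without extra mapping.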
13. Role in data set publication
well suited to describing things
schools, companies, animal species, music tracks, tv programmes ...
what about datasets?
environmental measurements, experimental results, statistical analyses ...
14. Approach 1 : Data catalogues
treat the dataset as a single resource, identify with a URI
provide metadata as linked data
descriptive
categorical
technical and structural
Benefits?
separation of metadata from resource and repository
easy aggregation of metadata into catalogues
schema-less enables use-specific annotations and links
use of sharable category schemes and reference data
=> support for discovery
15. Approach 2 : Fine grain publication
publish the data set itself as linked data
entities, terms, individual records in data identified by URIs
data set structure and ontologies linked from data
still include dataset metadata
Benefits?
all benefits of approach 1 to support discovery
self-describing
data slices addressable (trace back, provenance, annotation)
integration across sets - reuse of terms for dimensions, units, values
fine grained access
=> integration, comparison, context, data as a service
17. bathing water quality
what we do...
start of season: 15th May (press interest)
bathing season, to 30th Sept: 20-22 samples in 22 weeks
annual report: November, December
what information about beaches is relevant to the public
18. how linkable data helps
Tenby Tourist Information Centre
Unit 2, The Gateway Complex
Tenby, Wales, SA70 7LT
Tel: 01834 842 402
Fax: 01834 845 439
Email: tenby.tic@pembrokeshire.gov.uk
Photo by Skellig2008 (flickr)
19. Publishing the Bathing Water Quality data set
Vocabularies: Bathing Waters, Sampling Points, Zones Of Influence, Assessments
e.g. http://location.data.gov.uk/def/ef/SampingPoint
URI Set (Reference Data): Bathing Waters, Sampling Points, Zone Of Influence
e.g. http://location.data.gov.uk/so/ef/SamplingPoint/bwsp.eaew
Datasets: Assessment Observation
http://environment.data.gov.uk/data/bathing-water-quality
  void:subset .../compliance (Annual Compliance)
  void:subset .../in-season (In-season Weekly Assessment)
20. Data cube vocabulary
collaborative development
sponsored by data.gov.uk
simple, flexible vocabulary
mirrors core information models from:
SDMX (Statistical Data and Metadata eXchange)
DDI (Data Documentation Initiative)
extension to SCOVO vocabulary
image: dullhunk @ flickr
21. Data cube model
A set of observations
indexed by dimensions
describing measures
interpreted according to attributes
Example cell, located by its dimensions (e.g. region, time):
measure(s): population = 32,567
attributes: unit of measure = count, status = preliminary
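The model above can be sketched in a few lines of Python: an observation carries dimension values that locate it, measure values, and attributes that say how to read the measures. The class and function names are hypothetical, and the region and time values are placeholders; only the measure and attribute values come from the slide.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    dimensions: dict   # where the value sits in the cube
    measures: dict     # what was observed
    attributes: dict   # how to interpret the measures


def cube_key(obs, dimension_order):
    """Index key: the observation's dimension values in a fixed order."""
    return tuple(obs.dimensions[name] for name in dimension_order)


# "region-A" and "2010" are placeholder dimension values.
obs = Observation(
    dimensions={"region": "region-A", "time": "2010"},
    measures={"population": 32567},
    attributes={"unit of measure": "count", "status": "preliminary"},
)

# The cube is a set of observations indexed by their dimensions.
cube = {cube_key(obs, ("region", "time")): obs}
cell = cube[("region-A", "2010")]
print(cell.measures["population"], cell.attributes["status"])
```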
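23. Data cube vocabulary
1. Top level
DataSet
provenance and metadata
structure
Observation
measured values, at dimensions
with attributes
direct link to DataSet
Diagram (subject, property, object):
qb:DataSet qb:structure qb:DataStructureDefinition
qb:DataStructureDefinition qb:component (target not shown)
qb:DataStructureDefinition qb:sliceKey qb:SliceKey
qb:DataSet qb:slice qb:Slice
qb:Slice qb:sliceStructure qb:SliceKey
qb:Slice qb:subSlice qb:Slice
qb:Slice qb:observation qb:Observation
qb:Observation qb:dataset qb:DataSet
qb:Observation carries dimension values, measure value(s), attribute values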
24. Data cube vocabulary
1. Top level
DataSet
provenance and metadata
structure
Observation
measured values, at dimensions
with attributes
direct link to DataSet
Slice
optional grouping by fixing dimensions
guide to presentation
allows for abbreviated data
Diagram (subject, property, object):
qb:DataSet qb:structure qb:DataStructureDefinition
qb:DataStructureDefinition qb:component (target not shown)
qb:DataStructureDefinition qb:sliceKey qb:SliceKey
qb:DataSet qb:slice qb:Slice
qb:Slice qb:sliceStructure qb:SliceKey
qb:Slice qb:subSlice qb:Slice
qb:Slice qb:observation qb:Observation
qb:Observation qb:dataset qb:DataSet
qb:Observation carries dimension values, measure value(s), attribute values
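A rough sketch of what "grouping by fixing dimensions" and "abbreviated data" mean in practice: observations are grouped by the values of the fixed dimensions, and each grouped observation drops those values because the slice states them once. The data and the function name slices are hypothetical.

```python
from collections import defaultdict

# Hypothetical observations: two dimensions (point, week) and one measure.
OBSERVATIONS = [
    {"point": "P1", "week": 1, "count": 10},
    {"point": "P1", "week": 2, "count": 12},
    {"point": "P2", "week": 1, "count": 7},
]


def slices(observations, fixed):
    """Group observations by the values of the fixed dimensions.

    Each group plays the role of a slice: the fixed dimension values are
    stated once as the group key, so the observations inside can omit them.
    """
    grouped = defaultdict(list)
    for obs in observations:
        key = tuple(obs[d] for d in fixed)
        abbreviated = {k: v for k, v in obs.items() if k not in fixed}
        grouped[key].append(abbreviated)
    return dict(grouped)


by_point = slices(OBSERVATIONS, fixed=("point",))
print(by_point[("P1",)])
```

Fixing the sampling point gives one time series per point, which is also a natural unit for presentation (one chart or table per slice).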
25. Data cube vocabulary
2. Data Structure Definition
explicit definition of cube structure, inline in the data
enables
validation
visualization
discovery
abbreviation
Diagram (subject, property, object):
qb:DataSet qb:structure qb:DataStructureDefinition
qb:DataStructureDefinition qb:component qb:ComponentSpecification
qb:ComponentSpecification properties: qb:componentRequired, qb:componentAttachment, qb:order, qb:dimension, qb:measure, qb:attribute
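One way to picture how an explicit structure definition enables validation is the simplified check below. It is a sketch only: the structure is a plain dict rather than RDF, the component names are hypothetical, and the "required" flag loosely mirrors qb:componentRequired without following the vocabulary's exact rules.

```python
# A hypothetical structure definition: each component has a kind
# (dimension, measure or attribute) and a flag saying whether it is required.
STRUCTURE = {
    "samplingPoint": {"kind": "dimension", "required": True},
    "sampleWeek": {"kind": "dimension", "required": True},
    "coliformCount": {"kind": "measure", "required": True},
    "abnormalWeather": {"kind": "attribute", "required": False},
}


def validate(observation, structure):
    """Return a list of problems; an empty list means the observation fits."""
    problems = []
    for name, spec in structure.items():
        if spec["required"] and name not in observation:
            problems.append(f"missing {spec['kind']}: {name}")
    for name in observation:
        if name not in structure:
            problems.append(f"undeclared component: {name}")
    return problems


good = {"samplingPoint": "P1", "sampleWeek": 20, "coliformCount": 15}
bad = {"samplingPoint": "P1", "coliformCount": 15, "colour": "green"}
print(validate(good, STRUCTURE))
print(validate(bad, STRUCTURE))
```

The same declared structure can drive the other uses listed above, for example choosing axes for a visualization from the dimensions.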
26. Bathing Water Quality cubes
measures
total coliform count, enterovirus count, ...
sample classification
dimensions
sampling point
sampling week
sampling year
attributes
abnormal weather
27. Everything has a URI
Selected Lists and Individual Bathing Waters
Lists and Individual Assessments (In-Season or Annual Compliance)
Vocabulary Terms
Datasets (and subsets)
Presented as:
HTML (for people)
JSON, XML, RDF and CSV (for programs)
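Serving one resource in several formats is commonly done with HTTP content negotiation. The sketch below picks a format from a client's Accept header. It is not the bathing water service's actual logic: the mapping of "RDF" to application/rdf+xml is an assumption, wildcards such as */* are ignored for brevity, and choose_format is a hypothetical name.

```python
# Representations named on the slide, keyed by media type.
SUPPORTED = {
    "text/html": "html",
    "application/json": "json",
    "application/xml": "xml",
    "application/rdf+xml": "rdf",   # assumed serialization for "RDF"
    "text/csv": "csv",
}


def choose_format(accept_header, default="html"):
    """Pick the supported format the client prefers most, by q-value."""
    candidates = []
    for position, part in enumerate(accept_header.split(",")):
        media_type, *params = [p.strip() for p in part.split(";")]
        quality = 1.0
        for param in params:
            if param.startswith("q="):
                try:
                    quality = float(param[2:])
                except ValueError:
                    quality = 0.0
        if media_type in SUPPORTED and quality > 0:
            # Negated quality sorts the best first; position breaks ties.
            candidates.append((-quality, position, SUPPORTED[media_type]))
    return min(candidates)[2] if candidates else default


print(choose_format("text/csv;q=0.5, application/json"))
print(choose_format("image/png"))
```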
28. Data Platform and Applications
Web of Linked Data
http://environment.data.gov.uk/lab/bwq-os.html
29. Outcomes
bathing water quality information available
as both data set and set of web APIs
updated weekly (in season)
third party applications to use and combine the data
seed a web of environmental and location data
reference identifiers can be reused for related information
URI patterns designed to be compatible with INSPIRE
31. Lessons
importance of reference identifiers
developer accessibility
linked data API
publish once, consume many ways
importance of maintenance and QoS expectation
reusable patterns:
reusable vocabularies - Data Cube, org ...
URI patterns
provenance – OPMV and specializations
incremental approach
32. Acknowledgements
Alex Coley (Environment Agency)
for slides 17, 18, and for sponsoring the bathing water quality
data publication
Stuart Williams
developer of the bathing water application and slides 19, 27, 28
John Sheridan (The National Archives)
for sponsoring the development of data cube
Richard Cyganiak, Jeni Tennison
co-developers of the data cube vocabulary
33. fin.
image: Christian Haugen @ flickr.com
35. Linked data principles
Use URIs as names for things
Use HTTP URIs so that people can look up those names
When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL)
Include links to other URIs, so that they can discover more things
Pattern of application of semantic web stack
36. Linked open data cloud: 2007
Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/
37. Linked open data cloud: 2009
38. Linked open data cloud: 2010
39. Accessing all this data
link following
HTTP GET, follow links, aggregate relevant statements
query
SPARQL
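Link following can be sketched as a breadth-first crawl: fetch a URI, keep the statements, and queue any URIs they mention. In this sketch the fetch step is passed in as a function so that the example runs offline; on the live web it would be an HTTP GET plus an RDF parser. The example.org URIs and the crawl name are hypothetical, and treating any string starting with "http" as a link is a simplification.

```python
from collections import deque


def crawl(start, fetch, max_hops=2):
    """Follow links breadth-first and aggregate the statements found.

    fetch(uri) must return the (subject, predicate, object) statements
    obtained by looking up uri.
    """
    seen = {start}
    queue = deque([(start, 0)])
    statements = set()
    while queue:
        uri, hops = queue.popleft()
        for s, p, o in fetch(uri):
            statements.add((s, p, o))
            is_link = isinstance(o, str) and o.startswith("http")
            if is_link and o not in seen and hops < max_hops:
                seen.add(o)
                queue.append((o, hops + 1))
    return statements


# A hypothetical two-document "web" standing in for real HTTP responses.
FAKE_WEB = {
    "http://example.org/school": [
        ("http://example.org/school", "school:district", "http://example.org/district"),
    ],
    "http://example.org/district": [
        ("http://example.org/district", "rdfs:label", "Cardiff"),
    ],
}

found = crawl("http://example.org/school", lambda uri: FAKE_WEB.get(uri, []))
print(len(found))
```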
40. SPARQL
core idea is pattern matching
graph patterns with variables
any subgraph which matches yields a row of bindings
e.g. ?school ont:districtAdministrative [ rdfs:label “Cardiff” ]
syntax based on Turtle syntax for RDF
web API endpoints
lots of power
filters, optionals, named graphs
sub-queries, property chains, aggregation
federated query, update, construct
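To make the "pattern matching" idea concrete, here is a toy matcher for basic graph patterns over in-memory triples. It is not a SPARQL engine: it handles only conjunctions of triple patterns, the data is hypothetical, and the blank node [] from the slide is written as the variable ?d.

```python
def match_pattern(patterns, triples, bindings=None):
    """Yield one dict of variable bindings per subgraph matching all patterns.

    Terms starting with "?" are variables; everything else must match exactly.
    """
    bindings = bindings or {}
    if not patterns:
        yield dict(bindings)
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        candidate = dict(bindings)
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # Bind the variable, or check it against its earlier binding.
                if candidate.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield from match_pattern(rest, triples, candidate)


# Hypothetical data: two schools in two districts.
TRIPLES = [
    ("ex:school1", "ont:districtAdministrative", "ex:district1"),
    ("ex:school2", "ont:districtAdministrative", "ex:district2"),
    ("ex:district1", "rdfs:label", "Cardiff"),
    ("ex:district2", "rdfs:label", "Newport"),
]

PATTERN = [
    ("?school", "ont:districtAdministrative", "?d"),
    ("?d", "rdfs:label", "Cardiff"),
]

rows = list(match_pattern(PATTERN, TRIPLES))
print(rows)
```

Each result row corresponds to one way of overlaying the pattern on the graph, which is what a SPARQL SELECT returns for a basic graph pattern.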
41. Accessing all this data
link following
HTTP GET, follow links, aggregate relevant statements
query
SPARQL
linked data API
RESTful API onto linked data resources
simple query, usable without RDF stack, web dev friendly
easy to layer visualizations and UIs on top
third parties
search engines and aggregators e.g. Sindice, sameAs.org