2. “We shape our tools and
they in turn shape us”
Marshall McLuhan
3. The Wealth of Networks
“Different technologies make different kinds of human action and
interaction easier or harder to perform. All other things being equal,
things that are easier to do are more likely to be done and things that
are harder to do are less likely to be done.
All other things are *never* equal.
That is why technological determinism in the strict sense–if you have
technology “t” you should expect social structure or relation “s” to
emerge–is false…Neither deterministic nor wholly malleable, technology
sets some parameters of individual and social action. It can make some
actions, relationships, organizations and institutions easier to pursue,
and others harder…
The same technologies of networked computers can be adopted in very
different patterns. There is no guarantee that networked information
technology will lead to the improvements in innovation, freedom and
justice that I suggest are possible…The way we develop will, in
significant measure, depend on choices we make in the next decade or
so.”
– Yochai Benkler, The Wealth of Networks
4. Information economics and data
• Better informed markets operate more efficiently
• Governments are making more data available on the web
• We are at the beginning of an age of data abundance
• Large scale data aggregation is now possible
9. Which says…
16. GOVERNMENT TRANSPARENCY
The Government believes that we need to throw open the doors of
public bodies, to enable the public to hold politicians and public
bodies to account. We also recognise that this will help to deliver
better value for money in public spending, and help us achieve our
aim of cutting the record deficit. Setting government data free will
bring significant economic benefits by enabling businesses and non-
profit organisations to build innovative applications and websites.
We will ensure that all data published by public bodies is published in
an open and standardised format, so that it can be used easily and
with minimal cost by third parties.
10. Open Data Policy in the UK
• Open by default
• Open Government Licence
• Seeking to address substantial policy issues through the
use of open data
• Health and Transport data are at the forefront of this drive
• Consultation in Autumn 2011, White Paper early this year
12. Choosing formats for data
Formats for people
• Focused on presentation or typographic layout
• Look good, but hard to access the underlying data
Formats for machines
• Focused on data interchange between computers
• Look dreadful, hard for people to understand but easy to import into other systems and use
13. A false dichotomy
• Formats for people: focused on presentation or typographic layout
• Single source of data (shown between the two)
• Formats for machines: focused on data interchange between computers
14. Download or programmatic access?
• Download
o Good for static information
o Small files
o Used for export/import
o Easy for publishers
o Most of the data registered on data.gov.uk
• Programmatic access
o Good for dynamic or real-time information or very large datasets
o Lets developers select and use just the information they need
o Retains more control for the publisher
o More complicated to implement but much more powerful
o Vital for many useful datasets
16. Henry Maudslay (1771–1831)
He also developed the first industrially
practical screw-cutting lathe in 1800,
allowing standardisation of screw thread
sizes for the first time. This allowed the
concept of interchangeability (an idea that
was already taking hold) to be practically
applied to nuts and bolts. Before this, all
nuts and bolts had to be made as matching
pairs only. This meant that when machines
were disassembled, careful account had to
be kept of the matching nuts and bolts
ready for when reassembly took place.
http://en.wikipedia.org/wiki/Henry_Maudslay
17. Joseph Whitworth (1804-1887)
In 1841, Joseph Whitworth created a
design that, through its adoption by many
British railroad companies, became a
national standard for the United Kingdom
called British Standard Whitworth. During
the 1840s through 1860s, this standard
was often used in the United States and
Canada as well, in addition to myriad
intra- and inter-company standards.
http://en.wikipedia.org/wiki/Screw_thread#History_of_standardization
18. Tim Berners-Lee five stars
* make your stuff available on the Web
(whatever format) under an open licence
** make it available as structured data (e.g.,
Excel instead of image scan of a table)
*** use non-proprietary formats (e.g., CSV
instead of Excel)
**** use URIs to identify things, so that people
can point at your stuff
***** link your data to other data to provide
context
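The five stars can be read as a cumulative checklist. The sketch below is one way to score a dataset against it, assuming each star requires all the earlier ones; the `Dataset` class and its field names are hypothetical and only mirror the wording on the slide.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    # Hypothetical field names; each one mirrors one star on the slide.
    on_web_open_licence: bool
    structured: bool
    non_proprietary_format: bool
    uses_uris: bool
    linked_to_other_data: bool


def star_rating(ds: Dataset) -> int:
    """Count stars in order, stopping at the first criterion not met."""
    criteria = [
        ds.on_web_open_licence,
        ds.structured,
        ds.non_proprietary_format,
        ds.uses_uris,
        ds.linked_to_other_data,
    ]
    stars = 0
    for met in criteria:
        if not met:
            break
        stars += 1
    return stars


# A CSV file on the web under an open licence, with no URIs yet.
csv_release = Dataset(True, True, True, False, False)
print(star_rating(csv_release))
```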
20. Linked Data
• Give names, or web identifiers (URIs), to things
• Publish information about them as Web Resources
• Use RDF triples (subject, property, value)
• Link to other data about those things
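To make the triple idea concrete, here is a minimal sketch that holds triples as plain Python tuples rather than using an RDF library. The station URI follows the identifier pattern used elsewhere in this deck; the `http://example.org/def/` property names are placeholders invented for the illustration, not a published vocabulary.

```python
# Each triple is (subject, property, value).
STATION = "http://transport.data.gov.uk/id/station/WAT"
EX = "http://example.org/def/"  # placeholder vocabulary, an assumption

triples = {
    (STATION, EX + "code", "WAT"),
    (STATION, EX + "type", EX + "Station"),
}


def describe(subject, graph):
    """Return every (property, value) pair recorded for a subject."""
    return sorted((p, v) for s, p, v in graph if s == subject)


for prop, value in describe(STATION, triples):
    print(prop, "->", value)
```

Because the subject is a URI, anyone else can publish further triples about the same thing simply by reusing that name.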
21. Benefits
• Enables web-scale data publishing - distributed
publication with web-based discovery mechanisms
• Everything is a resource – follow your nose to
discover more about properties, classes, or codes
within a code list
• Everything can be annotated - make comments
about observations, data series, points on a map
• Easy to extend - create new properties as required,
no need to plan everything up-front
• Easy to merge - slot together RDF graphs, no need
to worry about name clashes
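The "easy to merge" point can be sketched with graphs held as sets of triples: merging is a set union, and names cannot clash because every subject and property is a full URI. All URIs and values below are made up for the illustration.

```python
EX = "http://example.org/def/"          # placeholder vocabulary, an assumption
THING = "http://example.org/id/thing/1"  # hypothetical identifier

# Two publishers describe the same thing independently.
graph_a = {(THING, EX + "name", "Example")}
graph_b = {
    (THING, EX + "name", "Example"),      # same statement, stated twice
    (THING, EX + "category", "sample"),
}

# Union merges the graphs; the repeated statement collapses into one.
merged = graph_a | graph_b
print(len(merged))
```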
23. UK Government has been:
• developing standards for responsible publishing of key
types of data (financial data, organisation data, aggregate
statistics, location data)
• developing guidance, practices and tools that make it
easy to publish data in Linked Data form, at low cost
• making it easy for people to consume data in a
programmatic way
24. Types of data:
Organogram posts: Director General; Director (Operations); Director (Strategy); Deputy Director (A); Deputy Director (A)

Statistics by year:
     2008    2009    2010
A    1,345   1,456   2,301
B    2,112   3,543   2,111
C    2,345   2,987   2,455
D    6,342   6,256   6,123
E    7,435   7,432   8,102

Transactions:
Transaction   Date         Supplier            Amount
A-1263        09/09/2010   Spottiswoode & Co   £2,345
A-1264        09/09/2010   JSB & Sons          £2,111
A-1265        09/09/2010   BLG Ltd             £2,455
A-1266        09/09/2010   Spottiswoode & Co   £6,123
A-1267        09/09/2010   BLG Ltd             £8,102
25. Naming things with URIs
• URI = uniform resource identifier
• Everything starts with HTTP, which gives us actionable names
• There is choice about how to make URIs
• We are using
{sector}.data.gov.uk/id/{something}
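A small sketch of how identifiers following this pattern could be assembled. The helper name `id_uri` is hypothetical, and the `http://` scheme and the split of `{something}` into path segments are assumptions based on the reference data URIs listed later in this deck.

```python
from urllib.parse import quote


def id_uri(sector: str, *path: str) -> str:
    """Build an identifier of the form {sector}.data.gov.uk/id/{something}."""
    if not sector or not path:
        raise ValueError("a sector and at least one path segment are required")
    # Percent-encode each segment so the result is always a valid URI.
    tail = "/".join(quote(str(part), safe="") for part in path)
    return f"http://{sector}.data.gov.uk/id/{tail}"


print(id_uri("transport", "station", "WAT"))
print(id_uri("education", "school", "341451"))
```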
28. Naming things in legislation
• If you visit legislation.gov.uk you will see we have taken
great care with naming things
Returns an HTML document for United Kingdom Public General Act (ukpga),
2005, Chapter 14, Section 1
Returns an HTML document with a list from all legislation types where the
title contains “wildlife”
29. Some names are quite sophisticated…
• UK Public General Act (ukpga)
• 1981
• Chapter 69
• Section 5
• As it extends to England
• As it stood on 30th January 2001
• Displayed as an HTML document with the timeline on
• Although URIs are opaque, having this type of design
changes how people use the service
30. Legislation as Open Data
• Everything on legislation.gov.uk is available as open data
under the terms of our Open Government Licence
• To access the data, visit any page and add:
o /data.xml
o /data.rdf
o /data.xht
• For lists
o /data.feed
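The suffix rule above is simple enough to sketch as a helper. The function name `data_url` is hypothetical, and the example page address is an assumption modelled on the legislation identifier listed under reference data; the code only builds strings and makes no network request.

```python
# Suffixes taken from the slide; the short keys are this sketch's own labels.
SUFFIXES = {
    "xml": "/data.xml",
    "rdf": "/data.rdf",
    "xht": "/data.xht",
    "feed": "/data.feed",  # for lists
}


def data_url(page_url: str, fmt: str) -> str:
    """Append the data suffix for `fmt` to a page address."""
    if fmt not in SUFFIXES:
        raise ValueError(f"unknown format: {fmt}")
    return page_url.rstrip("/") + SUFFIXES[fmt]


# Assumed page address, for illustration only.
page = "http://www.legislation.gov.uk/ukpga/2009/12/section/2"
for fmt in ("xml", "rdf", "xht"):
    print(data_url(page, fmt))
```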
31. Linked Data Standards
• Re-use where we can, create where we must
• Small, high-level, lightweight vocabularies
o Examples include datacube, organization, provenance
• Create local specialisations
o Examples include payments, central-government
• Post hoc linking
35. Reference data
http://reference.data.gov.uk/id/day/2012-01-18
http://reference.data.gov.uk/id/department/CO
http://transport.data.gov.uk/id/station/WAT
http://education.data.gov.uk/id/school/341451
http://location.data.gov.uk/id/3245677362123
http://www.legislation.gov.uk/id/ukpga/2009/12/section/2
36. British time intervals
• http://reference.data.gov.uk/id/day/2011-06-1
• There are similar URIs for seconds, minutes, hours,
weeks, months, quarters, years
• We were a bit slow (170 years) to move from the Julian
to Gregorian Calendar (see the Calendar Act, 1750)
• To transition, we lost 11 days in 1752
• This is the convoluted explanation of why the tax year in the UK
starts on the 6th April
• Our URIs for time intervals work this way too and the
British time intervals URI Set is linked to the
legislation
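The calendar arithmetic can be checked in a few lines. The `day_uri` helper is hypothetical and follows the `/id/day/YYYY-MM-DD` shape used in the reference data examples. The tax year steps follow the commonly cited account (a year that began on 25 March, shifted by the 11 dropped days and by one further day in 1800), which is not spelled out in these slides. Python's `date` type is proleptic Gregorian, so the dates are treated here purely as calendar labels.

```python
from datetime import date, timedelta


def day_uri(d: date) -> str:
    """Identifier for a day, following the /id/day/YYYY-MM-DD pattern."""
    return f"http://reference.data.gov.uk/id/day/{d.isoformat()}"


print(day_uri(date(2012, 1, 18)))

# In Britain, 2 September 1752 was followed directly by 14 September 1752.
last_old_style = date(1752, 9, 2)
first_new_style = date(1752, 9, 14)
skipped = (first_new_style - last_old_style).days - 1
print(skipped)

# Commonly cited account of the tax year start (an assumption here).
old_start = date(1752, 3, 25)
after_switch = old_start + timedelta(days=11)   # 25 March + 11 days
after_1800 = after_switch + timedelta(days=1)   # one more day from 1800
print((after_1800.month, after_1800.day))
```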
38. Chop-O-Matic
• Malcolm Gladwell article on Ron Popeil from 2000 in the
New Yorker:
• “And how do you persuade people to disrupt their lives? Not
merely by ingratiation or sincerity, and not by being famous
or beautiful. You have to explain the invention to consumers
- not once or twice but three or four times, with a different
twist each time. You have to show them exactly how it
works and why it works, and make them follow your hands
as you chop liver with it, and then tell them precisely how it
fits into their routine, and, finally, sell them on the
paradoxical fact that, revolutionary as the gadget is, it's not
at all hard to use.”
43. Linked Data API
• Open Standard
• Generic approach for creating APIs from Linked Data
• Sits on top of a Linked Data store
• Several implementations; the most mature is Puelia
48. Publishing Organisation Data
• We will require public bodies to publish online the job titles
of every member of staff and the salaries and expenses of
senior officials paid more than the lowest salary permissible
in Pay Band 1 of the Senior Civil Service pay scale, and
organograms that include all positions in those bodies.
49. Our first go…
• October 2010
• CSV template and PDFs of organograms, typically authored
using PowerPoint
• Emphasis on visual appearance led to inconsistent
datasets which are very hard to re-use
• No relationship between the organogram and data
• Not using web standards
50. Press Release
“The Government has published
the most comprehensive
organisational charts of the UK
Civil Service ever released online,
taking another step towards its
goal of being the most transparent
government in the world and
opening up the structure of the
Civil Service to public scrutiny”
51. It’s *all* Linked Data
• 100s of UK Government Organisations published their
organisation data as Linked Data
• Distributed data publishing
• The data is deeply linked (Departments, Grades,
Professions, date of the snapshot)
• Cross dataset queries are perhaps the most interesting
• Proves Linked Data is moving from research topic to
commodity publishing
• We can now extend this approach to other types of dataset
and link our transparency data
52. Our aims with Organogram Data
• Make it as simple as possible for people in Departments to
create Linked Data
• Create high quality, consistent data that matches the policy
intent and guidance
• Distributed capture and publishing
• Create open data in open standards using open source
tools
• Human readable and machine readable from single source
• Provide download and API access in different formats
(CSV, XML, JSON, RDF, HTML)
• Evolutionary route to create longitudinal datasets,
reconciling against previous data
• Enable everyone to publish 5 Star Linked Data
53. The process
• Capture organisation data using a spreadsheet, which
verifies policy rules and datatypes
• Upload spreadsheet
• Preview organogram
• Download RDF and two CSVs
• Publish on your website and register with data.gov.uk
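The validation and CSV steps above might look something like the sketch below. Everything specific in it is an assumption: the column names, the two rules checked, the senior/junior split and the sample rows are invented for illustration, since the real template and policy rules are not shown in these slides.

```python
import csv
import io

# Hypothetical column layout for one post per row.
COLUMNS = ["post_id", "job_title", "grade", "salary"]


def validate(row: dict) -> list:
    """Return a list of problems found in one spreadsheet row."""
    problems = []
    if not row.get("job_title", "").strip():
        problems.append("job_title is required")
    if not str(row.get("salary", "")).isdigit():
        problems.append("salary must be a whole number")
    return problems


def to_csv(rows: list) -> str:
    """Serialise rows to CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Made-up sample rows: one valid, one that breaks both rules.
rows = [
    {"post_id": "1", "job_title": "Director", "grade": "senior", "salary": "90000"},
    {"post_id": "2", "job_title": "", "grade": "junior", "salary": "n/a"},
]
for row in rows:
    print(row["post_id"], validate(row))

valid = [r for r in rows if not validate(r)]
senior_csv = to_csv([r for r in valid if r["grade"] == "senior"])
junior_csv = to_csv([r for r in valid if r["grade"] == "junior"])
print(senior_csv)
```

Checking rows at capture time, before upload, is what keeps the published data consistent across departments.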
54. The Excel bit…
• It’s the tool most Civil Servants have
• This *does* also work in LibreOffice / OpenOffice etc
58. Linked Data Publishing Infrastructure
[Architecture diagram]
Numbered steps: 1. Upload Excel; 2. Create CSVs; 3. Create Mapping; 4. Query (SPARQL); 5. Create RDF; 6. Load; 7. Query (SPARQL)
Components shown: Excel file; Organogram (PHP); Organogram (HTML, CSS & JavaScript); Senior CSV; Junior CSV; Mapping (TriG); XLWrap; Sesame; RDF file; TDB RDF Store; Reconciliation; Linked Data API; API Config; HTML, XML, JSON
59. Linked Data adds value
• Implicit properties are made explicit (person, role, person in
a role)
• Reconciliation adds value by automatic linking to other data
• Provenance
• Example data
• Explicit open licence
62. On the web, everything is a claim
• How did you come by this information?
• What did you do with it?
• When, who and how?
63. An opportunity
• We are developing a new system for publishing legislation,
operating inside the government secure intranet / extranet
• We want to provide evidence that supports the data we are
publishing
64. Legislation workflows
• Complicated and vary by jurisdiction and content type
• We take documents in different formats (Word,
FrameMaker) and convert them to a single format (XML)
• We store XML documents in an XML Database
• We take documents from a single format (XML) and
transform them to different formats (HTML and PDF)
• Complex processes for handling images etc
• Sometimes mistakes are made, which can be corrected
through a “Correction Slip”
65. Objectives for provenance with legislation
• Transparency and public trust - we substantiate our claim
that this web page is what the legislation says
• The audit trail is repeatable
• Perform automatic checks along the way and record evidence
of that checking
• Use digital signatures rather than rely on the immutability of
paper, to ensure authenticity
• Create a data source we can use to resolve any disputes
(where did that footnote go?)
• Create a data source we can use to measure contractual
performance (how long did it take to publish that
document?)
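As a rough illustration of the last two objectives, the sketch below keeps a log of workflow steps (who, what, when, plus a fingerprint of the output) and derives time-to-publish from it. The `Step` record, agent names and timestamps are hypothetical, and the SHA-256 digest is only a checksum: a real digital signature would need a public-key scheme, which is out of scope here.

```python
import hashlib
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Step:
    # Hypothetical record of one workflow step.
    agent: str
    action: str
    at: datetime
    output_sha256: str


def digest(content: bytes) -> str:
    """Fingerprint of a document version (a checksum, not a signature)."""
    return hashlib.sha256(content).hexdigest()


def hours_to_publish(steps: list) -> float:
    """Hours between the earliest and latest recorded step."""
    ordered = sorted(steps, key=lambda s: s.at)
    return (ordered[-1].at - ordered[0].at).total_seconds() / 3600


# Made-up log entries for illustration.
log = [
    Step("converter", "convert to XML", datetime(2012, 1, 18, 9, 0), digest(b"<doc/>")),
    Step("publisher", "publish HTML", datetime(2012, 1, 18, 15, 30), digest(b"<html/>")),
]
print(hours_to_publish(log))
```

Comparing the stored digest of one step's output with a fresh digest of the next step's input is one way to notice where a footnote went missing.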
66. Our technology choices
• We use both XML and RDF
• XML is brilliant for single source publishing solutions – one
source, many outputs
• RDF provides a flexible data model for other types of
information (bibliographic metadata, but also things like
which item of legislation has changed what)
• We are recording provenance in RDF using the Open
Provenance Model Vocabulary
69. Publishing provenance
• Provenance information may be associated by including a
<link> element in the HTML <head> section:
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<link rel="provenance" href="provenance-URI">
<link rel="anchor" href="entity-URI">
<title>Welcome to example.com</title>
</head>
<body> ... </body>
</html>
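A consumer could discover these links with a few lines of code. This sketch uses Python's standard `html.parser` on the snippet above; the class name `ProvenanceLinks` is hypothetical, and `provenance-URI` and `entity-URI` are kept as the placeholders they are in the slide.

```python
from html.parser import HTMLParser


class ProvenanceLinks(HTMLParser):
    """Collect href values of <link> elements, keyed by their rel value."""

    def __init__(self):
        super().__init__()
        self.links = {}

    def handle_starttag(self, tag, attrs):
        if tag == "link":
            attributes = dict(attrs)
            rel, href = attributes.get("rel"), attributes.get("href")
            if rel and href:
                self.links[rel] = href


page = """<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<link rel="provenance" href="provenance-URI">
<link rel="anchor" href="entity-URI">
<title>Welcome to example.com</title>
</head>
<body> ... </body>
</html>"""

parser = ProvenanceLinks()
parser.feed(page)
print(parser.links)
```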
70. Summary
• Linked Data is essential to realising the promise of Open
Government Data
• Using Linked Data means working on
o Standards
o Reference Data
o Production
o Publishing
• Benefits grow with the more data you want to combine
• Lots of opportunities for international collaboration
• Best advice: just start
Combination of OSS, cloud computing and other similar trends.
Names are important, they provide the framework or the architecture around which
Upload your spreadsheet
Preview your data in different ways
Then simply download your RDF!
Architecture
A “data explorer” view, for filtering and querying data and pulling it back in different formats