Culture Geeks talk: "Adventures in Linked Data Land", by Richard Light.
Feb, 25th 2009 - Regency Town House
Culture Geeks is a Brighton-based community open to everyone who is
interested in using digital technologies in the cultural sector.
1. Adventures in Linked Data Land
Richard Light Consultancy
Culture Geeks, 25 February 2009
2. Discovering Linked Data
Four principles of Linked Data (Tim B-L):
● Use URIs to identify resources
● Use HTTP URIs so that people can look them up
● Provide useful information about the resource
● Include links to other URIs in your data
3. Discovering dbPedia
● Extraction of Linked Data from Wikipedia
● Statements in info boxes (mainly) become RDF triples:
<rdf:Description rdf:about="http://dbpedia.org/resource/Berlin_Marathon">
  <dbpprop:location rdf:resource="http://dbpedia.org/resource/Berlin"/>
</rdf:Description>
Note the URLs
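As a rough illustration of how the fragment above reads as a triple, here is a minimal Python sketch using only the standard library. The enclosing rdf:RDF element and the dbpprop namespace URI (http://dbpedia.org/property/) are assumptions added so that the fragment parses on its own; the function name triples is hypothetical.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# The slide's fragment, wrapped in an rdf:RDF root so the prefixes are declared.
# The dbpprop namespace URI is an assumption, not taken from the slides.
RDF_XML = """\
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dbpprop="http://dbpedia.org/property/">
  <rdf:Description rdf:about="http://dbpedia.org/resource/Berlin_Marathon">
    <dbpprop:location rdf:resource="http://dbpedia.org/resource/Berlin"/>
  </rdf:Description>
</rdf:RDF>
"""

def triples(rdf_xml):
    """Yield (subject, predicate, object) for each simple property element."""
    root = ET.fromstring(rdf_xml)
    for desc in root.iter(f"{{{RDF_NS}}}Description"):
        subject = desc.get(f"{{{RDF_NS}}}about")
        for prop in desc:
            # ElementTree spells qualified names as "{namespace}local".
            namespace, local = prop.tag[1:].split("}")
            # Object is either a linked resource URI or a plain string.
            obj = prop.get(f"{{{RDF_NS}}}resource") or (prop.text or "").strip()
            yield subject, namespace + local, obj

for s, p, o in triples(RDF_XML):
    print(s, p, o)
```

Subject, predicate and object all come out as HTTP URIs, which is the point of "Note the URLs".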
4. Browsing Linked Data
● View RDF as a web page:
http://dbpedia.org/page/Berlin
● Navigate from one data source to another
● Specialist Linked Data browsers/plugins:
– DISCO
– Marbles
– OpenLink Data Explorer
– Tabulator
11. So what do we have here?
● An initiative to generate lots of Linked Data
● A Linked Data Cloud, containing a growing number of RDF datasets
● A hard-to-use query language capable of very precise and powerful querying
Where do museums come into this picture?
12. The Wordsworth Trust
● Typical museum collection: about 60,000 objects
● Major collection of manuscripts (notebooks, letters, etc.)
● Objects published to the Web from a ModesXML database
● Unwise enough to allow me Remote Desktop access ...
16. One identifier; three “views”
● This object has a single persistent identifier:
http://collections.wordsworth.org.uk/object/GRMDC.C104.2
● This maps to different views depending on the “Accept” header in the HTTP request:
– application/rdf+xml >> RDF
– application/xtm+xml >> XTM Topic Map
– Otherwise >> HTML (human-readable)
● Achieved through a custom 404 “page not found” handler
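The Accept-to-view mapping above can be sketched in a few lines of Python. This is an illustration under simplifying assumptions, not the handler used on the site: the function name pick_view is hypothetical, q-values are ignored, and the first supported media type listed in the header wins.

```python
def pick_view(accept_header):
    """Return "rdf", "xtm" or "html" for a raw Accept header value."""
    views = {"application/rdf+xml": "rdf", "application/xtm+xml": "xtm"}
    for item in (accept_header or "").split(","):
        # Drop parameters such as ";q=0.9" and normalise case.
        media_type = item.split(";")[0].strip().lower()
        if media_type in views:
            return views[media_type]
    # Anything else falls through to the human-readable page.
    return "html"

print(pick_view("application/rdf+xml"))
print(pick_view("text/html,application/xhtml+xml;q=0.9"))
```

A fuller implementation would honour q-values when a client lists several acceptable types.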
17. “Page not found” handler (1)
● All URLs are fictitious, so they generate a 404
● Modified a generic smart 404 handler from:
http://evolvedcode.net/content/code_smart404/
● Added support for “303 See Other” redirects
● Added wildcard matching to re-format URLs
18. “Page not found” handler (2)
● Generic URL, plus requested Accept format, determine initial “303 See Other” mapping, e.g.:
http://collections.wordsworth.org.uk/object/GRMDC.C104.2
+
Accept: application/rdf+xml
=
http://collections.wordsworth.org.uk/object/rdf/GRMDC.C104.2
● When this is passed back in, the 404 handler has to generate the required RDF directly
● Can't just keep redirecting requests!
20. “Page not found” handler (4)
● Generic URL plus a supported Accept type generates a “303 See Other” redirect
● If it comes back as a page request, it is further redirected with a “301 Moved Permanently” to the object's web page
● If it comes back as an RDF or XTM request, the record is fetched as XML and subjected to an XSLT transform by the handler
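The three cases on this slide can be sketched as one decision function. This is a hedged illustration, not the real handler: only the "rdf" path segment appears on the slides, so the "xtm" and "page" segments, the function names, and the example.org address for the object's web page are all assumptions, and the XML fetch and XSLT transform are stubbed out as a string.

```python
import re

BASE = "http://collections.wordsworth.org.uk/object/"
# "rdf" is the path segment shown on the slides; "xtm" and "page" are assumed.
SEGMENTS = {"application/rdf+xml": "rdf", "application/xtm+xml": "xtm"}
PATTERN = re.compile(r"^/object/(?:(rdf|xtm|page)/)?([^/]+)$")

def web_page_url(object_id):
    # Hypothetical: the slides do not give the address of the object's web page.
    return "http://example.org/object-page/" + object_id

def handle_404(path, accept=""):
    """Return (status, detail) for a path that reached the 404 handler."""
    match = PATTERN.match(path)
    if match is None:
        return 404, "not found"
    view, object_id = match.groups()
    if view is None:
        # Generic URL: redirect once to a format-specific URL.
        wanted = [t.split(";")[0].strip().lower() for t in accept.split(",")]
        segment = next((SEGMENTS[t] for t in wanted if t in SEGMENTS), "page")
        return 303, f"{BASE}{segment}/{object_id}"
    if view == "page":
        # Page request: hand over permanently to the existing web page.
        return 301, web_page_url(object_id)
    # RDF or XTM request: stop redirecting and produce the document itself.
    return 200, f"XSLT transform of the XML record for {object_id} to {view}"

print(handle_404("/object/GRMDC.C104.2", "application/rdf+xml"))
print(handle_404("/object/rdf/GRMDC.C104.2"))
```

Keeping the format in the path is what lets the handler tell a first request from one that has already been redirected, so the redirects cannot loop.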
21. What has been learnt?
● The Linked Data paradigm encourages simple RDF triples: no “blank nodes”
● For an object, this becomes a simple metadata set, very analogous to the PNDS DCAP format
● The properties involved need to encapsulate the whole relation between object and data, e.g.
<p:title>Ulswater from Pooley Bridge</p:title>
<p:technique>drawn</p:technique>
<p:maker>Farington, Joseph (1747-1821)</p:maker>
<p:technique>engraved</p:technique>
<p:maker>Middiman, Samuel (1750-1831)</p:maker>
22. Properties: which framework?
● I have used dbPedia properties (for Linked Data compatibility):
http://dbpedia.org/property/title
http://dbpedia.org/property/maker
● A viable alternative would be PNDS DCAP:
http://purl.org/dc/elements/1.1/title
http://purl.org/dc/elements/1.1/creator
● One framework which doesn't fit is the CIDOC CRM:
E18 Physical Thing – E12 Production – E39 Actor = “creator”
23. The problem of URIs
● Good Linked Data requires URIs everywhere
● Most of my museum RDF resolves to strings
● One exception is Geonames lookup:
Ullswater
becomes
http://www.geonames.org/2635191/
● In the absence of a central “people” registry, I should be minting URIs for people myself
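The difference between a value that resolves to a URI and one that stays a string can be shown with a small lookup sketch. The table and the function name rdf_object are hypothetical. The single entry is the example from the slide, and a real system would query Geonames rather than hard-code values.

```python
# Place-name lookup table. The only entry is the example given on the slide.
GEONAMES = {"Ullswater": "http://www.geonames.org/2635191/"}

def rdf_object(value, lookup=GEONAMES):
    """Return ("uri", ...) when the value is known, else ("literal", ...)."""
    uri = lookup.get(value)
    return ("uri", uri) if uri else ("literal", value)

print(rdf_object("Ullswater"))
print(rdf_object("Farington, Joseph (1747-1821)"))
```

The maker's name falls back to a literal because no equivalent registry of people is available to look it up in.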
26. Implementation details
● HTML needed a “back link” to RDF to keep OpenLink Explorer happy:
<link rel="alternate" type="application/rdf+xml"
href="http://collections.wordsworth.org.uk/object/data/GRMDC.C104.2" title="RDF" />
● Result is totally unfindable: need a search or harvesting mechanism:
– OAI support (possible)
– SPARQL end-point (harder)
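To show what the “back link” in the first bullet gives a client, here is a minimal sketch of how a program could find the RDF address from the HTML page, using Python's standard html.parser. The class name AlternateLinkFinder and the surrounding page markup are assumptions; the slides do not say how any particular browser performs this discovery.

```python
from html.parser import HTMLParser

class AlternateLinkFinder(HTMLParser):
    """Collect href values of <link rel="alternate"> for one media type."""

    def __init__(self, media_type="application/rdf+xml"):
        super().__init__()
        self.media_type = media_type
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        rels = (attributes.get("rel") or "").lower().split()
        if ("alternate" in rels
                and attributes.get("type") == self.media_type
                and attributes.get("href")):
            self.hrefs.append(attributes["href"])

# Assumed page skeleton around the link element quoted on the slide.
PAGE = """<html><head>
<link rel="alternate" type="application/rdf+xml"
 href="http://collections.wordsworth.org.uk/object/data/GRMDC.C104.2" title="RDF" />
</head><body></body></html>"""

finder = AlternateLinkFinder()
finder.feed(PAGE)
print(finder.hrefs)
```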
27. Conclusions
● Implementing an RDF Linked Data front-end to a museum database is feasible if:
– You can generate multiple outputs from your database (XML is sufficient)
– You can implement a suitable URL rewriter or 404 handler
● It's easy (and a good idea) to mint and publish URIs for your collection objects
● It's less clear where all the other URIs we'll need will come from
28. LD: foothills of the Semantic Web
● Linked Data is a very modest start
● It's not obvious how this will scale
● Full Semantic Web will involve machine-driven processes
● Judging by where we are today, that will be a while coming ...