This paper surveys the landscape of linked open data projects in cultural heritage, examining the work of groups from around the world. Traditionally, linked open data has been ranked using the five-star method proposed by Tim Berners-Lee. We found this ranking lacking when evaluating how cultural heritage groups not merely develop linked open datasets but also find ways to use linked data to augment user experience. Building on the five-star method, we developed a six-stage life cycle describing both dataset development and dataset usage. We use this framework to describe and evaluate fifteen linked open data projects in the realm of cultural heritage.
Linked Open Data for Cultural Heritage
1. Linked Open Data Projects for Cultural Heritage:
Evolution of an Information Technology
Julia Marsden – Carolyn Li-Madeo
Jeff Edelstein – Noreen Whysel
Lola Galla – Alison Rhonemus
Cultural Heritage: Description & Access
Pratt SILS LIS 670 – Spring 2013
Prof. Cristina Pattuelli
2. WHAT IS LINKED OPEN DATA?
Linked Data provides a mechanism
for representing databases (RDF)
and a mechanism for querying
those databases (SPARQL)*
Linked Open Data uses W3C
Semantic Web standards to create
relationships between previously
isolated data silos
Behind almost every website is a
database and although these sites
are linkable the information in their
databases is left unconnected
*From the New York Times’ OPEN blog
3. REVIEW OF TERMINOLOGIES
RDF Triple
Subject
Predicate
Object
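As a minimal sketch of the triple structure named above, the following Python represents one statement as a (subject, predicate, object) tuple and writes it out as a single N-Triples line. The `example.org` URIs, the `Triple` type, and the `to_ntriples` helper are assumptions made for illustration; they do not come from any of the surveyed projects.

```python
from collections import namedtuple

# A triple is an ordered (subject, predicate, object) statement.
Triple = namedtuple("Triple", ["subject", "predicate", "object"])

# Hypothetical namespace, used only for illustration.
EX = "http://example.org/"


def to_ntriples(triple):
    """Serialize a triple whose three terms are all URIs as one N-Triples line."""
    return "<{}> <{}> <{}> .".format(triple.subject, triple.predicate, triple.object)


guppy_is_fish = Triple(EX + "guppy", EX + "is_a_kind_of", EX + "fish")
line = to_ntriples(guppy_is_fish)
print(line)
```

Each term is identified by a URI, which is what lets a statement in one dataset point at a resource described in another.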
URI
API
An Application Programming Interface
Allows software programs to interact
with one another
URI
Uniform Resource Identifier (covers both URLs and URNs)
SPARQL Query
• SPARQL Protocol and RDF Query Language
• Query language for RDF / Databases
• Allows users to write unambiguous queries
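To give a rough feel for what a SPARQL query does, the sketch below matches a pattern against a small list of triples in plain Python. It is not SPARQL and not a SPARQL engine: the `match` function, the sample data, and the use of `None` as a stand-in for a query variable are all assumptions made for illustration.

```python
# Hypothetical data: plain strings stand in for URIs to keep the sketch short.
TRIPLES = [
    ("guppy", "is_a_kind_of", "fish"),
    ("trout", "is_a_kind_of", "fish"),
    ("pony", "is_a_kind_of", "horse"),
]


def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples that agree with every term that is not None.

    None plays the role of a query variable: it matches any value
    in that position.
    """
    results = []
    for s, p, o in triples:
        if subject is not None and s != subject:
            continue
        if predicate is not None and p != predicate:
            continue
        if obj is not None and o != obj:
            continue
        results.append((s, p, o))
    return results


# Roughly analogous to: SELECT ?s WHERE { ?s is_a_kind_of fish }
fish = [s for s, _, _ in match(TRIPLES, predicate="is_a_kind_of", obj="fish")]
print(fish)
```

A real SPARQL query expresses the same kind of pattern declaratively and can join several patterns together, which this sketch does not attempt.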
4. METHODOLOGY
•Affiliation / Mission / Intended Audience
•Knowledge Organization / Data Models & Vocabulary
•Technology Platform
•Usability/Interface Design
•Discovery (search & navigation)
•Data Shareability (i.e., availability of an API)
•Sustainability (i.e., digital preservation, documentation or available code)
•Project Leaders
•Funding Sources
•Level of Collaboration
•Analysis
•Star-Rating (i.e., Tim Berners-Lee's coffee cup)
5. Developing Datasets
Release one or more datasets in linked
open format, expressed as RDF triples,
that others may use.
Projects: Library of Congress; Pan-
Canadian Documentary Heritage
Network
Linking Data
Cultural heritage institutions link their datasets
to others (e.g., DBpedia, VIAF, GeoNames) to
enhance discovery and reuse of
their collections.
Projects: Hungarian National Library;
Civil War Data 150; Linking Lives;
Bibliothèque nationale de France
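The linking step described above can be sketched as follows: local records are compared against an external authority file, and each match is recorded as an `owl:sameAs` triple. The record URIs, the authority entries, and the `normalize` and `link` helpers are hypothetical; only the `owl:sameAs` property URI is a real identifier. Matching on a normalized name string is a simplification chosen for brevity, not a method attributed to any project listed here.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"  # real OWL property URI

# Hypothetical local records and a hypothetical external authority file.
local_people = {
    "http://example.org/archive/person/1": "Doe,  Jane",
    "http://example.org/archive/person/2": "Roe, Richard",
}
external_authority = {
    "doe, jane": "http://authority.example.net/id/1001",
}


def normalize(name):
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(name.lower().split())


def link(local, authority):
    """Return one sameAs triple per local record found in the authority file."""
    links = []
    for uri, name in sorted(local.items()):
        target = authority.get(normalize(name))
        if target is not None:
            links.append((uri, OWL_SAME_AS, target))
    return links


links = link(local_people, external_authority)
for triple in links:
    print(triple)
```

Name matching alone cannot tell apart two different people who share a name, so in practice such links usually need additional evidence or manual review.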
Documenting Processes for
Reuse
Explain linked open data and ways
that cultural heritage professionals
can use datasets.
Projects: New York Times;
Deutsche Nationalbibliothek
Developing User Interfaces
Institutional or collaborative projects use
the datasets to develop applications, including
interfaces, visualizations, and augmented reality.
Projects: Agora; Pan-Canadian Documentary Heritage
Network; Amsterdam Mobile City App; Linked Jazz
Promoting Reuse
Institutions go beyond the creation
of their own test projects, encouraging
users to develop innovative applications.
Projects: Open Cultuur Data, EUScreen
Expanding the Definition
of Cultural Heritage
Efforts from outside the cultural
heritage framework, such as
government agencies and
international aid organizations,
can serve to strengthen societies
and their cultural institutions.
Project: Open Data for Resilience
Initiative
LINKED DATA LIFE CYCLES
7. Pan-Canadian Documentary Heritage Network
• Formed in 2010; highly collaborative effort across a broad spectrum
of LAMs.
• Pilot project results published July 2012:
• RDF metadata
• Detailed project report
• Demonstration video, “Out of the Trenches”
• Project content submitted in various formats:
• War songs (MARC records; BAnQ)
• War posters (spreadsheets; McGill)
• Newspaper articles, postcards, and wartime records (MODS XML; University of Alberta)
• Portrait archives of CEF soldiers; WWI documents (spreadsheets; University of Calgary)
• Archival material from Saskatchewan War Experience Project (DC RDF; University of
Saskatchewan)
• Use of external LOD datasets:
• GeoNames, VIAF, LCSH, TGM, Rameau, LACSH
• Metadata then mapped to ontologies (e.g., events, places,
persons)
• Principal findings:
• Good approach for resource integration and discovery
• Considered “reuse” in terms of using element sets in multiple
contexts (e.g., “role” as predicate or as object) and repurposing vocabularies
developing datasets – linking data – documenting processes – developing user interfaces – promoting reuse – expanding definitions
8. LIBRARY OF CONGRESS
9. LIBRARY OF CONGRESS
Dereferenceable URI
Name Variants
Related Terms
Promotes existing
Library of Congress
resources to Linked
Open Data web
resources, uncovers
and connects
related names and
terms
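One common way a dereferenceable URI is used is HTTP content negotiation: the client asks for an RDF serialization through the `Accept` header. The sketch below only builds such a request and does not send it. The concept URI is a hypothetical placeholder, and whether a given service honors this header is an assumption that the slides do not confirm.

```python
import urllib.request

# Hypothetical URI; substitute a real identifier from the service you use.
CONCEPT_URI = "http://example.org/authorities/subjects/example-id"


def build_rdf_request(uri, media_type="application/rdf+xml"):
    """Build (but do not send) an HTTP request asking for an RDF serialization."""
    return urllib.request.Request(uri, headers={"Accept": media_type})


request = build_rdf_request(CONCEPT_URI)
print(request.full_url, request.get_header("Accept"))
# To actually fetch the data: urllib.request.urlopen(request).read()
```

Keeping request construction separate from the network call makes the example runnable offline and easy to adapt to other media types.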
10. LIBRARY OF CONGRESS
Multiple formats are
available for wider use
LC Classification Numbers
are related to each entry
Connects with and
acknowledges other
schemes
15. CIVIL WAR DATA 150
Project was designed to
encourage the contribution of a
wide variety of data sources:
from institutions to individuals
Partnership between The
Archives of Michigan, The
Internet Archive and Freebase
Celebrating the
sesquicentennial of the
American Civil War
16. CIVIL WAR DATA 150
Project Goals:
Create web apps to
enable users to add to or
modify shared metadata
with strong identifiers
Engage the public in the process of
interacting with and adding value to the data
Identify sources and map
metadata into Freebase
17. LOCAH and Linking Lives
• Projects of Archives Hub UK (http://archiveshub.ac.uk), which represents more than 220
institutions
• LOCAH (Linked Open Copac & Archives Hub; 2010-2011):
• Published data from Archives Hub finding aids and Copac, a union catalog of more than 70
major UK libraries
• Created LOD resources:
1. SPARQL endpoint
2. Query box for trying out SPARQL queries
3. RDF dump of the dataset
4. Archives Hub EAD to RDF XSLT stylesheet
• Linking Lives (2011-2012) expanded on LOCAH
• Test project focusing on biography
• Brought in more external datasets (DBpedia, VIAF,
Freebase, OpenLibrary, BBC Programmes, Linked Open
British National Bibliography)
• Developed interface model (wireframe)
• Principal findings:
• Even when expressed in triples, data may lack uniformity, requiring time-consuming clean-up
• Difficulty of firmly establishing identity when there are variant forms of names or identifying
roles (e.g., “author” vs. “writer”) and when different people have the same name
19. DEUTSCHE NATIONALBIBLIOTHEK
• Linked Data Service
• Library scientist led
• Authority names and
bibliographic data
• Downloadable dataset
• SRU and OAI-PMH interfaces
• Extensive documentation
20. THE NEW YORK TIMES
21. THE NEW YORK TIMES
The OPEN Blog
Documents and contextualizes the APIs
Platform for sharing Open Source Code
Forum for troubleshooting and ideas
Downloadable SKOS Files
The entire dataset is downloadable
Developers can also choose by topic
Users are invited to utilize the datasets
and APIs through downloads,
documentation, support and explanation
of LOD terminology, code and uses
22. THE NEW YORK TIMES
Available APIs
Developer Network
API Request Tool allows developers to
search through the expansive list of
APIs and set parameters for their search
using a widget. The tool then formats
the URL and requests results
24. AUSTRALIAN WAR MEMORIAL
• Proof of concept
• Developer led
• Embedded RDF tags
• Page based API
• No documentation or
downloadable dataset
25. THE AMSTERDAM MUSEUM
• Mobile app parses data
from Amsterdam Museum
and linked ontologies
• Proposal for visual
interface that enables
user to become tour guide
• Current problem: search
and download speed
26. Out of the Trenches Demonstration Video
Subjects can be explored across a range of dimensions
Source: http://www.canadiana.ca/sites/pub.canadiana.ca/files/LOD-Demo-ENG_0.mp4
29. OPEN CULTUUR DATA INITIATIVE
• Offered workshops on how cultural heritage orgs could open their
data
• Hosted hackathons to encourage developers to turn datasets into
apps
• Three award-winners:
• VISTORY (using LOD Open Images dataset)
• Rijksmonumenten.info
• Connected Collection
30. OPEN CULTUUR DATA INITIATIVE
Screenshot from
http://www.glimworm.com/vistory.shtml
31. EUSCREEN
• Linked Data Pilot
• International collaboration
• Open, international standards
• Downloadable datasets
• Fully documented
• Showcase of projects in blog
• Active in promoting reuse
34. CONCLUSIONS
• (Most) LOD projects:
• Proof of concept
• No access to a dataset
• Not highly documented
• Highly curated
• Experimental
• Promising
• The number of LOD datasets continues to increase
• Actual use by cultural heritage institutions appears to remain limited
• Trust remains an obstacle
• Compare: “A guppy is_a_Kind_of fish” (TRUE)
“A pony is_a_Kind_of fish” (UNTRUE)
Computers see these as equally valid.
• Verifying or identifying source of a statement may become a best practice
• Information added to triples?
“A guppy is_a_Kind_of fish [source] DBpedia”
• Published datasets hold great potential for making the content of an archive's collections
known
• Researcher studying Person A finds that a collection of Person X's letters includes letters
to or from Person A
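The trust point in the conclusions, attaching a source to each statement, can be sketched by extending a triple with a fourth field. The `Statement` type, the sample statements, and the `TRUSTED_SOURCES` set are assumptions made for illustration; the guppy and pony examples are taken from the slide above.

```python
from collections import namedtuple

# A triple plus a fourth field recording where the statement came from.
Statement = namedtuple("Statement", ["subject", "predicate", "object", "source"])

statements = [
    Statement("guppy", "is_a_kind_of", "fish", "DBpedia"),
    Statement("pony", "is_a_kind_of", "fish", None),  # no recorded source
]

# Assumption: which sources a consumer chooses to trust.
TRUSTED_SOURCES = {"DBpedia"}


def trusted(stmts, sources):
    """Keep only the statements whose recorded source is in the trusted set."""
    return [s for s in stmts if s.source in sources]


kept = trusted(statements, TRUSTED_SOURCES)
print([s.subject for s in kept])
```

Filtering by source does not establish that a statement is true; it only lets a consumer decide whose statements to accept.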