The document describes how the Amsterdam Museum converted its collection metadata into Linked Open Data using a methodology and set of tools. The metadata was extracted from XML sources, transformed into RDF using XMLRDF, restructured interactively using rules in ClioPatria, mapped to the Europeana Data Model, and aligned with external vocabularies using Amalgame. The resulting LOD graph of over 5 million triples representing the museum's 73,000 objects is published and queried using a SPARQL endpoint.
ESWC 2012 presentation: Supporting Linked Data Production for Cultural Heritage institutes: The Amsterdam Museum Case Study
1. Supporting Linked Data Production
for Cultural Heritage institutes:
The Amsterdam Museum Case Study
Victor de Boer, Jan Wielemaker, Judith van
Gent, Michiel Hildebrand, Antoine Isaac,
Jacco van Ossenbruggen, Guus Schreiber
EuropeanaConnect
3. Europeana
“Europeana enables people to explore the digital
resources of Europe's museums, libraries, archives and
audio-visual collections.”
www.europeana.eu
From portal… …to data aggregator.
4. data.europeana.eu
2.4 Million objects exposed as
Linked Data.
8 aggregators, 200
institutions, 15 countries
Europeana Semantic Elements
converted to RDF Europeana
Data Model (EDM)
6. Linked data-ify
Convert to
Linked data
Mapped to EDM
Aggregate and convert
7. Methodology and tool stack
• Focus on transparency and interactivity
– Reproducibility
– Both in conversion and alignment
• Maintain detail and complexity of original data
• Interoperability through schema mapping
8. Methods and tools
Methods:
1. XML ingestion (OAI)
2. Direct transformation to ‘crude’ RDF
3. Interactive RDF restructuring
4. Create a metadata mapping schema
5. Align vocabularies with external sources
6. Publish as Linked Data
Tools: ClioPatria (cliopatria.swi-prolog.org), XMLRDF (RDF conversion and restructuring), Amalgame (vocabulary alignment)
9. Case study:
Amsterdam Museum
• Formerly Amsterdam Historic
Museum
– “The rich collection of works of art,
objects and archaeological finds brings
to life the fortunes of Amsterdammers
of days gone by and today.”
• In March 2010 published their whole
collection online
– 73,000 objects
– CC license
11. Ingested AM metadata
• Adlib database XML API
• Object metadata
– 73,000 objects, 256MB
– Nested XML
• Concept Thesaurus
– 27,000, 9MB
– Different types (geo, motif, event)
• Person Authority File
– 67,000 persons, 10MB
– Consolidated from object metadata fields
– Creators, annotators, reproduction creators, institutions,

Example object record:
<record priref="10541">
  <acquisition.date>1997</acquisition.date>
  <dimension>
    <dimension.type>hoogte</dimension.type>
    <dimension.unit>cm</dimension.unit>
    <dimension.value>6</dimension.value>
  </dimension>
  …
</record>

Example concept record:
<record priref="28024">
  <term>Kalverstraat 124</term>
  <broader_term>Kalverstraat</broader_term>
  <term.type>GEOKEYW </term.type>
</record>

Example person record:
<record priref="6">
  <biography>boekverkoper en uitgever van cartografie</biography>
  <birth.date.start>1659</birth.date.start>
  <death.date.start>1733</death.date.start>
  <name>Aa, Pieter van der</name>
  <nationality>Nederlands</nationality>
  <use>Aa, Pieter van der (I)</use>
</record>
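The direct transformation of such nested records into ‘crude’ RDF can be illustrated with a minimal Python sketch. It is not the XMLRDF tool itself: the `AM` namespace, the `record-<priref>` URI pattern and the blank-node naming are assumptions made for the illustration. Leaf elements become literal-valued properties, and nested elements become blank nodes so that the XML nesting survives in the graph.

```python
import xml.etree.ElementTree as ET
from itertools import count

# Hypothetical namespace for the illustration; not the museum's real URI scheme.
AM = "http://example.org/am/"

SAMPLE = """
<record priref="10541">
  <acquisition.date>1997</acquisition.date>
  <dimension>
    <dimension.type>hoogte</dimension.type>
    <dimension.unit>cm</dimension.unit>
    <dimension.value>6</dimension.value>
  </dimension>
</record>
"""


def crude_rdf(record_xml):
    """Turn one record into a list of (subject, predicate, object) triples."""
    record = ET.fromstring(record_xml)
    blank_ids = count(1)
    triples = []

    def walk(element, subject):
        for child in element:
            predicate = AM + child.tag
            if len(child):
                # The element has sub-elements: keep the nesting as a blank node.
                node = "_:b%d" % next(blank_ids)
                triples.append((subject, predicate, node))
                walk(child, node)
            else:
                # Leaf element: its text becomes a plain literal.
                text = (child.text or "").strip()
                if text:
                    triples.append((subject, predicate, text))

    walk(record, AM + "record-" + record.get("priref"))
    return triples


triples = crude_rdf(SAMPLE)
for triple in triples:
    print(triple)
```

Nothing is interpreted at this stage: element names are copied into property names unchanged, which keeps the syntactic step separate from later semantic restructuring.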
22. Europeana Data Model (EDM)
• Dublin Core for metadata representation
– creator, date, title etc.
• SKOS for vocabularies
– prefLabel, broader, etc.
• RDA Group 2 elements for persons
– dateOfBirth, name etc.
• OAI-ORE to allow for aggregations etc.
• Some EDM-specific properties
– edm:wasPresentAt, …
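How a mapping schema gives interoperability without discarding the institute-specific properties can be sketched as follows. This is a minimal illustration, assuming the mapping is expressed with rdfs:subPropertyOf; the `am:` property names and the sample data are hypothetical, and the inference is a hand-written closure rather than a real RDFS reasoner.

```python
RDFS_SUBPROP = "rdfs:subPropertyOf"

# Hypothetical mapping schema: the am: property names are made up for the sketch.
MAPPING = [
    ("am:maker", RDFS_SUBPROP, "dcterms:creator"),
    ("am:title", RDFS_SUBPROP, "dcterms:title"),
    ("am:acquisitionDate", RDFS_SUBPROP, "dcterms:date"),
]

# Hypothetical converted data that still uses the museum's own properties.
DATA = [
    ("am:object-1", "am:maker", "am:person-6"),
    ("am:object-1", "am:title", "Example title"),
]


def superproperties(prop, schema):
    """Return prop plus every property reachable via rdfs:subPropertyOf."""
    found, todo = {prop}, [prop]
    while todo:
        current = todo.pop()
        for s, p, o in schema:
            if s == current and p == RDFS_SUBPROP and o not in found:
                found.add(o)
                todo.append(o)
    return found


def query(prop, data, schema):
    """Match triples whose predicate is prop or a subproperty of it."""
    return [(s, o) for s, p, o in data if prop in superproperties(p, schema)]


creators = query("dcterms:creator", DATA, MAPPING)
print(creators)
```

The original data is left untouched: a client asking for `dcterms:creator` gets an answer only because the subproperty link is followed at query time, which is why some form of RDFS reasoning is needed on the consuming side.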
29. “Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/”
31. Wrapping up
Methodology
• Stay as close as possible to original (XML) metadata
• Separate syntactic transformation from semantic interpretation
• Interactive workflow, simple steps
• Use RDF Schema to map to interoperability layer
• Keep provenance, reproducibility
Tools
• XMLRDF: realised clean workflow for RDF production
• Amalgame: interactive and transparent vocabulary alignment
• ClioPatria semantic server: statistics at any moment + full expressivity of Some Prolog
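One simple, inspectable alignment step is exact label matching between the museum thesaurus and an external vocabulary. The sketch below is a generic Python illustration of that idea, not Amalgame's implementation; all URIs are hypothetical. Ambiguous labels are set aside for manual review instead of being mapped automatically, in keeping with the emphasis on transparency.

```python
def normalize(label):
    """Lowercase and collapse whitespace so trivially different labels match."""
    return " ".join(label.lower().split())


def exact_label_matches(source, target):
    """Propose skos:exactMatch candidates between two vocabularies.

    source and target map concept URIs to preferred labels. A mapping is
    proposed only when a label matches exactly one target concept; labels
    with several matches are returned separately for manual review.
    """
    index = {}
    for uri, label in target.items():
        index.setdefault(normalize(label), []).append(uri)

    mappings, ambiguous = [], []
    for uri, label in source.items():
        hits = index.get(normalize(label), [])
        if len(hits) == 1:
            mappings.append((uri, "skos:exactMatch", hits[0]))
        elif len(hits) > 1:
            ambiguous.append((uri, hits))
    return mappings, ambiguous


# All URIs below are hypothetical.
am_thesaurus = {
    "am:t-1": "Kalverstraat",
    "am:t-2": "Kalverstraat 124",
    "am:t-3": "Amsterdam",
}
external = {
    "ext:100": "kalverstraat",
    "ext:200": "Amsterdam",
    "ext:201": "amsterdam",
}

mappings, ambiguous = exact_label_matches(am_thesaurus, external)
print(mappings)
print(ambiguous)
```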
32. Issues
• Validate with real collection managers
– Making good rules is sometimes hard
– Graphical tools can help
• Integrate in normal collection workflow (tools)
– LD as another view on the data
– Live updates
• RDFS reasoning needed to have interoperability
33. http://semanticweb.cs.vu.nl/lod/am/
v.de.boer@vu.nl
amsterdammuseum.nl
?
ClioPatria: the SWI-Prolog RDF toolkit
(includes XMLRDF and Amalgame packages)
http://cliopatria.swi-prolog.org
Editor's notes
Rather than having Linked Data ingestion being done automatically by large aggregators, we present a methodology that is both transparent and interactive. The methodology covers data ingestion, conversion, alignment and Linked Data publication. It is highly modular with clearly recognizable data transformation steps, which can be evaluated and adapted based on these evaluations. This design allows the institute’s collection managers, who are most knowledgeable about their own data, to perform or oversee the process themselves. We describe a stack of tools that allow collection managers to produce a Linked Data version of their metadata that maintains the richness of the original data including the institute-specific metadata classes and properties. By providing a mapping to a common schema, interoperability is achieved.
Flickr: givingnot@rocketmail.com, aoppelaar, hhesterr, Grufnik, moria, Banjaxx, Paradasos
2.4 million texts, images, videos and sounds gathered by Europeana. These objects come from data providers who have reacted early and positively to Europeana's initiative of promoting more open data and new data exchange agreements. These collections come from 8 direct Europeana providers encompassing over 200 cultural institutions from 15 countries.
- Not completely straightforward XML (nestedness)
XMLRDF tool: clean up, link to resources etc.
58 XMLRDF rewrite rules
23 rewriting rules
2 rules
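What a single clean-up rule does to the crude RDF can be shown with a small Python sketch. It does not use XMLRDF's own rule syntax; the `am:term.type` property, the `am:GeoConcept` class and the concept URI are hypothetical. The rule turns a type code stored as a literal into an `rdf:type` statement and leaves every other triple unchanged.

```python
# Hypothetical lookup from thesaurus type codes to classes.
TYPE_CLASSES = {"GEOKEYW": "am:GeoConcept"}


def rewrite_term_type(triples):
    """Replace (s, am:term.type, code) by (s, rdf:type, class) when the code is known."""
    out = []
    for s, p, o in triples:
        cls = TYPE_CLASSES.get(o.strip()) if p == "am:term.type" else None
        if cls:
            out.append((s, "rdf:type", cls))
        else:
            out.append((s, p, o))
    return out


# Crude triples as they might come out of the direct XML transformation;
# the trailing space in the type code mirrors the source record.
crude = [
    ("am:concept-28024", "am:term", "Kalverstraat 124"),
    ("am:concept-28024", "am:term.type", "GEOKEYW "),
]

restructured = rewrite_term_type(crude)
for triple in restructured:
    print(triple)
```

Keeping each rule this small makes its effect easy to inspect and rerun, which supports the reproducibility goal of the methodology.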