Valentine Charles (Europeana) “Linking cultural heritage with KOS: the Europeana example”
Presentation at the KnoweScape workshop "Evolution and variation of classification systems" March 4-5, 2015 Amsterdam
1. Linking cultural heritage with KOS: the Europeana example
Valentine Charles
Evolution and variation of classification systems – KnoweScape, Amsterdam, 05.03.2015
2. Context
→ Aggregates metadata from the cultural heritage sector in Europe
• Libraries, museums, archives and audio-visual archives
• Metadata in 33 languages
→ Provides a portal for users to access data and objects
• http://www.europeana.eu/ in 31 languages
• Metadata under Creative Commons Zero - public domain
• Previews and links to source
→ Data distributed via
• API http://labs.europeana.eu/api/
• Linked Data (currently being updated) http://data.europeana.eu/
4. Create a new data framework for richer metadata
→ Europeana Data Model (EDM)
• Re-uses several existing Semantic Web-based models: Dublin Core, OAI-ORE, SKOS, CIDOC-CRM…
• More granular metadata
• Links, e.g. between objects and context entities (persons, places)
• Multilingual & semantic linked data for contextual resources (e.g. Concepts)
→ EDM gives support for contextual resources (semantic layer)
5. Rely on KOS to solve a problem of data integration
→ Create a “semantic layer” on top of connected cultural heritage objects
• Include multilingual “value vocabularies”
• From Europeana’s providers or from third-party data sources
6. Contextual entities
Representing (real-world) entities related to a provided object as fully fledged resources, not just strings
• edm:Agent: foaf:name, skos:altLabel, rdaGr2:biographicalInformation, rdaGr2:dateOfBirth
• skos:Concept: skos:prefLabel, skos:altLabel, skos:broader, skos:related, skos:definition, …
• edm:TimeSpan: skos:prefLabel, dcterms:isPartOf, edm:begin, edm:end, …
• edm:Place: wgs84_pos:lat, wgs84_pos:long, skos:prefLabel, skos:note, dcterms:isPartOf, …
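The difference between a plain string and a fully fledged resource can be sketched in a few lines of Python. This is only an illustration: the `Concept` class, its field names and the example.org URIs are hypothetical and are not part of EDM or of any Europeana software.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Concept:
    """A skos:Concept-like contextual resource (illustrative, not the EDM schema)."""
    uri: str
    pref_labels: Dict[str, str] = field(default_factory=dict)  # language -> label
    broader: List[str] = field(default_factory=list)           # URIs of broader concepts

    def label(self, lang: str, fallback: str = "en") -> Optional[str]:
        """Return the label in the requested language, else the fallback language."""
        return self.pref_labels.get(lang) or self.pref_labels.get(fallback)


# Hypothetical identifiers, for illustration only.
painting = Concept(
    uri="http://example.org/concept/painting",
    pref_labels={"en": "painting", "fr": "peinture"},
    broader=["http://example.org/concept/visual-work"],
)

# The same object description, once with a string and once with a link.
as_string = {"dc:type": "painting"}       # opaque text, one language only
as_resource = {"dc:type": painting.uri}   # points to a resource with labels and relations

print(painting.label("fr"))
print(painting.label("nl"))  # no Dutch label, falls back to English
```

With the string, a French-speaking user has nothing to search on; with the resource, the labels and broader links travel with the identifier.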
7. Encourage data providers to contribute their own vocabularies
→ Benefit from data links made at data providers’ level
→ Ingestion of vocabularies is possible if the vocabularies use the data structures EDM expects
• For instance SKOS for concepts
→ For other vocabularies, Europeana does custom mappings
9. An example: the integration of AAT URIs in EDM
[Diagram] edm:ProvidedCHO “Hourglass” (urn:imss:instrument:401058)
• dc:type → http://vocab.getty.edu/aat/300206197
• skos:prefLabel of that concept: hourglasses@en, uurglazen@nl, reloj de las horas@es
• skos:broader → skos:Concept http://vocab.getty.edu/aat/300198626
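The diagram above can be written down as a handful of triples. The sketch below uses plain Python tuples rather than an RDF library, and it assumes one particular reading of the diagram: the object points to concept 300206197 via dc:type, the three labels belong to that concept, and skos:broader leads to 300198626. The helper functions are hypothetical.

```python
AAT = "http://vocab.getty.edu/aat/"
CHO = "urn:imss:instrument:401058"

# (subject, predicate, object); labels are (text, language) pairs.
# Edge directions are an assumption based on the slide's diagram.
triples = [
    (CHO, "rdf:type", "edm:ProvidedCHO"),
    (CHO, "dc:type", AAT + "300206197"),
    (AAT + "300206197", "skos:prefLabel", ("hourglasses", "en")),
    (AAT + "300206197", "skos:prefLabel", ("uurglazen", "nl")),
    (AAT + "300206197", "skos:prefLabel", ("reloj de las horas", "es")),
    (AAT + "300206197", "skos:broader", AAT + "300198626"),
]


def objects(subject, predicate):
    """Return all objects of triples with the given subject and predicate."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def labels_for_object(cho):
    """Collect language -> label for every concept the object links to via dc:type."""
    result = {}
    for concept in objects(cho, "dc:type"):
        for text, lang in objects(concept, "skos:prefLabel"):
            result[lang] = text
    return result


labels = labels_for_object(CHO)
print(labels)
```

Because the object carries a URI instead of the string "Hourglass", every label attached to the concept becomes available for display and search.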
11. Demo with AAT and PartagePlus vocabularies
→ http://www.europeana.eu/portal/search.html?query=sabliers&rows=24&qf=PROVIDER%3A%22Museo+Galileo+-+Istituto+e+Museo+di+Storia+della+Scienza%22&qt=false
→ http://www.europeana.eu/portal/search.html?query=Brooch&rows=24&qf=PROVIDER%3A%22Partage+Plus%22&qt=false
13. Challenge #1
→ Europeana needs to regularly check that vocabularies have not changed at source:
• Changes in concepts’ identifiers
• Changes in the description of concepts (which would require a new mapping)
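One way to picture such a check is a comparison of two snapshots of a vocabulary: the copy held locally and the current state at source. The sketch below is a simplification under stated assumptions: each snapshot is a dictionary from concept identifier to description, and the `ex:` identifiers and the function name are hypothetical. It does not describe how Europeana actually monitors vocabularies.

```python
def diff_vocabulary(old, new):
    """Compare two snapshots mapping concept identifier -> description."""
    removed = sorted(set(old) - set(new))   # identifiers that disappeared
    added = sorted(set(new) - set(old))     # identifiers that are new
    changed = sorted(k for k in set(old) & set(new) if old[k] != new[k])
    return {"removed": removed, "added": added, "changed": changed}


# Hypothetical snapshots, for illustration only.
cached = {
    "ex:c1": "brooch",
    "ex:c2": "hourglass",
    "ex:c3": "print",
}
at_source = {
    "ex:c1": "brooch (jewellery)",  # description changed
    "ex:c3": "print",
    "ex:c4": "hourglass",           # same concept published under a new identifier
}

report = diff_vocabulary(cached, at_source)
print(report)
```

A removed identifier paired with an added one may signal an identifier change; a changed description is a hint that the mapping should be reviewed.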
14. Challenge #2
→ Some of the vocabularies supported by Europeana have been developed by projects
• Issue of sustainability: who maintains the vocabulary when the project ends? What happens to the data?
15. Europeana also manages its own vocabulary: WWI example
→ Europeana developed a series of domain-specific “sub-sites”
→ Europeana 1914-1918 (http://www.europeana1914-1918.eu/) developed its own vocabulary based on a subset of LCSH
• Terms translated into 10 languages and linked to id.loc.gov
• Published in SKOS via the OpenSkos vocabulary service
18. Challenge #3
→ Creation of caches of existing LOD vocabularies
• Europeana needs to keep track of the updates at the vocabulary provider side.
→ The enrichment done on the Europeana side lives separately from the source vocabulary.
19. Multilingual Access to Subjects (MACS)
→ The MACS project has produced manual and semi-automatic alignments between:
• Library of Congress Subject Headings (LCSH)
• RAMEAU
• Schlagwortnormdatei (SWD)
⇒ 120,000 links created
→ MACS is integrated in The European Library as links included in all bibliographic data.
20. An example of a MACS record before and after additions by The European Library:
- ARK identifiers
- LOD URIs
22. Automatic enrichment based on KOS
Goal: Contextualization which goes beyond the scope of a particular platform
[Diagram: Object linked to External Dataset and Vocabulary]
23. Automatic enrichment process in Europeana
Analysis
• Metadata fields in resource descriptions
• Selection of potential rules to match
Linking
• Matching the values of the metadata fields to values of the contextual resources
• Adding contextual links
Augmentation
• Selecting the values from the contextual resource
• Augmentation of the index with the labels picked from the vocabulary
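The three stages (analysis, linking, augmentation) can be sketched as three small functions. This is a toy illustration, not Europeana's enrichment framework: the vocabulary, the record, the function names and the exact, case-insensitive matching rule are all assumptions made for the example.

```python
# Hypothetical toy vocabulary: concept URI -> labels by language.
VOCABULARY = {
    "http://example.org/concept/hourglass": {"en": "hourglass", "fr": "sablier"},
}

# Fields that the (hypothetical) rules are allowed to enrich.
ENRICHABLE_FIELDS = ("dc:subject", "dc:type")


def analyse(record):
    """Analysis: pick the (field, value) pairs that a rule may try to match."""
    return [(f, v) for f in ENRICHABLE_FIELDS for v in record.get(f, [])]


def link(candidates):
    """Linking: match field values to vocabulary labels (exact, case-insensitive)."""
    links = []
    for field_name, value in candidates:
        for uri, labels in VOCABULARY.items():
            if value.casefold() in {text.casefold() for text in labels.values()}:
                links.append((field_name, uri))
    return links


def augment(record, links):
    """Augmentation: copy the labels of linked resources into the index entry."""
    entry = {"id": record["id"], "links": [uri for _, uri in links], "labels": set()}
    for _, uri in links:
        entry["labels"].update(VOCABULARY[uri].values())
    return entry


record = {"id": "obj-1", "dc:title": ["Hourglass"], "dc:type": ["Hourglass"]}
entry = augment(record, link(analyse(record)))
print(entry["links"])
print(sorted(entry["labels"]))
```

After augmentation the index entry also carries the French label, so a query in French can retrieve a record that was described only in English.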
24. Vocabularies selection requirements
In the context of Europeana a target vocabulary should be:
→ Technically available (through Linked Data or in dedicated repositories), properly documented, and openly accessible;
→ Well connected to other vocabularies, e.g. equivalent elements in other vocabularies are indicated;
• Key to avoid duplication and redundancy
→ Multilingual
25. Enrichment Types and Vocabularies
Enrichment type | Target vocabulary | Source metadata fields
Places | GeoNames | dcterms:spatial, dc:coverage
Concepts | GEMET, DBpedia | dc:subject, dc:type
Agents | DBpedia | dc:creator, dc:contributor
Time | Semium Time | dc:date, dc:coverage, dcterms:temporal, edm:year
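The table above lends itself to a small configuration structure. The sketch below encodes exactly the rows of the table as a Python dictionary; the structure itself and the `enrichments_for` helper are hypothetical and only show how a metadata field could be routed to one or more enrichment types.

```python
# Rows of the table above; the dictionary layout is an assumption for illustration.
ENRICHMENT_RULES = {
    "Places": {
        "targets": ["GeoNames"],
        "fields": ["dcterms:spatial", "dc:coverage"],
    },
    "Concepts": {
        "targets": ["GEMET", "DBpedia"],
        "fields": ["dc:subject", "dc:type"],
    },
    "Agents": {
        "targets": ["DBpedia"],
        "fields": ["dc:creator", "dc:contributor"],
    },
    "Time": {
        "targets": ["Semium Time"],
        "fields": ["dc:date", "dc:coverage", "dcterms:temporal", "edm:year"],
    },
}


def enrichments_for(field_name):
    """Return the enrichment types whose rules read the given metadata field."""
    return [
        enrichment_type
        for enrichment_type, rule in ENRICHMENT_RULES.items()
        if field_name in rule["fields"]
    ]


print(enrichments_for("dc:coverage"))
print(enrichments_for("dc:title"))
```

Note that dc:coverage appears in two rows, so a single value in that field is a candidate for both place and time enrichment.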
27. Challenge #4
→ A significant change in the target vocabulary implies
• An update of the retrieved RDF files and a new deployment of the enrichment framework (and/or)
• An update of the enrichment rules
28. Challenge #5
→ Europeana data providers might also perform enrichment on their side
→ Europeana currently has no mechanism to separate the (curated) links to contextual resources by data providers from (automatic) enrichments by providers.
29. Challenge #6
→ Automatic enrichment has flaws and problems
• For instance linking any print to the physical “pressure” concept because of its German “Druck” alternative label.
→ Incorrect enrichments lead to
• Devaluation of curated metadata
• Loss of trust from providers
• Irrelevant search results
• Bad user experiences
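The “Druck” case can be reproduced with a toy matcher. In the sketch below the concept, its labels and the `matches` function are hypothetical. Restricting the match to preferred labels is shown only as one possible mitigation that trades recall for precision; it is not presented as what Europeana does.

```python
# Hypothetical concept, loosely modelled on the "pressure" example above.
PRESSURE = {
    "uri": "http://example.org/concept/pressure",
    "prefLabel": {"en": "pressure"},
    "altLabel": {"de": ["Druck"]},
}


def matches(value, concept, use_alt_labels=True):
    """Exact, case-insensitive label match, optionally including alternative labels."""
    candidates = list(concept["prefLabel"].values())
    if use_alt_labels:
        for labels in concept["altLabel"].values():
            candidates.extend(labels)
    return value.casefold() in {c.casefold() for c in candidates}


# A German-language record describing a print ("Druck").
print_record = {"dc:type": "Druck"}

naive = matches(print_record["dc:type"], PRESSURE)
strict = matches(print_record["dc:type"], PRESSURE, use_alt_labels=False)
print(naive, strict)
```

The naive matcher links the print to the physical concept, while the stricter one does not; the stricter rule would, however, also miss legitimate matches on alternative labels.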
30. To conclude
→ Europeana continues to focus on pivot vocabularies such as Wikidata and Agrovoc to improve its search and retrieval services.
→ We are now investigating how to use more domain-specific vocabularies for dedicated services.
→ We also work on the definition of best practices and evaluation methods for enrichment
• http://pro.europeana.eu/get-involved/europeana-tech/europeanatech-task-forces/evaluation-and-enrichments