Europeana meeting under Finland’s Presidency of the Council of the EU - Day 1, 24 October 2019
1. Books on a table, Aalto, Ilmari, 1928, National Digital Library (NDL), Finland, CC0
EUROPEANA MEETING
UNDER FINLAND’S PRESIDENCY
OF THE COUNCIL OF THE EU
ESPOO, FINLAND
24 October 2019
2. Books on a table, Aalto, Ilmari, 1928, National Digital Library (NDL), Finland, CC0
Andy Neale
Technical Director
Europeana Foundation
3. Europa [Material cartográfico] : Nach den vorzüglichsten Hülfsmitteln, Götze, Johann August Ferdinand, 1773-1819, Biblioteca Digital de Madrid, Spain, Public domain
Contribution to EU GDP by culture and creative sectors: 4.2%
Trade surplus in cultural goods: € 8.7B
New Agenda for Culture
8. Europa [Material cartográfico] : Nach den vorzüglichsten Hülfsmitteln, Götze, Johann August Ferdinand, 1773-1819, Biblioteca Digital de Madrid, Spain, Public domain
3,700
CHIs across Europe
9. EUROPEANA COLLECTIONS
58m
Cultural heritage records
2.5bln
Information items
10. 1. Common Tech & data architecture
Europeana
Data Model +
Metis
12. Statements for works that are
not in copyright
Statements for works where the
copyright status is unclear
Statements for works that are in
copyright
16. Objectives
1. Stimulate reflection on multilingualism in digital cultural heritage at
large using Europeana as a case study;
2. Develop a deeper understanding of the multilingualism
problem/opportunity space for digital cultural heritage;
3. Consider what options can be pursued to provoke action at the local
level, furthering the multilingual capabilities;
4. Provide input and feedback for the Europeana multilingual strategy.
17. Sessions
1. Setting the scene
2. User interactions
3. Multilingual metadata
4. Content translation
5. Conclusions and steps for progress
19. Books on a table, Aalto, Ilmari, 1928, National Digital Library (NDL), Finland, CC0
Juliane
Stiller
Information Specialist
You, We & Digital
‘Multilingual Developments in
Digital Cultural Heritage Domain:
Problem Space & Solutions’
20.
● 10 years as a researcher at
Humboldt-Universität zu
Berlin in Europeana-related
projects
● multilinguality, interaction
patterns, metadata and its
quality, research on search
and browse, retrieval,
evaluation
● since 2019 consultancy and
training in digital literacy
@stillinsky
21. Agenda
• Multilinguality: the problem space
• Bridging the language gap
• Translations
• Enrichments
• What is left to do?
24. User Interface
Challenges:
• Translation of static and dynamic
pages
• Switching languages via text or
icons such as flags
• Default language
• Determine the user’s preferred
language through IP address or
browser settings
26. Mismatch between query and
content language
• Mona Lisa 203 results
• Monna Lisa 13 results
• La Gioconda 376 results
• La Joconde 78 results
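One way to read the result counts above: each spelling only reaches the records that happen to use it. A minimal Python sketch of query expansion over known name variants follows. The variant table and the toy titles are hypothetical and made up for illustration; this is not how Europeana search is implemented.

```python
# Hypothetical variant table: each set groups names for the same work.
# In practice such groups could come from a multilingual vocabulary.
VARIANTS = [
    {"mona lisa", "monna lisa", "la gioconda", "la joconde"},
]


def expand_query(query: str) -> set[str]:
    """Return the normalized query plus all known variants of it."""
    normalized = query.strip().lower()
    for group in VARIANTS:
        if normalized in group:
            return set(group)
    return {normalized}


def search(titles: list[str], query: str) -> list[str]:
    """Return titles containing any variant of the query (case-insensitive)."""
    variants = expand_query(query)
    return [t for t in titles if any(v in t.lower() for v in variants)]


# Toy records, invented for illustration only.
TITLES = [
    "Roma, Galleria Corsini - La Gioconda",
    "La Joconde (copie)",
    "Portrait of Mona Lisa",
    "Madonna with child",
]
```

With this sketch, `search(TITLES, "Mona Lisa")` returns the first three titles regardless of which language variant each record uses.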
Interactions
Roma, Galleria Corsini - La
Gioconda,
35. Query heterogeneity & long tail
Europeana queries in a month in 2016
442 times: Wolfgang Amadeus
Mozart
once: full history of ging
tsholing in bhutan
36. Queries in cultural heritage are
● Short
● Heterogeneous
● Focus on entities: 61.96% of the queries contain named entities (NE) (Stiller, Gäde &
Petras, 2010)
● Highly ambiguous in language:
○ “culture”, “administration”, “paris”, “madonna”
● Semantically ambiguous:
○ “barber” (composer or hairdresser)
37. Multilingual academic search
● informational queries from the psychology domain in 4
languages: pubpsych.eu
● Building domain-specific lexical resources and mapping them
to queries; entries look like this:
○ wohlbefinden|||en:well-being|||es:bienestar|||fr:bien-etre
○ wohlfuhlen|||en:well-being|||es:bienestar|||fr:bien-etre
○ Well-being|||es:bienestar|||de:wohlbefinden|||fr:bien-etre
● Translation does not depend on language identification
● Deals well with NE -> no match in Lexicon, no translation
More Info on the project: https://www.clubs-project.eu/en/
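The lexicon entries shown on this slide can be read as a source term followed by `lang:translation` pairs separated by `|||`. A minimal Python sketch of parsing that format and translating a query token by token follows. The function names and the token-level lookup are assumptions for illustration; the slide does not describe the project's actual matching logic.

```python
def parse_entry(line: str) -> tuple[str, dict[str, str]]:
    """Split 'term|||en:x|||es:y' into (term, {'en': 'x', 'es': 'y'})."""
    source, *targets = line.strip().split("|||")
    translations = {}
    for target in targets:
        lang, _, term = target.partition(":")
        translations[lang] = term
    return source.lower(), translations


def build_lexicon(lines: list[str]) -> dict[str, dict[str, str]]:
    """Map each lower-cased source term to its translations."""
    return dict(parse_entry(line) for line in lines)


def translate_query(query: str, lexicon: dict[str, dict[str, str]],
                    target_lang: str) -> str:
    """Translate tokens found in the lexicon; leave all others untouched."""
    out = []
    for token in query.split():
        translations = lexicon.get(token.lower(), {})
        out.append(translations.get(target_lang, token))
    return " ".join(out)


# The three entries quoted on the slide.
ENTRIES = [
    "wohlbefinden|||en:well-being|||es:bienestar|||fr:bien-etre",
    "wohlfuhlen|||en:well-being|||es:bienestar|||fr:bien-etre",
    "Well-being|||es:bienestar|||de:wohlbefinden|||fr:bien-etre",
]
LEXICON = build_lexicon(ENTRIES)
```

A token with no lexicon match, such as a personal name, passes through unchanged, which mirrors the "no match in lexicon, no translation" behaviour, and no language identification step is needed.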
38. 2) Translate the content
[Diagram labels: Query; Content (Spanish, French, German, English); Content (English, French, German, Spanish); Database]
40. Challenges
• Missing training data for small languages
• Missing training data for (sub)domains
• Number of language pairs is immense with 50+
languages
• Metadata is too scarce for good translation results
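To give a sense of scale for the language-pair challenge, here is a small Python sketch that counts pairs for a given number of languages. It assumes every language is paired directly with every other one, which is an illustrative simplification.

```python
from math import comb


def language_pairs(n_languages: int) -> tuple[int, int]:
    """Return (unordered pairs, directed translation directions)."""
    unordered = comb(n_languages, 2)   # n * (n - 1) / 2
    return unordered, 2 * unordered    # each pair can be translated both ways
```

For 50 languages this gives 1,225 unordered pairs and 2,450 translation directions, each of which would need its own training data.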
43. Number of enriched objects, their type and vocabularies
Enriched entities in Europeana:
• Locations: 7 million (GeoNames)
• Concepts: 9.2 million (GEMET, DBpedia)
• Time: 10.2 million (Semium Time)
• Agents: 144,000 (DBpedia)
46. Adapt to queries
Entity graphs for
exploration
• Object
• Person
• Concept
• Period
• Location
• Event
47. Evaluate solution based on goal
○ E.g. for ML retrieval we might not need the perfect fluent
translation
○ Identify the impact of different workflows / processes on
multilinguality of system
○ Translations do not only have an impact on data but also on
retrieval and therefore on user satisfaction
49. References
• Petras, V., Hill, T., Stiller, J., & Gäde, M. (2017). Europeana – a Search Engine for Digitised Cultural
Heritage Material. Datenbank-Spektrum, 1–6. https://doi.org/10.1007/s13222-016-0238-1
• Hill, T. D., Charles, V., Isaac, A., & Stiller, J. (2016). “Searching for Inspiration”: User Needs and Search
Architecture in Europeana Collections. ASIS&T 2016 Annual Meeting.
• Manguinhas, H. (2016). Europeana Semantic Enrichment Framework. Documentation, Europeana.
https://docs.google.com/document/d/1JvjrWMTpMIH7WnuieNqcT0zpJAXUPo6x4uMBj1pEx0Y
• Stiller, J. (Ed.) (2016). Best practices for multilingual access. Tech. rep.
http://pro.europeana.eu/files/Europeana_Professional/Publications/BestPracticesForMultilingualAccess_whitepaper.pdf
• Stiller, J., Gäde, M., & Petras, V. (2013). Multilingual access to digital libraries: The Europeana use
case. Information-Wissenschaft Und Praxis, 64, 86–95.
• Olensky, M., Stiller, J., & Dröge, E. (2012). Poisonous India or the Importance of a Semantic and
Multilingual Enrichment Strategy. In 6th Research Conference, MTSR 2012, Cádiz, Spain, November
28-30, 2012. (pp. 252–263). Berlin: Springer.
• Stiller, J., Gäde, M., & Petras, V. (2010). Ambiguity of Queries and the Challenges for Query Language Detection.
50. Books on a table, Aalto, Ilmari, 1928, National Digital Library (NDL), Finland, CC0
Rickard
Domeij
Language Planner
Language Council of Sweden, Institute of
Language and Folklore
Multilingualism, technology
and language policy
51. Content
● The LC and the multilingual language policy of Sweden (and EU)
● Multilingually accessible services
● Language technology (LT) and language resources
● National Language Bank
● First experiences in digital humanities and cultural heritage
● Challenges for LT in cultural heritage
● Next steps
52. Multilingual language policy
● The LC monitors and promotes the languages of Sweden and their use
● Language policy (2005) and Language Act (2009)
● Status and rights to use Swedish and other languages in Sweden
● National minority languages: Sami, Meänkieli, Finnish, Romani, Yiddish
● Swedish Sign Language, Nordic languages, EU languages, immigrant languages
● Public agencies have to reach out to the whole population
● Also good for business
53. Multilingually accessible services
● Vision: a multilingual society in which all citizens are included with
respect to different backgrounds and languages --> digital inclusion
● Access to info and services according to language rights and needs
● Switch between languages and modes according to preferences
● Example: have a web text read aloud in your language
● Essential for people with disabilities but also useful for others = design
for all (e.g. subtitling)
54. LT to make it possible
● Conversions between languages and modes
● Different modes: writing, speech, gestures …
● Multilingualism = multilinguality + multimodality
● LT modules: text-to-speech (TTS), speech-to-text (STT), machine
translation (MT) …
● Applications: recitation, dictation, translation …
● Voice translation: STT > MT > TTS
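The voice translation chain (STT > MT > TTS) is a composition of three modules, each consuming the previous one's output. A minimal Python sketch of that composition follows. The three stage functions are placeholders invented for illustration (real modules would process audio, not prefixed strings).

```python
from typing import Callable

Stage = Callable[[str], str]


def compose(*stages: Stage) -> Stage:
    """Chain stages left to right: the output of one feeds the next."""
    def pipeline(data: str) -> str:
        for stage in stages:
            data = stage(data)
        return data
    return pipeline


# Placeholder stages, hypothetical stand-ins for real LT modules.
def fake_stt(audio: str) -> str:
    """Pretend speech-to-text: strip an 'audio:' marker."""
    return audio.removeprefix("audio:")


def fake_mt(text: str) -> str:
    """Pretend machine translation: one-entry Swedish to English table."""
    return {"hej": "hello"}.get(text, text)


def fake_tts(text: str) -> str:
    """Pretend text-to-speech: add the 'audio:' marker back."""
    return "audio:" + text


voice_translation = compose(fake_stt, fake_mt, fake_tts)
```

Because errors in an early stage are passed on to the later ones, the quality of the whole chain depends on every module in it.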
56. LT to make it possible II
● Problems with quality and trust, especially on unrestricted data
● User and domain adaptation, user interaction
● Ex: respeaking system for subtitling on TV
● Accessibility often means loss of quality, but other gains
● Accessible and usable
57. Language resources needed
● Data and tools: corpora, markup tools, lexicons, language models …
● Rule-based methods, especially for less resourced languages
● Market forces are not enough
● Stimulate the development of LT and multilingually accessible services
by national means (e.g. respeaking system for Swedish TV)
● National Language Bank (NLB) to make resources available for R&D
An NLB promotes the development of technology, which benefits the languages in
Sweden and improves access to information for everyone.
Digital agenda for Sweden (2011)
58. National research infrastructure (2017-00626), funded by the Swedish Research
Council with 1.5 mil./year until 2025.
Two main types of data:
Multilingual texts and terms from PAs
Multimodal cultural heritage collections
59. First experiences in cultural heritage
● Available voice recognition and MT don’t work!
● Instead try other methods:
○ “sound browsing” to explore speech recordings acoustically
○ respeaking for transcribing speech
○ transcription of handwritten dialect text in Transkribus
○ time-alignment of existing transcripts to sound in ELAN
○ linking from text to speech data in the archives (see next page)
● Usage-centered, participative design in multidisciplinary teams
● Tilltal project (SAF16-0917:1)
60.
61. First experiences in cultural heritage
● State-of-the-art voice recognition and MT don’t work!
● Instead try other methods:
○ “sound browsing” to explore speech recordings acoustically
○ respeaking for transcribing speech
○ transcription of handwritten dialect text in Transkribus
○ time-alignment of existing transcripts to sound in ELAN
○ linking from text to speech data in the archives
● Usage-centered, participative design in multidisciplinary teams
● Tilltal project (SAF16-0917:1)
62. Challenges for LT in cultural heritage
● Interface or content (= multilingual in a broad sense)
● Far beyond modern standard language use
● Great variation makes domain adaptation hard
● Variation in place (dialects and languages), time (old Swedish) and
situation (informal-formal)
● Modal variation in collections: (handwritten) text, speech, pictures
● Hard to handle as researchers want to explore a collection as a whole
63. Next steps
● Linked data to describe the collection conceptually and relationally
● Multilingual search methods for handling language variation in place,
time and situation
● Domain-adapted speech-to-text conversion to transcribe recordings
● Crowdsourcing for correcting
● Shared resources for the languages, dialects, domains etc.
● Long-term funding for the National Language Bank
● Collaborative projects involving LTists, researchers and data holders
64. Books on a table, Aalto, Ilmari, 1928, National Digital Library (NDL), Finland, CC0
Andrejs Vasiļjevs
Executive Chairman
Tilde
Jānis Ziediņš
Project Manager
Culture information systems centre, Riga
Learnings from the automatic
translation projects and how to
apply them for the culture and
heritage sector
65. Culture information systems centre
Our mission is to assist cultural heritage institutions - ARCHIVES, LIBRARIES, MUSEUMS -
to maintain and make available cultural heritage for future generations
through the latest information technology solutions.
68. Benefit for eGovernment
State Gov.lv platform
Platform for the
provision and
management of
e-Services
Single Public Administration
Data Area
Municipality IS
Other IS
State information systems
MT platform
OpenData
77. Digitization of the Cultural Heritage Content
The National Library of Latvia is implementing a European Regional Development
Fund (ERDF) and nationally co-funded project in the field of Latvia's digital cultural
heritage, together with project partners – the National Archives of Latvia, the State
Inspection for Heritage Protection of Latvia, and the Cultural Information System
Centre.
The project will further develop the Digital Object Management and Conservation
System, develop the Copyright Management and Content Licensing System, publish
several Open Datasets, including Related Open Datasets, and develop the Stage of
an Integrated Centralized Open System Information Platform.
80. Translation test
Manual translation:
A photomontage postcard with five views of Riga. The central city panorama with the new Pontoon Bridge opened in 1896 and the Mazā Guild building in the right corner. Below these images, the city theatre, Vērmanes Garden and the bridge across the canal by Bastejkalns.
Hugo.lv translation:
A postcard is assembled from five views of Riga - downtown panorama with the new Pontonbridge discovered in 1896, the Little Guild House in the right corner, under these images - City Theatre, Verman Gardens, a bridge over the canal near BastejHill.
VRVM 176655 http://www.nmkk.lv/Items/ItemViewForm.aspx?id=167748
84. Based on Tilde Neural MT technologies that won 1st place at WMT 2017-2019,
a global competition between the world’s top language technology providers
Best WMT 2017, Best WMT 2018, Best WMT 2019
85. Data for MT System Development
• Generic MT systems were trained on 52 million parallel sentences
• Cultural domain MT systems were customized with an additional 826 000 parallel sentences and 5 million monolingual sentences
Books
▪ Fiction
▪ Scientific literature
▪ Technical literature (manuals, instructions)
News and web content
▪ News from popular media (also multilingual media)
▪ Company press releases
▪ Multilingual web site content
Public sector data
▪ Laws, regulations, directives, etc.
▪ Documents of internal and external use
▪ Press releases
▪ Public sector web site data
Proprietary translation memories
▪ Professional and amateur translator produced data
▪ Translation memories of translation and localisation service providing companies
▪ Translation memories of international organisations
94. EU PRESIDENCY TRANSLATOR
CEF eTranslation: MT systems for the 24 official EU languages, enabling translation of full documents while preserving text formatting
AI-powered Neural MT: AI-powered custom Neural MT providing superior-quality translation adapted for the Presidency requirements
98. BENEFITS FOR ESTONIA, BULGARIA, AUSTRIA
• Enables Presidency staff to quickly translate documents
• Empowers visiting journalists and delegates to access info in
the local language, e.g., press releases, local news sites
• Supports staff translators in their work by boosting
translation productivity by up to 35%
• Lowers costs of translation for documents by utilizing
post-edited machine translation
• Allows public sector organizations to translate content and
websites into multiple languages
99. STATISTICS
From September 2017 to October 2019, the EU Council Presidency Translator processed:
• 32 159 082 words
• 2.83 million sentences
• 1.09 million translation requests
• ~207 books (there are 155 thousand words on average in one Harry Potter book)
101. Conclusions
• New generation of Neural MT strongly improves quality and applicability of
machine translation, especially for morphologically rich languages
• Domain specific data is crucial for making MT suitable for cultural and other
domains
• Depending on the application, translation needs can be served by selecting
the most efficient approach – pure MT, human review of the MT, or fully
human translation
• We will be happy to share our experience, technologies and tools :)
103. Books on a table, Aalto, Ilmari, 1928, National Digital Library (NDL), Finland, CC0
Heli
Kautonen
Library Director
Finnish Literature Society SKS
Design for Diversity
104. Design for Diversity
Heli Kautonen
Library Director, Finnish Literature Society (SKS)
24.10.2019
Europeana meeting on multilingualism, Hanaholmen, Finland
114. Model for temporal division of benefits
Timeline: Initiation (of a new service), Development, Implementation, Operation and maintenance
Process-time: Who is involved in the development and implementation of your service? What kinds of benefits can be identified?
Use-time: Who uses your service? Are there other stakeholders? What kinds of benefits can be identified?
Future: Who could (re)use your service or materials in the (undefined) future? What kinds of benefits can be anticipated?
Kautonen, H. & Nieminen, M. (2018): Conceptualizing Benefits of User-Centered Design for Digital Library
Services. Liber Quarterly, 28(1), pp. 1–34. DOI: http://doi.org/10.18352/lq.10231.
119. Books on a table, Aalto, Ilmari, 1928, National Digital Library (NDL), Finland, CC0
Dasha
Moskalenko
Manager Service Design
Europeana Foundation
Europeana case study
UX Design and user testing
120. Ο Ζητιάνος Φοιτητής, Άγνωστος δημιουργός, 1945,Ίδρυμα Μουσείου Νίκου Καζαντζάκη, Greece, CC BY-NC-ND
136. Hands showing the French sign language alphabet, Wellcome Collection, CC BY
europeana.eu
@EuropeanaEU
THANK YOU!
Questions & comments are welcome.
dasha.moskalenko@europeana.eu
137. Books on a table, Aalto, Ilmari, 1928, National Digital Library (NDL), Finland, CC0
Matias
Frosterus
Information Systems Manager
with Mikko Lappalainen, Osma Suominen,
Satu Niininen
National Library of Finland
Multilingual linked vocabularies
and automatic subject indexing
services - National Library's
Finto and Annif
144. THE NATIONAL LIBRARY OF FINLAND
The goal
▪ Bringing the library know-how into use for all of the public
sector
▪ But better!
▪ Better vocabularies
▪ Publication, use, and integration of those better vocabularies
▪ Automated tools to make it even easier
145. What is needed?
▪ Modern linked data vocabularies
▪ A way to publish them for everyone to use
▪ A way to integrate them into your systems
▪ A way to make using them less labour-intensive
147. Vocabularies
▪ Starting point: General Finnish Thesaurus YSA
▪ Developed in the 1980s mainly for book indexing
▪ Over 30,000 terms
▪ Monolingual but has a Swedish counterpart Allärs
149. Thesaurus to ontology
▪ Reconstruction of YSA into machine-readable and multilingual YSO
▪ Trilingual terms for concepts (fin, swe, eng)
▪ YSA and Allärs merged together and translated into English
▪ Concepts are a compromise between Finnish and Swedish as YSA
and Allärs are not completely identical
▪ Links to Library of Congress Subject Headings (LCSH)
▪ Linking to Wikidata underway
▪ YSO just made the list of Europeana dereferenceable vocabularies
that can be enriched in the Europeana portal
151. Challenges of multilinguality
▪ Founded on the concepts of the Finnish cultural sphere
▪ Some concepts may not be common outside of that
▪ sandwich cakes, uncles (maternal)
▪ väheneminen = minskning (antal) = decrease (passive)
vähentäminen = minskning (aktiv reducering av antal) = decrease (active)
▪ Liikuntalukiot = idrottsgymnasier = general upper secondary schools
focusing on sport and exercise
152. Challenges of multilinguality
▪ Some may result in somewhat awkward terms
▪ rivers = joet = floder, åar och älvar
▪ The original Swedish thesaurus Allärs had three terms that could be
used interchangeably
153. Challenges of multilinguality
▪ Can also affect hierarchy
▪ pesät
⤷ muurahaispesät (literally ant nests)
bon
⤷ myrstackar
nests
⤷ ant hills
▪ For more information, see http://urn.fi/URN:NBN:fi-fe201705106375
Satu Niininen, Susanna Nykyri, Osma Suominen, (2017) "The future of
metadata: open, linked, and multilingual – the YSO case", Journal of
Documentation, Vol. 73 Issue: 3, pp.451-465, doi: 10.1108/JD-06-2016-0084.
154. YSO
Upper
hierarchy
General
concepts
Specific
concepts
156. Adapted into use outside the library domain
▪ Extended with domain ontologies
▪ Using the core provided by YSO
▪ Helps interoperability!
▪ Developed by the domain experts in various organizations
▪ Over a dozen domain ontologies such as:
▪ AFO - Agriculture - 7 000 concepts
▪ JUHO - Government - 6 300
▪ KAUNO - Literature - 5 000
▪ KULO - Cultural research - 1 500
▪ LIITO - Economics - 3 000
▪ SOTO - Military - 2 000
▪ TERO - Health - 6 500
▪ And others
162. National vocabulary and ontology service Finto
▪ A bit of history
▪ FinnONTO research project (2003-2012)
▪ Built research prototypes of services and started the ontologization
process of the various thesauri
▪ The National Library began the Finto project in 2013 funded by
the Ministry of Education and Culture and the Ministry of Finance
▪ A national vocabulary and ontology service for the whole public
sector
167. Adopted widely in Finland
▪ Finto is used in many organizations in Finland to annotate
their various resources, among them
▪ The national broadcasting company Yle
▪ Suomi.fi citizen’s portal to public services
▪ Various public sector content systems
▪ Websites of various ministries
▪ Various museums, archives, and libraries
168. Skosmos
▪ The heart beating inside Finto
▪ Open source SKOS vocabulary browser
▪ http://skosmos.org
▪ Publication and use of light-weight ontologies, thesauri and classifications
▪ Web interface
▪ REST API
▪ SPARQL endpoint
▪ Community
▪ https://groups.google.com/forum/#!forum/skosmos-users
169. How does it work?
▪ Make your thesaurus into SKOS
172. How does it work?
[Diagram labels: SPARQL, Skosmos]
▪ And serve your thesaurus for
humans, Linked Data agents,
and REST API access
173. Key features
▪ Multilingual browser interface (10 languages)
▪ Autocomplete search
▪ Alphabetical index
▪ Concept hierarchy display
▪ Concept groups (thematic index)
▪ New concepts
▪ REST API for enabling use of vocabularies in other
applications
▪ responses usually JSON-LD
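To illustrate how another application might use the REST API, here is a minimal Python sketch that builds a vocabulary search URL and reads labels from a JSON response. The base URL, the `search` path with `query`/`vocab`/`lang` parameters, and the `results`/`prefLabel` field names are assumptions based on the public Skosmos API as commonly deployed; check them against your own instance. The sample response is hand-written, not real server output, and no network call is made.

```python
import json
from urllib.parse import urlencode

# Assumption: a Skosmos instance exposing its REST API under this base URL.
BASE_URL = "https://api.finto.fi/rest/v1"


def build_search_url(query: str, vocab: str, lang: str) -> str:
    """Build a concept search URL for one vocabulary and one language."""
    params = urlencode({"query": query, "vocab": vocab, "lang": lang})
    return f"{BASE_URL}/search?{params}"


def labels_from_response(body: str) -> list[str]:
    """Extract preferred labels from a search response body."""
    data = json.loads(body)
    return [hit["prefLabel"] for hit in data.get("results", [])]


# Hand-written sample shaped like a search response (hypothetical URI).
SAMPLE = json.dumps({
    "results": [
        {"uri": "http://example.org/c1", "prefLabel": "rivers", "lang": "en"},
    ]
})
```

Changing only the `lang` parameter would ask for the same concept's label in another language, which is where a multilingual vocabulary pays off.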
178. Automated Subject Indexing made easy: Annif
▪ An open source multilingual automated subject indexing
system using machine learning and our own vocabularies
180. Metadata about 13M documents,
many of them tagged with subjects!
Hot tub by a lake, Andrei Niemimäki, CC BY-SA
183. Finna API
▪ All Finna metadata is
▪ YSO and KOKO widely used
184. Automated Subject Indexing made easy: Annif
Prototype in 2017
▪ Try it out for yourself at http://annif.org/
185. Automated Subject Indexing made easy: Annif
Automating our own processes vs. creating generic tools for many contexts
186. Annif development
▪ Packaging Annif into an easy-to-deploy solution via Docker
▪ Tuning the various algorithms and their hyperparameters
powering Annif
▪ Making integration easier through a Finto API
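As an illustration of API-based integration, the following Python sketch prepares a subject suggestion request and ranks the returned subjects. The base URL, the `projects/{id}/suggest` path, the form fields `text` and `limit`, and the `label`/`score` response fields are assumptions based on the public Annif REST API; the project id and sample response are placeholders. The request is only constructed, never sent.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: an Annif instance exposing its REST API under this base URL.
BASE_URL = "https://api.annif.org/v1"


def build_suggest_request(project_id: str, text: str, limit: int = 5) -> Request:
    """Prepare (but do not send) a POST request asking for subject suggestions."""
    data = urlencode({"text": text, "limit": limit}).encode("utf-8")
    url = f"{BASE_URL}/projects/{project_id}/suggest"
    return Request(url, data=data, method="POST")


def top_subjects(body: str, min_score: float = 0.0) -> list[tuple[str, float]]:
    """Return (label, score) pairs sorted by descending score."""
    results = json.loads(body).get("results", [])
    ranked = sorted(results, key=lambda r: r["score"], reverse=True)
    return [(r["label"], r["score"]) for r in ranked if r["score"] >= min_score]


# Hand-written sample shaped like a suggestion response (hypothetical values).
SAMPLE = json.dumps({"results": [
    {"uri": "http://example.org/a", "label": "lakes", "score": 0.4},
    {"uri": "http://example.org/b", "label": "saunas", "score": 0.7},
]})
```

A score threshold such as `min_score=0.5` is one simple way to decide which suggestions to keep automatically and which to leave for a human indexer.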
188. Summary
Interlinked multilingual vocabularies
for various domains
A national service for
publishing and using
said vocabularies
An automated system
for making it easy
to produce annotations
with said vocabularies
All the while
utilizing library
know-how
Richer metadata
Cross-domain findability and interoperability
More efficient workflows
New connections, new possibilities
190. Thank you!
matias.frosterus@helsinki.fi
finto-posti@helsinki.fi
@Fintopalvelu
All pictures used under CC0
license unless otherwise noted
191. Books on a table, Aalto, Ilmari, 1928, National Digital Library (NDL), Finland, CC0
Hugo
Manguinhas
Product Manager API
Europeana Foundation
Case Study -
Translation of object metadata
using the Knowledge Graph
193. Object Metadata
What is the title of the object?
Who created or contributed it?
What topics is the object about?
What kind of object is it?
When was it created or published?
Where was it created, or where is it located?
...
195. About the Knowledge Graph
● Vast network of data sources made available in
the wider Linked Open Data cloud
● Can be linked to and used to bring more
contextual information to the items
● Vast and readily available source of controlled
translations
Part of the Linking Open Data (LOD) Project Cloud Diagram, CC BY-SA.
196. EDM and the Knowledge Graph
We encourage data providers to
● Contribute links to their own
vocabularies and publish them as
Linked Open Data
● Use available reference vocabularies
to describe their content
Clavecin, Bartolomeo Cristofori
Cite de la Musique,
MIMO - Musical Instruments Museums
Online|CC BY-NC-SA
197. ● Available as Linked Open Data and
therefore part of the Knowledge Graph
● The rights statements have been
translated into Estonian, Finnish,
French, German, Polish and Spanish;
7 more translation efforts are
ongoing
Research has shown that the
official translation of rights
information leads to better
investment/effort into adoption
of rs.org and thus more accurate
copyright info
201. Entity Collection: benefits
● Allows Europeana to establish links to the
Knowledge Graph through means of semantic
enrichment of the object metadata
● Harmonizes vocabularies from the multiplicity of
data providers into a single point of reference
● Exploits coreference links between vocabularies
to increase multilingual coverage
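The third benefit, using coreference links to widen multilingual coverage, can be sketched as merging the language labels of entities that different vocabularies declare to be the same. The Python below is a minimal illustration with invented identifiers; it uses a small union-find and is not the Entity Collection's actual implementation.

```python
def merge_labels(entities: dict[str, dict[str, str]],
                 same_as: list[tuple[str, str]]) -> dict[str, dict[str, str]]:
    """Union the language labels of entities connected by coreference links."""
    parent = {entity_id: entity_id for entity_id in entities}

    def find(x: str) -> str:
        # Follow parent links to the group representative (with path halving).
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in same_as:
        parent[find(a)] = find(b)

    merged: dict[str, dict[str, str]] = {}
    for entity_id, labels in entities.items():
        group = merged.setdefault(find(entity_id), {})
        for lang, label in labels.items():
            group.setdefault(lang, label)  # first label seen per language wins
    return merged


# Hypothetical identifiers; the labels are examples of the kind a vocabulary holds.
ENTITIES = {
    "vocabA:rivers": {"fi": "joet"},
    "vocabB:river": {"en": "rivers", "sv": "floder"},
    "vocabC:nests": {"fi": "pesät"},
}
LINKS = [("vocabA:rivers", "vocabB:river")]
```

Here a record described only with the Finnish-labelled concept becomes findable in English and Swedish once the link to the second vocabulary is known.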
204. Steps to improve the Knowledge Graph
● Promote alignment efforts
between vocabularies used by
data providers to complementary
vocabularies such as Wikidata
● Promote translation
efforts/campaigns to increase
multilingual coverage of the
Knowledge Graph, prioritising
discovery-enabling metadata
fields
205. A FOCUSED VIEW ON
THE GENERAL STRATEGY
Idrottstävlingar på Eyravallen. "Benke". 27 september 1955.,Örebro Kuriren, Örebro läns museum, Sweden, Public domain
206. Multilingual search, browse and display
Usage scenarios
● Enter search query in chosen language
● See search results and interact with filters in
chosen language
● Display object metadata on item page
● Navigate to entities
207. Proposals for indexing and storing translations
● Automated identification of language if needed (only 26.5% of the data
providers’ metadata is language qualified)
● Use translations from multilingual knowledge graph
● Augment the provider metadata with static translation of the fields to English
(to fill metadata values not covered by the knowledge graph)
● Store and index translated metadata for search and display (original metadata
+ languages of the knowledge graph + English)
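The four proposals can be read as a per-value indexing routine: identify the language if missing, add knowledge graph translations, fall back to a static English translation, and store everything. A minimal Python sketch under those assumptions follows. The detector and translator are passed in as placeholder callables, and keying the knowledge graph by literal value is a simplification made for illustration.

```python
from typing import Callable, Optional


def index_field(value: str,
                lang: Optional[str],
                kg_labels: dict[str, dict[str, str]],
                detect: Callable[[str], str],
                to_english: Callable[[str, str], str]) -> dict[str, str]:
    """Return {language: value} to store and index for one metadata value."""
    lang = lang or detect(value)          # 1. identify language if not qualified
    indexed = {lang: value}               # the original value is always kept
    for kg_lang, label in kg_labels.get(value, {}).items():
        indexed.setdefault(kg_lang, label)    # 2. knowledge graph translations
    if "en" not in indexed:               # 3. static English translation as filler
        indexed["en"] = to_english(value, lang)
    return indexed                        # 4. original + KG languages + English


# Placeholder components, invented for illustration only.
def fake_detect(value: str) -> str:
    return "fi"


def fake_to_english(value: str, lang: str) -> str:
    return f"[en] {value}"


KG = {"joet": {"sv": "floder", "en": "rivers"}}
```

The static translator is only called when the knowledge graph has no English label, which keeps controlled translations ahead of automatic ones.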
208. Proposals for search on object metadata
[Flow diagram, steps as labelled on the slide]
● Original query
● Identify language (#1: French, #2: Spanish, #3: Polish)
● Translate to English: translated query (English)
● Suggest Entity (Knowledge Graph); user disambiguates: entity-based query
● Multilingual query: entity-based query OR original query + translated query
● Search in the multilingual index
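One possible reading of the multilingual query step: if the user picked a suggested entity, search by that entity; otherwise search with the original query combined with its English translation. The Python sketch below builds such a query string. The `entity:` field name and the quoted `OR` syntax are hypothetical and chosen for illustration; they are not Europeana's query language.

```python
from typing import Optional


def quote(term: str) -> str:
    """Wrap a term in double quotes, escaping embedded quotes."""
    return '"' + term.replace('"', r'\"') + '"'


def multilingual_query(original: str,
                       translated_en: str,
                       entity_uri: Optional[str] = None) -> str:
    """Build an entity-based query, or original OR translated text query."""
    if entity_uri is not None:
        return f"entity:{quote(entity_uri)}"
    terms = [original]
    if translated_en and translated_en != original:
        terms.append(translated_en)
    return " OR ".join(quote(t) for t in terms)
```

For example, `multilingual_query("rivières", "rivers")` yields `"rivières" OR "rivers"`, so records described in either language can match.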
209. Proposals for display of object metadata
[Flow diagram labels: Request metadata; Obtain metadata (in original language or English) from the Multilingual Database; Obtain metadata (Knowledge Graph); Translate from English (in other language)]
210. MULTILINGUAL EXPERIENCE
OUTCOMES
● Users can search and filter in one of the 24 official EU languages
● Item page metadata would display in chosen language if knowledge
graph translations were present
● Where chosen language is not supported, display will default to
source language and offer option to view in English
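The display outcomes describe a simple fallback rule, sketched below in Python: show the chosen language when a translation exists, otherwise show the source language and offer English. The function name and the returned dictionary shape are assumptions for illustration.

```python
def choose_display(chosen: str, source_lang: str,
                   available: dict[str, str]) -> dict:
    """Pick which metadata value to show; `available` maps language to value."""
    if chosen in available:
        return {"lang": chosen, "value": available[chosen],
                "offer_english": False}
    # Chosen language not supported: default to the source language
    # and offer English when an English value exists.
    return {"lang": source_lang, "value": available[source_lang],
            "offer_english": "en" in available and source_lang != "en"}
```

For instance, a Polish-speaking user viewing a record available only in Finnish and English would see the Finnish value with an option to switch to English.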
211. Challenges & Open Questions
● How successful is automated language detection?
● Would prioritising static translation of discovery-enabling metadata fields to
English be “good enough”?
● How well can we statically translate remaining metadata fields to English,
especially when they contain single words or short phrases?
● Would dynamic translation of metadata (for languages other than English) be
good enough?
212. The Chinese Market, 1767 - 1769, Rijksmuseum, Netherlands, Public domain
europeana.eu
@EuropeanaEU