This document summarizes challenges for the language technology industry in Europe related to Europeana, a platform providing access to cultural heritage collections across Europe. It notes that Europeana provides access to over 33 million objects from over 2,300 contributors in 36 countries, with metadata in 33 languages. However, it faces challenges in facilitating re-use and access across languages due to the diversity of languages and domains in its collections. It discusses the need for automatic translation and natural language processing tools to address multilingual search and access issues at Europeana's scale. The document also outlines resource constraints for libraries, archives, and museums in developing language technologies, and their role in providing open data and use cases to the industry.
3. Built on descriptive metadata from a broad, heterogeneous network
[Diagram of the contributor network. Categories: audiovisual collections, national aggregators, regional aggregators, archives, thematic collections, libraries. Examples shown: Musées Lausannois, Culture.fr, The European Library, APEX, European Film Gateway, Europeana Fashion]
2,300 galleries, museums, archives and libraries
9. Related projects applying NLP tools
E.g. the PATHS project developed techniques to enrich English and Spanish collections:
1) Identification of key entities
2) Detection of (typed) similarities between objects, using metadata
3) "Background links" to external resources such as Wikipedia
4) Classification of objects against a hierarchy of topics
Applying these to other languages would require work:
1) -> requires language-specific tools (PoS tagging, lemmatization)
2) -> straightforward to apply to new languages
3) -> requires language-specific tools
4) -> depends on (3) and on translation of some topics
http://www.paths-project.eu/eng/Resources/Semantic-Enrichment-of-Cultural-Heritage-content-in-PATHS
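Technique 2 is listed above as straightforward to apply to new languages. One plausible reason is that similarity between metadata records can be computed from surface tokens alone, without language-specific tools. The following is a minimal sketch under that assumption, using plain token-overlap (Jaccard) similarity. It is an illustration only, not the method PATHS actually used, and the function names `tokens` and `jaccard` are hypothetical.

```python
import re


def tokens(text):
    """Return the set of lowercase word tokens in a metadata string.

    In Python 3, \\w matches Unicode word characters, so accented
    letters are kept without any language-specific handling.
    """
    return set(re.findall(r"\w+", text.lower()))


def jaccard(a, b):
    """Token-overlap similarity between two metadata strings, in [0, 1]."""
    ta, tb = tokens(a), tokens(b)
    if not ta and not tb:
        return 0.0  # two empty descriptions: treat as not similar
    return len(ta & tb) / len(ta | tb)


# Two titles sharing the tokens "la" and "paloma" out of five distinct tokens.
print(jaccard("La paloma", "La paloma / O sole mio"))
```

Nothing here depends on the language of the metadata, which is the point of the sketch. Typed similarities (same creator, same place, and so on) would need field-aware comparison that this example does not attempt.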
10. Language challenges for Digital Libraries
Typical queries are very short
Average < 2 terms
Identification of query language is not easy, even manually
39% of queries may belong to several languages
Plenty of named entities
60% of queries are for persons & places
These problems are not limited to queries: the same issues apply to the descriptive metadata
Studies by Humboldt University on Europeana and The European Library
http://www.clef-initiative.eu/documents/71612/86374/CLEF2010wn-LogCLEF-StillerEt2010.pdf
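To make the ambiguity of short queries concrete, the sketch below checks a query against small per-language word lists and returns every language in which all of its terms occur. The word lists in `LEXICONS` and the function `candidate_languages` are hypothetical and chosen purely for illustration; they are not taken from the Humboldt University studies, and a real system would rely on far larger resources or a trained language identifier.

```python
# Hypothetical toy lexicons, for illustration only.
LEXICONS = {
    "en": {"paris", "london", "madonna", "map", "war"},
    "fr": {"paris", "londres", "madonna", "carte", "guerre"},
    "it": {"parigi", "londra", "madonna", "mappa", "guerra"},
}


def candidate_languages(query):
    """Return the set of languages whose lexicon contains every query term."""
    terms = query.lower().split()
    if not terms:
        return set()
    return {
        lang
        for lang, lexicon in LEXICONS.items()
        if all(term in lexicon for term in terms)
    }


# One-term queries made of named entities often match several languages,
# while a single common noun can be enough to disambiguate.
for q in ["Paris", "madonna", "guerre"]:
    print(q, sorted(candidate_languages(q)))
```

With queries averaging fewer than two terms and most of them naming persons or places, there is often no language-specific word in the query to break the tie.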
12. Very diverse domains, probably with few training corpora available
Tools, UCL Museums, CC-BY-NC-SA
Paris, nouvelle machine à paver : [photographie de presse] / [Agence Rol], National Library of France, Public Domain
St. Philip holding a book and St. James (the Less?) holding a book, National Library of the Netherlands, Public Domain
La paloma / O sole mio, Dalane Folkemuseum, CC0
13. Relevant language technology (LT) can come from everywhere in Europe, raising interoperability issues
14. Resource problem
Both for us and our partners - libraries, archives, museums
Not much money
Few technical experts
Emphasis on open source technology
We can provide interesting challenges for the industry in terms of (open) data availability, users and scenarios.
But we are not (yet) as large a market as others
Les Miserables: Victor Hugo’s handwritten manuscripts: http://www.europeana.eu/portal/record/9200103/5372912AF66AB529E188218BC1F747E75EB1A18F.html
BnF, public domain
Matisse ‘53 in the form of a double helix’ http://www.europeana.eu/portal/record/9200104/F8D60AB9136C8A59B59DF1CFEC278A6CABA8B0C6.html
The Wellcome Library (CC-BY-NC-ND)
‘söprűtánc’ – Hungarian traditional dance http://www.europeana.eu/portal/record/08901/E1A7B01BE4AED87FD239672F4F3941F52262D6B2.html
Hungarian Academy of Sciences Institute for Musicology, public domain
‘Neurologico reggae’ Music album http://www.europeana.eu/portal/record/08901/ADC241BCBF8470988DBA6EEAFCF13F14D88E5534.html
DISMARC – EuropeanaConnect Paid Access
‘Castle of Kavala’ 3D exploration of a Greek castle http://www.europeana.eu/portal/record/2020703/05607B24D15BD516EE2B765F74CDA39C7427F7FB.html
Cultural and Educational Technology Institute - Research Centre Athena, CARARE, CC-BY-NC-ND
All partners send us descriptions of their assets, which we aggregate in a single service
Germany 15.44%
France 10.97%
Netherlands 9.67%
Sweden 9.44%
Spain 9.98%
UK 6.98%
Norway 6.60%
Italy 5.4%
Ireland 4.04%
Poland 4.02%
Europe 3.95%
Finland 2.95%
Austria 2.05%
Belgium 1.61%
Hungary 1.26%
Users from everywhere
Data from everywhere
Tools from everywhere
http://europeana.eu/portal/record/2022347/B7C7D15C23C28EFD3FA25147ED3A580757CFBB04.html
http://europeana.eu/portal/record/9200103/ark__12148_btv1b6921004c.html
http://www.europeana.eu/portal/record/9200122/BibliographicResource_1000056116671.html
http://www.europeana.eu/portal/record/2022608/DF_DF_13399.html