Georg Rehm. Towards a Human Language Project for Multilingual Europe: AI and Interpretation. DG Interpretation Conference - Interpretation: Sharing Knowledge & Fostering Communities. European Commission, Brussels, April 19/20, 2018. Invited talk.
Towards a Human Language Project for Multilingual Europe: AI and Interpretation
1.
Georg Rehm
German Research Center for Artificial Intelligence (DFKI) GmbH
Language Technology Lab – Berlin, Germany
META-NET, General Secretary
georg.rehm@dfki.de
2.
Artificial Intelligence
SCIC Universities Conference (19/20 April 2018) 2/12
3.
4.
Artificial Intelligence
Data Intelligence
• Current breakthroughs based on Machine Learning (“Deep Learning”)
• Also still in use: symbolic, rule-based methods and systems
• Huge data sets + powerful algorithms + extremely fast hardware
• Enormous potential for disruption in all sectors and areas
5.
META-NET and Multilingual Europe
6.
• Multilingualism is at the heart of the European idea
• 24 EU languages – all have the same status
• Dozens of regional and minority languages as well as
languages of immigrants and trade partners
• Many economic and social challenges:
– The Digital Single Market needs to be multilingual
– Cross-border, cross-lingual, cross-cultural
communication
7.
META-NET
• 60 research centres in 34 countries (founded in 2010)
• Chair of Executive Board: Jan Hajič (CUNI)
• Deputies: J. van Genabith (DFKI), A. Vasiljevs (Tilde)
• General Secretary: Georg Rehm (DFKI)
META: Multilingual Europe Technology Alliance
• 826 members in 67 countries
(published in 2013) (31 volumes; published in 2012)
META-NET projects: T4ME (META-NET), CESAR, METANET4U, META-NORD
9.
MT
• excellent: (none)
• good: English
• moderate: French, Spanish
• fragmentary: Catalan, Dutch, German, Hungarian, Italian, Polish, Romanian
• weak or no support through LT: Basque, Bulgarian, Croatian, Czech, Danish, Estonian, Finnish, Galician, Greek, Icelandic, Irish, Latvian, Lithuanian, Maltese, Norwegian, Portuguese, Serbian, Slovak, Slovene, Swedish, Welsh
Speech
• excellent: (none)
• good: English
• moderate: Czech, Dutch, Finnish, French, German, Italian, Portuguese, Spanish
• fragmentary: Basque, Bulgarian, Catalan, Danish, Estonian, Galician, Greek, Hungarian, Irish, Norwegian, Polish, Serbian, Slovak, Slovene, Swedish
• weak or no support through LT: Croatian, Icelandic, Latvian, Lithuanian, Maltese, Romanian, Welsh
Text Analytics
• excellent: (none)
• good: English
• moderate: Dutch, French, German, Italian, Spanish
• fragmentary: Basque, Bulgarian, Catalan, Czech, Danish, Finnish, Galician, Greek, Hungarian, Norwegian, Polish, Portuguese, Romanian, Slovak, Slovene, Swedish
• weak or no support through LT: Croatian, Estonian, Icelandic, Irish, Latvian, Lithuanian, Maltese, Serbian, Welsh
Resources
• excellent: (none)
• good: English
• moderate: Czech, Dutch, French, German, Hungarian, Italian, Polish, Spanish, Swedish
• fragmentary: Basque, Bulgarian, Catalan, Croatian, Danish, Estonian, Finnish, Galician, Greek, Norwegian, Portuguese, Romanian, Serbian, Slovak, Slovene
• weak or no support through LT: Icelandic, Irish, Latvian, Lithuanian, Maltese, Welsh
10.
[Figure: level of support (excellent, good, moderate, fragmentary, weak/none) for 31 European languages, from English at the top to Welsh at the bottom]
Languages with names in red have little or no MT support.
Source: META-NET White Paper Series: Europe's Languages in the Digital Age. Springer, Heidelberg, New York, Dordrecht, London, September 2012. Georg Rehm and Hans Uszkoreit (series editors)
11.
Important: even current state-of-the-art technologies are far from perfect!
12.
Important: 20+ European languages are
severely under-supported and face the
danger of digital extinction.
13.
We carried out the study between 2010 and 2012. While support for many languages has improved in the meantime, the overall picture remains largely the same.
14.
AI and Interpretation
15.
Translation and Interpretation
• Since approx. 2015, breakthroughs in neural technologies have made Machine Translation steadily better.
• All areas of AI strive for “super-human performance”, but language is fundamentally different and much more complex.
• Neural AI approaches cannot understand language; they process it based on huge underlying data sets.
• In many use cases, mistakes can be tolerated.
• But: translation and interpretation are often mission-critical!
• Mistakes can have serious consequences (e.g., in politics or medicine).
16.
Speech Translation
• Example: Lecture Translator
– University lectures are automatically transcribed and translated, in near-real time, into several languages
– Students can follow the translation through a web interface
• Example: Presentation Translator
– Presenters can have their speech automatically translated
– Translations are displayed as subtitles
• Example: Call Translator
– An Internet telephony provider offers automatic voice translation
17.
Issues and Limitations
• The three example applications work surprisingly well for general-domain language and input. But:
– They are far from perfect.
– They are not robust.
– They cannot cope with unforeseen situations.
– They cannot understand language as humans do.
– They are not (yet?) suited for conference interpretation.
→ Their fields of application are limited.
• Interpretation is often mission-critical.
→ Human interpreters will not be replaced anytime soon.
18.
https://slator.com/features/ai-interpreter-fail-at-china-summit-sparks-debate-about-future-of-profession/
19.
Human Language Project
20.
LT – Current Developments
• LT in Europe: world-class research, a strong SME base and thousands of LSPs, but also immense fragmentation and a need for coordination.
• Need for high-quality LT: translation, interpretation, the Multilingual Digital Single Market (MDSM), etc.
• The European Language Challenge cannot be – it must not be – abandoned or outsourced!
→ Need for Language Technology made in Europe, for Europe!
→ STOA Workshop in the European Parliament (January 2017): “Language equality in the digital age – towards a Human Language Project”
[Image: cover of the STOA study; EPRS | European Parliamentary Research Service, Scientific Foresight Unit (STOA), Science and Technology Options Assessment, PE 581.621]
21.
Human Language Project
• Goal: Deep Natural Language Understanding by 2030
• Vision: EU FET Flagship Project (10+ years)
• Broad coverage, high quality, high precision
• Create approaches, algorithms, data sets, resources
• Across modalities: text, text types, speech, video etc.
[Diagram: Artificial Intelligence (including cognition, perception, vision, cross-modal, cross-platform, cross-culture etc.), Machine Learning, Language Technology, Linguistics]
22.
Summary & Conclusions
• AI is disrupting all industries – including translation and, increasingly, also interpretation.
→ But: perfect, robust, precise language technologies (incl. written/spoken MT and interpretation) are still far away.
• Linguists are increasingly needed – new profiles are emerging.
→ The machine will support human experts and help them become more efficient – it will not replace them.
• The Human Language Project is still a vision. Its goal: to achieve new breakthroughs in Language Technology.
23.
Recommendation
• SCIC Speech Repository
• 4,000 speeches (3,000 public + 1,000 private)
• Extremely interesting data set and language resource for
Language Technology researchers!
• Many R&D groups currently work with TED talk data sets
• Recommendation: establish bridges between SCIC and research groups working on spoken language translation
• Help build the next generation of AI tools for interpreters
• AI tools tailored to the needs, wishes, topics and domains of conference interpreters in the EC/EP
24.
Thank you!
Dr. Georg Rehm
DFKI Berlin
georg.rehm@dfki.de
http://de.linkedin.com/in/georgrehm
https://www.slideshare.net/georgrehm
Strategic Research and Innovation Agenda
Language Technologies for
Multilingual Europe
Towards a Human Language Project
SRIA Editorial Team
Version 1.0 – December 2017