Georg Rehm. AI for Translation Technologies and Multilingual Europe. DG TRAD Conference - Translation Services in the Digital World: A Sneak Peek into the (near) Future. Luxembourg. October 16/17, 2017.
AI for Translation Technologies and Multilingual Europe
1. Georg Rehm
georg.rehm@dfki.de
DFKI GmbH, Language Technology Lab – Berlin, Germany
META-NET, General Secretary
AI for Translation Technologies
and Multilingual Europe
2. Outline
• Artificial Intelligence
• Technology Support for Multilingual Europe
• European MT Research – Results from QT21
• Connecting Europe Facility – Automated Translation
• Towards the Human Language Project
• Conclusions
EP DG Trad Conference (16/17 Oct. 2017) – AI for Translation Technologies
Data Intelligence
Current breakthroughs based on Machine Learning (Deep Learning)
Also still in use: symbolic, rule-based methods and systems
Artificial Intelligence
• Huge data sets + powerful algorithms + extremely fast hardware
• Self-driving cars, robots, image recognition, machine translation
• Enormous potential for disruptions in all sectors and areas
10. • Multilingualism is at the very heart of the European idea
• 24 EU languages – all languages have the same status
• Dozens of regional and minority languages as well as
languages of immigrants and trade partners
• Economic challenges:
– If the DSM is not multilingual, there will be 20+ isolated markets
– Language barriers are market barriers
• Social and public challenges:
– Empower all citizens to use their mother tongues
– Enable cross-border, cross-lingual, cross-cultural communication
– Provide multilingual digital public services
– Restore trust in media (fake news debate, filter bubble issue etc.)
11. • 60 research centres in 34 countries (founded in 2010)
Chair of Executive Board: Jan Hajic (CUNI)
Dep.: J. van Genabith (DFKI), A. Vasiljevs (Tilde)
General Secretary: Georg Rehm (DFKI)
• Multilingual Europe Technology Alliance.
826 members in 67 countries
(published in 2013) (31 volumes; published in 2012)
T4ME (META-NET), CESAR, METANET4U, META-NORD
18. • Morphologically rich languages (De, Cz, Lv)
• Under-resourced languages (Lv, Ro)
• Quality Assessment: MQM – DQF
• Learning from human feedback (APE)
• Evaluation framework: WMT
– Event series to present and discuss results from MT evaluations
– Procedures: Automatic scoring (BLEU etc.) and human
judgements (large number of human annotators)
• Shared tasks (newspaper translation, quality estimation,
metrics and automatic post-editing)
QT21 is Improving Automatic Translation
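The automatic scoring mentioned above (BLEU etc.) can be illustrated with a minimal sketch. It assumes whitespace tokenisation, a single reference translation and no smoothing. The function names `ngrams` and `simple_bleu` are hypothetical and are not taken from the WMT or QT21 tooling, which scores whole test sets with standard evaluation scripts.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(candidate, reference, max_n=4):
    """Simplified sentence-level BLEU against a single reference.

    Geometric mean of clipped n-gram precisions (n = 1..max_n) times a
    brevity penalty. No smoothing: any zero precision gives a score of 0.
    """
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference.
        matched = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if total == 0 or matched == 0:
            return 0.0
        log_precisions.append(math.log(matched / total))
    # Penalise candidates that are shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(sum(log_precisions) / max_n)


if __name__ == "__main__":
    reference = "the cat sat on the mat"
    print(simple_bleu("the cat sat on the mat", reference))
    print(simple_bleu("the cat sat on a mat", reference))
```

A score like this only measures n-gram overlap with a reference, which is one reason the evaluation campaigns pair it with human judgements.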
19. Human Judgement Rankings
WMT Newspaper Translation Task
[Chart: first and first + second place rankings, QT21 vs. Best Online, 2015–2017; values as extracted: First: 64, 53, 1, 3; First + Second: 66, 65, 3, 6]
• 2015: En ↔ Cz, En ↔ De, En ↔ Fr
• 2016: En ↔ Cz, En ↔ De, En ↔ Ro
• 2017: En ↔ Cz, En ↔ De, En ↔ Lv
20. QT21 improvement in the last 12 months vs. online systems
[Bar chart: language directions En -> De, De -> En, En -> Cz, Cz -> En; series QT21-WMT-2016 (WMT 2016 System on WMT 2017 Data), Online WMT-2017, QT21-WMT-2017; y-axis 0 to 40]
21. • Data sets are the fuel for neural networks
• QT21’s neural technologies define the state of the art
• Ranked #1 in more than 80% of all tasks at WMT 2017
• Also predominantly ranked #1 at WMT 2016
• QT21 keeps commercial systems at a distance
• Huge improvements on morphologically rich languages
• MQM as a standard for quality evaluation
Selected Results
23. Connecting Europe
• EU flagship goal: Establishing the Digital Single Market
• Overcoming existing barriers
– by creating an environment for digital services to flourish
– by providing cross-border infrastructures and services.
• Sectorial CEF Digital Service Infrastructures (DSIs)
– ODR: Citizens need to solve disputes online across borders
– BRIS: Citizens and business partners need legal certainty when doing business cross-border
– eHealth: Citizens need to have online access to their patient summary when abroad
– EESSI: Citizens need to get to enjoy their social security seamlessly and online when abroad
– This also includes eProcurement, Open Data, e-Justice, Cyber Security, Safer Internet …
24. • Technological CEF building blocks can be used by the
different DSIs (e.g., eInvoicing, eSignature etc.)
• Most important in this context: CEF eTranslation
– Why? To help European and national public administrations
exchange information across language barriers
– How? By providing MT capabilities that will enable digital
services (in particular all DSIs) to be multilingual.
• CEF eTranslation builds on MT@EC
• Guarantees confidentiality and security of translated data
• → ELRC contract
Connecting Europe
Coordinator:
Josef van Genabith (DFKI)
25. European Language
Resource Coordination
• Collect: Language resources
• Identify: Needs of public services
• Engage: With the public sector in the identification of Language Resources
• Help: With any technical or legal issues
• Act: Observatory for language resources across Europe
26. What has been achieved?
[Bar chart: LR contributions by type (Bi-/Multilingual Corpora, Terminologies, Monolingual Corpora); y-axis 0 to 160]
Status: April 2017
• 225 language resources collected
• More than 2 billion words in all EU official
languages, Norwegian and Icelandic
• Over 450,000 terms
• More than 2 million translation units
• More than 91 resources to be used by you!
27. ELRC for you
• ELRC-SHARE Repository
– Access to, sharing and contribution of LRs
– Access to tools and services catalogue (forthcoming)
– http://www.lr-coordination.eu/resources
• ELRC Technical and Legal Helpdesk
– Support for potential data donors (phone, email)
– http://www.lr-coordination.eu/helpdesk
• ELRC On-site assistance
– http://www.lr-coordination.eu/services
29. • Multilingual Europe: our languages enjoy equal status yet digital
extinction of the majority of EU languages is a very severe danger.
• Language Technology Research and Innovation in Europe:
World class research results (e.g., in QT21), strong SME base,
thousands of LSPs; fragmentation; need for coordination.
• Big need for high-quality Language Technologies: translation,
personal assistants, multilingual DSM etc. (example: CEF).
• AI: Important breakthroughs and massive investments in R&D and
applications (mostly in US, Asia) – huge opportunity for Europe!
• The European Language Challenge cannot be abandoned or
outsourced!
➢ Need for Language Technology made in Europe for Europe!
Current Developments
30. Towards the
Human Language Project
31. • STOA Workshop in European Parliament (January 2017):
“Language equality in the digital age – towards a Human Language Project”
• Human Language Project vision suggested in several presentations
• STOA Study, published in March 2017, does recommend setting up the HLP
➢ http://www.stoa.europarl.europa.eu/stoa/cms/home/workshops/language
[Study cover: EPRS | European Parliamentary Research Service, Scientific Foresight Unit (STOA), PE 581.621, Science and Technology Options Assessment]
32. • Goal: Deep Natural Language Understanding by 2030
• AI for Next Generation Language Technology
• Large-scale EU funding programme for basic and
applied research as well as innovation (10-15 years)
• New breakthroughs for research, industry and society
to foster a multitude of innovations.
[Diagram labels: Artificial Intelligence (including cognition, perception, vision, cross-modal, cross-platform, cross-culture, IoT etc.); Machine Learning; Language Technology; Knowledge Technology]
Human Language Project
33. • All official European and many additional languages
• Broad coverage, HQ, high precision – across modalities,
across platforms, across cultures
• Collaboration between EU, EC, EP, Member States,
research, industry, other stakeholders.
• Basic and applied research, innovation, commercialisation
• Policy change towards “LT-enabled multilingualism”
• HQMT – overcome quality (and language) barriers, written
and spoken, collaborate with human translators
• Resources and technologies for all European languages
Human Language Project
34. http://www.cracker-project.eu • http://www.meta-net.eu
Version 1.0 of the SRIA
• Strategic Agenda: “Language Technologies for Multilingual
Europe – Towards a Human Language Project”
• Key recommendation: set up Human Language Project
• Also: establish Multilingual Digital Single Market
• Informed by “LT for Multilingual Europe” survey
• Takes into account: CEF AT, DSM, NGI
• To be presented at META-FORUM 2017 (13/14 Nov. 2017)
35. Summary & Conclusions
• AI is disrupting all industries – including translation.
• But: perfect machine translation is still far away.
• Not only are tools for gist translation getting better and
better, so are tools for human translators!
• Translators can expect to make use of a vastly improved
(adaptive) tool landscape in the next couple of years.
• We are collaborating with human translators to better
understand how translation processes work.
• The goal of the Human Language Project is to move
Europe into the pole position in this field.
37. Thank you!
Many thanks to Josef van Genabith, Christian Dugast,
Andrea Lösch (all DFKI) and to Maria Giagkou (ILSP).
Dr. Georg Rehm
DFKI Berlin
• georg.rehm@dfki.de
• http://de.linkedin.com/in/georgrehm
• https://www.slideshare.net/georgrehm
[Diagram: Human Language Project; Truly Multilingual Europe; European Economy (MDSM); Attractive jobs for high potentials; Education and young researchers; Massive boost for research; Foster innovation and new companies]
13/14 November 2017
Brussels, Belgium – http://www.meta-forum.eu
Register now! Participation is free of charge.