PangeaMT
Sharing Experiences on MT System,
Data management,
Hybridization
Alex Helle / Manuel Herranz
Intro
Brief history
Pangea system introduction / features for EXPERT
Hybridization experiences at Pangeanic (+future work)
Brief history
• “1-2 million words an hour”
• “quite adequate speed to cope with the whole output of the Soviet Union in a week… a few hours computer time a week”
• [full scale production] “if our experiments go well, within 5 years or so”

http://youtu.be/K-HfpsHPmvw
What is PangeaMT?
• The first commercial application of open-source Moses (AMTA 2010, http://euromatrixplus.net/moses)
• A development that overcomes Moses limitations for the localization industry, presented at the Association for Machine Translation in the Americas: “PangeaMT putting open standards to work... well” (AMTA 2010), http://bit.ly/uM8x6V
• 06/2011 PangeaMT launches the DIY solution to machine translate independently and flexibly like never before, http://bit.ly/kSd3wC
• 07/2011 MT experiences (Sony Europe), http://slidesha.re/oxZmBS
• 07/2011 A harness that eases re-training and updating → DIY SMT, as presented at TAUS Barcelona 2011, http://slidesha.re/nEe5mU
• 02/2012 API for hosted solutions
What is PangeaMT?

2007 and before
• Rule-based (RB) tests with commercial software
• Insufficiently good output
• Only internal production

2007/08
• V1: small data sets (2-5M words), automotive & electronics (ES), then FR/IT/DE in other fields
• EU Post-Editing Award

2009/10
• Division born
• Hundreds of engine trials and language combinations
• From open source to commercial

2011/12
• DIY SMT
• Automated re-training
• API v1
• Glossary
• Transfer of architecture and know-how to users
• Compatibility with commercial formats (ttx, sdlxliff, docx, odt)

• TMX / XLIFF workflows

• Powerful API v2 for live translation

• Confidence scores
• Compatibility with more commercial formats

2013
SMT at work
Unrest is continuing in Cairo as protesters set up their demand for Egypt’s military rulers to resign

+ specific language rules
+ job or client glossary
+ hybrid technologies
Data? Best clean, thank you
Cleaning
<tu srclang="en-GB">
<tuv xml:lang="EN-GB">
<seg>A system for recovering the methane that is emitted from the manure so that
it does not leak into the atmosphere.</seg>
</tuv>
<tuv xml:lang="FR-FR">
<seg>Système permettant de r€ pérer le méthane qui se dégage de l'engrais naturel
d'origine animale de sorte qu'il ne se dissipe pas dans l'atmosphère.</seg>
</tuv>

Cleaning

<tu creationdate="20090817T114430Z" creationid="APIACCESS"
changedate="20110617T141159Z" changeid=“pat">
<tuv xml:lang="EN-US">
<seg>Overall heigtht –<bpt i="1">{f43 </bpt> <ept i="1">}</ept>25&quot;; width –
<bpt i="2">{f43 </bpt> <ept i="2">}</ept>20.1&quot;.</seg>
</tuv>
<tuv xml:lang="ES-EM">
<seg><bpt i="1">{f2 </bpt>Altura total - 25&quot;; anchura <ept i="1">}</ept>–
<bpt i="2">{f43 </bpt> <ept i="2">}</ept><bpt i="3">{f2 </bpt>20,1&quot;.<ept
i="3">}</ept></seg>
</tuv>
</tu>
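One way to do the "in-line" cleaning on segments like the one above is to keep only the text outside TMX inline-code elements. This is a minimal Python sketch. It assumes that each `<seg>` is well-formed XML and that `<bpt>`, `<ept>`, `<ph>` and `<it>` hold only native formatting codes such as the RTF-style `{f43 ...}`. `seg_text` is a hypothetical helper and not part of PangeaMT.

```python
import re
import xml.etree.ElementTree as ET

# TMX inline elements whose content is native formatting code, not translatable text.
INLINE_CODE_TAGS = {"bpt", "ept", "ph", "it"}

def seg_text(seg_xml: str) -> str:
    """Return the translatable text of a TMX <seg>, dropping inline codes."""
    seg = ET.fromstring(seg_xml)
    parts = [seg.text or ""]
    for child in seg:
        if child.tag not in INLINE_CODE_TAGS:
            # Keep the text of other elements (e.g. <hi>) but not their tags.
            parts.append("".join(child.itertext()))
        # The tail is the text after the child's closing tag: always keep it.
        parts.append(child.tail or "")
    # Collapse the whitespace left behind by the removed codes.
    return re.sub(r"\s+", " ", "".join(parts)).strip()

en = ('<seg>Overall heigtht –<bpt i="1">{f43 </bpt> <ept i="1">}</ept>25&quot;; '
      'width –<bpt i="2">{f43 </bpt> <ept i="2">}</ept>20.1&quot;.</seg>')
print(seg_text(en))  # Overall heigtht – 25"; width – 20.1".
```

Applied to the Spanish side, the function returns `Altura total - 25"; anchura – 20,1".`. Whether the formatting codes should be restored after translation is a separate design decision.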

More cleaning

<tuv xml:lang=“EN-US">
<seg>On 22nd May we decided not to join the group.</seg>
<tuv xml:lang=“DE-DE">
<seg>Am 22. </seg>
Cleaning
<tu srclang="en-GB">
<tuv xml:lang="EN-GB">
<seg>The President of the United States visited Costa Rica.</seg>
</tuv>
<tuv xml:lang=“ES-ES">
<seg>El Presidente de los Estados Unidos, el señor Obama y su esposa la señora
Michelle, visitaron Costa Rica el pasado sábado.</seg>
</tuv>

Cleaning

<tuv xml:lang=“JP">
<seg>同書は「通訳・翻訳キャリアガイド」の2011-2012年度版。
英字新聞のジャパンタイムズ社が強みとするジャーナリスティックな視点で、通訳や翻訳という仕事が持つ魅
力ややりがい、プロに要求されるスキルおよび意識の持ち方などを紹介。また通訳者・翻訳者になるための道
すじから、実際の仕事の現場にいたるまで、今日の通訳・翻訳業界の実像を包括的に紹介。</seg>
<tuv xml:lang=“EN-US">
<seg>It is a journalistic point of view and strengths of the English-language newspaper Japan Times. It includes a description of the exciting and
rewarding work of translation and interpretation, as well as the introduction of
consciousness and how to acquire the required professional skills. The road to
becoming a translator and interpreter also down to the actual work site, a
comprehensive guide to interpreting the reality of today'stranslation industry.
</seg>

More cleaning
Parallel text extraction / Translation input / Post-edited material
This often comes from CAT tools, document alignments or crawling.

Cleaning

Data cleaning (inline tags)
Remove all non-translation data.

Data cleaning modules
Remove any “suspects”:
• Sentences that are too long
• Mismatches (of many kinds!)
• Terminological inaccuracies
• Non-useful segments, etc.

TMX human approval
Some of this material may actually be OK for training. Once approved, it is fed back into the training set.

Engine training with clean data
Approved, terminologically sound, clean data improves engine accuracy and performance, even with small data sets.
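The "suspect" checks above can be approximated with a few cheap heuristics. This is a minimal Python sketch. The thresholds (`MAX_TOKENS = 80`, `MAX_RATIO = 3.0`), the mojibake patterns and the name `suspect_reasons` are illustrative assumptions, not PangeaMT's actual settings. Flagged units would go to human approval rather than being discarded.

```python
import re
from typing import List

MAX_TOKENS = 80   # assumed cut-off for "sentences that are too long"
MAX_RATIO = 3.0   # assumed limit on the source/target character-length ratio

# Encoding-debris hints: U+FFFD, a letter glued to "€" (as in "r€ pérer"),
# or UTF-8 bytes decoded as Latin-1 ("Ã©" instead of "é").
ENCODING_DEBRIS = re.compile(r"\ufffd|[^\W\d_]€|Ã[\u0080-\u00bf]")
NUMBER = re.compile(r"\d+(?:[.,]\d+)?")

def extract_numbers(text: str) -> List[str]:
    # Normalise decimal commas so "20,1" and "20.1" compare equal.
    return sorted(n.replace(",", ".") for n in NUMBER.findall(text))

def suspect_reasons(src: str, tgt: str) -> List[str]:
    """Return why a translation unit looks suspect (empty list = keep it)."""
    src, tgt = src.strip(), tgt.strip()
    if not src or not tgt:
        return ["empty side"]
    reasons = []
    if max(len(src.split()), len(tgt.split())) > MAX_TOKENS:
        reasons.append("too long")
    # A truncated or padded side shows up as an extreme length ratio.
    if max(len(src), len(tgt)) / min(len(src), len(tgt)) > MAX_RATIO:
        reasons.append("length mismatch")
    if extract_numbers(src) != extract_numbers(tgt):
        reasons.append("number mismatch")
    if ENCODING_DEBRIS.search(src) or ENCODING_DEBRIS.search(tgt):
        reasons.append("encoding debris")
    if src == tgt:
        reasons.append("untranslated copy")
    return reasons

print(suspect_reasons("On 22nd May we decided not to join the group.", "Am 22."))
# ['length mismatch']
```

The checks only catch surface problems. Added content such as the "Obama" example, or terminological inaccuracies, still needs a glossary check or a human reviewer.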
System features – For EXPERT
• Cleaning
• Domain
• Engine creation
• Engine training: typically a 5-gram model, distortion limit (DL), table
• Specific language rules
• Job / client glossary
• Hybrid technologies
• Good BLEU tracking, ideal for experimentation
Different MT Systems for Different Lang Pairs?
Related languages → SMT, with accurate n-gram training and in-domain data (typically 5-gram, distortion limit, weights and fine-tuning)
Morphology-rich languages → Data is not enough and the range of cases is too large (Baltic languages like Latvian are extreme; Turkish is regular but has too many suffixes), so SMT cannot cope: rule-based or hybrid
Syntactically distant languages → Additional information is needed; this is where different HYBRID TECHNIQUES come into play. NO “ONE SIZE FITS ALL”
Hybridization Experiences at Pangeanic
Rationale
Hybridization is needed when the syntactic distance between languages is very large (unrelated languages): patterns are lost (or not found) → monotone translation
Linguistic information + Language knowledge + Data → Output translation
TWO OPTIONS
SYNTAX-BASED HYBRID SMT
Altaic languages → English
Arabic → European languages
Agglutinative → Non-agglutinative
RE-ORDERING
Toshiba / Mecab benchmarking, EN → JP
CHALLENGES
• SVO vs SOV
• Tokenization: no spaces between words → Mecab/KyTea for JP, Peterson Segmentor for ZH
• RBMT systems have traditionally worked with linguistic and morphological analyzers, so “units” were already segmented.
• SMT cannot do this, so we need to tokenize to leave a similar number of “words” on both sides → Giza++ can then relate words and groups.
• Re-ordering?
• Phrase-based or hierarchical (syntactical) models?
Example: “Continue to press the button to scroll through the components of the program until the display shows the desired current selection.”
Proper Japanese word order would be:
“the display the desired current selection shows until the components the program of through to scroll the button to press continue.”
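The tokenization challenge listed under CHALLENGES is handled by tools such as Mecab, KyTea or a Chinese segmenter before Giza++ alignment. Those tools rely on dictionaries and statistical models. As a deliberately simpler stand-in, this Python sketch uses greedy longest-match ("maximum matching") segmentation. `LEXICON` is a hypothetical toy that covers only one sentence from these slides, and `max_match` is an illustration, not how Mecab works internally.

```python
from typing import List, Set

def max_match(text: str, lexicon: Set[str]) -> List[str]:
    """Greedy longest-match segmentation; unknown characters become 1-char tokens."""
    longest = max(map(len, lexicon), default=1)
    tokens: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a lexicon hit.
        for size in range(min(longest, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                i += size
                break
    return tokens

# Hypothetical toy lexicon covering the example sentence from the Nipponization slide.
LEXICON = {"発売", "時", "には", "同社", "は", "次", "の", "バージョン",
           "を", "提供", "する", "予定", "です"}

sentence = "発売時には、同社は次のバージョンを提供する予定です。"
print(max_match(sentence, LEXICON))
```

Whitespace splitting returns the whole Japanese sentence as a single token, which leaves Giza++ nothing to align against the English words. After segmentation, both sides have comparable numbers of units.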
Syntax-based analysis & re-ordering rules
SYNTAX-BASED (TREE) FOR HYBRID SMT
Tree depth: 10
Calculation time +59%!
SYNTAX-BASED RULES FOR HYBRID SMT
Source: When available, the company plans to offer the following:
POS tags: (ADV) (ADJ) (Punct) (DET) (NNSing) (VBPt3) (to) (VBinf) (DET) (NN)
Chunks: (Cond clause), (Subject) (VBPt) (to) (Predicate)
Nipponization module output: available When , the company the following : plans to offer :
Translation & Cleaning: 発売 時 には、 同社は 次の バージョンを 提供する 予定 です 。
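The Nipponization step can be pictured as a few chunk-level re-ordering rules. This toy Python sketch assumes the input is already chunked and labelled, as in the (Cond clause), (Subject), (Predicate) line above. The labels `COND`/`SUBJ`/`VERB`/`OBJ`/`PUNCT` and the function `nipponize` are hypothetical. They reproduce only this one example and are not Pangeanic's actual rule set.

```python
from typing import List, Tuple

# A chunk is (label, words). Labels are hypothetical: COND, SUBJ, VERB, OBJ, PUNCT.
Chunk = Tuple[str, List[str]]

def nipponize(chunks: List[Chunk]) -> str:
    """Re-order English SVO chunks into a Japanese-like SOV order (toy rules)."""
    out: List[str] = []
    verb_group: List[str] = []
    for label, words in chunks:
        if label == "COND":
            # Head-final rule: the conjunction ("When") moves after its clause.
            out.extend(words[1:] + words[:1])
        elif label == "VERB":
            # SVO -> SOV: hold the verb group back so it closes the sentence.
            verb_group.extend(words)
        else:
            out.extend(words)
    return " ".join(out + verb_group)

EXAMPLE: List[Chunk] = [
    ("COND", ["When", "available"]),
    ("PUNCT", [","]),
    ("SUBJ", ["the", "company"]),
    ("VERB", ["plans", "to", "offer"]),
    ("OBJ", ["the", "following"]),
    ("PUNCT", [":"]),
]

print(nipponize(EXAMPLE))  # available When , the company the following : plans to offer
```

A tree-based version would apply such rules recursively over the full parse rather than over a flat chunk list.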
TOSHIBA vs MECAB
Toshiba’s The Honyaku is an established RB system (30+ years), but it lacks flexibility and its rules contradict each other.
Proposal: re-order the whole EN corpus for JP with Toshiba’s rules; however, this would mean depending on a proprietary system for all future inputs.
TOSHIBA vs MECAB – LESSONS LEARNT
Mecab re-ordering produced higher BLEU than Toshiba’s
5-fold structure
Paper published December 2011, AAMT: “Going Hybrid: Pangeanic’s and Toshiba’s First Steps Toward ENJP MT Hybridation”
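BLEU is the score behind the Toshiba vs Mecab comparison and the BLEU-tracking feature. It combines clipped n-gram precisions (n = 1..4) with a brevity penalty. This is a minimal corpus-level sketch in Python. It assumes pre-tokenized text, one reference per sentence and no smoothing. For reported numbers, a standard implementation such as sacreBLEU is safer, because tokenization choices alone can move the score.

```python
import math
from collections import Counter
from typing import List, Sequence

def ngrams(tokens: Sequence[str], n: int) -> Counter:
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def corpus_bleu(hyps: List[List[str]], refs: List[List[str]], max_n: int = 4) -> float:
    """Corpus BLEU (Papineni et al., 2002): one reference per hypothesis, no smoothing."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hyps, refs):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            h, r = ngrams(hyp, n), ngrams(ref, n)
            # Clip each hypothesis n-gram count by its count in the reference.
            matches[n - 1] += sum(min(c, r[g]) for g, c in h.items())
            totals[n - 1] += sum(h.values())
    if min(matches) == 0:
        return 0.0  # some n-gram order has no match: log(0) without smoothing
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: penalise output that is shorter than the references.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return bp * math.exp(log_precision)
```

Because the counts are pooled over the whole test set before the precisions are taken, two re-ordering pipelines should be scored on the same references and the same tokenization to be comparable.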
Future (current) work on hybrids
• Morphology-rich languages: RU in particular; improve DE
• Distant languages: re-ordering for AR?
• Agglutinative languages: TK – new paradigm
Questions?
m.herranz@pangeanic.com
#manuelhrrnz #pangeanic

pangeanic

ER1 Eduard Barbu - EXPERT Summer School - Malaga 2015RIILP
 
ESR1 Anna Zaretskaya - EXPERT Summer School - Malaga 2015
ESR1 Anna Zaretskaya - EXPERT Summer School - Malaga 2015ESR1 Anna Zaretskaya - EXPERT Summer School - Malaga 2015
ESR1 Anna Zaretskaya - EXPERT Summer School - Malaga 2015RIILP
 
ESR2 Santanu Pal - EXPERT Summer School - Malaga 2015
ESR2 Santanu Pal - EXPERT Summer School - Malaga 2015ESR2 Santanu Pal - EXPERT Summer School - Malaga 2015
ESR2 Santanu Pal - EXPERT Summer School - Malaga 2015RIILP
 
ESR3 Hernani Costa - EXPERT Summer School - Malaga 2015
ESR3 Hernani Costa - EXPERT Summer School - Malaga 2015ESR3 Hernani Costa - EXPERT Summer School - Malaga 2015
ESR3 Hernani Costa - EXPERT Summer School - Malaga 2015RIILP
 

Más de RIILP (20)

Gabriella Gonzalez - eTRAD
Gabriella Gonzalez - eTRAD Gabriella Gonzalez - eTRAD
Gabriella Gonzalez - eTRAD
 
Manuel Herranz - Pangeanic
Manuel Herranz - Pangeanic Manuel Herranz - Pangeanic
Manuel Herranz - Pangeanic
 
Carla Parra Escartin - ER2 Hermes Traducciones
Carla Parra Escartin - ER2 Hermes Traducciones Carla Parra Escartin - ER2 Hermes Traducciones
Carla Parra Escartin - ER2 Hermes Traducciones
 
Juanjo Arevelillo - Hermes Traducciones
Juanjo Arevelillo - Hermes Traducciones Juanjo Arevelillo - Hermes Traducciones
Juanjo Arevelillo - Hermes Traducciones
 
Gianluca Giulinin - FAO
Gianluca Giulinin - FAO Gianluca Giulinin - FAO
Gianluca Giulinin - FAO
 
Lianet Sepulveda & Alexander Raginsky - ER 3a & ER 3b Pangeanic
Lianet Sepulveda & Alexander Raginsky - ER 3a & ER 3b Pangeanic Lianet Sepulveda & Alexander Raginsky - ER 3a & ER 3b Pangeanic
Lianet Sepulveda & Alexander Raginsky - ER 3a & ER 3b Pangeanic
 
Tony O'Dowd - KantanMT
Tony O'Dowd -  KantanMT Tony O'Dowd -  KantanMT
Tony O'Dowd - KantanMT
 
Santanu Pal - ESR 2 USAAR
Santanu Pal - ESR 2 USAARSantanu Pal - ESR 2 USAAR
Santanu Pal - ESR 2 USAAR
 
Chris Hokamp - ESR 9 DCU
Chris Hokamp - ESR 9 DCU Chris Hokamp - ESR 9 DCU
Chris Hokamp - ESR 9 DCU
 
Anna Zaretskaya - ESR 1 UMA
Anna Zaretskaya - ESR 1 UMAAnna Zaretskaya - ESR 1 UMA
Anna Zaretskaya - ESR 1 UMA
 
Carolina Scarton - ESR 7 - USFD
Carolina Scarton - ESR 7 - USFD  Carolina Scarton - ESR 7 - USFD
Carolina Scarton - ESR 7 - USFD
 
Rohit Gupta - ESR 4 - UoW
Rohit Gupta - ESR 4 - UoW Rohit Gupta - ESR 4 - UoW
Rohit Gupta - ESR 4 - UoW
 
Hernani Costa - ESR 3 - UMA
Hernani Costa - ESR 3 - UMA Hernani Costa - ESR 3 - UMA
Hernani Costa - ESR 3 - UMA
 
Liangyou Li - ESR 8 - DCU
Liangyou Li - ESR 8 - DCU Liangyou Li - ESR 8 - DCU
Liangyou Li - ESR 8 - DCU
 
Liling Tan - ESR 5 USAAR
Liling Tan - ESR 5 USAARLiling Tan - ESR 5 USAAR
Liling Tan - ESR 5 USAAR
 
Sandra de luca - Acclaro
Sandra de luca - AcclaroSandra de luca - Acclaro
Sandra de luca - Acclaro
 
ER1 Eduard Barbu - EXPERT Summer School - Malaga 2015
ER1 Eduard Barbu - EXPERT Summer School - Malaga 2015ER1 Eduard Barbu - EXPERT Summer School - Malaga 2015
ER1 Eduard Barbu - EXPERT Summer School - Malaga 2015
 
ESR1 Anna Zaretskaya - EXPERT Summer School - Malaga 2015
ESR1 Anna Zaretskaya - EXPERT Summer School - Malaga 2015ESR1 Anna Zaretskaya - EXPERT Summer School - Malaga 2015
ESR1 Anna Zaretskaya - EXPERT Summer School - Malaga 2015
 
ESR2 Santanu Pal - EXPERT Summer School - Malaga 2015
ESR2 Santanu Pal - EXPERT Summer School - Malaga 2015ESR2 Santanu Pal - EXPERT Summer School - Malaga 2015
ESR2 Santanu Pal - EXPERT Summer School - Malaga 2015
 
ESR3 Hernani Costa - EXPERT Summer School - Malaga 2015
ESR3 Hernani Costa - EXPERT Summer School - Malaga 2015ESR3 Hernani Costa - EXPERT Summer School - Malaga 2015
ESR3 Hernani Costa - EXPERT Summer School - Malaga 2015
 

Último

8447779800, Low rate Call girls in Shivaji Enclave Delhi NCR
8447779800, Low rate Call girls in Shivaji Enclave Delhi NCR8447779800, Low rate Call girls in Shivaji Enclave Delhi NCR
8447779800, Low rate Call girls in Shivaji Enclave Delhi NCRashishs7044
 
(Best) ENJOY Call Girls in Faridabad Ex | 8377087607
(Best) ENJOY Call Girls in Faridabad Ex | 8377087607(Best) ENJOY Call Girls in Faridabad Ex | 8377087607
(Best) ENJOY Call Girls in Faridabad Ex | 8377087607dollysharma2066
 
Pitch Deck Teardown: Geodesic.Life's $500k Pre-seed deck
Pitch Deck Teardown: Geodesic.Life's $500k Pre-seed deckPitch Deck Teardown: Geodesic.Life's $500k Pre-seed deck
Pitch Deck Teardown: Geodesic.Life's $500k Pre-seed deckHajeJanKamps
 
8447779800, Low rate Call girls in Uttam Nagar Delhi NCR
8447779800, Low rate Call girls in Uttam Nagar Delhi NCR8447779800, Low rate Call girls in Uttam Nagar Delhi NCR
8447779800, Low rate Call girls in Uttam Nagar Delhi NCRashishs7044
 
Organizational Structure Running A Successful Business
Organizational Structure Running A Successful BusinessOrganizational Structure Running A Successful Business
Organizational Structure Running A Successful BusinessSeta Wicaksana
 
Memorándum de Entendimiento (MoU) entre Codelco y SQM
Memorándum de Entendimiento (MoU) entre Codelco y SQMMemorándum de Entendimiento (MoU) entre Codelco y SQM
Memorándum de Entendimiento (MoU) entre Codelco y SQMVoces Mineras
 
Innovation Conference 5th March 2024.pdf
Innovation Conference 5th March 2024.pdfInnovation Conference 5th March 2024.pdf
Innovation Conference 5th March 2024.pdfrichard876048
 
2024 Numerator Consumer Study of Cannabis Usage
2024 Numerator Consumer Study of Cannabis Usage2024 Numerator Consumer Study of Cannabis Usage
2024 Numerator Consumer Study of Cannabis UsageNeil Kimberley
 
Flow Your Strategy at Flight Levels Day 2024
Flow Your Strategy at Flight Levels Day 2024Flow Your Strategy at Flight Levels Day 2024
Flow Your Strategy at Flight Levels Day 2024Kirill Klimov
 
8447779800, Low rate Call girls in Tughlakabad Delhi NCR
8447779800, Low rate Call girls in Tughlakabad Delhi NCR8447779800, Low rate Call girls in Tughlakabad Delhi NCR
8447779800, Low rate Call girls in Tughlakabad Delhi NCRashishs7044
 
International Business Environments and Operations 16th Global Edition test b...
International Business Environments and Operations 16th Global Edition test b...International Business Environments and Operations 16th Global Edition test b...
International Business Environments and Operations 16th Global Edition test b...ssuserf63bd7
 
Independent Call Girls Andheri Nightlaila 9967584737
Independent Call Girls Andheri Nightlaila 9967584737Independent Call Girls Andheri Nightlaila 9967584737
Independent Call Girls Andheri Nightlaila 9967584737Riya Pathan
 
Intro to BCG's Carbon Emissions Benchmark_vF.pdf
Intro to BCG's Carbon Emissions Benchmark_vF.pdfIntro to BCG's Carbon Emissions Benchmark_vF.pdf
Intro to BCG's Carbon Emissions Benchmark_vF.pdfpollardmorgan
 
Future Of Sample Report 2024 | Redacted Version
Future Of Sample Report 2024 | Redacted VersionFuture Of Sample Report 2024 | Redacted Version
Future Of Sample Report 2024 | Redacted VersionMintel Group
 
PSCC - Capability Statement Presentation
PSCC - Capability Statement PresentationPSCC - Capability Statement Presentation
PSCC - Capability Statement PresentationAnamaria Contreras
 
Islamabad Escorts | Call 03070433345 | Escort Service in Islamabad
Islamabad Escorts | Call 03070433345 | Escort Service in IslamabadIslamabad Escorts | Call 03070433345 | Escort Service in Islamabad
Islamabad Escorts | Call 03070433345 | Escort Service in IslamabadAyesha Khan
 
APRIL2024_UKRAINE_xml_0000000000000 .pdf
APRIL2024_UKRAINE_xml_0000000000000 .pdfAPRIL2024_UKRAINE_xml_0000000000000 .pdf
APRIL2024_UKRAINE_xml_0000000000000 .pdfRbc Rbcua
 
Kenya Coconut Production Presentation by Dr. Lalith Perera
Kenya Coconut Production Presentation by Dr. Lalith PereraKenya Coconut Production Presentation by Dr. Lalith Perera
Kenya Coconut Production Presentation by Dr. Lalith Pereraictsugar
 
Ten Organizational Design Models to align structure and operations to busines...
Ten Organizational Design Models to align structure and operations to busines...Ten Organizational Design Models to align structure and operations to busines...
Ten Organizational Design Models to align structure and operations to busines...Seta Wicaksana
 

Último (20)

8447779800, Low rate Call girls in Shivaji Enclave Delhi NCR
8447779800, Low rate Call girls in Shivaji Enclave Delhi NCR8447779800, Low rate Call girls in Shivaji Enclave Delhi NCR
8447779800, Low rate Call girls in Shivaji Enclave Delhi NCR
 
(Best) ENJOY Call Girls in Faridabad Ex | 8377087607
(Best) ENJOY Call Girls in Faridabad Ex | 8377087607(Best) ENJOY Call Girls in Faridabad Ex | 8377087607
(Best) ENJOY Call Girls in Faridabad Ex | 8377087607
 
Pitch Deck Teardown: Geodesic.Life's $500k Pre-seed deck
Pitch Deck Teardown: Geodesic.Life's $500k Pre-seed deckPitch Deck Teardown: Geodesic.Life's $500k Pre-seed deck
Pitch Deck Teardown: Geodesic.Life's $500k Pre-seed deck
 
8447779800, Low rate Call girls in Uttam Nagar Delhi NCR
8447779800, Low rate Call girls in Uttam Nagar Delhi NCR8447779800, Low rate Call girls in Uttam Nagar Delhi NCR
8447779800, Low rate Call girls in Uttam Nagar Delhi NCR
 
Organizational Structure Running A Successful Business
Organizational Structure Running A Successful BusinessOrganizational Structure Running A Successful Business
Organizational Structure Running A Successful Business
 
Memorándum de Entendimiento (MoU) entre Codelco y SQM
Memorándum de Entendimiento (MoU) entre Codelco y SQMMemorándum de Entendimiento (MoU) entre Codelco y SQM
Memorándum de Entendimiento (MoU) entre Codelco y SQM
 
Innovation Conference 5th March 2024.pdf
Innovation Conference 5th March 2024.pdfInnovation Conference 5th March 2024.pdf
Innovation Conference 5th March 2024.pdf
 
2024 Numerator Consumer Study of Cannabis Usage
2024 Numerator Consumer Study of Cannabis Usage2024 Numerator Consumer Study of Cannabis Usage
2024 Numerator Consumer Study of Cannabis Usage
 
Flow Your Strategy at Flight Levels Day 2024
Flow Your Strategy at Flight Levels Day 2024Flow Your Strategy at Flight Levels Day 2024
Flow Your Strategy at Flight Levels Day 2024
 
8447779800, Low rate Call girls in Tughlakabad Delhi NCR
8447779800, Low rate Call girls in Tughlakabad Delhi NCR8447779800, Low rate Call girls in Tughlakabad Delhi NCR
8447779800, Low rate Call girls in Tughlakabad Delhi NCR
 
International Business Environments and Operations 16th Global Edition test b...
International Business Environments and Operations 16th Global Edition test b...International Business Environments and Operations 16th Global Edition test b...
International Business Environments and Operations 16th Global Edition test b...
 
Independent Call Girls Andheri Nightlaila 9967584737
Independent Call Girls Andheri Nightlaila 9967584737Independent Call Girls Andheri Nightlaila 9967584737
Independent Call Girls Andheri Nightlaila 9967584737
 
Intro to BCG's Carbon Emissions Benchmark_vF.pdf
Intro to BCG's Carbon Emissions Benchmark_vF.pdfIntro to BCG's Carbon Emissions Benchmark_vF.pdf
Intro to BCG's Carbon Emissions Benchmark_vF.pdf
 
Enjoy ➥8448380779▻ Call Girls In Sector 18 Noida Escorts Delhi NCR
Enjoy ➥8448380779▻ Call Girls In Sector 18 Noida Escorts Delhi NCREnjoy ➥8448380779▻ Call Girls In Sector 18 Noida Escorts Delhi NCR
Enjoy ➥8448380779▻ Call Girls In Sector 18 Noida Escorts Delhi NCR
 
Future Of Sample Report 2024 | Redacted Version
Future Of Sample Report 2024 | Redacted VersionFuture Of Sample Report 2024 | Redacted Version
Future Of Sample Report 2024 | Redacted Version
 
PSCC - Capability Statement Presentation
PSCC - Capability Statement PresentationPSCC - Capability Statement Presentation
PSCC - Capability Statement Presentation
 
Islamabad Escorts | Call 03070433345 | Escort Service in Islamabad
Islamabad Escorts | Call 03070433345 | Escort Service in IslamabadIslamabad Escorts | Call 03070433345 | Escort Service in Islamabad
Islamabad Escorts | Call 03070433345 | Escort Service in Islamabad
 
APRIL2024_UKRAINE_xml_0000000000000 .pdf
APRIL2024_UKRAINE_xml_0000000000000 .pdfAPRIL2024_UKRAINE_xml_0000000000000 .pdf
APRIL2024_UKRAINE_xml_0000000000000 .pdf
 
Kenya Coconut Production Presentation by Dr. Lalith Perera
Kenya Coconut Production Presentation by Dr. Lalith PereraKenya Coconut Production Presentation by Dr. Lalith Perera
Kenya Coconut Production Presentation by Dr. Lalith Perera
 
Ten Organizational Design Models to align structure and operations to busines...
Ten Organizational Design Models to align structure and operations to busines...Ten Organizational Design Models to align structure and operations to busines...
Ten Organizational Design Models to align structure and operations to busines...
 

9. Manuel Herranz (Pangeanic): Hybrid Solutions for Translation

  • 1. PangeaMT: Sharing Experiences on MT System, Data Management, Hybridation. Alex Helle / Manuel Herranz
  • 2. Intro • Brief history • Pangea system introduction / features for EXPERT • Hybridation experiences at Pangeanic (+ future work)
  • 3. Intro Brief history • “1-2 million words an hour” • “quite adequate speed to cope with the whole output of the Soviet Union in a week… a few hours computer time a week” • [full scale production] “if our experiments go well, within 5 years or so” http://youtu.be/K-HfpsHPmvw
  • 4. What is PangeaMT? • The first commercial application of open-source Moses (AMTA 2010, http://euromatrixplus.net/moses) • A development that overcomes Moses’ limitations for the localization industry, presented at the Association for Machine Translation in the Americas: “PangeaMT putting open standards to work... well”, AMTA 2010 http://bit.ly/uM8x6V • 06/2011: PangeaMT launches the DIY solution to machine-translate independently and flexibly like never before http://bit.ly/kSd3wC • 07/2011: MT experiences: Sony Europe http://slidesha.re/oxZmBS • 07/2011: a harness that eases re-training and updating (DIY SMT), as presented at TAUS Barcelona 2011 http://slidesha.re/nEe5mU • 02/2012: API for hosted solutions
  • 5. What is PangeaMT? 2007 and before: • Rule-based (RB) tests with commercial software • Insufficiently good output • Only internal production. 2007/08: • V1: small data sets (2-5M words), automotive & electronics • (ES), then FR/IT/DE in other fields • EU Post-Editing Award. 2009/10: • Division born • Hundreds of engine trials and language combinations • From open source to commercial. 2011/12: • DIY SMT • Automated retraining • API v1 • Glossary • Automated re-training • Transfer of architecture and know-how to users • Compatibility with commercial formats (ttx, sdlxliff, docx, odt) • TMX / XLIFF workflows • Powerful API v2 for live translation • Confidence scores • Compatibility with more commercial formats. 2013
  • 6. SMT at work: “Unrest is continuing in Cairo as protesters set up their demand for Egypt’s military rulers to resign” + specific language rules + job or client glossary + hybrid technologies
  • 7. Data? best clean, thank you Cleaning <tu srclang="en-GB"> <tuv xml:lang="EN-GB"> <seg>A system for recovering the methane that is emitted from the manure so that it does not leak into the atmosphere.</seg> </tuv> <tuv xml:lang="FR-FR"> <seg>Système permettant de r€ pérer le méthane qui se dégage de l'engrais naturel d'origine animale de sorte qu'il ne se dissipe pas dans l'atmosphère.</seg> </tuv> Cleaning <tu creationdate="20090817T114430Z" creationid="APIACCESS" changedate="20110617T141159Z" changeid=“pat"> <tuv xml:lang="EN-US"> <seg>Overall heigtht –<bpt i="1">{f43 </bpt> <ept i="1">}</ept>25&quot;; width – <bpt i="2">{f43 </bpt> <ept i="2">}</ept>20.1&quot;.</seg> </tuv> <tuv xml:lang="ES-EM"> <seg><bpt i="1">{f2 </bpt>Altura total - 25&quot;; anchura <ept i="1">}</ept>– <bpt i="2">{f43 </bpt> <ept i="2">}</ept><bpt i="3">{f2 </bpt>20,1&quot;.<ept i="3">}</ept></seg> </tuv> </tu> More cleaning <tuv xml:lang=“EN-US"> <seg>On 22nd May we decided not to join the group.</seg> <tuv xml:lang=“DE-DE"> <seg>Am 22. </seg>
  • 8. Data? best clean, thank you Cleaning <tu srclang="en-GB"> <tuv xml:lang="EN-GB"> <seg>The President of the United States visited Costa Rica.</seg> </tuv> <tuv xml:lang=“ES-ES"> <seg>El Presidente de los Estados Unidos, el señor Obama y su esposa la señora Michelle, visitaron Costa Rica el pasado sábado.</seg> </tuv> Cleaning <tuv xml:lang=“JP"> <seg>同書は「通訳・翻訳キャリアガイド」の2011-2012年度版。 英字新聞のジャパンタイムズ社が強みとするジャーナリスティックな視点で、通訳や翻訳という仕事が持つ魅力ややりがい、プロに要求されるスキルおよび意識の持ち方などを紹介。また通訳者・翻訳者になるための道すじから、実際の仕事の現場にいたるまで、今日の通訳・翻訳業界の実像を包括的に紹介。</seg> <tuv xml:lang=“EN-US"> <seg>It is a journalistic point of view and strengths of the English-language newspaper Japan Times. It includes a description of the exciting and rewarding work of translation and interpretation, as well as the introduction of consciousness and how to acquire the required professional skills. The road to becoming a translator and interpreter also down to the actual work site, a comprehensive guide to interpreting the reality of today's translation industry. </seg> More cleaning
  • 9. Data? best clean, thank you. Cleaning: parallel text extraction / translation input / post-edited material (this often comes from CAT tools, document alignment or crawling). Data cleaning (in-lines): remove all non-translation data. Data cleaning modules: remove any “suspects” • sentences that are too long • mismatches (of many kinds!) • terminological inaccuracies • non-useful segments, etc. TMX. Human approval: some of this material may actually be OK for training; it is then input into the training set. Engine training with clean data: approved, terminologically sound, clean data improves engine accuracy and performance even with small data sets.
  • 10. System features – For EXPERT: Cleaning
  • 11. System features – For EXPERT: Domain
  • 12. System features – For EXPERT: Engine Creation
  • 13. System features – For EXPERT: Engine Training
  • 14. System features – For EXPERT: typically a 5-gram model, distortion limit (DL) and phrase table. Example input: “Unrest is continuing in Cairo as protesters set up their demand for Egypt’s military rulers to resign” • specific language rules • job / client glossary • hybrid technologies • good BLEU tracking, ideal for experimentation
  • 15. Different MT Systems for Different Language Pairs? Related languages → SMT, with accurate n-gram training and in-domain data (typically 5-gram, distortion limit, weights and fine-tuning). Morphology-rich languages → data is not enough and the number of cases is too large (Baltic languages like Latvian are extreme; Turkish is regular but has too many suffixes), so SMT cannot cope: rule-based or hybrid. Syntactically distant languages → need additional information; this is where different HYBRID TECHNIQUES come into play. NO “ONE SIZE FITS ALL”
  • 16. Hybridation Experiences at Pangeanic. Rationale: when the syntactic distance between languages is very large (unrelated languages), patterns are lost (or not found) → monotone translation. [Diagram: linguistic information + language knowledge + data → output translation]
  • 17. Hybridation Experiences at Pangeanic. TWO OPTIONS: SYNTAX-BASED HYBRID SMT. Altaic languages ↔ English; Arabic ↔ European languages; agglutinative ↔ non-agglutinative. [Diagram: linguistic information + language knowledge + data → RE-ORDERING → output translation] Toshiba / Mecab benchmarking, EN ↔ JP
  • 18. Hybridation Experiences at Pangeanic. TWO METHODS. CHALLENGES: • SVO vs SOV • Tokenization: no spaces between words (Mecab/KyTea for JP, Peterson Segmentor for ZH) • RBMT systems have traditionally worked with linguistic and morphological analyzers, so “units” were already segmented • SMT cannot do this, so we need to tokenize to leave a similar number of “words” on both sides • Giza++ can then relate words and groups.
  • 19. Hybridation Experiences at Pangeanic. TWO OPTIONS. CHALLENGES: • SVO vs SOV
  • 20. Hybridation Experiences at Pangeanic. TWO METHODS. CHALLENGES: • SVO vs SOV • Re-ordering? • Phrase-based or hierarchical (syntactic) models? Example: “Continue to press the button to scroll through the components of the program until the display shows the desired current selection.” Proper Japanese word order would be: “the display the desired current selection shows until the components the program of through to scroll the button to press continue.”
  • 21. Hybridation Experiences at Pangeanic. Syntax-based analysis & re-ordering rules: SYNTAX-BASED (TREE) FOR HYBRID SMT. Tree depth: 10; calculation time +59%!
  • 22. Hybridation Experiences at Pangeanic. Syntax-based analysis & re-ordering rules: SYNTAX-BASED RULES FOR HYBRID SMT. Japanese: 発売 時 には、 同社は 次の バージョンを 提供する 予定 です 。 (literally “At release time, the company plans to offer the next version.”) Translation & Cleaning available. Nipponization module: source “When available, the company plans to offer the following:” tagged (ADV) (ADJ) (Punct) (DET) (NNSing) (VBPt3) (to) (VBinf) (DET) (NN) and chunked as (Cond clause), (Subject) (VBPt) (to) (Predicate); re-ordered: “When , the company the following : plans to offer :”
  • 23. Hybridation Experiences at Pangeanic. TWO OPTIONS: TOSHIBA vs MECAB. Toshiba’s The Honyaku is an established rule-based system (30+ years). • Lacks flexibility; rules contradict each other • Proposal: re-order the whole EN corpus for JP with Toshiba’s rules, but this meant dependency on a proprietary system for future inputs.
  • 24. Hybridation Experiences at Pangeanic. TWO OPTIONS: TOSHIBA vs MECAB – LESSONS LEARNT. Mecab re-ordering produced higher BLEU than Toshiba’s 5-fold structure
  • 25. Hybridation Experiences at Pangeanic. TWO OPTIONS: TOSHIBA vs MECAB – LESSONS LEARNT. Mecab re-ordering produced higher BLEU than Toshiba’s. Paper published at AAMT, December 2011: “Going Hybrid: Pangeanic’s and Toshiba’s First Steps Toward ENJP MT Hybridation”
  • 27. Future (current) Work on Hybrids • Morphology-rich languages: RU in particular; improve DE • Distant languages: re-ordering for AR? • Agglutinative languages: TK, new paradigm
  • 28. Brief history • Intro • Pangea system introduction / features for EXPERT • Hybridation experiences at Pangeanic (+ future work)
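
Slide 7's TMX samples show typical debris: RTF formatting codes wrapped in `<bpt>`/`<ept>`, a non-standard locale code (`ES-EM`) and mis-encoded characters (`r€ pérer`). The slide's curly quotes inside attribute values would also make a strict XML parser reject the unit outright. The Python sketch below shows one way to screen a single well-formed `<tu>` for these problems with the standard library's `xml.etree.ElementTree`. It is not PangeaMT's cleaning module. The `VALID_LANGS` allow-list and the `DEBRIS` heuristic are assumptions made for illustration. A real TMX file wraps its units in `<tmx><body>`, so a full run would loop over `root.iter("tu")`.

```python
import re
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Hypothetical allow-list of locale codes accepted by the training pipeline.
VALID_LANGS = {"en-gb", "en-us", "fr-fr", "es-es", "de-de", "ja-jp"}

# Heuristic for mis-encoded text such as "r€ pérer": a euro sign glued to a
# letter, or the Unicode replacement character. Not an exhaustive detector.
DEBRIS = re.compile(r"\ufffd|[^\W\d_]€")


def seg_text(seg):
    """Plain text of a <seg>. Inline codes such as <bpt>/<ept> are dropped
    (their text is native formatting, e.g. "{f43 "), but the translatable
    text that follows each of them (the element's tail) is kept."""
    parts = [seg.text or ""]
    for child in seg:
        parts.append(child.tail or "")
    return re.sub(r"\s+", " ", "".join(parts)).strip()


def check_tu(tu):
    """Return the reasons why a <tu> looks unfit for training (empty = OK)."""
    problems = []
    for tuv in tu.findall("tuv"):
        lang = (tuv.get(XML_LANG) or "").lower()
        if lang not in VALID_LANGS:
            problems.append(f"unknown language code {lang!r}")
        seg = tuv.find("seg")
        text = seg_text(seg) if seg is not None else ""
        if not text:
            problems.append(f"empty segment for {lang!r}")
        elif DEBRIS.search(text):
            problems.append(f"encoding debris in {lang!r}")
    return problems


sample = (
    '<tu><tuv xml:lang="EN-US"><seg>Overall height <bpt i="1">{f43 </bpt>'
    '25&quot;<ept i="1">}</ept></seg></tuv>'
    '<tuv xml:lang="ES-EM"><seg>Altura total <bpt i="1">{f2 </bpt>'
    '25&quot;<ept i="1">}</ept></seg></tuv></tu>'
)
print(check_tu(ET.fromstring(sample)))  # → ["unknown language code 'es-em'"]
```

The function reports every problem it finds rather than stopping at the first one. This lets a reviewer see at a glance whether a unit can be repaired, for example by fixing a locale code, or should be discarded.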
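
The pairs on slide 8 are misaligned rather than mistranslated. The Spanish adds facts that are absent from the English, and a whole Japanese paragraph is paired with a loose English rendering. A cheap first filter, in the spirit of length-based sentence alignment, flags pairs whose character-length ratio falls outside a band. This Python sketch assumes that length alone is a useful signal. The `low`/`high` thresholds are illustrative values, not figures from the presentation.

```python
def length_ratio(src, tgt):
    """Target/source length ratio in characters (not words), so that it also
    applies to unsegmented scripts such as Japanese."""
    return len(tgt) / max(len(src), 1)


def is_misaligned(src, tgt, low=0.5, high=2.0):
    """Flag a pair whose lengths differ too much to be a plausible translation.
    low/high are illustrative defaults; in practice they would be tuned per
    language pair (Japanese, for instance, is usually much shorter in
    characters than the English it translates)."""
    ratio = length_ratio(src, tgt)
    return ratio < low or ratio > high


en = "The President of the United States visited Costa Rica."
es = ("El Presidente de los Estados Unidos, el señor Obama y su esposa "
      "la señora Michelle, visitaron Costa Rica el pasado sábado.")
print(is_misaligned(en, es))                            # → True (ratio ≈ 2.26)
print(is_misaligned("Overall height", "Altura total"))  # → False
```

A ratio test only catches gross mismatches. Flagged pairs are candidates for human review, not proof of a bad alignment.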
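
Slide 9 lists the "suspects" to remove and routes doubtful material to human approval before it enters the training set. Below is a minimal Python sketch of that routing. It assumes whitespace tokenization, a hypothetical `MAX_TOKENS` cut-off and a client glossary given as a plain dict. The terminology check is naive case-insensitive substring matching, so it would miss inflected forms.

```python
MAX_TOKENS = 80  # hypothetical cut-off for "sentences that are too long"


def term_errors(src, tgt, glossary):
    """Glossary source terms found in src whose approved target term is
    absent from tgt (naive case-insensitive substring matching)."""
    s, t = src.lower(), tgt.lower()
    return [term for term, approved in glossary.items()
            if term.lower() in s and approved.lower() not in t]


def suspects(src, tgt, glossary):
    """Reasons to hold a segment pair back from training (empty list = keep)."""
    reasons = []
    # Whitespace tokens: a rough length measure that only suits spaced languages.
    if max(len(src.split()), len(tgt.split())) > MAX_TOKENS:
        reasons.append("too long")
    if not src.strip() or not tgt.strip() or src.strip() == tgt.strip():
        reasons.append("non-useful segment")
    missing = term_errors(src, tgt, glossary)
    if missing:
        reasons.append("terminology: " + ", ".join(missing))
    return reasons


def split_corpus(pairs, glossary):
    """Send clean pairs to training and suspects to a human-approval queue."""
    train, review = [], []
    for src, tgt in pairs:
        reasons = suspects(src, tgt, glossary)
        if reasons:
            review.append((src, tgt, reasons))
        else:
            train.append((src, tgt))
    return train, review


glossary = {"manure": "fumier"}  # hypothetical client glossary entry
pairs = [
    ("Methane is emitted from the manure.", "Le méthane se dégage de l'engrais."),
    ("Overall height: 25 inches.", "Hauteur totale : 25 pouces."),
    ("25", "25"),
]
train, review = split_corpus(pairs, glossary)
```

Here only the middle pair goes straight to training. The first is held for a terminology check and the third as a non-useful copy. A reviewer may still approve either of them, which matches the slide's note that some suspect material is actually fine for training.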
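
Slide 14 highlights BLEU tracking for experimentation, and the Toshiba/Mecab comparison on slides 24 and 25 is reported in BLEU. To make the metric concrete, here is a small corpus-level BLEU in Python. It uses clipped n-gram precision up to 4-grams, a geometric mean and a brevity penalty. It assumes pre-tokenized input, a single reference per sentence and no smoothing. For reported scores, a standard scorer such as sacreBLEU is preferable, because tokenization differences alone change the number.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Multiset of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses, references, max_n=4):
    """Corpus BLEU with one reference per hypothesis, uniform weights and
    no smoothing. Inputs are lists of token lists."""
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        hyp_len += len(hyp)
        ref_len += len(ref)
        for n in range(1, max_n + 1):
            ref_counts = ngrams(ref, n)
            # Clipping: an n-gram earns credit at most as often as the reference has it.
            matches[n - 1] += sum(min(c, ref_counts[g]) for g, c in ngrams(hyp, n).items())
            totals[n - 1] += max(len(hyp) - n + 1, 0)
    if hyp_len == 0 or min(matches) == 0:
        return 0.0  # some n-gram order has no match at all, so BLEU is zero
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty: punish output that is shorter than the reference.
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / hyp_len)
    return bp * math.exp(log_precision)


hyp = "the company plans to offer".split()
ref = "the company plans to offer the following".split()
print(corpus_bleu([hyp], [ref]))  # about 0.67: perfect precision, penalty exp(1 - 7/5)
```

Counts are pooled over the whole corpus before the precisions are computed. This is why corpus BLEU is not the average of sentence-level BLEU scores.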
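
Slide 15 mentions the distortion limit. In phrase-based SMT, each step in the translation order pays a source-side distortion distance |start_i - end_(i-1) - 1|, and the limit caps any single jump. The Python sketch below computes those distances for a hypothetical ten-word SVO sentence: subject = words 0-1, verb = word 2, object = words 3-9. Moving the verb to the end, as SOV order requires, needs a jump of 8, which exceeds a limit of 6 (an illustrative value). This is one way to see why, as slide 16 says, patterns are lost for distant pairs and output stays close to monotone.

```python
def jumps(spans):
    """Distortion distances for source spans (start, end), 0-based and
    inclusive, listed in the order the decoder translates them:
    |start_i - end_(i-1) - 1|, with end_0 = -1, so monotone order costs 0."""
    prev_end = -1
    distances = []
    for start, end in spans:
        distances.append(abs(start - prev_end - 1))
        prev_end = end
    return distances


def within_limit(spans, limit):
    """True if no single jump exceeds the distortion limit."""
    return all(d <= limit for d in jumps(spans))


subject, verb, obj = (0, 1), (2, 2), (3, 9)  # hypothetical SVO sentence
print(jumps([subject, verb, obj]))                  # → [0, 0, 0]  (English order)
print(jumps([subject, obj, verb]))                  # → [0, 1, 8]  (Japanese-like SOV order)
print(within_limit([subject, obj, verb], limit=6))  # → False
```

Raising the limit lets the decoder reach the SOV order, but it also enlarges the search space. Re-ordering the source before decoding avoids the long jump altogether.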
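
Slide 18 notes that Japanese and Chinese are written without spaces, so text must be segmented before Giza++ can align it. MeCab and KyTea do this with large dictionaries and statistical models. Greedy forward maximum matching against a word list is a much simpler stand-in, not what those tools do, but it shows the basic idea. The lexicon below is a hypothetical toy built for the slide 22 example sentence.

```python
def max_match(text, dictionary):
    """Greedy forward maximum matching: at each position take the longest
    dictionary word; fall back to a single character if nothing matches."""
    longest = max((len(w) for w in dictionary), default=1)
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(longest, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


# Toy lexicon (hypothetical); real segmenters rely on large dictionaries.
lexicon = {"同社", "は", "次", "の", "バージョン", "を", "提供", "する", "予定", "です"}
print(" ".join(max_match("同社は次のバージョンを提供する予定です", lexicon)))
# → 同社 は 次 の バージョン を 提供 する 予定 です
```

Greedy matching fails on ambiguous strings where the longest match is wrong. This is where a statistical segmenter such as MeCab earns its place.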
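
Slides 20 to 22 describe syntax-based re-ordering of English into Japanese (SOV) order before SMT. As an illustration only, the sketch below implements a stripped-down form of "head finalization" (Isozaki et al., 2010), a published pre-ordering method that moves every head after its dependents. It is not Pangeanic's rule set. The hand-built parse tree stands in for a real parser's output and covers a simplified version of the slide 22 sentence. The resulting order comes close to the Japanese 同社は 次の バージョンを 提供する 予定 です.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass
class Node:
    label: str
    children: List[Union[str, "Node"]]  # a str child is a word (leaf)
    head: int  # index of the head child


def head_final(tree):
    """Return the words of `tree` with every head moved after its dependents
    (English is largely head-initial, Japanese head-final)."""
    if isinstance(tree, str):
        return [tree]
    order = [c for i, c in enumerate(tree.children) if i != tree.head]
    order.append(tree.children[tree.head])
    words = []
    for child in order:
        words.extend(head_final(child))
    return words


# Hypothetical parse of "the company plans to offer the following version".
obj = Node("NP", ["the", "following", "version"], head=2)
offer = Node("VP", ["offer", obj], head=0)
to_offer = Node("VP", ["to", offer], head=0)
pred = Node("VP", ["plans", to_offer], head=0)
sentence = Node("S", [Node("NP", ["the", "company"], head=1), pred], head=1)
print(" ".join(head_final(sentence)))
# → the company the following version offer to plans
```

The published method also handles coordination and adds pseudo-particles, and this sketch does neither. Its deep recursion over full parse trees hints at the extra processing time that slide 21 reports.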