Olga Beregovaya, CEO Americas, PROMT
PROMT's approach to engine hybridization differs from many other companies’ technology in that it applies statistical methods at every stage of the translation process: pre-editing, transfer and post-editing. The hybrid engine makes syntactic, lexical and grammatical choices at an “atomic” level, rather than processing complete translated sentences. Pilot case examples will be used to demonstrate the robustness of these advances.
TAUS USER CONFERENCE 2010, The Deep Hybrid machine translation engine
1. TAUS USER CONFERENCE 2010
LANGUAGE BUSINESS INNOVATION
4 – 6 OCTOBER / PORTLAND (OR), USA
TUESDAY 5 OCTOBER / 11.15
THE DEEP HYBRID MACHINE TRANSLATION ENGINE
Olga Beregovaya, PROMT
2. Company Profile
• Experienced. Founded in 1991
• International. Offices in US, Germany, Russia
• Innovative. 150 employees, 80 of them are in R&D
• Widely used. Over 120 million hits per month on our online translation sites
3. Enterprise MT User Challenges
Market need: translated content built of fluent and relevant sentences that
preserve metadata information, branding, tone of voice and terminology.
Source: This error occurs in SQL Partner products when code in a trigger cancels
the operation using the SQL RAISE function, or if the SQLConnection.cancel
or SQLStatement.cancel methods are called when a statement is executed
using SQLStatement.execute or SQLStatement.next.
Not-so-good Target: Este error se produce en SQL productos de socios cuando
el código en un desencadenador se cancela la operación utilizando la
función de SQL levantar, o si el o los métodos SQLConnection.cancel
SQLStatement.cancel se llama cuando una declaración se ejecuta utilizando
SQLStatement.execute o SQLStatement.next.
Known challenges:
RBMT limitations: fluency, terminology, engine customization effort
SMT limitations: sentence structure, duplicates and omissions, over-
normalization both at training and at run-time
4. PROMT DeepHybrid Engine – Taking on the Challenge
PROMT DeepHybrid – both approaches work side by side providing the best choice
possible during each step of the translation process
• Fluency: PROMT DeepHybrid approach increases the fluency of the final translation by letting
the corpus make translation choices – both grammatical and lexical
• Sentence structure: PROMT DeepHybrid preserves the syntactic accuracy and predictability
of the RBMT engine output
• Relevance: PROMT DeepHybrid combines terminology management capabilities of RBMT
systems with SMT corpus-based terminology validation
PROMT DeepHybrid also supports and enhances existing key product features:
• Style integrity: PROMT’s Virtual Style Guide technology automates the preservation of tone-
of-voice and corporate identity through automated rules selection
• Extensive Metadata Support (Translation Anchors): the rule-based core engine takes on all
heavy-duty metadata processing and preservation
5. PROMT DeepHybrid Engine Flowchart
[Flowchart: three resource groups feed the engine, each split into PROMT assets and client assets]
• Dictionaries: PROMT general dictionary and domain dictionaries (PROMT assets); client dictionaries and glossaries (client assets)
• Rules: PROMT transfer rules (PROMT assets); client translation preferences (client assets)
• Corpora: PROMT parallel and monolingual corpora (PROMT assets); client corpora and TMs, including post-edited TM (client assets)
Pipeline: Source -> Customized Branching Transfer -> Translation Candidates (1, 2, ... X) -> Statistical Post-Editing -> Translation Candidates (1, 2, ... X) -> Language Model Selection -> Best Target
6. PROMT DeepHybrid at a Glance
Branching Transfer
PROMT Branching Transfer is a sequence of rule-based algorithmic steps enhanced by client-
specific statistical input
• Translation choices largely depend on context: the PROMT engine makes only the most
apparent forced choices and otherwise generates all probable instances
• PROMT translation model usually produces 4-12 candidates for a 10-15 word sentence after
tree pruning techniques are applied
• Each step during lexical analysis relies on terminology from client TMs in addition to baseline
PROMT dictionaries
• Each step during syntactic analysis relies on PROMT rule library enhanced by syntactic
patterns mined from client TMs
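The branching idea above can be illustrated with a minimal sketch. It is not PROMT's implementation: the slot structure, the scores and the function name `branch_candidates` are all hypothetical, and the sketch enumerates every combination before cutting the list, whereas the engine is described as pruning the tree during generation.

```python
from itertools import product

def branch_candidates(slots, max_candidates=12):
    """Expand every ambiguous slot into all options, then keep the best few.

    `slots` is a list of slots; each slot is a list of (text, score) options.
    A slot with a single option models a "forced choice".
    Scores are hypothetical weights, multiplied along each candidate.
    """
    scored = []
    for combo in product(*slots):
        text = " ".join(option for option, _ in combo)
        score = 1.0
        for _, weight in combo:
            score *= weight
        scored.append((score, text))
    # Simplified pruning: rank by score and cut to the candidate budget.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored[:max_candidates]]

# Toy input (hypothetical scores): one ambiguous slot, one forced choice.
slots = [
    [("Se usa", 0.5), ("Es usado", 0.3), ("Está usado", 0.2)],
    [("para información sobre los pacientes", 1.0)],
]
candidates = branch_candidates(slots, max_candidates=2)
```

With a budget of two, the lowest-scoring variant is dropped and the remaining candidates go on to the later stages.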
Statistical Post-editing
Before being fed to the language model, the candidate translations undergo statistical post-
editing based on the sub-sentential parsing of both MT output and client data
Language Model:
• General corpora - billions of words are available
• In-domain corpora – pooling TDA data is helpful because of the thin domain space
• Client corpora provide probability skew in favor of client-specific choices
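One common way to let client corpora skew a language model is linear interpolation of several models. The slides do not say which method PROMT uses, so the sketch below is an assumption; the unigram probabilities, the weights and the function name `interpolated_prob` are invented for illustration.

```python
def interpolated_prob(word, models, weights):
    """Linearly interpolate a word's probability across several corpora."""
    if abs(sum(weights.values()) - 1.0) > 1e-9:
        raise ValueError("interpolation weights must sum to 1")
    return sum(weights[name] * models[name].get(word, 0.0) for name in models)

# Toy unigram models (hypothetical numbers, not measured data).
models = {
    "general": {"presenta": 0.6, "incluye": 0.4},
    "in_domain": {"presenta": 0.5, "incluye": 0.5},
    "client": {"presenta": 0.1, "incluye": 0.9},
}
# A larger client weight skews the result toward client-specific choices.
weights = {"general": 0.2, "in_domain": 0.3, "client": 0.5}

p_presenta = interpolated_prob("presenta", models, weights)
p_incluye = interpolated_prob("incluye", models, weights)
```

In this toy setup the general corpus prefers "presenta", but the client weight tips the interpolated model toward "incluye".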
7. PROMT DeepHybrid Engine: Candidate Selection Examples
Example 1: Syntactic choice
Source: It is used for patient information, lab results, reports, images, and clinical data.
RBMT translation: Es usado para información sobre los pacientes, resultados del laboratorio, informes, imágenes, y datos clínicos.
Hybrid engine candidates:
a) Es usado para información sobre los pacientes, resultados del laboratorio, informes, imágenes, y datos clínicos.
ppl = 791.4319204909
b) Se usa para información sobre los pacientes, resultados del laboratorio, informes, imágenes, y datos clínicos.
ppl = 424.83820234214
c) Está usado para información sobre los pacientes, resultados del laboratorio, informes, imágenes, y datos clínicos.
ppl = 814.24328845084
Hybrid Outcome: Se usa para información sobre los pacientes, resultados del laboratorio, informes, imágenes, y datos clínicos.
Example 2: Lexical choice
Source: The "Nehalem" system architecture features an integrated memory controller
RBMT translation: La arquitectura del sistema "Nehalem" presenta un controlador de memoria integrado
Hybrid engine candidates:
a) La arquitectura del sistema "Nehalem" presenta un controlador de memoria integrado
ppl = 288.17916810444
b) La arquitectura del sistema "Nehalem" incluye un controlador de memoria integrado
ppl = 234.86938828311
Hybrid Outcome: La arquitectura del sistema "Nehalem" incluye un controlador de memoria integrado
• Hybrid engine chooses the candidate with the lowest perplexity (ppl)
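The selection rule can be sketched in a few lines. The perplexity values below are the ones reported on this slide; the helper names `perplexity` and `pick_best`, the shortened candidate labels, and the use of natural-log token probabilities are assumptions made for illustration.

```python
import math

def perplexity(log_probs):
    """Perplexity from per-token natural-log probabilities."""
    return math.exp(-sum(log_probs) / len(log_probs))

def pick_best(candidates):
    """Return the text of the (text, ppl) pair with the lowest perplexity."""
    return min(candidates, key=lambda pair: pair[1])[0]

# Perplexities as reported on the slide; texts abbreviated to the varying part.
syntactic = [
    ("Es usado para ...", 791.4319204909),
    ("Se usa para ...", 424.83820234214),
    ("Está usado para ...", 814.24328845084),
]
lexical = [
    ("... presenta un controlador ...", 288.17916810444),
    ("... incluye un controlador ...", 234.86938828311),
]

best_syntactic = pick_best(syntactic)
best_lexical = pick_best(lexical)
```

Both picks match the hybrid outcomes shown above: the "Se usa" variant in Example 1 and the "incluye" variant in Example 2.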
8. Statistical Post-Editing at runtime
Example 1
Source: The following options were included with this subscription:
Before post-editing: Las opciones siguientes se incluyeron con esta suscripción:
After post-editing: Las siguientes opciones se han incluido con esta suscripción:
Reference human translation: Las siguientes opciones se han incluido con su suscripción:
Example 2
Source: To meet financial service industry regulations, we need to confirm some of your personal
information.
Before post-editing: Para encontrar normas de la industria del servicio financiero, tenemos que
confirmar un poco de su información personal.
After post-editing: Para cumplir las normas de la industria de servicios financieros, necesitamos
confirmar parte de información personal.
Reference human translation: Para cumplir las normas de la industria de servicios financieros,
necesitamos confirmar información personal suya.
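As a rough illustration of what a runtime post-editing step does, the sketch below applies phrase-level rewrite rules to Example 1. It is a deliberately simplified stand-in: the rule table and the function `post_edit` are hypothetical, and plain string replacement is used in place of the sub-sentential parsing described earlier.

```python
def post_edit(sentence, rules):
    """Apply phrase-level rewrite rules, longest source phrase first."""
    for source in sorted(rules, key=len, reverse=True):
        sentence = sentence.replace(source, rules[source])
    return sentence

# Hypothetical rules; a real system would learn such mappings from client data.
rules = {
    "Las opciones siguientes": "Las siguientes opciones",
    "se incluyeron": "se han incluido",
}

raw = "Las opciones siguientes se incluyeron con esta suscripción:"
edited = post_edit(raw, rules)
```

With these two toy rules, `edited` equals the "After post-editing" line of Example 1.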
9. PROMT 9.0 vs. PROMT DeepHybrid BLEU Scores
Engine status       English-Spanish   English-Spanish   English-Spanish
                    Sample 1          Sample 2          Sample 3
                    ~1,800 words      ~2,000 words      ~2,500 words
Out-of-the-box      31.80             26.74             29.02
Customized RBMT     39.00             34.30             36.50
PROMT DeepHybrid    46.20             41.02             43.65
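The size of the improvement is easier to read as a difference. The short calculation below uses only the BLEU scores from this slide; the helper name `gains` is an illustrative choice.

```python
# BLEU scores from the slide, in sample order 1, 2, 3.
scores = {
    "Out-of-the-box": [31.80, 26.74, 29.02],
    "Customized RBMT": [39.00, 34.30, 36.50],
    "PROMT DeepHybrid": [46.20, 41.02, 43.65],
}

def gains(baseline, system):
    """Return (absolute BLEU gain, relative gain in percent) per sample."""
    return [
        (round(s - b, 2), round(100 * (s - b) / b, 1))
        for b, s in zip(baseline, system)
    ]

vs_customized = gains(scores["Customized RBMT"], scores["PROMT DeepHybrid"])
vs_out_of_box = gains(scores["Out-of-the-box"], scores["PROMT DeepHybrid"])
```

Against customized RBMT, DeepHybrid gains roughly 6.7 to 7.2 BLEU points per sample, or about 18 to 20 percent in relative terms.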
10. PROMT DeepHybrid – Bridging the Post-Editing Gap
Post-editing throughput for PROMT customized translation is now reported to range
from 4,000 to 8,000 words a day
Average post-editing effort consists of:
• Correcting sentence structure
• Correcting part of speech errors
• Correcting general grammar errors
• Looking up terminology
• Reordering meta-tags
PROMT DeepHybrid technology addresses each of the tasks above, which will further
increase post-editors’ productivity
11. PROMT DeepHybrid – Putting the Puzzle Together
Not-so-good target: Este error se produce en SQL productos de socios cuando el código en un
desencadenador se cancela la operación utilizando la función de SQL levantar, o si el o los
métodos SQLConnection.cancel SQLStatement.cancel se llama cuando una declaración se
ejecuta utilizando SQLStatement.execute o SQLStatement.next.
Good sentence: Este error ocurre en los productos SQL Partner cuando código en un activador
cancela la operación llamando a la función SQL RAISE, o si los métodos SQLConnection.cancel
o SQLStatement.cancel son llamados cuando la declaración es ejecutada usando
SQLStatement.execute o SQLStatement.next.
PROMT DeepHybrid – up to the challenge!