Foundations
Machine Translation
Post-Editing
Copyright: Welocalize, Inc. 2014. All Rights Reserved
machine.translation
Different use cases for MT (audience? perishability? visibility?)
• Contracts
• Patents
• Annual Reports
• Light Marketing
• Software Documentation
• Software User Interface
• SEO (Search Engine Optimization)
• e-Learning Content
• User Guides
• Internal Corporate Communications
• Wikis
• Knowledge Bases
• Proposals / Draft Applications
• User Generated Content
why.mt
For clients
– Increase throughput and consistency
– Reduce cost of translation
– Content explosion due to the Internet
– Most internet content is in English (user community is global)
– Desire to also translate “lower quality” content, such as User Generated Content (UGC), at a profitable price
– Quality of MT has improved (new technologies, lots of research)
For the translator
– Increase throughput and consistency
– MT is likely to become commonplace, like TMs before
– More & more clients and LSPs use MT
– Be an early adopter
– MT and new forms of post-editing requirements are evolving fast
basic.concepts
MT in a nutshell
[…] Machine Translation provides a set of tools by which digital text is automatically
translated from one language (e.g. English) into another language (e.g. Spanish).
Source: Systran user guide
There are 3 main types of MT systems with different underlying logics:
• Rule-based (RBMT)
• Statistical (SMT)
• Hybrid (SMT + RBMT)
Most systems used today are either statistical or hybrid. All system types can
be customized for specific clients, incorporating client Translation
Memories, basic preferences and/or terminology lists.
• Client-specific data: TMs, glossaries
• Domain-specific data: chemistry or mechanical or IT or…
• General language data: anything to “teach the system the basics on the language pair”, so all of: tourism, IT, automotive, literature, …
Baseline only: e.g. Google Translate and Bing
Customizable MT systems (licensed or open source)
Understanding statistical MT
For the translator, it is important to understand that SMT systems are
based on algorithms calculating probabilities within a given set of data
(bilingual and monolingual).
In other words, the system learns from legacy human translations
(Translation Memories in our case) and calculates probabilities of most
likely translations from these, without applying linguistic rules as such.
The logic behind statistical machine translation (SMT)
Imagine the TM(s) as an aligned data corpus.
Example: terminology
The term click appears > 16 000 times in TM A.
In 90% of cases it is translated as fare clic, in 10% as selezionare, scegliere, …
The probability is high that the machine translation will be fare clic.
…BUT, maybe…
The string click OK appears 500 times in TM A.
In 50% of cases it is translated as fare clic su OK, in 50% as selezionare OK.
The probability is 50% that the machine translation will be selezionare OK.
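The counting logic in this example can be sketched in a few lines of Python. This is only an illustration of relative-frequency estimation over an aligned corpus, not how any particular SMT engine is implemented. The `aligned_pairs` data is hypothetical and scaled down from the slide's figures (9:1 for click, 5:5 for click OK), and the function name `translation_probabilities` is an assumption made for this sketch.

```python
from collections import Counter, defaultdict

# Hypothetical aligned (source, target) pairs standing in for "TM A".
# Counts are scaled down from the slide: 90% / 10% for "click",
# 50% / 50% for "click OK".
aligned_pairs = (
    [("click", "fare clic")] * 9
    + [("click", "selezionare")] * 1
    + [("click OK", "fare clic su OK")] * 5
    + [("click OK", "selezionare OK")] * 5
)


def translation_probabilities(pairs):
    """Estimate P(target | source) as a relative frequency of observed pairs."""
    counts = defaultdict(Counter)
    for source, target in pairs:
        counts[source][target] += 1
    probabilities = {}
    for source, targets in counts.items():
        total = sum(targets.values())
        probabilities[source] = {t: n / total for t, n in targets.items()}
    return probabilities


probs = translation_probabilities(aligned_pairs)
for source, distribution in probs.items():
    # List the candidate translations from most to least likely.
    for target, p in sorted(distribution.items(), key=lambda kv: -kv[1]):
        print(f"{source!r} -> {target!r}: {p:.0%}")
```

A real system combines many such statistics (plus monolingual data) when choosing its output, so the most frequent TM translation is likely, but not guaranteed, to appear.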
typical.examples
The quality of raw MT output can vary. A distinction is typically made as follows:
good > perfect to overall understandable and fairly fluent
medium > contains useful chunks, terms and occasionally perfect output; more or less understandable, little fluency
poor > poor with regard to understandability and fluency
• We carry out content evaluations to prevent content with overall poor MT output from going into production
• Medium is the broadest category and can still lead to productivity gains when used as a basis for post-editing
Know the patterns of MT output
Even “good” MT output is not expected to be perfect. Depending on the underlying MT logic and the language pair, there tend to be typical issues to fix, e.g.:
– capitalization
– punctuation (source punctuation is copied)
– spacing
– omissions/additions of text (usually different in nature from those in fuzzy matches)
– unknown/new words may be translated literally or left in English
– word order: may mirror the source
– compound formation
– word form agreement
→ Being aware of typical issues helps with good post-editing
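Some of these patterns are regular enough to flag automatically before a post-editor reads the segment. The sketch below is a hypothetical helper (`check_segment` is not part of any named tool) that covers only spacing, sentence-initial capitalization and words carried over unchanged from the source. The rules are assumptions made for illustration and would need tuning per language pair; for example, a product name that correctly stays in English would also be flagged.

```python
import re


def check_segment(source, target):
    """Return warnings for a few surface-level issues in one MT segment."""
    warnings = []
    if "  " in target:
        warnings.append("spacing: double space")
    if re.search(r" [,.]", target):
        warnings.append("spacing: space before comma or full stop")
    if source[:1].isupper() and target[:1].islower():
        warnings.append("capitalization: target starts in lower case")
    # Words of four or more letters that appear unchanged in both segments
    # may have been left untranslated (or may be legitimate names).
    source_words = set(re.findall(r"[^\W\d_]{4,}", source.lower()))
    target_words = set(re.findall(r"[^\W\d_]{4,}", target.lower()))
    for word in sorted(source_words & target_words):
        warnings.append(f"possibly untranslated: {word}")
    return warnings


# Invented example segment with several typical issues at once.
example_source = "Open the dashboard and click Save."
example_target = "aprire il dashboard e fare clic su  Salva ."
for warning in check_segment(example_source, example_target):
    print(warning)
```

Checks like these only point at candidates; the post-editor still decides whether each flag is a real error.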
post.editing
What is Post-Editing?
In other words…
• Make changes where necessary, using as much of the MT output as possible (based on language and client requirements)
• Read the MT output and the source, then decide quickly what can be used
• Use as many “bits/sections” of the MT output as possible: move them around, correct word forms, change the part of speech, use them as inspiration
• Look up key terms in your reference material as usual, but also learn to trust the customized output
• Automate with customized QA checks
Adjust your expectations. Rethink your approach. Report recurring errors.
full.post.editing
full post-editing: “publishable quality”
► Client Glossary, TM, Style Guide and others apply
Examples:
• infinitive / imperative preferences?
• passive / active preferences?
• formal / informal preferences?
• different styles for headers, lists, tables?
• special formatting of UI options? (bilingual, English)
• are measurements to be converted?
• terminology
If the client requests “full post-editing”, this means publishable quality. The post-editor is responsible for ensuring that the client's requirements for final quality are met.
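Terminology compliance is one requirement that lends itself to an automated check. The following is a minimal sketch, assuming the client glossary is available as a simple source-term to target-term mapping; the glossary entry reuses the click / fare clic example, and the function name `glossary_violations` is hypothetical. It relies on plain substring matching, so inflected forms and terms with several approved translations would need more careful handling.

```python
def glossary_violations(source, target, glossary):
    """List glossary terms found in the source whose required
    translation is missing from the post-edited target."""
    violations = []
    source_lower = source.lower()
    target_lower = target.lower()
    for source_term, target_term in glossary.items():
        if source_term.lower() in source_lower and target_term.lower() not in target_lower:
            violations.append((source_term, target_term))
    return violations


# Hypothetical one-entry client glossary (English -> Italian).
client_glossary = {"click": "fare clic"}

print(glossary_violations(
    "Click OK to continue.",
    "Selezionare OK per continuare.",
    client_glossary,
))
print(glossary_violations(
    "Click OK to continue.",
    "Fare clic su OK per continuare.",
    client_glossary,
))
```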
light.post.editing
light post-editing / “understandable quality”
Full Post-Editing | Light Post-Editing
Grammar and spelling are correct | Minor issues in grammar (and spelling) are acceptable
Terminology is accurate & consistent | Terminology is understandable and actionable
Spelling is consistent (e.g. hyphenation) | Variations in spelling are acceptable
Style is consistent (headers, list items, …) | Style variations are acceptable
Punctuation is correct | Variations/errors in punctuation are acceptable
Style & tone are appropriate for content | Style & tone are not offensive
Specific requirements: 33 cm (13''); change EN quotation marks to FR/DE/… | Follow MT output, e.g. keep proposed number format 13'' (33 cm), English quotation marks, …
… | …
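The last row of the comparison describes mechanical conventions that full post-editing may require and light post-editing leaves as proposed by the MT. As a rough sketch, the two conversions named there could be scripted as below. The function name `apply_locale_conventions`, the exact input pattern (`13'' (33cm)`) and the default French guillemets without spacing are all assumptions; real client style guides will differ.

```python
import re


def apply_locale_conventions(text, open_quote="«", close_quote="»"):
    """Reorder inch/cm measurements and swap English double quotes."""
    # Reorder "13'' (33cm)" into "33 cm (13'')".
    text = re.sub(r"(\d+)'' ?\((\d+) ?cm\)", r"\2 cm (\1'')", text)
    # Replace straight English double quotes with the target-language pair.
    text = re.sub(
        r'"([^"]*)"',
        lambda match: open_quote + match.group(1) + close_quote,
        text,
    )
    return text


raw_mt = 'Press "Save" on the 13\'\' (33cm) display'
print(apply_locale_conventions(raw_mt))            # French-style quotes
print(apply_locale_conventions(raw_mt, "„", "“"))  # German-style quotes
```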
post.editing
light post-editing versus full post-editing
Image © Common Sense Advisory, “Post-Edited machine translation defined”, April 30, 2013
Notes on productivity
Just as with human translation, throughput can vary and depends on:
– language pair
– content type & complexity
– experience
– domain knowledge
– quality requirements
– use of automatic QA tools
– quality of TM and reference material
With MT, additional factors are:
– quality of the MT
– experience with post-editing
Compared to average daily throughput for human translation, average daily throughput for full post-editing can be up to 3 times higher.
take.aways
• There are different use cases for MT, associated with different levels of final (post-edited) quality
• When full PE is requested, this means publishable quality
• There are different MT systems; Welocalize works with a range of them
• MT output varies in quality; we evaluate it with our translation partners to ensure the necessary quality for post-editing is met
• MT is not expected to be perfect; that's why we need post-editors!
• Post-editing replaces the translation stage in the workflow, but it is a different task, cognitively
• MT systems can improve through adding more data and through constructive feedback from post-editors
trademark.disclaimer:
Product names, logos, brands and other trademarks referenced within this
presentation are the property of their respective trademark holders. These
trademark holders are not owned by or affiliated with Welocalize, Inc., our
products, or our website. They do not sponsor or endorse our materials.
Reference is for educational purposes only.
Questions?
Contact the Welocalize Language Tools Team
lena.marg@welocalize.com, elaine.ocurran@welocalize.com
Welocalize
Frederick, Maryland - Headquarters
241 East 4th St. Suite 207
Frederick, Maryland 21701 USA
[t] +1.301.668.0330
[t] +1.800.370.9515 Toll Free
www.welocalize.com
Copyright: Welocalize, Inc. 2014. All Rights Reserved

Más contenido relacionado

Más de Welocalize

Más de Welocalize (16)

Tools-Driven Content Curation & Engine Training ATMA 2014
Tools-Driven Content Curation & Engine Training ATMA 2014Tools-Driven Content Curation & Engine Training ATMA 2014
Tools-Driven Content Curation & Engine Training ATMA 2014
 
MT and Post-Editing User-Generated Content AMTA 2014
MT and Post-Editing User-Generated Content AMTA 2014MT and Post-Editing User-Generated Content AMTA 2014
MT and Post-Editing User-Generated Content AMTA 2014
 
Enterprise MT Content Drift: Challenges, Impacts and Advanced Solutions AMTA...
 Enterprise MT Content Drift: Challenges, Impacts and Advanced Solutions AMTA... Enterprise MT Content Drift: Challenges, Impacts and Advanced Solutions AMTA...
Enterprise MT Content Drift: Challenges, Impacts and Advanced Solutions AMTA...
 
Content Marketing World 2014 Language Fun Fact Challenge by welocalize
Content Marketing World 2014 Language Fun Fact Challenge by welocalizeContent Marketing World 2014 Language Fun Fact Challenge by welocalize
Content Marketing World 2014 Language Fun Fact Challenge by welocalize
 
Welocalize EAMT 2014 Presentation Assumptions, Expectations and Outliers in P...
Welocalize EAMT 2014 Presentation Assumptions, Expectations and Outliers in P...Welocalize EAMT 2014 Presentation Assumptions, Expectations and Outliers in P...
Welocalize EAMT 2014 Presentation Assumptions, Expectations and Outliers in P...
 
Welocalize Cisco CNGL Partnership Shared at Localization World Dublin 2014
Welocalize Cisco CNGL Partnership Shared at Localization World Dublin 2014Welocalize Cisco CNGL Partnership Shared at Localization World Dublin 2014
Welocalize Cisco CNGL Partnership Shared at Localization World Dublin 2014
 
TAUS Quality Summit Dublin Welocalize Presentation by Olga Beregovaya and Len...
TAUS Quality Summit Dublin Welocalize Presentation by Olga Beregovaya and Len...TAUS Quality Summit Dublin Welocalize Presentation by Olga Beregovaya and Len...
TAUS Quality Summit Dublin Welocalize Presentation by Olga Beregovaya and Len...
 
Rating Evaluation Methods through Correlation MTE 2014 Workshop May 2014
Rating Evaluation Methods through Correlation MTE 2014 Workshop May 2014Rating Evaluation Methods through Correlation MTE 2014 Workshop May 2014
Rating Evaluation Methods through Correlation MTE 2014 Workshop May 2014
 
Beyond Disruption: Make Way for Return on Content by Welocalize Olga Beregovaya
Beyond Disruption: Make Way for Return on Content by Welocalize Olga BeregovayaBeyond Disruption: Make Way for Return on Content by Welocalize Olga Beregovaya
Beyond Disruption: Make Way for Return on Content by Welocalize Olga Beregovaya
 
Better translations through automated source and post edit analysis
Better translations through automated source and post edit analysisBetter translations through automated source and post edit analysis
Better translations through automated source and post edit analysis
 
2013 CHAT tcworld tekom Welocalize Teaminology
2013 CHAT tcworld tekom Welocalize Teaminology 2013 CHAT tcworld tekom Welocalize Teaminology
2013 CHAT tcworld tekom Welocalize Teaminology
 
Overcoming “Old Fears” in the “New Marketing” World by Informatica and Weloca...
Overcoming “Old Fears” in the “New Marketing” World by Informatica and Weloca...Overcoming “Old Fears” in the “New Marketing” World by Informatica and Weloca...
Overcoming “Old Fears” in the “New Marketing” World by Informatica and Weloca...
 
WeMT Tools and Processes Welocalize TAUS Showcase October 2013 Localization W...
WeMT Tools and Processes Welocalize TAUS Showcase October 2013 Localization W...WeMT Tools and Processes Welocalize TAUS Showcase October 2013 Localization W...
WeMT Tools and Processes Welocalize TAUS Showcase October 2013 Localization W...
 
An MT Journey Intuit and Welocalize Localization World 2013
An MT Journey Intuit and Welocalize Localization World 2013An MT Journey Intuit and Welocalize Localization World 2013
An MT Journey Intuit and Welocalize Localization World 2013
 
Safaba Welocalize MT Summit 2013 Analyzing MT Utility and Post-Editing
Safaba Welocalize MT Summit 2013 Analyzing MT Utility and Post-EditingSafaba Welocalize MT Summit 2013 Analyzing MT Utility and Post-Editing
Safaba Welocalize MT Summit 2013 Analyzing MT Utility and Post-Editing
 
MT Summit 2013 Welocalize Getting the MT Recipe Right by L Casanellas and L Marg
MT Summit 2013 Welocalize Getting the MT Recipe Right by L Casanellas and L MargMT Summit 2013 Welocalize Getting the MT Recipe Right by L Casanellas and L Marg
MT Summit 2013 Welocalize Getting the MT Recipe Right by L Casanellas and L Marg
 

Último

Al Mizhar Dubai Escorts +971561403006 Escorts Service In Al Mizhar
Al Mizhar Dubai Escorts +971561403006 Escorts Service In Al MizharAl Mizhar Dubai Escorts +971561403006 Escorts Service In Al Mizhar
Al Mizhar Dubai Escorts +971561403006 Escorts Service In Al Mizhar
allensay1
 
Challenges and Opportunities: A Qualitative Study on Tax Compliance in Pakistan
Challenges and Opportunities: A Qualitative Study on Tax Compliance in PakistanChallenges and Opportunities: A Qualitative Study on Tax Compliance in Pakistan
Challenges and Opportunities: A Qualitative Study on Tax Compliance in Pakistan
vineshkumarsajnani12
 
Mckinsey foundation level Handbook for Viewing
Mckinsey foundation level Handbook for ViewingMckinsey foundation level Handbook for Viewing
Mckinsey foundation level Handbook for Viewing
Nauman Safdar
 

Último (20)

Uneak White's Personal Brand Exploration Presentation
Uneak White's Personal Brand Exploration PresentationUneak White's Personal Brand Exploration Presentation
Uneak White's Personal Brand Exploration Presentation
 
HomeRoots Pitch Deck | Investor Insights | April 2024
HomeRoots Pitch Deck | Investor Insights | April 2024HomeRoots Pitch Deck | Investor Insights | April 2024
HomeRoots Pitch Deck | Investor Insights | April 2024
 
Cannabis Legalization World Map: 2024 Updated
Cannabis Legalization World Map: 2024 UpdatedCannabis Legalization World Map: 2024 Updated
Cannabis Legalization World Map: 2024 Updated
 
Durg CALL GIRL ❤ 82729*64427❤ CALL GIRLS IN durg ESCORTS
Durg CALL GIRL ❤ 82729*64427❤ CALL GIRLS IN durg ESCORTSDurg CALL GIRL ❤ 82729*64427❤ CALL GIRLS IN durg ESCORTS
Durg CALL GIRL ❤ 82729*64427❤ CALL GIRLS IN durg ESCORTS
 
Al Mizhar Dubai Escorts +971561403006 Escorts Service In Al Mizhar
Al Mizhar Dubai Escorts +971561403006 Escorts Service In Al MizharAl Mizhar Dubai Escorts +971561403006 Escorts Service In Al Mizhar
Al Mizhar Dubai Escorts +971561403006 Escorts Service In Al Mizhar
 
Paradip CALL GIRL❤7091819311❤CALL GIRLS IN ESCORT SERVICE WE ARE PROVIDING
Paradip CALL GIRL❤7091819311❤CALL GIRLS IN ESCORT SERVICE WE ARE PROVIDINGParadip CALL GIRL❤7091819311❤CALL GIRLS IN ESCORT SERVICE WE ARE PROVIDING
Paradip CALL GIRL❤7091819311❤CALL GIRLS IN ESCORT SERVICE WE ARE PROVIDING
 
Escorts in Nungambakkam Phone 8250092165 Enjoy 24/7 Escort Service Enjoy Your...
Escorts in Nungambakkam Phone 8250092165 Enjoy 24/7 Escort Service Enjoy Your...Escorts in Nungambakkam Phone 8250092165 Enjoy 24/7 Escort Service Enjoy Your...
Escorts in Nungambakkam Phone 8250092165 Enjoy 24/7 Escort Service Enjoy Your...
 
Katrina Personal Brand Project and portfolio 1
Katrina Personal Brand Project and portfolio 1Katrina Personal Brand Project and portfolio 1
Katrina Personal Brand Project and portfolio 1
 
WheelTug Short Pitch Deck 2024 | Byond Insights
WheelTug Short Pitch Deck 2024 | Byond InsightsWheelTug Short Pitch Deck 2024 | Byond Insights
WheelTug Short Pitch Deck 2024 | Byond Insights
 
QSM Chap 10 Service Culture in Tourism and Hospitality Industry.pptx
QSM Chap 10 Service Culture in Tourism and Hospitality Industry.pptxQSM Chap 10 Service Culture in Tourism and Hospitality Industry.pptx
QSM Chap 10 Service Culture in Tourism and Hospitality Industry.pptx
 
Challenges and Opportunities: A Qualitative Study on Tax Compliance in Pakistan
Challenges and Opportunities: A Qualitative Study on Tax Compliance in PakistanChallenges and Opportunities: A Qualitative Study on Tax Compliance in Pakistan
Challenges and Opportunities: A Qualitative Study on Tax Compliance in Pakistan
 
Horngren’s Cost Accounting A Managerial Emphasis, Canadian 9th edition soluti...
Horngren’s Cost Accounting A Managerial Emphasis, Canadian 9th edition soluti...Horngren’s Cost Accounting A Managerial Emphasis, Canadian 9th edition soluti...
Horngren’s Cost Accounting A Managerial Emphasis, Canadian 9th edition soluti...
 
Getting Real with AI - Columbus DAW - May 2024 - Nick Woo from AlignAI
Getting Real with AI - Columbus DAW - May 2024 - Nick Woo from AlignAIGetting Real with AI - Columbus DAW - May 2024 - Nick Woo from AlignAI
Getting Real with AI - Columbus DAW - May 2024 - Nick Woo from AlignAI
 
CROSS CULTURAL NEGOTIATION BY PANMISEM NS
CROSS CULTURAL NEGOTIATION BY PANMISEM NSCROSS CULTURAL NEGOTIATION BY PANMISEM NS
CROSS CULTURAL NEGOTIATION BY PANMISEM NS
 
Phases of Negotiation .pptx
 Phases of Negotiation .pptx Phases of Negotiation .pptx
Phases of Negotiation .pptx
 
Dr. Admir Softic_ presentation_Green Club_ENG.pdf
Dr. Admir Softic_ presentation_Green Club_ENG.pdfDr. Admir Softic_ presentation_Green Club_ENG.pdf
Dr. Admir Softic_ presentation_Green Club_ENG.pdf
 
Nashik Call Girl Just Call 7091819311 Top Class Call Girl Service Available
Nashik Call Girl Just Call 7091819311 Top Class Call Girl Service AvailableNashik Call Girl Just Call 7091819311 Top Class Call Girl Service Available
Nashik Call Girl Just Call 7091819311 Top Class Call Girl Service Available
 
Putting the SPARK into Virtual Training.pptx
Putting the SPARK into Virtual Training.pptxPutting the SPARK into Virtual Training.pptx
Putting the SPARK into Virtual Training.pptx
 
Mckinsey foundation level Handbook for Viewing
Mckinsey foundation level Handbook for ViewingMckinsey foundation level Handbook for Viewing
Mckinsey foundation level Handbook for Viewing
 
Berhampur 70918*19311 CALL GIRLS IN ESCORT SERVICE WE ARE PROVIDING
Berhampur 70918*19311 CALL GIRLS IN ESCORT SERVICE WE ARE PROVIDINGBerhampur 70918*19311 CALL GIRLS IN ESCORT SERVICE WE ARE PROVIDING
Berhampur 70918*19311 CALL GIRLS IN ESCORT SERVICE WE ARE PROVIDING
 

Welocalize Machine Translation Post Editing Basics Course I

  • 3. Subheader Sample text here Sample text here Sample text here Sample text here Sample text here Sample text here. - Sample text here sample text here Sample text here. - Sample text here sample text here Sample text here. Sample text here Sample text here Sample text here Sample text here Sample text here Sample text here. machine.translation • Contracts • Patents • Annual Reports • Light Marketing • Software Documentation • Software User Interface • SEO (Search Engine Optimization) • e-Learning Content • User Guides • Internal Corporate Communications • Wikis • Knowledge Bases • Proposals / Draft Applications • User Generated Content Different use cases for MT (audience? perishability? visibility?) Copyright: Welocalize, Inc. 2014. All Rights Reserved
  • 4. Subheader Sample text here Sample text here Sample text here Sample text here Sample text here Sample text here. - Sample text here sample text here Sample text here. - Sample text here sample text here Sample text here. Sample text here Sample text here Sample text here Sample text here Sample text here Sample text here. why.mt For clients – Increase throughputs and consistency – Reduce cost of translation – Content explosion due to Internet – Most internet content is in English (user community is global) – Desire to translate also “lower quality” content, such as User Generated Content (UGC) at a profitable price – Quality of MT has improved (new technologies, lots of research) For the translator – Increase throughputs and consistency – MT is likely to become commonplace, like TMs before – More & more clients and LSPs use MT – Be an early-adopter – MT and new forms of post-editing requirements are fast evolving Copyright: Welocalize, Inc. 2014. All Rights Reserved
  • 5. Subheader Sample text here Sample text here Sample text here Sample text here Sample text here Sample text here. - Sample text here sample text here Sample text here. - Sample text here sample text here Sample text here. Sample text here Sample text here Sample text here Sample text here Sample text here Sample text here. basic.concepts MT in a nutshell […] Machine Translation provides a set of tools by which digital text is automatically translated from one language (e.g. English) into another language (e.g. Spanish). Source: Systran user guide There are 3 main types of MT systems with different underlying logics:  Rules-based (RBMT)  Statistical (SMT)  Hybrid (SMT + RBMT) Most systems used today are either statistical or hybrid. All system types can be customized for specific clients, incorporating client Translation Memories, basic preferences and/or terminology lists. Copyright: Welocalize, Inc. 2014. All Rights Reserved
  • 6. basic.concepts Client- specific data TMs, glossaries Domain-specific data chemistry or mechanical or IT or… General language data anything to“teach the system the basics on the language pair“, so all of: tourism, IT, automotive, literature,… e.g. Google Translate and Bing would be Baseline only Customizable MT systems (licensed or open source) Copyright: Welocalize, Inc. 2014. All Rights Reserved
  • 7. - Sample text here sample text here Sample text here. - Sample text here sample text here Sample text here. Sample text here Sample text here Sample text here Sample text here Sample text here Sample text here. basic.concepts Understanding statistical MT For the translator, it is important to understand that SMT systems are based on algorithms calculating probabilities within a given set of data (bilingual and monolingual). In other words, the system learns from legacy human translations (Translation Memories in our case) and calculates probabilities of most likely translations from these, without applying linguistic rules as such. Copyright: Welocalize, Inc. 2014. All Rights Reserved
  • 8. basic.concepts The logic behind statistical machine translation (SMT) Imagine the TM(s) as aligned data corpus – example Example Terminology The term click appears > 16 000 times in TM A In 90% of cases it is translated with fare clic in 10% as: selezionare, scegliere, … The probability is high, that the machine translation will be fare clic …BUT, maybe… The string click OK appears 500 times in TM A In 50% of cases it is translated with fare clic su OK in 50% as: selezionare OK The probability is 50%, that the machine translation will be selezionare OK Copyright: Welocalize, Inc. 2014. All Rights Reserved
  • 9. Subheader Sample text here Sample text here Sample text here Sample text here Sample text here Sample text here. - Sample text here sample text here Sample text here. - Sample text here sample text here Sample text here. Sample text here Sample text here Sample text here Sample text here Sample text here Sample text here. typical.examples good > perfect to overall understandable and fairly fluent medium > contains useful chunks, terms and occasionally perfect output; more or less understandable, little fluency poor > poor with regard to understandability and fluency  We carry out content evaluations to prevent content with overall poor MT output from going into production  Medium is the broadest category and can still lead to productivity gains when used as a basis for post-editing The quality of raw MT output can vary. A distinction is typically made as follows: Copyright: Welocalize, Inc. 2014. All Rights Reserved
  • 10. Subheader Sample text here Sample text here Sample text here Sample text here Sample text here Sample text here. - Sample text here sample text here Sample text here. - Sample text here sample text here Sample text here. Sample text here Sample text here Sample text here Sample text here Sample text here Sample text here. typical.examples The quality of raw MT output can vary. Example: Copyright: Welocalize, Inc. 2014. All Rights Reserved
  • 11. Subheader Sample text here Sample text here Sample text here Sample text here Sample text here Sample text here. - Sample text here sample text here Sample text here. - Sample text here sample text here Sample text here. Sample text here Sample text here Sample text here Sample text here Sample text here Sample text here. typical.examples Know the patterns of MT output Even ”good” MT output is not expected to be perfect. Depending on the underlying MT logic and the language pair, there tend to be typical issues to fix, e.g.: – issues around capitalization – punctuation (source punctuation is copied) – spacing – omissions/additions of text (usually different in nature to those in fuzzy matches) – unknown/new words may be translated literally or be left in English – word order: can be mirroring the source – compound formation – word form agreement → being aware of typical issues helps good post-editing Copyright: Welocalize, Inc. 2014. All Rights Reserved
  • 12. typical.examples Copyright: Welocalize, Inc. 2014. All Rights Reserved
  • 13. typical.examples Copyright: Welocalize, Inc. 2014. All Rights Reserved
  • 14. Subheader Sample text here Sample text here Sample text here Sample text here Sample text here Sample text here. - Sample text here sample text here Sample text here. - Sample text here sample text here Sample text here. Sample text here Sample text here Sample text here Sample text here Sample text here Sample text here. post.editing What is Post-Editing? Copyright: Welocalize, Inc. 2014. All Rights Reserved
  • 15. Subheader Sample text here Sample text here Sample text here Sample text here Sample text here Sample text here. - Sample text here sample text here Sample text here. - Sample text here sample text here Sample text here. Sample text here Sample text here Sample text here Sample text here Sample text here Sample text here. post.editing In other words…  Make changes where necessary, using as much of the MT output as possible (based on language and client requirements)  Read the MT output & the source > decide quickly what can be used  Use as many “bits/sections“ of the MT output as possible: move them around, correct word forms, change the part of speech, use them as inspiration  Look up key terms in your reference material as usual, but also learn to trust the customized output  Automate with customized QA checks Adjust your expectations. Rethink your approach. Report recurring errors. Copyright: Welocalize, Inc. 2014. All Rights Reserved
  • 16. full.post.editing Full post-editing: “publishable quality” ► Client glossary, TM, style guide and other requirements apply. Examples: – infinitive / imperative preferences? – passive / active preferences? – formal / informal preferences? – different styles for headers, lists, tables? – special formatting of UI options? (bilingual, English) – are measurements to be converted? – terminology. If the client requests “full post-editing”, this means publishable quality. The post-editor is responsible for ensuring that the client's final quality expectations are met.
  • 17. light.post.editing Light post-editing: “understandable quality”
      Full Post-Editing | Light Post-Editing
      Grammar and spelling are correct | Minor issues in grammar (and spelling) are acceptable
      Terminology is accurate & consistent | Terminology is understandable and actionable
      Spelling is consistent (e.g. hyphenation) | Variations in spelling are acceptable
      Style is consistent (headers, list items, …) | Style variations are acceptable
      Punctuation is correct | Variations/errors in punctuation are acceptable
      Style & tone are appropriate for content | Style & tone are not offensive
      Specific requirements: 33 cm (13"); change EN quotation marks to FR/DE/… | Follow MT output, e.g. keep proposed number format 13" (33 cm), English quotation marks, ...
      … | …
  • 18. post.editing light post-editing versus full post-editing Image © Common Sense Advisory, “Post-Edited machine translation defined”, April 30, 2013
  • 19. post.editing Notes on productivity. Just as with human translation, throughput varies and depends on: – language pair – content type & complexity – experience – domain knowledge – quality requirements – use of automatic QA tools – quality of TM and reference material. With MT, additional factors are: – quality of the MT – experience with post-editing. Compared to average daily throughput for human translation, average daily throughput for full post-editing can be up to 3 times higher.
  • 20. take.aways – There are different use cases for MT, associated with different levels of final (post-edited) quality – When full PE is requested, this means publishable quality – There are different MT systems; Welocalize works with a range of them – MT output varies in quality; we evaluate it with our translation partners to ensure the necessary quality for post-editing is met – MT is not expected to be perfect; that's why we need post-editors! – Post-editing replaces the translation stage in the workflow, but it is cognitively a different task – MT systems can improve through adding more data and through constructive feedback from post-editors
  • 21. trademark.disclaimer: Product names, logos, brands and other trademarks referenced within this presentation are the property of their respective trademark holders. These trademark holders are not owned by or affiliated with Welocalize, Inc., our products, or our website. They do not sponsor or endorse our materials. Reference is for educational purposes only.
  • 22. Questions? Contact the Welocalize Language Tools Team lena.marg@welocalize.com, elaine.ocurran@welocalize.com Welocalize Frederick, Maryland - Headquarters 241 East 4th St. Suite 207 Frederick, Maryland 21701 USA [t] +1.301.668.0330 [t] +1.800.370.9515 Toll Free www.welocalize.com
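
Slide 11 lists typical MT output issues and slide 15 suggests automating with customized QA checks. The following is a minimal sketch of what such a check could look like in Python, not a tool used in the deck. The function names, the thresholds (length ratio between 0.5 and 2.0, shared words of five or more letters) and the do-not-translate list are all assumptions made for illustration, and the rules would need adapting per language pair (for example, French places spaces before some punctuation marks).

```python
import re

# Hypothetical do-not-translate list; a real project would load client terms.
DO_NOT_TRANSLATE = frozenset({"welocalize"})


def words(text: str) -> set[str]:
    """Return the lowercased alphabetic tokens of a segment."""
    return set(re.findall(r"[^\W\d_]+", text.lower()))


def qa_check(source: str, target: str, keep=DO_NOT_TRANSLATE) -> list[str]:
    """Return warnings for one source/MT-output segment pair."""
    issues = []

    # Spacing: doubled spaces, or a space before a comma or full stop.
    if "  " in target:
        issues.append("double space")
    if re.search(r" [,.](?=\s|$)", target):
        issues.append("space before punctuation")

    # Capitalization: source starts uppercase but target starts lowercase.
    if source[:1].isupper() and target[:1].islower():
        issues.append("capitalization differs from source")

    # Omissions/additions: a crude length-ratio heuristic (assumed bounds).
    if source and not 0.5 <= len(target) / len(source) <= 2.0:
        issues.append("length ratio suggests omission or addition")

    # Unknown words left in the source language: long words found in both.
    shared = sorted(
        w for w in words(source) & words(target)
        if len(w) >= 5 and w not in keep
    )
    if shared:
        issues.append("possibly untranslated: " + ", ".join(shared))

    return issues


if __name__ == "__main__":
    src = "Click the button to save your settings."
    mt = "cliquez sur le  bouton pour enregistrer vos settings ."
    for warning in qa_check(src, mt):
        print(warning)
```

Checks like these only flag candidates for the post-editor to review. Recurring hits are the kind of pattern slide 15 asks post-editors to report.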

Editor's notes

  1. This one is OK to leave as is: it is a list of many IT companies’ names and doesn’t point to a specific client.
  2. Same as Dell