TAUS MACHINE TRANSLATION SHOWCASE
The WeMT Program
10:20 – 10:40
Thursday, 10 October 2013
Olga Beregovaya
Welocalize
WeMT Tools and Processes
We’ll talk about:
• MT Programs
• Metrics
• Engines
• Language Tools
Current MT Programs
• Dell – 27 languages
• Autodesk – 11 languages
• PayPal – 8 languages
• Cisco – 17 languages across 3 tiers
• Intuit – 20+ languages
• Microsoft (pre-project support)
• McAfee (pilot)
… many more in pilot stage
MT Program: Path-to-Success
Components
• A set of MT engines – “mix and match”
• TMT Selection Mechanisms
• Post-editing Environment
• Processes and metrics
• Data gathering and reporting tool – what, how much, how fast and at what effort
EDUCATION EDUCATION EDUCATION
CHANGE

The recipe for success
Process and Workflow
All aspects of the localization ecosystem are
taken into consideration

MT KPIs:
✓ Productivity: Throughputs
✓ Productivity: Delta
✓ Quality: LQA
✓ Quality: Automatic Scores
✓ Cost
✓ GlobalSight: Connectivity
✓ GlobalSight: Tagging
✓ Human Evaluation
✓ Customization: Internal/External
✓ Customization: Time

Selecting the right MT engine
By using our MT engine selection scorecard, we make sure all important KPIs are taken into consideration at selection time.

Empowerment through education
Internal, through customized toolkits; external, through specialized trainings.

The feedback loop
Constructive communication from post-editor to MT provider

MT Program Design - Source
o Source content classification (e.g. marketing/UI/UA/UGC)
o Length of the source segment
o Source segment morpho-syntactic complexity
o Presence/absence of pre-defined glossary terms or multi-word glossary elements, UI elements, numeric variables, product lists, ‘do-not-translate’ and transliteration lists
o Tag density - metadata attributes and their representation in localization industry standard formats (“tags”)
o ROC – quality levels based on content use (“impact”)

3D Model: Expected productivity mapped to desired quality levels and source content complexity
	
  
MT Engine Selection Scorecard
• Productivity - Throughputs: number of post-edited words per hour
• Productivity - Delta: percentage difference between translation and post-editing time
• Cost: extrapolation, cost per word
• CMS - Connectivity: is there a connector in place?
• Quality/Nature of source
• Quality (Final) - LQA: internal quality verification
• Quality (MT) - Automatic Scores: a set of automatic scoring systems is used

We have tested and used different engines, so we’ve seen the good, the bad and the ugly; now we can better appreciate what we have.
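The productivity and cost KPIs above reduce to simple arithmetic. This is a minimal sketch, not the scorecard's actual tooling: it assumes "Delta" means the time saved by post-editing relative to translating the same words from scratch, and that cost per word is extrapolated from a hypothetical hourly rate. The names `Sample`, `throughput`, `delta_percent` and `cost_per_word` are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    words: int
    translation_hours: float  # time to translate the sample from scratch
    post_edit_hours: float    # time to post-edit MT output for the same sample


def throughput(words: int, hours: float) -> float:
    """Productivity - Throughputs: post-edited words per hour."""
    return words / hours


def delta_percent(sample: Sample) -> float:
    """Productivity - Delta (assumed definition): time saved as a percentage of translation time."""
    return (sample.translation_hours - sample.post_edit_hours) / sample.translation_hours * 100


def cost_per_word(hourly_rate: float, words_per_hour: float) -> float:
    """Cost: extrapolated per-word cost, assuming post-editors are paid a hypothetical hourly rate."""
    return hourly_rate / words_per_hour
```

For a 2,000-word sample translated in 4 hours and post-edited in 2.5 hours, this gives 800 post-edited words per hour and a delta of 37.5%; at a hypothetical rate of 40 per hour, that extrapolates to 0.05 per word.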
  
Scorecard - Metrics
Overall data

German
KPI                                 #1  #2  #3  #4
Productivity                         4   4   4   4
Productivity Increase                5   4   1   3
Quality - LQA                        2   2   1   2
Quality - Automatic Scores           3   3   3   3
Cost                                 4   2   3   3
GlobalSight - Connectivity           4   3   2   4
GlobalSight - Tagging                4   2   4   2
Human Evaluation                     3   3   3   4
Customization - Internal/External    4   2   3   3
Customization - Time                 3   1   2   1
Total                               36  26  26  29

French
KPI                                 #1  #2  #3  #4
Productivity                         4   5   3   4
Productivity Increase                5   5   1   4
Quality - LQA                        5   3   3   4
Quality - Automatic Scores           3   4   3   3
Cost                                 4   2   3   3
GlobalSight - Connectivity           4   3   2   4
GlobalSight - Tagging                4   2   2   2
Human Evaluation                     3   3   3   3
Customization - Internal/External    4   2   3   3
Customization - Time                 3   1   2   1
Total                               39  30  25  31
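The totals in both tables are unweighted sums of the ten KPI scores for each engine. The sketch below recomputes them from the figures above and ranks the engines. The optional per-KPI `weights` parameter is a hypothetical extension (the slides do not mention weighting); equal weights reproduce the totals shown.

```python
KPIS = [
    "Productivity", "Productivity Increase", "Quality - LQA", "Quality - Automatic Scores",
    "Cost", "GlobalSight - Connectivity", "GlobalSight - Tagging", "Human Evaluation",
    "Customization - Internal/External", "Customization - Time",
]

# Per-engine scores in KPIS order, copied from the German and French tables.
SCORES = {
    "German": {
        "#1": [4, 5, 2, 3, 4, 4, 4, 3, 4, 3],
        "#2": [4, 4, 2, 3, 2, 3, 2, 3, 2, 1],
        "#3": [4, 1, 1, 3, 3, 2, 4, 3, 3, 2],
        "#4": [4, 3, 2, 3, 3, 4, 2, 4, 3, 1],
    },
    "French": {
        "#1": [4, 5, 5, 3, 4, 4, 4, 3, 4, 3],
        "#2": [5, 5, 3, 4, 2, 3, 2, 3, 2, 1],
        "#3": [3, 1, 3, 3, 3, 2, 2, 3, 3, 2],
        "#4": [4, 4, 4, 3, 3, 4, 2, 3, 3, 1],
    },
}


def rank_engines(table, weights=None):
    """Return (engine, total) pairs, best first. Equal weights reproduce the slide totals."""
    weights = weights or [1] * len(KPIS)
    totals = {}
    for engine, scores in table.items():
        if len(scores) != len(KPIS) or len(weights) != len(KPIS):
            raise ValueError("expected one score and one weight per KPI")
        totals[engine] = sum(w * s for w, s in zip(weights, scores))
    # sorted() is stable, so tied engines keep their table order.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)
```

With equal weights, engine #1 ranks first for both languages (36 for German, 39 for French), followed by #4 (29 and 31).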

Productivity metrics

Automatic Scoring

Human Evaluation

Toolkits and Trainings
Our experience:
✓ Most translators know and have experienced post-editing, but they have limited knowledge of any other related aspect (automatic scoring, output differences between RBMT and SMT...).
✓ The majority of people who work in localization have heard about MT, but most of them still find it a daunting subject.
Our answer:
✓ Continuous MT- and PE-related training and documentation for language providers
✓ Customized Toolkits for different internal departments (Production, Quality, Sales, Vendor Management)
Transparency and Ownership
Theory – knowledge foundations
Practice – customized PE sessions for different client accounts
Transparency – process, engine selection/customization, evaluations
Responsibility – valid evaluations, constructive feedback, quality ownership

“Training helps a lot: after I was told some of the background information and tips and tricks for certain engines/outputs, I was much more relaxed and happy to give MT a go.”
Legacy data – best prediction tool
> Statistics from legacy knowledge base
Feedback and Engine Improvement
The feedback loop

“For me the biggest advantage would be the possibility to implement a client terminology list [in SMT].”

“I wish we could easily fix the corpus for outdated terminology and characters.”

“Teach the engine to properly cope with sentences containing more than one verb and/or verbs in progressive form.”

“Engine retraining significantly improved the handling of tags and spaces around tags; this is a productive achievement, as it saves us a lot of manual corrections.”
“Beyond the Engine” Tools
• Teaminology - crowdsourcing platform for centralized term governance; simultaneous concordance search of TMs and term bases => clean training data
• Dispatcher - a global community content translation application that connects user-generated content (UGC), including live chats, social media, forums, comments and knowledge bases, to customized machine translation (MT) engines for real-time translation
• Source Candidate Scorer – scoring of candidate sentences against historically good and bad sentences based on POS and perplexity
• Corpus Preparation Toolkit – a set of applications to optimize data preparation for MT engine training
Teaminology

Dispatcher

Source Candidate Scorer
Compares your source content to “the good” and “the bad” legacy segments and estimates potential suitability for MT
Corpus Preparation Suite
A variety of tools to prepare the corpus for training MT engines, such as:
• Deleting formatting tags from TMX
• Removing double spaces
• Removing duplicated punctuation (e.g. commas)
• Deleting segments where source = target
• Deleting segments containing only URLs
• Escaping characters
• Removing duplicate sentences
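Most of the clean-up steps listed above can be expressed as small text filters over source/target pairs. The sketch below shows one possible arrangement, not the suite itself. It assumes segments have already been extracted from TMX as plain strings in which formatting markup such as `<b>` and `</b>` survives as literal text, and that the downstream trainer expects `&`, `<` and `>` escaped as XML entities; both are assumptions, and the function names are illustrative.

```python
import html
import re

TAG_RE = re.compile(r"</?[A-Za-z][^>]*>")   # formatting tags such as <b>, </b>, <br/>
DUP_PUNCT_RE = re.compile(r"([,;:])\1+")     # ",," -> ","
SPACES_RE = re.compile(r"[ \t]{2,}")         # runs of spaces/tabs -> one space
URL_ONLY_RE = re.compile(r"\s*(?:(?:https?://|www\.)\S+\s*)+")


def clean_segment(text: str) -> str:
    text = TAG_RE.sub("", text)            # delete formatting tags
    text = DUP_PUNCT_RE.sub(r"\1", text)   # remove duplicated punctuation
    text = SPACES_RE.sub(" ", text)        # remove double spaces
    return text.strip()


def prepare_corpus(pairs):
    """Clean, filter and de-duplicate (source, target) pairs, escaping characters last."""
    seen = set()
    out = []
    for src, tgt in pairs:
        src, tgt = clean_segment(src), clean_segment(tgt)
        if not src or not tgt or src == tgt:          # empty, or source = target
            continue
        if URL_ONLY_RE.fullmatch(src) or URL_ONLY_RE.fullmatch(tgt):  # URL-only segments
            continue
        if (src, tgt) in seen:                        # duplicate after cleaning
            continue
        seen.add((src, tgt))
        out.append((html.escape(src, quote=False), html.escape(tgt, quote=False)))
    return out
```

Escaping runs last, after tags are stripped, so the tag filter never sees escaped markup. Note that de-duplication happens after cleaning, which lets "Click <b>Save</b> to continue," and "Click Save to continue," collapse into one pair.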
  
Corpus Preparation: TM Creator
Aggregates training data from various relevant sources
Corpus Preparation: TMX Splitter
Extracts the relevant training corpus based on the TMX metadata
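Filtering a TMX file on its metadata needs only a standard XML parser. In this minimal sketch using Python's `xml.etree.ElementTree`, translation units are selected by the value of a `<prop>` element. The property type `x-domain` and the sample file are hypothetical, since real TMX exports use tool-specific property types. Inline codes such as `<bpt>`/`<ept>` are dropped and only the surrounding text is kept.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under the XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def seg_text(seg: ET.Element) -> str:
    """Plain text of a <seg>: skips inline elements (<bpt>, <ept>, <ph>, ...) but keeps the text after them.

    Note: this sketch also skips any text nested inside <hi> elements.
    """
    parts = [seg.text or ""]
    for child in seg:
        parts.append(child.tail or "")
    return "".join(parts).strip()


def split_tmx(tmx_text: str, prop_type: str, wanted: str, src_lang: str, tgt_lang: str):
    """Return (source, target) pairs from units whose <prop type=prop_type> equals `wanted`."""
    root = ET.fromstring(tmx_text)
    pairs = []
    for tu in root.iter("tu"):
        props = {p.get("type"): (p.text or "").strip() for p in tu.findall("prop")}
        if props.get(prop_type) != wanted:
            continue
        segs = {}
        for tuv in tu.findall("tuv"):
            lang = tuv.get(XML_LANG) or tuv.get("lang")  # fallback for files using a plain lang attribute
            seg = tuv.find("seg")
            if lang and seg is not None:
                segs[lang.lower()] = seg_text(seg)
        src, tgt = segs.get(src_lang.lower()), segs.get(tgt_lang.lower())
        if src and tgt:
            pairs.append((src, tgt))
    return pairs


SAMPLE_TMX = """<tmx version="1.4">
  <header srclang="en-US" datatype="plaintext" segtype="sentence" adminlang="en-US"
          creationtool="example" creationtoolversion="1.0" o-tmf="example"/>
  <body>
    <tu>
      <prop type="x-domain">UI</prop>
      <tuv xml:lang="en-US"><seg>Click <bpt i="1">&lt;b&gt;</bpt>Save<ept i="1">&lt;/b&gt;</ept></seg></tuv>
      <tuv xml:lang="de-DE"><seg>Klicken Sie auf <bpt i="1">&lt;b&gt;</bpt>Speichern<ept i="1">&lt;/b&gt;</ept></seg></tuv>
    </tu>
    <tu>
      <prop type="x-domain">Marketing</prop>
      <tuv xml:lang="en-US"><seg>Try it free today</seg></tuv>
      <tuv xml:lang="de-DE"><seg>Jetzt kostenlos testen</seg></tuv>
    </tu>
  </body>
</tmx>"""
```

Here `split_tmx(SAMPLE_TMX, "x-domain", "UI", "en-US", "de-DE")` returns `[("Click Save", "Klicken Sie auf Speichern")]`, leaving the marketing unit out of the UI training corpus.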
  	
  
Welocalize Moses Implementation
• Why? Far more control over engine quality, since we can control corpus preparation and output post-processing
• Control over metadata handling
• Ties into our company’s open-source philosophy
• We have experienced personnel in-house
• We can extend and customize Moses functionality as necessary
• We have a connector to our TMS (GlobalSight)

RESULTS: In our internal tests with Moses/DoMT, we are getting automated scores similar to those of commercial engines for the languages into which we localize most. Human evaluators gave the same feedback.

… And it works!
We are in a position to offer realistic discounts and aggressive timelines while providing quality levels appropriate for the content.
“Work-in-progress” Projects

• Ongoing improvements to our adaptation of the iOmegaT tool (Welocalize/CNGL)
• Industry partner in the CNGL “Source Content Profiler” project
• Adoption of TMTPrime (CNGL) - MT vs. fuzzy match selection mechanism
• Language- and content-specific pre-processing for the in-house Moses deployment
• Teaminology – adding linguistic intelligence
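The TMTPrime item refers to choosing, segment by segment, between a TM fuzzy match and MT output. The sketch below is not TMTPrime's method; it is a deliberately simple threshold rule for illustration. It scores each TM entry against the new source with word-level `difflib.SequenceMatcher` similarity (which approximates, but does not reproduce, the fuzzy match percentages CAT tools report) and falls back to MT below a hypothetical 75% threshold.

```python
from difflib import SequenceMatcher


def fuzzy_score(a: str, b: str) -> float:
    """Word-level similarity in [0, 1]; a rough stand-in for a CAT tool's fuzzy match percentage."""
    return SequenceMatcher(None, a.lower().split(), b.lower().split()).ratio()


def select_tm_or_mt(source, tm, threshold=0.75):
    """Return ("TM", target, score) for the best match at or above `threshold`, else ("MT", None, score)."""
    best_target, best_score = None, 0.0
    for tm_source, tm_target in tm:
        score = fuzzy_score(source, tm_source)
        if score > best_score:
            best_target, best_score = tm_target, score
    if best_target is not None and best_score >= threshold:
        return "TM", best_target, best_score
    return "MT", None, best_score
```

Against a TM containing "Click Save to continue", the source "Click Save to continue now" scores 8/9 and reuses the TM target, while "Delete all files in this folder" scores 0 and is routed to MT.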
Questions

Language_Tools_Group_all@welocalize.com
	
  
We speak MT - the language of the future

Más contenido relacionado

La actualidad más candente

Mahendra-9+yrs-testing-VMware Resume
Mahendra-9+yrs-testing-VMware ResumeMahendra-9+yrs-testing-VMware Resume
Mahendra-9+yrs-testing-VMware Resume
mahi aluri
 
US Resume Rohit Sharma
US Resume Rohit SharmaUS Resume Rohit Sharma
US Resume Rohit Sharma
Rohit Sharma
 
Fion_CV_20102016
Fion_CV_20102016Fion_CV_20102016
Fion_CV_20102016
Fion Yong
 
Long Term IT Contract Positions
Long Term IT Contract PositionsLong Term IT Contract Positions
Long Term IT Contract Positions
swanhrconsulting
 
Amit Kumar Dash Resume_QA Lead
Amit Kumar Dash Resume_QA LeadAmit Kumar Dash Resume_QA Lead
Amit Kumar Dash Resume_QA Lead
Amit Dash
 
Scott A Frantz resume
Scott A Frantz resumeScott A Frantz resume
Scott A Frantz resume
Scott Frantz
 
Ankita_Bhatnagar_TestLead_FSI_1.0
Ankita_Bhatnagar_TestLead_FSI_1.0Ankita_Bhatnagar_TestLead_FSI_1.0
Ankita_Bhatnagar_TestLead_FSI_1.0
Ankita Bhatnagar
 
SandeepKola_CAPPM_Consultant
SandeepKola_CAPPM_ConsultantSandeepKola_CAPPM_Consultant
SandeepKola_CAPPM_Consultant
Sandeep Kola
 

La actualidad más candente (20)

Mahendra-9+yrs-testing-VMware Resume
Mahendra-9+yrs-testing-VMware ResumeMahendra-9+yrs-testing-VMware Resume
Mahendra-9+yrs-testing-VMware Resume
 
US Resume Rohit Sharma
US Resume Rohit SharmaUS Resume Rohit Sharma
US Resume Rohit Sharma
 
5 challenges of scaling l10n workflows KantanMT/bmmt webinar
5 challenges of scaling l10n workflows KantanMT/bmmt webinar5 challenges of scaling l10n workflows KantanMT/bmmt webinar
5 challenges of scaling l10n workflows KantanMT/bmmt webinar
 
KotaSriHarsha
KotaSriHarsha KotaSriHarsha
KotaSriHarsha
 
Fion_CV_20102016
Fion_CV_20102016Fion_CV_20102016
Fion_CV_20102016
 
Software developer occupational brief
Software developer occupational briefSoftware developer occupational brief
Software developer occupational brief
 
Nripendra
NripendraNripendra
Nripendra
 
Long Term IT Contract Positions
Long Term IT Contract PositionsLong Term IT Contract Positions
Long Term IT Contract Positions
 
Amit Kumar Dash Resume_QA Lead
Amit Kumar Dash Resume_QA LeadAmit Kumar Dash Resume_QA Lead
Amit Kumar Dash Resume_QA Lead
 
Scott A Frantz resume
Scott A Frantz resumeScott A Frantz resume
Scott A Frantz resume
 
Saravanaperumal b
Saravanaperumal bSaravanaperumal b
Saravanaperumal b
 
Bhavani_D_Testing
Bhavani_D_TestingBhavani_D_Testing
Bhavani_D_Testing
 
Mainframes project
Mainframes projectMainframes project
Mainframes project
 
Sumesh_Appunni_CV
Sumesh_Appunni_CVSumesh_Appunni_CV
Sumesh_Appunni_CV
 
Ankita_Bhatnagar_TestLead_FSI_1.0
Ankita_Bhatnagar_TestLead_FSI_1.0Ankita_Bhatnagar_TestLead_FSI_1.0
Ankita_Bhatnagar_TestLead_FSI_1.0
 
DeepaShetty
DeepaShettyDeepaShetty
DeepaShetty
 
K Sreedhar
K SreedharK Sreedhar
K Sreedhar
 
Writwik (1)
Writwik (1)Writwik (1)
Writwik (1)
 
Bhanu prasad profile
Bhanu prasad profileBhanu prasad profile
Bhanu prasad profile
 
SandeepKola_CAPPM_Consultant
SandeepKola_CAPPM_ConsultantSandeepKola_CAPPM_Consultant
SandeepKola_CAPPM_Consultant
 

Destacado

Destacado (20)

Terminology Life Cycle Management Increasing Company-Wide Terminology Collabo...
Terminology Life Cycle Management Increasing Company-Wide Terminology Collabo...Terminology Life Cycle Management Increasing Company-Wide Terminology Collabo...
Terminology Life Cycle Management Increasing Company-Wide Terminology Collabo...
 
Common industry API for translation services presented by TAUS at FEISGILTT
Common industry API for translation services presented by TAUS at FEISGILTTCommon industry API for translation services presented by TAUS at FEISGILTT
Common industry API for translation services presented by TAUS at FEISGILTT
 
TAUS Best Practices Error Typology Guidelines
TAUS Best Practices Error Typology GuidelinesTAUS Best Practices Error Typology Guidelines
TAUS Best Practices Error Typology Guidelines
 
TAUS Best Practices Adequacy/Fluency Guidelines
TAUS Best Practices Adequacy/Fluency GuidelinesTAUS Best Practices Adequacy/Fluency Guidelines
TAUS Best Practices Adequacy/Fluency Guidelines
 
TAUS USER CONFERENCE 2010, Machine translation in the imperfect world - Pract...
TAUS USER CONFERENCE 2010, Machine translation in the imperfect world - Pract...TAUS USER CONFERENCE 2010, Machine translation in the imperfect world - Pract...
TAUS USER CONFERENCE 2010, Machine translation in the imperfect world - Pract...
 
TAUS USER CONFERENCE 2010, What’s on the horizon? The research agenda
TAUS USER CONFERENCE 2010, What’s on the horizon? The research agendaTAUS USER CONFERENCE 2010, What’s on the horizon? The research agenda
TAUS USER CONFERENCE 2010, What’s on the horizon? The research agenda
 
TAUS OPEN SOURCE MACHINE TRANSLATION SHOWCASE, Seattle, Language Processing T...
TAUS OPEN SOURCE MACHINE TRANSLATION SHOWCASE, Seattle, Language Processing T...TAUS OPEN SOURCE MACHINE TRANSLATION SHOWCASE, Seattle, Language Processing T...
TAUS OPEN SOURCE MACHINE TRANSLATION SHOWCASE, Seattle, Language Processing T...
 
WEBINAR: TAUS Outlook 2013
WEBINAR: TAUS Outlook 2013WEBINAR: TAUS Outlook 2013
WEBINAR: TAUS Outlook 2013
 
TAUS USER CONFERENCE 2010, The Deep Hybrid machine translation engine
TAUS USER CONFERENCE 2010, The Deep Hybrid machine translation engineTAUS USER CONFERENCE 2010, The Deep Hybrid machine translation engine
TAUS USER CONFERENCE 2010, The Deep Hybrid machine translation engine
 
Mirai Translate - TAUS Tokyo 2015
Mirai Translate - TAUS Tokyo 2015Mirai Translate - TAUS Tokyo 2015
Mirai Translate - TAUS Tokyo 2015
 
How to keep post-editors engaged and prevent attrition. (Jose Sanchez, eBay)
How to keep post-editors engaged and prevent attrition. (Jose Sanchez, eBay)How to keep post-editors engaged and prevent attrition. (Jose Sanchez, eBay)
How to keep post-editors engaged and prevent attrition. (Jose Sanchez, eBay)
 
TAUS New Year's Reception 2014
TAUS New Year's Reception 2014TAUS New Year's Reception 2014
TAUS New Year's Reception 2014
 
TAUS Moses Roundtable, Prague, 11 September 2013
TAUS Moses Roundtable, Prague, 11 September 2013TAUS Moses Roundtable, Prague, 11 September 2013
TAUS Moses Roundtable, Prague, 11 September 2013
 
TAUS USER CONFERENCE 2010, Man, Machine and advanced translation memory lever...
TAUS USER CONFERENCE 2010, Man, Machine and advanced translation memory lever...TAUS USER CONFERENCE 2010, Man, Machine and advanced translation memory lever...
TAUS USER CONFERENCE 2010, Man, Machine and advanced translation memory lever...
 
MT domain customization – conditions and benefits. Chris Wendt (Microsoft)
MT domain customization – conditions and benefits. Chris Wendt (Microsoft)MT domain customization – conditions and benefits. Chris Wendt (Microsoft)
MT domain customization – conditions and benefits. Chris Wendt (Microsoft)
 
Workshop on Multilingual Data Value Chains in the Digital Single Market, 16 ...
Workshop on Multilingual Data Value Chains in the Digital Single Market,  16 ...Workshop on Multilingual Data Value Chains in the Digital Single Market,  16 ...
Workshop on Multilingual Data Value Chains in the Digital Single Market, 16 ...
 
TAUS Machine Translation Showcase, The Simplified Guide to Getting Started in...
TAUS Machine Translation Showcase, The Simplified Guide to Getting Started in...TAUS Machine Translation Showcase, The Simplified Guide to Getting Started in...
TAUS Machine Translation Showcase, The Simplified Guide to Getting Started in...
 
TAUS MT SHOWCASE, Microsoft Translator, Chris Wendt, Microsoft, 10 October 2013
TAUS MT SHOWCASE,  Microsoft Translator, Chris Wendt, Microsoft, 10 October 2013TAUS MT SHOWCASE,  Microsoft Translator, Chris Wendt, Microsoft, 10 October 2013
TAUS MT SHOWCASE, Microsoft Translator, Chris Wendt, Microsoft, 10 October 2013
 
TAUS OPEN SOURCE MACHINE TRANSLATION SHOWCASE, Paris, Fred Hollowood, Symante...
TAUS OPEN SOURCE MACHINE TRANSLATION SHOWCASE, Paris, Fred Hollowood, Symante...TAUS OPEN SOURCE MACHINE TRANSLATION SHOWCASE, Paris, Fred Hollowood, Symante...
TAUS OPEN SOURCE MACHINE TRANSLATION SHOWCASE, Paris, Fred Hollowood, Symante...
 
Quality Management in Localization Certification
Quality Management in Localization CertificationQuality Management in Localization Certification
Quality Management in Localization Certification
 

Similar a TAUS MT SHOWCASE, The WeMT Program, Olga Beregovaya, Welocalize, 10 October 2013

Resume Manoj Kumar M
Resume Manoj Kumar MResume Manoj Kumar M
Resume Manoj Kumar M
Manoj Kumar
 
Asha Jacob_Resume
Asha Jacob_ResumeAsha Jacob_Resume
Asha Jacob_Resume
Asha Jacob
 
Bira-Cunha_Resume V3
Bira-Cunha_Resume V3Bira-Cunha_Resume V3
Bira-Cunha_Resume V3
Bira cunha
 

Similar a TAUS MT SHOWCASE, The WeMT Program, Olga Beregovaya, Welocalize, 10 October 2013 (20)

WeMT Tools and Processes Welocalize TAUS Showcase October 2013 Localization W...
WeMT Tools and Processes Welocalize TAUS Showcase October 2013 Localization W...WeMT Tools and Processes Welocalize TAUS Showcase October 2013 Localization W...
WeMT Tools and Processes Welocalize TAUS Showcase October 2013 Localization W...
 
MT Summit 2013 Welocalize Getting the MT Recipe Right by L Casanellas and L Marg
MT Summit 2013 Welocalize Getting the MT Recipe Right by L Casanellas and L MargMT Summit 2013 Welocalize Getting the MT Recipe Right by L Casanellas and L Marg
MT Summit 2013 Welocalize Getting the MT Recipe Right by L Casanellas and L Marg
 
TAUS MT SHOWCASE, Creating Competitive Advantage with Rapid Customization & D...
TAUS MT SHOWCASE, Creating Competitive Advantage with Rapid Customization & D...TAUS MT SHOWCASE, Creating Competitive Advantage with Rapid Customization & D...
TAUS MT SHOWCASE, Creating Competitive Advantage with Rapid Customization & D...
 
What machine translation developers are doing to make post-editors happy
What machine translation developers are doing to make post-editors happyWhat machine translation developers are doing to make post-editors happy
What machine translation developers are doing to make post-editors happy
 
Resume Aditya Santhanam
Resume Aditya SanthanamResume Aditya Santhanam
Resume Aditya Santhanam
 
Senior Quality Analyst
Senior Quality AnalystSenior Quality Analyst
Senior Quality Analyst
 
Consulting
ConsultingConsulting
Consulting
 
MOND Semantics Integration
MOND Semantics IntegrationMOND Semantics Integration
MOND Semantics Integration
 
Software Measurement: Lecture 3. Metrics in Organization
Software Measurement: Lecture 3. Metrics in OrganizationSoftware Measurement: Lecture 3. Metrics in Organization
Software Measurement: Lecture 3. Metrics in Organization
 
Yogesh Keshaowar_Profile
Yogesh Keshaowar_ProfileYogesh Keshaowar_Profile
Yogesh Keshaowar_Profile
 
Cs 568 Spring 10 Lecture 5 Estimation
Cs 568 Spring 10  Lecture 5 EstimationCs 568 Spring 10  Lecture 5 Estimation
Cs 568 Spring 10 Lecture 5 Estimation
 
Resume Manoj Kumar M
Resume Manoj Kumar MResume Manoj Kumar M
Resume Manoj Kumar M
 
RUG-Asia - ALM
RUG-Asia - ALMRUG-Asia - ALM
RUG-Asia - ALM
 
Resume
ResumeResume
Resume
 
RAJA_TripleA_7Y_EXP
RAJA_TripleA_7Y_EXPRAJA_TripleA_7Y_EXP
RAJA_TripleA_7Y_EXP
 
Improving Translator Productivity with MT: A Patent Translation Case Study
Improving Translator Productivity with MT: A Patent Translation Case StudyImproving Translator Productivity with MT: A Patent Translation Case Study
Improving Translator Productivity with MT: A Patent Translation Case Study
 
Asha Jacob_Resume
Asha Jacob_ResumeAsha Jacob_Resume
Asha Jacob_Resume
 
Lambert_testing
Lambert_testingLambert_testing
Lambert_testing
 
Bira-Cunha_Resume V3
Bira-Cunha_Resume V3Bira-Cunha_Resume V3
Bira-Cunha_Resume V3
 
Sneha_Resume
Sneha_ResumeSneha_Resume
Sneha_Resume
 

Más de TAUS - The Language Data Network

Más de TAUS - The Language Data Network (20)

TAUS Global Content Summit Amsterdam 2019 / Beyond MT. A few premature reflec...
TAUS Global Content Summit Amsterdam 2019 / Beyond MT. A few premature reflec...TAUS Global Content Summit Amsterdam 2019 / Beyond MT. A few premature reflec...
TAUS Global Content Summit Amsterdam 2019 / Beyond MT. A few premature reflec...
 
TAUS Global Content Summit Amsterdam 2019 / Measure with DQF, Dace Dzeguze (T...
TAUS Global Content Summit Amsterdam 2019 / Measure with DQF, Dace Dzeguze (T...TAUS Global Content Summit Amsterdam 2019 / Measure with DQF, Dace Dzeguze (T...
TAUS Global Content Summit Amsterdam 2019 / Measure with DQF, Dace Dzeguze (T...
 
TAUS Global Content Summit Amsterdam 2019 / Automatic for the People by Domin...
TAUS Global Content Summit Amsterdam 2019 / Automatic for the People by Domin...TAUS Global Content Summit Amsterdam 2019 / Automatic for the People by Domin...
TAUS Global Content Summit Amsterdam 2019 / Automatic for the People by Domin...
 
TAUS Global Content Summit Amsterdam 2019 / The Quantum Leap: Human Parity, C...
TAUS Global Content Summit Amsterdam 2019 / The Quantum Leap: Human Parity, C...TAUS Global Content Summit Amsterdam 2019 / The Quantum Leap: Human Parity, C...
TAUS Global Content Summit Amsterdam 2019 / The Quantum Leap: Human Parity, C...
 
TAUS Global Content Summit Amsterdam 2019 / Growing Business by Connecting Co...
TAUS Global Content Summit Amsterdam 2019 / Growing Business by Connecting Co...TAUS Global Content Summit Amsterdam 2019 / Growing Business by Connecting Co...
TAUS Global Content Summit Amsterdam 2019 / Growing Business by Connecting Co...
 
Achieving Translation Efficiency and Accuracy for Video Content, Xiao Yuan (P...
Achieving Translation Efficiency and Accuracy for Video Content, Xiao Yuan (P...Achieving Translation Efficiency and Accuracy for Video Content, Xiao Yuan (P...
Achieving Translation Efficiency and Accuracy for Video Content, Xiao Yuan (P...
 
Introduction Innovation Contest Shenzhen by Henri Broekmate (Lionbridge)
Introduction Innovation Contest Shenzhen by Henri Broekmate (Lionbridge)Introduction Innovation Contest Shenzhen by Henri Broekmate (Lionbridge)
Introduction Innovation Contest Shenzhen by Henri Broekmate (Lionbridge)
 
Game Changer for Linguistic Review: Shifting the Paradigm, Klaus Fleischmann...
 Game Changer for Linguistic Review: Shifting the Paradigm, Klaus Fleischmann... Game Changer for Linguistic Review: Shifting the Paradigm, Klaus Fleischmann...
Game Changer for Linguistic Review: Shifting the Paradigm, Klaus Fleischmann...
 
A translation memory P2P trading platform - to make global translation memory...
A translation memory P2P trading platform - to make global translation memory...A translation memory P2P trading platform - to make global translation memory...
A translation memory P2P trading platform - to make global translation memory...
 
Shiyibao — The Most Efficient Translation Feedback System Ever, Guanqing Hao ...
Shiyibao — The Most Efficient Translation Feedback System Ever, Guanqing Hao ...Shiyibao — The Most Efficient Translation Feedback System Ever, Guanqing Hao ...
Shiyibao — The Most Efficient Translation Feedback System Ever, Guanqing Hao ...
 
Stepes – Instant Human Translation Services for the Digital World, Carl Yao (...
Stepes – Instant Human Translation Services for the Digital World, Carl Yao (...Stepes – Instant Human Translation Services for the Digital World, Carl Yao (...
Stepes – Instant Human Translation Services for the Digital World, Carl Yao (...
 
Farmer Lv (TrueTran)
Farmer Lv (TrueTran)Farmer Lv (TrueTran)
Farmer Lv (TrueTran)
 
Smart Translation Resource Management: Semantic Matching, Kirk Zhang (Wiitran...
Smart Translation Resource Management: Semantic Matching, Kirk Zhang (Wiitran...Smart Translation Resource Management: Semantic Matching, Kirk Zhang (Wiitran...
Smart Translation Resource Management: Semantic Matching, Kirk Zhang (Wiitran...
 
The Theory and Practice of Computer Aided Translation Training System, Liu Q...
 The Theory and Practice of Computer Aided Translation Training System, Liu Q... The Theory and Practice of Computer Aided Translation Training System, Liu Q...
The Theory and Practice of Computer Aided Translation Training System, Liu Q...
 
Translation Technology Showcase in Shenzhen
Translation Technology Showcase in ShenzhenTranslation Technology Showcase in Shenzhen
Translation Technology Showcase in Shenzhen
 
How to efficiently use large-scale TMs in translation, Jing Zhang (Tmxmall)
How to efficiently use large-scale TMs in translation, Jing Zhang (Tmxmall)How to efficiently use large-scale TMs in translation, Jing Zhang (Tmxmall)
How to efficiently use large-scale TMs in translation, Jing Zhang (Tmxmall)
 
SDL Trados Studio 2017, Jocelyn He (SDL)
SDL Trados Studio 2017, Jocelyn He (SDL)SDL Trados Studio 2017, Jocelyn He (SDL)
SDL Trados Studio 2017, Jocelyn He (SDL)
 
How we train post-editors - Yongpeng Wei (Lingosail)
How we train post-editors - Yongpeng Wei (Lingosail)How we train post-editors - Yongpeng Wei (Lingosail)
How we train post-editors - Yongpeng Wei (Lingosail)
 
A use-case for getting MT into your company, Kerstin Berns (berns language c...
 A use-case for getting MT into your company, Kerstin Berns (berns language c... A use-case for getting MT into your company, Kerstin Berns (berns language c...
A use-case for getting MT into your company, Kerstin Berns (berns language c...
 
QE integrated in XTM, by Bob Willans (XTM)
QE integrated in XTM, by Bob Willans (XTM)QE integrated in XTM, by Bob Willans (XTM)
QE integrated in XTM, by Bob Willans (XTM)
 

Último

Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Safe Software
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
Joaquim Jorge
 
Why Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire businessWhy Teams call analytics are critical to your entire business
Why Teams call analytics are critical to your entire business
panagenda
 

Último (20)

Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers:  A Deep Dive into Serverless Spatial Data and FMECloud Frontiers:  A Deep Dive into Serverless Spatial Data and FME
Cloud Frontiers: A Deep Dive into Serverless Spatial Data and FME
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...2024: Domino Containers - The Next Step. News from the Domino Container commu...
2024: Domino Containers - The Next Step. News from the Domino Container commu...
 
Apidays New York 2024 - The Good, the Bad and the Governed by David O'Neill, ...
TAUS MT SHOWCASE, The WeMT Program, Olga Beregovaya, Welocalize, 10 October 2013

  • 1. TAUS MACHINE TRANSLATION SHOWCASE: The WeMT Program. 10:20 – 10:40, Thursday, 10 October 2013. Olga Beregovaya, Welocalize
  • 2. WeMT Tools and Processes
  • 3. We’ll talk about: • MT Programs • Metrics • Engines • Language Tools
  • 4. Current MT Programs
      • Dell – 27 languages
      • Autodesk – 11 languages
      • PayPal – 8 languages
      • Cisco – 17 languages across 3 tiers
      • Intuit – 20+ languages
      • Microsoft (pre-project support)
      • McAfee (pilot)
      • … many more in pilot stage
  • 5. MT Program: Path-to-Success Components
      • A set of MT engines – “mix and match”
      • TMT selection mechanisms
      • Post-editing environment
      • Processes and metrics
      • Data gathering and reporting tool – what, how much, how fast and at what effort
      • EDUCATION EDUCATION EDUCATION
      • CHANGE
    The recipe for success
  • 6. Process and Workflow
    All aspects of the localization ecosystem are taken into consideration.
      • Selecting the right MT engine: by using our MT engine selection scorecard, we make sure all important KPIs are taken into consideration at selection time.
      • Empowerment through education: internally, through customized toolkits; externally, through specialised training.
      • The feedback loop: constructive communication from post-editor to MT provider.
    MT KPIs:
      ✓ Productivity: Throughputs
      ✓ Productivity: Delta
      ✓ Quality: LQA
      ✓ Quality: Automatic Scores
      ✓ Cost
      ✓ GlobalSight: Connectivity
      ✓ GlobalSight: Tagging
      ✓ Human Evaluation
      ✓ Customization: Internal/External
      ✓ Customization: Time
  • 7. MT Program Design – Source
      • Source content classification (e.g. marketing/UI/UA/UGC)
      • Length of the source segment
      • Morpho-syntactic complexity of the source segment
      • Presence or absence of pre-defined glossary terms or multi-word glossary elements, UI elements, numeric variables, product lists, ‘do-not-translate’ and transliteration lists
      • Tag density: metadata attributes and their representation in localization-industry standard formats (“tags”)
      • ROC: quality levels based on content use (“impact”)
      • 3D Model: expected productivity mapped to desired quality levels and source content complexity
  • 8. MT Engine Selection Scorecard
      • Productivity – Throughputs: number of post-edited words per hour
      • Productivity – Delta: percentage difference between translation and post-editing time
      • Cost: extrapolation, cost per word
      • CMS – Connectivity: is there a connector in place?
      • Quality/Nature of source
      • Quality (Final) – LQA: internal quality verification
      • Quality (MT) – Automatic Scores: a set of automatic scoring systems is used
    “We have tested and used different engines, so we’ve seen the good, the bad and the ugly; now we can better appreciate what we have.”
  • 9. Scorecard – Metrics
    Overall data (score per KPI for engines #1 to #4)
    German
      KPI                                 #1  #2  #3  #4
      Productivity                         4   4   4   4
      Productivity increase                5   4   1   3
      Quality – LQA                        2   2   1   2
      Quality – Automatic scores           3   3   3   3
      Cost                                 4   2   3   3
      GlobalSight – Connectivity           4   3   2   4
      GlobalSight – Tagging                4   2   4   2
      Human evaluation                     3   3   3   4
      Customization – Internal/External    4   2   3   3
      Customization – Time                 3   1   2   1
      Total                               36  26  26  29
    French
      KPI                                 #1  #2  #3  #4
      Productivity                         4   5   3   4
      Productivity increase                5   5   1   4
      Quality – LQA                        5   3   3   4
      Quality – Automatic scores           3   4   3   3
      Cost                                 4   2   3   3
      GlobalSight – Connectivity           4   3   2   4
      GlobalSight – Tagging                4   2   2   2
      Human evaluation                     3   3   3   3
      Customization – Internal/External    4   2   3   3
      Customization – Time                 3   1   2   1
      Total                               39  30  25  31
    Productivity metrics • Automatic scoring • Human evaluation
  • 10. Toolkits and Trainings
    Our experience:
      ✓ Most translators know and have experienced post-editing, but they have limited knowledge of related aspects (automatic scoring, output differences between RBMT and SMT...).
      ✓ The majority of people who work in localization have heard about MT, but most of them still find it a daunting subject.
    Our answer:
      ✓ Continuous MT- and PE-related training and documentation for language providers
      ✓ Customized toolkits for different internal departments (Production, Quality, Sales, Vendor Management)
  • 11. Transparency and Ownership
      • Theory – knowledge foundations
      • Practice – customized PE sessions for different client accounts
      • Transparency – process, engine selection/customization, evaluations
      • Responsibility – valid evaluations, constructive feedback, quality ownership
    “Training helps a lot. After I was told some of the background information and tips and tricks for certain engines/outputs, I was much more relaxed and happy to give MT a go.”
  • 12. Legacy data – best prediction tool: statistics from the legacy knowledge base
  • 13. The feedback loop
      “For me the biggest advantage would be the possibility to implement a client terminology list [in SMT].”
      “I wish we could easily fix the corpus for outdated terminology and characters.”
      “Teach the engine to properly cope with sentences containing more than one verb and/or verbs in progressive form.”
      “Engine retraining significantly improved the handling of tags and spaces around tags; this is a productive achievement, as it saves us a lot of manual corrections.”
  • 14. Feedback and Engine Improvement
  • 15. “Beyond the Engine” Tools
      • Teaminology – crowdsourcing platform for centralized term governance; simultaneous concordance search of TMs and term bases, yielding clean training data
      • Dispatcher – a global community content translation application that connects user-generated content (UGC), including live chats, social media, forums, comments and knowledge bases, to customized machine translation (MT) engines for real-time translation
      • Source Candidate Scorer – scoring of candidate sentences against historically good and bad sentences based on POS and perplexity
      • Corpus Preparation Toolkit – set of applications that streamline data preparation for MT engine training
  • 18. Source Candidate Scorer: compares your source content to “the good” and “the bad” legacy segments and estimates its potential suitability for MT
  • 19. Corpus Preparation Suite
    A variety of tools to prepare the corpus for training MT engines, such as:
      • Deleting formatting tags from TMX
      • Removing double spaces
      • Removing duplicated punctuation (e.g. commas)
      • Deleting segments where source = target
      • Deleting segments containing only URLs
      • Escaping characters
      • Removing duplicate sentences
  • 20. Corpus Preparation: TM Creator – aggregates training data from various relevant sources
  • 21. Corpus Preparation: TMX Splitter – extracts the relevant training corpus based on the TMX metadata
  • 22. Welocalize Moses Implementation
      • Why? Far more control over engine quality, since we can control corpus preparation and output post-processing
      • Control over metadata handling
      • Ties into our company’s open-source philosophy
      • We have experienced personnel in-house
      • We can extend and customize Moses functionality as necessary
      • We have a connector to our TMS (GlobalSight)
    RESULTS: In our internal tests with Moses/DoMT, we are getting automated scores similar to those of commercial engines for the languages into which we localize most. Human evaluators gave the same feedback.
  • 23. … And it works! We are in a position to offer realistic discounts and aggressive timelines while providing quality levels appropriate for the content.
  • 24. “Work-in-progress” Projects
      • Ongoing improvements to our adaptation of the iOmegaT tool (Welocalize/CNGL)
      • Industry partner in the CNGL “Source Content Profiler” project
      • Adoption of TMTPrime (CNGL) – MT vs. fuzzy match selection mechanism
      • Language- and content-specific pre-processing for the in-house Moses deployment
      • Teaminology – adding linguistic intelligence
  • 25. Questions
    Language_Tools_Group_all@welocalize.com
    We speak MT – the language of the future
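Several of the source-side factors listed on slide 7 (segment length, tag density, glossary and do-not-translate terms, numeric variables) are straightforward to measure per segment. The Python sketch below shows one possible way to compute them; the regular expressions, the `SourceFeatures` fields and the `source_features` function are hypothetical illustrations, not the WeMT implementation, and morpho-syntactic complexity is left out because it needs a parser.

```python
import re
from dataclasses import dataclass

# Assumption: inline markup looks like <b> ... </b> in the exported segment.
TAG_RE = re.compile(r"</?[A-Za-z][^>]*>")
# Numeric variables such as 3, 10.5 or 1,000 (whole tokens only).
NUMBER_RE = re.compile(r"(?<!\w)\d+(?:[.,]\d+)?(?!\w)")


@dataclass
class SourceFeatures:
    length: int           # word tokens, tags excluded
    tag_density: float    # tags per word token
    numbers: int          # numeric variables found
    glossary_hits: int    # glossary terms (single- or multi-word) present
    dnt_hits: int         # do-not-translate entries present


def _count_terms(text: str, terms: list[str]) -> int:
    """Count how many of the terms occur as whole words, ignoring case."""
    return sum(
        1
        for term in terms
        if re.search(r"(?<!\w)" + re.escape(term) + r"(?!\w)", text, re.IGNORECASE)
    )


def source_features(segment: str, glossary: list[str], dnt: list[str]) -> SourceFeatures:
    """Compute simple complexity features for one source segment (hypothetical)."""
    tags = TAG_RE.findall(segment)
    text = TAG_RE.sub(" ", segment)   # replace tags with spaces before tokenizing
    words = text.split()              # whitespace tokens; punctuation stays attached
    return SourceFeatures(
        length=len(words),
        tag_density=len(tags) / len(words) if words else 0.0,
        numbers=len(NUMBER_RE.findall(text)),
        glossary_hits=_count_terms(text, glossary),
        dnt_hits=_count_terms(text, dnt),
    )
```

Whitespace tokenization is a deliberate simplification; a real profiler would likely use a language-aware tokenizer, and the resulting features could feed the kind of "3D model" mapping complexity to expected productivity that slide 7 mentions.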
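The two productivity KPIs on slide 8 reduce to simple arithmetic. A minimal sketch, assuming that throughput is post-edited words divided by hours and that the delta is the time saved by post-editing, expressed as a percentage of the time needed to translate the same text from scratch (the slide does not give the exact formula). The `cost_per_word` helper, which spreads a hypothetical hourly rate over the measured throughput, is likewise only one possible reading of "extrapolation, cost per word".

```python
def throughput(words: int, hours: float) -> float:
    """Productivity - Throughputs: post-edited words per hour."""
    if hours <= 0:
        raise ValueError("hours must be positive")
    return words / hours


def productivity_delta(translation_hours: float, post_editing_hours: float) -> float:
    """Productivity - Delta (assumed definition): time saved by post-editing,
    as a percentage of the time needed to translate the same text from scratch."""
    if translation_hours <= 0:
        raise ValueError("translation_hours must be positive")
    return (translation_hours - post_editing_hours) / translation_hours * 100.0


def cost_per_word(hourly_rate: float, words_per_hour: float) -> float:
    """Hypothetical cost extrapolation: hourly rate divided by throughput."""
    if words_per_hour <= 0:
        raise ValueError("words_per_hour must be positive")
    return hourly_rate / words_per_hour
```

With illustrative figures, 4,000 words post-edited in 5 hours gives a throughput of 800 words per hour; if translating the same text from scratch takes 10 hours and post-editing takes 6, the delta is 40%.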
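The totals on slide 9 are unweighted sums of the per-KPI scores for each engine. The sketch below reproduces them from the slide's own German and French figures and picks the highest-scoring engine; the data layout and the `totals`/`best_engine` helpers are illustrative assumptions rather than the actual scorecard tool.

```python
# KPI scores per engine (#1..#4), copied from the slide 9 scorecard.
GERMAN = {
    "Productivity": [4, 4, 4, 4],
    "Productivity increase": [5, 4, 1, 3],
    "Quality - LQA": [2, 2, 1, 2],
    "Quality - Automatic scores": [3, 3, 3, 3],
    "Cost": [4, 2, 3, 3],
    "GlobalSight - Connectivity": [4, 3, 2, 4],
    "GlobalSight - Tagging": [4, 2, 4, 2],
    "Human evaluation": [3, 3, 3, 4],
    "Customization - Internal/External": [4, 2, 3, 3],
    "Customization - Time": [3, 1, 2, 1],
}

FRENCH = {
    "Productivity": [4, 5, 3, 4],
    "Productivity increase": [5, 5, 1, 4],
    "Quality - LQA": [5, 3, 3, 4],
    "Quality - Automatic scores": [3, 4, 3, 3],
    "Cost": [4, 2, 3, 3],
    "GlobalSight - Connectivity": [4, 3, 2, 4],
    "GlobalSight - Tagging": [4, 2, 2, 2],
    "Human evaluation": [3, 3, 3, 3],
    "Customization - Internal/External": [4, 2, 3, 3],
    "Customization - Time": [3, 1, 2, 1],
}


def totals(scores: dict[str, list[int]]) -> list[int]:
    """Sum each engine's column across all KPIs (equal weights, as on the slide)."""
    return [sum(column) for column in zip(*scores.values())]


def best_engine(scores: dict[str, list[int]]) -> tuple[int, int]:
    """Return (1-based engine number, total); ties go to the lower engine number."""
    engine_totals = totals(scores)
    best = max(range(len(engine_totals)), key=lambda i: engine_totals[i])
    return best + 1, engine_totals[best]
```

Keeping the scores in one table per language makes it easy to see which KPIs drive a ranking; for example, engine #1 wins for both languages largely on productivity increase and connectivity.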
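Slides 15 and 18 describe the Source Candidate Scorer as comparing new source sentences with historically “good” and “bad” legacy segments using POS and perplexity. The exact scoring is not given, so the sketch below assumes one simple reading: train an add-one-smoothed bigram language model on each legacy set and score a candidate by how much lower its perplexity is under the “good” model. The token lists can be words or POS tags from whatever tagger you use (none is included here); `BigramLM` and `suitability` are hypothetical names.

```python
import math
from collections import Counter


class BigramLM:
    """Add-one (Laplace) smoothed bigram language model over token lists."""

    def __init__(self, sentences: list[list[str]]):
        self.context_counts = Counter()   # how often each token is followed by another
        self.bigram_counts = Counter()
        vocab = set()
        for tokens in sentences:
            padded = ["<s>"] + tokens + ["</s>"]
            vocab.update(padded)
            self.context_counts.update(padded[:-1])
            self.bigram_counts.update(zip(padded[:-1], padded[1:]))
        # +1 reserves probability mass for tokens never seen in training.
        self.vocab_size = len(vocab) + 1

    def perplexity(self, tokens: list[str]) -> float:
        padded = ["<s>"] + tokens + ["</s>"]
        log_prob = 0.0
        for prev, cur in zip(padded[:-1], padded[1:]):
            count = self.bigram_counts[(prev, cur)]
            log_prob += math.log((count + 1) / (self.context_counts[prev] + self.vocab_size))
        return math.exp(-log_prob / (len(padded) - 1))


def suitability(candidate: list[str], good: BigramLM, bad: BigramLM) -> float:
    """Log-perplexity margin: positive when the candidate looks more like 'good' segments."""
    return math.log(bad.perplexity(candidate)) - math.log(good.perplexity(candidate))
```

Add-one smoothing keeps the sketch short; with realistic corpus sizes, a toolkit language model with better smoothing (for example Kneser-Ney) would be a more typical choice.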
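Most of the corpus clean-up steps listed on slide 19 map onto a few regular expressions and a set lookup. A minimal sketch, assuming each segment arrives as the raw inner XML of a TMX `<seg>`; the function names are hypothetical, ellipses are deliberately not collapsed, and the escape map is modelled on Moses-style escaping of special characters, so check it against the toolchain you actually train with.

```python
import html
import re

# Paired TMX inline elements whose content is native markup (e.g. "&lt;b&gt;"), not text.
# (?<!/) skips self-closing forms like <ph x="1"/>, which ANY_TAG_RE removes instead.
NATIVE_CODE_RE = re.compile(r"<(bpt|ept|ph|it|ut)\b[^>]*(?<!/)>.*?</\1>", re.DOTALL)
ANY_TAG_RE = re.compile(r"<[^>]+>")
DUP_PUNCT_RE = re.compile(r"([,;:])\1+")          # ",," -> ","; "..." is left alone
MULTISPACE_RE = re.compile(r" {2,}")
URL_ONLY_RE = re.compile(r"(?:https?://|www\.)\S+")
# Assumed escape map, modelled on Moses-style special-character escaping.
ESCAPES = {"&": "&amp;", "|": "&#124;", "<": "&lt;", ">": "&gt;",
           "'": "&apos;", '"': "&quot;", "[": "&#91;", "]": "&#93;"}


def clean_segment(raw: str) -> str:
    """Strip formatting tags, unescape entities, collapse repeated punctuation and spaces."""
    text = NATIVE_CODE_RE.sub("", raw)   # drop paired tags together with their native code
    text = ANY_TAG_RE.sub("", text)      # drop remaining tags such as <hi>, keep their text
    text = html.unescape(text)           # "&amp;" -> "&" so escaping is not applied twice
    text = DUP_PUNCT_RE.sub(r"\1", text)
    return MULTISPACE_RE.sub(" ", text).strip()


def escape(text: str) -> str:
    """Replace characters that are special to the (assumed) training toolchain."""
    return "".join(ESCAPES.get(ch, ch) for ch in text)


def prepare_corpus(pairs: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Clean (source, target) pairs and drop empty, untranslated, URL-only and duplicate ones."""
    seen = set()
    cleaned = []
    for raw_src, raw_tgt in pairs:
        src, tgt = clean_segment(raw_src), clean_segment(raw_tgt)
        if not src or not tgt or src == tgt:       # empty or source = target
            continue
        if URL_ONLY_RE.fullmatch(src) or URL_ONLY_RE.fullmatch(tgt):  # URL-only segments
            continue
        pair = (escape(src), escape(tgt))
        if pair not in seen:                       # keep the first occurrence only
            seen.add(pair)
            cleaned.append(pair)
    return cleaned
```

Unescaping before escaping matters: TMX stores `&` as `&amp;`, and escaping that string directly would produce `&amp;amp;` in the training data.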
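The TMX Splitter on slide 21 selects training data by TMX metadata. Using only Python's standard `xml.etree.ElementTree`, a sketch might keep the translation units whose `<prop>` of a given type has one of the wanted values and return clean source/target pairs. The `split_tmx` and `seg_text` functions are hypothetical, as is any property type you pass in (for example a user-defined `x-content` type holding a content category).

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"
NATIVE_CODE = {"bpt", "ept", "ph", "it", "ut"}   # inline elements holding native markup


def seg_text(elem: ET.Element) -> str:
    """Text of a <seg>, skipping native code inside inline tags but keeping their tails."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag not in NATIVE_CODE:
            parts.append(seg_text(child))        # e.g. <hi>: keep its translatable text
        parts.append(child.tail or "")
    return "".join(parts)


def split_tmx(tmx: str, src_lang: str, tgt_lang: str,
              prop_type: str, wanted: set[str]) -> list[tuple[str, str]]:
    """Keep <tu>s whose <prop type=prop_type> has a wanted value; return (src, tgt) pairs."""
    root = ET.fromstring(tmx)
    pairs = []
    for tu in root.iter("tu"):
        values = {p.text for p in tu.findall("prop") if p.get("type") == prop_type}
        if not values & wanted:
            continue
        segs = {}
        for tuv in tu.findall("tuv"):
            # TMX 1.4 uses xml:lang; some older files use a plain lang attribute.
            lang = tuv.get(XML_LANG) or tuv.get("lang")
            seg = tuv.find("seg")
            if lang and seg is not None:
                segs[lang.lower()] = seg_text(seg).strip()
        src, tgt = segs.get(src_lang.lower()), segs.get(tgt_lang.lower())
        if src and tgt:
            pairs.append((src, tgt))
    return pairs
```

Language codes are compared case-insensitively because TMX files in the wild mix `en-US` and `en-us`; for very large files, `ET.iterparse` would avoid loading the whole document into memory.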