PRESENTATION
AMTA
ALEX YANISHEVSKY
Welocalize
November 2015
How Much Cake is Enough:
The Case for Domain-Specific
Engines
What is the Tipping Point?
How Much Cake Is Too Much Cake?
I LOVE CAKE…
TOO MUCH CAKE! ugh!
AGENDA
• How many engines
• How to split domains
• How to measure success
• How to improve
HOW MANY ENGINES: CRITERIA
• Environment: elegant deployment?
• Cost
• How different are they from each other?
• Maintenance: engineering + linguistic feedback implementation
HOW TO SPLIT DOMAINS:
CRITERIA
• Content owner feedback
• Historical experience based on business unit or portfolio
• Naming convention
• Style analysis: difference in characteristics based on lexical diversity,
sentence length + syntactic complexity
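The style-analysis criterion can be illustrated with a minimal Python sketch. It covers only two of the characteristics named above, lexical diversity (here a plain type-token ratio) and sentence length; syntactic complexity would need a parser and is left out. The function name `style_profile`, the naive sentence splitter and the two sample texts are assumptions made for illustration, not part of the tooling described in this deck.

```python
import re

def style_profile(text):
    """Return type-token ratio and mean sentence length (in tokens) for a text."""
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    tokens = re.findall(r"\w+", text.lower())
    if not tokens or not sentences:
        return {"type_token_ratio": 0.0, "mean_sentence_length": 0.0}
    return {
        "type_token_ratio": len(set(tokens)) / len(tokens),
        "mean_sentence_length": len(tokens) / len(sentences),
    }

# Hypothetical sample texts, invented for this example.
legal = ("The licensee shall not, without prior written consent of the "
         "licensor, assign this agreement.")
ui = "Click Save. Click Cancel. Click Save again."

legal_profile = style_profile(legal)
ui_profile = style_profile(ui)
```

Comparing such profiles across candidate domains gives a rough first signal of whether two content types differ enough to justify separate engines.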
HOW TO SPLIT DOMAINS:
TOOLS
HOLISTIC APPROACH BASED ON SEVERAL TOOLS:
• Build domain-specific language models + select TUs for domain by PPL
• Source Content Profiler – helps identify domain based on language models,
as well as other stylistic characteristics
• Style Scorer – higher score indicates better match to style established by
client’s documents
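Selecting TUs for a domain by perplexity (PPL) can be sketched as follows. This is a toy stand-in, not the production setup: it uses an add-one-smoothed unigram model in pure Python so the example stays self-contained, whereas the deck's editor note mentions 5-gram KenLM models. The function names, the tiny training corpus and the threshold are all assumptions.

```python
import math
from collections import Counter

def train_unigram(corpus_sentences):
    """Count whitespace tokens; returns (counts, total) as a toy 'language model'."""
    counts = Counter(tok for s in corpus_sentences for tok in s.lower().split())
    return counts, sum(counts.values())

def perplexity(sentence, model):
    """Per-token perplexity under the add-one-smoothed unigram model."""
    counts, total = model
    vocab = len(counts) + 1  # one extra slot reserves mass for unseen words
    tokens = sentence.lower().split()
    if not tokens:
        return float("inf")
    log_prob = sum(math.log((counts[t] + 1) / (total + vocab)) for t in tokens)
    return math.exp(-log_prob / len(tokens))

def select_for_domain(segments, model, max_ppl):
    """Keep segments whose perplexity against the domain model is low enough."""
    return [s for s in segments if perplexity(s, model) <= max_ppl]

# Hypothetical in-domain corpus and candidate segments.
ui_model = train_unigram(["click the save button", "click the cancel button"])
candidates = ["click the button", "the parties hereby agree"]
selected = select_for_domain(candidates, ui_model, max_ppl=5.0)
```

Lower perplexity means the segment looks more like the text the model was trained on, so a segment is assigned to the domain whose model scores it lowest or below a chosen threshold.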
TOOLS: PERPLEXITY EVALUATOR
TU LEVEL
<tu srclang="EN-US" tuid="75438">
  <prop type="x-ppl:train2">208</prop>
  <prop type="x-ppl:techdoc6">191.025</prop>
  <prop type="x-ppl:support2">325.983</prop>
  <prop type="x-ppl:sales1">97.0736</prop>
  <prop type="x-ppl:productLoc1">396.398</prop>
  <prop type="x-ppl:legal1">617.876</prop>
  <tuv xml:lang="EN-US">
    <seg>Consistent feature set across multiple platforms (Windows, Mac, iOS, Android).</seg>
  </tuv>
  <tuv changedate="20140325T122530Z" changeid="serviceaaa" creationdate="20140325T122530Z" creationid="serviceaaa" lastusagedate="20140325T122530Z" usagecount="0" xml:lang="ES-XL">
    <prop type="x-ALS:Context">TEXT</prop>
    <prop type="x-ALS:Source File">DATATC39720SRCEN-USco-02__battle-card_enco-02__battle-card_en.inx</prop>
    <seg>Conjunto de características coherente en varias plataformas (Windows, Mac, iOS, Android)</seg>
  </tuv>
</tu>
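The `x-ppl:` properties in the TU above can be read back with a few lines of Python. The sketch below uses the standard library `xml.etree.ElementTree`, copies the property values from the slide, and picks the domain model with the lowest perplexity. The helper name `ppl_scores` and the "lowest PPL wins" rule are assumptions for illustration; the target-language `tuv` is omitted to keep the example short.

```python
import xml.etree.ElementTree as ET

# TU-level perplexity properties as shown on the slide (target tuv omitted).
TU_XML = """<tu srclang="EN-US" tuid="75438">
  <prop type="x-ppl:train2">208</prop>
  <prop type="x-ppl:techdoc6">191.025</prop>
  <prop type="x-ppl:support2">325.983</prop>
  <prop type="x-ppl:sales1">97.0736</prop>
  <prop type="x-ppl:productLoc1">396.398</prop>
  <prop type="x-ppl:legal1">617.876</prop>
  <tuv xml:lang="EN-US">
    <seg>Consistent feature set across multiple platforms (Windows, Mac, iOS, Android).</seg>
  </tuv>
</tu>"""

def ppl_scores(tu_element, prefix="x-ppl:"):
    """Map each language-model name to its perplexity for one <tu> element."""
    scores = {}
    # findall("prop") returns direct children only, so tuv-level props are skipped.
    for prop in tu_element.findall("prop"):
        ptype = prop.get("type", "")
        if ptype.startswith(prefix) and prop.text:
            scores[ptype[len(prefix):]] = float(prop.text)
    return scores

tu = ET.fromstring(TU_XML)
scores = ppl_scores(tu)
best_domain = min(scores, key=scores.get)  # lowest perplexity = closest model
```

For this TU the lowest value belongs to the `sales1` model, which is consistent with the source file name referring to a battle card.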
TOOLS: SOURCE CONTENT PROFILER
IN CONJUNCTION WITH DCU/ADAPT
TOOLS: STYLE SCORER
COMBINES PPL RATIOS, DISSIMILARITY SCORE + CLASSIFICATION SCORE

TEST CATEGORY    TRAINING CATEGORY    SCORE
SUPPORT          TECH DOC             3.16
TECH DOC         TECH DOC             2.94
TECH DOC         LEGAL                0.02
WHY USE STYLE SCORER?
• Identify similarity of source document to “gold standard” documents from that
domain and other domains
• Identify similarity of target document to “gold standard” documents from that
domain and other domains
• Example: Is this really a support document? To what degree is it similar to
other support documents, tech doc documents, etc.?
• Dissimilarity can point to worse quality for raw MT and/or reduced post-editing
productivity
STYLE SCORER + SCP
• SCP helps classify a document
• Style Scorer tells you how good a match a document is to a profile
• SCP only works on English source
• Style Scorer works on English source + non-English target
Three Rings for the Elven-kings under the sky,
Seven for the Dwarf-lords in their halls of stone,
Nine for Mortal Men doomed to die,
One for the Dark Lord on his dark throne
In the Land of Mordor where the Shadows lie.
One Ring to rule them all, One Ring to find them,
One Ring to bring them all and in the darkness bind them
In the Land of Mordor where the Shadows lie.
CASE STUDY
ONE DOMAIN?
One ring
to rule them all
CASE STUDY: HOW MANY DOMAINS?
• Started with 6 Domains: Technical Documentation, Legal, Support, Training,
Product UI, Sales/Marketing
• Found that Technical Documentation, Support + Training were very similar based
on LM scores against each other, Style Scorer, length of sentences, similar
grammatical structures
• Found that Product UI was close enough to above 3 that making a separate
engine was not warranted
• Found that Legal + Sales/Marketing were different enough from above domains
and from each other based on LM scores against each other, Style Scorer +
length of sentences
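The merging decision described above can be sketched as a small grouping routine: domains whose pairwise dissimilarity falls below a threshold end up in the same engine. The distance values below are invented purely for illustration and are not the case study's measurements; the function name `group_domains`, the single-linkage merge rule and the threshold are likewise assumptions.

```python
from itertools import combinations

def group_domains(domains, distance, threshold):
    """Single-linkage grouping: merge domains whose pairwise distance <= threshold."""
    parent = {d: d for d in domains}

    def find(d):
        # Follow parent links to the group representative, compressing the path.
        while parent[d] != d:
            parent[d] = parent[parent[d]]
            d = parent[d]
        return d

    for a, b in combinations(domains, 2):
        # Pairs without a recorded distance are treated as infinitely far apart.
        if distance.get(frozenset((a, b)), float("inf")) <= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for d in domains:
        groups.setdefault(find(d), []).append(d)
    return sorted(sorted(g) for g in groups.values())

domains = ["tech_doc", "support", "training", "product_ui", "legal", "sales_marketing"]

# HYPOTHETICAL dissimilarities (e.g. derived from cross-perplexity), not real data.
distance = {
    frozenset(("tech_doc", "support")): 0.10,
    frozenset(("tech_doc", "training")): 0.15,
    frozenset(("support", "training")): 0.12,
    frozenset(("tech_doc", "product_ui")): 0.30,
    frozenset(("legal", "sales_marketing")): 0.90,
    frozenset(("tech_doc", "legal")): 0.95,
}

engines = group_domains(domains, distance, threshold=0.35)
```

With these made-up inputs the routine yields three groups, mirroring the shape of the outcome reported in the case study: one combined engine plus separate Legal and Sales/Marketing engines.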
CASE STUDY: GATHERING ASSETS
• TMs: old, somewhat recent, current
• Termbases in MultiTerm
• Existing user dictionaries + normalization dictionaries
• New user dictionaries based on term extractions + auto-import for some
languages
CASE STUDY: CURATING ASSETS
• Cleaned TMs
• Added perplexity (PPL) metadata to TUs, scored against each domain LM
• Kept the UDs + normalization dictionaries as is
• Additional term extraction for weak languages or languages with
insufficient assets
CASE STUDY: ENGINE ITERATIONS
Based on options in Systran:
• RBMT only
• Hybrid with Stemming, LM Order, Distortion, etc.
• SMT only
HOW TO MEASURE SUCCESS
• Automatic scores
• Human evaluations
• Some reduction in linguistic issues
Forthcoming
• Informed PE Distance
• Productivity
• Systematic reduction of linguistic issues
CASE STUDY: AUTOMATIC SCORES
FOR GENERAL/TECH DOC
CASE STUDY: HUMAN EVALUATIONS
FOR GENERAL/TECH DOC
CASE STUDY: AUTOMATIC SCORES
SALES/MARKETING1
CASE STUDY: AUTOMATIC SCORES
SALES/MARKETING2
CASE STUDY: AUTOMATIC SCORES
LEGAL1
CASE STUDY: AUTOMATIC SCORES
LEGAL2
CASE STUDY: AUTOMATIC SCORES
TECH DOC
HOW TO IMPROVE
Scoring
• Linguistically-informed automatic scores: PE distance with different
weights for POS
• Productivity
Engine
• Eradicate high-frequency inconsistencies between TMs, Termbases +
User Dictionaries (UDs)
• Create domain-specific UDs
• Send best reply: TMT Prime, send best translation irrespective of domain
Source
• Pre-MT source check: was this content properly categorized?
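The idea of a PE distance with different weights for POS, listed under Scoring, can be sketched as a weighted token-level edit distance between raw MT and its post-edited version. This is one possible reading, not the method used in the study: the function name, the tag set, the weights and the rule that a substitution costs the heavier of the two tokens' weights are all assumptions. POS tags are supplied by hand here; in practice they would come from a tagger.

```python
def weighted_pe_distance(mt, pe, weights, default=1.0):
    """Edit distance between two lists of (token, pos) pairs with POS-based costs."""
    def cost(item):
        return weights.get(item[1], default)

    n, m = len(mt), len(pe)
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = d[i - 1][0] + cost(mt[i - 1])      # delete MT token
    for j in range(1, m + 1):
        d[0][j] = d[0][j - 1] + cost(pe[j - 1])      # insert PE token
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if mt[i - 1][0] == pe[j - 1][0]:
                sub = d[i - 1][j - 1]                # tokens match, no cost
            else:
                # Assumed rule: a substitution costs the heavier of the two tokens.
                sub = d[i - 1][j - 1] + max(cost(mt[i - 1]), cost(pe[j - 1]))
            d[i][j] = min(sub,
                          d[i - 1][j] + cost(mt[i - 1]),
                          d[i][j - 1] + cost(pe[j - 1]))
    return d[n][m]

# Hypothetical example: an article fix counts less than a verb fix.
mt = [("the", "DET"), ("file", "NOUN"), ("open", "VERB")]
pe = [("a", "DET"), ("file", "NOUN"), ("opens", "VERB")]
pos_weights = {"DET": 0.2, "NOUN": 1.0, "VERB": 1.0}

weighted = weighted_pe_distance(mt, pe, pos_weights)
unweighted = weighted_pe_distance(mt, pe, {})
```

Down-weighting function words in this way keeps cheap edits from dominating the score, so the metric tracks the edits that cost post-editors real effort.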
SUMMARY
• Domain-specific engines yield better results, as evidenced by automatic scores
and human evaluations. There is also some evidence of a reduction in linguistic issues.
• Group closely related content into one domain
• Determine how many engines your infrastructure can support
THANK YOU
ALEX YANISHEVSKY
Welocalize
November 2015


Editor's notes

  1. The language models (LMs) were built using KenLM with an n-gram order of 5.