Intento
© Intento, Inc. / November 2019

BEST vs. FIT TO PURPOSE
Linguistically best for my language pair
vs.
MT with a proper data protection and retention policy, a proper level of custom terminology support, which trains on my linguistic assets with good ROI per my business goals, and works well with my source data quality, format and content type according to my subject matter experts
On-the-fly MT routing based on the historical data
Automated procurement and vendor management

PRACTICAL APPROACH
1. Select candidate MT systems
2. Improve with your data assets
3. Automated scoring
4. Human-assisted scoring

1. SELECT CANDIDATE ENGINES
GENERIC STOCK MODELS: Alibaba, Amazon, Baidu, DeepL, eBay, Google, GTCom, IBM, Kakao, Microsoft, Mirai, ModernMT, Niutrans, Naver, Omniscien, PROMT, Rozetta, SAP, SDL, Sogou, Systran, Tencent, Tilde, Yandex, Youdao
VERTICAL STOCK MODELS: Alibaba Cloud, Baidu Translate, Iconic, Microsoft, Omniscien, PROMT, SAP, Systran
CUSTOM TERMINOLOGY SUPPORT: Amazon, Baidu, Google, IBM, Iconic, Microsoft, Rozetta, SDL, Systran
AUTO DOMAIN ADAPTATION: Globalese, Google, IBM, Kantan, Microsoft, ModernMT, Omniscien, SDL, Systran
MANUAL DOMAIN ADAPTATION: Alibaba Cloud, Baidu Translate, Iconic, Omniscien, PangeaMT, Prompsit, PROMT, SDL, Systran, Tilde, Yandex
Standalone commercial MT products with an API. All product names, trademarks and registered trademarks are property of their respective owners. All company, product and service names used in this website are for identification purposes only. Use of these names, trademarks and brands does not imply endorsement.

WHAT MAKES THE DIFFERENCE?
business requirements
language pair
domain
available language assets
tag and format support
cost of ownership
(Figure: panels labeled July 2018 and January 2019)

2. IMPROVING ENGINES
data cleaning
TM training
glossaries
sentence scores

40-60% of “live” TM is not suitable for MT
linguistic glossaries need to be “compiled” for MT
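
The slide does not say which checks decide that a TM segment is unsuitable for MT. As a minimal sketch, assuming a few common heuristics (empty sides, untranslated copies, extreme length ratio, exact duplicates), a cleaning pass might look like this. The function name `clean_tm` and the `max_ratio` threshold are hypothetical, not taken from the talk.

```python
def clean_tm(pairs, max_ratio=3.0):
    """Filter (source, target) translation-memory pairs with simple heuristics.

    The rules and the max_ratio threshold are illustrative assumptions,
    not the checks used in the talk.
    """
    seen = set()
    kept = []
    for src, tgt in pairs:
        src, tgt = src.strip(), tgt.strip()
        if not src or not tgt:
            continue  # one side is empty
        if src == tgt:
            continue  # target is an untranslated copy of the source
        ratio = max(len(src), len(tgt)) / min(len(src), len(tgt))
        if ratio > max_ratio:
            continue  # lengths differ too much, likely misaligned
        key = (src.lower(), tgt.lower())
        if key in seen:
            continue  # duplicate pair (case-insensitive)
        seen.add(key)
        kept.append((src, tgt))
    return kept


tm = [
    ("Open the file.", "Ouvrez le fichier."),
    ("Open the file.", "Ouvrez le fichier."),  # duplicate
    ("Save", ""),                              # empty target
    ("OK", "OK"),                              # untranslated copy
    ("Yes", "Veuillez contacter votre administrateur système."),  # misaligned
    ("Close the window.", "Fermez la fenêtre."),
]
cleaned = clean_tm(tm)
print(len(cleaned), "of", len(tm), "segments kept")
```

Real TMs usually need more than this (tag mismatches, wrong-language segments, stale terminology), so the thresholds are a starting point to tune against reviewed samples.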

TM training
Different data volume and data quality requirements
Different performance of baseline models
(Figure: panels labeled July 2018 and January 2019)

sentence scores
corpus scores are not actionable
sentence scores help linguists to focus

CORPUS SCORES TO FIND TOP-RUNNERS
lack of correlation indicates certain types of errors
statistically significant rapid drop-off identifies top-runners
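
The slide does not state which significance test is used. One possible reading, sketched below, treats the top-runners as the best engine plus every engine whose gap to it is not statistically significant under paired bootstrap resampling of the test segments. The function name `top_runners`, the parameters, and the sample scores are all hypothetical.

```python
import random


def top_runners(scores, n_resamples=1000, alpha=0.05, seed=0):
    """Return the best engine plus engines not significantly worse than it.

    scores maps engine name -> list of per-segment scores on the same test set.
    Uses paired bootstrap resampling of segments; all parameters are
    illustrative assumptions.
    """
    rng = random.Random(seed)
    n = len(next(iter(scores.values())))
    mean = {name: sum(vals) / n for name, vals in scores.items()}
    best = max(mean, key=mean.get)
    runners = [best]
    for name, vals in scores.items():
        if name == best:
            continue
        wins = 0  # resamples where this engine matches or beats the best
        for _ in range(n_resamples):
            idx = [rng.randrange(n) for _ in range(n)]
            diff = sum(scores[best][i] - vals[i] for i in idx)
            if diff <= 0:
                wins += 1
        if wins / n_resamples >= alpha:
            runners.append(name)  # gap to the best is not significant
    return sorted(runners, key=mean.get, reverse=True)


# Synthetic per-segment scores for three hypothetical engines.
a = [0.70 + 0.01 * (i % 10) for i in range(40)]
b = [s + (0.03 if i % 2 == 0 else -0.04) for i, s in enumerate(a)]  # close to A
c = [s - 0.30 for s in a]                                           # clearly worse
result = top_runners({"A": a, "B": b, "C": c})
print(result)
```

Engines outside the returned list are the ones past the drop-off; the remaining candidates can then go to the more expensive human-assisted scoring.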

SENTENCE SCORES TO HELP REVIEWERS
hard: show NMT training flaws
controversial: expose NMT quirks
easy: to check how high scores are correlated with quality
typical: to measure PE effort
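
The slide names four groups of segments without defining them. The sketch below assumes one plausible definition based on per-segment scores from several engines: "hard" when every engine scores low, "controversial" when engines disagree strongly, "easy" when every engine scores high, and "typical" otherwise. The function name `bucket_segments` and all thresholds are hypothetical.

```python
def bucket_segments(seg_scores, low=0.3, high=0.8, spread=0.4):
    """Assign each segment index to one of four review buckets.

    seg_scores is a list of per-segment score lists, one score per engine,
    on a 0..1 scale. The bucket definitions and thresholds are assumptions.
    """
    buckets = {"hard": [], "controversial": [], "easy": [], "typical": []}
    for i, scores in enumerate(seg_scores):
        if max(scores) < low:
            buckets["hard"].append(i)           # every engine scores poorly
        elif max(scores) - min(scores) > spread:
            buckets["controversial"].append(i)  # engines disagree strongly
        elif min(scores) > high:
            buckets["easy"].append(i)           # every engine scores well
        else:
            buckets["typical"].append(i)
    return buckets


seg_scores = [
    [0.10, 0.20, 0.15],  # segment 0
    [0.90, 0.20, 0.85],  # segment 1
    [0.95, 0.90, 0.92],  # segment 2
    [0.60, 0.55, 0.70],  # segment 3
]
buckets = bucket_segments(seg_scores)
print(buckets)
```

Reviewers can then sample from each bucket separately instead of reading the test set in order.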

4. HUMAN-ASSISTED SCORING
Depends on the purpose!
Linguistic Quality Assessment
Post-Editing Tracking
A/B testing
WTF-score
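
The slide lists post-editing tracking without naming a metric. A minimal sketch, assuming effort is approximated by the character-level Levenshtein distance between the raw MT output and its post-edited version, normalized by the longer string. The function names `edit_distance` and `pe_effort` are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def pe_effort(mt_output, post_edited):
    """Edit distance normalized to 0..1 by the longer string's length."""
    longest = max(len(mt_output), len(post_edited))
    if longest == 0:
        return 0.0
    return edit_distance(mt_output, post_edited) / longest


print(pe_effort("The file are opened.", "The file is open."))
```

A value of 0.0 means the post-editor left the segment untouched; values near 1.0 mean it was essentially rewritten. Time spent per segment is another common signal and is not captured here.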

DIFFERENT SCENARIOS - DIFFERENT CHOICES (even for the same language pair!)
PEMT / LSP
PEMT / Individual
Cross-Language Analysis and Retrieval (think eDiscovery)
Large-Scale Raw MT (think eCommerce)
Customer Support (think Global B2C)
Gisting and Inbound Content (think translation portals)
Large Enterprise
Government and Regulated Industries

Fantastic MT Engines and Where to Find Them
