State of the Machine Translation by Intento (stock engines, Jan 2019)

STATE OF THE
MACHINE TRANSLATION
STOCK* MT MODELS
by Intento

Jan 2019
* commercially available pre-trained MT models

January 2019 © Intento, Inc.
DISCLAIMER
2
The MT systems used in this report were accessed from Dec 15 to Dec
31, 2018. They may have changed many times since then.
—
This report demonstrates the performance of those systems exclusively on
the dataset used for this report (see slide 14) using proximity scores. The
final MT decision requires Human LQA and depends on the use-case.
—
We run multiple evaluations for our clients for various language pairs and
domains, observing different rankings of the MT systems.
—
There’s no “best” MT system. Performance depends on how similar your data
is to the data each vendor used to train its models, and on the vendor's algorithms.
—
Don’t jump to conclusions. Do your homework.

About
At Intento, we make Cloud Cognitive AI easy to discover, access,
and evaluate for a specific use.
—
We have been evaluating Machine Translation models since May 2017
(Custom NMT as well).
—
As we show in this report, the Machine Translation landscape is
complex: models from 9 different vendors are required to get the
best performance across popular language pairs, and prices differ
by a factor of 200.
—
We deliver this overview report for FREE. To evaluate on your own
dataset, reach us at hello@inten.to
3

Intento MT Hub - that’s how we run such evaluations
- Vendor-agnostic API
- Universal CLI and SDK
- Connects to MemoQ, SDL Trados, Matecat and more
- 10-20x faster due to multi-threading
- Works with files of any size
- May be deployed in a private cloud
Get your API key at inten.to
4

Important highlights
Changes in the MT Engines list:
- ModernMT and SDL BeGlobal (NMT) added to the quantitative evaluation.
- eBay, Kakao, Naver, Niutrans and Sogou added to the MT systems list.
- IBM SMT and Microsoft SMT deprecated and removed.
—
For 21 language pairs, the best MT provider has changed since July 2018. To get
the best quality across 48 language pairs, one needs 9 engines (see slide 18).
—
Significant changes in the Optimal MT chart due to a 50% price reduction by
Yandex (see slide 19).
—
Amazon, DeepL, Youdao, SAP and IBM increased language coverage. At the same
time, the deprecation of SMT engines reduced coverage for low-resource language
pairs.
—
For 2 language pairs, the available MT quality has risen by 5% or more since July 2018:
en-de (▲8%), it-pt (▲5%). We have also updated some of the datasets, which led to
a 3-4% drop in performance in general.
5

Overview
1 TRANSLATION QUALITY
2 PRICING
3 LANGUAGE COVERAGE
4 HISTORICAL PROGRESS
5 CONCLUSIONS
48 Language Pairs
23 Machine Translation Engines
6

Machine Translation Engines* with Pre-Trained Models
- Alibaba Cloud MT
- Amazon Translate
- Baidu Translate API
- DeepL API
- eBay Translation API
- Google Cloud Translation API
- GTCom YeeCloud MT
- IBM Watson Language Translator
- Kakao Developers Translation
- Microsoft Translator Text API v3
- ModernMT Enterprise API
- Naver Cloud Papago NMT
- Niutrans Maverick Translation
- PROMT Cloud API
- SAP Translation Hub
- SDL BeGlobal
- SDL Language Cloud
- Sogou Deepi MT
- Systran PNMT Enterprise Server
- Systran REST Translation API
- Tencent Cloud TMT API (preview)
- Yandex Translate API
- Youdao Cloud Translation API
* We have evaluated general purpose Cloud Machine Translation services with prebuilt translation models, provided via API. Some vendors also provide
web-based, on-premise or custom MT engines, which may differ on all aspects from what we’ve evaluated.
7
(MT systems marked with grey color were unavailable for quantitative evaluation for different reasons)

1 Translation Quality
1.1 Evaluation Methodology
1.2 Available MT Quality
1.3 Top-Performing Engines
1.4 Best General-Purpose Engines
1.5 Optimal General-Purpose Engines
8

Evaluation methodology (I)
Translation quality is evaluated by computing the LEPOR score
between reference translations and the MT output (Slide 11).
—
Currently, our goal is to evaluate the performance of translation
between the most popular languages (Slide 12).
—
We use public datasets from StatMT/WMT, CASMACAT News
Commentary and Tatoeba (Slide 13).
—
We have performed LEPOR metric convergence analysis to
identify the minimal viable number of segments in the dataset.
See Slide 14 for some details.
9

Evaluation methodology (II)
We judge that the MT quality of service A is better than that of
B for the language pair C if:
- the mean LEPOR score of A is greater than that of B for the
pair C, and
- the lower bound of the LEPOR 95% confidence interval of A is
greater than the upper bound of the LEPOR 95% confidence
interval of B for the pair C. See Slide 14 for an example.
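The two-part rule above can be sketched in Python. This is a minimal sketch, not Intento's implementation: the report does not say how the 95% confidence interval is computed, so the normal approximation (mean plus or minus 1.96 standard errors) is an assumption, and the names `mean_ci` and `is_better` as well as the sample scores are hypothetical.

```python
import math
import statistics

Z_95 = 1.96  # two-sided 95% quantile of the standard normal (assumed CI method)


def mean_ci(scores, z=Z_95):
    """Return (mean, lower bound, upper bound) for per-segment scores."""
    mean = statistics.fmean(scores)
    half_width = z * statistics.stdev(scores) / math.sqrt(len(scores))
    return mean, mean - half_width, mean + half_width


def is_better(scores_a, scores_b):
    """Slide rule: A beats B if its mean is higher and the intervals do not overlap."""
    mean_a, low_a, _ = mean_ci(scores_a)
    mean_b, _, high_b = mean_ci(scores_b)
    return mean_a > mean_b and low_a > high_b


# Hypothetical per-segment LEPOR scores for three services on one language pair.
SERVICE_A = [0.70, 0.72, 0.71, 0.73, 0.69, 0.71]
SERVICE_B = [0.60, 0.62, 0.61, 0.59, 0.63, 0.60]
SERVICE_C = [0.70, 0.60, 0.75, 0.65, 0.72, 0.68]  # noisy, overlaps with A
```

With these made-up numbers, A is judged better than B, while A and C are not separable because their intervals overlap. Real evaluations use 1600 - 2000 segments per pair, where the intervals are much narrower.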
—
Different language pairs (and different datasets) pose different
levels of translation difficulty. To compare the overall MT performance
of different services, we regularize LEPOR scores across all
language pairs (see Appendix A for more details).
10

LEPOR score
LEPOR: automatic machine translation evaluation metric
considering the enhanced Length Penalty, n-gram Position
difference Penalty and Recall
—
In our evaluation, we used hLEPORA v.3.1
(best metric from ACL-WMT 2013 contest)
https://www.slideshare.net/AaronHanLiFeng/lepor-an-augmented-machine-translation-evaluation-metric-thesis-ppt
https://github.com/aaronlifenghan/aaron-project-lepor
LIKE BLEU,
BUT BETTER
11

48 Language Pairs
* https://w3techs.com/technologies/overview/content_language/all
Language groups by
web popularity*:
P1 - ≥ 2.0% websites
P2 - 0.5%-2% websites
P3 - 0.1-0.3% websites
P4 - <0.1% websites
—
We focus on the en-P1,
P1-en and P1-P1
(partially)
Evaluated pairs (source: targets):
en: ru, ja, de, es, fr, pt, it, zh, cs, tr, fi, ro, ko, ar, nl
ru: en, de, es, fr, pt
ja: en, fr, zh
de: en, ru, ja, fr, it
es: en, zh
fr: en, ru, de, es
pt: en
it: en, de, pt
zh: en, ja, it
cs: en
tr: en
fi: en
ro: en
ko: en
ar: en
nl: en
12

Datasets
WMT-2013 (translation task, news domain)
en-es, es-en
fr-en, en-fr
ro-en, en-ro
zh-en, en-zh, cs-en, en-cs, de-en, en-de, ru-en, en-ru, tr-en, en-tr, fi-en, en-fi
NewsCommentary-2011
en-ja, ja-en, en-pt, pt-en, en-it, it-en, ru-de, de-ru, ru-es, ru-fr, ru-pt, ja-fr, de-ja, es-zh,
fr-ru, fr-es, it-pt, zh-it, en-ar, ar-en, en-nl, nl-en, fr-de, de-fr, de-it, it-de, ja-zh, zh-ja
Tatoeba, JHE
en-ko, ko-en
13

LEPOR Convergence
We used 1600 - 2000 sentences per language pair. The metric stabilizes and adding
more from the same domain won’t change the outcome.
[Charts: regularised hLEPOR scores vs. number of sentences. One panel shows the aggregated mean and confidence interval across all language pairs; the other shows examples for individual language pairs.]
14

Available MT Quality
[Matrix chart: source languages in rows, target languages in columns. Cell color shows the maximal available hLEPOR score, the number in a cell is the number of top-performing MT providers**, and $ signs mark the minimal price for this quality.]
Maximal available hLEPOR score: >80 %, 70 %, 60 %, 50 %, 40 %, <40 %
Minimal price for this quality, per 1M char*: $$$ ≥$20, $$ $10-15, $ <$10
No. of top-performing MT providers** (rows are source languages; values follow the target-language column order of the evaluated pairs, see slide 12):
en 5 7 8 9 6 8 7 6 2 2 2 3 1 4 1
ru 6 5 5 5 3
ja 4 4 6
de 7 4 3 6 5
es 9 5
fr 9 4 7 8
pt 8
it 5 5 4
zh 7 5 4
cs 2
tr 4
fi 1
ro 3
ko 1
ar 7
nl 1
* base pricing tier
** up to 5% worse than the leader, SMT and NMT counted separately
Check Appendix B for more detailed data.
15

Sample pair analysis: English-Chinese
(based on WMT-18 dataset)
LEPOR score    Providers            Price range (per 1M characters)
74 %           Tencent (preview)    (none listed)
73 %           Baidu, GTCom         $8-10
72 %           Google, Amazon       $15-20
70 %           Yandex               $7
BEST QUALITY: Tencent (preview)
TOP 5%: Tencent, Baidu, GTCom, Google, Amazon, Yandex
BEST PRICE IN TOP 5%: Yandex
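The selection on this slide can be reproduced with a short Python sketch. It is only an illustration under stated assumptions: the report does not say whether "top 5%" is a relative or an absolute margin, and the rounded scores on the slide are consistent with 5 percentage points, so that reading is assumed here. The function name `analyse` and the data layout are hypothetical; prices are the lower bounds of the ranges shown on the slide, and Tencent is skipped in the price comparison because no price is listed for it.

```python
# Rows copied from the slide: (hLEPOR %, providers, price range in USD per 1M chars).
# Tencent was in preview and has no price on the slide, hence None.
ROWS = [
    (74, ["Tencent"], None),
    (73, ["Baidu", "GTCom"], (8, 10)),
    (72, ["Google", "Amazon"], (15, 20)),
    (70, ["Yandex"], (7, 7)),
]

MARGIN_POINTS = 5  # assumption: "top 5%" read as 5 percentage points of hLEPOR


def analyse(rows, margin=MARGIN_POINTS):
    """Return (best providers, top-5% providers, cheapest priced providers in the top 5%)."""
    best_score = max(score for score, _, _ in rows)
    best = [p for score, providers, _ in rows if score == best_score for p in providers]
    top_rows = [row for row in rows if row[0] >= best_score - margin]
    top = [p for _, providers, _ in top_rows for p in providers]
    priced = [row for row in top_rows if row[2] is not None]
    cheapest = min(priced, key=lambda row: row[2][0])  # compare lower price bounds
    return best, top, cheapest[1]


BEST, TOP, OPTIMAL = analyse(ROWS)
```

Under these assumptions the sketch returns Tencent as best, all six providers as top 5%, and Yandex as the best price in the top 5%, matching the slide.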
16

TOP Performing MT Providers across 48 language pairs
best: Provides the best MT Quality for a language pair
top 5%: Within 5% of the best available MT Quality for a language pair
optimal: Provides the lowest price among the top 5% MT engines for a language pair
[Bar chart: number of language pairs (0 to 50) per provider, in this order: deepl, google, amazon, yandex, systran-pnmt, modernmt, ibm-nmt, promt, msft-nmt, tencent, baidu, sdl-beglobal, gtcom, sdl-smt]
17

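Best general-purpose MT engines
[Matrix chart over the languages en, ru, ja, de, es, fr, pt, it, zh, cs, tr, fi, ro, ko, ar, nl: each cell is colored by the best MT engine for that language pair.]
MT Engines: deepl, google, amazon, yandex, systran-pnmt, modernmt, ibm, promt, microsoft, tencent, baidu
18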
In several cases, there’s no
statistically significant difference
between the top engines.
Check Appendix B for more
detailed data.

Optimal* general-purpose MT engines
* Cheapest with a performance within 5% of the best available for this language pair
[Matrix chart over the languages en, ru, ja, de, es, fr, pt, it, zh, cs, tr, fi, ro, ko, ar, nl: each cell is colored by the optimal MT engine for that language pair.]
MT Engines: deepl, google, amazon, yandex, systran-pnmt, modernmt, ibm, promt, microsoft, tencent, baidu
19

2 Public pricing
USD per 1M symbols
* +20% for some language pairs
** estimation based on 4.79 symbols per word
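The per-word conversion in the second footnote can be written out as a small Python sketch. The 4.79 symbols per word comes from the slide; the function name `usd_per_million_symbols`, the optional `surcharge` parameter (modelling the "+20% for some language pairs" footnote) and the example price are hypothetical and do not correspond to any vendor's actual pricing.

```python
SYMBOLS_PER_WORD = 4.79  # average word length quoted in the slide footnote


def usd_per_million_symbols(usd_per_word, symbols_per_word=SYMBOLS_PER_WORD, surcharge=0.0):
    """Convert a per-word price to USD per 1M symbols.

    surcharge is a fraction, e.g. 0.20 for a +20% language-pair surcharge.
    """
    base = usd_per_word / symbols_per_word * 1_000_000
    return base * (1 + surcharge)


# Hypothetical vendor charging $0.0001 per word.
EXAMPLE_PRICE = usd_per_million_symbols(0.0001)
EXAMPLE_WITH_SURCHARGE = usd_per_million_symbols(0.0001, surcharge=0.20)
```

For the hypothetical $0.0001 per word, this gives roughly $20.88 per 1M symbols, or about $25.05 with a 20% surcharge.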
20

3 Language Coverage
3.1 Supported and Unique per Provider
3.2 Coverage by Language Popularity
21

3.1 Supported and Unique Language Pairs*
[Bar chart, logarithmic axis (1, 100, 10000): total and unique language pairs per provider, in this order: Niutrans, Google, Yandex, Microsoft v3, Sogou, Baidu, Amazon, Tencent, Youdao, Systran, SDL Language Cloud, PROMT, SAP, DeepL, IBM Watson v3, ModernMT, Naver, Alibaba, GTCom, Kakao, eBay. The largest total shown is 13 572. Eight providers are marked ** on the chart.]
Unique language pairs - supported exclusively by one provider
22
* where possible, we have checked via API if all language pairs advertised by the documentation are
supported and removed the pairs we were unable to locate in the API.
** as advertised (not validated via API)

Language popularity
Language groups by web popularity*:
P1 - ≥ 2.0% websites: en, ru, ja, de, es, fr, pt, it, zh
P2 - 0.5%-2% websites: pl, fa, tr, nl, ko, cs, ar, vi, el, sv, in, ro, hu
P3 - 0.1-0.3% websites: da, sk, fi, th, bg, he, lt, uk, hr, no, nb, sr, ca, sl, lv, et
P4 - <0.1% websites: hi, az, bs, ms, is, mk, bn, eu, ka, sq, gl, mn, kk, hy, se, uz, kr, ur, ta, nn, af, be, si, my, br, ne, sw, km, fil, ml, pa, …
A total of 29070 pairs are possible; 14290 are supported across all providers.
* https://w3techs.com/technologies/overview/content_language/all
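The totals on this slide can be sanity-checked with a few lines of Python. The counts 29070 and 14290 are taken from the slide; the language count of 171 is an inference (171 × 170 = 29070 directed pairs), not a number stated in the report, and the function name `ordered_pairs` is hypothetical.

```python
def ordered_pairs(n_languages):
    """Number of directed language pairs (source != target) among n languages."""
    return n_languages * (n_languages - 1)


POSSIBLE = 29070    # from the slide
SUPPORTED = 14290   # from the slide
N_LANGUAGES = 171   # inferred from 171 * 170 == 29070; not stated in the report

COVERAGE = SUPPORTED / POSSIBLE  # share of possible pairs supported by at least one provider
```

The resulting coverage is about 49%, which agrees with the figure shown on the language coverage slide (slide 24).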
23

3.2 Language coverage by popularity
[Matrix chart: share of supported language pairs for each combination of source and target popularity group (P1, P2, P3, P4); cell values range from 38% to 100%.]
49% of possible language pairs
24

Language coverage by service provider
[Charts: one language coverage chart per provider, in this order]
- Niutrans Maverick Translation
- Google Cloud Translation API
- Yandex Translate API
- Microsoft Translator Text API v3
- Sogou Deepi MT
- Baidu Translate API
- Amazon Translate
- Tencent Cloud TMT API (preview)
- Youdao Cloud Translation API
- Systran REST Translation API
- SDL Language Cloud
- PROMT Cloud API
- SAP Translation Hub
- DeepL API
- IBM Watson Language Translator v3
- ModernMT API
- Naver Papago NMT
- Alibaba Translate
- GTCom YeeCloud MT
- Kakao MT
- eBay MT (preview)
25

4 Historical Progress
4.1 Number of Cloud MT Vendors
4.2 MT Quality
4.3 Performance/Price Efﬁciency
26

4.1 Independent Cloud MT Vendors
with pre-built models
Commercial: Alibaba, Amazon, Baidu, DeepL, Google, GTCom, IBM, Microsoft, ModernMT, Naver, Niutrans, PROMT, SAP, SDL, Sogou, Systran, Yandex, Youdao
Preview: Tencent, eBay, Kakao
[Chart: number of vendors (0 to 25), split into Commercial and Preview, for Nov 17, Mar 18, Jul 18 and Dec 18. Intento, Inc. • July 2018]
27

4.2 Best available MT Quality
Number of language pairs available at this level of LEPOR quality
out of 35 pairs we evaluated since November 2017
[Chart: number of language pairs in each LEPOR quality band (30 % to 80 %) per evaluation date, with best-pair and worst-pair markers. Intento, Inc. • Dec 2018]
28

4.3 Best available
Performance/Price Efficiency
Efficiency = (hLEPOR in %)² / (USD per 1M symbols)
—
Number of
language pairs
available at this level
of efficiency out of
35 pairs we
evaluated since
November 2017
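As a worked example of the efficiency formula, here is a minimal Python sketch. The function name `efficiency` is hypothetical; the inputs are the English-Chinese figures from slide 16 (Yandex at 70 % hLEPOR and $7 per 1M characters; Baidu and GTCom at 73 % with a shared price range of $8-10, so both range bounds are shown rather than per-provider prices).

```python
def efficiency(hlepor_percent, usd_per_million_symbols):
    """Efficiency = (hLEPOR in %)^2 / (USD per 1M symbols)."""
    return hlepor_percent ** 2 / usd_per_million_symbols


YANDEX_EN_ZH = efficiency(70, 7)        # 4900 / 7
BAIDU_GTCOM_LOW = efficiency(73, 10)    # upper price bound gives the lower efficiency
BAIDU_GTCOM_HIGH = efficiency(73, 8)    # lower price bound gives the higher efficiency
```

Yandex comes out at 700, while the 73 % tier lands between roughly 533 and 666 depending on where in the $8-10 range the price falls. Squaring the score makes quality weigh more than price, yet a 3-point quality gain here does not offset the higher price.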
[Chart: number of language pairs in each efficiency band (100 to 900) per evaluation date, with best-pair and worst-pair markers.]
29

5 Conclusions
Since July 2018, the MT landscape has changed
completely, both in terms of quality and price.
—
Even for the general domain, having the best quality
across 48 language pairs requires 9 engines used
simultaneously (and those are different from half a
year ago).
—
Re-evaluate your MT choice often to stay
competitive.
30

Intento Professional Services
MT Evaluation and Integration
Training and statistically significant evaluation of NMT
engines, which may bring the most cost and time reduction
on the post-editing stage (see the example here).
—
Identifying a subset of MT results for fast and affordable
manual inspection (~200x reduction of LQA efforts).
—
LQA and HTER also available via our LSP partners.
—
MT Integration - SDK and connectors to open platforms and
in-house software.
—
Reach us at hello@inten.to
31

Intento Web-Tools
Human-Friendly UI
working directly with the
Intento API
—
Quick way to try every MT
engine and translate large
files without API
integration.
—
Available in preview at no
added cost to Intento API
32
SIGN UP at https://console.inten.to

Intento Plugins and Connectors
33
MemoQ (private plugin)
—
SDL Trados (private plugin, also in SDL
AppStore)
—
Matecat (private plugin)
—
Also, many of the engines are available in
Smartcat.
—
Missing a connector? Reach us at
hello@inten.to!


by Intento (https://inten.to)

January 2019
Konstantin Savenkov
ks@inten.to
(415) 429-0021
2150 Shattuck Ave
Berkeley CA 94705
35

Appendix A
Overall performance of the MT services across many language
pairs is computed in the following way:
1. [Standardisation] We compute mean language-standardized
LEPOR score (or z-score) for each provider.
2. [Scale adjustment] We restore the original scale by multiplying
z-score for each MT provider by the global LEPOR standard
deviation and adding the global mean LEPOR score.
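The two steps above can be sketched in Python as follows. This is an illustration under assumptions, not Intento's code: the appendix does not specify whether population or sample standard deviation is used (population is assumed), the "global" mean and deviation are taken over all provider scores on all pairs, pairs where every provider scores the same are skipped, and the function name `regularized_scores` and the example data are made up.

```python
import statistics
from collections import defaultdict


def regularized_scores(scores):
    """scores: {language_pair: {provider: mean LEPOR}} -> {provider: regularized LEPOR}."""
    all_values = [v for per_pair in scores.values() for v in per_pair.values()]
    global_mean = statistics.fmean(all_values)
    global_std = statistics.pstdev(all_values)

    # Step 1: z-score of each provider within each language pair.
    z_scores = defaultdict(list)
    for per_pair in scores.values():
        values = list(per_pair.values())
        pair_mean = statistics.fmean(values)
        pair_std = statistics.pstdev(values)
        if pair_std == 0:
            continue  # all providers tie on this pair; no ranking information
        for provider, value in per_pair.items():
            z_scores[provider].append((value - pair_mean) / pair_std)

    # Step 2: restore the original scale from the mean z-score.
    return {
        provider: statistics.fmean(zs) * global_std + global_mean
        for provider, zs in z_scores.items()
    }


# Made-up scores: the second pair is harder for every provider.
EXAMPLE = {
    "xx-yy": {"A": 0.80, "B": 0.60, "C": 0.70},
    "yy-zz": {"A": 0.50, "B": 0.30, "C": 0.40},
}
RESULT = regularized_scores(EXAMPLE)
```

In this example provider C sits exactly at the per-pair mean on both pairs, so its regularized score equals the global mean of 0.55, with A above it and B below. Standardizing per pair first keeps a hard language pair from dragging down providers that happen to support it.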
36

Appendix B. Average hLEPOR ranking
across all 48 language pairs.
WARNING: This chart looks
cool but requires a high level
of color sensitivity. Also, there
are lots of overlapping circles.
Please look at slides 18 and
19 for more digestible data.
37
