This presentation is part of the MosesCore project, which encourages the development and use of open source machine translation tools, notably the Moses statistical MT toolkit.
MosesCore is supported by the European Commission Grant Number 288487 under the 7th Framework Programme.
For the latest updates, follow us on Twitter - #MosesCore
4. MT open resources: tools
o Open source MT toolkits:
  o Moses (University of Edinburgh, UK + others)
  o Joshua (JHU, USA)
  o cdec (CMU, USA)
  o NiuTrans (Northeastern University, China)
  o Apertium (University of Alicante, Spain)
  o etc.
o Free MT support tools:
  o Word alignment (GIZA++, MGIZA++, Berkeley Aligner, etc.)
  o Language-dependent tools (tokenizers, segmenters, parsers, etc.)
  o MT evaluation tools (BLEU, TER, METEOR, etc.)
  o Many more…
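As a rough illustration of what an automatic metric such as BLEU measures, the sketch below computes clipped n-gram precision and an unsmoothed, single-reference sentence score. It is a simplified teaching example, not the reference BLEU implementation: the function names (`ngrams`, `modified_precision`, `sentence_bleu`) and the whitespace tokenization are assumptions made for this example.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(hypothesis, reference, n):
    """Clipped n-gram precision: a hypothesis n-gram is credited at most
    as many times as it occurs in the reference."""
    hyp_counts = ngrams(hypothesis, n)
    ref_counts = ngrams(reference, n)
    total = sum(hyp_counts.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref_counts[gram]) for gram, count in hyp_counts.items())
    return matched / total


def sentence_bleu(hypothesis, reference, max_n=4):
    """Unsmoothed, single-reference BLEU-style score for one sentence pair."""
    precisions = [modified_precision(hypothesis, reference, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        # Without smoothing, one empty n-gram order zeroes the whole score.
        return 0.0
    log_mean = sum(math.log(p) for p in precisions) / max_n
    # Brevity penalty: penalize hypotheses shorter than the reference.
    if len(hypothesis) >= len(reference):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1.0 - len(reference) / len(hypothesis))
    return brevity_penalty * math.exp(log_mean)


if __name__ == "__main__":
    hyp = "the cat is on the mat".split()
    ref = "the cat sat on the mat".split()
    print(round(sentence_bleu(hyp, ref, max_n=2), 4))  # 0.7071
```

Production evaluations normally score a whole test set at once and apply smoothing, so an established evaluation tool is preferable to this sketch for reporting results.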
7. MT open resources: data
o Governmental resources:

Name        Description                          Domain                  Aligned data (average)  Languages
Europarl    European Parliament Proceedings      Legal/“General Domain”  1.8 million sentences   11 European languages
JRC-Acquis  EU laws                              Legal                   270 000 paragraphs      22 European languages
Hansards    Canadian Parliament Proceedings      Legal/“General Domain”  1.3 million sentences   North American English, French
UN          Resolutions of the General Assembly  Legal                   3 million words         English, French, Spanish, Russian, Chinese, Arabic
8. MT open resources: data
o Academic resources:

Name  Description                               Domain                        Languages
OPUS  Free corpora collected by Jörg Tiedemann  IT, movie subtitles, medical  European, non-European for IT
LDC   Linguistic Data Consortium (US)           News                          English, Chinese, Arabic, …
ELRA  European Language Resources Association                                 European
9. MT open resources: data
o Industrial resources:

Name        Description                                                             Domain                                      Languages
TAUS Data*  TAUS Data Repository                                                    Several with slant to IT                    All major languages
TMs         Translation Memories: your own, from your customer, from your supplier  Project-specific (great for v2.0 or later)
* Open for the participants of the TAUS Developing Talent project.
10. MT open resources: data
✓ 2,200 language pairs
✓ 17 industry categories
✓ more than 54 billion words
13. MT open resources: TAUS MT and Moses Tutorial
o https://tauslabs.com/open-source-mt/mosescore/50-moses-tutorial-guest
o Online tutorial
o Narrated presentations
o Step-by-step screencasts
o Technical audience
o Learn about statistical MT and its practical application, using Moses as the example
14. MT open resources: TAUS MT and Moses Tutorial

Topic                                          Moses-specific  Presentation/Demo
Principles of Machine Translation              No              Presentation
Training Data
  Data Types and Sources                       No              Presentation
  Data Conversion and Corpus Preparation       No              Demo
  Data Cleaning and Tokenization               No              Presentation
  Data Cleaning and Tokenization Demo          No              Demo
Training Moses MT Systems
  Moses Introduction                           Yes             Presentation
  Training a Moses MT System                   Yes             Demo
  Bulk Translation and MT System Optimization  Yes             Demo
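The data cleaning and tokenization steps named in the tutorial outline can be sketched roughly as follows. This is a minimal illustration, not the tutorial's or Moses's actual tooling: the function names (`tokenize`, `clean_parallel`), the regex-based tokenizer, and the thresholds `max_len=80` and `max_ratio=9.0` are all assumptions chosen for the example.

```python
import re


def tokenize(text):
    """Naive tokenizer: split words from punctuation (illustrative only)."""
    return re.findall(r"\w+|[^\w\s]", text)


def clean_parallel(pairs, max_len=80, max_ratio=9.0):
    """Yield (source, target) sentence pairs that pass simple sanity filters.

    Both sides are expected to be tokenized, space-separated strings.
    The thresholds are hypothetical defaults, not recommended values.
    """
    for src, tgt in pairs:
        src_tokens = src.split()
        tgt_tokens = tgt.split()
        # Drop pairs where either side is empty.
        if not src_tokens or not tgt_tokens:
            continue
        # Drop overly long sentences, which slow down word alignment.
        if len(src_tokens) > max_len or len(tgt_tokens) > max_len:
            continue
        # Drop pairs with an implausible length ratio (likely misaligned).
        longer = max(len(src_tokens), len(tgt_tokens))
        shorter = min(len(src_tokens), len(tgt_tokens))
        if longer / shorter > max_ratio:
            continue
        # Normalize whitespace on the way out.
        yield " ".join(src_tokens), " ".join(tgt_tokens)


if __name__ == "__main__":
    raw = [("Hello, world!", "Bonjour, le monde !"), ("", "vide")]
    tokenized = [(" ".join(tokenize(s)), " ".join(tokenize(t))) for s, t in raw]
    for src, tgt in clean_parallel(tokenized):
        print(src, "|||", tgt)
```

Filtering before training keeps empty or badly misaligned pairs out of word alignment; the right thresholds depend on the language pair and the corpus.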
15. MT open resources: TAUS MT and Moses Tutorial

Topic                                             Moses-specific  Presentation/Demo
Evaluating MT Systems
  Automatic Metrics                               No              Presentation
  Human Evaluation                                No              Presentation
Integration
  Document Translation and Integration Scenarios  Yes             Presentation
  Document Translation and Web API Demo           Yes             Demo

o More to come
  o Demos
  o In-depth Info
  o Commercial Vendor Presentations
24. MT open resources: Support
o Moses support list
  o http://mailman.mit.edu/mailman/listinfo/moses-support
o EAMT MT list
  o http://www.eamt.org/mt-list.php
o Corpora list
  o http://www.hit.uib.no/corpora/