ESR10 Joachim Daiber - EXPERT Summer School - Malaga 2015
1.
Joachim Daiber
Institute for Logic, Language and Computation
University of Amsterdam

On Using Syntactic Preordering Models to Delimit Morphosyntactic Search Space
2. Introduction
Project title: Exploiting hierarchical alignments for linguistically-informed SMT models to meet the hybrid approaches that aim at compositional translation
▶ ESR 10
▶ University of Amsterdam
▶ Supervisor: Prof. Khalil Sima'an
4. Motivation
▶ Current MT models work well if languages are structurally similar
▶ Difficulties with morphologically rich languages:
− freer word order
− more productive morphological inflections
− agreement over long distances
7. Part I: Word Order
8. Preordering source trees
[Dependency tree figure: English source "Peter escaped from the police" with arcs labelled Root, Sb, AuxP, Adv, AuxA; German target "Peter entkam der Polizei", with case=dat marked on "der" and "Polizei"]
▶ Source dependency trees are well suited for preordering:
− Lerner and Petrov (2013) present two classifier-based dep. tree preordering models
− Jehl et al. (2014) and de Gispert et al. (2015) preorder dep. trees via branch-and-bound search
9. Preordering source trees
▶ Lerner and Petrov (2013) preorder trees starting at the root
▶ Order all children (model 1) or left and right children (model 2); a sketch of this top-down reordering follows after the tree
[Dependency tree figure: "Peter escaped from the police" with arcs labelled Root, Sb, AuxP, Adv, AuxA]
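To make the two models concrete, here is a minimal sketch of the top-down idea in the spirit of model 1 (ordering the head together with all its children at each node). The dict tree encoding and the `score_order` classifier are stand-ins for illustration, not Lerner and Petrov's actual implementation.

```python
from itertools import permutations

def preorder(node, score_order):
    """Reorder a dependency tree top-down: at each head, pick the best
    permutation of the head and its children according to a scoring
    model, then recurse into the children.

    `node` is assumed to be a dict {"word": str, "children": [subtrees]};
    `score_order` is a stand-in for a trained classifier scoring one
    candidate ordering of (head, children).
    """
    units = [node] + node["children"]
    if len(units) > 1:
        # Exhaustive enumeration for clarity; real systems prune this
        # space, e.g. via branch-and-bound (Jehl et al., 2014).
        best = max(permutations(units), key=score_order)
    else:
        best = units
    out = []
    for u in best:
        if u is node:
            out.append(node["word"])              # the head itself
        else:
            out.extend(preorder(u, score_order))  # reordered child yield
    return out
```

Enumerating all permutations is exponential in the number of children, which is exactly the search-space problem that motivates pruned search and the preordering spaces discussed next.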
13. Generating a preordering space
▶ Both Lerner and Petrov (2013) and Jehl et al. (2014) make only single-best predictions
▶ We want:
− all reasonable predictions instead of the single best
− a more flexible model
14. Multiple predictions and a more flexible model
▶ Multiple predictions
− Mistakes in order decisions propagate
− Extract n-best decisions from the model to pass to later models
▶ Making the model more flexible
− Bad: order decisions are local to tree families
− Non-local features would help (e.g. LM)
→ integration via cube pruning (see the sketch below)
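The cube-pruning integration can be pictured with a minimal sketch: lazily pop the k best combinations of per-child candidate orderings from a heap, rescoring each full combination with non-local features. `combine` is a hypothetical scorer (e.g. local score plus an LM term); the real system's data structures will differ.

```python
import heapq

def cube_prune(child_lists, combine, k):
    """Enumerate the k best combinations of per-child candidate lists
    (each assumed sorted by descending local score) without scoring the
    full cross-product. `combine` scores a complete combination and may
    use non-local features such as a language model."""
    start = (0,) * len(child_lists)
    heap = [(-combine([lst[0] for lst in child_lists]), start)]
    seen = {start}
    results = []
    while heap and len(results) < k:
        neg_score, idx = heapq.heappop(heap)
        results.append((-neg_score, [lst[i] for lst, i in zip(child_lists, idx)]))
        # Push the successors of this cell in the "cube".
        for d in range(len(idx)):
            if idx[d] + 1 < len(child_lists[d]):
                nxt = idx[:d] + (idx[d] + 1,) + idx[d + 1:]
                if nxt not in seen:
                    seen.add(nxt)
                    cand = [lst[i] for lst, i in zip(child_lists, nxt)]
                    heapq.heappush(heap, (-combine(cand), nxt))
    return results
```

Because combinations are popped lazily, only on the order of k times the cube's fan-out ever get scored with the expensive non-local features.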
15. Making the model more flexible
▶ Use a standard log-linear model (Och and Ney, 2002):

$\hat{s}' = \arg\max_{s'} \sum_i \lambda_i \log \phi_i(s')$
▶ Where to get the weights?
− PRO: tuning as ranking (Hopkins and May, 2011)
− Scoring functions (a sketch of the first follows below):
1. Kendall's τ coefficient
2. Simulate a word-level MT system, score by BLEU
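As a concrete example of the first scoring function, a minimal Kendall's τ between a predicted and a reference ordering of the same (assumed unique) items; a quadratic loop is used for clarity:

```python
def kendalls_tau(pred, ref):
    """Kendall's tau: (concordant - discordant pairs) / total pairs,
    comparing the predicted ordering against the reference ordering."""
    rank = {item: i for i, item in enumerate(ref)}
    n = len(pred)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            if rank[pred[i]] < rank[pred[j]]:
                concordant += 1
            else:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)

# e.g. kendalls_tau(["a", "c", "b"], ["a", "b", "c"]) == 1/3
```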
16. Do non-local features help?
Model                   Kendall's τ   BLEU (ŝ′ → s′)
First-best −LM          92.16         68.1
First-best +LM (cube)   92.27         68.7
17. Quality of the preordering space
▶ Experiments with the top 10 preordering outputs of this model

                  Distortion   BLEU    MTR     TER
Baseline          7            15.2    35.4    66.6
Oracle (k = 10)                17.26   37.97   62.64
18. Part II: Morphology
19. Morphology
▶ Word order is only one part of the problem for morphologically rich languages (MRLs)
▶ Many linguistic properties are not expressed via word order
▶ Three questions:
− Does knowing morphological target properties help?
− Can we predict these on source trees?
− Which properties should we predict?
20. Does knowing morphological target properties help?
▶ Perform morphological tagging of the target side of the translation
▶ Project the morphological attributes via the alignments (see the sketch after the table)
Decoration   Morph. attributes     Tags   BLEU
None         -                     -      15.12
Gold         All attributes        846    15.96
Gold         Manual selection      77     15.86
Gold         Automatic selection   225    15.73
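A minimal sketch of how such a gold decoration can be obtained, assuming target-side tags and word alignments are given as plain Python structures; the `token|case=dat` output format is only illustrative:

```python
def project_attributes(src_tokens, tgt_tags, alignment):
    """Decorate source tokens with the morphological attributes of their
    aligned target words. `tgt_tags` maps a target position to an
    attribute dict (e.g. {"case": "dat"}) from a target-side tagger;
    `alignment` is an iterable of (src_idx, tgt_idx) pairs."""
    attrs = [dict() for _ in src_tokens]
    for s, t in alignment:
        attrs[s].update(tgt_tags[t])
    return [
        tok + "|" + ",".join(f"{k}={v}" for k, v in sorted(a.items())) if a else tok
        for tok, a in zip(src_tokens, attrs)
    ]

# e.g. "police" aligned to "Polizei" (case=dat) -> "police|case=dat"
```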
21. Predicting target morphology on source trees
▶ Prediction based on dependency chains instead of linear chains (see the sketch after the tree)
▶ Can take the full syntactic context into account
[Dependency tree figure: "Peter escaped from the police" with arcs labelled Root, Sb, AuxP, Adv, AuxA; target words "Peter entkam der Polizei", with case=dat marked on "der" and "Polizei"]
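A minimal sketch of the dependency-chain context, using the tree in the figure above; the dict encoding of the tree (token id → (word, label, head id)) is an assumption for illustration:

```python
def dependency_chain(tree, token_id):
    """Collect (word, dependency label) pairs from a token up to the
    root; these chains, rather than the linear left context, condition
    the prediction of target-side morphology."""
    chain, node = [], token_id
    while node is not None:
        word, label, head = tree[node]
        chain.append((word, label))
        node = head
    return chain

# "Peter escaped from the police", with heads as in the figure:
tree = {
    0: ("Peter", "Sb", 1),
    1: ("escaped", "Root", None),
    2: ("from", "AuxP", 1),
    3: ("the", "AuxA", 4),
    4: ("police", "Adv", 2),
}
assert dependency_chain(tree, 4) == [
    ("police", "Adv"), ("from", "AuxP"), ("escaped", "Root")
]
```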
22. Learning what to predict
Idea: only include an attribute if it leads to better lexical selection
Learning procedure (sketch; a code version follows below):
1. Decorate the source with all attributes
2. Calculate the likelihood of a heldout set with a word-based system (IBM model 1)
3. As long as the likelihood increases:
− Find the worst attribute by merging its tags and recalculating the likelihood
− Remove that attribute, re-align
− Repeat
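The same procedure as a greedy backward-selection loop, in a minimal sketch; `heldout_loglik` is a placeholder for the expensive step that decorates the corpus with the active attribute set, re-aligns, trains IBM model 1 and returns the heldout log-likelihood:

```python
def select_attributes(attrs, heldout_loglik):
    """Greedy backward selection: repeatedly drop the attribute whose
    removal (tag merging) increases heldout likelihood the most, and
    stop once no removal improves it."""
    active = set(attrs)
    best = heldout_loglik(active)
    while len(active) > 1:
        # Score the removal of each attribute still in the set.
        scores = {a: heldout_loglik(active - {a}) for a in active}
        worst, score = max(scores.items(), key=lambda kv: kv[1])
        if score <= best:   # every removal hurts: stop here
            break
        active.remove(worst)
        best = score
    return active
```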
23. Learning what to predict (English–German)
Part of speech   Manual selection                      Automatic selection
noun             gender†, number, case                 gender, number, case
adj              gender†, number‡, case‡, declension   gender, number, case, synpos, degree
verb             number‡*, person‡*, tense*, mode*     -

Additionally only in automatic: part:negativeness, part:subpos, punc:type, num:type.
24. Learning what to predict
                Manual   Automatic   All
Training 50k    36m      45m         77m
Training 100k   58m      82m         2h51m
Training 200k   1h54m    3h5m        6h44m
Best F1         72.67    74.67       62.18
25. Conclusion
Our work so far:
Question 1: Can we make syntactic preordering models more flexible and generate a space of possible preorderings?
Question 2: Can we predict target morphology on the source?
Current and future work:
Question 3: Can we combine both ideas to exploit interactions?
27. References
de Gispert, A., Iglesias, G., and Byrne, W. (2015). Fast and accurate preordering for SMT using neural networks. In Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics - Human Language Technologies (NAACL HLT 2015).
Hopkins, M. and May, J. (2011). Tuning as ranking. In Proceedings of the 2011 Conference on Empirical
Methods in Natural Language Processing, pages 1352--1362, Edinburgh, Scotland, UK. Association for
Computational Linguistics.
Jehl, L., de Gispert, A., Hopkins, M., and Byrne, B. (2014). Source-side preordering for translation using logistic regression and depth-first branch-and-bound search. In Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, pages 239--248, Gothenburg, Sweden. Association for Computational Linguistics.
Lerner, U. and Petrov, S. (2013). Source-side classifier preordering for machine translation. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pages 513--523, Seattle, Washington, USA. Association for Computational Linguistics.
Och, F. J. and Ney, H. (2002). Discriminative training and maximum entropy models for statistical machine
translation. In Proceedings of the 40th Annual Meeting on Association for Computational Linguistics, ACL
'02, pages 295--302, Stroudsburg, PA, USA. Association for Computational Linguistics.