3. 14/06/05 Copyright (C) 2014 by Yusuke Oda, AHC-Lab, IS, NAIST 3
Synchronous Context Free Grammar
(SCFG)
Learning SCFG
●
Synchronous rules are extracted from each sentence pair in a parallel corpus
and its word alignment. Each training example consists of:
●
Source sentence
●
Target sentence
●
Set of word alignments
Closed Phrase Pair under Word Alignment
●
A phrase pair is closed under its word alignment when the following holds:
●
No word inside the phrase pair is aligned to a word outside it.
[Figure: word alignment between "he will dissolve the diet in the near future" and "彼 は 近い うち に 国会 を 解散 する"]
Example of a closed phrase pair: (国会 を → the diet)
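The closedness check can be sketched in a few lines of Python. This is a minimal illustration, not the author's implementation: the function name `is_closed` and the alignment links are assumptions made up for the example sentence, since the alignment figure itself is not recoverable.

```python
def is_closed(alignment, src_span, tgt_span):
    """Return True if the phrase pair is closed under the alignment.

    alignment: set of (source_index, target_index) links.
    src_span, tgt_span: inclusive (start, end) index ranges.
    """
    (s_lo, s_hi), (t_lo, t_hi) = src_span, tgt_span
    has_link = False
    for s, t in alignment:
        s_in = s_lo <= s <= s_hi
        t_in = t_lo <= t <= t_hi
        if s_in != t_in:
            # The link has one end inside the phrase pair and one outside.
            return False
        has_link = has_link or s_in
    # Require at least one link inside, so empty pairs are rejected.
    return has_link


src = "彼 は 近い うち に 国会 を 解散 する".split()
tgt = "he will dissolve the diet in the near future".split()
# Hypothetical links (source index, target index); は, を, "will" and the
# second "the" are left unaligned in this made-up alignment.
alignment = {(0, 0), (2, 7), (3, 8), (4, 5), (5, 3), (5, 4), (7, 2), (8, 2)}

print(is_closed(alignment, (5, 6), (3, 4)))  # 国会 を / the diet -> True
print(is_closed(alignment, (5, 5), (4, 4)))  # 国会 / diet -> False
```

Under these assumed links, 国会 is aligned to both "the" and "diet", so pairing it with "diet" alone leaves a link crossing the phrase boundary.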
Extracting Abstract Rules
●
We can build more abstract synchronous rules by replacing part of a
phrase pair with a non-terminal symbol whenever the phrase pair
contains a smaller phrase pair.
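[Figure: the phrase pair (近い うち に 国会 を 解散 する, dissolve the diet in the near future) with its smaller phrase pairs replaced by non-terminals]
Smaller phrase pairs:
(国会 を, the diet)
(近い うち, near future)
Phrase pair covering them:
(近い うち ... 解散 する, dissolve the ... near future)
Resulting abstract rule:
(X1 に X2 解散 する, dissolve X2 in the X1)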
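The substitution step can be illustrated with a short Python sketch. The helper names `replace_once` and `abstract_rule` are hypothetical, and matching sub-phrases by their first occurrence is a simplification; a real extractor would track word spans.

```python
def replace_once(tokens, sub, symbol):
    """Replace the first occurrence of the token list `sub` with `symbol`."""
    n = len(sub)
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == sub:
            return tokens[:i] + [symbol] + tokens[i + n:]
    raise ValueError(f"{sub} not found in {tokens}")


def abstract_rule(src, tgt, sub_pairs):
    """Replace each smaller phrase pair with a shared non-terminal X1, X2, ..."""
    for k, (sub_src, sub_tgt) in enumerate(sub_pairs, start=1):
        symbol = f"X{k}"
        src = replace_once(src, sub_src, symbol)
        tgt = replace_once(tgt, sub_tgt, symbol)
    return src, tgt


src = "近い うち に 国会 を 解散 する".split()
tgt = "dissolve the diet in the near future".split()
sub_pairs = [
    ("近い うち".split(), "near future".split()),
    ("国会 を".split(), "the diet".split()),
]
rule_src, rule_tgt = abstract_rule(src, tgt, sub_pairs)
print(" ".join(rule_src), "|||", " ".join(rule_tgt))
# X1 に X2 解散 する ||| dissolve X2 in the X1
```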
Hiero Grammar
●
Hierarchical phrase grammar (Hiero grammar):
– Set of all synchronous rules extracted from the parallel corpus
●
Algorithm:
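1. Initialize the grammar with one rule for every phrase pair in the set of all possible phrase pairs in the parallel corpus.
2. If a rule and a phrase pair are such that the phrase pair occurs inside both sides of the rule, then add the rule obtained by replacing that phrase pair with a non-terminal symbol.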
3.
Constraints of Hiero Rules
●
To limit the size and ambiguity of a Hiero grammar, we can impose
some constraints on rule extraction.
●
Minimal phrase pair
– (国会 を, the diet) ... BAD
– (国会, the diet) ... GOOD
●
Phrase length
– (奈良 先端 科学 技術 大学院 大学 情報 科学 研究 科 自然 言語 処理 学 研究 室, ...) BAD (too many words)
●
Number of symbols
– X → 〈あらゆる X1 を 全て X2 の 方 へ ねじ曲げ た の だ, ...〉 BAD (too many symbols)
●
Rank of rule
– X → 〈X1 が X2 で X3 に X4 した, ...〉 BAD (too many non-terminals)
[Figure: word alignment between "the diet" and "国会 を"]
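The length, symbol-count and rank constraints can be expressed as a simple filter. The sketch below is illustrative only: the thresholds in `Limits` and the function names are assumptions, and the minimal-phrase-pair constraint is left out because it needs the word alignment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limits:
    # Hypothetical thresholds; tune them for the corpus at hand.
    max_phrase_len: int = 10  # words in the original phrase
    max_symbols: int = 5      # terminals plus non-terminals in the rule
    max_rank: int = 2         # non-terminals in the rule


def is_nonterminal(symbol):
    """Treat X1, X2, ... as non-terminal symbols."""
    return symbol.startswith("X") and symbol[1:].isdigit()


def accept_rule(src_side, phrase_len, limits=Limits()):
    """Check the source side of a rule against the three size constraints."""
    rank = sum(1 for s in src_side if is_nonterminal(s))
    return (phrase_len <= limits.max_phrase_len
            and len(src_side) <= limits.max_symbols
            and rank <= limits.max_rank)


print(accept_rule("X1 に X2 解散 する".split(), phrase_len=7))          # True
print(accept_rule("X1 が X2 で X3 に X4 した".split(), phrase_len=8))  # False
```

The second rule is rejected both for its rank (four non-terminals) and for its eight symbols; the `phrase_len` values passed here are made up for the example.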
Glue Rules
●
To build long sentences out of small rules, we introduce glue rules,
which concatenate partial translations in order.
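In the Hiero literature, glue rules are usually written as the following pair, where $S$ is the start symbol and $X$ the generic non-terminal. It is an assumption that the slide used exactly this form.

```latex
\begin{align*}
S &\rightarrow \langle S_1\, X_2,\; S_1\, X_2 \rangle \\
S &\rightarrow \langle X_1,\; X_1 \rangle
\end{align*}
```

The first rule appends one more $X$ to what has been built so far without reordering; the second starts the chain.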
Introducing Syntax Labels
●
So far, we have considered the basic ideas of Hiero rules.
– The only non-terminal symbols are X and S.
●
This model is very simple, but very ambiguous.
●
Next, we introduce syntax information into Hiero rules.
= Syntax-augmented machine translation (SAMT)
[Figure: syntax tree of "this is a pen" (S → NP VP, with preterminals PRP VBZ DT NN)]
Hiero + Syntax → SAMT
Combinatory Categorial Grammar (CCG)
●
SAMT uses categories (≒ partial structures of syntax labels) based on
the idea of combinatory categorial grammar (CCG).
●
Categories:
A/B: Syntax label A lacking its right-side child B
A\B: Syntax label A lacking its left-side child B
A+B: Concatenation of two syntax labels A and B
Extracting SAMT Rules
[Figure: the aligned pair "dissolve the diet in the near future" / "近い うち に 国会 を 解散 する" with the English syntax tree; spans that are not constituents carry the categories NP\DT, IN+DT, VP/PP and VP\VB]
VP → 〈NP\DT1 に NP2 解散 する, dissolve NP2 in the NP\DT1〉
VP → 〈近い うち IN+DT1 国会 を VB2, VB2 the diet IN+DT1 near future〉
etc...
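How a span receives one of these categories can be sketched as follows. This is a minimal illustration under stated assumptions: `samt_label` is a hypothetical helper, the constituent table is one reading of the English tree in the figure, and the order in which the category types are tried (constituent, A/B, A\B, A+B) is chosen so that the labels match those on the slide.

```python
def samt_label(span, constituents):
    """Label an inclusive (start, end) span using a dict of span -> label."""
    if span in constituents:
        return constituents[span]
    start, end = span
    # A/B: the span plus a constituent B on its right forms a constituent A.
    for (b_start, b_end), b in constituents.items():
        if b_start == end + 1 and (start, b_end) in constituents:
            return f"{constituents[(start, b_end)]}/{b}"
    # A\B: a constituent B on its left plus the span forms a constituent A.
    for (b_start, b_end), b in constituents.items():
        if b_end == start - 1 and (b_start, end) in constituents:
            return f"{constituents[(b_start, end)]}\\{b}"
    # A+B: the span is exactly two adjacent constituents.
    for mid in range(start, end):
        left, right = (start, mid), (mid + 1, end)
        if left in constituents and right in constituents:
            return f"{constituents[left]}+{constituents[right]}"
    return None


# Words: dissolve(0) the(1) diet(2) in(3) the(4) near(5) future(6)
constituents = {
    (0, 6): "VP", (0, 0): "VB", (1, 2): "NP", (1, 1): "DT", (2, 2): "NNP",
    (3, 6): "PP", (3, 3): "IN", (4, 6): "NP", (4, 4): "DT", (5, 5): "JJ",
    (6, 6): "NN",
}

print(samt_label((5, 6), constituents))  # near future       -> NP\DT
print(samt_label((3, 4), constituents))  # in the            -> IN+DT
print(samt_label((0, 2), constituents))  # dissolve the diet -> VP/PP
```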
Probabilistic Formalization of Hiero Model
●
We treat translation with a Hiero grammar as maximization of the
posterior probability (as in the phrase-based model).
●
We assume that this probability follows a log-linear model defined over:
– Set of derivations (≒ set of used synchronous rules)
– Weights
– Feature functions
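A common way to write these two steps is sketched below. The symbols are assumed names rather than the slide's own notation: $f$ and $e$ for the source and target sentences, $D$ for a derivation, $w_i$ for the weights and $h_i$ for the feature functions.

```latex
\begin{align*}
\hat{e} &= \arg\max_{e} P(e \mid f) \\
P(e, D \mid f) &=
  \frac{\exp\left( \sum_i w_i \, h_i(f, e, D) \right)}
       {\sum_{e', D'} \exp\left( \sum_i w_i \, h_i(f, e', D') \right)}
\end{align*}
```

Because the denominator does not depend on $e$ or $D$, the search only needs the weighted feature sum $\sum_i w_i \, h_i(f, e, D)$.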
Features of Hiero Model (1)
●
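Generative model: likelihood of the derivation under the rule translation probabilities
– Forward model: scores the target side of each rule given its source side
– Backward model: scores the source side of each rule given its target side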
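One common way to estimate such rule translation probabilities is relative frequency over the counts of extracted rules. The sketch below assumes that estimator and assumes that "forward" means target side given source side; the rule counts are made up.

```python
import math
from collections import Counter


def relative_frequency_logprobs(rule_counts):
    """Return (forward, backward) log-probabilities for each rule.

    rule_counts: dict mapping (source_side, target_side) to a count.
    forward  = log count(src, tgt) / count(src)
    backward = log count(src, tgt) / count(tgt)
    """
    src_totals, tgt_totals = Counter(), Counter()
    for (src, tgt), count in rule_counts.items():
        src_totals[src] += count
        tgt_totals[tgt] += count
    forward = {rule: math.log(count / src_totals[rule[0]])
               for rule, count in rule_counts.items()}
    backward = {rule: math.log(count / tgt_totals[rule[1]])
                for rule, count in rule_counts.items()}
    return forward, backward


# Hypothetical counts of extracted rules.
rule_counts = {
    ("国会", "the diet"): 3,
    ("国会", "diet"): 1,
    ("議会", "the diet"): 1,
}
forward, backward = relative_frequency_logprobs(rule_counts)
print(math.exp(forward[("国会", "the diet")]))   # 3 / 4
print(math.exp(backward[("国会", "diet")]))      # 1 / 1
```

The feature value of a derivation is then the sum of these log-probabilities over the rules it uses.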
Features of Hiero Model (2)
●
Generative model: likelihood of the derivation under the rule translation probabilities
– Syntax model (f)
– Syntax model (e)
Features of Hiero Model (3)
●
Lexical translation model: quality of the phrase alignment
– Forward model
– Backward model
Features of Hiero Model (4)
●
Language model: measures the fluency of the hypothesis
●
Out-of-vocabulary (OOV) penalty: adjusts the language model score for unknown words
●
Length penalty: adjusts the number of words in the hypothesis
●
Glue penalty: adjusts the number of glue rules in the derivation
Decoding of Hiero Model
●
Given an input sentence and a set of SCFG rules, we search for
the optimal output sequence, using:
– Set of possible derivations under the grammar
– Sequence of terminal symbols produced by a given derivation
Decoding Process
1. Compute the intersection between the input sentence and the grammar.
•
= Generate a syntax forest using the CYK algorithm
2. Transform the syntax forest into the corresponding translation forest.
3. Output the sequence of terminal symbols in the translation forest that
maximizes the model score.
[Figure: source syntax tree for "犬 が 本 の 上 に 座った" and the corresponding translation tree, which yields "the dog sat on the book"]
Synchronous Tree Substitution Grammar
(STSG)
Synchronous Tree Substitution Grammar
●
STSG is an extension of Tree Substitution Grammar (TSG) for bilingual
analysis.
●
STSG is a subset of Synchronous Tree Adjoining Grammar (STAG).
●
Definition: an STSG is defined by
– Set of non-terminal symbols
– Start symbol
– Set of terminal symbols
– Set of rules
– Weight semiring
[Figure: SCFG (Hiero) ⊂ STSG ⊂ STAG]
Synchronous Rules of STSG
●
Definition: each synchronous rule consists of
– Elementary tree (source language)
– Elementary tree (target language)
– Association between the two elementary trees
●
Every rule is also associated with a weight.
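[Figure: example rule pairing the source tree (S x1:NP (VP x2:NP (V 開けた))) with the target tree (S x1:NP (VP (VBD opened) x2:NP)); x1 and x2 mark the frontier nodes]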
Expressive Power of STSG
●
SCFG cannot express differences in syntactic structure between the two languages, but STSG can.
●
Example:
– This synchronous rule cannot be generated by composing smaller SCFG rules,
because the internal structures of the two trees do not correspond.
– The STSG framework can handle such correspondences between tree structures directly.
[Figure: example rule pairing a source NP tree over "犬 が x1:CD 匹" with a target NP tree whose children are x1:CD and (NNS dogs)]
Translation Models under STSG Framework
●
In the STSG framework, we can use the sequence of frontier nodes
(the leaves of a synchronous rule) instead of the full tree.
●
Choosing either a tree or a frontier sequence as the data structure on
each of the source and target sides gives four translation models.
                    Target: frontier               Target: tree
Source: frontier    String-to-string translation   String-to-tree translation
                    (= SCFG)
Source: tree        Tree-to-string translation     Tree-to-tree translation
[Figure: the tree (S x1:NP (VP x2:NP (V 開けた))) and its sequence of frontier nodes]
Retrieving STSG Synchronous Rules
●
Heuristic method (similar to SCFG rule extraction), using:
– Syntax tree generated from the source sentence
– Syntax tree generated from the target sentence
[Figure: the aligned pair "dissolve the diet in the near future" / "近い うち に 国会 を 解散 する" with both syntax trees; the example rule pairs a source VP tree containing x1:PP, x2:NP and 解散 する with a target VP tree containing x1:PP, x2:NP and (VB dissolve)]
GHKM Algorithm
●
Galley-Hopkins-Knight-Marcu (GHKM) Algorithm
– Generates STSG synchronous rules (string-to-tree rules) by composing minimal
rules using the inside-outside algorithm.
1. Detect minimal rules in the target syntax trees.
2. Generate larger synchronous rules by composing minimal rules.
[Figure: a syntax tree split into minimal rules]
GHKM: Alignment Span (1)
●
Alignment span of a node:
– Set of indexes of the source-sentence words aligned to the partial tree rooted at that node
●
Complement alignment span of a node:
– Set of indexes of the source-sentence words aligned to target words outside that partial tree
●
Closure of an alignment span:
– Minimum contiguous range that covers the alignment span
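These three quantities can be computed directly from the alignment links. The sketch below is an illustration only: the function names and the alignment links are assumptions, and a node is represented simply by the set of target word indexes it covers.

```python
def alignment_span(target_indexes, alignment):
    """Source indexes aligned to the target words covered by the node."""
    return {s for s, t in alignment if t in target_indexes}


def complement_span(target_indexes, alignment):
    """Source indexes aligned to target words outside the node."""
    return {s for s, t in alignment if t not in target_indexes}


def closure(span):
    """Smallest contiguous index range that covers the span."""
    return set(range(min(span), max(span) + 1)) if span else set()


# Source: 彼(0) は(1) 近い(2) うち(3) に(4) 国会(5) を(6) 解散(7) する(8)
# Target: he(0) will(1) dissolve(2) the(3) diet(4) in(5) the(6) near(7) future(8)
# Hypothetical links (source index, target index).
alignment = {(0, 0), (2, 7), (3, 8), (4, 5), (5, 3), (5, 4), (7, 2), (8, 2)}

pp_node = {5, 6, 7, 8}  # target words under PP: "in the near future"
print(sorted(alignment_span(pp_node, alignment)))   # [2, 3, 4]
print(sorted(complement_span(pp_node, alignment)))  # [0, 5, 7, 8]
print(sorted(closure({2, 4})))                      # [2, 3, 4]
```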
GHKM: Alignment Span (2)
[Figure: English syntax tree of "he will dissolve the diet in the near future" aligned to "彼 は 近い うち に 国会 を 解散 する", with the alignment span of each node]
GHKM: Admissible Node
●
Admissible node:
– Node in the target syntax tree whose alignment-span closure does not overlap its complement alignment span
[Figure: English syntax tree of "he will dissolve the diet in the near future" aligned to "彼 は 近い うち に 国会 を 解散 する", with the admissible nodes marked]
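The admissibility test itself is a disjointness check between the closure of a node's alignment span and its complement alignment span. The sketch below assumes that standard GHKM condition; the function names are hypothetical, the span values are worked out by hand from made-up alignment links, and rejecting nodes with an empty span is a design choice of this sketch.

```python
def closure(span):
    """Smallest contiguous index range that covers the span."""
    return set(range(min(span), max(span) + 1)) if span else set()


def is_admissible(span, complement):
    """True if the closure of the span shares no index with the complement."""
    return bool(span) and closure(span).isdisjoint(complement)


# Hypothetical spans over the source "彼 は 近い うち に 国会 を 解散 する".
# PP "in the near future": aligned to 近い うち に, nothing else falls in between.
print(is_admissible({2, 3, 4}, {0, 5, 7, 8}))      # True
# DT "the" (of "the diet"): 国会 is also aligned to "diet" outside the node.
print(is_admissible({5}, {0, 2, 3, 4, 5, 7, 8}))   # False
```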
GHKM: Minimal Rule
●
Split the syntax tree at its admissible nodes
[Figure: the aligned tree of "he will dissolve the diet in the near future" split at its admissible nodes into minimal rules, for example a VP rule over the frontier nodes x1:PP, x2:NP and x3:VB, and a rule pairing "近い うち" with the subtree (DT the) (JJ near) (NN future)]
Extension for Tree-to-tree Model (1)
●
We need to extract node pairs from the two syntax trees that are admissible
with respect to each other.
●
First, find the admissible nodes in the given trees.
●
A node pair that satisfies the admissibility condition in both directions is
bidirectionally admissible.
●
Span of a node:
– Minimum range over the sentence that covers all terminal symbols under that node
Extension for Tree-to-tree Model (2)
[Figure: the aligned pair "dissolve the diet in the near future" / "近い うち に 国会 を 解散 する" with both syntax trees and their bidirectionally admissible node pairs]
Features of STSG Model (1)
●
Generative model: likelihood of the derivation under the rule translation probabilities
Features of STSG Model (2)
●
Lexical translation model: quality of the phrase alignment
Features of STSG Model (3)
●
Height penalty: adjusting depth of derivation
●
Internal node penalty: adjusting total size of derivation
●
Some features introduced for the Hiero model are also available
Decoding of STSG Model
●
STSG decoding basically uses the same method as Hiero decoding;
the details depend on the translation model.
Difference of Formalization of Each Model
●
String-to-string model
– Same model as Hiero (SCFG) model.
●
String-to-tree model
– Never uses any information from the syntax of the source sentence.
●
Tree-to-string model
●
Tree-to-tree model
– Explicitly uses syntax information of the source sentence.
– The translation process can be divided into syntax analysis and decoding.
[Figure: Source sentence → Syntax analyzer → Syntax tree of source sentence → Decoder → Translation hypotheses; the two paths are labeled "Non-syntax-based translation" and "Syntax (tree)-based translation"]
Formalization of Syntax-based Translation
●
A syntax-based translation model uses the syntax tree of the source
sentence.
●
We can ignore the syntax-analysis term, because the source syntax tree is
already fixed by the syntax analysis.
Questions & Discussions