12. RBMT
RBMT makes use of human-encoded
linguistic rules for translation.
Developing an RBMT system is
very expensive because it requires
a great deal of human labour and takes a
long time (years).
Winter School 2013, Birmingham
13. RBMT
RBMT systems can reach good translation
quality after years of development in a
given domain.
Well-developed RBMT systems tend to
capture large-scale sentence structures
better, but handle short expressions worse,
than SMT systems.
14. EBMT
An EBMT system translates sentences by
analogy with existing translation examples.
EBMT does not need deep analysis of the
source text and may generate high-quality
translations when similar examples are
found.
16. EBMT
The quality of EBMT increases as more
examples become available.
A problem of EBMT is the coverage of
the examples, especially for long
sentences.
17. TM
Translation Memory directly outputs an
existing target sentence when a very
similar source sentence is found in the
memory; otherwise it outputs nothing.
18. SMT
SMT builds statistical models to predict the
probability of a target sentence being the
translation of a given source sentence.
Translating a given source sentence amounts
to searching for the target sentence with the
highest translation probability.
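This search is commonly written as the noisy-channel decision rule (a standard SMT formulation, added here for reference; e is the target sentence, f the source):

```latex
\hat{e} \;=\; \operatorname*{arg\,max}_{e} P(e \mid f)
        \;=\; \operatorname*{arg\,max}_{e} P(f \mid e)\, P(e)
```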
19. SMT
A large number of translation pairs (a parallel
corpus) is needed to estimate the model
parameters.
To predict the translation, sentence pairs are
broken into smaller translation equivalences
at the word, phrase, or syntax-rule level.
23. Phrase-based SMT
Source                          Target              Probability
Bushi (布什)                    Bush                0.5
                                president Bush      0.3
                                the US president    0.2
Bushi yu (布什与)               Bush and            0.8
                                the president and   0.2
yu Shalong (与沙龙)             and Sharon          0.6
                                with Sharon         0.4
juxing le huitan (举行了会谈)   hold a meeting      0.7
                                had a meeting       0.3
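The phrase table above can be put to work in a tiny scoring sketch: under a pure translation-model view (ignoring the language model and reordering), the score of a translation is the product of the probabilities of the phrase pairs it uses. The table contents mirror the slide; the `score` function is illustrative, not from the slides.

```python
# Phrase table from the slide (source phrase -> {target phrase: probability}).
phrase_table = {
    "Bushi yu": {"Bush and": 0.8, "the president and": 0.2},
    "yu Shalong": {"and Sharon": 0.6, "with Sharon": 0.4},
    "juxing le huitan": {"hold a meeting": 0.7, "had a meeting": 0.3},
}

def score(segmentation):
    """segmentation: list of (source_phrase, target_phrase) pairs.
    Returns the product of the phrase-pair probabilities."""
    p = 1.0
    for src, tgt in segmentation:
        p *= phrase_table[src][tgt]
    return p

# "Bushi yu / juxing le huitan" -> "Bush and ... hold a meeting"
seg = [("Bushi yu", "Bush and"), ("juxing le huitan", "hold a meeting")]
print(score(seg))  # 0.8 * 0.7 ≈ 0.56
```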
25. Hierarchical Phrase-based SMT
Source                          Target          Probability
juxing le huitan (举行了会谈)   hold a meeting  0.6
                                had a meeting   0.3
X huitan (X会谈)                X a meeting     0.8
                                X a talk        0.2
juxing le X (举行了X)           hold a X        0.5
                                had a X         0.5
Bushi yu Shalong (布什与沙龙)   Bush and Sharon 0.8
Bushi X (布什X)                 Bush X          0.7
X yu Y (X与Y)                   X and Y         0.9
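Hierarchical rules such as `X yu Y → X and Y` apply recursively, with the gap variables filled by sub-translations. A minimal sketch, hard-coding just that gap rule plus a toy lexicon (both illustrative, not from the slides):

```python
lexicon = {"Bushi": "Bush", "Shalong": "Sharon"}

def translate(tokens):
    # Gap rule: X yu Y -> X and Y (p = 0.9 in the table above),
    # recursing on the two sub-spans.
    if "yu" in tokens:
        i = tokens.index("yu")
        return translate(tokens[:i]) + ["and"] + translate(tokens[i + 1:])
    # Base case: word-by-word lexical translation.
    return [lexicon.get(t, t) for t in tokens]

print(" ".join(translate(["Bushi", "yu", "Shalong"])))  # Bush and Sharon
```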
27. Syntax-based SMT
Source                                  Target          Probability
VPB(VS(juxing) AS(le) NPB(huitan))      hold a meeting  0.6
(举行了会谈)                            have a meeting  0.3
                                        have a talk     0.1
VPB(VS(juxing) AS(le) x1:NPB)           hold a x1       0.5
(举行了x1)                              have a x1       0.5
VP(PP(P(yu) x1:NPB) x2:VPB) (与 x1 x2)  x2 with x1      0.9
IP(x1:NPB VP(x2:PP x3:VPB))             x1 x3 x2        0.7
28. SMT
SMT is cheap.
SMT systems can be developed in a
short time.
SMT needs a large parallel corpus.
29. SMT
SMT produces good-quality translations if we
have plenty of in-domain data.
SMT quality drops dramatically on out-of-domain data.
SMT output is fluent in short phrases but
not good at large-scale sentence structures
(especially for distant language pairs).
30. Why Hybrid MT?
Each MT approach has its pros and cons.
We want to take advantage of different MT
approaches.
We do not want to waste our investment
in existing MT systems.
31. Outline
Why Hybrid MT?
An overview of Hybrid MT
Typical Hybrid MT Approaches
Conclusion
32. An overview of Hybrid MT
Selective MT: loose coupling
Pipelined MT: medium coupling
Mixture MT: close coupling
33. Selective MT
Given translations generated by
different approaches, Selective MT
tries to select the best one, or to select
the best parts of different translations
and combine them into a new one.
36. Selective MT
Typical Selective MT:
System Recommendation
System Combination
Sentence-level combination
Word-level combination
37. Pipelined MT
Pipelined MT adopts one approach as
the main approach and uses another
approach for monolingual pre-processing or post-processing.
39. Pipelined MT
Typical Pipelined MT:
Statistical Post-Editing for RBMT
Rule-based Pre-reordering for SMT
40. Mixture MT
Mixture MT adopts one approach as
the main approach but utilizes one or
more other approaches in some
components.
45. System Recommendation
Yifan He, Yanjun Ma, Josef van Genabith and Andy
Way, Bridging SMT and TM with System
Recommendation, Proceedings of the 48th Annual
Meeting of the Association for Computational
Linguistics (ACL2010), pages 622–630, Uppsala,
Sweden, 11-16 July 2010.
46. System Recommendation
Intuition:
When the translation memory is big enough,
the trained SMT system is comparable with
the TM output in translation quality.
This raises the problem of selection.
System recommendation recommends SMT
output to a TM user when it predicts that the
SMT output is more suitable for post-editing
than the hits provided by the TM.
48. System Recommendation
An SVM binary classifier is adopted.
The classifier is trained on human-annotated data.
A confidence score is given for the
recommendation.
49. System Recommendation
SMT System Features: features used in the SMT system
TM Feature: Fuzzy Match Cost
System Independent Features:
Source-Side Language Model Score and Perplexity
Target-Side Language Model Perplexity
The Pseudo-Source Fuzzy Match Score
The IBM Model 1 Score.
50. System Recommendation
Evaluation metrics are defined over two sets:
A, the set of recommended MT outputs, and
B, the set of MT outputs that have lower TER
than the corresponding TM hits.
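The formula itself did not survive extraction from the slide; given the definitions of A and B, the metrics are presumably the standard precision and recall of the recommendation:

```latex
\mathrm{Precision} = \frac{|A \cap B|}{|A|}, \qquad
\mathrm{Recall} = \frac{|A \cap B|}{|B|}
```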
54. System Combination
Rosti, A. V. I., Ayan, N. F., Xiang, B., Matsoukas,
S., Schwartz, R. M., & Dorr, B. J. (2007, April).
Combining Outputs from Multiple Machine
Translation Systems. In HLT-NAACL (pp. 228-235).
55. System Combination
Rosti, A. V. I., Matsoukas, S., & Schwartz, R. (2007,
June). Improved word-level system combination for
machine translation. In Annual Meeting of the
Association for Computational Linguistics
(Vol. 45, No. 1, p. 312).
56. System Combination
He, X., Yang, M., Gao, J., Nguyen, P., & Moore, R.
2008. Indirect-HMM-based hypothesis alignment for
combining outputs from machine translation systems.
In Proceedings of the Conference on Empirical
Methods in Natural Language Processing (pp. 98-107).
Association for Computational Linguistics.
57. System Combination
Feng, Y., Liu, Y., Mi, H., Liu, Q., & Lü, Y. 2009. Lattice-based
system combination for statistical machine
translation. In Proceedings of the 2009 Conference on
Empirical Methods in Natural Language Processing:
Volume 3-Volume 3 (pp. 1105-1113). Association for
Computational Linguistics.
58. Sentence-Level
System Combination
Kumar, S., & Byrne, W. J. (2004, May).
Minimum Bayes-Risk Decoding for
Statistical Machine Translation. In
HLT-NAACL (pp. 169-176).
59. Sentence-Level
System Combination
Suppose we have several MT systems.
For a given source text F, each MT system
outputs an n-best list of target texts.
If possible, each MT system gives each target
text a probability P(E|F); otherwise we may
treat the n-best target texts as equally
probable.
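Sentence-level combination in the Minimum Bayes Risk spirit can be sketched as follows: choose the candidate with the highest expected similarity (equivalently, the lowest expected loss) against all candidates, weighted by their probabilities. The similarity here is a crude unigram overlap standing in for BLEU, and the candidates and probabilities are illustrative.

```python
def overlap(a, b):
    """Fraction of a's tokens that also occur in b (crude similarity)."""
    ta, tb = a.split(), b.split()
    return sum(w in tb for w in ta) / len(ta)

def mbr_select(candidates):
    """candidates: list of (sentence, probability) pooled from all systems.
    Maximizing expected similarity == minimizing expected loss (1 - similarity)."""
    def expected_gain(e):
        return sum(p * overlap(e, e2) for e2, p in candidates)
    return max(candidates, key=lambda c: expected_gain(c[0]))[0]

hyps = [
    ("Bush held a meeting with Sharon", 0.4),
    ("Bush hold a meeting with Sharon", 0.35),
    ("Bush met Sharon", 0.25),
]
print(mbr_select(hyps))  # Bush held a meeting with Sharon
```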
61. Word-Level
System Combination
Select a translation candidate as a skeleton
(backbone) with Minimum Bayes Risk.
Construct a confusion network by aligning
all the words in the other translation candidates
to the words in the skeleton.
Select the best path through the confusion
network to generate a new translation.
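The steps above can be sketched minimally: align each hypothesis to the skeleton position-by-position (real systems use TER- or HMM-based alignment), then pick the majority word in each column of the resulting confusion network. The hypotheses are illustrative.

```python
from collections import Counter

def combine(skeleton, others):
    """Naive positional confusion network: one column per skeleton word,
    decoded by majority vote over all hypotheses."""
    hyps = [skeleton] + others
    result = []
    for i, _ in enumerate(skeleton):
        votes = [h[i] for h in hyps if i < len(h)]
        result.append(Counter(votes).most_common(1)[0][0])
    return result

skeleton = ["Bush", "hold", "a", "meeting", "with", "Sharon"]
others = [
    ["Bush", "held", "a", "meeting", "with", "Sharon"],
    ["Bush", "held", "a", "talk", "with", "Sharon"],
]
print(" ".join(combine(skeleton, others)))  # Bush held a meeting with Sharon
```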
65. Word-Level
System Combination
System combination has proved to be very
effective.
In the NIST Open MT Evaluation Chinese-English task, MSR-NRC-SRI ranked first
by using system combination technologies.
In later NIST evaluations, separate tracks
were defined for participants using and not
using system combination technologies.
66. Typical Hybrid MT Approaches
Selective MT
Pipelined MT
Statistical Post-Editing for RBMT
Rule-based Pre-reordering for SMT
Mixture MT
67. Statistical Post-Editing for RBMT
Dugast, L., Senellart, J., & Koehn, P. (2007, June).
Statistical post-editing on SYSTRAN's rule-based
translation system. In Proceedings of the Second
Workshop on Statistical Machine Translation (pp.
220-223). Association for Computational
Linguistics.
68. Statistical Post-Editing for RBMT
Simard, M., Ueffing, N., Isabelle, P., & Kuhn, R.
(2007). Rule-based Translation With Statistical
Phrase-based Post-editing. Second Workshop on
Statistical Machine Translation. Prague, Czech
Republic. June 23, 2007. pp. 203–206.
69. Statistical Post-Editing
When we have:
A very good RBMT system
A large parallel corpus that can be
used for SMT training
Both RBMT and SMT have advantages and
disadvantages.
Can we benefit from both methods?
70. Statistical Post-Editing
A Statistical Post-Editing (SPE) system is a
monolingual SMT system that takes the result of an
RBMT system as input and generates an improved
target output.

Source Text → RBMT → RBMT Result → SPE → SPE Result
71. Statistical Post Edit: Training
Source → RBMT → RBMT Target
(RBMT Target, Target) pairs → SPE Training
72. Statistical Post Edit: Training
RBMT usually generates better word
order, while SMT makes better
lexical selections.
RBMT+SPE outperforms both the original
RBMT and SMT systems.
73. Typical Hybrid MT Approaches
Selective MT
Pipelined MT
Statistical Post-Editing for RBMT
Rule-based Pre-reordering for SMT
Mixture MT
74. Rule-based Pre-reordering for SMT
Elia Yuste, Manuel Herranz, Alexandra Helle and
Hirokazu Suzuki, Go Hybrid: Pangeanic's and Toshiba's
First Steps Towards ENJP MT Hybridization, AAMT
Journal, No.50, December 2011 (Part B for this tutorial)
75. Rule-based Pre-reordering for SMT
Xia, F., & McCord, M. (2004, August). Improving a
statistical MT system with automatically learned rewrite
patterns. In Proceedings of the 20th international
conference on Computational Linguistics (p. 508).
Association for Computational Linguistics.
76. Rule-based Pre-reordering for SMT
A phrase-based SMT (PBSMT) system
makes good lexical choices but is not
good at long-distance reordering without
linguistic knowledge.
A rule-based word reordering on the source
side is conducted to make the word order of
the source text much closer to the
word order of the target side.
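A minimal sketch of such a reordering rule for the running Chinese→English example: a pre-verbal PP ("yu X" = "with X") is moved after the verb phrase, echoing the IP rule `x1 x2 x3 → x1 x3 x2` in the syntax table, so the source order matches English order before PBSMT runs. The rule and chunking here are illustrative.

```python
def pre_reorder(chunks):
    """chunks: list of (label, tokens). Swap a PP that immediately
    precedes a VP, then flatten back to a token sequence."""
    out = list(chunks)
    for i in range(len(out) - 1):
        if out[i][0] == "PP" and out[i + 1][0] == "VP":
            out[i], out[i + 1] = out[i + 1], out[i]
    return [tok for _, toks in out for tok in toks]

sentence = [
    ("NP", ["Bushi"]),
    ("PP", ["yu", "Shalong"]),
    ("VP", ["juxing", "le", "huitan"]),
]
print(" ".join(pre_reorder(sentence)))  # Bushi juxing le huitan yu Shalong
```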
77. Rule-based Pre-reordering for SMT
Source Text → Pre-Reordering → Reordered Source Text → PBSMT → Target Text
79. Pre-reordering: Training
The rules for pre-reordering can be
automatically acquired from a parallel
corpus using automatic word alignment
and parse trees on both sides.
80. Pre-reordering: Training
Parse the source sentence
Parse the target sentence
Align the words and phrases on
both sides
Extract the rewrite rules
86. Typical Hybrid MT Approaches
Selective MT
Pipelined MT
Mixture MT
Statistical Parsing in RBMT
Rule-based Named Entity Translation in SMT
Human-Acquired Rules in SMT
SMT Decoding with TM Phrases
87. Statistical Parsing in RBMT
Statistical parsing outperforms rule-based
parsing if we have a large-scale
treebank.
It is therefore reasonable to use a statistical
algorithm in the parsing component of
an RBMT system.
88. Rule-based Named Entity Translation
in SMT
Ney, H. (2013). Statistical MT Systems Revisited:
How much Hybridity do they have? Proceedings of
the Second Workshop on Hybrid Approaches to
Translation, page 7, Sofia, Bulgaria, August 8,
2013.
90. Human-Acquired Rules in SMT
Li, X., Lü, Y., Meng, Y., Liu, Q., & Yu, H.
Feedback Selecting of Manually Acquired
Rules Using Automatic Evaluation.
Proceedings of the 4th Workshop on Patent
Translation, pages 52-59, MT Summit XIII,
Xiamen, China, September 2011
91. Human-Acquired Rules in SMT
These rules are used in the decoding process
together with the hierarchical phrases of an
SMT system.
92. SMT Decoding with TM Phrases
Philipp Koehn and Jean Senellart. 2010. Convergence of
translation memory and statistical machine translation. In
AMTA Workshop on MT Research and the Translation
Industry, pages 21–31.
Wang, K., Zong, C., & Su, K. Y. Integrating Translation
Memory into Phrase-Based Machine Translation during
Decoding. Proceedings of the 51st Annual Meeting of the
Association for Computational Linguistics, pages 11–21,
Sofia, Bulgaria, August 4-9 2013
93. SMT Decoding with TM Phrases
Yanjun Ma, Yifan He, Andy Way and Josef van Genabith.
2011. Consistent translation using discriminative learning: a
translation memory-inspired approach. In Proceedings of the
49th Annual Meeting of the Association for Computational
Linguistics, pages 1239–1248, Portland, Oregon.
Yifan He, Yanjun Ma, Andy Way and Josef van Genabith.
2011. Rich linguistic features for translation memory-inspired
consistent translation. In Proceedings of the Thirteenth
Machine Translation Summit, pages 456–463.
94. SMT Decoding with TM Phrases
Extract TM phrases from similar
sentences in the translation memory
and use them in the decoding process
at runtime.
95. Outline
Why Hybrid MT?
An overview of Hybrid MT
Typical Hybrid MT Approaches
Conclusion
96. Conclusion
Different MT approaches have advantages and
disadvantages, which are usually complementary.
Hybrid MT can benefit from different MT
approaches.
Three categories of Hybrid MT were introduced:
Selective, Pipelined and Mixture.
In practice, almost all real MT systems are hybrid
systems.