Phrase Tagset Mapping for French and English 
Treebanks and Its Application in Machine Translation Evaluation 
25th International Conference, GSCL 2013 
Aaron L.-F. Han, Derek F. Wong, Lidia S. Chao, Liangye He, Shuo Li, 
and Ling Zhu 
September 25th -27th, 2013, Darmstadt, Germany 
Natural Language Processing & Portuguese-Chinese Machine Translation Laboratory 
Department of Computer and Information Science 
University of Macau
Contents 
● Background of language Treebank 
● Motivation 
● Designed phrase tagset mapping 
● Application in MT evaluation 
1. Manual evaluations 
2. Traditional automatic MT evaluation methods 
3. Designed unsupervised MT evaluation 
4. Evaluating the evaluation method 
5. Experiments 
6. Open source code 
● Discussion 
● Further information
1. Background of language Treebank 
• To promote the development of syntactic analysis, many language treebanks have been developed 
– English Penn Treebank (Marcus et al., 1993; Mitchell et al., 1994) 
– German Negra Treebank (Skut et al., 1997) 
– French Treebank (Abeillé et al., 2003) 
– Chinese Sinica Treebank (Chen et al., 2003) 
– Etc.
1. Background of language Treebank 
• Problems 
– Different treebanks use their own syntactic tagsets 
– The number of tags ranges from tens (e.g. English Penn Treebank) to hundreds (e.g. Chinese Sinica Treebank) 
– This is inconvenient for multilingual or cross-lingual research
2. Motivation 
• To bridge the gap between these treebanks and 
facilitate future research 
– E.g. the unsupervised induction of syntactic structure 
• Petrov et al. (2012) developed a universal POS tagset 
• What about phrase-level tags? 
• The mismatch between phrase-level tagsets remains unsolved 
– Let’s try to solve it
3. Designed phrase tagset mapping 
• Tentative design of phrase tagset mapping 
– On English Penn Treebank I, II & French Treebank 
• 9 universal phrasal categories covering 
– 14 phrase tags in English Penn Treebank I 
– 26 phrase tags in English Penn Treebank II 
– 14 phrase tags in French Treebank
3. Designed phrase tagset mapping 
Table 1: phrase tagset mapping for French and English treebanks
3. Designed phrase tagset mapping 
• Universal phrasal categories: NP (noun phrase), VP (verb phrase), AJP (adjective phrase), AVP (adverbial phrase), PP (prepositional phrase), S (sentence or sub-sentence), CONJP (conjunction phrase), COP (coordinated phrase), X (other phrases or unknown) 
• NP covering 
– French tags: NP 
– English tags: NP, NAC (the scope of certain prenominal 
modifiers within an NP), NX (within certain complex NPs 
to mark the head of NP), WHNP (wh-noun phrase), QP 
(quantifier phrase)
3. Designed phrase tagset mapping 
• VP covering 
– French tags: VN (verbal nucleus), VP (infinitives and 
nonfinite clauses) 
– English tags: VP (verb phrase) 
• AJP covering 
– French tags: AP (adjectival phrase) 
– English tags: ADJP (adjective phrase), WHADJP (wh-adjective 
phrase)
3. Designed phrase tagset mapping 
• AVP covering 
– French tags: AdP (adverbial phrases) 
– English tags: ADVP (adverb phrase), WHADVP (wh-adverb phrase), PRT (particle) 
• PP covering 
– French tags: PP 
– English tags: PP, WHPP (wh-prepositional phrase)
3. Designed phrase tagset mapping 
• S covering 
– French tags: SENT (sentence), S (finite clause) 
– English tags: S (simple declarative clause), SBAR (clause 
introduced by a subordinating conjunction), SBARQ (direct 
question introduced by a wh-phrase), SINV (declarative 
sentence with subject-aux inversion), SQ (sub-constituent 
of SBARQ), PRN (parenthetical), FRAG (fragment), RRC 
(reduced relative clause). 
• CONJP covering 
– French tags: N/A 
– English tags: CONJP
3. Designed phrase tagset mapping 
• COP covering 
– French tags: COORD (coordinated phrase) 
– English tags: UCP (coordinated phrases belonging to 
different categories) 
• X covering 
– French tags: unknown 
– English tags: X (unknown or uncertain), INTJ 
(interjection), LST (list marker)
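The mapping listed on the slides above can be written down as a lookup table. The following is a minimal sketch, not the authors' released tool: the dictionary contents are transcribed from the slides (using the standard Penn Treebank spelling WHADVP), while the function name `to_universal`, the language codes "en" and "fr", the fallback to X for unlisted tags, and the stripping of function suffixes such as NP-SBJ are assumptions made for illustration.

```python
# Universal phrase categories and the treebank tags each one covers,
# transcribed from the slides. Tags not listed there are NOT included.
UNIVERSAL_TO_ENGLISH = {
    "NP": ["NP", "NAC", "NX", "WHNP", "QP"],
    "VP": ["VP"],
    "AJP": ["ADJP", "WHADJP"],
    "AVP": ["ADVP", "WHADVP", "PRT"],
    "PP": ["PP", "WHPP"],
    "S": ["S", "SBAR", "SBARQ", "SINV", "SQ", "PRN", "FRAG", "RRC"],
    "CONJP": ["CONJP"],
    "COP": ["UCP"],
    "X": ["X", "INTJ", "LST"],
}
UNIVERSAL_TO_FRENCH = {
    "NP": ["NP"],
    "VP": ["VN", "VP"],
    "AJP": ["AP"],
    "AVP": ["AdP"],
    "PP": ["PP"],
    "S": ["SENT", "S"],
    "CONJP": [],          # the slides list no French counterpart
    "COP": ["COORD"],
    "X": [],              # unknown French tags fall back to X below
}


def invert(table):
    """Turn {universal: [tags]} into {tag: universal}."""
    return {tag: universal for universal, tags in table.items() for tag in tags}


ENGLISH_TO_UNIVERSAL = invert(UNIVERSAL_TO_ENGLISH)
FRENCH_TO_UNIVERSAL = invert(UNIVERSAL_TO_FRENCH)


def to_universal(tag, language):
    """Map a treebank phrase tag to a universal category (hypothetical helper)."""
    if language == "en":
        table = ENGLISH_TO_UNIVERSAL
    elif language == "fr":
        table = FRENCH_TO_UNIVERSAL
    else:
        raise ValueError(f"unsupported language: {language!r}")
    # Assumption: drop function suffixes such as "-SBJ" before the lookup.
    base = tag.split("-")[0]
    # Assumption: anything not listed on the slides is treated as X.
    return table.get(base, "X")


print(to_universal("WHNP", "en"))   # NP
print(to_universal("VN", "fr"))     # VP
```

Keeping the table keyed by universal category mirrors the layout of the slides, so an extension to another treebank would add one more dictionary rather than change the lookup code.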
4. Application in Machine Translation evaluation
4.1 Manual evaluations 
• Rapid development of machine translation (MT) 
– MT research began as early as the 1950s (Weaver, 1955) 
– Major progress since the 1990s, driven by more capable computers (storage capacity and computational power) and larger bilingual corpora (Marino et al., 2006) 
• Difficulties of MT evaluation 
– Language variability means there is no single correct translation 
– Natural languages are highly ambiguous, and different languages do not always express the same content in the same way (Arnold, 2003)
4.1 Manual evaluations 
• Traditional manual evaluation criteria: 
– Intelligibility (how understandable the sentence is) and fidelity (how much information the translated sentence retains compared to the original), used by the Automatic Language Processing Advisory Committee (ALPAC) around 1966 (Carroll, 1966) 
– Adequacy (similar to fidelity), fluency (whether the sentence is well-formed and fluent) and comprehension (improved intelligibility), used by the US Defense Advanced Research Projects Agency (DARPA) (White et al., 1994)
4.1 Manual evaluations 
• Problems of manual evaluation: 
– Time-consuming 
– Expensive 
– Unrepeatable 
– Low agreement (Callison-Burch et al., 2011)
4.2 Traditional automatic MT evaluations 
• Measure the similarity between the automatic translation and the reference translation 
– Automatic translation (also called hypothesis or target translation): produced by an MT system 
– Reference translation: produced by professional translators 
– Source language and source document: not used 
• Traditional automatic evaluation: 
– BLEU: n-gram precisions (Papineni et al., 2002) 
– TER: edit distances (Snover et al., 2006) 
– METEOR: precision and recall (Banerjee and Lavie, 2005)
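To make the "edit distance" idea behind TER concrete, here is a small sketch of a word-level Levenshtein distance normalized by the reference length. It is a simplification under stated assumptions: the real TER metric also allows block shifts, which are not modeled here, and the function names and the whitespace tokenization are illustrative choices.

```python
def word_edit_distance(hypothesis, reference):
    """Minimum number of word insertions, deletions and substitutions."""
    hyp, ref = hypothesis.split(), reference.split()
    # previous[j] = distance between the processed hypothesis prefix and ref[:j]
    previous = list(range(len(ref) + 1))
    for i, hyp_word in enumerate(hyp, start=1):
        current = [i]
        for j, ref_word in enumerate(ref, start=1):
            substitution_cost = 0 if hyp_word == ref_word else 1
            current.append(min(
                previous[j] + 1,                      # delete hyp_word
                current[j - 1] + 1,                   # insert ref_word
                previous[j - 1] + substitution_cost,  # match or substitute
            ))
        previous = current
    return previous[-1]


def edit_rate(hypothesis, reference):
    """Edit distance divided by reference length (no shift operation)."""
    reference_length = len(reference.split())
    if reference_length == 0:
        raise ValueError("reference must contain at least one word")
    return word_edit_distance(hypothesis, reference) / reference_length


print(word_edit_distance("the cat sat on the mat", "the cat is on the mat"))  # 1
```

Note that this score, like BLEU and METEOR, cannot be computed without a reference translation, which is the limitation the next section addresses.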
4.3 Designed unsupervised MT evaluation 
• Problems in supervised MT evaluation 
– Reference translations are expensive 
– Reference translations are not available in some cases 
• Could we get rid of the reference translation? 
– Unsupervised MT evaluation method 
– Extract information from source and target language 
– How to use the designed universal phrase tagset?
4.3 Designed unsupervised MT evaluation 
• Assume that the translated sentence should have a set of phrase categories similar to that of the source sentence. 
– This design is inspired by the synonymous relation between the source and target sentences. 
• Two sentences with similar sets of phrases may still talk about different things. 
– However, this evaluation approach is not designed for the general case 
– It assumes that the target sentences are indeed translations of the source document
4.3 Designed unsupervised MT evaluation 
• First, parse the source and target sentences separately 
• Second, extract the phrase set from the source and target sentences 
• Third, convert the phrases into the universal phrase categories 
• Last, measure the similarity between source and target on the universal phrase sequences
4.3 Designed unsupervised MT evaluation 
Figure 1: the parsed French and English sentence
4.3 Designed unsupervised MT evaluation 
The level of extracted phrase tags: for each word, the phrase tag directly above its POS tag (extracted bottom-up) 
Figure 2: convert the extracted phrase into universal phrase tags
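The extraction step can be sketched as follows, assuming the parser output is a Penn-style bracketed string. For every word, the code returns the label of the node directly above its POS tag. The example sentences are made up for illustration and are not the ones in Figure 1; the function name `lowest_phrase_tags` is hypothetical.

```python
import re

# Split a bracketed parse into "(", ")" and label/word tokens.
TOKEN = re.compile(r"\(|\)|[^\s()]+")


def lowest_phrase_tags(bracketed):
    """Return (word, phrase_tag) pairs, one per word, in sentence order."""
    stack = []            # labels of the currently open brackets
    pairs = []
    expect_label = False  # True right after "(": the next token is a label
    for token in TOKEN.findall(bracketed):
        if token == "(":
            if expect_label:
                stack.append("")  # unlabeled bracket, e.g. an empty root
            expect_label = True
        elif token == ")":
            stack.pop()
        elif expect_label:
            stack.append(token)
            expect_label = False
        else:
            # token is a word; stack[-1] is its POS tag and stack[-2]
            # is the phrase node directly above that POS tag.
            if len(stack) >= 2:
                pairs.append((token, stack[-2]))
    return pairs


english = "(S (NP (DT the) (NN cat)) (VP (VBD sat) (PP (IN on) (NP (DT the) (NN mat)))))"
print([tag for _, tag in lowest_phrase_tags(english)])
# ['NP', 'NP', 'VP', 'PP', 'NP', 'NP']
```

The resulting tag sequence is what would then be converted into universal phrase categories before source and target are compared.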
4.3 Designed unsupervised MT evaluation 
• Which similarity metric is employed? 
• Designed similarity metric: HPPR 
– N1-gram position order difference penalty 
– Weighted N2-gram precision 
– Weighted N3-gram recall 
– Weighted geometric mean over the n-gram precisions and recalls 
– Weighted harmonic mean to combine the sub-factors 
– The parameters are tunable for different language pairs
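The two averaging steps named above can be illustrated with a short sketch. The exact HPPR formulas are not preserved in this transcript, so the code below only shows a generic weighted geometric mean (for combining n-gram scores of different orders) and a generic weighted harmonic mean (for combining the three sub-factors). The function names, the normalization by the sum of the weights, and the example values are assumptions, not the authors' definitions.

```python
import math


def weighted_geometric_mean(values, weights):
    """exp(sum(w * log(v)) / sum(w)); returns 0.0 if any value is 0."""
    if any(value <= 0 for value in values):
        return 0.0
    total = sum(weights)
    return math.exp(sum(w * math.log(v) for v, w in zip(values, weights)) / total)


def weighted_harmonic_mean(values, weights):
    """sum(w) / sum(w / v); returns 0.0 if any value is 0."""
    if any(value <= 0 for value in values):
        return 0.0
    return sum(weights) / sum(w / v for v, w in zip(values, weights))


def hppr_like_score(position_penalty, precision, recall, weights=(1.0, 1.0, 1.0)):
    """Combine the three sub-factors with a weighted harmonic mean (sketch)."""
    return weighted_harmonic_mean([position_penalty, precision, recall], weights)


# Hypothetical per-order precisions for 1-grams, 2-grams and 3-grams.
precision = weighted_geometric_mean([0.9, 0.6, 0.4], [1, 1, 1])
print(round(hppr_like_score(0.8, precision, 0.5), 4))
```

Both means collapse to 0 as soon as one component is 0, which is one way to see why including an n-gram order that rarely matches would drag the whole score down.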
Figure 3: N1 gram tag alignment algorithm
Figure 5: bigram chunk matching example
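A simple way to picture n-gram matching on universal phrase tag sequences is to count shared n-grams between the source and target sequences. The sketch below uses a multiset intersection of n-gram counts, which is a simplification and not necessarily the chunk matching procedure shown in Figure 5. The tag sequences are invented, and treating the source side as the "reference" for recall is an assumption consistent with the reference-free setting described earlier.

```python
from collections import Counter


def ngrams(sequence, n):
    """All contiguous n-grams of a sequence, as tuples."""
    return [tuple(sequence[i:i + n]) for i in range(len(sequence) - n + 1)]


def ngram_precision_recall(source_tags, target_tags, n):
    """Precision over target n-grams, recall over source n-grams."""
    source_counts = Counter(ngrams(source_tags, n))
    target_counts = Counter(ngrams(target_tags, n))
    # Multiset intersection: each n-gram is matched at most as often
    # as it occurs on both sides.
    matched = sum((source_counts & target_counts).values())
    precision = matched / sum(target_counts.values()) if target_counts else 0.0
    recall = matched / sum(source_counts.values()) if source_counts else 0.0
    return precision, recall


source = ["NP", "VP", "NP", "PP", "NP"]   # hypothetical source-side tags
target = ["NP", "VP", "PP", "NP"]         # hypothetical target-side tags
precision, recall = ngram_precision_recall(source, target, 2)
print(round(precision, 3), recall)  # 0.667 0.5
```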
4.4 Evaluating the evaluation method 
4.5 Experiments 
• Corpus from WMT 
– Workshop on Statistical Machine Translation 
– SIGMT, ACL's Special Interest Group on Machine Translation 
• Training data (WMT11), used to tune the parameters 
– 3,003 sentences for each document 
– 18 automatic French-to-English MT systems 
• Testing data (WMT12) 
– 3,003 sentences for each document 
– 15 automatic French-to-English MT systems
4.5 Experiments 
• Training: tuning the parameters 
– N1, N2 and N3 are tuned to 2, 3 and 3, because 4-gram chunk matching usually yields a score of 0. 
– Tuned values of the factor weights are shown in Table 2 
Table 2: tuned parameter values
4.5 Experiments 
• Comparisons with: 
– BLEU, which measures the closeness of the hypothesis and reference translations using n-gram precision 
– TER, which measures the edit distance from the hypothesis to the reference translations
4.5 Experiments 
Table 3: training (development) scores on WMT11 corpus 
Table 4: testing scores on WMT12 corpus
4.5 Experiments 
Table 5: interpretation of correlation scores (Cohen, 1988) 
● The experimental results on the development and testing corpora show that HPPR, without using reference translations, yields promising correlation scores (0.63 and 0.59 respectively). 
● There is still room to improve the performance of all three metrics, even though correlation scores above 0.5 are already considered strong, as shown in Table 5.
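A correlation score of this kind compares how a metric ranks the MT systems with how human judges rank them. The transcript does not preserve which coefficient was used, so the sketch below assumes Spearman's rank correlation in its simple form without ties. The scores in the example are invented and are not results from the experiments.

```python
def ranks(values):
    """Rank values from 1 (largest) to n (smallest); assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(metric_scores, human_scores):
    """Spearman's rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)), no ties."""
    n = len(metric_scores)
    if n != len(human_scores) or n < 2:
        raise ValueError("need two equally long lists with at least 2 items")
    squared_differences = sum(
        (a - b) ** 2 for a, b in zip(ranks(metric_scores), ranks(human_scores))
    )
    return 1 - 6 * squared_differences / (n * (n * n - 1))


# Hypothetical system-level scores for four MT systems.
metric = [0.9, 0.5, 0.7, 0.1]
human = [4, 3, 2, 1]
rho = spearman(metric, human)
print(rho)  # 0.8
print("strong" if rho > 0.5 else "not strong")  # threshold stated on the slide
```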
4.6 Open source code 
• Phrase Tagset Mapping for French and English Treebanks 
and Its Application in Machine Translation Evaluation 
– Aaron L.-F. Han, Derek F. Wong, Lidia S. Chao, Liangye 
He, Shuo Li, and Ling Zhu. GSCL 2013, Darmstadt, 
Germany. LNCS Vol. 8105, pp. 119-131, Volume Editors: 
Iryna Gurevych, Chris Biemann and Torsten Zesch. 
• Open source tool for phrase tagset mapping and HPPR 
similarity measuring algorithms: 
https://github.com/aaronlifenghan/aaron-project-hppr
5. Discussion 
• To facilitate future multilingual and cross-lingual research, this paper designs a phrase tag mapping between the French Treebank and the English Penn Treebank using 9 phrase categories. 
• One potential application of the designed universal phrase tagset is shown in the unsupervised MT evaluation task in the experiment section.
5. Discussion 
• There are still some limitations in this work to be 
addressed in the future. 
– The designed universal phrase categories may not cover all the phrase tags of other language treebanks, so the tagset could be expanded when necessary. 
– The designed HPPR formula contains the n-gram factors of position difference, precision and recall, which may not be sufficient or suitable for some other language pairs, so different measuring factors may need to be added or swapped for new tasks.
5. Discussion 
• The designed models are closely related to similarity measurement. Here they are applied to MT evaluation, but they may be developed further for other areas: 
– information retrieval 
– question answering 
– search 
– text analysis 
– etc.
6. Further information 
• Ongoing and future work: 
– Combining translation and evaluation by tuning the translation model with evaluation metrics 
– Evaluation models from the perspective of semantics 
– Further exploration of unsupervised evaluation models, extracting other features from the source and target languages 
• Aaron open source tools: https://github.com/aaronlifenghan 
• Aaron network Home: http://www.linkedin.com/in/aaronhan
Phrase Tagset Mapping for French and English 
Treebanks and Its Application in Machine 
Translation Evaluation 
GSCL 2013, Darmstadt, Germany 
Q and A 
Aaron L.-F. Han 
email: hanlifengaaron AT gmail DOT com 
Natural Language Processing & Portuguese-Chinese Machine Translation Laboratory 
Department of Computer and Information Science 
University of Macau

 

Más de Lifeng (Aaron) Han

WMT2022 Biomedical MT PPT: Logrus Global and Uni Manchester
WMT2022 Biomedical MT PPT: Logrus Global and Uni ManchesterWMT2022 Biomedical MT PPT: Logrus Global and Uni Manchester
WMT2022 Biomedical MT PPT: Logrus Global and Uni ManchesterLifeng (Aaron) Han
 
Measuring Uncertainty in Translation Quality Evaluation (TQE)
Measuring Uncertainty in Translation Quality Evaluation (TQE)Measuring Uncertainty in Translation Quality Evaluation (TQE)
Measuring Uncertainty in Translation Quality Evaluation (TQE)Lifeng (Aaron) Han
 
Meta-Evaluation of Translation Evaluation Methods: a systematic up-to-date ov...
Meta-Evaluation of Translation Evaluation Methods: a systematic up-to-date ov...Meta-Evaluation of Translation Evaluation Methods: a systematic up-to-date ov...
Meta-Evaluation of Translation Evaluation Methods: a systematic up-to-date ov...Lifeng (Aaron) Han
 
HOPE: A Task-Oriented and Human-Centric Evaluation Framework Using Profession...
HOPE: A Task-Oriented and Human-Centric Evaluation Framework Using Profession...HOPE: A Task-Oriented and Human-Centric Evaluation Framework Using Profession...
HOPE: A Task-Oriented and Human-Centric Evaluation Framework Using Profession...Lifeng (Aaron) Han
 
HOPE: A Task-Oriented and Human-Centric Evaluation Framework Using Professio...
 HOPE: A Task-Oriented and Human-Centric Evaluation Framework Using Professio... HOPE: A Task-Oriented and Human-Centric Evaluation Framework Using Professio...
HOPE: A Task-Oriented and Human-Centric Evaluation Framework Using Professio...Lifeng (Aaron) Han
 
Monte Carlo Modelling of Confidence Intervals in Translation Quality Evaluati...
Monte Carlo Modelling of Confidence Intervals in Translation Quality Evaluati...Monte Carlo Modelling of Confidence Intervals in Translation Quality Evaluati...
Monte Carlo Modelling of Confidence Intervals in Translation Quality Evaluati...Lifeng (Aaron) Han
 
Apply chinese radicals into neural machine translation: deeper than character...
Apply chinese radicals into neural machine translation: deeper than character...Apply chinese radicals into neural machine translation: deeper than character...
Apply chinese radicals into neural machine translation: deeper than character...Lifeng (Aaron) Han
 
cushLEPOR uses LABSE distilled knowledge to improve correlation with human tr...
cushLEPOR uses LABSE distilled knowledge to improve correlation with human tr...cushLEPOR uses LABSE distilled knowledge to improve correlation with human tr...
cushLEPOR uses LABSE distilled knowledge to improve correlation with human tr...Lifeng (Aaron) Han
 
Chinese Character Decomposition for Neural MT with Multi-Word Expressions
Chinese Character Decomposition for  Neural MT with Multi-Word ExpressionsChinese Character Decomposition for  Neural MT with Multi-Word Expressions
Chinese Character Decomposition for Neural MT with Multi-Word ExpressionsLifeng (Aaron) Han
 
Build moses on ubuntu (64 bit) system in virtubox recorded by aaron _v2longer
Build moses on ubuntu (64 bit) system in virtubox recorded by aaron _v2longerBuild moses on ubuntu (64 bit) system in virtubox recorded by aaron _v2longer
Build moses on ubuntu (64 bit) system in virtubox recorded by aaron _v2longerLifeng (Aaron) Han
 
Detection of Verbal Multi-Word Expressions via Conditional Random Fields with...
Detection of Verbal Multi-Word Expressions via Conditional Random Fields with...Detection of Verbal Multi-Word Expressions via Conditional Random Fields with...
Detection of Verbal Multi-Word Expressions via Conditional Random Fields with...Lifeng (Aaron) Han
 
AlphaMWE: Construction of Multilingual Parallel Corpora with MWE Annotations ...
AlphaMWE: Construction of Multilingual Parallel Corpora with MWE Annotations ...AlphaMWE: Construction of Multilingual Parallel Corpora with MWE Annotations ...
AlphaMWE: Construction of Multilingual Parallel Corpora with MWE Annotations ...Lifeng (Aaron) Han
 
MultiMWE: Building a Multi-lingual Multi-Word Expression (MWE) Parallel Corpora
MultiMWE: Building a Multi-lingual Multi-Word Expression (MWE) Parallel CorporaMultiMWE: Building a Multi-lingual Multi-Word Expression (MWE) Parallel Corpora
MultiMWE: Building a Multi-lingual Multi-Word Expression (MWE) Parallel CorporaLifeng (Aaron) Han
 
ADAPT Centre and My NLP journey: MT, MTE, QE, MWE, NER, Treebanks, Parsing.
ADAPT Centre and My NLP journey: MT, MTE, QE, MWE, NER, Treebanks, Parsing.ADAPT Centre and My NLP journey: MT, MTE, QE, MWE, NER, Treebanks, Parsing.
ADAPT Centre and My NLP journey: MT, MTE, QE, MWE, NER, Treebanks, Parsing.Lifeng (Aaron) Han
 
A deep analysis of Multi-word Expression and Machine Translation
A deep analysis of Multi-word Expression and Machine TranslationA deep analysis of Multi-word Expression and Machine Translation
A deep analysis of Multi-word Expression and Machine TranslationLifeng (Aaron) Han
 
machine translation evaluation resources and methods: a survey
machine translation evaluation resources and methods: a surveymachine translation evaluation resources and methods: a survey
machine translation evaluation resources and methods: a surveyLifeng (Aaron) Han
 
Incorporating Chinese Radicals Into Neural Machine Translation: Deeper Than C...
Incorporating Chinese Radicals Into Neural Machine Translation: Deeper Than C...Incorporating Chinese Radicals Into Neural Machine Translation: Deeper Than C...
Incorporating Chinese Radicals Into Neural Machine Translation: Deeper Than C...Lifeng (Aaron) Han
 
Chinese Named Entity Recognition with Graph-based Semi-supervised Learning Model
Chinese Named Entity Recognition with Graph-based Semi-supervised Learning ModelChinese Named Entity Recognition with Graph-based Semi-supervised Learning Model
Chinese Named Entity Recognition with Graph-based Semi-supervised Learning ModelLifeng (Aaron) Han
 
Quality Estimation for Machine Translation Using the Joint Method of Evaluati...
Quality Estimation for Machine Translation Using the Joint Method of Evaluati...Quality Estimation for Machine Translation Using the Joint Method of Evaluati...
Quality Estimation for Machine Translation Using the Joint Method of Evaluati...Lifeng (Aaron) Han
 
PubhD talk: MT serving the society
PubhD talk: MT serving the societyPubhD talk: MT serving the society
PubhD talk: MT serving the societyLifeng (Aaron) Han
 

Más de Lifeng (Aaron) Han (20)

WMT2022 Biomedical MT PPT: Logrus Global and Uni Manchester
WMT2022 Biomedical MT PPT: Logrus Global and Uni ManchesterWMT2022 Biomedical MT PPT: Logrus Global and Uni Manchester
WMT2022 Biomedical MT PPT: Logrus Global and Uni Manchester
 
Measuring Uncertainty in Translation Quality Evaluation (TQE)
Measuring Uncertainty in Translation Quality Evaluation (TQE)Measuring Uncertainty in Translation Quality Evaluation (TQE)
Measuring Uncertainty in Translation Quality Evaluation (TQE)
 
Meta-Evaluation of Translation Evaluation Methods: a systematic up-to-date ov...
Meta-Evaluation of Translation Evaluation Methods: a systematic up-to-date ov...Meta-Evaluation of Translation Evaluation Methods: a systematic up-to-date ov...
Meta-Evaluation of Translation Evaluation Methods: a systematic up-to-date ov...
 
HOPE: A Task-Oriented and Human-Centric Evaluation Framework Using Profession...
HOPE: A Task-Oriented and Human-Centric Evaluation Framework Using Profession...HOPE: A Task-Oriented and Human-Centric Evaluation Framework Using Profession...
HOPE: A Task-Oriented and Human-Centric Evaluation Framework Using Profession...
 
HOPE: A Task-Oriented and Human-Centric Evaluation Framework Using Professio...
 HOPE: A Task-Oriented and Human-Centric Evaluation Framework Using Professio... HOPE: A Task-Oriented and Human-Centric Evaluation Framework Using Professio...
HOPE: A Task-Oriented and Human-Centric Evaluation Framework Using Professio...
 
Monte Carlo Modelling of Confidence Intervals in Translation Quality Evaluati...
Monte Carlo Modelling of Confidence Intervals in Translation Quality Evaluati...Monte Carlo Modelling of Confidence Intervals in Translation Quality Evaluati...
Monte Carlo Modelling of Confidence Intervals in Translation Quality Evaluati...
 
Apply chinese radicals into neural machine translation: deeper than character...
Apply chinese radicals into neural machine translation: deeper than character...Apply chinese radicals into neural machine translation: deeper than character...
Apply chinese radicals into neural machine translation: deeper than character...
 
cushLEPOR uses LABSE distilled knowledge to improve correlation with human tr...
cushLEPOR uses LABSE distilled knowledge to improve correlation with human tr...cushLEPOR uses LABSE distilled knowledge to improve correlation with human tr...
cushLEPOR uses LABSE distilled knowledge to improve correlation with human tr...
 
Chinese Character Decomposition for Neural MT with Multi-Word Expressions
Chinese Character Decomposition for  Neural MT with Multi-Word ExpressionsChinese Character Decomposition for  Neural MT with Multi-Word Expressions
Chinese Character Decomposition for Neural MT with Multi-Word Expressions
 
Build moses on ubuntu (64 bit) system in virtubox recorded by aaron _v2longer
Build moses on ubuntu (64 bit) system in virtubox recorded by aaron _v2longerBuild moses on ubuntu (64 bit) system in virtubox recorded by aaron _v2longer
Build moses on ubuntu (64 bit) system in virtubox recorded by aaron _v2longer
 
Detection of Verbal Multi-Word Expressions via Conditional Random Fields with...
Detection of Verbal Multi-Word Expressions via Conditional Random Fields with...Detection of Verbal Multi-Word Expressions via Conditional Random Fields with...
Detection of Verbal Multi-Word Expressions via Conditional Random Fields with...
 
AlphaMWE: Construction of Multilingual Parallel Corpora with MWE Annotations ...
AlphaMWE: Construction of Multilingual Parallel Corpora with MWE Annotations ...AlphaMWE: Construction of Multilingual Parallel Corpora with MWE Annotations ...
AlphaMWE: Construction of Multilingual Parallel Corpora with MWE Annotations ...
 
MultiMWE: Building a Multi-lingual Multi-Word Expression (MWE) Parallel Corpora
MultiMWE: Building a Multi-lingual Multi-Word Expression (MWE) Parallel CorporaMultiMWE: Building a Multi-lingual Multi-Word Expression (MWE) Parallel Corpora
MultiMWE: Building a Multi-lingual Multi-Word Expression (MWE) Parallel Corpora
 
ADAPT Centre and My NLP journey: MT, MTE, QE, MWE, NER, Treebanks, Parsing.
ADAPT Centre and My NLP journey: MT, MTE, QE, MWE, NER, Treebanks, Parsing.ADAPT Centre and My NLP journey: MT, MTE, QE, MWE, NER, Treebanks, Parsing.
ADAPT Centre and My NLP journey: MT, MTE, QE, MWE, NER, Treebanks, Parsing.
 
A deep analysis of Multi-word Expression and Machine Translation
A deep analysis of Multi-word Expression and Machine TranslationA deep analysis of Multi-word Expression and Machine Translation
A deep analysis of Multi-word Expression and Machine Translation
 
machine translation evaluation resources and methods: a survey
machine translation evaluation resources and methods: a surveymachine translation evaluation resources and methods: a survey
machine translation evaluation resources and methods: a survey
 
Incorporating Chinese Radicals Into Neural Machine Translation: Deeper Than C...
Incorporating Chinese Radicals Into Neural Machine Translation: Deeper Than C...Incorporating Chinese Radicals Into Neural Machine Translation: Deeper Than C...
Incorporating Chinese Radicals Into Neural Machine Translation: Deeper Than C...
 
Chinese Named Entity Recognition with Graph-based Semi-supervised Learning Model
Chinese Named Entity Recognition with Graph-based Semi-supervised Learning ModelChinese Named Entity Recognition with Graph-based Semi-supervised Learning Model
Chinese Named Entity Recognition with Graph-based Semi-supervised Learning Model
 
Quality Estimation for Machine Translation Using the Joint Method of Evaluati...
Quality Estimation for Machine Translation Using the Joint Method of Evaluati...Quality Estimation for Machine Translation Using the Joint Method of Evaluati...
Quality Estimation for Machine Translation Using the Joint Method of Evaluati...
 
PubhD talk: MT serving the society
PubhD talk: MT serving the societyPubhD talk: MT serving the society
PubhD talk: MT serving the society
 

Último

Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessPixlogix Infotech
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)Gabriella Davis
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024Rafal Los
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking MenDelhi Call girls
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfsudhanshuwaghmare1
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?Igalia
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Enterprise Knowledge
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdfhans926745
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?Antenna Manufacturer Coco
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 

Último (20)

Advantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your BusinessAdvantages of Hiring UIUX Design Service Providers for Your Business
Advantages of Hiring UIUX Design Service Providers for Your Business
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024The 7 Things I Know About Cyber Security After 25 Years | April 2024
The 7 Things I Know About Cyber Security After 25 Years | April 2024
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men08448380779 Call Girls In Friends Colony Women Seeking Men
08448380779 Call Girls In Friends Colony Women Seeking Men
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?A Year of the Servo Reboot: Where Are We Now?
A Year of the Servo Reboot: Where Are We Now?
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
Bajaj Allianz Life Insurance Company - Insurer Innovation Award 2024
 
Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...Driving Behavioral Change for Information Management through Data-Driven Gree...
Driving Behavioral Change for Information Management through Data-Driven Gree...
 
[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf[2024]Digital Global Overview Report 2024 Meltwater.pdf
[2024]Digital Global Overview Report 2024 Meltwater.pdf
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 

  • 1. Phrase Tagset Mapping for French and English Treebanks and Its Application in Machine Translation Evaluation. 25th International Conference, GSCL 2013. Aaron L.-F. Han, Derek F. Wong, Lidia S. Chao, Liangye He, Shuo Li, and Ling Zhu. September 25th-27th, 2013, Darmstadt, Germany. Natural Language Processing & Portuguese-Chinese Machine Translation Laboratory, Department of Computer and Information Science, University of Macau
  • 2. Contents ● Background of language Treebank ● Motivation ● Designed phrase tagset mapping ● Application in MT evaluation 1. Manual evaluations 2. Traditional automatic MT evaluation methods 3. Designed unsupervised MT evaluation 4. Evaluating the evaluation method 5. Experiments 6. Open source code ● Discussion ● Further information
  • 3. 1. Background of language Treebank • To promote the development of syntactic analysis • Many language treebanks are developed – English Penn Treebank (Marcus et al., 1993; Mitchell et al., 1994) – German Negra Treebank (Skut et al., 1997) – French Treebank (Abeillé et al., 2003) – Chinese Sinica Treebank (Chen et al., 2003) – Etc.
  • 4. 1. Background of language Treebank • Problems – Different treebanks use their own syntactic tagsets – The number of tags ranges from tens (e.g. English Penn Treebank) to hundreds (e.g. Chinese Sinica Treebank) – This is inconvenient for multilingual or cross-lingual research
  • 5. 2. Motivation • To bridge the gap between these treebanks and facilitate future research – E.g. the unsupervised induction of syntactic structure • Petrov et al. (2012) developed a universal POS tagset • What about phrase-level tags? • The mismatch between phrase-level tagsets remains unsolved – Let’s try to solve it
  • 6. 3. Designed phrase tagset mapping • Tentative design of phrase tagset mapping – On English Penn Treebank I, II & French Treebank • 9 universal phrasal categories covering – 14 phrase tags in English Penn Treebank I – 26 phrase tags in English Penn Treebank II – 14 phrase tags in French Treebank
  • 7. 3. Designed phrase tagset mapping Table 1: phrase tagset mapping for French and English treebanks
  • 8. 3. Designed phrase tagset mapping • Universal phrasal categories: NP (noun phrase), VP (verb phrase), AJP (adjective phrase), AVP (adverbial phrase), PP (prepositional phrase), S (sub/-sentence), CONJP (conjunction phrase), COP (coordinated phrase), X (other phrases or unknown) • NP covering – French tags: NP – English tags: NP, NAC (the scope of certain prenominal modifiers within an NP), NX (within certain complex NPs to mark the head of NP), WHNP (wh-noun phrase), QP (quantifier phrase)
  • 9. 3. Designed phrase tagset mapping • VP covering – French tags: VN (verbal nucleus), VP (infinitives and nonfinite clauses) – English tags: VP (verb phrase) • AJP covering – French tags: AP (adjectival phrase) – English tags: ADJP (adjective phrase), WHADJP (wh-adjective phrase)
  • 10. 3. Designed phrase tagset mapping • AVP covering – French tags: AdP (adverbial phrases) – English tags: ADVP (adverb phrase), WHADVP (wh-adverb phrase), PRT (particle) • PP covering – French tags: PP – English tags: PP, WHPP (wh-prepositional phrase)
  • 11. 3. Designed phrase tagset mapping • S covering – French tags: SENT (sentence), S (finite clause) – English tags: S (simple declarative clause), SBAR (clause introduced by a subordinating conjunction), SBARQ (direct question introduced by a wh-phrase), SINV (declarative sentence with subject-aux inversion), SQ (sub-constituent of SBARQ), PRN (parenthetical), FRAG (fragment), RRC (reduced relative clause). • CONJP covering – French tags: N/A – English tags: CONJP
  • 12. 3. Designed phrase tagset mapping • COP covering – French tags: COORD (coordinated phrase) – English tags: UCP (coordinated phrases belonging to different categories) • X covering – French tags: unknown – English tags: X (unknown or uncertain), INTJ (interjection), LST (list marker)
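The mapping on slides 8 to 12 can be written down as two lookup tables. The following is a minimal sketch, not the authors' released code: the names `FRENCH_TO_UNIVERSAL`, `ENGLISH_TO_UNIVERSAL` and `to_universal` are hypothetical, the French table lists only the tags spelled out on the slides, and any unlisted tag is assumed to fall back to X.

```python
# Nine universal phrase categories named on the slides.
UNIVERSAL = {"NP", "VP", "AJP", "AVP", "PP", "S", "CONJP", "COP", "X"}

# French Treebank tags spelled out on the slides (a partial list; the
# slides state that the French Treebank has 14 phrase tags in total).
FRENCH_TO_UNIVERSAL = {
    "NP": "NP",
    "VN": "VP", "VP": "VP",
    "AP": "AJP",
    "AdP": "AVP",
    "PP": "PP",
    "SENT": "S", "S": "S",
    "COORD": "COP",
}

# English Penn Treebank II tags listed on the slides (26 in total).
# WHADVP is the Penn Treebank spelling of the wh-adverb phrase tag.
ENGLISH_TO_UNIVERSAL = {
    "NP": "NP", "NAC": "NP", "NX": "NP", "WHNP": "NP", "QP": "NP",
    "VP": "VP",
    "ADJP": "AJP", "WHADJP": "AJP",
    "ADVP": "AVP", "WHADVP": "AVP", "PRT": "AVP",
    "PP": "PP", "WHPP": "PP",
    "S": "S", "SBAR": "S", "SBARQ": "S", "SINV": "S", "SQ": "S",
    "PRN": "S", "FRAG": "S", "RRC": "S",
    "CONJP": "CONJP",
    "UCP": "COP",
    "X": "X", "INTJ": "X", "LST": "X",
}


def to_universal(tag, table):
    """Map a treebank phrase tag to a universal category (X if unlisted)."""
    return table.get(tag, "X")
```

For example, `to_universal("VN", FRENCH_TO_UNIVERSAL)` and `to_universal("VP", ENGLISH_TO_UNIVERSAL)` both give `"VP"`, which is what makes the two tag sequences comparable.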
  • 13. 4. Application in Machine Translation evaluation
  • 14. 4.1 Manual evaluations • Rapid development of Machine Translation – MT began as early as the 1950s (Weaver, 1955) – Big progress since the 1990s due to the development of computers (storage capacity and computational power) and the enlarged bilingual corpora (Marino et al. 2006) • Difficulties of MT evaluation – language variability results in no single correct translation – natural languages are highly ambiguous and different languages do not always express the same content in the same way (Arnold, 2003)
  • 15. 4.1 Manual evaluations • Traditional manual evaluation criteria: – intelligibility (measuring how understandable the sentence is) – fidelity (measuring how much information the translated sentence retains as compared to the original) by the Automatic Language Processing Advisory Committee (ALPAC) around 1966 (Carroll, 1966) – adequacy (similar to fidelity), fluency (whether the sentence is well-formed and fluent) and comprehension (improved intelligibility) by the Defense Advanced Research Projects Agency (DARPA) of the US (White et al., 1994)
  • 16. 4.1 Manual evaluations • Problems of manual evaluations : – Time-consuming – Expensive – Unrepeatable – Low agreement (Callison-Burch, et al., 2011)
  • 17. 4.2 Traditional automatic MT evaluations • Measuring the similarity of automatic translation and reference translation – Automatic translation (or hypothesis translation, target translation): by automatic MT system – Reference translation: by professional translators – Source language and source document: not used • Traditional automatic evaluation: – BLEU: n-gram precisions (Papineni et al., 2002) – TER: edit distances (Snover et al., 2006) – METEOR: precision and recall (Banerjee and Lavie, 2005)
  • 18. 4.3 Designed unsupervised MT evaluation • Problems in supervised MT evaluation – Reference translations are expensive – Reference translations are not available in some cases • Could we get rid of the reference translation? – Unsupervised MT evaluation method – Extract information from source and target language – How to use the designed universal phrase tagset?
  • 19. 4.3 Designed unsupervised MT evaluation • Assume that the translated sentence should have a set of phrase categories similar to that of the source sentence. – This design is inspired by the synonymous relation between source and target sentences. • Two sentences that have similar sets of phrases may talk about different things. – However, this evaluation approach is not designed for the general case – It assumes that the target sentences are indeed translations of the source document
  • 20. 4.3 Designed unsupervised MT evaluation • First, we parse the source and target languages respectively • Then we extract the phrase set from the source and target sentences • Third, we convert the phrases into the developed universal phrase categories • Last, we measure the similarity of source and target language on the universal phrase sequences
  • 21. 4.3 Designed unsupervised MT evaluation Figure 1: the parsed French and English sentence
  • 22. 4.3 Designed unsupervised MT evaluation The level of extracted phrase tags: just the upper level of POS tags, bottom-up Figure 2: convert the extracted phrase into universal phrase tags
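The extraction and conversion steps on slides 20 to 22 can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: it takes a bracketed parse as input (the parser itself is out of scope), reads "just the upper level of POS tags" as the label directly above each POS node, emits one tag per word, and uses a hypothetical `MAPPING` subset and a made-up French example sentence.

```python
def parse_tree(text):
    """Parse a bracketed tree like "(S (NP (DT the)))" into [label, children]."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read():
        nonlocal pos
        if tokens[pos] == "(":
            pos += 1
            label = tokens[pos]
            pos += 1
            children = []
            while tokens[pos] != ")":
                children.append(read())
            pos += 1  # skip the closing bracket
            return [label, children]
        word = tokens[pos]
        pos += 1
        return word

    return read()


def is_pos_node(node):
    """A POS node is a labelled node whose children are all plain words."""
    return isinstance(node, list) and all(isinstance(c, str) for c in node[1])


def phrase_tags(node, parent_label=None):
    """Yield, for each word, the phrase label directly above its POS tag."""
    if is_pos_node(node):
        yield parent_label
        return
    for child in node[1]:
        if isinstance(child, list):
            yield from phrase_tags(child, node[0])


# Hypothetical subset of the tagset mapping, enough for the example below.
MAPPING = {"NP": "NP", "VN": "VP", "VP": "VP", "PP": "PP", "SENT": "S", "S": "S"}


def universal_sequence(bracketed):
    """Parse, extract phrase tags bottom-up, and convert to universal tags."""
    tree = parse_tree(bracketed)
    return [MAPPING.get(tag, "X") for tag in phrase_tags(tree)]


french = "(SENT (NP (DET le) (NC chat)) (VN (V dort)))"
english = "(S (NP (DT the) (NN cat)) (VP (VBZ sleeps)))"
print(universal_sequence(french))   # ['NP', 'NP', 'VP']
print(universal_sequence(english))  # ['NP', 'NP', 'VP']
```

Both sides end up as sequences over the same nine categories, so a similarity measure can compare them directly without a reference translation.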
  • 23. 4.3 Designed unsupervised MT evaluation • What is the similarity metric we employed? • Designed similarity metric: HPPR – N1 gram position order difference penalty – Weighted N2 gram precision – Weighted N3 gram recall – Weighted geometric mean in n-gram precision & recall – Weighted harmonic mean to combine sub-factors – The parameters are tunable according to different language pairs
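The formula slides for HPPR did not survive in this transcript, so the exact definitions are not reproduced here. As a rough illustration of two of the listed ingredients, the sketch below computes a generic clipped n-gram precision and recall between a target-side and a source-side universal tag sequence and combines them with a weighted harmonic mean. The function names, the clipping scheme and the equal weights are assumptions; the position order difference penalty is left out.

```python
from collections import Counter


def ngrams(seq, n):
    """All contiguous n-grams of a sequence, as tuples."""
    return [tuple(seq[i:i + n]) for i in range(len(seq) - n + 1)]


def ngram_precision_recall(target_tags, source_tags, n):
    """Clipped n-gram precision and recall of the target against the source."""
    target_counts = Counter(ngrams(target_tags, n))
    source_counts = Counter(ngrams(source_tags, n))
    # Counter intersection keeps the minimum count of each shared n-gram.
    matched = sum((target_counts & source_counts).values())
    precision = matched / max(sum(target_counts.values()), 1)
    recall = matched / max(sum(source_counts.values()), 1)
    return precision, recall


def weighted_harmonic_mean(values, weights):
    """Weighted harmonic mean; defined as 0 if any value is 0."""
    if any(v == 0 for v in values):
        return 0.0
    return sum(weights) / sum(w / v for w, v in zip(weights, values))


source = ["NP", "NP", "VP", "PP", "NP", "NP"]
target = ["NP", "VP", "PP", "NP", "NP"]
p, r = ngram_precision_recall(target, source, 2)
score = weighted_harmonic_mean([p, r], [1.0, 1.0])
```

With these inputs all four target bigrams are found among the five source bigrams, so precision is 1.0 and recall is 0.8; raising the weight on one factor pulls the combined score towards it.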
  • 24. 4.3 Designed unsupervised MT evaluation
  • 25. 4.3 Designed unsupervised MT evaluation
  • 26. 4.3 Designed unsupervised MT evaluation Figure 3: N1 gram tag alignment algorithm
  • 27. 4.3 Designed unsupervised MT evaluation
  • 28. 4.3 Designed unsupervised MT evaluation
  • 29. 4.3 Designed unsupervised MT evaluation Figure 5: bigram chunk matching example
  • 30. 4.4 Evaluating the evaluation method
  • 31. 4.5 Experiments • Corpus from WMT – Workshop on Statistical Machine Translation – SIGMT, ACL's Special Interest Group on Machine Translation • Training data (WMT11), used to tune the parameters – 3,003 sentences for each document – 18 automatic French-to-English MT systems • Testing data (WMT12) – 3,003 sentences for each document – 15 automatic French-to-English MT systems
  • 32. 4.5 Experiments • Training, tune the parameters – N1, N2 and N3 are tuned to 2, 3 and 3 because 4-gram chunk matching usually yields a score of 0. – Tuned values of the factor weights are shown in Table 2 Table 2: tuned parameter values
  • 33. 4.5 Experiments • Comparisons with: – BLEU, which measures the closeness of the hypothesis and reference translations using n-gram precision – TER, which measures the edit distance from the hypothesis to the reference translations
  • 34. 4.5 Experiments Table 3: training (development) scores on WMT11 corpus Table 4: testing scores on WMT12 corpus
  • 35. 4.5 Experiments Table 5: correlation score intro (Cohen, 1988) ● The experimental results on the development and testing corpora show that HPPR, without using reference translations, yields promising correlation scores (0.63 and 0.59 respectively). ● There is still potential to improve the performance of all three metrics, even though correlation scores higher than 0.5 are already considered strong correlations, as shown in Table 5.
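The correlation scores quoted above compare a metric's scores for the MT systems against human judgments of the same systems. The surviving slide text does not say which coefficient was used, so the sketch below assumes a rank correlation (Spearman's rho) and uses the simple no-ties formula; the function names and all numbers are made up for illustration.

```python
def ranks(scores):
    """Rank scores from best (1) to worst (n); assumes no tied scores."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    result = [0] * len(scores)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result

def spearman_rho(metric_scores, human_scores):
    """Spearman rank correlation, rho = 1 - 6 * sum(d^2) / (n * (n^2 - 1)).

    Valid only when neither list contains ties.
    """
    n = len(metric_scores)
    if n != len(human_scores) or n < 2:
        raise ValueError("need two equally long lists with at least 2 items")
    metric_ranks = ranks(metric_scores)
    human_ranks = ranks(human_scores)
    d_squared = sum((m - h) ** 2 for m, h in zip(metric_ranks, human_ranks))
    return 1 - 6 * d_squared / (n * (n * n - 1))

# Made-up system-level scores for four hypothetical MT systems.
metric = [0.31, 0.28, 0.35, 0.22]
human = [0.60, 0.55, 0.58, 0.40]
rho = spearman_rho(metric, human)
```

With these invented numbers the metric swaps the top two systems relative to the human ranking, giving a rho of 0.8.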
  • 36. 4.6 Open source code • Phrase Tagset Mapping for French and English Treebanks and Its Application in Machine Translation Evaluation – Aaron L.-F. Han, Derek F. Wong, Lidia S. Chao, Liangye He, Shuo Li, and Ling Zhu. GSCL 2013, Darmstadt, Germany. LNCS Vol. 8105, pp. 119-131, Volume Editors: Iryna Gurevych, Chris Biemann and Torsten Zesch. • Open source tool for phrase tagset mapping and HPPR similarity measuring algorithms: https://github.com/aaronlifenghan/aaron-project-hppr
  • 37. 5. Discussion • To facilitate future multilingual or cross-lingual research, this paper designs a phrase tagset mapping between the French Treebank and the English Penn Treebank using 9 phrase categories. • One potential application of the designed universal phrase tagset is shown in the unsupervised MT evaluation task in the experiment section.
  • 38. 5. Discussion • There are still some limitations in this work to be addressed in the future. – The designed universal phrase categories may not be able to cover all the phrase tags of other language treebanks, so this tagset could be expanded when necessary. – The designed HPPR formula contains the n-gram factors of position difference, precision and recall, which may not be sufficient or suitable for some of the other language pairs, so different measuring factors should be added or switched when facing new tasks.
  • 39. 5. Discussion • The designed models are essentially similarity measures. We have applied them to MT evaluation, but they may be developed further for other fields: – information retrieval – question answering – search – text analysis – etc.
  • 40. 6. Further information • Ongoing and future work: – Combining translation and evaluation, tuning the translation model using evaluation metrics – Evaluation models from the perspective of semantics – Further exploration of unsupervised evaluation models, extracting other features from source and target languages • Aaron open source tools: https://github.com/aaronlifenghan • Aaron network Home: http://www.linkedin.com/in/aaronhan
  • 41. Phrase Tagset Mapping for French and English Treebanks and Its Application in Machine Translation Evaluation GSCL 2013, Darmstadt, Germany Q and A Aaron L.-F. Han email: hanlifengaaron AT gmail DOT com Natural Language Processing & Portuguese-Chinese Machine Translation Laboratory Department of Computer and Information Science University of Macau