Contractions: To Align or Not to Align,
That Is the Question
Anabela Barreiro & Fernando Batista
LR4NLP@COLING 2018 – Santa Fe, New Mexico, USA – August 20th 2018
• Contractions
• Goal
• Approach
• Methodology
• Corpus
• Decomposed Contractions
• Non-decomposed Contractions
• Analysis of Results
• Final Remarks
• References
Outline
• Words formed from 2+ words that would otherwise appear
next to each other in a sequence
– different parts of speech (PoS) (most frequently)
– same PoS (less frequently)
– Examples
• EN: is + not → isn’t, it + is → it’s
• IT: di + gli → degli
• FR: la + une → l’une; de + après → d’après
• PT: em + as → nas
• Problematic for several reasons
– 2+ words overlap, which makes syntactic analysis and generation more
difficult
– in cross-language analysis, the contrast between languages that have
contractions and languages that do not have them, or do not have them
in the same contexts, may present additional difficulties
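The examples above can be read as a small lookup from a contracted form to its component words. The following is a minimal sketch in Python, assuming a hypothetical table `CONTRACTIONS` and helper `decompose` that are not part of the presentation; the table holds only the forms listed on this slide, whereas a real system would need a full lexicon per language.

```python
from __future__ import annotations

# Illustrative lookup table built only from the examples on this slide.
# The names CONTRACTIONS and decompose are assumptions made for this sketch.
CONTRACTIONS = {
    "en": {"isn't": ("is", "not"), "it's": ("it", "is")},
    "it": {"degli": ("di", "gli")},
    "fr": {"l'une": ("la", "une"), "d'après": ("de", "après")},
    "pt": {"nas": ("em", "as")},
}


def decompose(token: str, lang: str) -> tuple[str, ...]:
    """Return the component words of a contraction, or the token unchanged."""
    # Normalize case and typographic apostrophes before the lookup.
    normalized = token.lower().replace("’", "'")
    return CONTRACTIONS.get(lang, {}).get(normalized, (token,))


print(decompose("nas", "pt"))    # ('em', 'as')
print(decompose("isn’t", "en"))  # ('is', 'not')
print(decompose("casa", "pt"))   # ('casa',)
```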
Contractions
• Alignment of (discontinuous) multiword units (MWU) where
contractions occur
– a challenge that has been overlooked in the existing literature and can
be responsible for grammatical errors in translations (or paraphrasing)
– high-quality machine translation/paraphrasing depends on the quality
of the alignments used in the machine learning process
Goal
• Linguistically motivated approach to the alignment of MWU:
contractions occurring in these MWU must be decomposed,
except in specific circumstances determined by the context,
such as when they constitute a non-variable (non-inflecting)
element of a frozen MWU
– Examples
• apesar da → [apesar de] [a NP]
• à luz dos → [à luz de] [os NP]
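The two examples can be sketched as a split at the border between the MWU and the noun phrase that follows it. This is only an illustration under stated assumptions: the mini-lexicon `PT_CONTRACTIONS`, the function `split_at_border`, and the nouns "crise" and "factos" standing in for the NP are all hypothetical and do not come from the presentation.

```python
# Hypothetical mini-lexicon: contraction -> (preposition, determiner).
PT_CONTRACTIONS = {
    "da": ("de", "a"),
    "dos": ("de", "os"),
    "à": ("a", "a"),
}


def split_at_border(mwu_tokens, np_tokens):
    """Decompose only the contraction that closes the MWU.

    The preposition stays with the MWU and the determiner opens the NP.
    Contractions elsewhere in the MWU (e.g. "à" in "à luz dos") are kept.
    """
    last = mwu_tokens[-1]
    if last not in PT_CONTRACTIONS:
        return list(mwu_tokens), list(np_tokens)
    prep, det = PT_CONTRACTIONS[last]
    return list(mwu_tokens[:-1]) + [prep], [det] + list(np_tokens)


# "crise" and "factos" are invented placeholders for the NP.
print(split_at_border(["apesar", "da"], ["crise"]))
# (['apesar', 'de'], ['a', 'crise'])
print(split_at_border(["à", "luz", "dos"], ["factos"]))
# (['à', 'luz', 'de'], ['os', 'factos'])
```

Note that "à" in the second example stays contracted, since it is a non-variable element inside the frozen MWU rather than its closing word.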
Approach
• Alignment of MWU & phrasal units based on linguistic knowledge
– Decide if contractions should be decomposed or maintained
1. Initially, the pre-processing task consisted of a semi-manual
decomposition of all contractions by a linguist
• Decomposition allowed for the correct alignment of MWU in which contracted
forms had to be split so that those MWU and the phrases that follow them
could be mapped to the corresponding elements in the source language
2. Subsequently, all the decomposed forms were reviewed, and those
occurring in MWU and frozen expressions were changed back to
contractions
– The methodology applies whether the source-target pair consists of:
• 2 different languages (e.g. translation)
• 2 varieties of the same language (e.g. variety adaptation)
• the same language (e.g. paraphrases)
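The two passes described above can be sketched as follows. In the presentation the first pass was semi-manual and the second was a review by a linguist; the code below is only an automated approximation for illustration. The lexicon `PT_CONTRACTIONS`, the list `FROZEN`, both function names, and the sample phrase are assumptions made for this sketch.

```python
# Hypothetical mini-lexicon and frozen-expression list, for illustration only.
PT_CONTRACTIONS = {"do": ("de", "o"), "à": ("a", "a"), "disso": ("de", "isso")}
FROZEN = {("até", "à", "data"), ("além", "disso")}


def decompose_all(tokens):
    """Step 1: split every known contraction, keeping its surface form."""
    return [(tok, PT_CONTRACTIONS.get(tok, (tok,))) for tok in tokens]


def restore_frozen(units):
    """Step 2: change decomposed forms back inside frozen expressions."""
    surface = [tok for tok, _ in units]
    keep = set()  # indices of tokens that belong to a frozen expression
    for expr in FROZEN:
        n = len(expr)
        for i in range(len(surface) - n + 1):
            if tuple(surface[i:i + n]) == expr:
                keep.update(range(i, i + n))
    out = []
    for i, (tok, parts) in enumerate(units):
        out.extend([tok] if i in keep else parts)
    return out


tokens = ["até", "à", "data", "do", "relatório"]  # invented example phrase
print(restore_frozen(decompose_all(tokens)))
# ['até', 'à', 'data', 'de', 'o', 'relatório']
```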
Methodology
• PT–EN CLUE4Translation parallel corpus
– subset of the reference Europarl parallel corpus (Koehn, 2005)
containing 400 Portuguese-English parallel sentences
• In the corpus, the number of contractions that need to be
decomposed is (much) greater than the number of
contractions that need to be maintained
• Work performed in the framework of the eSPERTo project,
in which the alignment tool used (CLUE-Aligner) was
developed
Corpus
Decomposed Contractions
• Contractions in contexts in which they require decomposition
• Decomposition is required so that the 2
concatenated words can go into 2 different alignment pairs
– Alignment of the compound word em nome de "on behalf of"
and the NP o meu grupo "my group" with their internal elements (individual words)
• Other examples of decomposition that has taken place in
contractions
Decomposed Contractions
[Excerpt from the accompanying paper, shown on the slide as an image; the text is cut off at the right margin wherever "[…]" appears]
two elements, the preposition de "of" and the masculine singular definite article o "the" (d[…]
to align correctly both the canonical form (lemma) of the compound word em nome de […]
and the noun phrase o meu grupo "my group", where the preposition of the contraction […]
compound and the definite article goes with the noun phrase, i.e., the decomposition is re[…]
possible that the two concatenated words go in two different alignment pairs, as illustrat[…]
Similar decomposition has taken place in contractions such as those illustrated in exampl[…]
(2) EN - across + [the Atlantic]
    PT - do outro lado de (do = [de+o]) [o Atlântico]
(3) EN - issues like + [the NP]
    PT - questões como a de (dos = [de+os]) [os NP]
(4) EN - with respect to + [the N]
    PT - quanto a (ao = [a+o]) [o N]
(5) EN - fully approves [NP: the joint position of the council]
    PT - dá a sua total aprovação a (à = [a+a]) [NP: a posição comum do conselho]
Decomposition of contractions also has implication in coordination. For example, t[…]
noun phrases o parlamento "the parliament" and o conselho "the council" illustrated in F[…]
[…]rect complements of the Portuguese prepositional verb realizado por "carried out by". W[…]
Decomposed Contractions
• Decomposition of contractions has implications for coordination
– Alignment of the coordinated MWU realizado por [NP] e por [NP], implying the
double decomposition of the contraction pelo into the preposition por of the
prepositional verb and the masculine singular definite article o of the
coordinated NP.
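The double decomposition in the coordinated structure can be sketched as below. The function `bracket_coordinated_nps` is hypothetical and uses a deliberately crude assumption: a noun phrase runs from the determiner released by a contraction up to the next conjunction or the end of the input. The input phrase combines the verb and the two noun phrases mentioned on the slides.

```python
# Hypothetical one-entry lexicon: contraction -> (preposition, determiner).
CONTRACTIONS = {"pelo": ("por", "o")}


def bracket_coordinated_nps(tokens, conj="e"):
    """Decompose each contraction and bracket the NP that its determiner opens."""
    out, np = [], []

    def flush():
        # Close the NP collected so far, if any.
        if np:
            out.append("[" + " ".join(np) + "]")
            np.clear()

    for tok in tokens:
        if tok in CONTRACTIONS:
            flush()
            prep, det = CONTRACTIONS[tok]
            out.append(prep)   # preposition stays with the prepositional verb
            np.append(det)     # determiner opens the coordinated NP
        elif tok == conj:
            flush()
            out.append(tok)
        elif np:
            np.append(tok)
        else:
            out.append(tok)
    flush()
    return " ".join(out)


print(bracket_coordinated_nps("realizado pelo parlamento e pelo conselho".split()))
# realizado por [o parlamento] e por [o conselho]
```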
Non-decomposed Contractions
• Contractions in contexts in which they cannot be decomposed
• Contractions in non-compositional MWU, such as fixed
expressions, were maintained
– Alignment of the fixed expression à data "at the time"
Non-decomposed Contractions
• Contractions in non-compositional MWU, such as fixed
expressions, were maintained
– Alignment of the fixed expression na ordem do dia "the next item"
• Most contractions require decomposition in contexts where
they are part of an MWU
Analysis of Results
Frequency of contractions in contexts in which they require decomposition
Analysis of Results
The most frequent contractions in the corpus (de+a, de+o, de+os, and em+a)
establish syntactic relationships between MWU, such as compounds,
prepositional nouns, etc.
In these contractions, the preposition marks the final border of the 1st
phrase (i.e., it is the last word in that phrase), and the determiner marks the
initial border of the phrase immediately after (i.e., it is the 1st word in that phrase)
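A frequency ranking of this kind could be derived from annotated occurrences roughly as follows. This is a sketch only: the `annotations` list is toy data invented for illustration and does not reproduce any figures from the corpus, and `frequency_table` is a hypothetical helper.

```python
from collections import Counter

# Toy annotations, NOT corpus figures: (surface form, components, decomposed?)
annotations = [
    ("da", "de+a", True),
    ("do", "de+o", True),
    ("dos", "de+os", True),
    ("na", "em+a", True),
    ("da", "de+a", True),
    ("à", "a+a", False),  # e.g. kept inside the fixed expression "à data"
]


def frequency_table(rows):
    """Count contractions separately for decomposed and maintained cases."""
    decomposed = Counter(comp for _, comp, dec in rows if dec)
    maintained = Counter(comp for _, comp, dec in rows if not dec)
    return decomposed, maintained


decomposed, maintained = frequency_table(annotations)
for components, count in decomposed.most_common():
    print(f"{components}\t{count}")
```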
Analysis of Results
The second noun phrase can be a named entity
• União Europeia "European Union"
• Ásia "Asia"
or a term
• capital de risco "risk capital"
• fundos de pensão "pension funds"
Analysis of Results
Some contractions appear in discontinuous MWU
• centrar-se [] em "to deal with"
There are occurrences of contractions that require decomposition in contexts where
the preposition is part of an MWU (the last word of the MWU, e.g., em relação a
"with regard to") and the determiner is part of a regular NP, e.g., as observações "the
comments".
• With regard to contractions that cannot be decomposed, most of
them occur at the beginning or in the middle of the MWU,
seldom at the end. For example, the contractions no, neste,
pelo, and às in the MWU no que diz respeito a "with regard to",
neste momento "at this time", pelo contrário "on the contrary",
and às 12h30 "at 12.30 p.m." cannot be decomposed, because
they are not positioned on the border with the next phrase.
The same goes for the contraction à in the MWU até à data "so
far", which occurs in a middle position. Exceptionally, the
contraction disso in the MWU além disso "in addition" also
remains undecomposed, although it is the last word, because the MWU is a fixed
adverbial expression.
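The positional observation can be turned into a rough heuristic: a contraction is a candidate for decomposition only when it is the last word of the MWU, unless the whole MWU is a listed fixed adverbial. The function `can_decompose` and the set `FIXED_ADVERBIALS` are assumptions made for this sketch; since the slide speaks of tendencies ("most", "seldom"), a rule like this would still need review by a linguist.

```python
# Hypothetical exception list: fixed adverbials that end in a contraction.
FIXED_ADVERBIALS = {("além", "disso")}


def can_decompose(mwu_tokens, index):
    """Return True if the contraction at `index` sits on the border with the next phrase."""
    if tuple(mwu_tokens) in FIXED_ADVERBIALS:
        return False
    return index == len(mwu_tokens) - 1


examples = [
    (["no", "que", "diz", "respeito", "a"], 0),  # contraction at the beginning
    (["até", "à", "data"], 1),                   # contraction in the middle
    (["além", "disso"], 1),                      # fixed adverbial exception
    (["em", "nome", "do"], 2),                   # contraction on the border
]
for mwu, i in examples:
    print(" ".join(mwu), "->", can_decompose(mwu, i))
```

Only the last example returns True; the other three keep their contraction.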
Analysis of Results
• Most frequent undecomposed contractions
Analysis of Results
• A few observations are worth noting with regard to undecomposable
contractions:
– There are some semantico-syntactic patterns that function as
linguistic constraints. For example, the contractions às, nos, or pelos
cannot be decomposed when used with time-related named entities,
such as às seis horas da tarde "at 6 p.m.", às sextas-feiras "on Fridays",
nos anos sessenta "in the sixties", or pelos anos seguintes "for all years
ahead", among others.
– In normal circumstances, contractions of prepositions with pronouns,
such as consigo in the expression consigo próprio "with itself", should
not be decomposed.
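Constraints of this kind could be approximated with surface patterns. The sketch below is an assumption-laden illustration: the regular expressions in `TIME_PATTERNS` cover only the examples quoted in the presentation, and `PRONOUN_CONTRACTIONS` and `must_stay_contracted` are hypothetical names. A real system would more likely rely on a named-entity recognizer than on hand-written patterns.

```python
import re

# Hypothetical patterns covering only the examples quoted in the slides.
TIME_PATTERNS = [
    re.compile(r"^às \w+ horas\b"),          # às seis horas da tarde
    re.compile(r"^às \w+-feiras\b"),         # às sextas-feiras
    re.compile(r"^às \d{1,2}h\d{2}\b"),      # às 12h30
    re.compile(r"^(nos|pelos) anos \w+\b"),  # nos anos sessenta, pelos anos seguintes
]
# Preposition + pronoun contractions that normally stay whole.
PRONOUN_CONTRACTIONS = {"consigo"}


def must_stay_contracted(phrase: str) -> bool:
    """Return True if the phrase-initial contraction should not be decomposed."""
    text = phrase.lower()
    words = text.split()
    first = words[0] if words else ""
    return first in PRONOUN_CONTRACTIONS or any(p.match(text) for p in TIME_PATTERNS)


for phrase in ["às seis horas da tarde", "nos anos sessenta",
               "consigo próprio", "às observações"]:
    print(phrase, "->", must_stay_contracted(phrase))
```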
Analysis of Results
• When used in MT (or paraphrasing) systems, alignments
containing linguistic knowledge contribute to improved
accuracy, reduced computational complexity and
ambiguity, and improved translation/paraphrasing
quality, as illustrated for the contractions described.
• Contractions can be a very frequent phenomenon in a
language, so the results obtained by aligning them
correctly in a system can be significantly better than
those obtained in a purely statistical or ad hoc manner.
Final Remarks
• Many other linguistic phenomena require further
examination.
• Without a suitable linguistic approach to the alignment task,
and limited to the capacity of the algorithms, systems will
continue to be overloaded with poor-quality alignments,
which will produce translations (paraphrases) of limited
quality, requiring a greater post-editing effort.
• There is still a shortage of manually annotated alignments
that can be used in training and evaluation for many language
pairs or language variants, especially those with scarce
resources.
Final Remarks
• We have used a methodology to align MWU involving
contractions, which pose a challenge to their correct
alignment
• The proposed alignment methodology is independent of the
application, so the pairs of aligned MWU and phrases can be
used in translation, paraphrasing, variety adaptation and
other NLP tasks
• Knowledge learned from a linguistic approach to alignment
tasks can help solve problems related to the alignment of
MWU and provide better ways to process and align them,
leading to more precise, better-quality applications
Final Remarks
• Anabela Barreiro and Fernando Batista. 2016. Machine Translation of Non-Contiguous Multiword Units. Proceedings of the Workshop on Discontinuous Structures in Natural
Language Processing, DiscoNLP 2016, pages 22–30, San Diego, California, June. Association for Computational Linguistics (ACL).
• Anabela Barreiro and Cristina Mota. 2017. e-PACT: eSPERTo Paraphrase Aligned Corpus of EN-EP/BP Translations. Tradução em Revista, 1(22):87–102.
• Anabela Barreiro, Bernard Scott, Walter Kasper, and Bernd Kiefer. 2011. OpenLogos Rule-Based Machine Translation: Philosophy, Model, Resources and Customization. Machine
Translation, 25(2):107–126.
• Anabela Barreiro, Fernando Batista, Ricardo Ribeiro, Helena Moniz, and Isabel Trancoso. 2014a. OpenLogos Semantico-Syntactic Knowledge-Rich Bilingual Dictionaries. Proceedings
of the 9th International Conference on Language Resources and Evaluation, LREC 2014, pages 3774–3781, Reykjavik, Iceland, May. European Language Resources Association.
• Anabela Barreiro, Johanna Monti, Brigitte Orliac, Susanne Preuss, Kutz Arrieta, Wang Ling, Fernando Batista, and Isabel Trancoso. 2014b. Linguistic Evaluation of Support Verb
Constructions by OpenLogos and Google Translate. Proceedings of the 9th International Conference on Language Resources and Evaluation, LREC 2014, pages 35–40, Reykjavik,
Iceland, May. European Language Resources Association.
• Anabela Barreiro, Francisco Raposo, and Tiago Luís. 2016. CLUE-Aligner: An Alignment Tool to Annotate Pairs of Paraphrastic and Translation Units. Proceedings of the 10th Edition
of the Language Resources and Evaluation Conference, LREC 2016, pages 7–13. European Language Resources Association.
• Phil Blunsom and Trevor Cohn. 2006. Discriminative Word Alignment with Conditional Random Fields. In Proceedings of the 21st International Conference on Computational Linguistics
and the 44th Annual Meeting of the Association for Computational Linguistics, ACL-44, pages 65–72, Sydney, Australia. Association for Computational Linguistics.
• David Chiang. 2005. A Hierarchical Phrase-based Model for Statistical Machine Translation. In Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics,
ACL 2005, pages 263–270, Ann Arbor, Michigan, USA. Association for Computational Linguistics.
• João Graça, Joana Paulo Pardal, Luísa Coheur, and Diamantino Caseiro. 2008. Building a Golden Collection of Parallel Multi-Language Word Alignment. LREC 2008, pages 986–993,
Marrakech, Morocco, May. European Language Resources Association.
• Philipp Koehn. 2005. EuroParl: A Parallel Corpus for Statistical Machine Translation. In Proceedings of the 10th Machine Translation Summit, pages 79–86, Phuket, Thailand. AAMT,
Asia-Pacific Association for Machine Translation.
• Patrik Lambert, Adrià De Gispert, Rafael Banchs, and José B. Mariño. 2005. Guidelines for Word Alignment Evaluation and Manual Alignment. Language Resources and Evaluation,
39(4):267–285, Dec.
• Daniel Marcu, Wei Wang, Abdessamad Echihabi, and Kevin Knight. 2006. SPMT: Statistical machine translation with syntactified target language phrases. EMNLP 2006, pages 44–52,
Stroudsburg, PA, USA. ACL.
• Franz Josef Och and Hermann Ney. 2000. Improved Statistical Alignment Models. In Proceedings of the 38th Annual Meeting on Association for Computational Linguistics, ACL 2000,
pages 440–447, Hong Kong. Association for Computational Linguistics.
• Bernard Scott. 2003. The Logos Model: An Historical Perspective. Machine Translation, 18(1):1–72.
• Bernard Scott. 2018. Translation, Brains and the Computer: A Neurolinguistic Solution to Ambiguity and Complexity in Machine Translation. Machine Translation: Technologies and
Applications. Springer International Publishing.
• Max Silberztein. 2016. Formalizing Natural Languages: the NooJ Approach. Wiley Eds.
• Jörg Tiedemann. 2011. Bitext Alignment. Synthesis Digital Library of Engineering and Computer Science. Morgan & Claypool.
• Andreas Zollmann and Ashish Venugopal. 2006. Syntax Augmented Machine Translation via Chart Parsing. In Proceedings of the Workshop on Statistical Machine Translation, SMT
2006, pages 138–141, New York City, June. Association for Computational Linguistics.
References
Thank you!
Obrigada!

Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsJoaquim Jorge
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherRemote DBA Services
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Drew Madelung
 

Último (20)

Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
IAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI SolutionsIAC 2024 - IA Fast Track to Search Focused AI Solutions
IAC 2024 - IA Fast Track to Search Focused AI Solutions
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 
Handwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed textsHandwritten Text Recognition for manuscripts and early printed texts
Handwritten Text Recognition for manuscripts and early printed texts
 
presentation ICT roal in 21st century education
presentation ICT roal in 21st century educationpresentation ICT roal in 21st century education
presentation ICT roal in 21st century education
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Boost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdfBoost Fertility New Invention Ups Success Rates.pdf
Boost Fertility New Invention Ups Success Rates.pdf
 
Automating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps ScriptAutomating Google Workspace (GWS) & more with Apps Script
Automating Google Workspace (GWS) & more with Apps Script
 
GenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdfGenAI Risks & Security Meetup 01052024.pdf
GenAI Risks & Security Meetup 01052024.pdf
 
What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?What Are The Drone Anti-jamming Systems Technology?
What Are The Drone Anti-jamming Systems Technology?
 
A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)A Domino Admins Adventures (Engage 2024)
A Domino Admins Adventures (Engage 2024)
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
Artificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and MythsArtificial Intelligence: Facts and Myths
Artificial Intelligence: Facts and Myths
 
Strategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a FresherStrategies for Landing an Oracle DBA Job as a Fresher
Strategies for Landing an Oracle DBA Job as a Fresher
 
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
Strategies for Unlocking Knowledge Management in Microsoft 365 in the Copilot...
 

Barreiro-Batista-LR4NLP@Coling2018-presentation

  • 1. technology from seed Contractions: To Align or Not to Align, That Is the Question Anabela Barreiro & Fernando Batista LR4NLP@COLING 2018 – Santa Fe, New Mexico, USA – August 20th 2018
  • 2. • Contractions • Goal • Approach • Methodology • Corpus • Decomposed Contractions • Non-decomposed Contractions • Analysis of Results • Final Remarks • References Outline 2
  • 3. • Words formed from 2+ words that would otherwise appear next to each other in a sequence – different parts-of-speech (PoS) (most frequently) – same PoS (more seldom) – Examples • EN: is not → isn’t, it is → it’s • IT: di + gli → degli • FR: la une → l’une ; de après → d’après • PT: em + as → nas • Problematic for several reasons – 2+ words overlap, which makes syntactic analysis and generation more difficult – in cross-language analysis, the contrast between languages that have contractions and languages that do not have them, or do not have them in the same contexts, may present additional difficulties Contractions 3
  • 4. • Alignment of (discontinuous) multiword units (MWU) where contractions occur – a challenge that has been overlooked in the existing literature and can be responsible for grammatical errors in translations (or paraphrasing) – high quality machine translation/paraphrasing depends on the quality of the alignments used in the processes of machine learning Goal 4
  • 5. • Linguistically motivated approach to the alignment of MWU where contractions occurring in these MWU are required to be decomposed, except in specific circumstances determined by the context, such as when they constitute a non-variable (non- inflecting) element of a frozen MWU – Examples • apesar da → [apesar de] [a NP] • [à luz dos] → [à luz de] [os NP] Approach 5
  • 6. • Alignment of MWU & phrasal units based on linguistic knowledge – Decide if contractions should be decomposed or maintained 1. Initially, the pre-processing task consisted of a semi-manual decomposition by a linguist of all contractions • Decomposition allowed for the correct alignment of MWU where contracted forms had to be split so that those MWU and the phrases that follow them could be mapped to the corresponding elements in the source language 2. Subsequently, all the decomposed forms were reviewed and the decomposed forms in MWU and frozen expressions were changed back to contractions – The methodology applies whether the source-target pair is: • 2 different languages (e.g. translation) • 2 varieties of the same language (e.g. variety adaptation) • the same language (e.g. paraphrases) Methodology 6
  • 7. • PT–EN CLUE4Translation parallel corpus – subset of the reference Europarl parallel corpus (Koehn, 2005) containing 400 Portuguese-English parallel sentences • In the corpus, the number of contractions that need to be decomposed is (much) greater than the number of contractions that need to be maintained • Work performed in the framework of the eSPERTo project, in which the alignment tool used (CLUE-Aligner) was developed Corpus 7
  • 8. Decomposed Contractions 8 • Contractions in contexts in which they must be decomposed • Decomposition is required so that the 2 concatenated words can go in 2 different alignment pairs – Alignment of the compound word em nome de "on behalf of" and the NP o meu grupo "my group" with their internal elements (individual words)
  • 9. • Other examples of decomposition that has taken place in contractions Decomposed Contractions 9 [...] two elements, the preposition de "of" and the masculine singular definite article o "the" [...] to align correctly both the canonical form (lemma) of the compound word em nome de and the noun phrase o meu grupo "my group", where the preposition of the contraction [...] compound and the definite article goes with the noun phrase, i.e., the decomposition is [...] possible that the two concatenated words go in two different alignment pairs, as illustrated [...] Similar decomposition has taken place in contractions such as those illustrated in examples [...] (2) EN - across + [the Atlantic] PT - do outro lado de (do = [de+o]) [o Atlântico] (3) EN - issues like + [the NP] PT - questões como a de (dos = [de+os]) [os NP] (4) EN - with respect to + [the N] PT - quanto a (ao = [a+o]) [o N] (5) EN - fully approves [NP: the joint position of the council] PT - dá a sua total aprovação a (à = [a+a]) [NP: a posição comum do conselho] Decomposition of contractions also has implication in coordination. For example, the noun phrases o parlamento "the parliament" and o conselho "the council" illustrated in [...] direct complements of the Portuguese prepositional verb realizado por "carried out by". [...]
  • 10. Decomposed Contractions 10 • Decomposition of contractions has implication in coordination – Alignment of the coordinated MWU realizado por [NP] e por [NP], implying the double decomposition of the contraction pelo into the preposition por of the prepositional verb and the masculine singular definite article o of the coordinated NP.
  • 11. Non-decomposed Contractions 11 • Contractions in contexts in which they cannot be decomposed • Contractions in non-compositional MWU, such as fixed expressions, were maintained – Alignment of the fixed expression à data "at the time"
  • 12. Non-decomposed Contractions 12 • Contractions in non-compositional MWU, such as fixed expressions, were maintained – Alignment of the fixed expression na ordem do dia "the next item"
  • 13. • Most contractions require decomposition in contexts where they are part of a MWU Analysis of Results 13 Frequency of contractions in contexts in which they require decomposition
  • 14. Analysis of Results 14 The most frequent contractions in the corpus (de+a, de+o, de+os, and em+a), establish syntactic relationships between MWU, such as compounds, prepositional nouns, etc. In these contractions, the preposition establishes the final border of the 1st phrase (i.e., the last word in the phrase), and the determiner establishes the initial border of the phrase immediately after (i.e., the 1st word in the phrase) • Most contractions require decomposition in contexts where they are part of a MWU
  • 15. Analysis of Results 15 The second noun phrase can be a named entity • União Europeia "European Union" • Ásia "Asia" or a term • capital de risco "risk capital" • fundos de pensão "pension funds" • Most contractions require decomposition in contexts where they are part of a MWU
  • 16. Analysis of Results 16 Some contractions appear in discontinuous MWU • centrar-se [] em "to deal with" There are occurrences of contractions that require decomposition in contexts where the preposition is part of a MWU (the last word of the MWU, e.g., em relação a "with regard to") and the determiner is part of a regular NP, e.g., as observações "the comments". • Most contractions require decomposition in contexts where they are part of a MWU
  • 17. • With regard to contractions that cannot be decomposed, most of them occur at the beginning or in the middle of the MWU, seldom at the end. For example, the contractions no, neste, pelo, and às in the MWU no que diz respeito a "with regard to", neste momento "at this time", pelo contrário "on the contrary", and às 12h30 "at 12.30 p.m." cannot be decomposed, because they are not positioned on the border with the next phrase. The same goes for the contraction à in the MWU até à data "so far", which occurs in a middle position. Exceptionally, the contraction disso in the MWU além disso "in addition" also remains undecomposed, because it corresponds to a fixed adverbial expression. Analysis of Results 17
  • 18. • Most frequent undecomposed contractions Analysis of Results 18
  • 19. • A few observations are worth noting with regard to undecomposable contractions: – There are some semantico-syntactic patterns that function as linguistic constraints. For example, the contractions às, nos, or pelos cannot be decomposed when used with time-related named entities, such as às seis horas da tarde "at 6 p.m.", às sextas-feiras "on Fridays", nos anos sessenta "in the sixties", or pelos anos seguintes "for all years ahead", among others. – In normal circumstances, contractions of prepositions with pronouns, such as consigo in the expression consigo próprio "with itself", should not be decomposed. Analysis of Results 19
  • 20. • When used in MT (or paraphrasing) systems, alignments containing linguistic knowledge contribute to improved accuracy, reduced computational complexity and ambiguity, and improved translation/paraphrasing quality, as illustrated for the contractions described. • Contractions can be a very frequent phenomenon in a language, so the results obtained through their correct alignment in a system can be significantly better than those obtained in a purely statistical or ad hoc manner. Final Remarks 20
  • 21. • Many other linguistic phenomena require further examination. • Without a suitable linguistic approach to the alignment task, and limited to the capacity of the algorithms, systems will continue to be overloaded with poor quality alignments, which will create translation (paraphrasing) of limited quality, requiring a greater post-editing effort. • There is still a shortage of manually annotated alignments that can be used in training and evaluation for many language pairs or language variants, especially those with scarce resources. Final Remarks 21
  • 22. • We have used a methodology to align MWU involving contractions, which pose a challenge to their correct alignment • The proposed alignment methodology is independent of the application, so the pairs of aligned MWU and phrases can be used in translation, paraphrasing, variety adaptation and other NLP tasks • Knowledge learned from a linguistic approach to alignment tasks can help solve problems related to the alignment of MWU, provide better solutions to process and align them, leading to more precise and better quality applications Final Remarks 22
  • 23. • Anabela Barreiro and Fernando Batista. 2016. Machine Translation of Non-Contiguous Multiword Units. Proceedings of the Workshop on Discontinuous Structures in Natural Language Processing, DiscoNLP 2016, pages 22–30, San Diego, California, June. Association for Computational Linguistics (ACL). • Anabela Barreiro and Cristina Mota. 2017. e-PACT: eSPERTo Paraphrase Aligned Corpus of EN-EP/BP Translations. Tradução em Revista, 1(22):87–102. • Anabela Barreiro, Bernard Scott, Walter Kasper, and Bernd Kiefer. 2011. OpenLogos Rule-Based Machine Translation: Philosophy, Model, Resources and Customization. Machine Translation, 25(2):107–126. • Anabela Barreiro, Fernando Batista, Ricardo Ribeiro, Helena Moniz, and Isabel Trancoso. 2014a. OpenLogos Semantico-Syntactic Knowledge-Rich Bilingual Dictionaries. Proceedings of the 9th International Conference on Language Resources and Evaluation, LREC 2014, pages 3774–3781, Reykjavik, Iceland, May. European Language Resources Association. • Anabela Barreiro, Johanna Monti, Brigitte Orliac, Susanne Preuss, Kutz Arrieta, Wang Ling, Fernando Batista, and Isabel Trancoso. 2014b. Linguistic Evaluation of Support Verb Constructions by OpenLogos and Google Translate. Proceedings of the 9th International Conference on Language Resources and Evaluation, LREC 2014, pages 35–40, Reykjavik, Iceland, May. European Language Resources Association. • Anabela Barreiro, Francisco Raposo, and Tiago Luís. 2016. CLUE-Aligner: An Alignment Tool to Annotate Pairs of Paraphrastic and Translation Units. Proceedings of the 10th Edition of the Language Resources and Evaluation Conference, LREC 2016, pages 7–13. European Language Resources Association. • Phil Blunsom and Trevor Cohn. 2006. Discriminative Word Alignment with Conditional Random Fields. In Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for Computational Linguistics, ACL-44, pages 65–72, Sydney, Australia. Association for Computational Linguistics. • David Chiang. 2005. A Hierarchical Phrase-based Model for Statistical Machine Translation. In Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics, ACL 2005, pages 263–270, Ann Arbor, Michigan, USA. Association for Computational Linguistics. • João Graça, Joana Paulo Pardal, Luísa Coheur, and Diamantino Caseiro. 2008. Building a Golden Collection of Parallel Multi-Language Word Alignment. LREC 2008, pages 986–993, Marrakech, Morocco, May. European Language Resources Association. • Philipp Koehn. 2005. EuroParl: A Parallel Corpus for Statistical Machine Translation. In Proceedings of the 10th Machine Translation Summit, pages 79–86, Phuket, Thailand. AAMT, Asia-Pacific Association for Machine Translation. • Patrik Lambert, Adrià De Gispert, Rafael Banchs, and José B. Mariño. 2005. Guidelines for Word Alignment Evaluation and Manual Alignment. Language Resources and Evaluation, 39(4):267–285, Dec. • Daniel Marcu, Wei Wang, Abdessamad Echihabi, and Kevin Knight. 2006. SPMT: Statistical machine translation with syntactified target language phrases. EMNLP 2006, pages 44–52, Stroudsburg, PA, USA. ACL. • Franz Josef Och and Hermann Ney. 2000. Improved Statistical Alignment Models. In Proceedings of the 38th Annual Meeting on Association for Computational Linguistics, ACL 2000, pages 440–447, Hong Kong. Association for Computational Linguistics. • Bernard Scott. 2003. The Logos Model: An Historical Perspective. Machine Translation, 18(1):1–72. • Bernard Scott. 2018. Translation, Brains and the Computer: A Neurolinguistic Solution to Ambiguity and Complexity in Machine Translation. Machine Translation: Technologies and Applications. Springer International Publishing. • Max Silberztein. 2016. Formalizing Natural Languages: the NooJ Approach. Wiley Eds. • Jörg Tiedemann. 2011. Bitext Alignment. Synthesis Digital Library of Engineering and Computer Science. Morgan & Claypool. • Andreas Zollmann and Ashish Venugopal. 2006. Syntax Augmented Machine Translation via Chart Parsing. In Proceedings of the Workshop on Statistical Machine Translation, SMT 2006, pages 138–141, New York City, June. Association for Computational Linguistics. References 23
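
The decomposition rule described in the Approach, Methodology and Analysis of Results slides (decompose a contraction unless it sits inside a frozen MWU) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the slides describe a semi-manual decomposition by a linguist followed by a review pass. The names `CONTRACTIONS`, `FROZEN_MWUS`, `frozen_positions` and `decompose` are hypothetical, the two lexicons are tiny samples built from examples in the slides, and the sketch ignores lexical ambiguity and the time-expression constraints mentioned on slide 19.

```python
# Hypothetical sample lexicon: contraction -> (preposition, determiner).
CONTRACTIONS = {
    "do": ("de", "o"),
    "da": ("de", "a"),
    "dos": ("de", "os"),
    "nas": ("em", "as"),
    "na": ("em", "a"),
    "ao": ("a", "o"),
    "à": ("a", "a"),
    "pelo": ("por", "o"),
}

# Hypothetical sample of frozen MWU whose contractions must be maintained.
FROZEN_MWUS = [
    ("até", "à", "data"),
    ("pelo", "contrário"),
    ("na", "ordem", "do", "dia"),
]


def frozen_positions(tokens):
    """Return the indices of tokens that belong to a frozen MWU."""
    lowered = [t.lower() for t in tokens]
    protected = set()
    for mwu in FROZEN_MWUS:
        n = len(mwu)
        for i in range(len(lowered) - n + 1):
            if tuple(lowered[i:i + n]) == mwu:
                protected.update(range(i, i + n))
    return protected


def decompose(tokens):
    """Split contractions into preposition + determiner, except inside frozen MWU."""
    protected = frozen_positions(tokens)
    out = []
    for i, tok in enumerate(tokens):
        parts = CONTRACTIONS.get(tok.lower())
        if parts is not None and i not in protected:
            out.extend(parts)  # e.g. "do" -> "de", "o"
        else:
            out.append(tok)    # non-contraction, or contraction kept as-is
    return out


if __name__ == "__main__":
    print(decompose("em nome do meu grupo".split()))
    print(decompose("até à data".split()))
```

With these sample lexicons, the contraction in em nome do meu grupo is split so that de can align with the compound and o with the noun phrase, while the contraction in até à data is left intact because the whole expression is listed as frozen.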