Presentation slides at LREC2018 about our evaluation dataset for syntactic phrase alignment on paraphrases.
The dataset is available here: https://catalog.ldc.upenn.edu/LDC2018T09
SPADE: Evaluation Dataset for Monolingual Phrase Alignment
1. SPADE: Evaluation Dataset for
Monolingual Phrase Alignment
Yuki Arase*† and Junichi Tsujii†◊
*Osaka University, Japan
†Artificial Intelligence Research Center (AIRC), AIST, Japan
◊NaCTeM, School of Computer Science, University of Manchester, UK
2. Created and released a dataset annotating
phrase alignments on parse trees
of paraphrases
[Figure: parse trees of the paraphrase pair "her life is excellent and wonderful…" / "she also has a very splendid… life" (node labels: COOD, ADJP, VP, NP, S), with phrase alignments from Annotator #1, Annotator #2, and Annotator #3]
15,721 alignments
4. Phrasal (N-gram) Paraphrases
♠Phrasal paraphrases of N-grams have been useful
for NLP applications
• Semantic parsing (Berant and Liang, 2014)
• Automatic QA (Dong et al., 2017)
♠PPDB (Ganitkevitch et al., 2013) is widely used as
an abundant resource
5. Are N-grams Sufficient?
♠Syntactic structures are important in modeling
phrases/sentences
• Semantic relatedness (Tai et al., 2015)
• Phrase embedding (Wieting et al., 2015)
♠Part of PPDB provides phrasal paraphrases under the
synchronous context free grammar (SCFG)
♠SCFG captures only a fraction of paraphrasing
phenomena (Weese et al., 2014)
• Only 9.1% of paraphrases were reachable using SCFG
6. ♠Phrasal paraphrases under a linguistically
motivated grammar would deliver richer
syntactic information
♠For systematic research,
• SPADE annotates phrase alignments under
the head-driven phrase structure grammar (Pollard
and Sag, 1994)
• Evaluation metrics are proposed for benchmarking
Phrase Alignment on Paraphrases
7. Annotation Target
Paraphrases extracted from MT evaluation corpora
♠Paraphrases by linguistic operations
♠Paraphrases with simple summarization
Relying on team spirit, expedition members defeated difficulties.
Members of the scientific team overcame challenges living on Mars
through teamwork.
8. Approach
1. Gold-tree annotation by a linguistic expert
2. Phrase alignment annotation
• 3 annotators independently identified phrase
alignments using a provided annotation tool
• Refer to tree structures when helpful
9. Gold-Tree Annotation
[Figure: gold parse trees of the paraphrase pair "her life is excellent and wonderful…" / "she also has a very splendid… life" (node labels: COOD, ADJP, VP, NP, S)]
10. Phrase alignment annotation
[Figure: phrase alignments between the parse trees of "her life is excellent and wonderful…" and "she also has a very splendid… life" (node labels: COOD, ADJP, VP, NP, S)]
11. SPADE Statistics
                               Dev      Test
# of sentence pairs             50       151
# of tokens                  2,494     7,276
# of types                     736     1,573
# of phrases (w/o tokens)    5,201    15,075
# of alignments (∪)          3,932    11,789
# of alignments (∩)          2,518     7,134
12. Evaluation Metric
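♠ALIR (ALIgnment Recall) evaluates how well gold alignments
($\mathbb{G}$ and $\mathbb{G}'$) can be replicated by automatic alignments ($\mathbb{H}_a$)
$$\mathrm{ALIR} = \frac{\left|\{\, h \mid h \in \mathbb{H}_a \wedge h \in \mathbb{G} \cap \mathbb{G}' \,\}\right|}{\left|\mathbb{G} \cap \mathbb{G}'\right|}$$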
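♠ALIP (ALIgnment Precision) evaluates how much automatic
alignments ($\mathbb{H}_a$) overlap with alignments that at least one
annotator made ($\mathbb{G} \cup \mathbb{G}'$)
$$\mathrm{ALIP} = \frac{\left|\{\, h \mid h \in \mathbb{H}_a \wedge h \in \mathbb{G} \cup \mathbb{G}' \,\}\right|}{\left|\mathbb{H}_a\right|}$$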
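The two metrics can be sketched with plain set operations. This is a minimal illustration and not the released evaluation script. It assumes each alignment is a hashable pair of phrase identifiers; the identifiers, the function names `alir` and `alip`, and the choice to return 0.0 for an empty denominator are all assumptions made for this example.

```python
from typing import Set, Tuple

# Assumption: an alignment is a (source phrase id, target phrase id) pair.
Alignment = Tuple[str, str]


def alir(h_a: Set[Alignment], g: Set[Alignment], g_prime: Set[Alignment]) -> float:
    """Recall against alignments that both annotators agree on (G ∩ G')."""
    gold = g & g_prime
    if not gold:
        return 0.0  # assumption: define the score as 0 when there is no gold
    return len(h_a & gold) / len(gold)


def alip(h_a: Set[Alignment], g: Set[Alignment], g_prime: Set[Alignment]) -> float:
    """Precision against alignments made by at least one annotator (G ∪ G')."""
    if not h_a:
        return 0.0  # assumption: define the score as 0 for an empty output
    return len(h_a & (g | g_prime)) / len(h_a)


# Hypothetical toy data; the phrase ids are invented for illustration.
g = {("NP1", "NP1'"), ("VP1", "VP1'"), ("ADJP1", "ADJP1'")}
g_prime = {("NP1", "NP1'"), ("VP1", "VP1'"), ("S1", "S1'")}
h_a = {("NP1", "NP1'"), ("ADJP1", "ADJP1'"), ("VP1", "NP2'")}

print(f"ALIR = {alir(h_a, g, g_prime):.3f}")  # ALIR = 0.500
print(f"ALIP = {alip(h_a, g, g_prime):.3f}")  # ALIP = 0.667
```

In the toy data, the two annotators agree on two alignments and the system recovers one of them, so ALIR is 1/2. Two of the three system alignments appear in at least one annotation, so ALIP is 2/3.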
14. Future Directions
Expand the dataset
1. Size
• Working on annotating 5k more paraphrase pairs
2. Linguistic phenomena in paraphrases
• SPADE used reference translations as paraphrases
• Covers relatively simple paraphrases due to constraints imposed by
the source sentences
15. Future Directions (Cont’d)
2. Linguistic phenomena in paraphrases
• Annotate paraphrases from other datasets
• Microsoft Research Paraphrase Corpus (Dolan et al., 2004)
• Twitter URL corpus (Lan et al., 2017)
• Cover diverse linguistic phenomena of
paraphrases in the wild
Ex) Paraphrases involving inferences/entailments
Scientists overcame challenges living on Mars.
Scientists overcame water and oxygen scarcity on the red planet.