SPADE: Evaluation Dataset for Monolingual Phrase Alignment
Yuki Arase*† and Junichi Tsujii†◊
*Osaka University, Japan
†Artificial Intelligence Research Center (AIRC), AIST, Japan
◊NaCTeM, School of Computer Science, University of Manchester, UK
We created and released a dataset that annotates phrase alignments on the parse trees of paraphrases
[Figure: phrase alignments between the parse trees of "her life is excellent and wonderful…" and "she also has a very splendid… life", annotated independently by Annotators #1, #2 and #3]
15,721 alignments in total (union of the annotators' alignments over the dev and test sets)
https://catalog.ldc.upenn.edu/LDC2018T09
Phrasal (N-gram) Paraphrases
♠Phrasal paraphrases of N-grams have proven useful for NLP applications
• Semantic parsing (Berant and Liang, 2014)
• Automatic QA (Dong et al., 2017)
♠PPDB (Ganitkevitch et al., 2013) is widely used as an abundant paraphrase resource
Are N-grams Sufficient?
♠Syntactic structures are important in modeling phrases and sentences
• Semantic relatedness (Tai et al., 2015)
• Phrase embedding (Wieting et al., 2015)
♠Part of PPDB provides phrasal paraphrases under a synchronous context-free grammar (SCFG)
♠SCFG captures only a fraction of paraphrasing phenomena (Weese et al., 2014)
• Only 9.1% of paraphrases were reachable using SCFG
Phrase Alignment on Paraphrases
♠Phrasal paraphrases under a linguistically motivated grammar would deliver richer syntactic information
♠For systematic research,
• SPADE annotates phrase alignments under head-driven phrase structure grammar (HPSG; Pollard and Sag, 1994)
• Evaluation metrics are proposed for benchmarking phrase alignment
Annotation Target
Paraphrases extracted from MT evaluation corpora
♠Paraphrases produced by linguistic operations
♠Paraphrases involving simple summarization
Ex) Relying on team spirit, expedition members defeated difficulties.
Members of the scientific team overcame challenges living on Mars through teamwork.
Approach
1. Gold-tree annotation by a linguistic expert
2. Phrase alignment annotation
• Three annotators independently identified phrase alignments using the provided annotation tool
• Annotators could refer to the tree structures when helpful
Gold-Tree Annotation
[Figure: gold parse trees of "her life is excellent and wonderful…" and "she also has a very splendid… life", with phrase nodes labeled S, VP, NP and ADJP]
Phrase Alignment Annotation
[Figure: phrase alignments drawn between nodes of the two parse trees for "her life is excellent and wonderful…" and "she also has a very splendid… life"]
SPADE Statistics
                                Dev    Test
# of sentence pairs              50     151
# of tokens                   2,494   7,276
# of types                      736   1,573
# of phrases (w/o tokens)     5,201  15,075
# of alignments (∪)           3,932  11,789
# of alignments (∩)           2,518   7,134
∪: union of the annotators' alignments; ∩: intersection of the annotators' alignments
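The ∪ and ∩ rows pool the alignments that the annotators produced independently. A minimal Python sketch of that pooling, assuming each annotator's output is a set of hypothetical (phrase span, phrase span) pairs and reading ∩ as "produced by every annotator"; the actual SPADE file format and pooling procedure are not described on these slides and may differ.

```python
from typing import List, Set, Tuple

# Hypothetical encoding (not the SPADE format): a phrase is a (start, end)
# token span, end exclusive; an alignment pairs a phrase in sentence 1
# with a phrase in sentence 2.
Span = Tuple[int, int]
Alignment = Tuple[Span, Span]


def pool_alignments(
    annotations: List[Set[Alignment]],
) -> Tuple[Set[Alignment], Set[Alignment]]:
    """Pool per-annotator alignment sets into (union, intersection).

    union: alignments produced by at least one annotator.
    intersection: alignments produced by every annotator (assumed reading of ∩).
    """
    if not annotations:
        return set(), set()
    union: Set[Alignment] = set().union(*annotations)
    intersection: Set[Alignment] = set(annotations[0]).intersection(*annotations[1:])
    return union, intersection


# Toy annotations for one sentence pair (made-up spans, not SPADE data).
a1 = {((0, 2), (5, 7)), ((3, 6), (3, 5))}
a2 = {((0, 2), (5, 7))}
a3 = {((0, 2), (5, 7)), ((3, 6), (2, 5))}
union, intersection = pool_alignments([a1, a2, a3])
print(len(union), len(intersection))  # 3 1
```

Summing such per-pair counts over all sentence pairs in a split would presumably yield corpus-level totals of the kind shown in the ∪ and ∩ rows.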
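Evaluation Metric
♠ALIR (ALIgnment Recall) evaluates how well the gold alignments (those in both 𝔾 and 𝔾′) are replicated by the automatic alignments ℍ_a
$$\mathrm{ALIR} = \frac{\left|\{\, h \mid h \in \mathbb{H}_a \wedge h \in \mathbb{G} \cap \mathbb{G}' \,\}\right|}{\left|\mathbb{G} \cap \mathbb{G}'\right|}$$
♠ALIP (ALIgnment Precision) evaluates how many of the automatic alignments overlap with alignments that at least one annotator aligned (𝔾 ∪ 𝔾′)
$$\mathrm{ALIP} = \frac{\left|\{\, h \mid h \in \mathbb{H}_a \wedge h \in \mathbb{G} \cup \mathbb{G}' \,\}\right|}{\left|\mathbb{H}_a\right|}$$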
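Both formulas reduce to set operations, because {h | h ∈ ℍ_a ∧ h ∈ X} is simply ℍ_a ∩ X. A minimal Python sketch, assuming each alignment is a hashable item (for instance a pair of phrase spans) and that scores are computed over one pooled set; how SPADE aggregates over sentence pairs and how empty sets are scored are assumptions here, not taken from the slides.

```python
from typing import Hashable, Set


def alir(hyp: Set[Hashable], gold: Set[Hashable], gold_prime: Set[Hashable]) -> float:
    """ALIR: fraction of alignments in G ∩ G' that the system also produced."""
    reference = gold & gold_prime
    if not reference:
        return 0.0  # assumption: an empty reference scores 0 instead of dividing by zero
    return len(hyp & reference) / len(reference)


def alip(hyp: Set[Hashable], gold: Set[Hashable], gold_prime: Set[Hashable]) -> float:
    """ALIP: fraction of system alignments found in G ∪ G'."""
    if not hyp:
        return 0.0  # assumption: an empty hypothesis scores 0
    return len(hyp & (gold | gold_prime)) / len(hyp)


# Toy example with alignments encoded as opaque labels.
G = {"a", "b", "c"}
G_prime = {"a", "b", "d"}
H_a = {"a", "b", "c", "e"}
print(alir(H_a, G, G_prime))  # 1.0  (both of {a, b} are recovered)
print(alip(H_a, G, G_prime))  # 0.75 (a, b, c are in G ∪ G'; e is not)
```

Multiplying by 100 gives percentages comparable in scale to the benchmark values reported below.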
Benchmark
[Figure: bar chart of ALIR and ALIP scores (y-axis from 70 to 95) comparing Human annotators with Arase and Tsujii (2017); plotted values: 90.65, 88.21, 83.64, 78.91]
Y. Arase and J. Tsujii. 2017. Monolingual Phrase Alignment on Parse Forests. In Proc. of EMNLP, pp. 1-11.
Future Directions
Expand the dataset
1. Size
• Working on annotating 5k more paraphrase pairs
2. Linguistic phenomena in paraphrases
• SPADE used reference translations as paraphrases
• Hence it covers relatively simple paraphrases, constrained by the source sentences
Future Directions (Cont’d)
2. Linguistic phenomena in paraphrases
• Annotate paraphrases from other datasets
  • Microsoft Research Paraphrase Corpus (Dolan et al., 2004)
  • Twitter URL corpus (Lan et al., 2017)
• Cover diverse linguistic phenomena of paraphrases in the wild
Ex) Paraphrases involving inferences/entailments
Scientists overcame challenges living on Mars.
Scientists overcame water and oxygen scarcity on the red planet.
