SlideShare una empresa de Scribd logo
1 de 21
Descargar para leer sin conexión
Towards Evaluation of Cross-document Coreference Resolution
Models Using Datasets with Diverse Annotation Schemes
Anastasia Zhukova1, Felix Hamborg2, Bela Gipp1,3
1University of Wuppertal, Germany
2University of Konstanz, Germany
3University of Göttingen, Germany
LREC’22, 23 June 2022, Marseille
Motivation: annotation of cross-document coreference resolution datasets
Source 1: The Australian
Source 2: The Economist
3
Zhukova et al. "Towards Evaluation of Cross-document Coreference Resolution Models Using Datasets with Diverse Annotation Schemes"
Motivation: annotation of cross-document coreference resolution datasets
4
Source 1: The Australian
Source 2: The Economist
Zhukova et al. "Towards Evaluation of Cross-document Coreference Resolution Models Using Datasets with Diverse Annotation Schemes"
Research Question and Tasks
RQ: How to compare CDCR datasets and which aspects to consider in the analysis?
Research tasks
• RT1: Compare and discuss two perspectives of annotating coreference chains and relations:
– Event-centric chains with strict identity relations (ECB+ [Cy14])
– Concept-centric chains with loose coreference relations, i.e., combination of identity and bridging
relations (NewsWCL50 [Ha19])
• RT2: Propose further directions of evaluation of the CDCR models
5
Zhukova et al. "Towards Evaluation of Cross-document Coreference Resolution Models Using Datasets with Diverse Annotation Schemes"
Related work
• CDCR datasets have two triggers for coreference
chains
– Event-centric (occurrence of an event matters)
– Concept-centric (frequency of an event/entity matters)
• Typically separated research
– Strict identity relations
– Near-identity relations
– Bridging relations
• Related work focuses on comparing (CD)CR datasets
of same nature
– For example, event-centric ECB+, FCC, GVC [Bu21]
• Very scarce comparison of the CDCR datasets of the
different nature
– We compare diverse ECB+ and NewsWCL50
Strict
identity relations
Near-identity
relations
Bridging
relations
Datasets
ECB+
TAK KBP
ACE
MEANTIME
ISNotes
BASHI
ARRAU
DEFT
NiDENT
RED
NewsWCL50
FCC
GVC
WEC
HyperCoref
NP4E
GUM
6
Zhukova et al. "Towards Evaluation of Cross-document Coreference Resolution Models Using Datasets with Diverse Annotation Schemes"
Qualitative
comparison
Dataset structure
• ECB+
– Topic, e.g., earthquake
– Subtopic, i.e., a narrowly-related event
described by an action, participant(s),
location, and time
– Article, i.e., clearly represents a subtopic
Articles
Large content overlap
Subtopic 1 Subtopic 2
Topic
Topic
Articles
Smaller content overlap,
contain discussions of various
aspects of the event
8
• NewsWCL50
– Topic, i.e., articles reporting on the same
event
– Article, i.e., discusses at least partially
different aspects of the reported event
Zhukova et al. "Towards Evaluation of Cross-document Coreference Resolution Models Using Datasets with Diverse Annotation Schemes"
Annotation: Mentions & Coreference chains
9
Dataset Which mentions
How and which
mentions to annotate
When to link into the
coreference chains
Problems
ECB+
action-driven mentions
o action
o participant
o location
o datetime
• minimum-span style
• NPs and VPs
• pronouns
• entities: if they refer to
the same entity
• events: same attributes
(action + participant +
location + datetime)
• unannotated entity mentions
if not action attributes
• many event-centric annotated
mentions become singleton
coreference chains
NewsWCL50
frequency-driven
mentions
o action/event
o actor
o object
o GPE
o abstract entities
o (no location)
o (no datetime)
• maximum-span style
• NPs and VPs
• (no pronouns)
• minimum 5 occurrences
of an event/entity linked
with diverse relations
across the related articles
• some entities are too abstract
and need refinement
• not annotated entities:
location, datetime, and
pronoun mentions
as attributes
of an action
Zhukova et al. "Towards Evaluation of Cross-document Coreference Resolution Models Using Datasets with Diverse Annotation Schemes"
Relations
ECB+: strict identity
• Actor, location, datetime: refer to the same entity
“the U.S.” – “America”
• Event components: same action, participant, location,
datetime
“Lindsay Lohan checked into rehab.”
“Ms. Lohan entered a rehab facility.”
10
NewsWCL50: a mix of identity and bridging relations
(1) identity relations including subject-predicate coreference
“Donald Trump” – “the newly-minted president”
(2) synonym relations
“a caravan of immigrants” – “a few hundred asylum seekers”
(3) metonymy
“the Russian government” – “the Kremlin”
(4) meronymy/holonymy
“the U.S.-Mexico border” – “the San Ysidro port of entry”
(5) mentions are linked with copular verbs, e.g., “be,” “seem,” or “call”
“Trump called Kim Jong Un a Little Rocket Man” →
“Kim Jong Un” – “Little Rocket Man”
(6) mentions meet GPE definitions as of ACE annotation
“the U.S.” – “American people”
(7) mentions are elements of one set
“guaranteed to disrupt trade” – “drove down steel prices” as
members of a set “Consequences of tariff imposition”
Zhukova et al. "Towards Evaluation of Cross-document Coreference Resolution Models Using Datasets with Diverse Annotation Schemes"
Reannotation experiment
11
Head ECB+ (original) NewsWCL50 (following the coding book)
Jeffs Jeffs x15
Warren Jeffs x1
polygamist sect leader Warren Jeffs x1
prophet Warren Jeffs x2
leader Warren Jeffs x2
polygamist Warren Jeffs x1
polygamist leader Warren Jeffs x3
Warren Jeffs, Polygamist Leader x1
Jeffs x66
Warren Jeffs x4
polygamist sect leader Warren Jeffs x2
prophet Warren Jeffs x1
FLDS prophet Warren Jeffs x1
Mr. Jeffs x2
the 54-year-old Jeffs x1
Jeffs, who acted as his own attorney x1
Jeffs, who was indicted more than two years ago x1
Warren Jeffs, leader of the Fundamentalist Church of Jesus Christ… x1
Warren Jeffs, polygamist leader x1
misc he, his, him, who x22
polygamist x2
attorney x2
pedophile x1
a victim of religious persecution x1
the highest-profile defendant x1
the father of a 15-year-old FLSD member's child x1
God's spokesman on earth x1
a handful from day one x1
one individual, Warren Steed Jeffs x1
an accomplice to sexual conduct with a minor x1
Zhukova et al. "Towards Evaluation of Cross-document Coreference Resolution Models Using Datasets with Diverse Annotation Schemes"
Quantitative
comparison
Lexical diversity: previous metrics
• F1CoNLL of a simple same mentions’ head-
lemmas baseline [Cy14]
– Evaluate combinations and links between annotated
(key) and resolved (response) mentions
– Average of F1: B3, MUC, CEAFe
– High F1CoNLL = low lexical diversity
– F1CoNLL is inflated with singletons [Ca21]
15
• Unique number of mentions’ head-
lemmas [Ei21]
– A quite discrete measure, no detailed comparison
log scale
Figure source
Zhukova et al. "Towards Evaluation of Cross-document Coreference Resolution Models Using Datasets with Diverse Annotation Schemes"
Lexical diversity with a new Phrasing Diversity metric (PD)
Phrasing diversity (PD) measures lexical variation of coreference chains using the mention frequency and
phrasing variation.
→ yields more detailed degrees the lexical diversity for the same coref.chains
16
log scale
log scale
With PD, ECB+ and
NewsWCL50 share different
parts of the scatter plot
With unique head lemmas,
ECB+ and NewsWCL50 seem
quite similar
Zhukova et al. "Towards Evaluation of Cross-document Coreference Resolution Models Using Datasets with Diverse Annotation Schemes"
Numeric comparison
datasets topics subtopics articles tokens mentions
event
mentions
entity
mentions
chains singletons
inter-coder
reliability
(ICR)
ECB+ 43 86 962 377 367 12 004 4 013 7 991 4 759 3 445 0.76
NewsWCL50 10 - 50 49 591 5600 1225 4375 170 10 0.65
17
Intercoder agreement
• ECB+: κ = 0.74 event/entity mention annotation, κ = 0.76 for chain annotation
• NewsWCL50: 𝐴𝑂𝐴 = 0.65 (average observed agreement, less restrictive than κ)
→ more loosely related anaphora may result in lower ICR
with singletons without singletons
datasets
average
chain
size
F1CoNLL
average
unique
head
lemmas
phrasing
diversity
(PD)
PD
average
average
chain
size
F1CoNLL
average
unique
head
lemmas
phrasing
diversity
(PD)
PD
average
ECB+ 2.5 72.9 1.4 1.3 1.1 6.4 64.4 2.4 1.4 1.3
NewsWCL50 32.9 46.8 9.9 9.7 6.7 35 46.5 10.5 9.7 7.0
Coref.chain size distribution
Density of mention annotations per article
• ECB+: 13 per article
• NewsWCL50: 112 per article
→ exhausting annotation of NewsWCL50
chain
size
Zhukova et al. "Towards Evaluation of Cross-document Coreference Resolution Models Using Datasets with Diverse Annotation Schemes"
Discussion
• The proposed PD metric measures lexical diversity in higher
level of details compared to the unique head lemmas.
• ECB+
– Subtopic level of events with similar wording
– Only strict identity relation in coref.chains
– Smaller coreference chains and lower lexical diversity
→ lexical ambiguity problem [Bu21]
• NewsWCL50
– Topic level with large portion of new event aspects in each article
– A mix of identity and more loose anaphoric relations
– Large coreference chains and high lexical diversity
→ lexical diversity problem
18
CDCR models need to be evaluated on a diverse set of CDCR datasets with different focus and challenges.
log scale
Zhukova et al. "Towards Evaluation of Cross-document Coreference Resolution Models Using Datasets with Diverse Annotation Schemes"
Conclusion
Conclusion
• Proposed a methodology of comparing diverse CDCR datasets
– Event-centric ECB+ with strict identity coreference relations
– Concept-centric NewsWCL50 with mixed identity and bridging coreference relations
• Bugert et al. 2021 evaluated the robustness of the CDCR models to handle multiple
event-centric datasets → a lexical disambiguation challenge
• New phrasing diversity metric (PD): represents frequency and lexical variation of the
mentions in coreference chains.
• PDECB+ < PDNewsWCL50 → a need for evaluation on a lexical diversity challenge
20
Zhukova et al. "Towards Evaluation of Cross-document Coreference Resolution Models Using Datasets with Diverse Annotation Schemes"
RT1
• CDCR models should be evaluated on a combination of lexical disambiguation and lexical
diversity challenges
→ generalizability and robustness of the CDCR models
RT2
Covered References
21
• [Bu21] Bugert, M., Reimers, N., and Gurevych, I. (2021). Generalizing cross-document event coreference
resolution across multiple corpora. Computational Linguistics, pages 1–43.
• [Ca21] Cattan, A., Eirew, A., Stanovsky, G., Joshi, M., and Dagan, I. (2021). Realistic evaluation principles for
cross-document coreference resolution. In Proceedings of *SEM 2021: The Tenth Joint Conference on Lexical
and Computational Semantics, pages 143–151, Online, August. Association for Computational Linguistics.
• [Cy14] Cybulska, A. and Vossen, P. (2014). Using a sledgehammer to crack a nut? lexical diversity and event
coreference resolution. In Proceedings of the Ninth International Conference on Language Resources and
Evaluation (LREC’14), pages 4545–4552, Reykjavik, Iceland, May. European Language Resources Association
(ELRA).
• [Ei21] Eirew, A., Cattan, A., and Dagan, I. (2021). WEC: Deriving a large-scale cross-document event
coreference dataset from Wikipedia. In Proceedings of the 2021 Conference of the North American Chapter
of the Association for Computational Linguistics: Human Language Technologies, pages 2498–2510, Online,
June. Association for Computational Linguistics.
• [Ha19] Hamborg, F., Zhukova, A., and Gipp, B. (2019). Automated identification of media bias by word choice
and labeling in news articles. In Proceedings of the ACM/IEEE Joint Conference on Digital Libraries (JCDL).
Zhukova et al. "Towards Evaluation of Cross-document Coreference Resolution Models Using Datasets with Diverse Annotation Schemes"
Thank you for your attention! Questions?
22
Anastasia Zhukova
• Data & Knowledge Engineering Group, University of Wuppertal
• Chair for Scientific Information Analytics, University of Göttingen
zhukova@gipplab.org
@ana_m_zhukova
Comparison of ECB+ and NewsWCL50:
• Parsed into same formats
• Contain scripts for the reported evaluation metrics
https://github.com/anastasia-zhukova/Diverse_CDCR_datasets
Zhukova et al. "Towards Evaluation of Cross-document Coreference Resolution Models Using Datasets with Diverse Annotation Schemes"
Backup slides
Phrasing diversity metric (PD) metric
24
On the chain level
Weighted-average on a topic level
where 𝐻𝑐 represents all unique heads of phrases of an annotated chain 𝑐,
𝑈ℎ is the number of unique phrases with ℎ,
𝑃ℎ is the number of all phrases with head ℎ,
𝑀𝑐 is a number of all mentions of an annotated chain 𝑐.
Zhukova et al. "Towards Evaluation of Cross-document Coreference Resolution Models Using Datasets with Diverse Annotation Schemes"
With PD, ECB+ and
NewsWCL50 share different
pars of the scatter plot

Más contenido relacionado

Similar a Towards Evaluation of Cross-document Coreference Resolution Models Using Datasets with Diverse Annotation Schemes

graduate_thesis (1)
graduate_thesis (1)graduate_thesis (1)
graduate_thesis (1)
Sihan Chen
 
Automated Identification of Framing by Word Choice and Labeling to Reveal Med...
Automated Identification of Framing by Word Choice and Labeling to Reveal Med...Automated Identification of Framing by Word Choice and Labeling to Reveal Med...
Automated Identification of Framing by Word Choice and Labeling to Reveal Med...
Anastasia Zhukova
 
The science behind predictive analytics a text mining perspective
The science behind predictive analytics  a text mining perspectiveThe science behind predictive analytics  a text mining perspective
The science behind predictive analytics a text mining perspective
ankurpandeyinfo
 

Similar a Towards Evaluation of Cross-document Coreference Resolution Models Using Datasets with Diverse Annotation Schemes (20)

graduate_thesis (1)
graduate_thesis (1)graduate_thesis (1)
graduate_thesis (1)
 
Co word analysis
Co word analysisCo word analysis
Co word analysis
 
Towards knowledge maintenance in scientific digital libraries with the keysto...
Towards knowledge maintenance in scientific digital libraries with the keysto...Towards knowledge maintenance in scientific digital libraries with the keysto...
Towards knowledge maintenance in scientific digital libraries with the keysto...
 
A Metadata Application Profile for KOS Vocabulary Registries (KOS-AP)
A Metadata Application Profile for KOS Vocabulary Registries (KOS-AP)A Metadata Application Profile for KOS Vocabulary Registries (KOS-AP)
A Metadata Application Profile for KOS Vocabulary Registries (KOS-AP)
 
Slides: Concurrent Inference of Topic Models and Distributed Vector Represent...
Slides: Concurrent Inference of Topic Models and Distributed Vector Represent...Slides: Concurrent Inference of Topic Models and Distributed Vector Represent...
Slides: Concurrent Inference of Topic Models and Distributed Vector Represent...
 
Token
TokenToken
Token
 
Automated Identification of Framing by Word Choice and Labeling to Reveal Med...
Automated Identification of Framing by Word Choice and Labeling to Reveal Med...Automated Identification of Framing by Word Choice and Labeling to Reveal Med...
Automated Identification of Framing by Word Choice and Labeling to Reveal Med...
 
A-Study_TopicModeling
A-Study_TopicModelingA-Study_TopicModeling
A-Study_TopicModeling
 
Knowledge Representation on the Web
Knowledge Representation on the WebKnowledge Representation on the Web
Knowledge Representation on the Web
 
Chapter 6 Query Language .pdf
Chapter 6 Query Language .pdfChapter 6 Query Language .pdf
Chapter 6 Query Language .pdf
 
TEXTS CLASSIFICATION WITH THE USAGE OF NEURAL NETWORK BASED ON THE WORD2VEC’S...
TEXTS CLASSIFICATION WITH THE USAGE OF NEURAL NETWORK BASED ON THE WORD2VEC’S...TEXTS CLASSIFICATION WITH THE USAGE OF NEURAL NETWORK BASED ON THE WORD2VEC’S...
TEXTS CLASSIFICATION WITH THE USAGE OF NEURAL NETWORK BASED ON THE WORD2VEC’S...
 
Texts Classification with the usage of Neural Network based on the Word2vec’s...
Texts Classification with the usage of Neural Network based on the Word2vec’s...Texts Classification with the usage of Neural Network based on the Word2vec’s...
Texts Classification with the usage of Neural Network based on the Word2vec’s...
 
Vsm 벡터공간모델
Vsm 벡터공간모델Vsm 벡터공간모델
Vsm 벡터공간모델
 
Vsm 벡터공간모델
Vsm 벡터공간모델Vsm 벡터공간모델
Vsm 벡터공간모델
 
overview of database concept
overview of database conceptoverview of database concept
overview of database concept
 
ENGL1040 World Literatures Today
ENGL1040 World Literatures TodayENGL1040 World Literatures Today
ENGL1040 World Literatures Today
 
Network analyses of SPSSI
Network analyses of SPSSINetwork analyses of SPSSI
Network analyses of SPSSI
 
Survey of natural language processing(midp2)
Survey of natural language processing(midp2)Survey of natural language processing(midp2)
Survey of natural language processing(midp2)
 
The science behind predictive analytics a text mining perspective
The science behind predictive analytics  a text mining perspectiveThe science behind predictive analytics  a text mining perspective
The science behind predictive analytics a text mining perspective
 
What to read next? Challenges and Preliminary Results in Selecting Represen...
What to read next? Challenges and  Preliminary Results in Selecting  Represen...What to read next? Challenges and  Preliminary Results in Selecting  Represen...
What to read next? Challenges and Preliminary Results in Selecting Represen...
 

Más de Anastasia Zhukova

M.Sc. Thesis: Automated Identification of Framing by Word Choice and Labeling...
M.Sc. Thesis: Automated Identification of Framing by Word Choice and Labeling...M.Sc. Thesis: Automated Identification of Framing by Word Choice and Labeling...
M.Sc. Thesis: Automated Identification of Framing by Word Choice and Labeling...
Anastasia Zhukova
 
Interpretable and Comparative Textual Dataset Exploration Using Near-Identity...
Interpretable and Comparative Textual Dataset Exploration Using Near-Identity...Interpretable and Comparative Textual Dataset Exploration Using Near-Identity...
Interpretable and Comparative Textual Dataset Exploration Using Near-Identity...
Anastasia Zhukova
 
Concept Identification of Directly and Indirectly Related Mentions Referring ...
Concept Identification of Directly and Indirectly Related Mentions Referring ...Concept Identification of Directly and Indirectly Related Mentions Referring ...
Concept Identification of Directly and Indirectly Related Mentions Referring ...
Anastasia Zhukova
 

Más de Anastasia Zhukova (9)

What's in the News? Towards Identification of Bias by Commission, Omission, a...
What's in the News? Towards Identification of Bias by Commission, Omission, a...What's in the News? Towards Identification of Bias by Commission, Omission, a...
What's in the News? Towards Identification of Bias by Commission, Omission, a...
 
Seminar Paper: Putting News in a Perspective: Framing by Word Choice and Labe...
Seminar Paper: Putting News in a Perspective: Framing by Word Choice and Labe...Seminar Paper: Putting News in a Perspective: Framing by Word Choice and Labe...
Seminar Paper: Putting News in a Perspective: Framing by Word Choice and Labe...
 
M.Sc. Thesis: Automated Identification of Framing by Word Choice and Labeling...
M.Sc. Thesis: Automated Identification of Framing by Word Choice and Labeling...M.Sc. Thesis: Automated Identification of Framing by Word Choice and Labeling...
M.Sc. Thesis: Automated Identification of Framing by Word Choice and Labeling...
 
Talk: Automated Identification of Media Bias by Word Choice and Labeling in N...
Talk: Automated Identification of Media Bias by Word Choice and Labeling in N...Talk: Automated Identification of Media Bias by Word Choice and Labeling in N...
Talk: Automated Identification of Media Bias by Word Choice and Labeling in N...
 
Putting News in a Perspective: Framing by Word Choice and Labeling
Putting News in a Perspective: Framing by Word Choice and LabelingPutting News in a Perspective: Framing by Word Choice and Labeling
Putting News in a Perspective: Framing by Word Choice and Labeling
 
Interpretable Topic Modeling Using Near-Identity Cross-Document Coreference R...
Interpretable Topic Modeling Using Near-Identity Cross-Document Coreference R...Interpretable Topic Modeling Using Near-Identity Cross-Document Coreference R...
Interpretable Topic Modeling Using Near-Identity Cross-Document Coreference R...
 
Interpretable and Comparative Textual Dataset Exploration Using Near-Identity...
Interpretable and Comparative Textual Dataset Exploration Using Near-Identity...Interpretable and Comparative Textual Dataset Exploration Using Near-Identity...
Interpretable and Comparative Textual Dataset Exploration Using Near-Identity...
 
Concept Identification of Directly and Indirectly Related Mentions Referring ...
Concept Identification of Directly and Indirectly Related Mentions Referring ...Concept Identification of Directly and Indirectly Related Mentions Referring ...
Concept Identification of Directly and Indirectly Related Mentions Referring ...
 
ANEA: Automated (Named) Entity Annotation for German Domain-Specific Texts
ANEA: Automated (Named) Entity Annotation for German Domain-Specific TextsANEA: Automated (Named) Entity Annotation for German Domain-Specific Texts
ANEA: Automated (Named) Entity Annotation for German Domain-Specific Texts
 

Último

(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
Scintica Instrumentation
 
THE ROLE OF BIOTECHNOLOGY IN THE ECONOMIC UPLIFT.pptx
THE ROLE OF BIOTECHNOLOGY IN THE ECONOMIC UPLIFT.pptxTHE ROLE OF BIOTECHNOLOGY IN THE ECONOMIC UPLIFT.pptx
THE ROLE OF BIOTECHNOLOGY IN THE ECONOMIC UPLIFT.pptx
ANSARKHAN96
 
Human genetics..........................pptx
Human genetics..........................pptxHuman genetics..........................pptx
Human genetics..........................pptx
Silpa
 
Reboulia: features, anatomy, morphology etc.
Reboulia: features, anatomy, morphology etc.Reboulia: features, anatomy, morphology etc.
Reboulia: features, anatomy, morphology etc.
Silpa
 
LUNULARIA -features, morphology, anatomy ,reproduction etc.
LUNULARIA -features, morphology, anatomy ,reproduction etc.LUNULARIA -features, morphology, anatomy ,reproduction etc.
LUNULARIA -features, morphology, anatomy ,reproduction etc.
Silpa
 
Module for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learningModule for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learning
levieagacer
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
?#DUbAI#??##{{(☎️+971_581248768%)**%*]'#abortion pills for sale in dubai@
 

Último (20)

Zoology 5th semester notes( Sumit_yadav).pdf
Zoology 5th semester notes( Sumit_yadav).pdfZoology 5th semester notes( Sumit_yadav).pdf
Zoology 5th semester notes( Sumit_yadav).pdf
 
GBSN - Microbiology (Unit 3)Defense Mechanism of the body
GBSN - Microbiology (Unit 3)Defense Mechanism of the body GBSN - Microbiology (Unit 3)Defense Mechanism of the body
GBSN - Microbiology (Unit 3)Defense Mechanism of the body
 
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
(May 9, 2024) Enhanced Ultrafast Vector Flow Imaging (VFI) Using Multi-Angle ...
 
THE ROLE OF BIOTECHNOLOGY IN THE ECONOMIC UPLIFT.pptx
THE ROLE OF BIOTECHNOLOGY IN THE ECONOMIC UPLIFT.pptxTHE ROLE OF BIOTECHNOLOGY IN THE ECONOMIC UPLIFT.pptx
THE ROLE OF BIOTECHNOLOGY IN THE ECONOMIC UPLIFT.pptx
 
FAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and SpectrometryFAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
FAIRSpectra - Enabling the FAIRification of Spectroscopy and Spectrometry
 
Grade 7 - Lesson 1 - Microscope and Its Functions
Grade 7 - Lesson 1 - Microscope and Its FunctionsGrade 7 - Lesson 1 - Microscope and Its Functions
Grade 7 - Lesson 1 - Microscope and Its Functions
 
Factory Acceptance Test( FAT).pptx .
Factory Acceptance Test( FAT).pptx       .Factory Acceptance Test( FAT).pptx       .
Factory Acceptance Test( FAT).pptx .
 
Molecular markers- RFLP, RAPD, AFLP, SNP etc.
Molecular markers- RFLP, RAPD, AFLP, SNP etc.Molecular markers- RFLP, RAPD, AFLP, SNP etc.
Molecular markers- RFLP, RAPD, AFLP, SNP etc.
 
Human genetics..........................pptx
Human genetics..........................pptxHuman genetics..........................pptx
Human genetics..........................pptx
 
Call Girls Ahmedabad +917728919243 call me Independent Escort Service
Call Girls Ahmedabad +917728919243 call me Independent Escort ServiceCall Girls Ahmedabad +917728919243 call me Independent Escort Service
Call Girls Ahmedabad +917728919243 call me Independent Escort Service
 
Reboulia: features, anatomy, morphology etc.
Reboulia: features, anatomy, morphology etc.Reboulia: features, anatomy, morphology etc.
Reboulia: features, anatomy, morphology etc.
 
LUNULARIA -features, morphology, anatomy ,reproduction etc.
LUNULARIA -features, morphology, anatomy ,reproduction etc.LUNULARIA -features, morphology, anatomy ,reproduction etc.
LUNULARIA -features, morphology, anatomy ,reproduction etc.
 
PATNA CALL GIRLS 8617370543 LOW PRICE ESCORT SERVICE
PATNA CALL GIRLS 8617370543 LOW PRICE ESCORT SERVICEPATNA CALL GIRLS 8617370543 LOW PRICE ESCORT SERVICE
PATNA CALL GIRLS 8617370543 LOW PRICE ESCORT SERVICE
 
Role of AI in seed science Predictive modelling and Beyond.pptx
Role of AI in seed science  Predictive modelling and  Beyond.pptxRole of AI in seed science  Predictive modelling and  Beyond.pptx
Role of AI in seed science Predictive modelling and Beyond.pptx
 
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
Human & Veterinary Respiratory Physilogy_DR.E.Muralinath_Associate Professor....
 
Module for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learningModule for Grade 9 for Asynchronous/Distance learning
Module for Grade 9 for Asynchronous/Distance learning
 
Use of mutants in understanding seedling development.pptx
Use of mutants in understanding seedling development.pptxUse of mutants in understanding seedling development.pptx
Use of mutants in understanding seedling development.pptx
 
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
+971581248768>> SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHA...
 
Gwalior ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Gwalior ESCORT SERVICE❤CALL GIRL
Gwalior ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Gwalior ESCORT SERVICE❤CALL GIRLGwalior ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Gwalior ESCORT SERVICE❤CALL GIRL
Gwalior ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Gwalior ESCORT SERVICE❤CALL GIRL
 
FAIRSpectra - Enabling the FAIRification of Analytical Science
FAIRSpectra - Enabling the FAIRification of Analytical ScienceFAIRSpectra - Enabling the FAIRification of Analytical Science
FAIRSpectra - Enabling the FAIRification of Analytical Science
 

Towards Evaluation of Cross-document Coreference Resolution Models Using Datasets with Diverse Annotation Schemes

  • 1. Towards Evaluation of Cross-document Coreference Resolution Models Using Datasets with Diverse Annotation Schemes Anastasia Zhukova1, Felix Hamborg2, Bela Gipp1,3 1University of Wuppertal, Germany 2University of Konstanz, Germany 3University of Göttingen, Germany LREC’22, 23 June 2022, Marseille
  • 2. Motivation: annotation of cross-document coreference resolution datasets Source 1: The Australian Source 2: The Economist 3 Zhukova et al. "Towards Evaluation of Cross-document Coreference Resolution Models Using Datasets with Diverse Annotation Schemes"
  • 3. Motivation: annotation of cross-document coreference resolution datasets 4 Source 1: The Australian Source 2: The Economist Zhukova et al. "Towards Evaluation of Cross-document Coreference Resolution Models Using Datasets with Diverse Annotation Schemes"
  • 4. Research Question and Tasks RQ: How to compare CDCR datasets and which aspects to consider in the analysis? Research tasks • RT1: Compare and discuss two perspectives of annotating coreference chains and relations: – Event-centric chains with strict identity relations (ECB+ [Cy14]) – Concept-centric chains with loose coreference relations, i.e., combination of identity and bridging relations (NewsWCL50 [Ha19]) • RT2: Propose further directions of evaluation of the CDCR models 5 Zhukova et al. "Towards Evaluation of Cross-document Coreference Resolution Models Using Datasets with Diverse Annotation Schemes"
  • 5. Related work • CDCR datasets have two triggers for coreference chains – Event-centric (occurrence of an event matters) – Concept-centric (frequency of an event/entity matters) • Typically separated research – Strict identity relations – Near-identity relations – Bridging relations • Related work focuses on comparing (CD)CR datasets of same nature – For example, event-centric ECB+, FCC, GVC [Bu21] • Very scarce comparison of the CDCR datasets of the different nature – We compare diverse ECB+ and NewsWCL50 Strict identity relations Near-identity relations Bridging relations Datasets ECB+ TAK KBP ACE MEANTIME ISNotes BASHI ARRAU DEFT NiDENT RED NewsWCL50 FCC GVC WEC HyperCoref NP4E GUM 6 Zhukova et al. "Towards Evaluation of Cross-document Coreference Resolution Models Using Datasets with Diverse Annotation Schemes"
  • 7. Dataset structure • ECB+ – Topic, e.g., earthquake – Subtopic, i.e., a narrowly-related event described by an action, participant(s), location, and time – Article, i.e., clearly represents a subtopic Articles Large content overlap Subtopic 1 Subtopic 2 Topic Topic Articles Smaller content overlap, contain discussions of various aspects of the event 8 • NewsWCL50 – Topic, i.e., articles reporting on the same event – Article, i.e., discusses at least partially different aspects of the reported event Zhukova et al. "Towards Evaluation of Cross-document Coreference Resolution Models Using Datasets with Diverse Annotation Schemes"
  • 8. Annotation: Mentions & Coreference chains 9 Dataset Which mentions How and which mentions to annotate When to link into the coreference chains Problems ECB+ action-driven mentions o action o participant o location o datetime • minimum-span style • NPs and VPs • pronouns • entities: if they refer to the same entity • events: same attributes (action + participant + location + datetime) • unannotated entity mentions if not action attributes • many event-centric annotated mentions become singleton coreference chains NewsWCL50 frequency-driven mentions o action/event o actor o object o GPE o abstract entities o (no location) o (no datetime) • maximum-span style • NPs and VPs • (no pronouns) • minimum 5 occurrences of an event/entity linked with diverse relations across the related articles • some entities are too abstract and need refinement • not annotated entities: location, datetime, and pronoun mentions as attributes of an action Zhukova et al. "Towards Evaluation of Cross-document Coreference Resolution Models Using Datasets with Diverse Annotation Schemes"
  • 9. Relations ECB+: strict identity • Actor, location, datetime: refer to the same entity “the U.S.” – “America” • Event components: same action, participant, location, datetime “Lindsay Lohan checked into rehab.” “Ms. Lohan entered a rehab facility.” 10 NewsWCL50: a mix of identity and bridging relations (1) identity relations including subject-predicate coreference “Donald Trump” – “the newly-minted president” (2) synonym relations “a caravan of immigrants” – “a few hundred asylum seekers” (3) metonymy “the Russian government” – “the Kremlin” (4) meronymy/holonymy “the U.S.-Mexico border” – “the San Ysidro port of entry” (5) mentions are linked with copular verbs, e.g., “be,” “seem,” or “call” “Trump called Kim Jong Un a Little Rocket Man” → “Kim Jong Un” – “Little Rocket Man” (6) mentions meet GPE definitions as of ACE annotation “the U.S.” – “American people” (7) mentions are elements of one set “guaranteed to disrupt trade” – “drove down steel prices” as members of a set “Consequences of tariff imposition” Zhukova et al. "Towards Evaluation of Cross-document Coreference Resolution Models Using Datasets with Diverse Annotation Schemes"
  • 10. Reannotation experiment 11 Head ECB+ (original) NewsWCL50 (following the coding book) Jeffs Jeffs x15 Warren Jeffs x1 polygamist sect leader Warren Jeffs x1 prophet Warren Jeffs x2 leader Warren Jeffs x2 polygamist Warren Jeffs x1 polygamist leader Warren Jeffs x3 Warren Jeffs, Polygamist Leader x1 Jeffs x66 Warren Jeffs x4 polygamist sect leader Warren Jeffs x2 prophet Warren Jeffs x1 FLDS prophet Warren Jeffs x1 Mr. Jeffs x2 the 54-year-old Jeffs x1 Jeffs, who acted as his own attorney x1 Jeffs, who was indicted more than two years ago x1 Warren Jeffs, leader of the Fundamentalist Church of Jesus Christ… x1 Warren Jeffs, polygamist leader x1 misc he, his, him, who x22 polygamist x2 attorney x2 pedophile x1 a victim of religious persecution x1 the highest-profile defendant x1 the father of a 15-year-old FLSD member's child x1 God's spokesman on earth x1 a handful from day one x1 one individual, Warren Steed Jeffs x1 an accomplice to sexual conduct with a minor x1 Zhukova et al. "Towards Evaluation of Cross-document Coreference Resolution Models Using Datasets with Diverse Annotation Schemes"
  • 12. Lexical diversity: previous metrics • F1CoNLL of a simple same mentions’ head- lemmas baseline [Cy14] – Evaluate combinations and links between annotated (key) and resolved (response) mentions – Average of F1: B3, MUC, CEAFe – High F1CoNLL = low lexical diversity – F1CoNLL is inflated with singletons [Ca21] 15 • Unique number of mentions’ head- lemmas [Ei21] – A quite discrete measure, no detailed comparison log scale Figure source Zhukova et al. "Towards Evaluation of Cross-document Coreference Resolution Models Using Datasets with Diverse Annotation Schemes"
  • 13. Lexical diversity with a new Phrasing Diversity metric (PD) Phrasing diversity (PD) measures lexical variation of coreference chains using the mention frequency and phrasing variation. → yields more detailed degrees the lexical diversity for the same coref.chains 16 log scale log scale With PD, ECB+ and NewsWCL50 share different parts of the scatter plot With unique head lemmas, ECB+ and NewsWCL50 seem quite similar Zhukova et al. "Towards Evaluation of Cross-document Coreference Resolution Models Using Datasets with Diverse Annotation Schemes"
  • 14. Numeric comparison datasets topics subtopics articles tokens mentions event mentions entity mentions chains singletons inter-coder reliability (ICR) ECB+ 43 86 962 377 367 12 004 4 013 7 991 4 759 3 445 0.76 NewsWCL50 10 - 50 49 591 5600 1225 4375 170 10 0.65 17 Intercoder agreement • ECB+: κ = 0.74 event/entity mention annotation, κ = 0.76 for chain annotation • NewsWCL50: 𝐴𝑂𝐴 = 0.65 (average observed agreement, less restrictive than κ) → more loosely related anaphora may result in lower ICR with singletons without singletons datasets average chain size F1CoNLL average unique head lemmas phrasing diversity (PD) PD average average chain size F1CoNLL average unique head lemmas phrasing diversity (PD) PD average ECB+ 2.5 72.9 1.4 1.3 1.1 6.4 64.4 2.4 1.4 1.3 NewsWCL50 32.9 46.8 9.9 9.7 6.7 35 46.5 10.5 9.7 7.0 Coref.chain size distribution Density of mention annotations per article • ECB+: 13 per article • NewsWCL50: 112 per article → exhausting annotation of NewsWCL50 chain size Zhukova et al. "Towards Evaluation of Cross-document Coreference Resolution Models Using Datasets with Diverse Annotation Schemes"
  • 15. Discussion • The proposed PD metric measures lexical diversity in higher level of details compared to the unique head lemmas. • ECB+ – Subtopic level of events with similar wording – Only strict identity relation in coref.chains – Smaller coreference chains and lower lexical diversity → lexical ambiguity problem [Bu21] • NewsWCL50 – Topic level with large portion of new event aspects in each article – A mix of identity and more loose anaphoric relations – Large coreference chains and high lexical diversity → lexical diversity problem 18 CDCR models need to be evaluated on a diverse set of CDCR datasets with different focus and challenges. log scale Zhukova et al. "Towards Evaluation of Cross-document Coreference Resolution Models Using Datasets with Diverse Annotation Schemes"
  • 17. Conclusion • Proposed a methodology of comparing diverse CDCR datasets – Event-centric ECB+ with strict identity coreference relations – Concept-centric NewsWCL50 with mixed identity and bridging coreference relations • Bugert et al. 2021 evaluated the robustness of the CDCR models to handle multiple event-centric datasets → a lexical disambiguation challenge • New phrasing diversity metric (PD): represents frequency and lexical variation of the mentions in coreference chains. • PDECB+ < PDNewsWCL50 → a need for evaluation on a lexical diversity challenge 20 Zhukova et al. "Towards Evaluation of Cross-document Coreference Resolution Models Using Datasets with Diverse Annotation Schemes" RT1 • CDCR models should be evaluated on a combination of lexical disambiguation and lexical diversity challenges → generalizability and robustness of the CDCR models RT2
  • 18. Covered References 21 • [Bu21] Bugert, M., Reimers, N., and Gurevych, I. (2021). Generalizing cross-document event coreference resolution across multiple corpora. Computational Linguistics, pages 1–43. • [Ca21] Cattan, A., Eirew, A., Stanovsky, G., Joshi, M., and Dagan, I. (2021). Realistic evaluation principles for cross-document coreference resolution. In Proceedings of *SEM 2021: The Tenth Joint Conference on Lexical and Computational Semantics, pages 143–151, Online, August. Association for Computational Linguistics. • [Cy14] Cybulska, A. and Vossen, P. (2014). Using a sledgehammer to crack a nut? lexical diversity and event coreference resolution. In Proceedings of the Ninth International Conference on Language Resources and Evaluation (LREC’14), pages 4545–4552, Reykjavik, Iceland, May. European Language Resources Association (ELRA). • [Ei21] Eirew, A., Cattan, A., and Dagan, I. (2021). WEC: Deriving a large-scale cross-document event coreference dataset from Wikipedia. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 2498–2510, Online, June. Association for Computational Linguistics. • [Ha19] Hamborg, F., Zhukova, A., and Gipp, B. (2019). Automated identification of media bias by word choice and labeling in news articles. In Proceedings of the ACM/IEEE Joint Conference on Digital Libraries (JCDL). Zhukova et al. "Towards Evaluation of Cross-document Coreference Resolution Models Using Datasets with Diverse Annotation Schemes"
  • 19. Thank you for your attention! Questions? 22 Anastasia Zhukova • Data & Knowledge Engineering Group, University of Wuppertal • Chair for Scientific Information Analytics, University of Göttingen zhukova@gipplab.org @ana_m_zhukova Comparison of ECB+ and NewsWCL50: • Parsed into same formats • Contain scripts for the reported evaluation metrics https://github.com/anastasia-zhukova/Diverse_CDCR_datasets Zhukova et al. "Towards Evaluation of Cross-document Coreference Resolution Models Using Datasets with Diverse Annotation Schemes"
  • 21. Phrasing diversity metric (PD) metric 24 On the chain level Weighted-average on a topic level where 𝐻𝑐 represents all unique heads of phrases of an annotated chain 𝑐, 𝑈ℎ is the number of unique phrases with ℎ, 𝑃ℎ is the number of all phrases with head ℎ, 𝑀𝑐 is a number of all mentions of an annotated chain 𝑐. Zhukova et al. "Towards Evaluation of Cross-document Coreference Resolution Models Using Datasets with Diverse Annotation Schemes" With PD, ECB+ and NewsWCL50 share different pars of the scatter plot