XCoref: Cross-document Coreference Resolution in the Wild
iConference 2022, Feb. 28 - Mar. 4, 2022, Virtual
Anastasia Zhukova¹, Felix Hamborg²,⁴, Karsten Donnay³,⁴, and Bela Gipp¹,⁴
¹ University of Wuppertal, Germany
² University of Konstanz, Germany
³ University of Zurich, Switzerland
⁴ Heidelberg Academy of Sciences and Humanities, Germany
Agenda
• Motivation: media bias by word choice and labeling
• Research objective
• Methodology: XCoref
• Evaluation
• Results & Discussion
• Conclusion
Zhukova et al. “XCoref: Cross-document Coreference Resolution in the Wild”
Motivation: Media bias by word choice and labeling
https://www.theaustralian.com.au/world/alexander-lukashenko-locked-and-loaded-to-fight-belarus-rats/news-story/99175bd3cfc13f71e2176310cee98288
https://www.economist.com/europe/2020/06/20/waving-slippers-at-the-cockroach-president-of-belarus
Motivation
• State-of-the-art cross-document coreference resolution (CDCR) resolves mentions of the same entities/events with strict identity
Alexander Lukashenko = Belorussian President Alexander Lukashenko
• Problem: CDCR ignores loosely related coreferential mentions
Vladimir Putin about Alexey Navalny: instead of using Alexey’s full name, he always resorts to name-calling, e.g., “Russian Saakashvili,” “the famous blogger,” “the criminal of a famous case.”
• “Wild” political news: substantial lexical variance + mixed coreference relations
→ bias by word choice and labeling (WCL)
• WCL influences the perception of entities/events, e.g., by associations
Alexander Lukashenko = cockroach = idiosyncratic autocrat
• CDCR completely overlooks lexical diversity in mentions that are not named entities
Tens of thousands demonstrators = the marchers = Belarus “rats”
War in Ukraine = special military operation
Research objectives
• Revisit the target concept analysis (TCA) approach, which finds cases of bias by WCL…
• …and propose XCoref, an unsupervised sieve-based method that jointly resolves mentions with strict and loose identity relations into coreferential chains
• Evaluate XCoref, TCA, and a state-of-the-art CDCR model with the standard CoNLL metrics used in (CD)CR
Related work
• Research on the three relation types barely overlaps
– Strict identity relations (i.e., CR)
– Near-identity relations
– Bridging relations
• Coreferential mentions biased by WCL are linked by a mix of all relations (e.g., NewsWCL50)
• State-of-the-art CDCR models: trained on strict identity relations, e.g., [Ba19]
• TCA [Ha19]: unsupervised method
– resolves mentions with diverse relations
– evaluated on NewsWCL50
– focuses only on general lexical features of all concepts
• Goal: Revisit TCA and employ specific lexical features per entity/event type
[Figure: datasets and methods arranged by relation type (strict identity, near-identity, and bridging relations). Datasets shown: ECB+, TAC KBP, ACE, OntoNotes, MEANTIME, ISNotes, BASHI, ARRAU, DEFT, NiDENT, RED, NewsWCL50. Methods shown: EeCDCR [Ba19], TCA [Ha19].]
[Ha19] Hamborg, F., Zhukova, A., & Gipp, B. (2019, June). Automated Identification of Media Bias by Word Choice and Labeling in News Articles. In 2019 ACM/IEEE Joint Conference on Digital Libraries (JCDL) (pp. 196-205). IEEE Computer Society.
[Ba19] Barhom, S., Shwartz, V., Eirew, A., Bugert, M., Reimers, N., & Dagan, I. (2019, July). Revisiting Joint Modeling of Cross-document Entity and Event Coreference Resolution. In Proceedings of the 57th Annual Meeting of the Association for Computational
Linguistics (pp. 4179-4189).
Methodology: XCoref
• Hypothesis: Multiple (sub-)methods are required to address different concept types, e.g.,
– Concepts resolved mainly through strict identity relations, i.e., NE persons, organizations, countries
– Concepts consisting mainly of mentions with bridging/near-identity relations, e.g., groups of persons
• XCoref resolves mentions of concept types in order of increasing semantic complexity
Methodology: XCoref
• Step 0: Coreference resolution of named entities (NEs)
• Step 1: Named entity linking (NEL)
• Step 2: NE head-word and compound match
• Step 3: Resolution of non-NE mentions
• Step 4: Groups of persons
• Step 5: Events and abstract entities
[Figure: Steps 0-3 resolve named entities (NEs). In the example, the mentions “Donald Trump,” “Trump,” and “Donald” are linked to /wiki/Donald_Trump and merged, together with “undecisive president of the US,” into one chain of resolved mentions. The mentions “illegal aliens – undocumented immigrants,” “Trump administration officials – American government,” and “Trump-Kim meeting – discussed an issue” are shown as unresolved after these steps.]
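The sieve architecture listed above can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: the `Mention` type, `run_sieves`, and `head_match_sieve` are hypothetical names, heads are assumed to be precomputed, and the single sieve shown is a simplified stand-in for Step 2 (it merges a chain into the first earlier chain that shares a named-entity head word).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mention:
    text: str
    head: str        # head word of the phrase (assumed to be precomputed)
    is_named: bool   # True if the mention is a named entity


def head_match_sieve(chains):
    """Hypothetical stand-in for Step 2: merge chains sharing an NE head word."""
    merged = []
    for chain in chains:
        heads = {m.head.lower() for m in chain if m.is_named}
        # Find the first already-kept chain with an overlapping NE head.
        target = next(
            (c for c in merged
             if heads & {m.head.lower() for m in c if m.is_named}),
            None,
        )
        if target is not None:
            target.extend(chain)
        else:
            merged.append(list(chain))
    return merged


def run_sieves(mentions, sieves):
    """Start from singleton chains and apply each sieve in order."""
    chains = [[m] for m in mentions]
    for sieve in sieves:
        chains = sieve(chains)
    return chains


mentions = [
    Mention("Donald Trump", "Trump", True),
    Mention("Trump", "Trump", True),
    Mention("illegal aliens", "aliens", False),
]
chains = run_sieves(mentions, [head_match_sieve])
```

In this toy run the two named mentions end up in one chain, while the non-NE mention stays a singleton that a later sieve would have to handle.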
Step 4: identification of groups of persons
• We integrate the approach proposed by Zhukova et al. [Zh21]
• OPTICS principle [An99]
– The approach finds cluster centers of the most semantically similar mentions
– The clusters are expanded with less semantically similar mentions
• We improve the clustering approach with two additional sub-steps
– Merge intermediate clusters, i.e., merge clusters if similar
– Move alien points, i.e., relocate mentions to the better-matching clusters
[Figure: illustration of the two sub-steps, a) merge intermediate clusters and b) move alien points.]
[Zh21] Zhukova, A., Hamborg, F., Donnay, K., & Gipp, B. (2021). Concept Identification of Directly and Indirectly Related Mentions Referring to Groups of Persons. In iConference (1) (pp. 514-526).
[An99] Ankerst, M., Breunig, M. M., Kriegel, H. P., & Sander, J. (1999). OPTICS: Ordering points to identify the clustering structure. ACM Sigmod record, 28(2), 49-60.
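The two additional sub-steps of Step 4 can be illustrated with a small sketch. The slides do not state the similarity criterion, so everything concrete here is an assumption made for illustration: clusters are compared by the cosine similarity of their centroids, `threshold` is a hypothetical parameter, and mentions are represented by toy 2-D vectors. This is not the OPTICS-based procedure of [Zh21] itself.

```python
import math


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def centroid(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def merge_similar_clusters(clusters, threshold):
    """a) Merge clusters whose centroids are at least `threshold` similar."""
    clusters = [list(c) for c in clusters]
    changed = True
    while changed:
        changed = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                if cosine(centroid(clusters[i]), centroid(clusters[j])) >= threshold:
                    clusters[i].extend(clusters.pop(j))
                    changed = True
                    break
            if changed:
                break
    return clusters


def move_alien_points(clusters):
    """b) Relocate each point to the cluster with the most similar centroid."""
    centroids = [centroid(c) for c in clusters]
    relocated = [[] for _ in clusters]
    for cluster in clusters:
        for point in cluster:
            best = max(range(len(centroids)),
                       key=lambda k: cosine(point, centroids[k]))
            relocated[best].append(point)
    return [c for c in relocated if c]


# Toy data: the last point of the third cluster is an "alien" there.
clusters = [
    [[1.0, 0.0], [0.9, 0.1]],
    [[0.95, 0.05]],
    [[0.0, 1.0], [0.1, 0.9], [1.0, 0.05]],
]
merged = merge_similar_clusters(clusters, threshold=0.95)
moved = move_alien_points(merged)
```

In the toy data, the first two clusters are merged, and the point `[1.0, 0.05]` is then relocated from the third cluster to the merged one.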
Step 5: events and abstract entities
• Step 5 resolves mention chains of events and abstract entities
– actions, objects, events
– consist of noun phrases (NPs) and verb phrases (VPs)
• Weights the head-lemma word vectors of each phrase
➢ highlights the core meaning of the phrase head
➢ keeps the context
• Hierarchical clustering
– cosine distance
– average linkage
[Figure: three stages: 1) lemmatization and heads of phrases, 2) weighted word vectors, 3) hierarchical clustering. Example: “Trump-Kim meeting” → Trump, Kim, meeting; “discussed an issue” → discussed, issue. Figure is based on the figure from: https://towardsdatascience.com/hierarchical-clustering-explained-e59b13846da8]
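A minimal sketch of the Step 5 idea: build one vector per phrase with extra weight on the head lemma, then cluster with average linkage over cosine distance. The slides do not give the weighting scheme or a stopping criterion, so `head_weight` and `max_distance` are hypothetical parameters, and the 2-D word vectors are made-up toy values rather than real embeddings.

```python
import math


def cosine_distance(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm if norm else 1.0


def phrase_vector(lemma_vectors, head, head_weight=2.0):
    """Weighted mean of the lemma vectors; the head lemma counts more."""
    dim = len(next(iter(lemma_vectors.values())))
    total, weight_sum = [0.0] * dim, 0.0
    for lemma, vec in lemma_vectors.items():
        w = head_weight if lemma == head else 1.0
        total = [t + w * x for t, x in zip(total, vec)]
        weight_sum += w
    return [t / weight_sum for t in total]


def average_linkage(vectors, max_distance):
    """Agglomerative clustering; returns clusters as lists of vector indices."""
    clusters = [[i] for i in range(len(vectors))]

    def linkage(a, b):
        # Average pairwise cosine distance between two clusters.
        return sum(cosine_distance(vectors[i], vectors[j])
                   for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > 1:
        d, i, j = min(
            (linkage(clusters[i], clusters[j]), i, j)
            for i in range(len(clusters))
            for j in range(i + 1, len(clusters))
        )
        if d > max_distance:
            break
        clusters[i].extend(clusters.pop(j))
    return clusters


# Toy word vectors (assumed values, for illustration only).
toy = {"trump": [0.0, 1.0], "kim": [0.0, 1.0], "meeting": [1.0, 0.0]}
weighted = phrase_vector(toy, head="meeting", head_weight=2.0)
unweighted = phrase_vector(toy, head="meeting", head_weight=1.0)

vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
clusters = average_linkage(vectors, max_distance=0.2)
```

Raising `head_weight` pulls the phrase vector toward the head lemma ("meeting") while the remaining lemmas still contribute context, which mirrors the two effects named on the slide.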
Evaluation
• Methodology: established metrics for coreference resolution
– Evaluate combinations and links between annotated (key) and resolved (response) mentions
• Metrics: CoNLL for (CD)CR
– MUC
– B3
– CEAFe
– F1_conll: average of the F1 scores of all three metrics
• These metrics were previously not applied to looser coreference relations
• Dataset: NewsWCL50
– Contains coreference chains with a mix of relations, i.e., strict identity, near-identity, bridging
Figure source: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5667668/
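To make the key/response comparison concrete, here is a minimal sketch of one of the three metrics, B3. It uses the standard mention-level definition (per-mention overlap between the key chain and the response chain) and assumes, as a simplification, that key and response contain exactly the same mentions. The function name `b_cubed` and the toy chains are illustrative only and do not come from the evaluation in the slides.

```python
def b_cubed(key_chains, response_chains):
    """Return (precision, recall, F1) of B3 for two partitions of mentions."""
    key_of = {m: set(chain) for chain in key_chains for m in chain}
    response_of = {m: set(chain) for chain in response_chains for m in chain}
    mentions = key_of.keys() & response_of.keys()

    # Per-mention overlap between its annotated and its resolved chain.
    precision = sum(
        len(key_of[m] & response_of[m]) / len(response_of[m]) for m in mentions
    ) / len(mentions)
    recall = sum(
        len(key_of[m] & response_of[m]) / len(key_of[m]) for m in mentions
    ) / len(mentions)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


key = [
    ["Alexander Lukashenko", "cockroach", "idiosyncratic autocrat"],
    ["the marchers"],
]
response = [
    ["Alexander Lukashenko", "idiosyncratic autocrat"],
    ["cockroach", "the marchers"],
]
p, r, f1 = b_cubed(key, response)
```

In this toy case the response misplaces one loosely related mention ("cockroach"), which lowers both precision and recall.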
Baselines
• Same-lemma
• Target Concept Analysis (TCA), Hamborg et al. (2019)
– Original version
– With improved preprocessing
– Three types of word vectors: word2vec, fastText, GloVe
• Event-entity cross-document coreference resolution (EeCDCR), Barhom et al. (2019)
– With AllenNLP’s semantic role labeling (SRL)
• Ablation study for XCoref
– Intermediate XCoref_interm with baseline methods for Steps 4 & 5
• Step 4: hierarchical clustering [Zh21]
• Step 5: TCA’s step 2 [Ha19]
– Three types of word vectors: word2vec, fastText, GloVe
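The slides do not spell out the same-lemma baseline. A common formulation, assumed here, groups mentions whose head lemma is identical. In the sketch below, `same_lemma_baseline` is a hypothetical name and the head lemmas are supplied by hand rather than produced by a parser.

```python
from collections import defaultdict


def same_lemma_baseline(mentions):
    """Group (mention_text, head_lemma) pairs by identical head lemma."""
    chains = defaultdict(list)
    for text, head_lemma in mentions:
        chains[head_lemma.lower()].append(text)
    return list(chains.values())


mentions = [
    ("the summit meeting", "meeting"),
    ("a potential meeting of the two leaders", "meeting"),
    ("the asylum-seekers", "asylum-seeker"),
    ("refugees", "refugee"),
]
chains = same_lemma_baseline(mentions)
```

The toy input shows the limitation that motivates the work: "the asylum-seekers" and "refugees" stay in separate chains because their head lemmas differ.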
Results: General
Method | Word vectors | MUC R | MUC P | MUC F1 | B3 R | B3 P | B3 F1 | CEAF_e R | CEAF_e P | CEAF_e F1 | F1_conll
Lemma | --- | 75.7 | 93.8 | 83.8 | 36.8 | 88.8 | 52.0 | 63.1 | 7.8 | 13.9 | 49.9
EeCDCR | GloVe | 69.4 | 90.6 | 78.6 | 33.1 | 82.3 | 47.2 | 58.6 | 7.8 | 13.7 | 46.5
TCA | word2vec | 73.4 | 89.4 | 80.6 | 37.2 | 73.7 | 49.4 | 51.9 | 8.7 | 14.9 | 48.3
TCA preproc | word2vec | 72.9 | 89.5 | 80.3 | 38.4 | 75.6 | 50.9 | 54.5 | 9.1 | 15.6 | 48.9
TCA preproc | fastText | 72.9 | 87.6 | 79.6 | 37.3 | 71.5 | 49.0 | 52.0 | 9.5 | 16.0 | 48.2
TCA preproc | GloVe | 77.2 | 88.3 | 82.4 | 41.8 | 67.0 | 51.4 | 52.9 | 12.1 | 19.6 | 51.2
XCoref interm | word2vec | 68.4 | 90.3 | 77.8 | 37.7 | 84.0 | 52.0 | 63.0 | 8.4 | 14.8 | 48.2
XCoref interm | fastText | 74.2 | 87.3 | 80.2 | 38.7 | 71.5 | 50.2 | 58.4 | 11.6 | 19.4 | 50.0
XCoref interm | GloVe | 75.7 | 88.5 | 81.6 | 40.6 | 72.1 | 52.0 | 58.9 | 12.1 | 20.0 | 51.2
XCoref | word2vec | 70.7 | 89.8 | 79.1 | 36.3 | 82.4 | 50.4 | 63.0 | 9.4 | 16.3 | 48.6
XCoref | fastText | 78.6 | 90.0 | 83.9 | 43.1 | 70.5 | 53.5 | 60.4 | 13.7 | 22.4 | 53.3
XCoref | GloVe | 79.3 | 90.8 | 84.7 | 44.4 | 72.2 | 55.0 | 61.1 | 13.9 | 22.6 | 54.1
• XCoref outperforms all baselines on the NewsWCL50 dataset
+2.9 pp F1_conll over the best baseline (54.1 vs. 51.2)
• The state-of-the-art EeCDCR performs worst
• GloVe yields the best F1_conll for every method evaluated with several word vectors
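As a quick worked check of how F1_conll relates to the other columns, the sketch below averages the three F1 values reported for a few rows of the results table. The helper `conll_f1` is a hypothetical name. Because the inputs are already rounded to one decimal, a recomputed average can deviate from a reported value by about 0.1.

```python
def conll_f1(muc_f1, b3_f1, ceafe_f1):
    """F1_conll is the unweighted mean of the three F1 scores."""
    return (muc_f1 + b3_f1 + ceafe_f1) / 3


# (MUC F1, B3 F1, CEAF_e F1) as reported in the results table.
reported = {
    "Lemma": (83.8, 52.0, 13.9),
    "EeCDCR (GloVe)": (78.6, 47.2, 13.7),
    "XCoref interm (GloVe)": (81.6, 52.0, 20.0),
    "XCoref (GloVe)": (84.7, 55.0, 22.6),
}
recomputed = {name: round(conll_f1(*f1s), 1) for name, f1s in reported.items()}

# Gain of XCoref over the best baseline, using the reported F1_conll values.
gain_pp = round(54.1 - 51.2, 1)
```

For these rows the recomputed averages match the reported F1_conll column, and the difference between 54.1 and 51.2 gives the 2.9 pp gain stated on the slide.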
Results: Steps 4 & 5
Sieves | MUC R | MUC P | MUC F1 | B3 R | B3 P | B3 F1 | CEAF_e R | CEAF_e P | CEAF_e F1 | F1_conll
init shared | 28.6 | 89.8 | 43.4 | 15.6 | 96.0 | 26.8 | 41.4 | 2.1 | 4.1 | 24.7
S1 shared | 40.2 | 92.3 | 56.0 | 22.9 | 95.0 | 36.9 | 49.7 | 3.1 | 5.8 | 32.9
S2 shared | 42.2 | 92.5 | 58.0 | 24.6 | 94.6 | 39.1 | 51.2 | 3.3 | 6.2 | 34.4
S3 shared | 45.7 | 91.0 | 60.8 | 27.3 | 91.3 | 42.1 | 51.6 | 3.6 | 6.7 | 36.5
S4 interm | 52.7 | 90.2 | 66.6 | 29.1 | 87.5 | 43.6 | 51.5 | 4.2 | 7.8 | 39.3
S5 interm | 75.7 | 88.5 | 81.6 | 40.6 | 72.1 | 52.0 | 58.9 | 12.1 | 20.0 | 51.2
S4 | 54.6 | 91.1 | 68.3 | 30.4 | 86.1 | 44.9 | 51.9 | 4.4 | 8.1 | 40.4
S5 | 79.3 | 90.8 | 84.7 | 44.4 | 72.2 | 55.0 | 61.3 | 13.9 | 22.7 | 54.1
• Comparison of XCoref’s steps to those of the intermediate version XCoref_interm
• Both Steps 4 & 5 outperform their baseline methods in F1_conll
– Step 4: +1.1pp
– Step 5: +2.9pp
Results
• XCoref resolves mentions with mixed identity relations
• XCoref outperforms all other methods on the NewsWCL50 dataset, e.g., TCA
• XCoref fails to merge a considerable number of related smaller clusters
– it needs to be more aware of the context
• Yet, XCoref resolves the following cases with just unsupervised methods... (next slide)
Results: qualitative examples
Name | Resolved mentions
DNC's lawsuit | the process of legal discovery, a sham lawsuit about a bogus Russian collusion claim, a bogus Russian collusion claim, allegations of obstruction of justice, a desperate attempt to keep a collusion narrative going ahead of November mid-term elections, a new low to raise money, the DNC's move, the lawsuit to drum up donations for the party
Immigrants | no legitimate asylum-seekers, the asylum-seekers, the individual migrants planning to seek asylum, groups of the migrants with their children, migrant families that request asylum, unauthorized immigrants, the migrants in the caravan, a caravan of immigrants, members of the caravan, these large “Caravans” of people, a few hundred asylum seekers, refugees, those individuals, applicants, people who request protection, people traveling without documents, several groups of people associated with the caravan, undocumented immigrants, asylum-seeking immigrant “caravan”, group of about 100 people
PRK-USA Summit | the summit meeting, a potential meeting of the two leaders, an extraordinary meeting following months of heated rhetoric, meet with the North Korean dictator, discuss its nuclear weapons program, Kim's offer for a summit, a great chance to solve a world problem, won't even have a meeting at all, a once-unthinkable encounter between him and Mr. Kim, a one-on-one meeting with North Korea leader Kim Jong Un, direct talks between U.S. President Donald Trump and Kim, Mr. Kim's invitation to meet, the upcoming summit meeting with the North Korean leader, a great chance to solve a world problem, unwavering determination in addressing the challenge of North Korea
Denuclearization | a deal to destroy only inter-continental missiles that could reach the United States, gives up nuclear weapons, months of heated rhetoric over Pyongyang's nuclear weapons program, to engage in a process headed toward an ambiguous goal, broad and “abstract” statements about the need for North Korea to “denuclearize”, give up its nuclear program, yet to take any tangible steps to give up its nuclear arsenal, to address the threats posed by its nuclear and missile program
Discussion
• Looser coreference relations pose a challenge to established CDCR models
– e.g., EeCDCR (performs worst among all baselines)
• Next direction for CDCR research
– employ language models trained on the news domain
– evaluate on “wild” political news articles with diverse identity relations
Conclusion
• We propose XCoref, an unsupervised sieve-based method for cross-document coreference
resolution (CDCR)
• XCoref resolves mentions of a mix of strict and loose coreference relations
– American steelmakers – shuttered plants and mills
– the United States – Trump Administration officials
• Performance on NewsWCL50
– XCoref performs the best
– a well-established CDCR model performs the worst
• CDCR models need to be tested on more diverse CDCR datasets
– with both strict identity and looser bridging coreference relations
• CDCR in a challenging “wild” environment of political news articles
– increases awareness of bias by word choice and labeling
– increases applicability of CDCR research
References
• Hamborg, F., Zhukova, A., & Gipp, B. (2019, June). Automated Identification of Media Bias by Word Choice and Labeling in News Articles. In 2019 ACM/IEEE Joint Conference on Digital Libraries (JCDL) (pp. 196-205). IEEE Computer Society.
• Barhom, S., Shwartz, V., Eirew, A., Bugert, M., Reimers, N., & Dagan, I. (2019, July). Revisiting Joint Modeling of Cross-document Entity and Event Coreference Resolution. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (pp. 4179-4189).
• Zhukova, A., Hamborg, F., Donnay, K., & Gipp, B. (2021). Concept Identification of Directly and Indirectly Related Mentions Referring to Groups of Persons. In iConference (1) (pp. 514-526).
• Minard, A. L., Speranza, M., Urizar, R., Altuna, B., Van Erp, M., Schoen, A., & Van Son, C. (2016, May). MEANTIME, the NewsReader multilingual event and time corpus. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16) (pp. 4417-4422).
• Cybulska, A., & Vossen, P. (2014, May). Using a sledgehammer to crack a nut? Lexical diversity and event coreference resolution. In LREC (pp. 4545-4552).
• Recasens, M., Martí, M. A., & Orǎsan, C. (2012, May). Annotating near-identity from coreference disagreements. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12) (pp. 165-172).
• O’Gorman, T., Wright-Bettner, K., & Palmer, M. (2016, November). Richer event description: Integrating event coreference with temporal, causal and bridging annotation. In Proceedings of the 2nd Workshop on Computing News Storylines (CNS 2016) (pp. 47-56).
• Recasens, M., Hovy, E., & Martí, M. A. (2011). Identity, non-identity, and near-identity: Addressing the complexity of coreference. Lingua, 121(6), 1138-1152.
• Hou, Y., Markert, K., & Strube, M. (2018). Unrestricted bridging resolution. Computational Linguistics, 44(2), 237-284.
• Rösiger, I. (2018, May). BASHI: A corpus of wall street journal articles annotated with bridging links. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018).
• Poesio, M., & Artstein, R. (2008, May). Anaphoric Annotation in the ARRAU Corpus. In Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08).
Questions
Anastasia Zhukova
zhukova@uni-wuppertal.de
Zhukova et al. “XCoref: Cross-document Coreference Resolution in the Wild”

Más contenido relacionado

Similar a XCoref: Cross-document Coreference Resolution in the Wild

Towards Evaluation of Cross-document Coreference Resolution Models Using Data...
Towards Evaluation of Cross-document Coreference Resolution Models Using Data...Towards Evaluation of Cross-document Coreference Resolution Models Using Data...
Towards Evaluation of Cross-document Coreference Resolution Models Using Data...
Anastasia Zhukova
 
The Computer Science Ontology: A Large-Scale Taxonomy of Research Areas
The Computer Science Ontology: A Large-Scale Taxonomy of Research AreasThe Computer Science Ontology: A Large-Scale Taxonomy of Research Areas
The Computer Science Ontology: A Large-Scale Taxonomy of Research Areas
Angelo Salatino
 

Similar a XCoref: Cross-document Coreference Resolution in the Wild (20)

Towards Evaluation of Cross-document Coreference Resolution Models Using Data...
Towards Evaluation of Cross-document Coreference Resolution Models Using Data...Towards Evaluation of Cross-document Coreference Resolution Models Using Data...
Towards Evaluation of Cross-document Coreference Resolution Models Using Data...
 
Insights from Knowledge Graphs
Insights from Knowledge GraphsInsights from Knowledge Graphs
Insights from Knowledge Graphs
 
The CSO Classifier: Ontology-Driven Detection of Research Topics in Scholarly...
The CSO Classifier: Ontology-Driven Detection of Research Topics in Scholarly...The CSO Classifier: Ontology-Driven Detection of Research Topics in Scholarly...
The CSO Classifier: Ontology-Driven Detection of Research Topics in Scholarly...
 
A Knowledge Discovery Framework for Planetary Defense
A Knowledge Discovery Framework for Planetary DefenseA Knowledge Discovery Framework for Planetary Defense
A Knowledge Discovery Framework for Planetary Defense
 
What Are Links in Linked Open Data? A Characterization and Evaluation of Link...
What Are Links in Linked Open Data? A Characterization and Evaluation of Link...What Are Links in Linked Open Data? A Characterization and Evaluation of Link...
What Are Links in Linked Open Data? A Characterization and Evaluation of Link...
 
CiteSeerX: Mining Scholarly Big Data
CiteSeerX: Mining Scholarly Big DataCiteSeerX: Mining Scholarly Big Data
CiteSeerX: Mining Scholarly Big Data
 
Towards Automatic Classification of LOD Datasets
Towards Automatic Classification of LOD DatasetsTowards Automatic Classification of LOD Datasets
Towards Automatic Classification of LOD Datasets
 
XLDB South America Keynote: eScience Institute and Myria
XLDB South America Keynote: eScience Institute and MyriaXLDB South America Keynote: eScience Institute and Myria
XLDB South America Keynote: eScience Institute and Myria
 
Sybrandt Thesis Proposal Presentation
Sybrandt Thesis Proposal PresentationSybrandt Thesis Proposal Presentation
Sybrandt Thesis Proposal Presentation
 
Franz sterner tdwg 2016 new power balance needed for trustworthy biodiversity...
Franz sterner tdwg 2016 new power balance needed for trustworthy biodiversity...Franz sterner tdwg 2016 new power balance needed for trustworthy biodiversity...
Franz sterner tdwg 2016 new power balance needed for trustworthy biodiversity...
 
Teaching & Learning with Technology TLT 2016
Teaching & Learning with Technology TLT 2016Teaching & Learning with Technology TLT 2016
Teaching & Learning with Technology TLT 2016
 
Probabilistic Topic models
Probabilistic Topic modelsProbabilistic Topic models
Probabilistic Topic models
 
Keynote: Global Media Monitoring - M. Grobelnik - ESWC SS 2014
Keynote: Global Media Monitoring - M. Grobelnik - ESWC SS 2014Keynote: Global Media Monitoring - M. Grobelnik - ESWC SS 2014
Keynote: Global Media Monitoring - M. Grobelnik - ESWC SS 2014
 
Scratchpads introductory presentation 45mins
Scratchpads introductory presentation   45minsScratchpads introductory presentation   45mins
Scratchpads introductory presentation 45mins
 
Foundations for the Future of Science
Foundations for the Future of ScienceFoundations for the Future of Science
Foundations for the Future of Science
 
Franz et al 2017 ecn creating and publishing a symbiota based checklist version
Franz et al 2017 ecn creating and publishing a symbiota based checklist versionFranz et al 2017 ecn creating and publishing a symbiota based checklist version
Franz et al 2017 ecn creating and publishing a symbiota based checklist version
 
The Computer Science Ontology: A Large-Scale Taxonomy of Research Areas
The Computer Science Ontology: A Large-Scale Taxonomy of Research AreasThe Computer Science Ontology: A Large-Scale Taxonomy of Research Areas
The Computer Science Ontology: A Large-Scale Taxonomy of Research Areas
 
All good things
All good thingsAll good things
All good things
 
CHI2015 - Citizen Science || Zooniverse
CHI2015 - Citizen Science || ZooniverseCHI2015 - Citizen Science || Zooniverse
CHI2015 - Citizen Science || Zooniverse
 
Diversified Social Media Retrieval for News Stories
Diversified Social Media Retrieval for News StoriesDiversified Social Media Retrieval for News Stories
Diversified Social Media Retrieval for News Stories
 

Más de Anastasia Zhukova

M.Sc. Thesis: Automated Identification of Framing by Word Choice and Labeling...
M.Sc. Thesis: Automated Identification of Framing by Word Choice and Labeling...M.Sc. Thesis: Automated Identification of Framing by Word Choice and Labeling...
M.Sc. Thesis: Automated Identification of Framing by Word Choice and Labeling...
Anastasia Zhukova
 
Automated Identification of Framing by Word Choice and Labeling to Reveal Med...
Automated Identification of Framing by Word Choice and Labeling to Reveal Med...Automated Identification of Framing by Word Choice and Labeling to Reveal Med...
Automated Identification of Framing by Word Choice and Labeling to Reveal Med...
Anastasia Zhukova
 
Interpretable and Comparative Textual Dataset Exploration Using Near-Identity...
Interpretable and Comparative Textual Dataset Exploration Using Near-Identity...Interpretable and Comparative Textual Dataset Exploration Using Near-Identity...
Interpretable and Comparative Textual Dataset Exploration Using Near-Identity...
Anastasia Zhukova
 
Concept Identification of Directly and Indirectly Related Mentions Referring ...
Concept Identification of Directly and Indirectly Related Mentions Referring ...Concept Identification of Directly and Indirectly Related Mentions Referring ...
Concept Identification of Directly and Indirectly Related Mentions Referring ...
Anastasia Zhukova
 

Más de Anastasia Zhukova (9)

What's in the News? Towards Identification of Bias by Commission, Omission, a...
What's in the News? Towards Identification of Bias by Commission, Omission, a...What's in the News? Towards Identification of Bias by Commission, Omission, a...
What's in the News? Towards Identification of Bias by Commission, Omission, a...
 
Seminar Paper: Putting News in a Perspective: Framing by Word Choice and Labe...
Seminar Paper: Putting News in a Perspective: Framing by Word Choice and Labe...Seminar Paper: Putting News in a Perspective: Framing by Word Choice and Labe...
Seminar Paper: Putting News in a Perspective: Framing by Word Choice and Labe...
 
M.Sc. Thesis: Automated Identification of Framing by Word Choice and Labeling...
M.Sc. Thesis: Automated Identification of Framing by Word Choice and Labeling...M.Sc. Thesis: Automated Identification of Framing by Word Choice and Labeling...
M.Sc. Thesis: Automated Identification of Framing by Word Choice and Labeling...
 
Talk: Automated Identification of Media Bias by Word Choice and Labeling in N...
Talk: Automated Identification of Media Bias by Word Choice and Labeling in N...Talk: Automated Identification of Media Bias by Word Choice and Labeling in N...
Talk: Automated Identification of Media Bias by Word Choice and Labeling in N...
 
Automated Identification of Framing by Word Choice and Labeling to Reveal Med...
Automated Identification of Framing by Word Choice and Labeling to Reveal Med...Automated Identification of Framing by Word Choice and Labeling to Reveal Med...
Automated Identification of Framing by Word Choice and Labeling to Reveal Med...
 
Putting News in a Perspective: Framing by Word Choice and Labeling
Putting News in a Perspective: Framing by Word Choice and LabelingPutting News in a Perspective: Framing by Word Choice and Labeling
Putting News in a Perspective: Framing by Word Choice and Labeling
 
Interpretable and Comparative Textual Dataset Exploration Using Near-Identity...
Interpretable and Comparative Textual Dataset Exploration Using Near-Identity...Interpretable and Comparative Textual Dataset Exploration Using Near-Identity...
Interpretable and Comparative Textual Dataset Exploration Using Near-Identity...
 
Concept Identification of Directly and Indirectly Related Mentions Referring ...
Concept Identification of Directly and Indirectly Related Mentions Referring ...Concept Identification of Directly and Indirectly Related Mentions Referring ...
Concept Identification of Directly and Indirectly Related Mentions Referring ...
 
ANEA: Automated (Named) Entity Annotation for German Domain-Specific Texts
ANEA: Automated (Named) Entity Annotation for German Domain-Specific TextsANEA: Automated (Named) Entity Annotation for German Domain-Specific Texts
ANEA: Automated (Named) Entity Annotation for German Domain-Specific Texts
 

Último

Pests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdf
PirithiRaju
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOST
Sérgio Sacani
 
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
PirithiRaju
 
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsHubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Sérgio Sacani
 
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptxSCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
RizalinePalanog2
 
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 bAsymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Asymmetry in the atmosphere of the ultra-hot Jupiter WASP-76 b
Sérgio Sacani
 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
PirithiRaju
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Sérgio Sacani
 
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
Chemical Tests; flame test, positive and negative ions test Edexcel Internati...
ssuser79fe74
 

Último (20)

Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )Recombination DNA Technology (Nucleic Acid Hybridization )
Recombination DNA Technology (Nucleic Acid Hybridization )
 
Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)
 
Pests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdf
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOST
 
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
TEST BANK For Radiologic Science for Technologists, 12th Edition by Stewart C...
 
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdfPests of cotton_Sucking_Pests_Dr.UPR.pdf
Pests of cotton_Sucking_Pests_Dr.UPR.pdf
 
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 60009654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
9654467111 Call Girls In Raj Nagar Delhi Short 1500 Night 6000
 
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroidsHubble Asteroid Hunter III. Physical properties of newly found asteroids
Hubble Asteroid Hunter III. Physical properties of newly found asteroids
 
Creating and Analyzing Definitive Screening Designs

XCoref: Cross-document Coreference Resolution in the Wild

  • 1. XCoref: Cross-document Coreference Resolution in the Wild iConference 2022, Feb. 28 - Mar. 4, 2022, Virtual 1 University of Wuppertal, Germany 2 University of Konstanz, Germany 3 University of Zurich, Switzerland 4 Heidelberg Academy of Sciences and Humanities, Germany Anastasia Zhukova1 Felix Hamborg2,4, Karsten Donnay3,4, and Bela Gipp1,4
  • 2. Agenda • Motivation: media bias by word choice and labeling • Research objective • Methodology: XCoref • Evaluation • Results & Discussion • Conclusion Zhukova et al. “XCoref: Cross-document Coreference Resolution in the Wild”
  • 3. Motivation: Media bias by word choice and labeling https://www.theaustralian.com.au/world/alexander-lukashenko-locked-and-loaded-to-fight-belarus-rats/news-story/99175bd3cfc13f71e2176310cee98288 https://www.economist.com/europe/2020/06/20/waving-slippers-at-the-cockroach-president-of-belarus
  • 4. Motivation: Media bias by word choice and labeling https://www.theaustralian.com.au/world/alexander-lukashenko-locked-and-loaded-to-fight-belarus-rats/news-story/99175bd3cfc13f71e2176310cee98288 https://www.economist.com/europe/2020/06/20/waving-slippers-at-the-cockroach-president-of-belarus
  • 5. Motivation
      • State-of-the-art cross-document coreference resolution (CDCR) resolves mentions of the same entities/events with strict identity
        Alexander Lukashenko = Belarusian President Alexander Lukashenko
      • Problem: CDCR ignores loosely related coreferential mentions
        Vladimir Putin about Alexey Navalny: instead of Alexey’s full name, he always uses name-calling, e.g., “Russian Saakashvili,” “the famous blogger,” “the criminal of a famous case.”
      • “Wild” political news: substantial lexical variance + mixed coreference relations → bias by word choice and labeling (WCL)
      • WCL influences the perception of entities/events, e.g., by associations
        Alexander Lukashenko = cockroach = idiosyncratic autocrat
      • CDCR completely overlooks lexical diversity in non-named entities
        Tens of thousands of demonstrators = the marchers = Belarus “rats”
        War in Ukraine = special military operation
  • 6. Research objectives
      • Revisit the target concept analysis (TCA) approach for finding cases of bias by WCL…
      • …and propose XCoref, an unsupervised sieve-based method that jointly resolves mentions with strict and loose identity relations into coreferential chains
      • Evaluate XCoref, TCA, and a state-of-the-art CDCR model with the standard CoNLL metrics for (CD)CR
  • 7. Related work
      • Barely overlapping research
        – Strict identity relations (i.e., CR)
        – Near-identity relations
        – Bridging relations
      • Coreferential mentions biased by WCL are linked by a mix of all these relations (e.g., NewsWCL50)
      • State-of-the-art CDCR models are trained on strict identity relations, e.g., [Ba19]
      • TCA [Ha19]: unsupervised method
        – resolves mentions with diverse relations
        – evaluated on NewsWCL50
        – relies only on general lexical features shared by all concepts
      • Goal: Revisit TCA and employ specific lexical features per entity/event type
      [Figure: datasets (ECB+, TAC KBP, ACE, OntoNotes, MEANTIME, ISNotes, BASHI, ARRAU, DEFT, NiDENT, RED, NewsWCL50) and methods (EeCDCR [Ba19], TCA [Ha19]) arranged by relation type: strict identity, near-identity, bridging]
      [Ha19] Hamborg, F., Zhukova, A., & Gipp, B. (2019, June). Automated Identification of Media Bias by Word Choice and Labeling in News Articles. In 2019 ACM/IEEE Joint Conference on Digital Libraries (JCDL) (pp. 196-205). IEEE Computer Society.
      [Ba19] Barhom, S., Shwartz, V., Eirew, A., Bugert, M., Reimers, N., & Dagan, I. (2019, July). Revisiting Joint Modeling of Cross-document Entity and Event Coreference Resolution. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (pp. 4179-4189).
  • 9. Methodology: XCoref
      • Hypothesis: Multiple (sub-)methods are required to address various concept types, e.g.,
        – concepts mainly using strict identity relations, i.e., NE persons, organizations, countries
        – concepts mainly consisting of mentions with bridging/near-identity relations, e.g., groups of persons
      • XCoref resolves mentions of concept types in order of increasing semantic complexity
  • 10. Methodology: XCoref
      • Steps 0-3: Named entities (NEs)
        – Step 0: Coreference resolution of named entities (NEs)
        – Step 1: Named entity linking (NEL)
        – Step 2: NE head-word and compound match
        – Step 3: Resolution of non-NE mentions
      • Step 4: Groups of persons
      • Step 5: Events and abstract entities
      [Figure: example run of Steps 0-3. Mentions linked to /wiki/Donald_Trump. Resolved mentions: Donald Trump – Trump – Donald – undecisive president of the US. Unresolved mentions: illegal aliens – undocumented immigrants; Trump administration officials – American government; Trump-Kim meeting – discussed an issue]
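The sieve idea on this slide can be illustrated with a small loop in which each sieve only merges the chains left by the previous one. This is a minimal sketch, not XCoref's implementation: `run_sieves`, `exact_match_sieve`, and `head_word_sieve` are hypothetical names, the "last token is the head word" rule is a simplifying assumption, and the mentions are toy data taken from the slide's example.

```python
from typing import Callable, List

Chain = List[str]                       # one chain = list of mention strings
Sieve = Callable[[Chain, Chain], bool]  # decides whether two chains corefer


def run_sieves(mentions: List[str], sieves: List[Sieve]) -> List[Chain]:
    """Apply sieves in order; each sieve may only merge existing chains."""
    chains: List[Chain] = [[m] for m in mentions]  # start from singletons
    for sieve in sieves:
        merged = True
        while merged:  # repeat until this sieve finds nothing more to merge
            merged = False
            for i in range(len(chains)):
                for j in range(i + 1, len(chains)):
                    if sieve(chains[i], chains[j]):
                        chains[i] = chains[i] + chains[j]
                        del chains[j]
                        merged = True
                        break
                if merged:
                    break
    return chains


def exact_match_sieve(a: Chain, b: Chain) -> bool:
    """Hypothetical high-precision sieve: identical surface forms."""
    return any(x.lower() == y.lower() for x in a for y in b)


def head_word_sieve(a: Chain, b: Chain) -> bool:
    """Hypothetical looser sieve: treats the last token as the head word."""
    heads_a = {x.lower().split()[-1] for x in a}
    heads_b = {y.lower().split()[-1] for y in b}
    return bool(heads_a & heads_b)


mentions = ["Donald Trump", "Trump", "donald trump",
            "illegal aliens", "undocumented immigrants"]
chains = run_sieves(mentions, [exact_match_sieve, head_word_sieve])
```

On this toy input the two name-based sieves build one chain for the Trump mentions, while "illegal aliens" and "undocumented immigrants" stay separate, which mirrors why the later, semantically richer steps are needed.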
  • 11. Step 4: identification of groups of persons
      • We integrated the approach proposed by Zhukova et al. [Zh21]
      • OPTICS principle [An99]
        – The approach finds cluster centers of the most semantically similar mentions
        – The clusters are expanded with less semantically similar mentions
      • We improved the clustering approach with two additional sub-steps
        – a) Merge intermediate clusters, i.e., merge clusters if they are similar
        – b) Move alien points, i.e., relocate mentions to better-matching clusters
      [Zh21] Zhukova, A., Hamborg, F., Donnay, K., & Gipp, B. (2021). Concept Identification of Directly and Indirectly Related Mentions Referring to Groups of Persons. In iConference (1) (pp. 514-526).
      [An99] Ankerst, M., Breunig, M. M., Kriegel, H. P., & Sander, J. (1999). OPTICS: Ordering points to identify the clustering structure. ACM Sigmod record, 28(2), 49-60.
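The "move alien points" sub-step can be pictured as reassigning every mention to the cluster it matches best. The sketch below is only one plausible reading, assuming cosine similarity to cluster centroids and a single pass with fixed centroids; the slide does not specify the actual criterion of [Zh21], and the function names and 2-D vectors are made up for illustration.

```python
import math
from typing import Dict, List

Vector = List[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def centroid(vectors: List[Vector]) -> Vector:
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def move_alien_points(clusters: Dict[str, List[str]],
                      vectors: Dict[str, Vector]) -> Dict[str, List[str]]:
    """Relocate each mention to the cluster whose centroid is most similar.

    Assumption: centroids are computed once from the input clusters.
    """
    centroids = {name: centroid([vectors[m] for m in members])
                 for name, members in clusters.items()}
    result: Dict[str, List[str]] = {name: [] for name in clusters}
    for members in clusters.values():
        for mention in members:
            best = max(centroids,
                       key=lambda n: cosine(vectors[mention], centroids[n]))
            result[best].append(mention)
    return result


# Toy 2-D "embeddings" (hypothetical values, not real word vectors).
vectors = {
    "migrants": [1.0, 0.1],
    "asylum seekers": [0.9, 0.2],
    "refugees": [0.8, 0.3],
    "officials": [0.1, 1.0],
    "lawmakers": [0.2, 0.9],
}
clusters = {"A": ["migrants", "asylum seekers"],
            "B": ["officials", "lawmakers", "refugees"]}
relocated = move_alien_points(clusters, vectors)
```

Here "refugees" starts in cluster B but is closer to the centroid of cluster A, so it is the "alien point" that gets moved.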
  • 12. Step 5: events and abstract entities
      • Step 5 resolves mention chains of events and abstract entities
        – actions, objects, events
        – consist of noun phrases (NPs) and verb phrases (VPs)
      • Weights head-lemma word vectors of each phrase
        ➢ highlights the core meaning of the head of the phrase
        ➢ keeps the context
      • Hierarchical clustering
        – cosine distance
        – average linkage
      [Figure: 1) lemmatization and heads of phrases (“Trump-Kim meeting” → Trump, Kim, meeting; “discussed an issue” → discussed, issue); 2) weighted word vectors; 3) hierarchical clustering. Based on the figure from https://towardsdatascience.com/hierarchical-clustering-explained-e59b13846da8]
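The three stages of Step 5 can be sketched in plain Python: build a phrase vector in which the head lemma counts more than the other tokens, then cluster agglomeratively with cosine distance and average linkage. Everything concrete here is an assumption for illustration: the head weight of 2.0, the distance threshold of 0.3, the toy 3-D embeddings, and the function names; the slide does not give XCoref's actual weighting scheme or stopping criterion.

```python
import math
from typing import Dict, List

Vector = List[float]


def phrase_vector(tokens: List[str], head: str, emb: Dict[str, Vector],
                  head_weight: float = 2.0) -> Vector:
    """Weighted mean of token vectors; the head lemma gets `head_weight`."""
    dim = len(next(iter(emb.values())))
    total, weight_sum = [0.0] * dim, 0.0
    for tok in tokens:
        w = head_weight if tok == head else 1.0
        total = [t + w * x for t, x in zip(total, emb[tok])]
        weight_sum += w
    return [t / weight_sum for t in total]


def cosine_distance(u: Vector, v: Vector) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm


def average_linkage(vectors: List[Vector], threshold: float) -> List[List[int]]:
    """Merge the closest pair of clusters until the mean pairwise
    cosine distance between them exceeds `threshold`."""
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > 1:
        best = None  # (distance, i, j) of the closest pair so far
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                dists = [cosine_distance(vectors[a], vectors[b])
                         for a in clusters[i] for b in clusters[j]]
                d = sum(dists) / len(dists)
                if best is None or d < best[0]:
                    best = (d, i, j)
        if best[0] > threshold:
            break
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Toy embeddings (hypothetical values) and lemmatized phrases with heads.
emb = {"trump": [1.0, 0.0, 0.0], "kim": [0.9, 0.1, 0.0],
       "meeting": [0.2, 1.0, 0.0], "summit": [0.25, 0.95, 0.0],
       "build": [0.0, 0.2, 0.8], "border": [0.0, 0.1, 0.9],
       "wall": [0.0, 0.0, 1.0]}
phrases = [(["trump", "kim", "meeting"], "meeting"),
           (["summit"], "summit"),
           (["build", "border", "wall"], "wall"),
           (["border", "wall"], "wall")]

weighted = [phrase_vector(toks, head, emb) for toks, head in phrases]
clusters = average_linkage(weighted, threshold=0.3)

unweighted = [phrase_vector(toks, head, emb, head_weight=1.0)
              for toks, head in phrases]
clusters_unweighted = average_linkage(unweighted, threshold=0.3)
```

With these toy values, up-weighting the head "meeting" pulls the first phrase close enough to "summit" for the two to merge, whereas with equal weights the names "trump" and "kim" dominate and the phrases stay apart.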
  • 14. Evaluation
      • Methodology: State-of-the-art metrics for coreference resolution
        – Evaluate combinations and links between annotated (key) and resolved (response) mentions
      • Metrics: CoNLL for (CD)CR
        – MUC
        – B3
        – CEAFe
        – F1_conll: average of the F1 scores of all three metrics
      • Previously ignored for looser coreference relations
      • Dataset: NewsWCL50
        – Contains coreference chains with a mix of relations, i.e., strict identity, near-identity, bridging
      Figure source: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5667668/
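To make the key/response comparison concrete, the following sketch computes two of the three metrics, MUC and B3, on a toy example. It assumes that key and response contain the same mentions and that every mention belongs to exactly one chain; CEAFe is left out because it needs an optimal chain alignment. The function names and toy chains are illustrative and not taken from the official scorer.

```python
from typing import Dict, List, Set, Tuple

Chains = List[Set[str]]


def chain_index(chains: Chains) -> Dict[str, int]:
    """Map each mention to the index of the chain that contains it."""
    return {m: i for i, chain in enumerate(chains) for m in chain}


def muc_recall(key: Chains, response: Chains) -> float:
    """Share of key links that survive in the response partition."""
    resp_of = chain_index(response)
    preserved = total = 0
    for chain in key:
        touched = {resp_of[m] for m in chain if m in resp_of}
        missing = sum(1 for m in chain if m not in resp_of)
        partitions = len(touched) + missing  # pieces the key chain is cut into
        preserved += len(chain) - partitions
        total += len(chain) - 1
    return preserved / total if total else 0.0


def muc(key: Chains, response: Chains) -> Tuple[float, float]:
    """(recall, precision); precision is recall with the roles swapped."""
    return muc_recall(key, response), muc_recall(response, key)


def b_cubed(key: Chains, response: Chains) -> Tuple[float, float]:
    """(recall, precision) averaged over mentions."""
    key_of, resp_of = chain_index(key), chain_index(response)
    mentions = [m for m in key_of if m in resp_of]
    if not mentions:
        return 0.0, 0.0
    recall = precision = 0.0
    for m in mentions:
        k, r = key[key_of[m]], response[resp_of[m]]
        overlap = len(k & r)
        recall += overlap / len(k)
        precision += overlap / len(r)
    return recall / len(mentions), precision / len(mentions)


def f1(recall: float, precision: float) -> float:
    total = recall + precision
    return 2 * recall * precision / total if total else 0.0


key = [{"a", "b", "c"}, {"d", "e"}]        # annotated chains
response = [{"a", "b"}, {"c", "d", "e"}]   # resolved chains

muc_r, muc_p = muc(key, response)
b3_r, b3_p = b_cubed(key, response)
muc_f1, b3_f1 = f1(muc_r, muc_p), f1(b3_r, b3_p)
```

In the toy case one mention ("c") is placed in the wrong chain; MUC, which counts links, and B3, which averages per mention, penalize this differently, which is why the CoNLL score averages several metrics.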
  • 15. Baselines
      • Same-lemma
      • Target Concept Analysis (TCA), Hamborg et al. (2019)
        – Original version
        – With improved preprocessing
        – Three types of word vectors: word2vec, fastText, GloVe
      • Event-entity cross-document coreference resolution (EeCDCR), Barhom et al. (2019)
        – With AllenNLP’s semantic role labeling (SRL)
      • Ablation study for XCoref
        – Intermediate XCoref_interm with baseline methods for Steps 4 & 5
          • Step 4: hierarchical clustering [Zh21]
          • Step 5: TCA’s step 2 [Ha19]
        – Three types of word vectors: word2vec, fastText, GloVe
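The same-lemma baseline is simple enough to sketch in a few lines: mentions whose head lemmas are identical are placed in one chain. The sketch assumes that head lemmas have already been extracted by a parser and lemmatizer; the function name and the hand-assigned lemmas below are illustrative only.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def same_lemma_baseline(mentions: List[Tuple[str, str]]) -> List[List[str]]:
    """Group mentions by head lemma.

    Each input item is (mention text, head lemma); the head lemma is
    assumed to come from an upstream parser and lemmatizer.
    """
    chains: Dict[str, List[str]] = defaultdict(list)
    for text, head_lemma in mentions:
        chains[head_lemma.lower()].append(text)
    return list(chains.values())


# Toy mentions from the qualitative examples, with hand-assigned head lemmas.
mentions = [
    ("the summit meeting", "meeting"),
    ("a potential meeting of the two leaders", "meeting"),
    ("undocumented immigrants", "immigrant"),
    ("a caravan of immigrants", "caravan"),
    ("unauthorized immigrants", "immigrant"),
]
chains = same_lemma_baseline(mentions)
```

The baseline links "undocumented immigrants" and "unauthorized immigrants" but misses "a caravan of immigrants", whose head is "caravan"; this kind of lexical gap is what the looser relations in NewsWCL50 expose.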
  • 16. Results: General

      Method         Vectors  MUC R  MUC P  MUC F1  B3 R  B3 P  B3 F1  CEAF_e R  CEAF_e P  CEAF_e F1  F1_conll
      Lemma          ---       75.7   93.8    83.8  36.8  88.8   52.0      63.1       7.8       13.9      49.9
      EeCDCR         GloVe     69.4   90.6    78.6  33.1  82.3   47.2      58.6       7.8       13.7      46.5
      TCA            word2vec  73.4   89.4    80.6  37.2  73.7   49.4      51.9       8.7       14.9      48.3
      TCA preproc    word2vec  72.9   89.5    80.3  38.4  75.6   50.9      54.5       9.1       15.6      48.9
      TCA preproc    fastText  72.9   87.6    79.6  37.3  71.5   49.0      52.0       9.5       16.0      48.2
      TCA preproc    GloVe     77.2   88.3    82.4  41.8  67.0   51.4      52.9      12.1       19.6      51.2
      XCoref_interm  word2vec  68.4   90.3    77.8  37.7  84.0   52.0      63.0       8.4       14.8      48.2
      XCoref_interm  fastText  74.2   87.3    80.2  38.7  71.5   50.2      58.4      11.6       19.4      50.0
      XCoref_interm  GloVe     75.7   88.5    81.6  40.6  72.1   52.0      58.9      12.1       20.0      51.2
      XCoref         word2vec  70.7   89.8    79.1  36.3  82.4   50.4      63.0       9.4       16.3      48.6
      XCoref         fastText  78.6   90.0    83.9  43.1  70.5   53.5      60.4      13.7       22.4      53.3
      XCoref         GloVe     79.3   90.8    84.7  44.4  72.2   55.0      61.1      13.9       22.6      54.1

      • XCoref outperforms all baselines on the NewsWCL50 dataset: +2.9pp over the best baseline
      • State-of-the-art EeCDCR performs the worst
      • GloVe systematically improves the performance of all methods
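The arithmetic behind the table can be checked directly: each F1 is the harmonic mean of recall and precision, and F1_conll is the mean of the three F1 scores. The snippet below recomputes a few rows from the rounded values on the slide; because the inputs are already rounded, a recomputed score can differ from a reported one by 0.1 in other rows. The variable and function names are illustrative.

```python
from statistics import mean


def f1(recall: float, precision: float) -> float:
    """Harmonic mean of recall and precision."""
    return 2 * recall * precision / (recall + precision)


# (MUC F1, B3 F1, CEAF_e F1) as reported on the slide.
reported_f1 = {
    "Lemma": (83.8, 52.0, 13.9),
    "EeCDCR (GloVe)": (78.6, 47.2, 13.7),
    "XCoref (GloVe)": (84.7, 55.0, 22.6),
}

# F1_conll is the unweighted mean of the three F1 scores.
f1_conll = {name: round(mean(scores), 1)
            for name, scores in reported_f1.items()}

# MUC F1 of XCoref (GloVe) recomputed from its reported R and P.
xcoref_muc_f1 = round(f1(79.3, 90.8), 1)

# Gain over the best baseline, using the reported F1_conll values.
gain_pp = round(54.1 - 51.2, 1)
```

The recomputed values agree with the slide: 49.9, 46.5, and 54.1 for F1_conll, 84.7 for the MUC F1, and a gain of 2.9 percentage points.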
  • 17. Results: Steps 4 & 5

      Sieve  Version  MUC R  MUC P  MUC F1  B3 R  B3 P  B3 F1  CEAF_e R  CEAF_e P  CEAF_e F1  F1_conll
      init   shared    28.6   89.8    43.4  15.6  96.0   26.8      41.4       2.1        4.1      24.7
      S1     shared    40.2   92.3    56.0  22.9  95.0   36.9      49.7       3.1        5.8      32.9
      S2     shared    42.2   92.5    58.0  24.6  94.6   39.1      51.2       3.3        6.2      34.4
      S3     shared    45.7   91.0    60.8  27.3  91.3   42.1      51.6       3.6        6.7      36.5
      S4     interm    52.7   90.2    66.6  29.1  87.5   43.6      51.5       4.2        7.8      39.3
      S5     interm    75.7   88.5    81.6  40.6  72.1   52.0      58.9      12.1       20.0      51.2
      S4     XCoref    54.6   91.1    68.3  30.4  86.1   44.9      51.9       4.4        8.1      40.4
      S5     XCoref    79.3   90.8    84.7  44.4  72.2   55.0      61.3      13.9       22.7      54.1

      • Comparison of XCoref’s steps with those of the intermediate version, XCoref_interm
      • Both Steps 4 & 5 outperform their baseline methods
        – Step 4: +1.1pp
        – Step 5: +2.9pp
  • 18. Results
      • XCoref resolves mentions linked by mixed identity relations
      • XCoref outperforms all other methods on the NewsWCL50 dataset, e.g., TCA
      • XCoref fails to merge a considerable number of related smaller clusters
        – it needs to be more aware of the context
      • Yet XCoref resolves the following cases with just unsupervised methods... (next slide)
  • 19. Results: qualitative examples (Name: Resolved mentions)
      • DNC's lawsuit: the process of legal discovery, a sham lawsuit about a bogus Russian collusion claim, a bogus Russian collusion claim, allegations of obstruction of justice, a desperate attempt to keep a collusion narrative going ahead of November mid-term elections, a new low to raise money, the DNC's move, the lawsuit to drum up donations for the party
      • Immigrants: no legitimate asylum-seekers, the asylum-seekers, the individual migrants planning to seek asylum, groups of the migrants with their children, migrant families that request asylum, unauthorized immigrants, the migrants in the caravan, a caravan of immigrants, members of the caravan, these large “Caravans” of people, a few hundred asylum seekers, refugees, those individuals, applicants, people who request protection, people traveling without documents, several groups of people associated with the caravan, undocumented immigrants, asylum-seeking immigrant “caravan”, group of about 100 people
      • PRK-USA Summit: the summit meeting, a potential meeting of the two leaders, an extraordinary meeting following months of heated rhetoric, meet with the North Korean dictator, discuss its nuclear weapons program, Kim's offer for a summit, a great chance to solve a world problem, won't even have a meeting at all, a once-unthinkable encounter between him and Mr. Kim, a one-on-one meeting with North Korea leader Kim Jong Un, direct talks between U.S. President Donald Trump and Kim, Mr. Kim's invitation to meet, the upcoming summit meeting with the North Korean leader, a great chance to solve a world problem, unwavering determination in addressing the challenge of North Korea
      • Denuclearization: a deal to destroy only inter-continental missiles that could reach the United States, gives up nuclear weapons, months of heated rhetoric over Pyongyang's nuclear weapons program, to engage in a process headed toward an ambiguous goal, broad and “abstract” statements about the need for North Korea to “denuclearize”, give up its nuclear program, yet to take any tangible steps to give up its nuclear arsenal, to address the threats posed by its nuclear and missile program
  • 20. Discussion
      • Looser coreference relations represent a challenge to established CDCR models, e.g., EeCDCR (performs worst among all baselines)
      • Next direction for CDCR research
        – employ language models trained in the news domain
        – evaluate on “wild” political news articles with diverse identity relations
  • 21. Conclusion
      • We propose XCoref, an unsupervised sieve-based method for cross-document coreference resolution (CDCR)
      • XCoref resolves mentions of a mix of strict and loose coreference relations
        – American steelmakers – shuttered plants and mills
        – the United States – Trump Administration officials
      • Performance on NewsWCL50
        – XCoref performs the best
        – a well-established CDCR model performs the worst
      • CDCR models need to be tested on more diverse CDCR datasets
        – with both strict identity and looser bridging coreference relations
      • CDCR in a challenging “wild” environment of political news articles
        – increases awareness of bias by word choice and labeling
        – increases applicability of CDCR research
  • 22. References
      • Hamborg, F., Zhukova, A., & Gipp, B. (2019, June). Automated Identification of Media Bias by Word Choice and Labeling in News Articles. In 2019 ACM/IEEE Joint Conference on Digital Libraries (JCDL) (pp. 196-205). IEEE Computer Society.
      • Barhom, S., Shwartz, V., Eirew, A., Bugert, M., Reimers, N., & Dagan, I. (2019, July). Revisiting Joint Modeling of Cross-document Entity and Event Coreference Resolution. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (pp. 4179-4189).
      • Zhukova, A., Hamborg, F., Donnay, K., & Gipp, B. (2021). Concept Identification of Directly and Indirectly Related Mentions Referring to Groups of Persons. In iConference (1) (pp. 514-526).
      • Minard, A. L., Speranza, M., Urizar, R., Altuna, B., Van Erp, M., Schoen, A., & Van Son, C. (2016, May). MEANTIME, the NewsReader multilingual event and time corpus. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC'16) (pp. 4417-4422).
      • Cybulska, A., & Vossen, P. (2014, May). Using a sledgehammer to crack a nut? Lexical diversity and event coreference resolution. In LREC (pp. 4545-4552).
      • Recasens, M., Martí, M. A., & Orăsan, C. (2012, May). Annotating near-identity from coreference disagreements. In Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12) (pp. 165-172).
      • O’Gorman, T., Wright-Bettner, K., & Palmer, M. (2016, November). Richer event description: Integrating event coreference with temporal, causal and bridging annotation. In Proceedings of the 2nd Workshop on Computing News Storylines (CNS 2016) (pp. 47-56).
      • Recasens, M., Hovy, E., & Martí, M. A. (2011). Identity, non-identity, and near-identity: Addressing the complexity of coreference. Lingua, 121(6), 1138-1152.
      • Hou, Y., Markert, K., & Strube, M. (2018). Unrestricted bridging resolution. Computational Linguistics, 44(2), 237-284.
      • Rösiger, I. (2018, May). BASHI: A corpus of Wall Street Journal articles annotated with bridging links. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC 2018).
      • Poesio, M., & Artstein, R. (2008, May). Anaphoric Annotation in the ARRAU Corpus. In Proceedings of the Sixth International Conference on Language Resources and Evaluation (LREC'08).
  • 23. Questions Anastasia Zhukova zhukova@uni-wuppertal.de