This paper introduces the first plural end-to-end coreference resolution model. The system generates span embeddings, which are optimized to predict mentions and their coreferent antecedents, and it handles both plural mentions and plural speakers. Our approach builds on higher-order coreference resolution with coarse-to-fine inference, adapting it to the Friends corpus, which features plural speakers and also contains singletons. Additionally, the model predicts plural antecedents, as done in previous work on plural coreference. These, in combination with the singular antecedents, are used to construct the final clusters, which correspond one-to-one to the entities.
3. Coreference Resolution
■ Coreference Resolution
– Find expressions that refer to the same entity
– Very important for higher-level NLP tasks
– Natural language understanding: QA, summarization, information extraction, etc.
– A fundamental, still unsolved NLP task
– Relies on syntactic structure, speakers, sequential order, and text comprehension
– Ambiguity: She told Monica she was smart. She told Joey she was smart.
– Entities: General (locations, objects, etc.) or specific (people).
– Mentions: Nested (The Wall of China). Plural (Mom and dad, they)
4. Corpus: Friends TV Show
■ Entities
– Known entities: Main characters. E.g., Joey is great.
– GENERIC: Characters whose identity is not revealed. E.g., I like the waitress.
– GENERAL: A class of people. E.g., The ideal girl doesn't exist.
– OTHER: Identity unknown from local context. E.g., The guy next to me.
■ Annotation
– No nested plural entities (mom and dad).
– Plural mentions are not annotated as coreferent with each other.
– Plural mentions are added to the coreferent entities' clusters (they → mom, dad)
– GENERAL, OTHER → Singletons: one mention. E.g., I like women.
5. Neural Networks
■ FFNN – Feeds its output forward, as input to the next layer, without forming a cycle
■ LSTM – Artificial RNN with loops that allow information to persist
■ CNN – Deep NN that extracts the most important features in condensed form
6. End-to-end Coreference Resolution
■ Produce coreference clusters by assigning an antecedent to each top span
■ No syntactic parser or mention-detector
■ Y(i) = {ε, 1, …, i−1}, for each top span i
■ Dummy antecedent ε → not a mention, or not coreferent with any antecedent
■ Optimize the marginal log-likelihood of all correct antecedents (see the sketch below)
■ Random initial pruning
■ Only gold mentions get positive updates
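As a reference, the training objective from Lee et al. (2017), where GOLD(i) is the set of gold antecedents of span i and the dummy score is fixed at s(i, ε) = 0:

\[
\log \prod_{i=1}^{N} \sum_{\hat{y} \in \mathcal{Y}(i)\,\cap\,\mathrm{GOLD}(i)} P(\hat{y}),
\qquad
P(y) = \frac{e^{s(i,y)}}{\sum_{y' \in \mathcal{Y}(i)} e^{s(i,y')}}
\]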
9. Higher-Order Coreference and Coarse-to-Fine Inference
■ Higher-Order Coreference
– Expected antecedent
– Gate vector
– Update as weighted average
■ Coarse-to-Fine Inference
– Span ratio r: keep the K = rT top spans, where T is the document length
– For each span, keep the top C antecedents by coarse score Sc
– Compute the final coreference score S(i,j) only for the surviving pairs (sketch below)
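A sketch of both mechanisms, following Lee et al. (2018): at iteration n, each span representation g_i is refined with its expected antecedent a_i through a learned gate f_i, and the final score decomposes into mention and antecedent terms:

\[
a_i^{(n)} = \sum_{y \in \mathcal{Y}(i)} P_n(y)\, g_y^{(n)},
\qquad
f_i^{(n)} = \sigma\!\left(W_f\,[g_i^{(n)}; a_i^{(n)}]\right),
\qquad
g_i^{(n+1)} = f_i^{(n)} \circ g_i^{(n)} + \left(1 - f_i^{(n)}\right) \circ a_i^{(n)}
\]
\[
S_c(i,j) = g_i^{\top} W_c\, g_j \;\;\text{(coarse)},
\qquad
S(i,j) = s_m(i) + s_m(j) + S_c(i,j) + s_a(i,j)
\]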
10. Plural Coreference Resolution
■ Uses gold mentions for test set predictions
■ Adds plural annotation to Friends corpus
■ Labeling for plural mentions
■ Clustering algorithm
■ Modify evaluation metrics
■ Feeds all mention pairs into an agglomerative convolutional neural network (ACNN)
12. Labeling and Clustering
■ For each span mj, look at each antecedent mi
■ Label S (singular antecedent): mi is singular → assign mj to the cluster of mi
■ Label P (plural antecedent): mj is singular and mi is plural → assign mi to mj's cluster (see the sketch below)
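A minimal sketch of this clustering rule in Python, assuming predicted (mi, mj, label) triples; all names are illustrative:

```python
def build_clusters(predictions):
    """Greedy clustering from labeled antecedent predictions.

    predictions: list of (mi, mj, label) triples, where mi is the
    antecedent of span mj and label is 'S' (singular) or 'P' (plural).
    """
    cluster_of = {}  # singular mention -> cluster id
    clusters = []    # cluster id -> set of mentions

    def cid_of(m):
        # create a fresh cluster the first time a mention is seen
        if m not in cluster_of:
            cluster_of[m] = len(clusters)
            clusters.append({m})
        return cluster_of[m]

    for mi, mj, label in predictions:
        if label == 'S':
            # singular antecedent: mj joins mi's cluster
            cid = cid_of(mi)
            clusters[cid].add(mj)
            cluster_of.setdefault(mj, cid)
        else:  # 'P': the plural mi is added to mj's cluster
            clusters[cid_of(mj)].add(mi)  # mi may sit in several clusters
    return clusters
```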
13. Approach
■ End-to-end Neural Coreference + Plural Coreference Resolution
■ Adapt from CoNLL corpus to Friends corpus
– Nested mentions (nested mention detection: F1 72 vs. 85)
– Singletons
– Plural speakers
– Character entities
– Plural mentions
■ Predict plural antecedents
■ Merge mentions into entity clusters
14. Plural speakers
■ Singular speakers → same-speaker binary flag vector
■ Plural speakers → intersection of the speaker sets, non-empty flag (a speaker in common)
– Pros: captures the existence of a relationship
– Cons: no measure of match strength (exact vs. partial overlap)
■ Average speakers (sketch below)
– Assign an embedding to each speaker
– Average each mention's speaker embeddings
– Multiply the two mentions' averaged embeddings pair-wise
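A minimal sketch of the averaged-speaker feature, assuming a learned speaker embedding table; the function and argument names are illustrative:

```python
import torch

def speaker_pair_feature(emb, speakers_i, speakers_j):
    """Average each mention's speaker embeddings, then combine pairwise.

    emb: torch.nn.Embedding over speaker ids.
    speakers_i, speakers_j: lists of speaker ids (plural utterances have >1).
    """
    avg_i = emb(torch.tensor(speakers_i)).mean(dim=0)
    avg_j = emb(torch.tensor(speakers_j)).mean(dim=0)
    # element-wise product acts as a soft measure of speaker overlap
    return avg_i * avg_j
```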
15. Training Labels, Singletons
■ Baseline: End-to-end singular coreference resolution
– Modify gold labels for training set
– Pick a "head" mention, output singular clusters
– Evaluate on plural metrics
– Sort candidate entities by appearance frequency
– Strategies: most frequent, least frequent, none. E.g., they → mom:2, dad:1 (sketch after this list)
■ Singletons: Mentions not coreferent to other mentions
– Add left-over spans with mention score > threshold t (t=0)
– Only gold mentions receive positive updates
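A minimal sketch of the head-mention strategies in Python; the helper name and input format are illustrative:

```python
from collections import Counter

def singularize_gold(plural_cluster_links, strategy="most"):
    """Convert a plural mention's gold links into a single 'head' entity.

    plural_cluster_links: entity ids the plural mention was added to, one
    per coreferent singular mention (e.g. they -> [mom, mom, dad]).
    strategy: 'most' or 'least' frequent entity, or 'none' to drop the link.
    """
    if strategy == "none" or not plural_cluster_links:
        return None
    counts = Counter(plural_cluster_links)
    if strategy == "most":
        return counts.most_common(1)[0][0]
    return min(counts, key=counts.get)  # least frequent entity
```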
16. Plural mentions
■ Singular Coreference
– Predict a coreferent antecedent for each mention
– Merge to the same cluster (transitive nature)
■ Plural Coreference
– Not transitive. E.g., {me, we} and {you, we}, but not {me, you}
– Already have singular antecedents, need plural antecedents
– Predictions for all pairs of mentions (not just one per span)
17. Singularity
■ Singular + plural antecedents → revisit higher-order coreference
– Weight singular/plural with a singularity score S
■ Training loss = S · Loss_singular + (1 − S) · Loss_plural, best with 0.6 < S < 0.7
■ Antecedent labels (sketch below)
– Singular if span is singular (gold entity group size)
– Plural if span is plural and antecedent is singular
– Non-coreferent otherwise
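A minimal sketch of these labeling rules in Python; singularity is read off the gold entity group size, and all names are illustrative:

```python
def antecedent_label(span_is_plural, antecedent_is_plural, coreferent):
    """Label an (antecedent, span) pair per the rules above (a sketch)."""
    if not coreferent:
        return "non-coreferent"
    if not span_is_plural:
        return "singular"      # singular span -> singular antecedent link
    if not antecedent_is_plural:
        return "plural"        # plural span pointing to a singular antecedent
    return "non-coreferent"    # plural-plural pairs are not linked
```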
18. Merging clusters + Many antecedents
■ Original clusters from baseline (singular antecedents): Add mi to cluster[mj]
– Then, for each plural prediction, add the span mj to cluster[mi]
– Example: I think we won. You did great. (we,you)→S, (I,we)→P. Clusters: {you,we}, {I,we}
■ Baseline keeps only the top singular antecedent for each span (or the dummy)
– Instead, softmax → keep all antecedents scoring above the dummy (the dummy gets no output label)
– Limit the number of antecedents with maxplural and maxsingular, to curb error accumulation (sketch below)
– Example: I bought it for me, but we could share it.
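A minimal sketch of this selection in Python, assuming one score vector per span and per label type, with a cap applied as maxsingular or maxplural; the names are illustrative:

```python
import torch

def select_antecedents(scores, max_keep):
    """Keep antecedents that outscore the dummy (fixed score 0), capped.

    scores: 1-D tensor of antecedent scores for one span and one label
    type (singular or plural). The cap curbs error accumulation.
    """
    probs = torch.softmax(torch.cat([torch.zeros(1), scores]), dim=0)
    dummy_p = probs[0].item()
    # antecedents more probable than the dummy, best first
    keep = [i for i in range(len(scores)) if probs[i + 1] > dummy_p]
    keep.sort(key=lambda i: probs[i + 1].item(), reverse=True)
    return keep[:max_keep]
```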
19. Antecedent conflicts
■ Clustering: singular antecedents, then plural antecedents
■ Wrong predictions → error propagation
■ New order for clustering (sketch after this list)
– Look at spans in order
– For each span, process its antecedents in order
■ Resolve mention pairs marked as both singular and plural antecedent
– Pick the label with the highest score
■ Example: I think we won. You did great.
■ Later spans, more antecedents
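A minimal sketch of the reordered clustering in Python, assuming per-span prediction lists in document order; all names are illustrative:

```python
def cluster_in_order(spans, preds):
    """Cluster predictions in document order to limit error propagation.

    spans: mentions in document order.
    preds: dict span -> list of (antecedent, label, score) in order,
    with label 'S' or 'P'. A pair carrying both labels keeps only the
    higher-scoring one.
    """
    cluster_of, clusters = {}, []

    def cid_of(m):
        if m not in cluster_of:
            cluster_of[m] = len(clusters)
            clusters.append({m})
        return cluster_of[m]

    for j in spans:
        best = {}  # antecedent -> (label, score), conflicts resolved by score
        for i, label, score in preds.get(j, []):
            if i not in best or score > best[i][1]:
                best[i] = (label, score)
        for i, (label, _) in best.items():
            if label == 'S':
                clusters[cid_of(i)].add(j)   # j joins its singular antecedent
                cluster_of.setdefault(j, cluster_of[i])
            else:
                clusters[cid_of(j)].add(i)   # plural i joins j's cluster
    return clusters
```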
20. New Plural labels
■ Antecedent labels (Zhou and Choi, 2018; sketch after this list)
– Singular if antecedent is singular
– Plural if antecedent is plural and span is singular
– Non-coreferent otherwise
– Cluster first with singular, then with plural antecedents
– This labeling variant: New Plural
■ Reduce error propagation with span ordering
– Combined variant: New Plural + Ordering
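For contrast with the earlier scheme, a minimal sketch of the Zhou-and-Choi-style labels, which key on the antecedent's number rather than the span's; names are illustrative:

```python
def new_plural_label(span_is_plural, antecedent_is_plural, coreferent):
    """New Plural labels: decided by the ANTECEDENT's number (a sketch)."""
    if not coreferent:
        return "non-coreferent"
    if not antecedent_is_plural:
        return "singular"      # singular antecedent
    if not span_is_plural:
        return "plural"        # plural antecedent of a singular span
    return "non-coreferent"    # plural-plural pairs are not linked
```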
25. Analysis
■ Many singular antecedents
– Pros: helps recover missed antecedents; essential for plurals
Example: I, me, I. Missed (I, me). We, I, you. Need (We, I) and (I, you)
– Cons: can introduce wrong links
Example: I, we, you. Wrong (I, you)→S
26. Conclusion
■ Successfully adapt coreference model to Friends corpus
– Plural speakers, singletons, plural mentions
■ Modify singular coreference (end-to-end) for plural coreference
– Gradually identify weaknesses and improve performance
– Labeling techniques for plural antecedents
– Clustering of antecedents
■ First model to achieve end-to-end neural plural coreference resolution
27. Bibliography
Henry Y. Chen, Ethan Zhou, and Jinho D. Choi. Robust coreference resolution and entity linking on dialogues: Character identification on tv show transcripts. In Proceedings of
the 21st Conference on Computational Natural Language Learning (CoNLL 2017), pages 216–225. Association for Computational Linguistics, 2017. doi:
10.18653/v1/K17-1023. URL http://aclweb.org/anthology/K17-1023.
Yu-Hsin Chen and Jinho D. Choi. Character identification on multiparty conversation: Identifying mentions of characters in tv shows. In Proceedings of the 17th Annual
Meeting of the Special Interest Group on Discourse and Dialogue, pages 90–100. Association for Computational Linguistics, 2016. doi: 10.18653/v1/W16-3612. URL
http://aclweb.org/anthology/W16-3612.
Kevin Clark and Christopher D. Manning. Deep reinforcement learning for mention-ranking coreference models. In Proceedings of the 2016 Conference on Empirical Methods
in Natural Language Processing, pages 2256–2262. Association for Computational Linguistics, 2016. doi: 10.18653/v1/D16-1245. URL http://aclweb.org/anthology/D16-1245.
Kevin Clark and Christopher D. Manning. Improving coreference resolution by learning entity-level distributed representations. CoRR, abs/1606.01323, 2016. URL
http://arxiv.org/abs/1606.01323.
Arzoo Katiyar and Claire Cardie. Nested named entity recognition revisited. In Proceedings of the 2018 Conference of the North American Chapter of the Association for
Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), pages 861–871, New Orleans, Louisiana, June 2018. Association for Computational
Linguistics. doi: 10.18653/v1/N18-1079. URL http://www.aclweb.org/anthology/N18-1079.
Kenton Lee, Luheng He, Mike Lewis, and Luke Zettlemoyer. End-to-end neural coreference resolution. CoRR, abs/1707.07045, 2017. URL http://arxiv.org/abs/1707.07045.
Kenton Lee, Luheng He, and Luke Zettlemoyer. Higher-order coreference resolution with coarse-to-fine inference. CoRR, abs/1804.05392, 2018. URL http://arxiv.org/abs/1804.05392.
Sameer Pradhan, Alessandro Moschitti, Nianwen Xue, Olga Uryupina, and Yuchen Zhang. CoNLL-2012 shared task: Modeling multilingual unrestricted coreference in OntoNotes.
In Joint Conference on EMNLP and CoNLL - Shared Task, CoNLL ’12, pages 1–40, Stroudsburg, PA, USA, 2012. Association for Computational Linguistics. URL
http://dl.acm.org/citation.cfm?id=2391181.2391183.
Sam Wiseman, Alexander M. Rush, and Stuart M. Shieber. Learning global features for coreference resolution. CoRR, abs/1604.03035, 2016. URL http://arxiv.org/abs/1604.03035.
Ethan Zhou and Jinho D. Choi. They exist! introducing plural mentions to coreference resolution and entity linking. In Proceedings of the 27th International Conference on
Computational Linguistics, pages 24–34. Association for Computational Linguistics, 2018. URL http://aclweb.org/anthology/C18-1003.
https://nlp.stanford.edu/projects/coref.shtml