Perceptually Grounded Selectional Preferences – Using Flickr Image and Video Tags for Natural Language Semantics
Selectional preferences (SPs) are widely used in natural language processing as a rich source of semantic information. While SPs have traditionally been induced from textual data, human lexical acquisition is known to rely on both linguistic and perceptual experience. We present the first SP learning method that simultaneously draws knowledge from text, images and videos, using image and video descriptions to obtain visual features. Our results show that it outperforms linguistic and visual models in isolation, as well as existing SP induction approaches.
Contact
Katia Shutova: es407@cam.ac.uk, https://www.cl.cam.ac.uk/~es407/
Niket Tandon: ntandon@mpi-inf.mpg.de, https://www.mpi-inf.mpg.de/~ntandon/
Gerard de Melo: gdm@demelo.org, http://gerard.demelo.org
References
1. Philip Resnik (1993). Selection and information: A class-based approach to lexical relationships. Technical report, Univ. of Pennsylvania.
2. Frank Keller & Mirella Lapata (2003). Using the Web to obtain frequencies for unseen bigrams. Comp. Ling. 29(3):459–484.
3. Mats Rooth, Stefan Riezler, Detlef Prescher, Glenn Carroll, Franz Beil (1999). Inducing a semantically annotated lexicon via EM-based clustering. Proc. ACL 1999.
4. Sebastian Padó, Ulrike Padó, Katrin Erk (2007). Flexible, corpus-based modelling of human plausibility judgements. Proc. EMNLP-CoNLL 2007.
5. Diarmuid Ó Séaghdha (2010). Latent variable models of selectional preference. Proc. ACL 2010.
6. Ekaterina Shutova (2010). Automatic metaphor interpretation as a paraphrasing task. Proc. NAACL 2010.
What are Selectional Preferences?
Selectional preferences are semantic constraints of a predicate on its arguments
The authors wrote a new paper. ✔ High plausibility
The paper wrote a new author. ✘ Very low plausibility
The cat is eating your sausage. ✔ High plausibility
The carrot is eating your keys. ✘ Very low plausibility
Knowledge of selectional preferences is useful in many NLP tasks:
● Word Sense Disambiguation
● Parsing (resolving attachments)
● Semantic Role Labelling
● Natural Language Inference
● Detecting multi-word expressions
● Etc.
Collecting Multimodal Correlations
Previous work uses purely text-based methods:
● Problem of topic bias / figurative uses of words: e.g. “cut” mainly occurs with “cost” and “price” as arguments in the British National Corpus (BNC).
● → Skew towards abstract uses, which differ from our everyday experience of cutting
Our Approach: Use Multimodal Data
● BNC for text (parsed with the RASP parser)
● 100 million Flickr images/videos from the Yahoo! Webscope Flickr-100M dataset
Challenge: From a set of Flickr tags to noun–verb pairs
Selectional Preference Model
Step 1: Acquisition of Argument Classes
Observed data is sparse → need to generalize
Spectral clustering of nouns, using Jensen-Shannon divergence as the similarity measure
Step 2: Quantifying Selectional Preferences
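The poster does not spell out how preferences are quantified in Step 2. One standard option, introduced in Resnik (1993) from the reference list, is selectional association: the contribution of each argument class to the KL divergence between P(class | verb) and P(class). The sketch below assumes that measure; the function name and the toy counts are hypothetical and not taken from the poster.

```python
import math
from collections import Counter


def selectional_association(pair_counts, verb):
    """Resnik-style selectional association of `verb` with each argument class.

    pair_counts maps (verb, argument_class) -> co-occurrence count.
    Returns {argument_class: association}; the values sum to 1 whenever the
    verb's class distribution differs from the prior over classes.
    """
    total = sum(pair_counts.values())
    class_totals = Counter()
    for (_, cls), n in pair_counts.items():
        class_totals[cls] += n

    verb_counts = {c: n for (v, c), n in pair_counts.items() if v == verb and n > 0}
    verb_total = sum(verb_counts.values())

    # Per-class terms of KL(P(c | v) || P(c)).
    terms = {}
    for cls, n in verb_counts.items():
        p_c_given_v = n / verb_total
        p_c = class_totals[cls] / total
        terms[cls] = p_c_given_v * math.log(p_c_given_v / p_c)

    strength = sum(terms.values())  # selectional preference strength of the verb
    if strength == 0:
        return {cls: 0.0 for cls in terms}
    return {cls: t / strength for cls, t in terms.items()}


# Toy counts (made up for illustration): classes stand in for noun clusters.
toy_counts = {
    ("eat", "food"): 9,
    ("eat", "tool"): 1,
    ("cut", "food"): 3,
    ("cut", "tool"): 7,
}
eat_assoc = selectional_association(toy_counts, "eat")
```

With these toy counts, "eat" receives a positive association with the "food" class and a negative one with the "tool" class, which is the kind of contrast a plausibility ranking would rely on.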
Application to Metaphor Interpretation
Shutova (2010) approach: metaphor interpretation as paraphrasing
“a carelessly leaked report” → “a carelessly disclosed report”
1) Take maximum likelihood candidate verbs
2) Filter by semantic similarity to target verb
3) Filter for a strong selectional preference fit (assuming it indicates literalness or conventionality) so as to remove figurative uses
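The three paraphrasing steps can be read as a small filter pipeline. The sketch below is only an illustration of that reading: the function name, the thresholds, and all scores are invented, and treating step 3 as "keep positive fits, then rank by fit" is an assumption rather than the procedure of Shutova (2010).

```python
def paraphrase_candidates(likelihood, similarity, sp_fit, top_k=4, min_similarity=0.3):
    """Rank literal paraphrases for a metaphorically used verb.

    likelihood: candidate verb -> likelihood of the verb in this context
    similarity: candidate verb -> semantic similarity to the target verb
    sp_fit:     candidate verb -> selectional preference fit with the argument
    """
    # 1) Take the most likely candidate verbs.
    ranked = sorted(likelihood, key=likelihood.get, reverse=True)[:top_k]
    # 2) Keep only candidates that are semantically similar to the target verb.
    similar = [v for v in ranked if similarity.get(v, 0.0) >= min_similarity]
    # 3) Keep candidates with a positive preference fit and rank by that fit,
    #    on the assumption that a strong fit signals a literal use.
    literal = [v for v in similar if sp_fit.get(v, 0.0) > 0.0]
    return sorted(literal, key=lambda v: sp_fit[v], reverse=True)


# Hypothetical scores for the target "leak" with the argument "report".
likelihood = {"write": 0.40, "publish": 0.35, "disclose": 0.30, "spill": 0.10, "drip": 0.05}
similarity = {"write": 0.1, "publish": 0.4, "disclose": 0.6, "spill": 0.5, "drip": 0.5}
sp_fit = {"write": 0.6, "publish": 0.4, "disclose": 0.5, "spill": -0.2, "drip": -0.3}

best = paraphrase_candidates(likelihood, similarity, sp_fit)
```

In this toy run, "write" is dropped for low similarity to "leak" and "spill" for a poor fit with "report", leaving "disclose" ranked ahead of "publish".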
Conclusions
Multimodal selectional preferences outperform
● purely linguistic and visual models, and
● previous state-of-the-art models
Direct Evaluation
Method                     Seen Dataset   Unseen Dataset
Rooth et al. (1999) EM     0.487          0.520
Padó et al. (2007) VSM     0.490          0.430
Ó Séaghdha (2010) LDA      0.548          0.605
Visual Model               0.126          0.132
Linguistic Model           0.688          0.559
Interpolated Model         0.728          0.430
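The direct evaluation scores are Spearman rank correlations between model scores and human plausibility judgements. As a reminder of what that number measures, here is a minimal pure-Python sketch (Pearson correlation computed on average ranks); the example scores and ratings are invented and are not the Keller & Lapata (2003) data.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sd_x * sd_y)


def spearman_rho(model_scores, human_ratings):
    """Spearman's rho: Pearson correlation of the two rank vectors."""
    return pearson(average_ranks(model_scores), average_ranks(human_ratings))


# Invented example: four verb-noun pairs scored by a model and rated by humans.
model_scores = [0.9, 0.1, 0.5, 0.3]
human_ratings = [6.5, 1.2, 4.0, 4.5]
rho = spearman_rho(model_scores, human_ratings)
```

Because only ranks matter, the measure rewards a model for ordering pairs the way humans do, regardless of the scale of its scores.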
Example Flickr tags: mother, sitting, baby, lap, rachel lind, wristwatch, pajamas, Clothes, etc.
Ekaterina Shutova (University of Cambridge)
Niket Tandon (Max Planck Institute for Informatics)
Gerard de Melo (Tsinghua University)
                                                     Shutova (2010)   LSP    ISP
Mean Avg. Prec. (MAP) on Shutova (2010) gold data    0.62             0.62   0.65
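For readers unfamiliar with the metric, mean average precision can be sketched as follows. This uses the common information-retrieval definition (precision at each correct item, averaged over the gold items, then averaged over queries); whether the evaluation behind the figures above normalizes in exactly this way is an assumption, and the ranked lists and gold sets are invented.

```python
def average_precision(ranked, gold):
    """Average of precision@k taken at each rank k that holds a gold item."""
    if not gold:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in gold:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(gold)


def mean_average_precision(queries):
    """queries: list of (ranked_candidates, gold_set) pairs."""
    if not queries:
        return 0.0
    return sum(average_precision(r, g) for r, g in queries) / len(queries)


# Invented paraphrase rankings for two metaphorical expressions.
queries = [
    (["disclose", "drip", "publish"], {"disclose", "publish"}),
    (["grab", "seize"], {"seize"}),
]
map_score = mean_average_precision(queries)
```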
Results on Keller & Lapata (2003) Datasets (Spearman Rho)
Visual Features: verb lemmas co-occurring with nouns
Linguistic Features: grammatical relations
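Step 1 of the model clusters nouns by comparing their feature distributions with Jensen-Shannon divergence. A minimal sketch of that comparison is given below, assuming each noun is represented by counts over features such as co-occurring verb lemmas; the function names and counts are hypothetical, and the spectral clustering step itself is not shown.

```python
import math


def normalize(counts):
    """Turn a feature -> count mapping into a probability distribution."""
    total = sum(counts.values())
    return {feature: n / total for feature, n in counts.items()}


def kl_divergence(p, q):
    """KL(p || q) in bits; q must be non-zero wherever p is non-zero."""
    return sum(pv * math.log2(pv / q[k]) for k, pv in p.items() if pv > 0)


def jensen_shannon_divergence(p_counts, q_counts):
    """Symmetric, bounded divergence (0 to 1 in base 2) between two nouns."""
    p, q = normalize(p_counts), normalize(q_counts)
    features = set(p) | set(q)
    p = {f: p.get(f, 0.0) for f in features}
    q = {f: q.get(f, 0.0) for f in features}
    # The mixture m is non-zero wherever p or q is, so both KL terms are finite.
    m = {f: 0.5 * (p[f] + q[f]) for f in features}
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Made-up verb co-occurrence counts for three nouns.
knife = {"cut": 8, "hold": 2}
scissors = {"cut": 7, "hold": 3}
baby = {"sit": 6, "hold": 4}

close_pair = jensen_shannon_divergence(knife, scissors)
distant_pair = jensen_shannon_divergence(knife, baby)
```

A clustering algorithm needs a similarity rather than a divergence, so in practice the value would be converted, for example as 1 minus the divergence; that conversion is likewise an assumption here.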
Approach
1) Stemming
2) Filtering: remove rare words and named entities
3) POS tagging: by jointly disambiguating tags to WordNet synsets so as to maximize coherence
Figure labels: WordNet, priors, similarities
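Once tags are stemmed, filtered and POS-tagged, noun–verb pairs can be collected by counting which verbs and nouns appear together in the same tag set. The sketch below assumes the tags are already lemmatized and uses a hand-written, deliberately partial part-of-speech lexicon in place of the WordNet-based disambiguation; the function name, the lexicon, and the frequency threshold are all hypothetical.

```python
from collections import Counter
from itertools import product


def noun_verb_cooccurrences(tag_sets, pos_lexicon, min_tag_freq=1):
    """Count (verb, noun) pairs that share an image or video tag set.

    tag_sets:     list of tag lists, one per image or video
    pos_lexicon:  tag -> "noun" or "verb"; tags missing from it are dropped
    min_tag_freq: tags occurring in fewer tag sets than this are dropped
    """
    tag_freq = Counter(tag for tags in tag_sets for tag in set(tags))
    counts = Counter()
    for tags in tag_sets:
        kept = {t for t in tags if tag_freq[t] >= min_tag_freq and t in pos_lexicon}
        nouns = sorted(t for t in kept if pos_lexicon[t] == "noun")
        verbs = sorted(t for t in kept if pos_lexicon[t] == "verb")
        for verb, noun in product(verbs, nouns):
            counts[(verb, noun)] += 1
    return counts


# Tag sets loosely based on the example tags shown on the poster.
tag_sets = [
    ["mother", "sit", "baby", "lap", "wristwatch", "pajamas"],
    ["ball", "serve", "racket", "yellow", "website"],
]
pos_lexicon = {
    "mother": "noun", "baby": "noun", "lap": "noun", "wristwatch": "noun",
    "pajamas": "noun", "ball": "noun", "racket": "noun",
    "sit": "verb", "serve": "verb",
}
pairs = noun_verb_cooccurrences(tag_sets, pos_lexicon)
```

Counts of this kind correspond to the visual features named on the poster, verb lemmas co-occurring with nouns.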
https://www.flickr.com/photos/seandreilinger/465827703/
Example Flickr tags: canon, rebel, 400D, ball, portfolio, yellow, serve, website, racket, roland, garros, etc.
https://www.flickr.com/photos/pysanchis/2521372121/