This document summarizes Nathan Schneider's presentation on preposition semantics: the challenges of comprehensively annotating preposition meanings in corpora, and approaches to their semantic description and automatic disambiguation. It presents Schneider's unified semantic scheme for prepositions and possessives (SNACS), consisting of 50 semantic classes applied to a corpus of English web reviews. Inter-annotator agreement for the new corpus was 78%. Models for preposition disambiguation were evaluated, with the feature-rich linear model achieving the highest accuracy, around 80%.
Preposition Semantics: Challenges in Comprehensive Corpus Annotation and Automatic Disambiguation
1. The Ins and Outs of Preposition Semantics: Challenges in Comprehensive Corpus Annotation and Automatic Disambiguation
Nathan Schneider, Georgetown University
@ DC-NLP Meetup, 14 October 2019
nert.georgetown.edu
2. What linguistically-inspired analyses can be obtained from humans and machines (accurately, robustly, efficiently, comprehensively) in text corpora across domains & languages?
3. What does meaning look like?
Example meaning representations from the literature:
• QA-SRL [He et al. 2015; Ribeiro et al. 2019]: "UCD finished the 2006 championship as Dublin champions" → ANSWERS: "What did someone finish something as?"
• CCG categories produced from a logical form such as arg max(λx.state(x) ∧ borders(x, texas), λx.size(x)) [Zettlemoyer & Collins 2005], e.g.:
  NP : texas
  N : λx.state(x)
  S\NP : λx.state(x)
  (S\NP)/NP : λx.λy.borders(y, x)
  (S\NP)/NP : λx.λy.borders(x, y)
  N/N : λg.λx.state(x) ∧ g(x)
  N/N : λg.λx.borders(x, texas) ∧ g(x)
  (N\N)/NP : λg.λx.λy.borders(x, y) ∧ g(x)
  NP/N : λg.arg max(g, λx.size(x))
  S/NP : λx.size(x)
• ERS and AMR representations of the same sentence [Bender et al. 2015]:
  ⟨ h1,
    h4:_every_q⟨0:5⟩(ARG0 x6, RSTR h7, BODY h5),
    h8:_person_n_1⟨6:12⟩(ARG0 x6),
    h2:_fail_v_1⟨13:19⟩(ARG0 e3, ARG1 h9),
    h10:_eat_v_1⟨23:27⟩(ARG0 e11, ARG1 x6, ARG2 i12)
    { h1 =q h2, h7 =q h8, h9 =q h10 } ⟩
  (e / eat-01
    :polarity -
    :ARG0 (p / person
      :mod (a / every)))
6. adposition = preposition | postposition
English: in, on, at, by, for, to, of, with, from, about, …
French: de, dans, sur, pour, avec, à, …
Hindi: kā, ko, ne, se, mẽ, par, tak, …
Korean: (n)eun, i/ga, do, (r)eul, …
7. Feature 85A: Order of Adposition and Noun Phrase
Dryer, in WALS: http://wals.info/chapter/85
11. "Senator Dianne Feinstein laying the groundwork to sue DOJ for release of the whistleblower report"
https://www.dailykos.com/stories/2019/9/21/1886964/-Senator-Dianne-Feinstein-laying-the-groundwork-to-sue-DOJ-for-release-of-the-whistleblower-report
17. With great frequency comes great polysemy.
• leave for Paris: spatial (goal/destination); cf. go to Paris
• ate for hours: temporal (duration); cf. ate over most of an hour
• a gift for mother: recipient; cf. give the gift to mother
• go to the store for eggs: purpose; cf. go to the store to buy eggs
• pay/search for the eggs: theme; cf. spend money on the eggs
19. Labeling Ambiguity
• Purpose vs. Explanation: "sue DOJ for release of the whistleblower report"
• Beneficiary vs. Possession: "shop for your baby"
20. Prepositions
What people think I do vs. what I actually do:
space · time · causality · comparison · identity · meronymy · possession · emotion · perception · cognition · communication · social relations · benefaction · measurement · pragmatics …
21. Interesting for NLP
• Syntactic parsing (PP attachment)
• Semantic role labeling/semantic parsing → NLU
‣ The meaning distinctions that languages tend to grammaticalize
• Second language acquisition/grammatical error correction
• Machine translation
‣ MT into English: mistranslation of prepositions is among the most common errors [Hashemi & Hwa 2014; Popović 2017]
23. Approaches to Semantic Description/Disambiguation of Prepositions
• Sense-based, e.g. The Preposition Project and spinoffs [Litkowski & Hargraves 2005, 2007; Litkowski 2014; Ye & Baldwin 2007; Saint-Dizier 2006; Dahlmeier et al. 2009; Tratz & Hovy 2009; Hovy et al. 2010, 2011; Tratz & Hovy 2013]
‣ Polysemy networks [e.g. Brugman 1981; Lakoff 1987; Tyler & Evans 2001]
‣ Space and time [Herskovits 1986; Regier 1996; Zwarts & Winter 2000; Bowerman & Choi 2003; Khetarpal et al. 2009; Xu & Kemp 2010]
• Class-based [Moldovan et al. 2004; Badulescu & Moldovan 2009; O'Hara & Wiebe 2009; Srikumar & Roth 2011, 2013; Müller et al. 2012 for German]
• Our work is the first class-based approach that is comprehensive w.r.t. tokens AND types [Schneider et al. 2015, 2016, 2018; Hwang et al. 2017]
25. Our Approach
• Coarse-grained supersenses
‣ The cat is on (Locus) the mat in (Locus) the kitchen on (Time) a Sunday in (Time) the afternoon
• Comprehensive with respect to naturally occurring text
• Unified scheme for prepositions and possessives
‣ the pages of the book / the book's pages (Whole)
• Scene role and preposition's lexical contribution are distinguished
26. Semantic Network of Adposition and Case Supersenses (SNACS) [Schneider et al., ACL 2018]
1.2 Inventory
The v2 hierarchy is a tree with 50 labels, organized into three major subhierarchies: CIRCUMSTANCE (18 labels), PARTICIPANT (14 labels), and CONFIGURATION (18 labels).
• Circumstance: Temporal, Time, StartTime, EndTime, Frequency, Duration, Interval, Locus, Source, Goal, Path, Direction, Extent, Means, Manner, Explanation, Purpose
• Participant: Causer, Agent, Co-Agent, Theme, Co-Theme, Topic, Stimulus, Experiencer, Originator, Recipient, Cost, Beneficiary, Instrument
• Configuration: Identity, Species, Gestalt, Possessor, Whole, Characteristic, Possession, PartPortion, Stuff, Accompanier, InsteadOf, ComparisonRef, RateUnit, Quantity, Approximator, SocialRel, OrgRole
• Items in the CIRCUMSTANCE subhierarchy are prototypically expressed as adjuncts of time, place, manner, purpose, etc. elaborating an event or entity.
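Since a later slide measures agreement "in the same region of the hierarchy", here is a minimal Python sketch of the top-level grouping above as a data structure that supports such checks. The label lists come straight from the inventory; the helper names and layout are illustrative assumptions, not part of any official SNACS tooling.

```python
# Minimal sketch: SNACS v2 labels grouped by top-level subhierarchy
# (from the inventory above; each root label is included in its group).
SUBHIERARCHIES = {
    "Circumstance": {
        "Circumstance", "Temporal", "Time", "StartTime", "EndTime",
        "Frequency", "Duration", "Interval", "Locus", "Source", "Goal",
        "Path", "Direction", "Extent", "Means", "Manner", "Explanation",
        "Purpose",
    },
    "Participant": {
        "Participant", "Causer", "Agent", "Co-Agent", "Theme", "Co-Theme",
        "Topic", "Stimulus", "Experiencer", "Originator", "Recipient",
        "Cost", "Beneficiary", "Instrument",
    },
    "Configuration": {
        "Configuration", "Identity", "Species", "Gestalt", "Possessor",
        "Whole", "Characteristic", "Possession", "PartPortion", "Stuff",
        "Accompanier", "InsteadOf", "ComparisonRef", "RateUnit",
        "Quantity", "Approximator", "SocialRel", "OrgRole",
    },
}

def region(label: str) -> str:
    """Return the major subhierarchy a supersense label belongs to."""
    for top, members in SUBHIERARCHIES.items():
        if label in members:
            return top
    raise KeyError(f"unknown supersense: {label}")

def same_region(a: str, b: str) -> bool:
    """True if two labels fall under the same major subhierarchy."""
    return region(a) == region(b)

assert same_region("Duration", "Purpose")         # both Circumstance
assert not same_region("Recipient", "Possessor")  # Participant vs. Configuration
```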
27. Annotation (#BenderRule)
• We fully annotated an English corpus of web reviews
‣ Original annotators were CU Boulder students with prior linguistic annotation experience [Schneider et al. 2016]
‣ "The main annotation was divided into 34 batches of 100 sentences." ≈1 hr / 100 sentences / annotator
‣ "Original IAA for most of these batches fell between 60% and 78%, depending on factors such as the identities of the annotators and when the annotation took place (annotator experience and PrepWiki documentation improved over time)."
‣ The original hierarchy had 75 categories. As the supersense categories were revised (down to 50), we updated the annotations.
28. STREUSLE: Supersense-Tagged Repository of English with a Unified Semantics for Lexical Expressions (tiny.cc/streusle)
✦ 55k words of English web reviews
✴ 3,000 strong MWE mentions
✴ 700 weak MWE mentions
✴ 9,000 noun mentions
✴ 8,000 verb mentions
✴ 4,000 prepositions
✴ 1,000 possessives
29. STREUSLE Examples [Schneider et al., ACL 2018] (simplified slightly)
• Three weeks ago (Time), burglars tried to gain_entry into (Goal) the rear of (Whole) my (Possessor) home.
• Mrs._Tolchin provided us with (Theme) excellent service and came with (Characteristic) a_great_deal of (Quantity) knowledge and professionalism!
30. Guidelines
• The description of the 50 supersenses as applied to English currently stands at 85 pages (https://arxiv.org/abs/1704.02134)
‣ Examples, criteria for borderline cases, special constructions
• Currently beta-testing a website that provides browsable guidelines + an adpositions database + corpus annotations
32. Preposition types
Multiword: out of, at least, due to, because of, at all, less than, as soon as, all over, instead of, other than, next to, nothing but, in front of, more than, more like, along with, just about, thanks to, rather than, as to, such as, as long as, aside from, …
Single-word: to, of, in, for, with, on, at, from, about, like, by, after, as, back, before, over, than, since, around, into, out, without, off, until, through, ago, away, within, during, …
33. Distribution: P types
P tokens in web reviews (STREUSLE 4.1); excludes PP idioms, possessives, ??, etc.
[Pie chart of preposition type frequencies: of 508, for 499, in 488, to 357, with 336, at 225, on 199, from 158; also about, as, like, after, by, …]
34. Distribution: P types vs. SNACS
P tokens in web reviews (STREUSLE 4.1); excludes PP idioms, possessives, ??, etc.
[Two pie charts over the same 3948 tokens: 117 preposition types (of 508, for 499, in 488, to 357, with 336, at 225, on 199, from 158, …) vs. 47 supersenses (Locus, Time, Goal, Topic, Theme, Quantity, ComparisonRef, Purpose, Direction, Stimulus, Explanation, …)]
35. Possessives [Blodgett & Schneider, LREC 2018]
• Previous literature on annotating the semantics of possessives [Badulescu & Moldovan 2009; Tratz & Hovy 2013]
• The preposition of occupies similar semantic space and sometimes alternates with 's:
‣ the pages of the book / the book's pages
‣ the murder of the boy / the boy's murder
• We applied SNACS to annotate all instances of s-genitives (possessive 's and possessive pronouns): 1116 tokens
‣ Can inform linguistic studies of the genitive alternation [Rosenbach 2002; Stefanowitsch 2003; Shih et al. 2012; Wolk et al. 2013]
36. Interannotator Agreement: New Corpus & Genre [Schneider et al., ACL 2018]
After a few rounds of pilot annotation on The Little Prince and minor additions to the guidelines: 78% on 216 unseen targets
‣ 5 annotators, varied familiarity with the scheme
‣ Exact agreement (avg. pairwise; computed as in the sketch below): 74.4% on roles, 81.3% on functions
✴ In the same region of the hierarchy 93% of the time
✴ Most similar pair of annotators: 78.7% on roles, 88.0% on functions
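For concreteness, "exact agreement (avg. pairwise)" can be computed as in this minimal sketch. The assumption that annotations are stored as equal-length per-annotator label lists over the same targets is mine, not the paper's.

```python
from itertools import combinations

def avg_pairwise_agreement(annotations: dict[str, list[str]]) -> float:
    """Average, over annotator pairs, of the fraction of targets on
    which the two annotators chose exactly the same label."""
    pair_scores = []
    for (_, a), (_, b) in combinations(annotations.items(), 2):
        assert len(a) == len(b), "all annotators label the same targets"
        matches = sum(x == y for x, y in zip(a, b))
        pair_scores.append(matches / len(a))
    return sum(pair_scores) / len(pair_scores)

# Toy example with three annotators over four targets:
print(avg_pairwise_agreement({
    "ann1": ["Locus", "Time", "Goal", "Theme"],
    "ann2": ["Locus", "Time", "Goal", "Topic"],
    "ann3": ["Locus", "Duration", "Goal", "Theme"],
}))  # (0.75 + 0.75 + 0.5) / 3 ≈ 0.667
```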
38. Models [Schneider et al., ACL 2018]
• Most Frequent (MF) baseline: deterministically predict the label most often seen for the preposition in training (sketched below)
• Neural: BiLSTM + multilayer perceptron classifier
‣ word2vec embeddings
‣ similar architecture to previous work on coarsened preposition disambiguation [Gonen & Goldberg 2016]
• Feature-rich linear: SVM with features based on previous work [Srikumar & Roth 2013]
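The MF baseline is fully specified by the bullet above, so a minimal sketch is easy to give. The (preposition, supersense) pair format and the fallback for unseen prepositions are my assumptions.

```python
from collections import Counter, defaultdict

def train_mf_baseline(pairs):
    """pairs: iterable of (preposition, supersense) training tokens.
    Returns a dict mapping each preposition to its most frequent label."""
    counts = defaultdict(Counter)
    for prep, label in pairs:
        counts[prep.lower()][label] += 1
    return {prep: c.most_common(1)[0][0] for prep, c in counts.items()}

def predict_mf(model, prep, global_fallback="Locus"):
    # Fallback for unseen prepositions is an assumption (here: the
    # corpus-wide most frequent label, which slide 34 suggests is Locus).
    return model.get(prep.lower(), global_fallback)

model = train_mf_baseline([
    ("for", "Duration"), ("for", "Purpose"), ("for", "Purpose"),
    ("to", "Goal"),
])
assert predict_mf(model, "for") == "Purpose"
```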
39. Disambiguation [Schneider et al., ACL 2018]
• Accuracy using gold targets & gold syntax
[Bar chart, accuracy on a 0–90% scale, comparing Most Frequent, Neural, and Feature-rich linear models on Role, Fxn, and Full; the feature-rich linear model scores highest, around 80%]
40. Frequent Errors [Schneider et al., ACL 2018]
• Locus, the most frequent label, is overpredicted
• SocialRel vs. OrgRole
• Gestalt vs. Possessor
41. Recent Developments: Embeddings
• Liu et al. NAACL 2019, "Linguistic Knowledge and Transferability of Contextual Representations" (Liu, Gardner, Belinkov, Peters, Smith): contextualized embeddings help a lot for preposition supersense disambiguation
[Table 1 of Liu et al. 2019: performance of the best layerwise probing model per contextualizer vs. a GloVe baseline and the previous, non-pretrained state of the art; PS-Role/PS-Fxn = preposition supersense role/function]

Pretrained representation                   Avg.   CCG    PTB    EWT   Chunk   NER    ST    GED  PS-Role PS-Fxn   EF
ELMo (original), best layer                81.58  93.31  97.26  95.61  90.04  82.85  93.82  29.37  75.44  84.87  73.20
ELMo (4-layer), best layer                 81.58  93.81  97.31  95.60  89.78  82.06  94.18  29.24  74.78  85.96  73.03
ELMo (transformer), best layer             80.97  92.68  97.09  95.13  93.06  81.21  93.78  30.80  72.81  82.24  70.88
OpenAI transformer, best layer             75.01  82.69  93.82  91.28  86.06  58.14  87.81  33.10  66.23  76.97  74.03
BERT (base, cased), best layer             84.09  93.67  96.95  95.21  92.64  82.71  93.72  43.30  79.61  87.94  75.11
BERT (large, cased), best layer            85.07  94.28  96.73  95.80  93.64  84.44  93.83  46.46  79.17  90.13  76.25
GloVe (840B.300d)                          59.94  71.58  90.49  83.93  62.28  53.22  80.92  14.94  40.79  51.54  49.70
Previous state of the art (no pretraining) 83.44  94.70  97.96  95.82  95.77  91.38  95.15  39.83  66.89  78.29  77.10

[Also from Liu et al. 2019: an MLP (1024d) probing model beats a linear probe, e.g. NER 87.19 vs. 82.85, GED 47.45 vs. 29.37]
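Probing results like the table above come from training a lightweight classifier on frozen contextual vectors. Below is a minimal PyTorch sketch of a linear probe; this is my simplification for illustration, not Liu et al.'s code (their setup differs in details such as per-layer selection), and the stand-in tensors replace a real frozen contextualizer.

```python
import torch
import torch.nn as nn

class LinearProbe(nn.Module):
    """Linear probing classifier: frozen contextual embeddings in,
    supersense label out."""
    def __init__(self, embed_dim: int, num_labels: int):
        super().__init__()
        self.out = nn.Linear(embed_dim, num_labels)

    def forward(self, token_vecs: torch.Tensor) -> torch.Tensor:
        return self.out(token_vecs)  # logits over supersense labels

# Only the probe's parameters are trained; the contextualizer stays frozen.
probe = LinearProbe(embed_dim=1024, num_labels=50)
optimizer = torch.optim.Adam(probe.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

token_vecs = torch.randn(7, 1024)  # stand-in for frozen contextual vectors
gold = torch.randint(0, 50, (7,))  # stand-in supersense label ids
loss = loss_fn(probe(token_vecs), gold)
loss.backward()
optimizer.step()
```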
42. Recent Developments: Embeddings
• Kim et al. *SEM 2019, "Probing What Different NLP Tasks Teach Machines about Function Word Comprehension" (Kim, Patel, Poliak, Wang, Xia, McCoy, Tenney, Ross, Linzen, Van Durme, Bowman, Pavlick): the way embeddings are pretrained matters a lot for learning preposition meanings in NLI
‣ E.g., pretraining for image captioning fails to capture abstract preposition senses
43. Case and Adposition Representation for Multi-Lingual Semantics (CARMLS)
Can we use the supersenses for case markers and adpositions in other languages?
44. English and French adpositions carve up the same semantic space differently: English for and to vs. French pendant, pour, and à, over examples such as ate for hours; raise money to buy a house; leave for Paris; a gift for mother; raise money for the church; give the gift to mother; go to Paris.
45. The same examples with supersense labels bridging the two languages (a lookup-table sketch follows below):
• Duration: ate for hours (EN for; FR pendant)
• Purpose: raise money to buy a house (EN to); raise money for the church (EN for; FR pour)
• Recipient: a gift for mother (EN for; FR pour); give the gift to mother (EN to; FR à)
• Destination: leave for Paris (EN for); go to Paris (EN to; FR à)
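One way to operationalize this kind of cross-lingual mapping is as a lookup from (language, adposition) to the supersenses it can express. Here is a minimal Python sketch populated only with the examples from the slide; the data structure and helper are illustrative assumptions, not an exhaustive lexicon.

```python
# Minimal sketch: the slide's EN/FR mapping as a lookup table.
ADPOSITION_SUPERSENSES = {
    ("en", "for"):     {"Duration", "Purpose", "Recipient", "Destination"},
    ("en", "to"):      {"Purpose", "Recipient", "Destination"},
    ("fr", "pendant"): {"Duration"},
    ("fr", "pour"):    {"Purpose", "Recipient"},
    ("fr", "à"):       {"Recipient", "Destination"},
}

def shared_senses(a, b):
    """Supersenses that two (language, adposition) pairs both express."""
    return ADPOSITION_SUPERSENSES[a] & ADPOSITION_SUPERSENSES[b]

print(shared_senses(("en", "for"), ("fr", "pour")))  # {'Purpose', 'Recipient'}
```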
46. Challenges
• Defining the phenomena of interest (what exactly counts as an adposition/case marker?)
• Morphology
• Application of supersenses
‣ Including semantics/pragmatics not expressed adpositionally in English!
47. • Korean (also Japanese) has postpositions that signal pragmatic focus similar to English 'only', 'also', 'even', etc.
빵도 먹어요 (ppang-do meogeoyo): bread-do eat = "eat bread also (as well as other things)"
빵만 먹어요 (ppang-man meogeoyo): bread-man eat = "eat only bread"
‣ Semantic/pragmatic territory not covered by English adpositions!
‣ Proposed solution: additional supersense(s).
48. Further Questions
1. Descriptive linguistics: What are the similarities and differences in adposition/case semantics across languages?
2. Annotation efficiency: Can human annotation be simplified or partially crowdsourced/automated without sacrificing quality?
3. Cross-lingual modeling and applications
49. Looking Ahead
• Adposition use in L2 English
• Beyond adpositions/possessives: general semantic roles
‣ Preliminary results on annotation of English subjects and objects [Shalev et al. 2019]
• Integration with graph-structured semantic representations [Prange et al. CoNLL 2019]
51. Adpositions & case markers are an important challenge for NLP!
[SNACS hierarchy figure repeated; see slide 26: 50 labels in three major subhierarchies]
• Items in the CIRCUMSTANCE subhierarchy are prototypically expressed as adjuncts of time, place, manner, purpose, etc. elaborating an event or entity.
• Items in the PARTICIPANT subhierarchy are prototypically entities functioning as arguments to an event.
• Items in the CONFIGURATION subhierarchy are prototypically entities or properties in a static relationship to some entity.
53. Acknowledgments (nert)
CU annotators: Julia Bonn, Evan Coles-Harris, Audrey Farber, Nicole Gordiyenko, Megan Hutto, Celeste Smitz, Tim Watervoort
CMU pilot annotators: Archna Bhatia, Carlos Ramírez, Yulia Tsvetkov, Michael Mordowanec, Matt Gardner, Spencer Onuffer, Nora Kazour
Special thanks: Noah Smith, Mark Steedman, Claire Bonial, Tim Baldwin, Miriam Butt, Chris Dyer, Ed Hovy, Lingpeng Kong, Lori Levin, Ken Litkowski, Orin Hargraves, Michael Ellsworth, Dipanjan Das & Google
tiny.cc/streusle