This document summarizes Nathan Schneider's presentation on preposition semantics: the challenges of comprehensively annotating preposition meanings in corpora, and approaches to their semantic description and automatic disambiguation. It presents Schneider's unified semantic scheme for prepositions and possessives (SNACS), consisting of 50 semantic classes applied to a corpus of English web reviews. Inter-annotator agreement for the new corpus was 78%. Models for preposition disambiguation were evaluated, with the feature-rich linear model achieving the highest accuracy, around 80%.
Preposition Semantics: Challenges in Comprehensive Corpus Annotation and Automatic Disambiguation
1. The Ins and Outs of Preposition Semantics: Challenges in Comprehensive Corpus Annotation and Automatic Disambiguation
Nathan Schneider, Georgetown University
@ DC-NLP Meetup, 14 October 2019
nert.georgetown.edu
2. What linguistically-inspired analyses can be obtained from humans and machines (accurately, robustly, efficiently, comprehensively) in text corpora across domains & languages?
3. What does meaning look like?
Example meaning representations from the literature:
• QA-SRL [He et al. 2015; Ribeiro et al. 2019]: "UCD finished the 2006 championship as Dublin champions" → ANSWERS: "What did someone finish something as?"
• CCG categories produced from a logical form such as arg max(λx.state(x) ∧ borders(x, texas), λx.size(x)) [Zettlemoyer & Collins 2005], e.g.:
  NP : texas
  N : λx.state(x)
  S\NP : λx.state(x)
  (S\NP)/NP : λx.λy.borders(y, x)
  (S\NP)/NP : λx.λy.borders(x, y)
  N/N : λg.λx.state(x) ∧ g(x)
  N/N : λg.λx.borders(x, texas) ∧ g(x)
  (N\N)/NP : λg.λx.λy.borders(x, y) ∧ g(x)
  NP/N : λg.arg max(g, λx.size(x))
  S/NP : λx.size(x)
• ERS and AMR representations of the same sentence [Bender et al. 2015]:
  ⟨ h1,
    h4:_every_q⟨0:5⟩(ARG0 x6, RSTR h7, BODY h5),
    h8:_person_n_1⟨6:12⟩(ARG0 x6),
    h2:_fail_v_1⟨13:19⟩(ARG0 e3, ARG1 h9),
    h10:_eat_v_1⟨23:27⟩(ARG0 e11, ARG1 x6, ARG2 i12)
    { h1 =q h2, h7 =q h8, h9 =q h10 } ⟩
  (e / eat-01
    :polarity -
    :ARG0 (p / person
      :mod (a / every)))
6. adposition = preposition | postposition
English: in, on, at, by, for, to, of, with, from, about, …
French: de, dans, sur, pour, avec, à, …
Hindi: kā, ko, ne, se, mẽ, par, tak, …
Korean: (n)eun, i/ga, do, (r)eul, …
7. Feature 85A: Order of Adposition and Noun Phrase
Dryer, in WALS: http://wals.info/chapter/85
11. "Senator Dianne Feinstein laying the groundwork to sue DOJ for release of the whistleblower report"
https://www.dailykos.com/stories/2019/9/21/1886964/-Senator-Dianne-Feinstein-laying-the-groundwork-to-sue-DOJ-for-release-of-the-whistleblower-report
17. With great frequency comes great polysemy.
• leave for Paris: spatial (goal/destination); cf. go to Paris
• ate for hours: temporal (duration); cf. ate over most of an hour
• a gift for mother: recipient; cf. give the gift to mother
• go to the store for eggs: purpose; cf. go to the store to buy eggs
• pay/search for the eggs: theme; cf. spend money on the eggs
19. Labeling Ambiguity
• Purpose vs. Explanation: "sue DOJ for release of the whistleblower report"
• Beneficiary vs. Possession: "shop for your baby"
20. Prepositions
What people think I do vs. what I actually do:
space · time · causality · comparison · identity · meronymy · possession · emotion · perception · cognition · communication · social relations · benefaction · measurement · pragmatics …
21. Interesting for NLP
• Syntactic parsing (PP attachment)
• Semantic role labeling/semantic parsing → NLU
‣ The meaning distinctions that languages tend to grammaticalize
• Second language acquisition/grammatical error correction
• Machine translation
‣ MT into English: mistranslation of prepositions is among the most common errors [Hashemi & Hwa 2014; Popović 2017]
23. Approaches to Semantic Description/Disambiguation of Prepositions
• Sense-based, e.g. The Preposition Project and spinoffs [Litkowski & Hargraves 2005, 2007; Litkowski 2014; Ye & Baldwin 2007; Saint-Dizier 2006; Dahlmeier et al. 2009; Tratz & Hovy 2009; Hovy et al. 2010, 2011; Tratz & Hovy 2013]
‣ Polysemy networks [e.g. Brugman 1981; Lakoff 1987; Tyler & Evans 2001]
‣ Space and time [Herskovits 1986; Regier 1996; Zwarts & Winter 2000; Bowerman & Choi 2003; Khetarpal et al. 2009; Xu & Kemp 2010]
• Class-based [Moldovan et al. 2004; Badulescu & Moldovan 2009; O'Hara & Wiebe 2009; Srikumar & Roth 2011, 2013; Müller et al. 2012 for German]
• Our work is the first class-based approach that is comprehensive w.r.t. tokens AND types [Schneider et al. 2015, 2016, 2018; Hwang et al. 2017]
25. Our Approach
• Coarse-grained supersenses
‣ The cat is on (Locus) the mat in (Locus) the kitchen on (Time) a Sunday in (Time) the afternoon
• Comprehensive with respect to naturally occurring text
• Unified scheme for prepositions and possessives
‣ the pages of the book / the book's pages (Whole)
• Scene role and preposition's lexical contribution are distinguished
26. Semantic Network of Adposition and Case Supersenses (SNACS) [Schneider et al., ACL 2018]
1.2 Inventory
The v2 hierarchy is a tree with 50 labels, organized into three major subhierarchies: CIRCUMSTANCE (18 labels), PARTICIPANT (14 labels), and CONFIGURATION (18 labels).
• Circumstance: Temporal, Time, StartTime, EndTime, Frequency, Duration, Interval, Locus, Source, Goal, Path, Direction, Extent, Means, Manner, Explanation, Purpose
• Participant: Causer, Agent, Co-Agent, Theme, Co-Theme, Topic, Stimulus, Experiencer, Originator, Recipient, Cost, Beneficiary, Instrument
• Configuration: Identity, Species, Gestalt, Possessor, Whole, Characteristic, Possession, PartPortion, Stuff, Accompanier, InsteadOf, ComparisonRef, RateUnit, Quantity, Approximator, SocialRel, OrgRole
• Items in the CIRCUMSTANCE subhierarchy are prototypically expressed as adjuncts of time, place, manner, purpose, etc. elaborating an event or entity.
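Since a later slide measures agreement "in the same region of the hierarchy", here is a minimal Python sketch of the top-level grouping above as a data structure that supports such checks. The label lists come straight from the inventory; the helper names and layout are illustrative assumptions, not part of any official SNACS tooling.

```python
# Minimal sketch: SNACS v2 labels grouped by top-level subhierarchy
# (from the inventory above; each root label is included in its group).
SUBHIERARCHIES = {
    "Circumstance": {
        "Circumstance", "Temporal", "Time", "StartTime", "EndTime",
        "Frequency", "Duration", "Interval", "Locus", "Source", "Goal",
        "Path", "Direction", "Extent", "Means", "Manner", "Explanation",
        "Purpose",
    },
    "Participant": {
        "Participant", "Causer", "Agent", "Co-Agent", "Theme", "Co-Theme",
        "Topic", "Stimulus", "Experiencer", "Originator", "Recipient",
        "Cost", "Beneficiary", "Instrument",
    },
    "Configuration": {
        "Configuration", "Identity", "Species", "Gestalt", "Possessor",
        "Whole", "Characteristic", "Possession", "PartPortion", "Stuff",
        "Accompanier", "InsteadOf", "ComparisonRef", "RateUnit",
        "Quantity", "Approximator", "SocialRel", "OrgRole",
    },
}

def region(label: str) -> str:
    """Return the major subhierarchy a supersense label belongs to."""
    for top, members in SUBHIERARCHIES.items():
        if label in members:
            return top
    raise KeyError(f"unknown supersense: {label}")

def same_region(a: str, b: str) -> bool:
    """True if two labels fall under the same major subhierarchy."""
    return region(a) == region(b)

assert same_region("Duration", "Purpose")         # both Circumstance
assert not same_region("Recipient", "Possessor")  # Participant vs. Configuration
```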
27. Annotation (#BenderRule)
• We fully annotated an English corpus of web reviews
‣ Original annotators were CU Boulder students with prior linguistic annotation experience [Schneider et al. 2016]
‣ "The main annotation was divided into 34 batches of 100 sentences." ≈1 hr / 100 sentences / annotator
‣ "Original IAA for most of these batches fell between 60% and 78%, depending on factors such as the identities of the annotators and when the annotation took place (annotator experience and PrepWiki documentation improved over time)."
‣ The original hierarchy had 75 categories. As the supersense categories were revised (down to 50), we updated the annotations.
28. STREUSLE: Supersense-Tagged Repository of English with a Unified Semantics for Lexical Expressions (tiny.cc/streusle)
✦ 55k words of English web reviews
✴ 3,000 strong MWE mentions
✴ 700 weak MWE mentions
✴ 9,000 noun mentions
✴ 8,000 verb mentions
✴ 4,000 prepositions
✴ 1,000 possessives
29. STREUSLE Examples [Schneider et al., ACL 2018] (simplified slightly)
• Three weeks ago (Time), burglars tried to gain_entry into (Goal) the rear of (Whole) my (Possessor) home.
• Mrs._Tolchin provided us with (Theme) excellent service and came with (Characteristic) a_great_deal of (Quantity) knowledge and professionalism!
30. Guidelines
• The description of the 50 supersenses as applied to English currently stands at 85 pages (https://arxiv.org/abs/1704.02134)
‣ Examples, criteria for borderline cases, special constructions
• Currently beta-testing a website that provides browsable guidelines + an adpositions database + corpus annotations
32. Preposition types
Multiword: out of, at least, due to, because of, at all, less than, as soon as, all over, instead of, other than, next to, nothing but, in front of, more than, more like, along with, just about, thanks to, rather than, as to, such as, as long as, aside from, …
Single-word: to, of, in, for, with, on, at, from, about, like, by, after, as, back, before, over, than, since, around, into, out, without, off, until, through, ago, away, within, during, …
33. Distribution: P types
P tokens in web reviews (STREUSLE 4.1); excludes PP idioms, possessives, ??, etc.
[Pie chart of preposition type frequencies: of 508, for 499, in 488, to 357, with 336, at 225, on 199, from 158; also about, as, like, after, by, …]
34. Distribution: P types vs. SNACS
P tokens in web reviews (STREUSLE 4.1); excludes PP idioms, possessives, ??, etc.
[Two pie charts over the same 3948 tokens: 117 preposition types (of 508, for 499, in 488, to 357, with 336, at 225, on 199, from 158, …) vs. 47 supersenses (Locus, Time, Goal, Topic, Theme, Quantity, ComparisonRef, Purpose, Direction, Stimulus, Explanation, …)]
35. Possessives [Blodgett & Schneider, LREC 2018]
• Previous literature on annotating the semantics of possessives [Badulescu & Moldovan 2009; Tratz & Hovy 2013]
• The preposition of occupies similar semantic space and sometimes alternates with 's:
‣ the pages of the book / the book's pages
‣ the murder of the boy / the boy's murder
• We applied SNACS to annotate all instances of s-genitives (possessive 's and possessive pronouns): 1116 tokens
‣ Can inform linguistic studies of the genitive alternation [Rosenbach 2002; Stefanowitsch 2003; Shih et al. 2012; Wolk et al. 2013]
36. Interannotator Agreement: New Corpus & Genre [Schneider et al., ACL 2018]
After a few rounds of pilot annotation on The Little Prince and minor additions to the guidelines: 78% on 216 unseen targets
‣ 5 annotators, varied familiarity with the scheme
‣ Exact agreement (avg. pairwise; computed as in the sketch below): 74.4% on roles, 81.3% on functions
✴ In the same region of the hierarchy 93% of the time
✴ Most similar pair of annotators: 78.7% on roles, 88.0% on functions
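For concreteness, "exact agreement (avg. pairwise)" can be computed as in this minimal sketch. The assumption that annotations are stored as equal-length per-annotator label lists over the same targets is mine, not the paper's.

```python
from itertools import combinations

def avg_pairwise_agreement(annotations: dict[str, list[str]]) -> float:
    """Average, over annotator pairs, of the fraction of targets on
    which the two annotators chose exactly the same label."""
    pair_scores = []
    for (_, a), (_, b) in combinations(annotations.items(), 2):
        assert len(a) == len(b), "all annotators label the same targets"
        matches = sum(x == y for x, y in zip(a, b))
        pair_scores.append(matches / len(a))
    return sum(pair_scores) / len(pair_scores)

# Toy example with three annotators over four targets:
print(avg_pairwise_agreement({
    "ann1": ["Locus", "Time", "Goal", "Theme"],
    "ann2": ["Locus", "Time", "Goal", "Topic"],
    "ann3": ["Locus", "Duration", "Goal", "Theme"],
}))  # (0.75 + 0.75 + 0.5) / 3 ≈ 0.667
```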
38. Models [Schneider et al., ACL 2018]
• Most Frequent (MF) baseline: deterministically predict the label most often seen for the preposition in training (sketched below)
• Neural: BiLSTM + multilayer perceptron classifier
‣ word2vec embeddings
‣ similar architecture to previous work on coarsened preposition disambiguation [Gonen & Goldberg 2016]
• Feature-rich linear: SVM with features based on previous work [Srikumar & Roth 2013]
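The MF baseline is fully specified by the bullet above, so a minimal sketch is easy to give. The (preposition, supersense) pair format and the fallback for unseen prepositions are my assumptions.

```python
from collections import Counter, defaultdict

def train_mf_baseline(pairs):
    """pairs: iterable of (preposition, supersense) training tokens.
    Returns a dict mapping each preposition to its most frequent label."""
    counts = defaultdict(Counter)
    for prep, label in pairs:
        counts[prep.lower()][label] += 1
    return {prep: c.most_common(1)[0][0] for prep, c in counts.items()}

def predict_mf(model, prep, global_fallback="Locus"):
    # Fallback for unseen prepositions is an assumption (here: the
    # corpus-wide most frequent label, which slide 34 suggests is Locus).
    return model.get(prep.lower(), global_fallback)

model = train_mf_baseline([
    ("for", "Duration"), ("for", "Purpose"), ("for", "Purpose"),
    ("to", "Goal"),
])
assert predict_mf(model, "for") == "Purpose"
```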
39. Disambiguation [Schneider et al., ACL 2018]
• Accuracy using gold targets & gold syntax
[Bar chart, accuracy on a 0–90% scale, comparing Most Frequent, Neural, and Feature-rich linear models on Role, Fxn, and Full; the feature-rich linear model scores highest, around 80%]
40. Frequent Errors [Schneider et al., ACL 2018]
• Locus, the most frequent label, is overpredicted
• SocialRel vs. OrgRole
• Gestalt vs. Possessor
41. Recent Developments: Embeddings
• Liu et al. NAACL 2019, "Linguistic Knowledge and Transferability of Contextual Representations" (Liu, Gardner, Belinkov, Peters, Smith): contextualized embeddings help a lot for preposition supersense disambiguation
[Table 1 of Liu et al. 2019: performance of the best layerwise probing model per contextualizer vs. a GloVe baseline and the previous, non-pretrained state of the art; PS-Role/PS-Fxn = preposition supersense role/function]

Pretrained representation                   Avg.   CCG    PTB    EWT   Chunk   NER    ST    GED  PS-Role PS-Fxn   EF
ELMo (original), best layer                81.58  93.31  97.26  95.61  90.04  82.85  93.82  29.37  75.44  84.87  73.20
ELMo (4-layer), best layer                 81.58  93.81  97.31  95.60  89.78  82.06  94.18  29.24  74.78  85.96  73.03
ELMo (transformer), best layer             80.97  92.68  97.09  95.13  93.06  81.21  93.78  30.80  72.81  82.24  70.88
OpenAI transformer, best layer             75.01  82.69  93.82  91.28  86.06  58.14  87.81  33.10  66.23  76.97  74.03
BERT (base, cased), best layer             84.09  93.67  96.95  95.21  92.64  82.71  93.72  43.30  79.61  87.94  75.11
BERT (large, cased), best layer            85.07  94.28  96.73  95.80  93.64  84.44  93.83  46.46  79.17  90.13  76.25
GloVe (840B.300d)                          59.94  71.58  90.49  83.93  62.28  53.22  80.92  14.94  40.79  51.54  49.70
Previous state of the art (no pretraining) 83.44  94.70  97.96  95.82  95.77  91.38  95.15  39.83  66.89  78.29  77.10

[Also from Liu et al. 2019: an MLP (1024d) probing model beats a linear probe, e.g. NER 87.19 vs. 82.85, GED 47.45 vs. 29.37]
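Probing results like the table above come from training a lightweight classifier on frozen contextual vectors. Below is a minimal PyTorch sketch of a linear probe; this is my simplification for illustration, not Liu et al.'s code (their setup differs in details such as per-layer selection), and the stand-in tensors replace a real frozen contextualizer.

```python
import torch
import torch.nn as nn

class LinearProbe(nn.Module):
    """Linear probing classifier: frozen contextual embeddings in,
    supersense label out."""
    def __init__(self, embed_dim: int, num_labels: int):
        super().__init__()
        self.out = nn.Linear(embed_dim, num_labels)

    def forward(self, token_vecs: torch.Tensor) -> torch.Tensor:
        return self.out(token_vecs)  # logits over supersense labels

# Only the probe's parameters are trained; the contextualizer stays frozen.
probe = LinearProbe(embed_dim=1024, num_labels=50)
optimizer = torch.optim.Adam(probe.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

token_vecs = torch.randn(7, 1024)  # stand-in for frozen contextual vectors
gold = torch.randint(0, 50, (7,))  # stand-in supersense label ids
loss = loss_fn(probe(token_vecs), gold)
loss.backward()
optimizer.step()
```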
42. Recent Developments: Embeddings
• Kim et al. *SEM 2019, "Probing What Different NLP Tasks Teach Machines about Function Word Comprehension" (Kim, Patel, Poliak, Wang, Xia, McCoy, Tenney, Ross, Linzen, Van Durme, Bowman, Pavlick): the way embeddings are pretrained matters a lot for learning preposition meanings in NLI
‣ E.g., pretraining for image captioning fails to capture abstract preposition senses
43. Case and Adposition Representation for Multi-Lingual Semantics (CARMLS)
Can we use the supersenses for case markers and adpositions in other languages?
44. English and French adpositions carve up the same semantic space differently: English for and to vs. French pendant, pour, and à, over examples such as ate for hours; raise money to buy a house; leave for Paris; a gift for mother; raise money for the church; give the gift to mother; go to Paris.
45. The same examples with supersense labels bridging the two languages (a lookup-table sketch follows below):
• Duration: ate for hours (EN for; FR pendant)
• Purpose: raise money to buy a house (EN to); raise money for the church (EN for; FR pour)
• Recipient: a gift for mother (EN for; FR pour); give the gift to mother (EN to; FR à)
• Destination: leave for Paris (EN for); go to Paris (EN to; FR à)
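One way to operationalize this kind of cross-lingual mapping is as a lookup from (language, adposition) to the supersenses it can express. Here is a minimal Python sketch populated only with the examples from the slide; the data structure and helper are illustrative assumptions, not an exhaustive lexicon.

```python
# Minimal sketch: the slide's EN/FR mapping as a lookup table.
ADPOSITION_SUPERSENSES = {
    ("en", "for"):     {"Duration", "Purpose", "Recipient", "Destination"},
    ("en", "to"):      {"Purpose", "Recipient", "Destination"},
    ("fr", "pendant"): {"Duration"},
    ("fr", "pour"):    {"Purpose", "Recipient"},
    ("fr", "à"):       {"Recipient", "Destination"},
}

def shared_senses(a, b):
    """Supersenses that two (language, adposition) pairs both express."""
    return ADPOSITION_SUPERSENSES[a] & ADPOSITION_SUPERSENSES[b]

print(shared_senses(("en", "for"), ("fr", "pour")))  # {'Purpose', 'Recipient'}
```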
46. Challenges
• Defining the phenomena of interest (what exactly counts as an adposition/case marker?)
• Morphology
• Application of supersenses
‣ Including semantics/pragmatics not expressed adpositionally in English!
47. • Korean (also Japanese) has postpositions that signal pragmatic focus similar to English 'only', 'also', 'even', etc.
빵도 먹어요 (ppang-do meogeoyo): bread-do eat = "eat bread also (as well as other things)"
빵만 먹어요 (ppang-man meogeoyo): bread-man eat = "eat only bread"
‣ Semantic/pragmatic territory not covered by English adpositions!
‣ Proposed solution: additional supersense(s).
48. Further Questions
1. Descriptive linguistics: What are the similarities and differences in adposition/case semantics across languages?
2. Annotation efficiency: Can human annotation be simplified or partially crowdsourced/automated without sacrificing quality?
3. Cross-lingual modeling and applications
49. Looking Ahead
• Adposition use in L2 English
• Beyond adpositions/possessives: general semantic roles
‣ Preliminary results on annotation of English subjects and objects [Shalev et al. 2019]
• Integration with graph-structured semantic representations [Prange et al. CoNLL 2019]
51. Adpositions & case markers are an important challenge for NLP!
[SNACS hierarchy figure repeated; see slide 26: 50 labels in three major subhierarchies]
• Items in the CIRCUMSTANCE subhierarchy are prototypically expressed as adjuncts of time, place, manner, purpose, etc. elaborating an event or entity.
• Items in the PARTICIPANT subhierarchy are prototypically entities functioning as arguments to an event.
• Items in the CONFIGURATION subhierarchy are prototypically entities or properties in a static relationship to some entity.
53. Acknowledgments (nert)
CU annotators: Julia Bonn, Evan Coles-Harris, Audrey Farber, Nicole Gordiyenko, Megan Hutto, Celeste Smitz, Tim Watervoort
CMU pilot annotators: Archna Bhatia, Carlos Ramírez, Yulia Tsvetkov, Michael Mordowanec, Matt Gardner, Spencer Onuffer, Nora Kazour
Special thanks: Noah Smith, Mark Steedman, Claire Bonial, Tim Baldwin, Miriam Butt, Chris Dyer, Ed Hovy, Lingpeng Kong, Lori Levin, Ken Litkowski, Orin Hargraves, Michael Ellsworth, Dipanjan Das & Google
tiny.cc/streusle