This paper was presented at ISWC 2012.
It has become common to use RDF to store the results of Natural Language
Processing (NLP) as a graph of the entities mentioned in the text with the
relationships mentioned in the text as links between them. These NLP graphs
can be measured with Precision and Recall against a ground truth graph representing
what the documents actually say. When asking conjunctive queries on
NLP graphs, the Recall of the query is expected to be roughly the product of the
Recall of the relations in each conjunct. Since Recall is typically less than one,
conjunctive query Recall on NLP graphs degrades geometrically with the number
of conjuncts. We present an approach to address this Recall problem by hypothesizing
links in the graph that would improve query Recall, and then attempting
to find more evidence to support them. Using this approach, we confirm that in
the context of answering queries over NLP graphs, we can use lower confidence
results from NLP components if they complete a query result.
Query-Driven Hypothesis Generation for Answering Queries over NLP Graphs, by Chris Welty, Ken Barker, Lora Aroyo, Shilpa Arora
1. Query-Driven Hypothesis Generation for Answering Queries over NLP Graphs
Chris Welty, Ken Barker, Lora Aroyo, Shilpa Arora
Answering Conjunctive SPARQL Queries over NLP Graphs
2. Approach
• to decrease the cost of maintaining critical system DBs without changing the LSW: can we replace the human? can we build a machine reader for this?
• the NLP process is not a one-shot deal: the query provides context for what the user is seeking, and thus an opportunity to re-interpret the text
[diagram: the NLP Stack produces NLP Graphs; a query over the graphs drives re-interpretation of the text]
3. NLP Stack
• Contains NER, CoRef, RelEx, entity disambiguation
• RelEx: an SVM learner whose output scores are probabilities/confidences, for each known relation, that the sentence expresses it between each pair of mentions
• Run over the target corpus to produce an NLP graph (sketched below)
  • nodes are entities (clusters of mentions produced by coref)
  • edges are type statements between entities and classes in the ontology, or relations detected between mentions of these entities in the corpus
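A rough, illustrative sketch of this structure in Python (not the authors' implementation; all names and scores below are made up): entities are coref clusters of mentions, and edges carry the extractor's confidence and a pointer back into the text.

from dataclasses import dataclass, field

@dataclass
class Entity:
    """A node in the NLP graph: a coref cluster of textual mentions."""
    id: str
    mentions: list = field(default_factory=list)   # e.g. ["Mr. X", "he"]
    types: set = field(default_factory=set)        # ontology classes, e.g. {"Person"}

@dataclass
class Edge:
    """A relation detected between mentions of two entities."""
    subject: str       # Entity id
    relation: str      # e.g. "citizenOf"
    object: str        # Entity id
    confidence: float  # RelEx score for the supporting sentence
    provenance: str    # pointer to the supporting span of text

# Toy graph mirroring the running example in the next slides.
entities = {
    "ent-1": Entity("ent-1", mentions=["Mr. X"], types={"Person"}),
    "ent-2": Entity("ent-2", mentions=["India", "India"], types={"Country", "GPE"}),
}
edges = [Edge("ent-1", "citizenOf", "ent-2", confidence=0.8, provenance="doc12:sent3")]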
4. NLP Graph
[diagram: mention-level extraction over two text snippets, "… Mr. X of India …" and "… in places like India, Iraq, …"; a citizenOf edge links the "Mr. X" and "India" mentions, a coref edge links the two "India" mentions; ontology boxes: citizenOf (Person, Country), subPlace (GPE, Country)]
5. NLP Graph
[diagram: the same example with mentions resolved; "Mr. X" citizenOf "India", coref between the two "India" mentions, "Iraq"; ontology: citizenOf (Person, Country), subPlace (GPE, Country)]
6. NLP Graph
Mr. X rdf:type
rdf:type
citizenOf
India Country
India
Person
GPE rdf:type
subPlaceOf
rdf:type
Iraq
rdf:subClassOf
7. Relation Extraction by RelEx
• RelEx: a set of SVM binary classifiers, one per relation
  • for each sentence in the corpus,
  • for each pair of mentions in that sentence,
  • for each known relation,
  • produce a probability that that pair is related by the relation
• NLP graphs are generated by selecting relations from RelEx output in two ways (sketched below):
  • Primary: takes only the top-scoring relation between any mention pair, above a confidence threshold
  • Secondary: takes all relations between all mention pairs above a threshold
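A minimal sketch of the two selection modes, assuming RelEx output is available as (mention_pair, relation, score) tuples; the function name, thresholds, and data shapes are assumptions for illustration, not the paper's code.

def build_graphs(relex_output, primary_threshold=0.2, secondary_threshold=0.0):
    """Select relation edges from RelEx scores in the two ways described above.

    relex_output: iterable of (mention_pair, relation, score), where score is
    RelEx's confidence that the sentence expresses `relation` between the pair.
    """
    best = {}       # mention_pair -> (relation, score): top-scoring edge only
    secondary = []  # every (mention_pair, relation, score) above the threshold

    for pair, relation, score in relex_output:
        if score >= secondary_threshold:
            secondary.append((pair, relation, score))
        if score >= primary_threshold and score > best.get(pair, ("", -1.0))[1]:
            best[pair] = (relation, score)

    primary = [(pair, rel, score) for pair, (rel, score) in best.items()]
    return primary, secondary

# Example: two candidate relations between the same mention pair; the primary
# graph keeps only the top-scoring one, the secondary graph keeps both.
scores = [(("Mr. X", "India"), "citizenOf", 0.61),
          (("Mr. X", "India"), "locatedIn", 0.22)]
primary_graph, secondary_graph = build_graphs(scores)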
8. RelEx Secondary Graph
[diagram: the same entity graph with the additional lower-confidence edges kept in the secondary graph, labelled causes, locatedIn, subPlaceOf, and citizenOf]
9. Primary vs. Secondary
                 P     R     F
Primary @ 0.1    0.19  0.39  0.26
Primary @ 0.2    0.29  0.33  0.30
Secondary @ 0    0.01  0.95  0.02
Recall of max-F configuration
10. Conjunctive Queries
find all terrorist organizations that were agents of bombings in Lebanon on October 23, 1983:

SELECT ?t WHERE {
  ?t rdf:type mric:TerroristOrganization .
  ?b rdf:type mric:Bombing .                 R = .65
  ?b mric:mediatingAgent ?t .                R = .09
  ?b mric:eventLocation mric:Lebanon .       R = .97
  ?b mric:eventDate "1983-10-23" .
}
R = .057
11. Problem with Conjunctive Queries
• query Recall ≈ [ Π_{k=1..n} Recall(R_k) ] × Recall_coref (worked example below)
• Recall for an n-term query is O(Recall^n)
• for complex queries Recall becomes the dominating factor
• in our experiments: query recall < .1 for n > 3
• To get any particular correct answer, all NLP components had to get it right
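As a worked example, multiplying the per-conjunct Recall figures annotated on the slide 10 query (and taking coref Recall as 1.0 for simplicity) reproduces the query-level figure:

# Query Recall is roughly the product of the per-conjunct Recalls.
per_conjunct_recall = [0.65, 0.09, 0.97]

query_recall = 1.0
for r in per_conjunct_recall:
    query_recall *= r

print(round(query_recall, 3))  # 0.057, matching the R = .057 on slide 10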
12. Hypothesis Generation
• For queries of size N (see the sketch below)
  – For each term H
    • relax the query by removing the term H
    • for each solution
      – bind the variables in H from the solution, forming a hypothesis
  – If no solutions are found for size N-1, try N-2
• appropriate for queries that are almost answerable, e.g. missing one of the terms
• biased towards generating more answers to queries, so it performs poorly on queries for which the corpus does not contain the answer
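A minimal sketch of this relaxation loop, assuming a query is a list of triple patterns and that solve() runs a conjunctive query over the primary graph and returns variable-binding dicts; none of these names come from the paper.

from itertools import combinations

def bind(pattern, solution):
    """Instantiate a triple pattern using the variable bindings of a solution."""
    return tuple(solution.get(t, t) if t.startswith("?") else t for t in pattern)

def generate_hypotheses(patterns, solve, max_drop=2):
    """Relax an N-term query by dropping terms; each solution of a relaxed
    query binds the dropped term's variables, yielding a hypothesis triple.
    Tries N-1 first, then N-2, as on the slide."""
    for drop in range(1, max_drop + 1):
        hypotheses = []
        for removed in combinations(range(len(patterns)), drop):
            kept = [p for i, p in enumerate(patterns) if i not in removed]
            for solution in solve(kept):
                hypotheses.extend(bind(patterns[i], solution) for i in removed)
        if hypotheses:
            return hypotheses
    return []

# Toy usage: the slide-10 query; solve() is a stub standing in for a SPARQL
# engine over the primary NLP graph.
query = [("?t", "rdf:type", "mric:TerroristOrganization"),
         ("?b", "rdf:type", "mric:Bombing"),
         ("?b", "mric:mediatingAgent", "?t"),
         ("?b", "mric:eventLocation", "mric:Lebanon"),
         ("?b", "mric:eventDate", '"1983-10-23"')]

def solve(kept_patterns):
    # Stub: pretend the primary graph can answer only the relaxed query that
    # drops the eventDate term (as in slides 13-14).
    if any(p[1] == "mric:eventDate" for p in kept_patterns):
        return []
    return [{"?t": "mric:org-16", "?b": "mric:event-3"}]

print(generate_hypotheses(query, solve))
# [('mric:event-3', 'mric:eventDate', '"1983-10-23"')]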
13. [diagram: the slide-10 query drawn as a graph; variables t and b with rdf:type edges to mric:TerroristOrganization and mric:Bombing, b mric:mediatingAgent t, b mric:eventLocation mric:Lebanon, b mric:eventDate "1983-10-23"]
SELECT ?t WHERE {
  ?t rdf:type mric:TerroristOrganization .
  ?b rdf:type mric:Bombing .
  ?b mric:mediatingAgent ?t .
  ?b mric:eventLocation mric:Lebanon .
  ?b mric:eventDate "1983-10-23" .
}
find all bombings by terrorist orgs in Lebanon
(hypothesize that the bombings were on 1983-10-23)
14. This subgraph matches the relaxed query:
find all bombings by terrorist orgs in Lebanon
[diagram: mric:org-16 and mric:event-3 match the relaxed query; the dropped term yields the hypothesis mric:event-3 mric:eventDate "1983-10-23"]
hypothesize that event-3 was on 1983-10-23
15. Hypothesis Validation
• Once generated, a hypothesis must be validated
  – gather evidence that it is true
  – with each piece of supporting evidence, the probability that the triple is true increases
• We utilize a stack of hypothesis checkers that provide
  – a confidence that the hypothesis holds
  – provenance: a pointer to a span of text that supports it
• Can be used to bound complex computational tasks
  – e.g. formal reasoning / choosing between low-confidence extractions
  – such tasks are made more tractable by using hypotheses as goals, e.g. a reasoner may be used effectively by constraining it to only the part of the graph connected to a hypothesis
16. Secondary Graph for Validation
• Hypotheses can be validated by looking for the tuple in the secondary graph (see the sketch below)
  • a tuple will appear in the SG if the subject and object entities occur in the same sentence somewhere in the corpus
• With SG precision at .02, it is important to find a productive threshold for accepting hypotheses
  • we conducted several experiments to find this threshold
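A minimal sketch of this check, assuming the secondary graph is available as a map from (subject, relation, object) to (confidence, provenance); the function name, the return shape, and the example values are illustrative only.

def validate_hypothesis(hypothesis, secondary_graph, threshold=0.01):
    """Accept a hypothesized triple if the secondary graph contains it with
    enough confidence; return (accepted, confidence, provenance)."""
    entry = secondary_graph.get(hypothesis)
    if entry is None:
        return False, 0.0, None
    confidence, provenance = entry
    return confidence >= threshold, confidence, provenance

# Toy usage with the hypothesis from slides 13-14: the secondary graph holds
# every low-confidence RelEx extraction, keyed by the triple it asserts.
sg = {("mric:event-3", "mric:eventDate", '"1983-10-23"'): (0.04, "doc7:sent2")}
print(validate_hypothesis(("mric:event-3", "mric:eventDate", '"1983-10-23"'), sg))
# (True, 0.04, 'doc7:sent2')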
17. Experiments
• 3 for dev, 3 for test
• each experiment compares query results from the primary graph (PG) only to query results using the PG plus the secondary graph (SG) for hypothesis validation
• the three experiments compare performance at different primary graph thresholds
18. 0-threshold primary graph, with & without secondary graph
secondary graph: all@0
for a given PG threshold we vary the SG threshold for validated hypotheses (x-axis)
19. .1-threshold primary graph, with & without secondary graph
secondary graph: all@0
best performance point (.01 SG threshold)
red line indicates the PG threshold; the PG-only curve flattens below this threshold, as expected
20. .2-threshold primary graph, with & without secondary graph
secondary graph: all@0
best performance point (.01 SG threshold)
if a triple in the SG completes a query that is mostly answered by the PG, it is very likely to be true
the best performing configuration for dev is the .2-threshold PG with SG hypotheses validated at the .01 threshold
21. Performance
at the chosen threshold, the PG+SG configuration significantly outperforms the PG-only baseline on the test set
22. Conclusions
• the secondary graph can be exploited for getting answers
• the probability that a relation is true between two entities increases significantly when that relation completes a query answer that is partially satisfied in the primary graph
• able to target discarded interpretations when they will meet some user need
• the NLP process is not a one-shot deal: the query provides context for what the user is seeking, and thus an opportunity to re-interpret the text