SlideShare una empresa de Scribd logo
1 de 26
Descargar para leer sin conexión
Exploiting Entity Linking in
Queries for Entity Retrieval
Faegheh Hasibi, Krisztian Balog, Svein Erik Bratsberg
Newark, DE, USA, September 2016
Exploiting Entity Linking in
Queries for Entity Retrieval
Faegheh Hasibi, Krisztian Balog, Svein Erik Bratsberg
Newark, DE, USA, September 2016
Entity
Entity
<rdfs:label>
<dbo:abstract>
<dbo:birthPlace>
<dbo:birthDate>
<dbo:deathPlace>
<dbo:deathDate>
<dbo:child>
<dbp:spouse>
<dbp:shortDescription>
<dbo:wikiPageWikiLink>
Ann_Dunham
Ann Dunham
Stanley Ann Dunham (November 29, 1942 – November 7,
1995) was the mother of Barack Obama, the 44th President
of the United States, and an American anthropologis [..]
<Honolulu>
<Hawaii>
<United_States>
1942-11-29
<Honolulu>
<United_States>
1995-11-07
<Barack_Obama>
<Maya_Soetoro-Ng>
<Lolo_Soetoro>
<Barack_Obama,_Sr.>
Anthropologist; mother of Barack Obama
<Barack_Obama>
<President_of_the_United_States>
<United_States>
[..]
Conventional entity retrieval
<Barack_Obama>
Annotations:
barack obama parents
Entity-based representation ˆDˆD
Term-based representation DD
term-based
matching
entity-based
matching
entity linking
<dbo:birthPlace>: [<Honolulu>,
<Hawaii> ]
<dbo:child>: <Barack_Obama>
<dbo:wikiPageWikiLink>:
[ <United_States>,
<Family_of_Barack_Obama>, …]
Query terms:
<rdfs:label>: Ann Dunham
<dbo:abstract>: Stanley Ann Dunham the mother
Barack Obama, was an American
anthropologist who …
<dbo:birthPlace>: Honolulu Hawaii …
<dbo:child>: Barack Obama
<dbo:wikiPageWikiLink>:
United States Family Barack Obama
Term-based representation
barack obama parents
Conventional entity retrieval models compare the term-based
representation of entity against the query terms.
Query
Our proposal
barack obama parents
Incorporating entity linking into the retrieval model
entity linking
Our proposal
<Barack_Obama>
Annotations:
barack obama parents
Entity-based representation ˆDˆD
Term-based representation DD
term-based
matching
entity-based
matching
entity linking
<dbo:birthPlace>: [<Honolulu>,
<Hawaii> ]
<dbo:child>: <Barack_Obama>
<dbo:wikiPageWikiLink>:
[ <United_States>,
<Family_of_Barack_Obama>, …]
Query terms:
<rdfs:label>: Ann Dunham
<dbo:abstract>: Stanley Ann Dunham the mother
Barack Obama, was an American
anthropologist who …
<dbo:birthPlace>: Honolulu Hawaii …
<dbo:child>: Barack Obama
<dbo:wikiPageWikiLink>:
United States Family Barack Obama
Term-based representation
barack obama parents
<Barack_Obama>
Annotations:
barack obama parents
Entity-based representation ˆDˆD
Term-based representation DD
term-based
matching
entity-based
matching
entity linking
<dbo:birthPlace>: [<Honolulu>,
<Hawaii> ]
<dbo:child>: <Barack_Obama>
<dbo:wikiPageWikiLink>:
[ <United_States>,
<Family_of_Barack_Obama>, …]
Query terms:
<rdfs:label>: Ann Dunham
<dbo:abstract>: Stanley Ann Dunham the mother
Barack Obama, was an American
anthropologist who …
<dbo:birthPlace>: Honolulu Hawaii …
<dbo:child>: Barack Obama
<dbo:wikiPageWikiLink>:
United States Family Barack Obama
Entity-based representation
<Barack_Obama>
entity linking
Approach
Entity Linking integrated Retrieval (ELR)
ELR approach
Based on Markov Random Fields (MRF)
D. Metzler and W. B. Croft. A Markov Random Field model for term dependencies. In Proc. of SIGIR, pages 472–479, 2005.
q1q1 q2q2 q3q3
DD
e1e1
barack obama parents BARACK OBAMAbarack obama parents <Barack_Obama>
ELR approach
Based on Markov Random Fields (MRF)
9)
0)
1)
to
al.
on
te
k-
F-
ument and a term node, (ii) 3-cliques consisting of the document
and two term nodes, and (iii) 2-cliques consisting of edges between
the document and an entity node. The potential functions for the
first two types are identical to the SDM model (Eqs. (3) and 4). We
define the potential function for the third clique type as:
E(e, D; ⇤) = exp[ EfE(e, D)],
where E is a free parameter and fE(e, D) is the feature function
for the entity e and document D (to be defined in §4.2). By substi-
tuting all feature functions into Eq. (2), the MRF ranking function
becomes:
P(D|Q)
rank
=
X
qi2Q
T fT (qi, D)+
X
qi,qi+12Q
OfO(qi, qi+1, D)+
X
qi,qi+12Q
U fU (qi, qi+1, D)+
X
e2E(Q)
EfE(e, D).
This model introduces an additional parameter for weighing the im-
portance of entity annotations, E, on top of the three parameters
( {T,O,U}) from the SDM model (cf. §3.2). There is a crucial dif-
ference between entity-based and term-based matches with regards
9)
0)
1)
to
al.
on
te
k-
F-
ument and a term node, (ii) 3-cliques consisting of the document
and two term nodes, and (iii) 2-cliques consisting of edges between
the document and an entity node. The potential functions for the
first two types are identical to the SDM model (Eqs. (3) and 4). We
define the potential function for the third clique type as:
E(e, D; ⇤) = exp[ EfE(e, D)],
where E is a free parameter and fE(e, D) is the feature function
for the entity e and document D (to be defined in §4.2). By substi-
tuting all feature functions into Eq. (2), the MRF ranking function
becomes:
P(D|Q)
rank
=
X
qi2Q
T fT (qi, D)+
X
qi,qi+12Q
OfO(qi, qi+1, D)+
X
qi,qi+12Q
U fU (qi, qi+1, D)+
X
e2E(Q)
EfE(e, D).
This model introduces an additional parameter for weighing the im-
portance of entity annotations, E, on top of the three parameters
( {T,O,U}) from the SDM model (cf. §3.2). There is a crucial dif-
ference between entity-based and term-based matches with regards
Terms
Ordered bigrams
Unordered bigrams
Entities
The framework encompass various retrieval models;
e.g., LM, MLM, PRMS, SDM, and FSDM
Challenge 1
• Number of linked entities is not proportional to query length
Affects setting of parameter
• Entity linking involves some uncertainty
Confidence scores should be integrated into the retrieval model
all cars that are produced in Germany
Germany
(score: 0.7)
Cars_(film)
(score: 0.05)
Normalization
The final model takes the normalized form:
parents BARACK OBAMA
ntation of the ELR model for the
ts”. Here, all the terms are sequen-
se "barack obama" is linked to the
hile #uwN(qi, qi+1) counts the co-
window of N words (where N is set
meter µ is the Dirichlet prior, which
ument length in the collection.
SDM is basically the weighted sum
ained from three sources: (i) query
y bigrams, and (iii) unordered match
ch in §4 employs the same sequential
DM does.
ial Dependence Model
pendence Model (FSDM) [46] ex-
ort structured document retrieval. In
ocument language model of feature
)) with those of the Mixture of Lan-
Given a fielded representation of a
hors, metadata, etc. in the context of
M computes a language model prob-
tion, that ELR is applicable to a wide range of retrieval prob
where documents, or document-based representations of ob
are to be ranked, and entity annotations are available to be l
aged in the matching of documents and queries. Our main foc
this paper, however, is limited to entity retrieval; entity annota
are an integral part of the representation here, cf. Figure 1.
is unlike to traditional document retrieval, where documents w
need to be annotated by an automated process that is prone t
rors). To show the generic nature of our approach, and also fo
sake of notational consistency with the previous section, we
refer to documents throughout this section. We detail how
documents are constructed for our particular task, entity retr
in §4.3.
Our interest in this work lies in incorporating entity annota
and not in creating them. Therefore, entity annotations of the q
are assumed to have been generated by an external entity lin
process, which we treat much like a black box. Formally, giv
input query Q = q1...qn, the set of linked entities is denote
E(Q) = {e1, ..., em}. We do not impose any restrictions on
annotations, i.e., they may be overlapping and a given query
might be linked to multiple entities. It might also be that E
is an empty set. Further, we assume that annotations have c
dence scores associated with them. For each entity e 2 E
let s(e) denote the confidence score of e, with the constraiP
e2E(Q) s(e) = 1.
The graph underlying our model consists of document, term
entity linking
score
Challenge 2
Term-based matching is modeled based on term
frequencies:
tends the SDM model to support structured document retrieval. In
essence, FSDM replaces the document language model of feature
functions (Eqs. (6), (7), and (8)) with those of the Mixture of Lan-
guage Models (MLM) [37]. Given a fielded representation of a
document (e.g., title, body, anchors, metadata, etc. in the context of
web document retrieval), MLM computes a language model prob-
ability for each field and then takes a linear combination of these
field-level models. Hence, FSDM assumes that a separate language
model is built for each document field and then computes the fea-
ture functions based on fields f 2 F, with F being the universe of
fields. For individual terms, the feature function becomes:
fT (qi, D) = log
X
f
wT
f
tfqi,Df + µf
cfqi,f
|Cf |
|Df | + µf
, (9)
while ordered and unordered bigrams are estimated as:
fO(qi, qi+1, D) =
log
X
f
wO
f
tf#1(qi,qi+1),Df
+ µf
cf#1(qi,qi+1),f
|Cf |
|Df | + µf
(10)
fU (qi, qi+1, D) =
log
X
f
wU
f
tf#uwN(qi,qi+1),Df
+ µf
cf#uwN(qi,qi+1),f
|Cf |
|Df | + µf
.
might be l
is an empt
dence scor
let s(e) deP
e2E(Q) s
The grap
entity node
are sequen
tities are in
on this ass
types of cli
ument and
and two ter
the docum
first two ty
define the p
where E
for the enti
tuting all f
becomes:
P
But entities have different behavior than terms …
Entity-based representations are sparse
Single presence of an entity in a document is considered as a
match
Entity-based matching
• The modeling considers presence/absence of
entities in the entity representation
presence/absence of
entity in document field
y that
dence
by the
model.
s as a
of en-
para-
takes
tity BARACK OBAMA (via the relationship <dbo:child>). This
entity-based representation differs from the traditional term-based
representation in at least two important ways. Firstly, each entity
appears at most once in each document field. Secondly, if an entity
appears in a field, then it should be considered a match, irrespec-
tive of what other entities may appear in that field. Consider again
the example in Figure 1, where the field <dbo:birthplace>
has multiple values, HONOLULU and HAWAII. Then, if either of
these entities is linked in the query, that should account for a per-
fect match against this particular field, irrespective of how many
other locations are present in that field. Motivated by these obser-
vations, we define the feature function fE as:
fE(e, D) = log
X
f2F
wE
f
h
(1 ↵)tf{0,1}(e, ˆDf ) + ↵
dfe,f
dff
i
, (13)
where the linear interpolation implements the Jelinek-Mercer smooth-
ing method, with ↵ set to 0.1, and tf{0,1}(e, ˆDf ) indicates whether
the entity e is present in the document field ˆDf or not. For the back-
ground model, we employ the notion of document frequency as fol-
lows: dfe,f = |{ ˆD|e 2 ˆDf }| is the total number of documents that
contain the entity e in field f and df(f) = |{ ˆD| ˆDf 6= ;}| is the
{T,O,U}
of the summations (cf. Eq. (5)) and can be trained without having to
worry about query length normalization. The parameter E, how-
ever, cannot be treated the same manner for two reasons. Firstly,
the number of annotated entities for each query varies and it is inde-
pendent of the length of the query. For example, a long natural lan-
guage query might be annotated with a single entity, while shorter
queries are often linked to several entities, due to their ambiguity.
Secondly, we need to deal with varying levels of uncertainty that
is involved with entity annotations of the query. The confidence
scores associated with the annotations, which are generated by the
entity linking process, should be integrated into the retrieval model.
To address the above issues, we re-write the parameters as a
parameterized function over each clique and define them as:
T (qi) = T
1
|Q|
,
O(qi, qi+1) = O
1
|Q| 1
,
U (qi, qi+1) = U
1
|Q| 1
,
E(e) = Es(e),
where |Q| is the query length and s(e) is the confidence score of en-
tity e obtained from the entity linking step. Considering this para-
metric form of the parameters, our final ranking function takes
the following form:
P(D|Q)
rank
= T
X
qi2Q
1
|Q|
fT (qi, D)+
O
X
qi,qi+12Q
1
|Q| 1
fO(qi, qi+1, D)+
U
X
qi,qi+12Q
1
|Q| 1
fU (qi, qi+1, D)+
D an entity-based representation ˆD is obtained
ument terms and considering only entities. In
work, the entity represented by document D sta
tionships with a number of other entities, as spec
edge base. The various relationships are model
document. Consider the example in Figure 1, wh
represents the entity ANN DUNHAM, who is bein
tity BARACK OBAMA (via the relationship <dbo
entity-based representation differs from the trad
representation in at least two important ways. F
appears at most once in each document field. Sec
appears in a field, then it should be considered
tive of what other entities may appear in that fiel
the example in Figure 1, where the field <dbo
has multiple values, HONOLULU and HAWAII.
these entities is linked in the query, that should
fect match against this particular field, irrespec
other locations are present in that field. Motivat
vations, we define the feature function fE as:
fE(e, D) = log
X
f2F
wE
f
h
(1 ↵)tf{0,1}(e, ˆDf )
where the linear interpolation implements the Jeli
ing method, with ↵ set to 0.1, and tf{0,1}(e, ˆDf )
the entity e is present in the document field ˆDf or
ground model, we employ the notion of documen
lows: dfe,f = |{ ˆD|e 2 ˆDf }| is the total number
contain the entity e in field f and df(f) = |{ ˆD
number of documents with a non-empty field f.
All of the feature functions fT , fO, fU , and f
rameters wf , which control the field weights. Z
set these type of parameters using a learning algo
to a large number of parameters to be trained (t
he query. Therefore, in SDM, {T,O,U} are taken out
tions (cf. Eq. (5)) and can be trained without having to
query length normalization. The parameter E, how-
be treated the same manner for two reasons. Firstly,
annotated entities for each query varies and it is inde-
e length of the query. For example, a long natural lan-
might be annotated with a single entity, while shorter
ten linked to several entities, due to their ambiguity.
need to deal with varying levels of uncertainty that
ith entity annotations of the query. The confidence
ated with the annotations, which are generated by the
process, should be integrated into the retrieval model.
the above issues, we re-write the parameters as a
d function over each clique and define them as:
T (qi) = T
1
|Q|
,
O(qi, qi+1) = O
1
|Q| 1
,
U (qi, qi+1) = U
1
|Q| 1
,
E(e) = Es(e),
he query length and s(e) is the confidence score of en-
d from the entity linking step. Considering this para-
of the parameters, our final ranking function takes
form:
rank
= T
X
qi2Q
1
|Q|
fT (qi, D)+
O
X
qi,qi+12Q
1
|Q| 1
fO(qi, qi+1, D)+
U
X
qi,qi+12Q
1
|Q| 1
fU (qi, qi+1, D)+
an entity-based representation of documents. For each document
D an entity-based representation ˆD is obtained by ignoring doc-
ument terms and considering only entities. In the context of our
work, the entity represented by document D stands in typed rela-
tionships with a number of other entities, as specified in the knowl-
edge base. The various relationships are modeled as fields in the
document. Consider the example in Figure 1, where the document
represents the entity ANN DUNHAM, who is being linked to the en-
tity BARACK OBAMA (via the relationship <dbo:child>). This
entity-based representation differs from the traditional term-based
representation in at least two important ways. Firstly, each entity
appears at most once in each document field. Secondly, if an entity
appears in a field, then it should be considered a match, irrespec-
tive of what other entities may appear in that field. Consider again
the example in Figure 1, where the field <dbo:birthplace>
has multiple values, HONOLULU and HAWAII. Then, if either of
these entities is linked in the query, that should account for a per-
fect match against this particular field, irrespective of how many
other locations are present in that field. Motivated by these obser-
vations, we define the feature function fE as:
fE(e, D) = log
X
f2F
wE
f
h
(1 ↵)tf{0,1}(e, ˆDf ) + ↵
dfe,f
dff
i
, (13)
where the linear interpolation implements the Jelinek-Mercer smooth-
ing method, with ↵ set to 0.1, and tf{0,1}(e, ˆDf ) indicates whether
the entity e is present in the document field ˆDf or not. For the back-
ground model, we employ the notion of document frequency as fol-
lows: dfe,f = |{ ˆD|e 2 ˆDf }| is the total number of documents that
contain the entity e in field f and df(f) = |{ ˆD| ˆDf 6= ;}| is the
number of documents with a non-empty field f.
All of the feature functions fT , fO, fU , and fE involve free pa-
rameters wf , which control the field weights. Zhiltsov et al. [46]
set these type of parameters using a learning algorithm, which leads
to a large number of parameters to be trained (the number of fea-
field weight
(PRMS probability)
Results
Overall performance
• Experiments on DBpedia-entity collection
Contains nearly 500 heterogeneous queries
Model
SemSearch ES ListSearch INEX-LD
MAP P@10 MAP P@10 MAP P@10
LM .2485 .2008 .1529 .1939 .1129 .2210
LM + ELR .2531M
(+1.8%) .2008 .1688N
(+10.4%) .2096 .1244N
(+10.2%) .2330
PRMS .3517 .2685 .1722 .2270 .1184 .2240
PRMS + ELR .3573 (+1.6%) .2700 .1956N
(+14.1%) .2417 .1303N
(+10%) .2330
SDM .2669 .2108 .1553 .1948 .1167 .2250
SDM + ELR .2641 (-1%) .2115 .1689M
(+8.8%) .2174N
.1264N
(+8.3%) .2340
FSDM* .3563 .2692 .1777 .2165 .1261 .2290
FSDM* + ELR .3581 (+.5%) .2677 .1973M
(+11%) .2391N
.1332 (+5.6%) .2330
Table 4: Results of ELR approach on different query types. Significance is tested against the line a
show the relative improvements, in terms of MAP.
Model MAP P@10
LM 0.1588 0.1664
MLM-tc 0.1821 0.1786
MLM-all 0.1940 0.1965
PRMS 0.1936 0.1977
SDM 0.1719 0.1707
FSDM* 0.2030 0.1973
LM + ELR 0.1693N
(+6.6%) 0.1757N
(+5.6%)
MLM-tc + ELR 0.1937N
(+6.4%) 0.1895N
(+6.1%)
MLM-all + ELR 0.2082N
(+7.3%) 0.2054M
(+4.5%)
PRMS + ELR 0.2078N
(+7.3%) 0.2085N
(+5.5%)
SDM + ELR 0.1794N
(+4.4%) 0.1812N
(+6.1%)
FSDM* + ELR 0.2159N
(+6.3%) 0.2078N
(+5.3%)
Table 3: Retrieval results for baseline models (top) and with
ELR applied on top of them (bottom). Significance is tested
against the corresponding baseline model. Best scores are in
boldface.
shows the results we get by a
lines. We observe consistent i
relative ranking of models rem
tc < MLM-all < PRMS < FS
proved by 4.4–7.3% in terms
of P@10. All improvements a
these results, we answer our fi
tity annotations of the query
performance.
For the analysis that follow
of these models: LM, PRMS,
enables us to make a meaning
two dimensions: (i) single vs
PRMS and FSDM*), and (ii)
(LM and PRMS vs. SDM and
6.3 Breakdown by q
How are the different quer
Intuitively, we expect ELR to
“Airports in Germany,” “the first 13
yword queries, involving a mixture of
, and attributes (e.g., “Eiffel,” “viet-
roman architecture in paris”).
guage queries (e.g., “which country
fy come from,” “give me all female
tatistics on these query subsets.
in the term-based representation is a
wo of the evaluated models (single-
mined retrieval performance against a
= 10i
, i = 0, 1, 2, 3), see Figure 3.
y frequency. It is clear from the figure
ined when the top 10 most frequent
setting in all our experiments, unless
ency, we also used the same setting,
y-based representation.
ng
parameter settings used in our exper-
in language models is set to the av-
across the collection. The unordered
d (11) is chosen to be 8, as suggested
parameters involved in SDM, FSDM,
y the Coordinate Ascent (CA) algo-
ize Mean Average Precision (MAP).
mization technique, which iteratively
while holding all other parameters
CA implementation provided in the
the number of random restarts to 3.
the parameters of SDM, FSDM*,
sing 5-fold cross validation for each
ely. We note that Zhiltsov et al. [46]
meters (Eq. (5)-(11)) for the FSDM
tity representation from [46] (with 10
default search settings, then this set is re-ranked with the specific
retrieval model (using an in-house implementation). Evaluation
scores are reported on the top 100 results. To perform cross-valida-
tion, we randomly create train and test folds from the initial result
set, and use the same folds throughout all the experiments. To mea-
sure statistical significance we employ a two-tailed paired t-test and
denote differences at the 0.01 and 0.05 levels using the N
and M
sym-
bols, respectively.
5.4 Entity linking
Entity linking is a key component of the ELR approach. For the
purpose of reproducibility, all the entity annotations in this work
are obtained using an open source entity linker, TAGME [19], ac-
cessed through its RESTful API.1
TAGME is one of the best per-
forming entity linkers for short queries [12, 14]. As suggested in
the API documentation, we use the default threshold 0.1 in our ex-
periments; we analyze the effect of the threshold parameter in §6.5.
6. RESULTS AND ANALYSIS
We begin by enumerating our research questions, then present a
series of experiments conducted to answer them.
6.1 Research questions
We address the following research questions:
• RQ1: Can entity retrieval performance be improved by in-
corporating entity annotations of the query? (§6.2)
• RQ2: How are the different query subsets impacted by ELR?
(§6.3)
• RQ3: How robust is our method with respect to parameter
settings? (§6.4)
• RQ4: What is the impact of the entity linking component on
end-to-end performance? (§6.5)
Retrieval results for the baseline models (top) and with ELR
applied on top of them (bottom). Significance is tested against
the corresponding baseline model.
6.2 Overall performance
nokia e73
What is the longest river?
Vietnam war facts
Airlines that currently use Boeing 747 planes
EU countries
University of Texas at Austin
Named entity queries:
Complex queries (list, type, and NLP queries):
Breakdown by query sets
Queries are of two main categories:
Breakdown by query sets
Model
SemSearch ES ListSearch INEX-LD QALD-2
MAP P@10 MAP P@10 MAP P@10 MAP P@10
LM .2485 .2008 .1529 .1939 .1129 .2210 .1132 .0729
LM + ELR .2531M
(+1.8%) .2008 .1688N
(+10.4%) .2096 .1244N
(+10.2%) .2330 .1241N
(+9.6%) .0836N
PRMS .3517 .2685 .1722 .2270 .1184 .2240 .1180 .0893
PRMS + ELR .3573 (+1.6%) .2700 .1956N
(+14.1%) .2417 .1303N
(+10%) .2330 .1343N
(+13.8%) .1064N
SDM .2669 .2108 .1553 .1948 .1167 .2250 .1369 .0750
SDM + ELR .2641 (-1%) .2115 .1689M
(+8.8%) .2174N
.1264N
(+8.3%) .2340 .1472M
(+7.5%) .0857
FSDM* .3563 .2692 .1777 .2165 .1261 .2290 .1364 .0921
FSDM* + ELR .3581 (+.5%) .2677 .1973M
(+11%) .2391N
.1332 (+5.6%) .2330 .1583M
(+16%) .1086N
Table 4: Results of ELR approach on different query types. Significance is tested against the line above; the numbers in parentheses
show the relative improvements, in terms of MAP.
Model MAP P@10
LM 0.1588 0.1664
MLM-tc 0.1821 0.1786
MLM-all 0.1940 0.1965
PRMS 0.1936 0.1977
SDM 0.1719 0.1707
shows the results we get by applying ELR on top of these base-
lines. We observe consistent improvements over all baselines; the
relative ranking of models remains the same (LM < SDM < MLM-
tc < MLM-all < PRMS < FSDM*), but their performance is im-
proved by 4.4–7.3% in terms of MAP, and by 4.5–6.1% in terms
of P@10. All improvements are statistically significant. Based on
these results, we answer our first research question positively: en-
ELR especially benefits complex heterogenous queries.
Analysis
Parameters
Value of parameters (average of trained values across folds)
Different query types are impacted differently by the ELR method.
Impact of parameters
Default parameters suggested based on trained values
No training is performed
(b) PRMS+ELR (c) SDM+ELR (d) FSDM*+ELR
T : unigrams, O: ordered bigrams, U : unordered bigrams, E: entities) in our experiments,
d using Coordinate Ascent).
is attributed to entity-based
O,U,E} parameters for each
are obtained by averaging
olds of the cross-validation
d across all retrieval meth-
re assigned the highest E
r but still sizable portion of
S it bears little importance.
nclude that different query
ELR method. The results
prove complex (ListSearch
us (INEX-LD) query sets,
the other hand, short key-
eit often ambiguous, entity
f parameters (RQ3)? To
configurations: (i) default
Model
Default params. Trained params.
MAP P@10 MAP P@10
LM 0.1588 0.1664
LM + ELR 0.1668M
0.1724 0.1693N
0.1757N
PRMS 0.1936 0.1977
PRMS + ELR 0.2028N
0.2035 0.2078N
0.2085N
SDM 0.1672 0.1685 0.1719 0.1707
SDM + ELR 0.1721 0.1722 0.1794N
0.1812N
FSDM 0.1969 0.1973 0.2030 0.1973
FSDM + ELR 0.2043N
0.1996 0.2159N
0.2078N
Table 5: Comparison of default vs. trained parameters over
all queries. Significance is tested against the line above.
threshold value. Figure 5 reports the results for the best performing
model, FSDM* + ELR, for both trained and default parameters.
Apart from the small fluctuations in the 0.4–0.6 range, retrieval per-
• ELR can bring improvements, even with default parameter values.
• ELR is robust against parameter settings.
Impact of entity linking
0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9
Entity linking threshold
0.195
0.200
0.205
0.210
0.215
0.220
0A3
7rDined pDrDPeters
DefDult pDrDPeters
Entity linking systems involve a threshold parameter
ELR improves performance, even considering entities with low confidence
Summary
• ELR complements any term-based retrieval model
that can be emulated in the MRF framework
• ELR brings consistent improvements over all
baselines (especially for complex queries)
• ELR is robust against parameter settings
parameters
Entity linking threshold
Thanks!
Questions?
The code and runs are
publicly available:
http://bit.ly/ictir2016-elr

Más contenido relacionado

La actualidad más candente

Gleaning Types for Literals in RDF with Application to Entity Summarization
Gleaning Types for Literals in RDF with Application to Entity SummarizationGleaning Types for Literals in RDF with Application to Entity Summarization
Gleaning Types for Literals in RDF with Application to Entity SummarizationKalpa Gunaratna
 
Question Answering with Lydia
Question Answering with LydiaQuestion Answering with Lydia
Question Answering with LydiaJae Hong Kil
 
Representing financial reports on the semantic web a faithful translation f...
Representing financial reports on the semantic web   a faithful translation f...Representing financial reports on the semantic web   a faithful translation f...
Representing financial reports on the semantic web a faithful translation f...Jie Bao
 
What's next in Julia
What's next in JuliaWhat's next in Julia
What's next in JuliaJiahao Chen
 
Expressive Query Answering For Semantic Wikis (20min)
Expressive Query Answering For  Semantic Wikis (20min)Expressive Query Answering For  Semantic Wikis (20min)
Expressive Query Answering For Semantic Wikis (20min)Jie Bao
 
SQL For PHP Programmers
SQL For PHP ProgrammersSQL For PHP Programmers
SQL For PHP ProgrammersDave Stokes
 
Information Extraction from the Web - Algorithms and Tools
Information Extraction from the Web - Algorithms and ToolsInformation Extraction from the Web - Algorithms and Tools
Information Extraction from the Web - Algorithms and ToolsBenjamin Habegger
 
Information extraction for Free Text
Information extraction for Free TextInformation extraction for Free Text
Information extraction for Free Textbutest
 
Lecture 9 - Machine Learning and Support Vector Machines (SVM)
Lecture 9 - Machine Learning and Support Vector Machines (SVM)Lecture 9 - Machine Learning and Support Vector Machines (SVM)
Lecture 9 - Machine Learning and Support Vector Machines (SVM)Sean Golliher
 
Artificial intelligence Prolog Language
Artificial intelligence Prolog LanguageArtificial intelligence Prolog Language
Artificial intelligence Prolog LanguageREHMAT ULLAH
 
Word Tagging with Foundational Ontology Classes
Word Tagging with Foundational Ontology ClassesWord Tagging with Foundational Ontology Classes
Word Tagging with Foundational Ontology ClassesAndre Freitas
 
Topic Modeling - NLP
Topic Modeling - NLPTopic Modeling - NLP
Topic Modeling - NLPRupak Roy
 
Expressive Query Answering For Semantic Wikis
Expressive Query Answering For  Semantic WikisExpressive Query Answering For  Semantic Wikis
Expressive Query Answering For Semantic WikisJie Bao
 
Everything you wanted to know about Dublin Core metadata
Everything you wanted to know about Dublin Core metadataEverything you wanted to know about Dublin Core metadata
Everything you wanted to know about Dublin Core metadataEduserv Foundation
 
Chaps 1-3-ai-prolog
Chaps 1-3-ai-prologChaps 1-3-ai-prolog
Chaps 1-3-ai-prologsaru40
 
Latent Semantic Indexing For Information Retrieval
Latent Semantic Indexing For Information RetrievalLatent Semantic Indexing For Information Retrieval
Latent Semantic Indexing For Information RetrievalSudarsun Santhiappan
 
Logic programming (1)
Logic programming (1)Logic programming (1)
Logic programming (1)Nitesh Singh
 
Data modelingpresentation
Data modelingpresentationData modelingpresentation
Data modelingpresentationfikirabc
 
Introduction to Text Mining
Introduction to Text Mining Introduction to Text Mining
Introduction to Text Mining Rupak Roy
 

La actualidad más candente (20)

Gleaning Types for Literals in RDF with Application to Entity Summarization
Gleaning Types for Literals in RDF with Application to Entity SummarizationGleaning Types for Literals in RDF with Application to Entity Summarization
Gleaning Types for Literals in RDF with Application to Entity Summarization
 
Question Answering with Lydia
Question Answering with LydiaQuestion Answering with Lydia
Question Answering with Lydia
 
Representing financial reports on the semantic web a faithful translation f...
Representing financial reports on the semantic web   a faithful translation f...Representing financial reports on the semantic web   a faithful translation f...
Representing financial reports on the semantic web a faithful translation f...
 
What's next in Julia
What's next in JuliaWhat's next in Julia
What's next in Julia
 
Expressive Query Answering For Semantic Wikis (20min)
Expressive Query Answering For  Semantic Wikis (20min)Expressive Query Answering For  Semantic Wikis (20min)
Expressive Query Answering For Semantic Wikis (20min)
 
SQL For PHP Programmers
SQL For PHP ProgrammersSQL For PHP Programmers
SQL For PHP Programmers
 
Information Extraction from the Web - Algorithms and Tools
Information Extraction from the Web - Algorithms and ToolsInformation Extraction from the Web - Algorithms and Tools
Information Extraction from the Web - Algorithms and Tools
 
Information extraction for Free Text
Information extraction for Free TextInformation extraction for Free Text
Information extraction for Free Text
 
Lecture 9 - Machine Learning and Support Vector Machines (SVM)
Lecture 9 - Machine Learning and Support Vector Machines (SVM)Lecture 9 - Machine Learning and Support Vector Machines (SVM)
Lecture 9 - Machine Learning and Support Vector Machines (SVM)
 
Artificial intelligence Prolog Language
Artificial intelligence Prolog LanguageArtificial intelligence Prolog Language
Artificial intelligence Prolog Language
 
Word Tagging with Foundational Ontology Classes
Word Tagging with Foundational Ontology ClassesWord Tagging with Foundational Ontology Classes
Word Tagging with Foundational Ontology Classes
 
Topic Modeling - NLP
Topic Modeling - NLPTopic Modeling - NLP
Topic Modeling - NLP
 
Expressive Query Answering For Semantic Wikis
Expressive Query Answering For  Semantic WikisExpressive Query Answering For  Semantic Wikis
Expressive Query Answering For Semantic Wikis
 
Everything you wanted to know about Dublin Core metadata
Everything you wanted to know about Dublin Core metadataEverything you wanted to know about Dublin Core metadata
Everything you wanted to know about Dublin Core metadata
 
Chaps 1-3-ai-prolog
Chaps 1-3-ai-prologChaps 1-3-ai-prolog
Chaps 1-3-ai-prolog
 
Chaps 1-3-ai-prolog
Chaps 1-3-ai-prologChaps 1-3-ai-prolog
Chaps 1-3-ai-prolog
 
Latent Semantic Indexing For Information Retrieval
Latent Semantic Indexing For Information RetrievalLatent Semantic Indexing For Information Retrieval
Latent Semantic Indexing For Information Retrieval
 
Logic programming (1)
Logic programming (1)Logic programming (1)
Logic programming (1)
 
Data modelingpresentation
Data modelingpresentationData modelingpresentation
Data modelingpresentation
 
Introduction to Text Mining
Introduction to Text Mining Introduction to Text Mining
Introduction to Text Mining
 

Destacado

Entity Linking in Queries: Tasks and Evaluation
Entity Linking in Queries: Tasks and EvaluationEntity Linking in Queries: Tasks and Evaluation
Entity Linking in Queries: Tasks and EvaluationFaegheh Hasibi
 
Being a PhD student: Experiences and Challenges
Being a PhD student: Experiences and ChallengesBeing a PhD student: Experiences and Challenges
Being a PhD student: Experiences and ChallengesFaegheh Hasibi
 
On the Reproducibility of the TAGME entity linking system
On the Reproducibility of the TAGME entity linking systemOn the Reproducibility of the TAGME entity linking system
On the Reproducibility of the TAGME entity linking systemFaegheh Hasibi
 
How to survive a PhD
How to survive a PhDHow to survive a PhD
How to survive a PhDncg_nuim
 
The Art of Doing a PhD
The Art of Doing a PhDThe Art of Doing a PhD
The Art of Doing a PhDJakob Bardram
 
Erd for teaching staff
Erd for teaching staffErd for teaching staff
Erd for teaching staffMrsjalland
 
Faculty evaluation system
Faculty evaluation systemFaculty evaluation system
Faculty evaluation systemEdwin Marquez
 
Online Performance Evaluation System
Online Performance Evaluation SystemOnline Performance Evaluation System
Online Performance Evaluation SystemPratham Vision
 
Exploring Linked Data content through network analysis
Exploring Linked Data content through network analysisExploring Linked Data content through network analysis
Exploring Linked Data content through network analysisChristophe Guéret
 
Automatic Term Ambiguity Detection
Automatic Term Ambiguity DetectionAutomatic Term Ambiguity Detection
Automatic Term Ambiguity DetectionYunyao Li
 
A Comparison of NER Tools w.r.t. a Domain-Specific Vocabulary
A Comparison of NER Tools w.r.t. a Domain-Specific VocabularyA Comparison of NER Tools w.r.t. a Domain-Specific Vocabulary
A Comparison of NER Tools w.r.t. a Domain-Specific VocabularyTimm Heuss
 
Linked Data: What’s the Story?
Linked Data: What’s the Story?Linked Data: What’s the Story?
Linked Data: What’s the Story?WiLS
 
SYNERGY - A Named Entity Recognition System for Resource-scarce Languages suc...
SYNERGY - A Named Entity Recognition System for Resource-scarce Languages suc...SYNERGY - A Named Entity Recognition System for Resource-scarce Languages suc...
SYNERGY - A Named Entity Recognition System for Resource-scarce Languages suc...Guy De Pauw
 
Universal Topic Classification - Named Entity Disambiguation (IKS Workshop Pa...
Universal Topic Classification - Named Entity Disambiguation (IKS Workshop Pa...Universal Topic Classification - Named Entity Disambiguation (IKS Workshop Pa...
Universal Topic Classification - Named Entity Disambiguation (IKS Workshop Pa...Olivier Grisel
 
Understanding Named-Entity Recognition (NER)
Understanding Named-Entity Recognition (NER) Understanding Named-Entity Recognition (NER)
Understanding Named-Entity Recognition (NER) Stephen Shellman
 
QER : query entity recognition
QER : query entity recognitionQER : query entity recognition
QER : query entity recognitionDhwaj Raj
 
The named entity recognition (ner)2
The named entity recognition (ner)2The named entity recognition (ner)2
The named entity recognition (ner)2Arabic_NLP_ImamU2013
 

Destacado (20)

Entity Linking in Queries: Tasks and Evaluation
Entity Linking in Queries: Tasks and EvaluationEntity Linking in Queries: Tasks and Evaluation
Entity Linking in Queries: Tasks and Evaluation
 
Being a PhD student: Experiences and Challenges
Being a PhD student: Experiences and ChallengesBeing a PhD student: Experiences and Challenges
Being a PhD student: Experiences and Challenges
 
On the Reproducibility of the TAGME entity linking system
On the Reproducibility of the TAGME entity linking systemOn the Reproducibility of the TAGME entity linking system
On the Reproducibility of the TAGME entity linking system
 
How to survive a PhD
How to survive a PhDHow to survive a PhD
How to survive a PhD
 
The Art of Doing a PhD
The Art of Doing a PhDThe Art of Doing a PhD
The Art of Doing a PhD
 
Recipes for PhD
Recipes for PhDRecipes for PhD
Recipes for PhD
 
Erd for teaching staff
Erd for teaching staffErd for teaching staff
Erd for teaching staff
 
Faculty evaluation system
Faculty evaluation systemFaculty evaluation system
Faculty evaluation system
 
Online Performance Evaluation System
Online Performance Evaluation SystemOnline Performance Evaluation System
Online Performance Evaluation System
 
Exploring Linked Data content through network analysis
Exploring Linked Data content through network analysisExploring Linked Data content through network analysis
Exploring Linked Data content through network analysis
 
Automatic Term Ambiguity Detection
Automatic Term Ambiguity DetectionAutomatic Term Ambiguity Detection
Automatic Term Ambiguity Detection
 
A Comparison of NER Tools w.r.t. a Domain-Specific Vocabulary
A Comparison of NER Tools w.r.t. a Domain-Specific VocabularyA Comparison of NER Tools w.r.t. a Domain-Specific Vocabulary
A Comparison of NER Tools w.r.t. a Domain-Specific Vocabulary
 
Linked Data: What’s the Story?
Linked Data: What’s the Story?Linked Data: What’s the Story?
Linked Data: What’s the Story?
 
Entity Search Engine
Entity Search Engine Entity Search Engine
Entity Search Engine
 
SYNERGY - A Named Entity Recognition System for Resource-scarce Languages suc...
SYNERGY - A Named Entity Recognition System for Resource-scarce Languages suc...SYNERGY - A Named Entity Recognition System for Resource-scarce Languages suc...
SYNERGY - A Named Entity Recognition System for Resource-scarce Languages suc...
 
Universal Topic Classification - Named Entity Disambiguation (IKS Workshop Pa...
Universal Topic Classification - Named Entity Disambiguation (IKS Workshop Pa...Universal Topic Classification - Named Entity Disambiguation (IKS Workshop Pa...
Universal Topic Classification - Named Entity Disambiguation (IKS Workshop Pa...
 
Multlingual Linked Data Patterns
Multlingual Linked Data PatternsMultlingual Linked Data Patterns
Multlingual Linked Data Patterns
 
Understanding Named-Entity Recognition (NER)
Understanding Named-Entity Recognition (NER) Understanding Named-Entity Recognition (NER)
Understanding Named-Entity Recognition (NER)
 
QER : query entity recognition
QER : query entity recognitionQER : query entity recognition
QER : query entity recognition
 
The named entity recognition (ner)2
The named entity recognition (ner)2The named entity recognition (ner)2
The named entity recognition (ner)2
 

Similar a Exploiting Entity Linking in Queries For Entity Retrieval

MATH FUNCTION RECOGNITION WITH FINE TUNING PRE-TRAINED MODELS
MATH FUNCTION RECOGNITION WITH FINE TUNING PRE-TRAINED MODELSMATH FUNCTION RECOGNITION WITH FINE TUNING PRE-TRAINED MODELS
MATH FUNCTION RECOGNITION WITH FINE TUNING PRE-TRAINED MODELSIJCI JOURNAL
 
MATH FUNCTION RECOGNITION WITH FINE TUNING PRE-TRAINED MODELS
MATH FUNCTION RECOGNITION WITH FINE TUNING PRE-TRAINED MODELSMATH FUNCTION RECOGNITION WITH FINE TUNING PRE-TRAINED MODELS
MATH FUNCTION RECOGNITION WITH FINE TUNING PRE-TRAINED MODELSIJCI JOURNAL
 
SCHEMA BASED STORAGE OF XML DOCUMENTS IN RELATIONAL DATABASES
SCHEMA BASED STORAGE OF XML DOCUMENTS IN RELATIONAL DATABASESSCHEMA BASED STORAGE OF XML DOCUMENTS IN RELATIONAL DATABASES
SCHEMA BASED STORAGE OF XML DOCUMENTS IN RELATIONAL DATABASESijwscjournal
 
SCHEMA BASED STORAGE OF XML DOCUMENTS IN RELATIONAL DATABASES
SCHEMA BASED STORAGE OF XML DOCUMENTS IN RELATIONAL DATABASESSCHEMA BASED STORAGE OF XML DOCUMENTS IN RELATIONAL DATABASES
SCHEMA BASED STORAGE OF XML DOCUMENTS IN RELATIONAL DATABASESijwscjournal
 
SCHEMA BASED STORAGE OF XML DOCUMENTS IN RELATIONAL DATABASES
SCHEMA BASED STORAGE OF XML DOCUMENTS IN RELATIONAL DATABASESSCHEMA BASED STORAGE OF XML DOCUMENTS IN RELATIONAL DATABASES
SCHEMA BASED STORAGE OF XML DOCUMENTS IN RELATIONAL DATABASESijwscjournal
 
Bca ii dfs u-1 introduction to data structure
Bca ii dfs u-1 introduction to data structureBca ii dfs u-1 introduction to data structure
Bca ii dfs u-1 introduction to data structureRai University
 
Mca ii dfs u-1 introduction to data structure
Mca ii dfs u-1 introduction to data structureMca ii dfs u-1 introduction to data structure
Mca ii dfs u-1 introduction to data structureRai University
 
Bsc cs ii dfs u-1 introduction to data structure
Bsc cs ii dfs u-1 introduction to data structureBsc cs ii dfs u-1 introduction to data structure
Bsc cs ii dfs u-1 introduction to data structureRai University
 
Introduction to Data structure and algorithm.pptx
Introduction to Data structure and algorithm.pptxIntroduction to Data structure and algorithm.pptx
Introduction to Data structure and algorithm.pptxline24arts
 
HOLISTIC EVALUATION OF XML QUERIES WITH STRUCTURAL PREFERENCES ON AN ANNOTATE...
HOLISTIC EVALUATION OF XML QUERIES WITH STRUCTURAL PREFERENCES ON AN ANNOTATE...HOLISTIC EVALUATION OF XML QUERIES WITH STRUCTURAL PREFERENCES ON AN ANNOTATE...
HOLISTIC EVALUATION OF XML QUERIES WITH STRUCTURAL PREFERENCES ON AN ANNOTATE...ijseajournal
 
Duplicate Detection in Hierarchical Data Using XPath
Duplicate Detection in Hierarchical Data Using XPathDuplicate Detection in Hierarchical Data Using XPath
Duplicate Detection in Hierarchical Data Using XPathiosrjce
 
Unit 1 introduction to data structure
Unit 1   introduction to data structureUnit 1   introduction to data structure
Unit 1 introduction to data structurekalyanineve
 
Fielded Sequential Dependence Model for Ad-Hoc Entity Retrieval in the Web of...
Fielded Sequential Dependence Model for Ad-Hoc Entity Retrieval in the Web of...Fielded Sequential Dependence Model for Ad-Hoc Entity Retrieval in the Web of...
Fielded Sequential Dependence Model for Ad-Hoc Entity Retrieval in the Web of...FedorNikolaev
 
Chapter 1 - Concepts for Object Databases.ppt
Chapter 1 - Concepts for Object Databases.pptChapter 1 - Concepts for Object Databases.ppt
Chapter 1 - Concepts for Object Databases.pptShemse Shukre
 
Everything is composable
Everything is composableEverything is composable
Everything is composableVictor Igor
 
Efficiency of TreeMatch Algorithm in XML Tree Pattern Matching
Efficiency of TreeMatch Algorithm in XML Tree Pattern  MatchingEfficiency of TreeMatch Algorithm in XML Tree Pattern  Matching
Efficiency of TreeMatch Algorithm in XML Tree Pattern MatchingIOSR Journals
 
The Functional Programming Triad of Map, Filter and Fold
The Functional Programming Triad of Map, Filter and FoldThe Functional Programming Triad of Map, Filter and Fold
The Functional Programming Triad of Map, Filter and FoldPhilip Schwarz
 

Similar a Exploiting Entity Linking in Queries For Entity Retrieval (20)

MATH FUNCTION RECOGNITION WITH FINE TUNING PRE-TRAINED MODELS
MATH FUNCTION RECOGNITION WITH FINE TUNING PRE-TRAINED MODELSMATH FUNCTION RECOGNITION WITH FINE TUNING PRE-TRAINED MODELS
MATH FUNCTION RECOGNITION WITH FINE TUNING PRE-TRAINED MODELS
 
MATH FUNCTION RECOGNITION WITH FINE TUNING PRE-TRAINED MODELS
MATH FUNCTION RECOGNITION WITH FINE TUNING PRE-TRAINED MODELSMATH FUNCTION RECOGNITION WITH FINE TUNING PRE-TRAINED MODELS
MATH FUNCTION RECOGNITION WITH FINE TUNING PRE-TRAINED MODELS
 
SCHEMA BASED STORAGE OF XML DOCUMENTS IN RELATIONAL DATABASES
SCHEMA BASED STORAGE OF XML DOCUMENTS IN RELATIONAL DATABASESSCHEMA BASED STORAGE OF XML DOCUMENTS IN RELATIONAL DATABASES
SCHEMA BASED STORAGE OF XML DOCUMENTS IN RELATIONAL DATABASES
 
SCHEMA BASED STORAGE OF XML DOCUMENTS IN RELATIONAL DATABASES
SCHEMA BASED STORAGE OF XML DOCUMENTS IN RELATIONAL DATABASESSCHEMA BASED STORAGE OF XML DOCUMENTS IN RELATIONAL DATABASES
SCHEMA BASED STORAGE OF XML DOCUMENTS IN RELATIONAL DATABASES
 
SCHEMA BASED STORAGE OF XML DOCUMENTS IN RELATIONAL DATABASES
SCHEMA BASED STORAGE OF XML DOCUMENTS IN RELATIONAL DATABASESSCHEMA BASED STORAGE OF XML DOCUMENTS IN RELATIONAL DATABASES
SCHEMA BASED STORAGE OF XML DOCUMENTS IN RELATIONAL DATABASES
 
Bca ii dfs u-1 introduction to data structure
Bca ii dfs u-1 introduction to data structureBca ii dfs u-1 introduction to data structure
Bca ii dfs u-1 introduction to data structure
 
Mca ii dfs u-1 introduction to data structure
Mca ii dfs u-1 introduction to data structureMca ii dfs u-1 introduction to data structure
Mca ii dfs u-1 introduction to data structure
 
Bsc cs ii dfs u-1 introduction to data structure
Bsc cs ii dfs u-1 introduction to data structureBsc cs ii dfs u-1 introduction to data structure
Bsc cs ii dfs u-1 introduction to data structure
 
Introduction to Data structure and algorithm.pptx
Introduction to Data structure and algorithm.pptxIntroduction to Data structure and algorithm.pptx
Introduction to Data structure and algorithm.pptx
 
HOLISTIC EVALUATION OF XML QUERIES WITH STRUCTURAL PREFERENCES ON AN ANNOTATE...
HOLISTIC EVALUATION OF XML QUERIES WITH STRUCTURAL PREFERENCES ON AN ANNOTATE...HOLISTIC EVALUATION OF XML QUERIES WITH STRUCTURAL PREFERENCES ON AN ANNOTATE...
HOLISTIC EVALUATION OF XML QUERIES WITH STRUCTURAL PREFERENCES ON AN ANNOTATE...
 
J017616976
J017616976J017616976
J017616976
 
Duplicate Detection in Hierarchical Data Using XPath
Duplicate Detection in Hierarchical Data Using XPathDuplicate Detection in Hierarchical Data Using XPath
Duplicate Detection in Hierarchical Data Using XPath
 
1st lecture.ppt
1st lecture.ppt1st lecture.ppt
1st lecture.ppt
 
Unit 1 introduction to data structure
Unit 1   introduction to data structureUnit 1   introduction to data structure
Unit 1 introduction to data structure
 
Fielded Sequential Dependence Model for Ad-Hoc Entity Retrieval in the Web of...
Fielded Sequential Dependence Model for Ad-Hoc Entity Retrieval in the Web of...Fielded Sequential Dependence Model for Ad-Hoc Entity Retrieval in the Web of...
Fielded Sequential Dependence Model for Ad-Hoc Entity Retrieval in the Web of...
 
Chapter 1 - Concepts for Object Databases.ppt
Chapter 1 - Concepts for Object Databases.pptChapter 1 - Concepts for Object Databases.ppt
Chapter 1 - Concepts for Object Databases.ppt
 
Everything is composable
Everything is composableEverything is composable
Everything is composable
 
Efficiency of TreeMatch Algorithm in XML Tree Pattern Matching
Efficiency of TreeMatch Algorithm in XML Tree Pattern  MatchingEfficiency of TreeMatch Algorithm in XML Tree Pattern  Matching
Efficiency of TreeMatch Algorithm in XML Tree Pattern Matching
 
Final exam in advance dbms
Final exam in advance dbmsFinal exam in advance dbms
Final exam in advance dbms
 
The Functional Programming Triad of Map, Filter and Fold
The Functional Programming Triad of Map, Filter and FoldThe Functional Programming Triad of Map, Filter and Fold
The Functional Programming Triad of Map, Filter and Fold
 

Último

GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)Areesha Ahmad
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTSérgio Sacani
 
Forensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdfForensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdfrohankumarsinghrore1
 
Chemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfChemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfSumit Kumar yadav
 
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...Monika Rani
 
Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​kaibalyasahoo82800
 
COST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptxCOST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptxFarihaAbdulRasheed
 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPirithiRaju
 
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRLKochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRLkantirani197
 
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptxSCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptxRizalinePalanog2
 
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls AgencyHire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls AgencySheetal Arora
 
Pests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPirithiRaju
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoSérgio Sacani
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Sérgio Sacani
 
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRStunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRDelhi Call girls
 
Botany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfBotany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfSumit Kumar yadav
 
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...Sérgio Sacani
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksSérgio Sacani
 
Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...RohitNehra6
 

Último (20)

GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)GBSN - Microbiology (Unit 1)
GBSN - Microbiology (Unit 1)
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOST
 
Forensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdfForensic Biology & Its biological significance.pdf
Forensic Biology & Its biological significance.pdf
 
Chemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdfChemistry 4th semester series (krishna).pdf
Chemistry 4th semester series (krishna).pdf
 
CELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdfCELL -Structural and Functional unit of life.pdf
CELL -Structural and Functional unit of life.pdf
 
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
Vip profile Call Girls In Lonavala 9748763073 For Genuine Sex Service At Just...
 
Nanoparticles synthesis and characterization​ ​
Nanoparticles synthesis and characterization​  ​Nanoparticles synthesis and characterization​  ​
Nanoparticles synthesis and characterization​ ​
 
COST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptxCOST ESTIMATION FOR A RESEARCH PROJECT.pptx
COST ESTIMATION FOR A RESEARCH PROJECT.pptx
 
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdfPests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
Pests of cotton_Borer_Pests_Binomics_Dr.UPR.pdf
 
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRLKochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
Kochi ❤CALL GIRL 84099*07087 ❤CALL GIRLS IN Kochi ESCORT SERVICE❤CALL GIRL
 
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptxSCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
SCIENCE-4-QUARTER4-WEEK-4-PPT-1 (1).pptx
 
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls AgencyHire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
Hire 💕 9907093804 Hooghly Call Girls Service Call Girls Agency
 
Pests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdfPests of mustard_Identification_Management_Dr.UPR.pdf
Pests of mustard_Identification_Management_Dr.UPR.pdf
 
Isotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on IoIsotopic evidence of long-lived volcanism on Io
Isotopic evidence of long-lived volcanism on Io
 
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
Discovery of an Accretion Streamer and a Slow Wide-angle Outflow around FUOri...
 
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCRStunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
Stunning ➥8448380779▻ Call Girls In Panchshil Enclave Delhi NCR
 
Botany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdfBotany 4th semester file By Sumit Kumar yadav.pdf
Botany 4th semester file By Sumit Kumar yadav.pdf
 
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
 
Formation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disksFormation of low mass protostars and their circumstellar disks
Formation of low mass protostars and their circumstellar disks
 
Biopesticide (2).pptx .This slides helps to know the different types of biop...
Biopesticide (2).pptx  .This slides helps to know the different types of biop...Biopesticide (2).pptx  .This slides helps to know the different types of biop...
Biopesticide (2).pptx .This slides helps to know the different types of biop...
 

Exploiting Entity Linking in Queries For Entity Retrieval

  • 1. Exploiting Entity Linking in Queries for Entity Retrieval Faegheh Hasibi, Krisztian Balog, Svein Erik Bratsberg Newark, DE, USA, September 2016
  • 2. Exploiting Entity Linking in Queries for Entity Retrieval Faegheh Hasibi, Krisztian Balog, Svein Erik Bratsberg Newark, DE, USA, September 2016
  • 3.
  • 4.
  • 6. Entity <rdfs:label> <dbo:abstract> <dbo:birthPlace> <dbo:birthDate> <dbo:deathPlace> <dbo:deathDate> <dbo:child> <dbp:spouse> <dbp:shortDescription> <dbo:wikiPageWikiLink> Ann_Dunham Ann Dunham Stanley Ann Dunham (November 29, 1942 – November 7, 1995) was the mother of Barack Obama, the 44th President of the United States, and an American anthropologis [..] <Honolulu> <Hawaii> <United_States> 1942-11-29 <Honolulu> <United_States> 1995-11-07 <Barack_Obama> <Maya_Soetoro-Ng> <Lolo_Soetoro> <Barack_Obama,_Sr.> Anthropologist; mother of Barack Obama <Barack_Obama> <President_of_the_United_States> <United_States> [..]
  • 7. Conventional entity retrieval <Barack_Obama> Annotations: barack obama parents Entity-based representation ˆDˆD Term-based representation DD term-based matching entity-based matching entity linking <dbo:birthPlace>: [<Honolulu>, <Hawaii> ] <dbo:child>: <Barack_Obama> <dbo:wikiPageWikiLink>: [ <United_States>, <Family_of_Barack_Obama>, …] Query terms: <rdfs:label>: Ann Dunham <dbo:abstract>: Stanley Ann Dunham the mother Barack Obama, was an American anthropologist who … <dbo:birthPlace>: Honolulu Hawaii … <dbo:child>: Barack Obama <dbo:wikiPageWikiLink>: United States Family Barack Obama Term-based representation barack obama parents Conventional entity retrieval models compare the term-based representation of entity against the query terms. Query
  • 8. Our proposal barack obama parents Incorporating entity linking into the retrieval model entity linking
  • 9. Our proposal <Barack_Obama> Annotations: barack obama parents Entity-based representation ˆDˆD Term-based representation DD term-based matching entity-based matching entity linking <dbo:birthPlace>: [<Honolulu>, <Hawaii> ] <dbo:child>: <Barack_Obama> <dbo:wikiPageWikiLink>: [ <United_States>, <Family_of_Barack_Obama>, …] Query terms: <rdfs:label>: Ann Dunham <dbo:abstract>: Stanley Ann Dunham the mother Barack Obama, was an American anthropologist who … <dbo:birthPlace>: Honolulu Hawaii … <dbo:child>: Barack Obama <dbo:wikiPageWikiLink>: United States Family Barack Obama Term-based representation barack obama parents <Barack_Obama> Annotations: barack obama parents Entity-based representation ˆDˆD Term-based representation DD term-based matching entity-based matching entity linking <dbo:birthPlace>: [<Honolulu>, <Hawaii> ] <dbo:child>: <Barack_Obama> <dbo:wikiPageWikiLink>: [ <United_States>, <Family_of_Barack_Obama>, …] Query terms: <rdfs:label>: Ann Dunham <dbo:abstract>: Stanley Ann Dunham the mother Barack Obama, was an American anthropologist who … <dbo:birthPlace>: Honolulu Hawaii … <dbo:child>: Barack Obama <dbo:wikiPageWikiLink>: United States Family Barack Obama Entity-based representation <Barack_Obama> entity linking
  • 11. ELR approach Based on Markov Random Fields (MRF) D. Metzler and W. B. Croft. A Markov Random Field model for term dependencies. In Proc. of SIGIR, pages 472–479, 2005. q1q1 q2q2 q3q3 DD e1e1 barack obama parents BARACK OBAMAbarack obama parents <Barack_Obama>
  • 12. ELR approach Based on Markov Random Fields (MRF) 9) 0) 1) to al. on te k- F- ument and a term node, (ii) 3-cliques consisting of the document and two term nodes, and (iii) 2-cliques consisting of edges between the document and an entity node. The potential functions for the first two types are identical to the SDM model (Eqs. (3) and 4). We define the potential function for the third clique type as: E(e, D; ⇤) = exp[ EfE(e, D)], where E is a free parameter and fE(e, D) is the feature function for the entity e and document D (to be defined in §4.2). By substi- tuting all feature functions into Eq. (2), the MRF ranking function becomes: P(D|Q) rank = X qi2Q T fT (qi, D)+ X qi,qi+12Q OfO(qi, qi+1, D)+ X qi,qi+12Q U fU (qi, qi+1, D)+ X e2E(Q) EfE(e, D). This model introduces an additional parameter for weighing the im- portance of entity annotations, E, on top of the three parameters ( {T,O,U}) from the SDM model (cf. §3.2). There is a crucial dif- ference between entity-based and term-based matches with regards 9) 0) 1) to al. on te k- F- ument and a term node, (ii) 3-cliques consisting of the document and two term nodes, and (iii) 2-cliques consisting of edges between the document and an entity node. The potential functions for the first two types are identical to the SDM model (Eqs. (3) and 4). We define the potential function for the third clique type as: E(e, D; ⇤) = exp[ EfE(e, D)], where E is a free parameter and fE(e, D) is the feature function for the entity e and document D (to be defined in §4.2). By substi- tuting all feature functions into Eq. (2), the MRF ranking function becomes: P(D|Q) rank = X qi2Q T fT (qi, D)+ X qi,qi+12Q OfO(qi, qi+1, D)+ X qi,qi+12Q U fU (qi, qi+1, D)+ X e2E(Q) EfE(e, D). This model introduces an additional parameter for weighing the im- portance of entity annotations, E, on top of the three parameters ( {T,O,U}) from the SDM model (cf. §3.2). There is a crucial dif- ference between entity-based and term-based matches with regards Terms Ordered bigrams Unordered bigrams Entities The framework encompass various retrieval models; e.g., LM, MLM, PRMS, SDM, and FSDM
  • 13. Challenge 1 • Number of linked entities is not proportional to query length Affects setting of parameter • Entity linking involves some uncertainty Confidence scores should be integrated into the retrieval model all cars that are produced in Germany Germany (score: 0.7) Cars_(film) (score: 0.05)
  • 14. Normalization The final model takes the normalized form: parents BARACK OBAMA ntation of the ELR model for the ts”. Here, all the terms are sequen- se "barack obama" is linked to the hile #uwN(qi, qi+1) counts the co- window of N words (where N is set meter µ is the Dirichlet prior, which ument length in the collection. SDM is basically the weighted sum ained from three sources: (i) query y bigrams, and (iii) unordered match ch in §4 employs the same sequential DM does. ial Dependence Model pendence Model (FSDM) [46] ex- ort structured document retrieval. In ocument language model of feature )) with those of the Mixture of Lan- Given a fielded representation of a hors, metadata, etc. in the context of M computes a language model prob- tion, that ELR is applicable to a wide range of retrieval prob where documents, or document-based representations of ob are to be ranked, and entity annotations are available to be l aged in the matching of documents and queries. Our main foc this paper, however, is limited to entity retrieval; entity annota are an integral part of the representation here, cf. Figure 1. is unlike to traditional document retrieval, where documents w need to be annotated by an automated process that is prone t rors). To show the generic nature of our approach, and also fo sake of notational consistency with the previous section, we refer to documents throughout this section. We detail how documents are constructed for our particular task, entity retr in §4.3. Our interest in this work lies in incorporating entity annota and not in creating them. Therefore, entity annotations of the q are assumed to have been generated by an external entity lin process, which we treat much like a black box. Formally, giv input query Q = q1...qn, the set of linked entities is denote E(Q) = {e1, ..., em}. We do not impose any restrictions on annotations, i.e., they may be overlapping and a given query might be linked to multiple entities. It might also be that E is an empty set. Further, we assume that annotations have c dence scores associated with them. For each entity e 2 E let s(e) denote the confidence score of e, with the constraiP e2E(Q) s(e) = 1. The graph underlying our model consists of document, term entity linking score
  • 15. Challenge 2 Term-based matching is modeled based on term frequencies: tends the SDM model to support structured document retrieval. In essence, FSDM replaces the document language model of feature functions (Eqs. (6), (7), and (8)) with those of the Mixture of Lan- guage Models (MLM) [37]. Given a fielded representation of a document (e.g., title, body, anchors, metadata, etc. in the context of web document retrieval), MLM computes a language model prob- ability for each field and then takes a linear combination of these field-level models. Hence, FSDM assumes that a separate language model is built for each document field and then computes the fea- ture functions based on fields f 2 F, with F being the universe of fields. For individual terms, the feature function becomes: fT (qi, D) = log X f wT f tfqi,Df + µf cfqi,f |Cf | |Df | + µf , (9) while ordered and unordered bigrams are estimated as: fO(qi, qi+1, D) = log X f wO f tf#1(qi,qi+1),Df + µf cf#1(qi,qi+1),f |Cf | |Df | + µf (10) fU (qi, qi+1, D) = log X f wU f tf#uwN(qi,qi+1),Df + µf cf#uwN(qi,qi+1),f |Cf | |Df | + µf . might be l is an empt dence scor let s(e) deP e2E(Q) s The grap entity node are sequen tities are in on this ass types of cli ument and and two ter the docum first two ty define the p where E for the enti tuting all f becomes: P But entities have different behavior than terms … Entity-based representations are sparse Single presence of an entity in a document is considered as a match
  • 16. Entity-based matching • The modeling considers presence/absence of entities in the entity representation presence/absence of entity in document field y that dence by the model. s as a of en- para- takes tity BARACK OBAMA (via the relationship <dbo:child>). This entity-based representation differs from the traditional term-based representation in at least two important ways. Firstly, each entity appears at most once in each document field. Secondly, if an entity appears in a field, then it should be considered a match, irrespec- tive of what other entities may appear in that field. Consider again the example in Figure 1, where the field <dbo:birthplace> has multiple values, HONOLULU and HAWAII. Then, if either of these entities is linked in the query, that should account for a per- fect match against this particular field, irrespective of how many other locations are present in that field. Motivated by these obser- vations, we define the feature function fE as: fE(e, D) = log X f2F wE f h (1 ↵)tf{0,1}(e, ˆDf ) + ↵ dfe,f dff i , (13) where the linear interpolation implements the Jelinek-Mercer smooth- ing method, with ↵ set to 0.1, and tf{0,1}(e, ˆDf ) indicates whether the entity e is present in the document field ˆDf or not. For the back- ground model, we employ the notion of document frequency as fol- lows: dfe,f = |{ ˆD|e 2 ˆDf }| is the total number of documents that contain the entity e in field f and df(f) = |{ ˆD| ˆDf 6= ;}| is the {T,O,U} of the summations (cf. Eq. (5)) and can be trained without having to worry about query length normalization. The parameter E, how- ever, cannot be treated the same manner for two reasons. Firstly, the number of annotated entities for each query varies and it is inde- pendent of the length of the query. For example, a long natural lan- guage query might be annotated with a single entity, while shorter queries are often linked to several entities, due to their ambiguity. Secondly, we need to deal with varying levels of uncertainty that is involved with entity annotations of the query. The confidence scores associated with the annotations, which are generated by the entity linking process, should be integrated into the retrieval model. To address the above issues, we re-write the parameters as a parameterized function over each clique and define them as: T (qi) = T 1 |Q| , O(qi, qi+1) = O 1 |Q| 1 , U (qi, qi+1) = U 1 |Q| 1 , E(e) = Es(e), where |Q| is the query length and s(e) is the confidence score of en- tity e obtained from the entity linking step. Considering this para- metric form of the parameters, our final ranking function takes the following form: P(D|Q) rank = T X qi2Q 1 |Q| fT (qi, D)+ O X qi,qi+12Q 1 |Q| 1 fO(qi, qi+1, D)+ U X qi,qi+12Q 1 |Q| 1 fU (qi, qi+1, D)+ D an entity-based representation ˆD is obtained ument terms and considering only entities. In work, the entity represented by document D sta tionships with a number of other entities, as spec edge base. The various relationships are model document. Consider the example in Figure 1, wh represents the entity ANN DUNHAM, who is bein tity BARACK OBAMA (via the relationship <dbo entity-based representation differs from the trad representation in at least two important ways. F appears at most once in each document field. Sec appears in a field, then it should be considered tive of what other entities may appear in that fiel the example in Figure 1, where the field <dbo has multiple values, HONOLULU and HAWAII. these entities is linked in the query, that should fect match against this particular field, irrespec other locations are present in that field. Motivat vations, we define the feature function fE as: fE(e, D) = log X f2F wE f h (1 ↵)tf{0,1}(e, ˆDf ) where the linear interpolation implements the Jeli ing method, with ↵ set to 0.1, and tf{0,1}(e, ˆDf ) the entity e is present in the document field ˆDf or ground model, we employ the notion of documen lows: dfe,f = |{ ˆD|e 2 ˆDf }| is the total number contain the entity e in field f and df(f) = |{ ˆD number of documents with a non-empty field f. All of the feature functions fT , fO, fU , and f rameters wf , which control the field weights. Z set these type of parameters using a learning algo to a large number of parameters to be trained (t he query. Therefore, in SDM, {T,O,U} are taken out tions (cf. Eq. (5)) and can be trained without having to query length normalization. The parameter E, how- be treated the same manner for two reasons. Firstly, annotated entities for each query varies and it is inde- e length of the query. For example, a long natural lan- might be annotated with a single entity, while shorter ten linked to several entities, due to their ambiguity. need to deal with varying levels of uncertainty that ith entity annotations of the query. The confidence ated with the annotations, which are generated by the process, should be integrated into the retrieval model. the above issues, we re-write the parameters as a d function over each clique and define them as: T (qi) = T 1 |Q| , O(qi, qi+1) = O 1 |Q| 1 , U (qi, qi+1) = U 1 |Q| 1 , E(e) = Es(e), he query length and s(e) is the confidence score of en- d from the entity linking step. Considering this para- of the parameters, our final ranking function takes form: rank = T X qi2Q 1 |Q| fT (qi, D)+ O X qi,qi+12Q 1 |Q| 1 fO(qi, qi+1, D)+ U X qi,qi+12Q 1 |Q| 1 fU (qi, qi+1, D)+ an entity-based representation of documents. For each document D an entity-based representation ˆD is obtained by ignoring doc- ument terms and considering only entities. In the context of our work, the entity represented by document D stands in typed rela- tionships with a number of other entities, as specified in the knowl- edge base. The various relationships are modeled as fields in the document. Consider the example in Figure 1, where the document represents the entity ANN DUNHAM, who is being linked to the en- tity BARACK OBAMA (via the relationship <dbo:child>). This entity-based representation differs from the traditional term-based representation in at least two important ways. Firstly, each entity appears at most once in each document field. Secondly, if an entity appears in a field, then it should be considered a match, irrespec- tive of what other entities may appear in that field. Consider again the example in Figure 1, where the field <dbo:birthplace> has multiple values, HONOLULU and HAWAII. Then, if either of these entities is linked in the query, that should account for a per- fect match against this particular field, irrespective of how many other locations are present in that field. Motivated by these obser- vations, we define the feature function fE as: fE(e, D) = log X f2F wE f h (1 ↵)tf{0,1}(e, ˆDf ) + ↵ dfe,f dff i , (13) where the linear interpolation implements the Jelinek-Mercer smooth- ing method, with ↵ set to 0.1, and tf{0,1}(e, ˆDf ) indicates whether the entity e is present in the document field ˆDf or not. For the back- ground model, we employ the notion of document frequency as fol- lows: dfe,f = |{ ˆD|e 2 ˆDf }| is the total number of documents that contain the entity e in field f and df(f) = |{ ˆD| ˆDf 6= ;}| is the number of documents with a non-empty field f. All of the feature functions fT , fO, fU , and fE involve free pa- rameters wf , which control the field weights. Zhiltsov et al. [46] set these type of parameters using a learning algorithm, which leads to a large number of parameters to be trained (the number of fea- field weight (PRMS probability)
  • 18. Overall performance • Experiments on DBpedia-entity collection Contains nearly 500 heterogeneous queries Model SemSearch ES ListSearch INEX-LD MAP P@10 MAP P@10 MAP P@10 LM .2485 .2008 .1529 .1939 .1129 .2210 LM + ELR .2531M (+1.8%) .2008 .1688N (+10.4%) .2096 .1244N (+10.2%) .2330 PRMS .3517 .2685 .1722 .2270 .1184 .2240 PRMS + ELR .3573 (+1.6%) .2700 .1956N (+14.1%) .2417 .1303N (+10%) .2330 SDM .2669 .2108 .1553 .1948 .1167 .2250 SDM + ELR .2641 (-1%) .2115 .1689M (+8.8%) .2174N .1264N (+8.3%) .2340 FSDM* .3563 .2692 .1777 .2165 .1261 .2290 FSDM* + ELR .3581 (+.5%) .2677 .1973M (+11%) .2391N .1332 (+5.6%) .2330 Table 4: Results of ELR approach on different query types. Significance is tested against the line a show the relative improvements, in terms of MAP. Model MAP P@10 LM 0.1588 0.1664 MLM-tc 0.1821 0.1786 MLM-all 0.1940 0.1965 PRMS 0.1936 0.1977 SDM 0.1719 0.1707 FSDM* 0.2030 0.1973 LM + ELR 0.1693N (+6.6%) 0.1757N (+5.6%) MLM-tc + ELR 0.1937N (+6.4%) 0.1895N (+6.1%) MLM-all + ELR 0.2082N (+7.3%) 0.2054M (+4.5%) PRMS + ELR 0.2078N (+7.3%) 0.2085N (+5.5%) SDM + ELR 0.1794N (+4.4%) 0.1812N (+6.1%) FSDM* + ELR 0.2159N (+6.3%) 0.2078N (+5.3%) Table 3: Retrieval results for baseline models (top) and with ELR applied on top of them (bottom). Significance is tested against the corresponding baseline model. Best scores are in boldface. shows the results we get by a lines. We observe consistent i relative ranking of models rem tc < MLM-all < PRMS < FS proved by 4.4–7.3% in terms of P@10. All improvements a these results, we answer our fi tity annotations of the query performance. For the analysis that follow of these models: LM, PRMS, enables us to make a meaning two dimensions: (i) single vs PRMS and FSDM*), and (ii) (LM and PRMS vs. SDM and 6.3 Breakdown by q How are the different quer Intuitively, we expect ELR to “Airports in Germany,” “the first 13 yword queries, involving a mixture of , and attributes (e.g., “Eiffel,” “viet- roman architecture in paris”). guage queries (e.g., “which country fy come from,” “give me all female tatistics on these query subsets. in the term-based representation is a wo of the evaluated models (single- mined retrieval performance against a = 10i , i = 0, 1, 2, 3), see Figure 3. y frequency. It is clear from the figure ined when the top 10 most frequent setting in all our experiments, unless ency, we also used the same setting, y-based representation. ng parameter settings used in our exper- in language models is set to the av- across the collection. The unordered d (11) is chosen to be 8, as suggested parameters involved in SDM, FSDM, y the Coordinate Ascent (CA) algo- ize Mean Average Precision (MAP). mization technique, which iteratively while holding all other parameters CA implementation provided in the the number of random restarts to 3. the parameters of SDM, FSDM*, sing 5-fold cross validation for each ely. We note that Zhiltsov et al. [46] meters (Eq. (5)-(11)) for the FSDM tity representation from [46] (with 10 default search settings, then this set is re-ranked with the specific retrieval model (using an in-house implementation). Evaluation scores are reported on the top 100 results. To perform cross-valida- tion, we randomly create train and test folds from the initial result set, and use the same folds throughout all the experiments. To mea- sure statistical significance we employ a two-tailed paired t-test and denote differences at the 0.01 and 0.05 levels using the N and M sym- bols, respectively. 5.4 Entity linking Entity linking is a key component of the ELR approach. For the purpose of reproducibility, all the entity annotations in this work are obtained using an open source entity linker, TAGME [19], ac- cessed through its RESTful API.1 TAGME is one of the best per- forming entity linkers for short queries [12, 14]. As suggested in the API documentation, we use the default threshold 0.1 in our ex- periments; we analyze the effect of the threshold parameter in §6.5. 6. RESULTS AND ANALYSIS We begin by enumerating our research questions, then present a series of experiments conducted to answer them. 6.1 Research questions We address the following research questions: • RQ1: Can entity retrieval performance be improved by in- corporating entity annotations of the query? (§6.2) • RQ2: How are the different query subsets impacted by ELR? (§6.3) • RQ3: How robust is our method with respect to parameter settings? (§6.4) • RQ4: What is the impact of the entity linking component on end-to-end performance? (§6.5) Retrieval results for the baseline models (top) and with ELR applied on top of them (bottom). Significance is tested against the corresponding baseline model. 6.2 Overall performance
  • 19. nokia e73 What is the longest river? Vietnam war facts Airlines that currently use Boeing 747 planes EU countries University of Texas at Austin Named entity queries: Complex queries (list, type, and NLP queries): Breakdown by query sets Queries are of two main categories:
  • 20. Breakdown by query sets Model SemSearch ES ListSearch INEX-LD QALD-2 MAP P@10 MAP P@10 MAP P@10 MAP P@10 LM .2485 .2008 .1529 .1939 .1129 .2210 .1132 .0729 LM + ELR .2531M (+1.8%) .2008 .1688N (+10.4%) .2096 .1244N (+10.2%) .2330 .1241N (+9.6%) .0836N PRMS .3517 .2685 .1722 .2270 .1184 .2240 .1180 .0893 PRMS + ELR .3573 (+1.6%) .2700 .1956N (+14.1%) .2417 .1303N (+10%) .2330 .1343N (+13.8%) .1064N SDM .2669 .2108 .1553 .1948 .1167 .2250 .1369 .0750 SDM + ELR .2641 (-1%) .2115 .1689M (+8.8%) .2174N .1264N (+8.3%) .2340 .1472M (+7.5%) .0857 FSDM* .3563 .2692 .1777 .2165 .1261 .2290 .1364 .0921 FSDM* + ELR .3581 (+.5%) .2677 .1973M (+11%) .2391N .1332 (+5.6%) .2330 .1583M (+16%) .1086N Table 4: Results of ELR approach on different query types. Significance is tested against the line above; the numbers in parentheses show the relative improvements, in terms of MAP. Model MAP P@10 LM 0.1588 0.1664 MLM-tc 0.1821 0.1786 MLM-all 0.1940 0.1965 PRMS 0.1936 0.1977 SDM 0.1719 0.1707 shows the results we get by applying ELR on top of these base- lines. We observe consistent improvements over all baselines; the relative ranking of models remains the same (LM < SDM < MLM- tc < MLM-all < PRMS < FSDM*), but their performance is im- proved by 4.4–7.3% in terms of MAP, and by 4.5–6.1% in terms of P@10. All improvements are statistically significant. Based on these results, we answer our first research question positively: en- ELR especially benefits complex heterogenous queries.
  • 22. Parameters Value of parameters (average of trained values across folds) Different query types are impacted differently by the ELR method.
  • 23. Impact of parameters Default parameters suggested based on trained values No training is performed (b) PRMS+ELR (c) SDM+ELR (d) FSDM*+ELR T : unigrams, O: ordered bigrams, U : unordered bigrams, E: entities) in our experiments, d using Coordinate Ascent). is attributed to entity-based O,U,E} parameters for each are obtained by averaging olds of the cross-validation d across all retrieval meth- re assigned the highest E r but still sizable portion of S it bears little importance. nclude that different query ELR method. The results prove complex (ListSearch us (INEX-LD) query sets, the other hand, short key- eit often ambiguous, entity f parameters (RQ3)? To configurations: (i) default Model Default params. Trained params. MAP P@10 MAP P@10 LM 0.1588 0.1664 LM + ELR 0.1668M 0.1724 0.1693N 0.1757N PRMS 0.1936 0.1977 PRMS + ELR 0.2028N 0.2035 0.2078N 0.2085N SDM 0.1672 0.1685 0.1719 0.1707 SDM + ELR 0.1721 0.1722 0.1794N 0.1812N FSDM 0.1969 0.1973 0.2030 0.1973 FSDM + ELR 0.2043N 0.1996 0.2159N 0.2078N Table 5: Comparison of default vs. trained parameters over all queries. Significance is tested against the line above. threshold value. Figure 5 reports the results for the best performing model, FSDM* + ELR, for both trained and default parameters. Apart from the small fluctuations in the 0.4–0.6 range, retrieval per- • ELR can bring improvements, even with default parameter values. • ELR is robust against parameter settings.
  • 24. Impact of entity linking 0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 Entity linking threshold 0.195 0.200 0.205 0.210 0.215 0.220 0A3 7rDined pDrDPeters DefDult pDrDPeters Entity linking systems involve a threshold parameter ELR improves performance, even considering entities with low confidence
  • 25. Summary • ELR complements any term-based retrieval model that can be emulated in the MRF framework • ELR brings consistent improvements over all baselines (especially for complex queries) • ELR is robust against parameter settings parameters Entity linking threshold
  • 26. Thanks! Questions? The code and runs are publicly available: http://bit.ly/ictir2016-elr