Is this Entity Relevant to your Needs - CIKM2012

Is This Entity Relevant
to Your Needs?
David Carmel
IBM Research - Haifa, Israel

IBM Research - Haifa © 2012 IBM Corporation


Outline
Some Open Questions in Entity Oriented Search (EoS)
What makes an entity relevant to the user needs?
Is it the same relevance that the IR community deals with?
Can we adopt existing IR models into this new area?
The classical model of relevance in IR
User based relevance
Topical based relevance (Aboutness)
Similarity based relevance measurements
Supportive evidence as indication of relevancy
For Q&A
For EoS
Relevance Estimation approaches for EOS
Exploration & Discovery in EoS
Summary



Entity Oriented Search (EoS)

When people use retrieval systems they are often not searching for
documents or text passages

Often named entities play a central role in answering such information
needs
persons, organizations, locations, products…

At least 20-30% of the queries submitted
to Web search engines are simply named entities

~71% of Web search queries contain
named entities
(Named entity recognition in query, Guo et al, SIGIR09)



Popular Entity Oriented Search tools
Product Search
On-line Shopping (books, movies, electronic devices…)
Amazon, eBay…
Travel (places, hotels, flights…)
Yahoo! Travel, Kayak…
Multi-media (Music, Video, Images…)
Last.fm, YouTube, Flickr…
People Search
Expert Search (for a specific topic)
LinkedIn, ArnetMiner…
Friends (colleagues, other people with mutual interests,
lost friends …)
Facebook…

Location Search
Addresses
Businesses
Proximity Search (find sites close to the searcher’s current
location)



Expert Search

The task:
Identify people who are
knowledgeable on a
specific topic
Find people who have
skills and experience
on a given topic

How can expertise
be measured?
How should people be
ranked in response to a
query, such that those
with relevant expertise
are ranked first?



Do those entities satisfy our needs?
What makes an entity relevant to the user’s need?
What is the meaning of relevance in this context?
Is it the same relevance that the IR community deals with for many
decades in the context of document retrieval?
Can we adopt existing IR models into this new area of Entity Oriented
Search in a straightforward manner?

In this talk I’ll try to deal with some of those questions

I’ll overview how the same questions are handled in related
areas, (especially in Q&A)

I’ll raise some research directions that might lead to a better
understanding of the concept of relevance in EoS


What is an Entity?
Entity: an object or a “thing” that can be uniquely identified in the world
An entity must be distinguished from other entities
Can be anything (including an abstract thing!)
Attributes: Used to describe entities
An attribute contains a single piece of information
Key - A minimal set of attributes that uniquely identify an entity
Entity set: a set of entities of the same type and attributes

[Figure: entity set Actor with attributes id, name, birthday, address]



What is a Relationship?
Relationship: Association among two or more entities
A Relationship also may have attributes
Relationship Set: Set of relationships of the same type

[Figure: ER diagram of a Prescription relationship among Patient, Physician and Medication; attributes shown: id, name, code, Date]



Example: ERD for Social Search in the Enterprise

[Figure: ER diagram for social search in the enterprise; only the label “Creator” is recoverable]



Entity Relationship Graph (ERG)
Represents
Entity instances as graph nodes
Binary relationships as (weighted) edges
N-ary relations are broken into binary ones
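
A minimal sketch of how such a graph could be built, assuming each relationship instance arrives as a tuple of participating entities plus a weight. The node names, the weights, and the choice to give every pair of participants the full weight of the n-ary relation are illustrative assumptions, not something the slides specify.

```python
from collections import defaultdict
from itertools import combinations


def build_erg(relations):
    """Build an undirected, weighted entity relationship graph.

    `relations` is a list of (participants, weight) pairs. An n-ary
    relation is broken into one binary edge per pair of participants;
    weights of repeated edges are summed (an assumption of this sketch).
    """
    graph = defaultdict(dict)
    for participants, weight in relations:
        for a, b in combinations(participants, 2):
            graph[a][b] = graph[a].get(b, 0.0) + weight
            graph[b][a] = graph[b].get(a, 0.0) + weight
    return graph


# Hypothetical instances, loosely following the Prescription example.
relations = [
    (("patient:1", "physician:7", "medication:aspirin"), 1.0),  # ternary
    (("patient:2", "physician:7"), 0.5),                        # binary
]
erg = build_erg(relations)
```

Here the ternary relation yields three binary edges, so `physician:7` ends up adjacent to both patients and to the medication.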



Entity Oriented Search (EoS)

[Figure: EoS architecture. Entity-relationship data is indexed into an index of entities and relations; at runtime, a query (free text, entity, or hybrid) returns a ranking of related entities and relationships for navigation and exploration]

Query Examples:
• Nikon D40
• Teammates of Michael Schumacher
• “Data mining”


The concept of Relevance in IR



The Classical Concept of Relevance in IR (Saracevic76, Mizzaro96)

Problem (P): The user has a problem to solve or an aim to achieve
Information Need (IN): The user builds a mental, implicit representation of P (may be incorrect or incomplete)
Request (R): The user expresses IN explicitly, usually in natural language (sometimes with the help of an intermediary)
Query (Q): Formalization: R is translated to a formal query understandable by the search system
Judgment (J): The same user judges the relevance of search results



User-based (Subjective) Relevance
Relevance is a dynamic concept that depends on the
user’s subjective judgment

Subjective Relevance judgment may depend on:
User’s characteristics and perceptions
Gender, age, education, income, occupation…
Preferences, Interests,
State of mind
The context of search
Level of the user’s expertise (regarding the topic of interests)
Current Time
Current Location
Session status
Dependencies between retrieved items to the
• specific query
• sequential queries during the session



Topical-based relevance judgment
How well the topic of the information retrieved matches the topic
of the request
An object is objectively relevant to a request if
it deals with the topic of the request (Aboutness)

TREC working definition for relevance assessment:
If you are writing a report on the topic and would use the
information contained in the document in the report –
then the document is considered relevant to the topic…

A document is judged relevant if any piece of it is relevant regardless of
how small that piece is in relation to the rest of the document



Probability Ranking Principle
Given a set of documents that “match” the entity-oriented query
How do we rank them for the user?

The Probability Ranking Principle (PRP) for Document Retrieval
(Robertson 77):
“If a retrieval system's response to each request is a ranking of the documents
in the collection in order of decreasing probability of relevance to the user who
submitted the request,
where the probabilities are estimated as accurately as possible on the basis of whatever data
have been made available to the system for this purpose,
the overall effectiveness of the system to its user will be the best…”

$\Pr(R = 1 \mid d, q) \quad\rightarrow\quad \Pr(R = 1 \mid e, q)$
We need a reliable and coherent methodology for measuring
the probability of relevance of an entity to a query



Relevance estimation in classic Document Retrieval
Most relevance approximation approaches for
document retrieval are based on measuring
some kind of similarity between the user's query
and retrieved documents
Vector Space:
The Cosine of the angle between two vectors
Concept space:
similarity in the latent concept space
• e.g. LDA, LSI, ESA
Language models:
Similarity between the
documents and the query term distributions
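
As a small illustration of the vector-space case, the cosine between a query and a document can be sketched as follows. Raw term frequencies and whitespace tokenization are simplifying assumptions of this sketch; real systems typically use weighted terms such as tf-idf.

```python
import math
from collections import Counter


def cosine(query, document):
    """Cosine of the angle between two raw term-frequency vectors."""
    q = Counter(query.lower().split())
    d = Counter(document.lower().split())
    dot = sum(q[t] * d[t] for t in q)
    norm = (math.sqrt(sum(v * v for v in q.values()))
            * math.sqrt(sum(v * v for v in d.values())))
    # An empty query or document has no direction; treat it as dissimilar.
    return dot / norm if norm else 0.0
```

A text that shares no terms with the query scores 0, which is exactly the situation that makes pure similarity problematic for entities.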

Can we use similar approaches for EoS?



Entity Similarity
While similarity plays a central role in document retrieval for relevance
estimation many relevant entities are not similar to the queried entity
At least according to standard definitions of similarity

This problem is well known in the Question Answering domain
The answer is not necessarily “similar” to the question
The supportive passage is not always similar to the question

Example: Who killed JFK?
John F. Kennedy (JFK), the thirty-fifth President of the United States, was
assassinated at 12:30 p.m. Central Standard Time (18:30 UTC) on Friday,
November 22, 1963, in Dealey Plaza, Dallas, Texas.

The ten-month investigation of the Warren Commission of 1963–1964 concluded
that the President was assassinated by Lee Harvey Oswald.



Relevance Judgment in Question Answering
In QA we usually assume a question that identifies the information need
“precisely”
Who was the first American in space?
How many calories are there in a Big Mac?
How many Grand Slam titles did Bjorn Borg win?

When will an answer be considered relevant to the question?
It must be correct!
i.e., it must have supportive evidence (from reliable sources)

A prominent factor in answering a question is not so much in finding an answer but in
validating whether the candidate answer is correct
Therefore supportive evidence is essential

Assessment instructions from the TREC’s QA track:
Assessors read each candidate answer and make a binary
decision as to whether or not the candidate is actually an
answer to the question
in the context provided by the supportive document


What do you mean the answer is correct?
As in Document retrieval – correctness/relevance in QA might be subjective
and user dependent
Where is the Taj Mahal?
Agra, India? The famous mausoleum
Atlantic City, NJ? The casino

In TREC, it is common to consider each candidate answer with (relevant) supportive
evidence as a correct one

This suggests how various candidate answers can be ranked:
i.e., relevance judgment is transformed into judging the relevance of the
supportive evidence
This approach can be applied to Entity Oriented Search
Rank retrieved entities according to the amount and quality of their
supportive evidence!
Entity ranking should be based on the supportive evidence
for their relevance to the query

Relevance Estimation Approaches
for EoS



The Expert Profile based Approach (Craswell et al. 2001):
Represent each person by a virtual document (a profile)
Employee directory (in the enterprise)
Concatenating all existing passages mentioning the person

Rank those profiles according to their relevance to the query
Using standard IR ranking techniques

The profile can naturally be used as supportive evidence for the person’s
expertise

Difficulties:
Coreference resolution and name disambiguation
Privacy concerns
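
The profile-based approach can be sketched roughly as below. The sketch assumes that passages already come annotated with the people they mention (so it sidesteps name disambiguation), and it stands in a very simple relative term-frequency score for the "standard IR ranking techniques"; the names and passages are hypothetical.

```python
from collections import defaultdict


def build_profiles(passages):
    """Concatenate every passage mentioning a person into one virtual document."""
    profiles = defaultdict(list)
    for text, people in passages:
        for person in people:
            profiles[person].extend(text.lower().split())
    return profiles


def rank_profiles(query, profiles):
    """Rank people by the relative frequency of query terms in their profile."""
    terms = query.lower().split()
    scores = {
        person: sum(tokens.count(t) for t in terms) / len(tokens)
        for person, tokens in profiles.items() if tokens
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical passages, each tagged with the people it mentions.
passages = [
    ("entity search and ranking", ["alice"]),
    ("ranking models for entity search", ["alice", "bob"]),
    ("quarterly budget planning", ["bob"]),
]
ranking = rank_profiles("entity search", build_profiles(passages))
```

The matching passages in a person's profile double as the evidence that can be shown to the user.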



EoS: Voting approach (Balog06, MacDonald09)
Any relevant document is a “voter” for the entities it mentions / relates-to

[Figure: query q retrieves documents d1, d2, d3, which vote for the people p1, p2, p3 they mention]

$\mathrm{Score}(p, q) = \sum_{d} \mathrm{Score}(d, q) \cdot \mathrm{Score}(p, d)$

What is the rationale behind it?
An entity mentioned many times in relevant (top retrieved) docs
is more likely to be relevant to the given topic
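
The voting formula above can be sketched directly. The document scores and document-person association strengths below are made-up inputs; in practice Score(d, q) would come from a document retrieval run and Score(p, d) from some mention or authorship model.

```python
from collections import defaultdict


def vote(doc_scores, associations):
    """Score(p, q) = sum over d of Score(d, q) * Score(p, d)."""
    scores = defaultdict(float)
    for doc, doc_score in doc_scores.items():
        for person, assoc in associations.get(doc, {}).items():
            scores[person] += doc_score * assoc
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


# Hypothetical retrieval scores Score(d, q) for the top documents.
doc_scores = {"d1": 0.5, "d2": 0.25, "d3": 0.25}
# Hypothetical association strengths Score(p, d).
associations = {
    "d1": {"p1": 1.0},
    "d2": {"p1": 0.5, "p2": 0.5},
    "d3": {"p3": 1.0},
}
ranking = vote(doc_scores, associations)
```

In this toy run p1 collects votes from two documents and is ranked first.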



Relevance Propagation (Serdyukov 2008)
We should also consider entities that are indirectly related to the query
Relevance is propagated through the entity relationship graph

[Figure: graph connecting the query q, documents d1 to d4 and people p1 to p4; relevance propagates from q along the edges]

How should relevance be propagated in the graph?



Proximity in the Entity Relationship Graph - Random walks
Random walk approach
The relationship strength between two nodes is reflected by the probability that a
random surfer who starts at one node will visit the second one during the walk

Justification
The more paths that connect the two entities in the graph,
the higher the probability that the surfer will visit the target entity,
and the higher the relationship strength between the two

Popular Random Walk Approaches
SimRank(u,v):
How soon two random surfers (starting at u,v) are expected to meet at the same node
Random walk with Restart (RWR):
The surfer has a fixed restart probability to return to the source
Lazy Random Walk:
The surfer has a fixed probability of halting the walk at each step
Effective Conductance:
Only simple (cycle free) paths, treating edges as resistors
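
As an illustration of one of these approaches, random walk with restart can be approximated by power iteration. This is a sketch under assumptions: the graph is a dict of weighted adjacency dicts in which every neighbour is also a key, the restart probability of 0.15 is an arbitrary choice, and the toy graph is invented.

```python
def random_walk_with_restart(graph, source, restart=0.15, iterations=100):
    """Approximate visiting probabilities for a surfer who restarts at `source`."""
    prob = {n: 1.0 if n == source else 0.0 for n in graph}
    for _ in range(iterations):
        # With probability `restart` the surfer jumps back to the source.
        nxt = {n: (restart if n == source else 0.0) for n in graph}
        for node, mass in prob.items():
            total = sum(graph[node].values())
            if total == 0:
                # Dangling node: send the remaining mass back to the source.
                nxt[source] += (1 - restart) * mass
                continue
            # Otherwise follow an outgoing edge, chosen in proportion to its weight.
            for neighbour, weight in graph[node].items():
                nxt[neighbour] += (1 - restart) * mass * weight / total
        prob = nxt
    return prob


# Hypothetical query-document-person graph with unit edge weights.
graph = {
    "q": {"d1": 1.0, "d2": 1.0},
    "d1": {"q": 1.0, "p1": 1.0},
    "d2": {"q": 1.0, "p1": 1.0, "p2": 1.0},
    "p1": {"d1": 1.0, "d2": 1.0},
    "p2": {"d2": 1.0},
}
proximity = random_walk_with_restart(graph, "q")
```

Because p1 is reachable from q through two documents and p2 through only one, p1 receives the higher visiting probability, matching the "more paths" justification.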



Markov Random Fields for EoS (Raviv, Carmel, Kurland, 2012)

$Q = \langle \{q_1, \ldots, q_n\}, T \rangle$

$P(E \mid Q) = \sum_{P \in \{D, T, N\}} \lambda_{E_P} \, P(E_P \mid Q)$



MRF based Entity Document Scoring P(ED|Q)
We consider three types of cliques
Full independence
Sequential dependence
Full dependence
The feature function over cliques
measures how well the clique's terms represent the entity document
Based on Dirichlet smoothed language model
$f^T_D(q_i, E_D) = \log\left[\frac{tf(q_i, E_D) + \mu \cdot cf(q_i)/|C|}{|E_D| + \mu}\right]$
For dependent models we replace qi with
#1(qi..qi+k) and #uwN({qi,.. qj}) respectively
The entity document scoring function aggregates the feature functions
over all clique types

$P(E_D \mid Q) = \sum_{I \in \{T, O, U\}} \lambda^I_{E_D} \sum_{c \in I_{E_D}} f^I_D(c)$
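
For the full independence case, the Dirichlet-smoothed feature for a single query term can be sketched as follows. The token list, collection statistics and the value of mu are invented for illustration, and the handling of terms unseen in the whole collection is an assumption of this sketch.

```python
import math


def dirichlet_feature(term, entity_doc, collection_freq, collection_len, mu=2000.0):
    """log[(tf(q_i, E_D) + mu * cf(q_i) / |C|) / (|E_D| + mu)]"""
    tf = entity_doc.count(term)
    background = collection_freq.get(term, 0) / collection_len
    numerator = tf + mu * background
    if numerator == 0:
        # Term unseen in both the entity document and the collection.
        return float("-inf")
    return math.log(numerator / (len(entity_doc) + mu))


# Hypothetical entity document and collection statistics.
entity_doc = ["nikon", "d40", "digital", "slr", "camera"]
collection_freq = {"nikon": 10, "camera": 100, "lens": 50}
collection_len = 10_000

seen = dirichlet_feature("nikon", entity_doc, collection_freq, collection_len, mu=100.0)
unseen = dirichlet_feature("lens", entity_doc, collection_freq, collection_len, mu=100.0)
```

A term that occurs in the entity document scores higher than one supported only by the collection background.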



Entity type Scoring P(ET|Q)
We measure the “similarity” between the query type and the entity type

$P(E_T \mid Q) = f_T(c) = \log\left(\frac{e^{-\alpha\, d(Q_T, E_T)}}{\sum_{E' \in R} e^{-\alpha\, d(Q_T, E'_T)}}\right)$

d(QT,ET) - the type distance,
is domain dependent
In our experiments we
measured the distance in the
Wikipedia category graph
The minimal path length
between all pairs of the
query and the entity’s
page categories
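
A rough sketch of the type score, assuming the category graph is given as an undirected adjacency dict. The tiny category graph, the entity names and alpha = 1 are illustrative assumptions; the sketch also assumes at least one retrieved entity has a finite type distance.

```python
import math
from collections import deque


def path_length(graph, start, goal):
    """Breadth-first search: number of edges on a shortest path, or inf."""
    seen, queue = {start}, deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return math.inf


def type_distance(graph, query_cats, entity_cats):
    """Minimal path length over all pairs of query and entity categories."""
    return min(path_length(graph, q, e) for q in query_cats for e in entity_cats)


def type_scores(distances, alpha=1.0):
    """log of exp(-alpha * d), normalised over the retrieved set R."""
    norm = sum(math.exp(-alpha * d) for d in distances.values())
    return {e: -alpha * d - math.log(norm) for e, d in distances.items()}


# Hypothetical category graph and retrieved entities.
categories = {
    "Drivers": ["Formula One"],
    "Formula One": ["Drivers", "Teams"],
    "Teams": ["Formula One"],
}
distances = {
    "entity_a": type_distance(categories, ["Drivers"], ["Drivers"]),
    "entity_b": type_distance(categories, ["Drivers"], ["Teams"]),
}
scores = type_scores(distances)
```

Entities whose categories lie closer to the query type receive a higher (less negative) log score.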



Entity Name Scoring P(EN|Q)
We measure the dependency between the query term(s) and
the entity name
Globally
Measure the proximity between the query term(s) and the entity name in
the whole collection
• We use pointwise mutual information (PMI) – the likelihood of finding
one term in proximity to another term
Locally
Measure the proximity between the query terms and the entity name in the
top retrieved documents
$P(E_N \mid Q) = \sum_{X \in A} \lambda^X_{E_N} \sum_{c \in X_{E_N}} f^X_N(c)$

$A = \{S, T, O, U, PMI_T, PMI_O, PMI_U\}$
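
The global proximity signal can be illustrated with a simple PMI estimate over text windows. Treating each window as a set of terms and estimating probabilities by window counts are assumptions of this sketch, as are the example windows; the slides do not specify how proximity is measured.

```python
import math


def pmi(windows, x, y):
    """Pointwise mutual information of x and y over non-empty text windows.

    Each window is a set of terms; x and y co-occur when both appear in
    the same window.
    """
    n = len(windows)
    p_x = sum(x in w for w in windows) / n
    p_y = sum(y in w for w in windows) / n
    p_xy = sum(x in w and y in w for w in windows) / n
    if p_xy == 0:
        return float("-inf")  # the two terms never co-occur
    return math.log(p_xy / (p_x * p_y))


# Hypothetical windows drawn from a collection.
windows = [
    {"schumacher", "ferrari"},
    {"schumacher", "ferrari"},
    {"ferrari", "monza"},
    {"monza", "italy"},
]
related = pmi(windows, "schumacher", "ferrari")
unrelated = pmi(windows, "schumacher", "italy")
```

A positive value means the two terms appear together more often than independence would predict.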



Experimental Results over INEX Entity track (2007-2009)
[Figure: three bar charts, one per model (full independence, sequential dependence, full dependence), comparing S(ED), S(ED,ET), S(ED,ET,EN) and the top INEX result for 2007, 2008 and 2009; vertical axis from 0 to 0.4]

Results improved significantly when type and name scoring were added

Final results are superior to the top INEX results for 2007 and 2008, and comparable for 2009

Dependence models have not improved over the independence model??


Exploratory EoS
When only an entity is given as input, the information need is quite fuzzy
Any related entity has a potential to be relevant
Therefore any related entity should be retrieved!
High diversity in search results (entity types, relationship types)

How can we make it easier for the user to find the most relevant answers?

Iterative IR – let the user navigate and explore the ER graph
Facet search:
Categorize the search results according to their facets (entity types/attributes..)
Let the user drill down: restrict retrieved entities to a specific facet
NOTE: We still need to rank the search results in each of the facets!
Graph navigation:
Let the user explore the graph by using a retrieved entity as a pivot to a new
search
Query reformulation
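
Facet search as described above can be sketched by grouping scored results by entity type and ranking within each group. The result tuples and the use of entity type as the only facet are assumptions made for illustration.

```python
from collections import defaultdict


def facet_results(results):
    """Group (entity, entity_type, score) results by type, ranked within each facet."""
    facets = defaultdict(list)
    for entity, entity_type, score in results:
        facets[entity_type].append((entity, score))
    for items in facets.values():
        items.sort(key=lambda item: item[1], reverse=True)
    return dict(facets)


# Hypothetical mixed-type result list for one query.
results = [
    ("person_a", "person", 0.4),
    ("blog_x", "blog", 0.7),
    ("person_b", "person", 0.9),
    ("tag_social", "tag", 0.6),
]
facets = facet_results(results)
drill_down = facets["person"]  # restrict the results to one facet
```

Drilling down is then a lookup on the chosen facet, and the list the user sees is still ranked.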



Search over Social Media Data (SaND) – (Carmel 2009, Guy 2010)
SaND provides social aggregation over social
data

SaND builds an entity-entity relationship
matrix that maps a given entity to all related
entities, weighted by their relationship
strength
Direct relations of a user to:
document – as an author, tagger
and commenter
another user – as a friend or as a
manager/employee
tag – one she used, or was tagged with by others
group – as a member/owner
Indirect relations:
Two entities are indirectly related if
both are directly related to the same
entity

The overall relationship strength between two
entities is determined by a linear combination
of their direct and indirect relationship
strengths
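
The combination of direct and indirect strengths can be sketched as below. This is not SaND's actual implementation: the symmetric adjacency dict, the product-sum used for indirect strength and the 0.75/0.25 mixing weights are all assumptions chosen for illustration.

```python
def relationship_strength(direct, a, b, w_direct=0.75, w_indirect=0.25):
    """Linear combination of direct and indirect relationship strengths.

    `direct` maps each entity to {related entity: strength} and is assumed
    to be symmetric. Two entities are indirectly related through every
    entity that both are directly related to.
    """
    direct_strength = direct.get(a, {}).get(b, 0.0)
    shared = set(direct.get(a, {})) & set(direct.get(b, {}))
    indirect_strength = sum(direct[a][c] * direct[b][c] for c in shared)
    return w_direct * direct_strength + w_indirect * indirect_strength


# Hypothetical direct relations between users and a document.
direct = {
    "alice": {"doc1": 1.0, "bob": 0.5},
    "bob": {"doc1": 0.5, "alice": 0.5},
    "carol": {"doc1": 1.0},
    "doc1": {"alice": 1.0, "bob": 0.5, "carol": 1.0},
}
via_document = relationship_strength(direct, "alice", "carol")
mixed = relationship_strength(direct, "alice", "bob")
```

Here alice and carol have no direct relation but are linked through doc1, so their strength comes entirely from the indirect term.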



Search for the term ‘social’

Related People – Ranked list of
people that are related to the topic
and to the result set, in one or more
relationship types (author,
commenter, tagger, etc.)

Results contain different types of
entities – Blogs, Communities,
bookmarked documents etc.
Popular, higher ranked results
appear higher in the result set.

Related Tags – Ranked tag cloud for
this result set.

Narrowing the search to Luis
Suarez’ related results

Hovering over a result highlights
the related people and tags



Viewing results for query ‘social’
and person ‘Luis Suarez’

Viewing Luis’ business card, and
results related to him



Summary
In this talk we raised several questions related to the concept of relevance
in EoS:
What makes an entity relevant to the user’s need?
What is the meaning of relevance in this context?
Is it the same notion of relevance used in document retrieval?
We argue that the relevance of an entity can be estimated according to
the supportive evidence provided by the search system
We discussed common EoS retrieval techniques:
Profile based approach
The Voting approach
Relevance propagation
We discussed several examples of EoS systems and how relevance
estimation can be applied in these domains
We claimed that the scale and diversity of EoS search results demand
Exploratory search techniques such as Facet search and Graph
navigation


Open Questions and Challenges
Entity Similarity
While in document retrieval similarity plays a central role
in relevance judgment, entity similarity measurement
should still be better understood
Attribute based similarity, Evidence based similarity
Graph proximity
Hybrid approaches
The clustering hypothesis:
Are two “similar” entities likely to be relevant to the same information need?

Challenges
To what extent are relevant entities indeed similar to each other,
and according to which similarity measure?
Relevance propagation: What relationship types provide effective relevance
propagation channels?
Do your friends inherit your own expertise?
Which relationship types contribute to relevance propagation?



Thank You!

Questions?
