Is this Entity Relevant to your Needs - CIKM2012
- 1. Is This Entity Relevant
to Your Needs?
David Carmel
IBM Research - Haifa, Israel
IBM Research - Haifa © 2012 IBM Corporation
- 2. IBM Research - Haifa
Outline
Some Open Questions in Entity Oriented Search (EoS)
What makes an entity relevant to the user needs?
Is it the same relevance that the IR community deals with?
Can we adopt existing IR models into this new area?
The classical model of relevance in IR
User based relevance
Topical based relevance (Aboutness)
Similarity based relevance measurements
Supportive evidence as indication of relevancy
For Q&A
For EoS
Relevance Estimation approaches for EOS
Exploration & Discovery in EoS
Summary
- 3. IBM Research - Haifa
Entity Oriented Search (EoS)
When people use retrieval systems they are often not searching for
documents or text passages
Often named entities play a central role in answering such information
needs
persons, organizations, locations, products…
At least 20-30% of the queries submitted
to Web search engines are simply named entities
~71% of Web search queries contain
named entities
(Named entity recognition in query, Guo et al, SIGIR09)
- 4. IBM Research - Haifa
Popular Entity Oriented Search tools
Product Search
On-line Shopping (books, movies, electronic devices…)
Amazon, eBay…
Travel (places, hotels, flights…)
Yahoo! Travel, Kayak…
Multi-media (Music, Video, Images…)
Last.fm, YouTube, Flickr…
People Search
Expert Search (for a specific topic)
LinkedIn, ArnetMiner…
Friends (colleagues, other people with mutual interests,
lost friends …)
Facebook…
Location Search
Addresses
Businesses
Proximity Search (find sites close to the searcher’s current
location)
- 5. IBM Research - Haifa
- 6. IBM Research - Haifa
Expert Search
The task:
Identify people who are
knowledgeable on a
specific topic
Find people who have
skills and experience
on a given topic
How can expertise be measured?
How should people be ranked in response to a query, such that those
with relevant expertise are ranked first?
- 7. IBM Research - Haifa
Do those entities satisfy our needs?
What makes an entity relevant to the user’s need?
What is the meaning of relevance in this context?
Is it the same relevance that the IR community has dealt with for many
decades in the context of document retrieval?
Can we adopt existing IR models into this new area of entity-oriented
search in a straightforward manner?
In this talk I’ll try to deal with some of those questions
I’ll overview how the same questions are handled in related
areas, (especially in Q&A)
I’ll raise some research directions that might lead to a better
understanding of the concept of relevance in EoS
- 8. IBM Research - Haifa
What is an Entity?
Entity: an object or a “thing” that can be uniquely identified in the world
An entity must be distinguished from other entities
Can be anything (including an abstract thing!)
Attributes: Used to describe entities
An attribute contains a single piece of information
Key - A minimal set of attributes that uniquely identify an entity
Entity set: a set of entities of the same type and attributes
Example: entity set Actor with attributes id, name, birthday, address
- 9. IBM Research - Haifa
What is a Relationship?
Relationship: Association among two or more entities
A Relationship also may have attributes
Relationship Set: Set of relationships of the same type
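Example: a Prescription relationship (with attribute Date) among the entities Patient, Physician and Medication; the attributes shown in the diagram are id, name and code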
- 10. IBM Research - Haifa
Example: ERD for Social Search in the Enterprise
- 11. IBM Research - Haifa
Entity Relationship Graph (ERG)
Represents
Entity instances as graph nodes
Binary relationships as (weighted) edges
N-ary relations are broken into binary ones
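The graph construction described on this slide can be sketched as follows. This is a minimal illustration, not the talk's implementation: the function name `build_erg`, the node labels, and the choice to expand an n-ary relation into one edge per pair of participants are all assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations


def build_erg(relations):
    """Build an undirected, weighted entity-relationship graph.

    `relations` is a list of (participants, weight) tuples (hypothetical format).
    """
    graph = defaultdict(lambda: defaultdict(float))
    for participants, weight in relations:
        # An n-ary relation is broken into one binary edge per pair of participants.
        for u, v in combinations(participants, 2):
            graph[u][v] += weight
            graph[v][u] += weight
    return graph


relations = [
    (("patient:1", "physician:7", "medication:A"), 1.0),  # ternary relation
    (("patient:1", "physician:7"), 0.5),                  # binary relation
]
erg = build_erg(relations)
```

Repeated relations between the same pair accumulate weight here; other weighting schemes are equally plausible.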
- 12. IBM Research - Haifa
Entity Oriented Search (EoS)
[Figure: EoS architecture. Entity-relationship data is indexed (entities, relations); at runtime a query (free text, entity, or hybrid) returns a ranking of related entities and relationships, which supports navigation and exploration]
Query examples:
• Nikon D40
• Teammates of Michael Schumacher
• “Data mining”
- 13. The concept of Relevance in IR
- 14. IBM Research - Haifa
The Classical Concept of Relevance in IR (Saracevic76, Mizzaro96)
Problem (P): The user has a problem to solve or an aim to achieve
Information Need (IN): The user builds a mental, implicit representation of P (may be incorrect or incomplete)
Request (R): The user expresses IN explicitly, usually in natural language (sometimes with the help of an intermediary)
Query (Q): Formalization: R is translated to a formal query understandable by the search system
Judgment (J): The same user judges the relevance of search results
- 15. IBM Research - Haifa
User-based (Subjective) Relevance
Relevance is a dynamic concept that depends on the
user’s subjective judgment
Subjective Relevance judgment may depend on:
User’s characteristics and perceptions
Gender, age, education, income, occupation…
Preferences, Interests,
State of mind
The context of search
Level of the user’s expertise (regarding the topic of interest)
Current Time
Current Location
Session status
Dependencies among the items retrieved for
• the specific query
• sequential queries during the session
- 16. IBM Research - Haifa
Topical-based relevance judgment
How well the topic of the information retrieved matches the topic
of the request
An object is objectively relevant to a request if
it deals with the topic of the request (Aboutness)
TREC working definition for relevance assessment:
If you are writing a report on the topic and would use the
information contained in the document in the report –
then the document is considered relevant to the topic…
A document is judged relevant if any piece of it is relevant regardless of
how small that piece is in relation to the rest of the document
- 17. IBM Research - Haifa
Probability Ranking Principle
Given a set of documents that “match” the entity-oriented query
How do we rank them for the user?
The Probability Ranking Principle (PRP) for Document Retrieval
(Robertson 71):
“If a retrieval system's response to each request is a ranking of the documents
in the collection in order of decreasing probability of relevance to the user who
submitted the request,
where the probabilities are estimated as accurately as possible on the basis of whatever data
have been made available to the system for this purpose,
the overall effectiveness of the system to its user will be the best…”
From documents to entities: $\Pr(R = 1 \mid d, q) \rightarrow \Pr(R = 1 \mid e, q)$
We need a reliable and coherent methodology for measuring
the probability of relevance of an entity to a query
- 18. IBM Research - Haifa
Relevance estimation in classic Document Retrieval
Most relevance approximation approaches for
document retrieval are based on measuring
some kind of similarity between the user's query
and retrieved documents
Vector Space:
The Cosine of the angle between two vectors
Concept space:
similarity in the latent concept space
• e.g. LDA, LSI, ESA
Language models:
Similarity between the
documents and the query term distributions
Can we use similar approaches for EoS?
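As a point of reference for the similarity-based measures listed above, the vector-space variant can be sketched as below. This is a simplified illustration that assumes raw term-frequency vectors and whitespace tokenization; real systems typically add term weighting such as TF-IDF. The function name `cosine` is chosen for the example.

```python
import math
from collections import Counter


def cosine(text_a, text_b):
    """Cosine of the angle between two term-frequency vectors."""
    a = Counter(text_a.lower().split())
    b = Counter(text_b.lower().split())
    # Only terms shared by both texts contribute to the dot product.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


same = cosine("entity oriented search", "entity oriented search")
disjoint = cosine("who killed jfk", "lee harvey oswald")
```

The second call shows why term overlap alone can fail for entity queries: a question and its answer entity may share no terms at all.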
- 19. IBM Research - Haifa
Entity Similarity
While similarity plays a central role in relevance estimation for document retrieval,
many relevant entities are not similar to the queried entity
At least according to standard definitions of similarity
This problem is well known in the Question Answering domain
The answer is not necessarily “similar” to the question
The supportive passage is not always similar to the question
Example: Who killed JFK?
John F. Kennedy (JFK), the thirty-fifth President of the United States, was
assassinated at 12:30 p.m. Central Standard Time (18:30 UTC) on Friday,
November 22, 1963, in Dealey Plaza, Dallas, Texas.
The ten-month investigation of the Warren Commission of 1963–1964 concluded
that the President was assassinated by Lee Harvey Oswald.
- 20. IBM Research - Haifa
Relevance Judgment in Question Answering
In QA we usually assume a question that identifies the information need
“precisely”
Who was the first American in space?
How many calories are there in a Big Mac?
How many Grand Slam titles did Bjorn Borg win?
When will an answer be considered relevant to the question?
It must be correct!
i.e., it must have supporting evidence (from reliable sources)
A prominent factor in answering a question is not so much in finding an answer but in
validating whether the candidate answer is correct
Therefore supportive evidence is essential
Assessment instructions from the TREC’s QA track:
Assessors read each candidate answer and make a binary
decision as to whether or not the candidate is actually an
answer to the question
in the context provided by the supportive document
- 21. IBM Research - Haifa
What do you mean the answer is correct?
As in document retrieval, correctness/relevance in QA might be subjective
and user dependent
Where is the Taj Mahal?
Agra, India? The famous mausoleum?
Atlantic City, NJ? The casino?
In TREC, it is common to consider each candidate answer with (relevant) supporting
evidence as a correct one
This suggests how various candidate answers can be ranked:
i.e., relevance judgment is transformed into judging the relevance of the
supporting evidence
This approach can be applied to entity-oriented search
Rank retrieved entities according to the amount and quality of their
supporting evidence!
Entity ranking should be based on the supporting evidence
for their relevance to the query
- 23. IBM Research - Haifa
The Expert Profile based Approach (Craswell et al. 2001):
Represent each person by a virtual document (a profile)
Employee directory (in the enterprise)
Concatenating all existing passages mentioning the person
Rank those profiles according to their relevance to the query
Using standard IR ranking techniques
The profile can naturally be used as supporting evidence for the person's
expertise
Difficulties:
Coreference resolution and name disambiguation
Privacy concerns
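The profile-based approach can be sketched roughly as follows. The passage format, the function names, and the scoring (length-normalized query term frequency) are assumptions for illustration; the slide only says that standard IR ranking techniques are applied to the profiles.

```python
from collections import Counter, defaultdict


def build_profiles(passages):
    """Concatenate every passage mentioning a person into one virtual document."""
    profiles = defaultdict(list)
    for text, people in passages:
        for person in people:
            profiles[person].extend(text.lower().split())
    return profiles


def rank_profiles(profiles, query):
    """Rank people by length-normalized query term frequency (a stand-in scorer)."""
    terms = query.lower().split()
    scores = {}
    for person, tokens in profiles.items():
        counts = Counter(tokens)
        scores[person] = sum(counts[t] for t in terms) / len(tokens)
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical passages, each tagged with the people it mentions.
passages = [
    ("entity search with markov random fields", ["ann"]),
    ("cooking pasta at home", ["bob"]),
    ("search engines and entity ranking", ["ann", "bob"]),
]
experts = rank_profiles(build_profiles(passages), "entity search")
```

The sketch assumes mentions are already resolved to person identifiers, which is exactly the disambiguation difficulty noted above.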
- 24. IBM Research - Haifa
EoS: Voting approach (Balog06, MacDonald09)
Any relevant document is a “voter” for the entities it mentions / relates-to
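[Figure: the query q retrieves documents d1, d2, d3, which in turn vote for the people p1, p2, p3 they mention]
$$\mathrm{Score}(p, q) = \sum_{d} \mathrm{Score}(d, q) \cdot \mathrm{Score}(p, d)$$
What is the rationale behind it?
An entity mentioned many times in relevant (top retrieved) docs
is more likely to be relevant to the given topic?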
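The voting formula can be sketched directly. The dictionaries below are hypothetical inputs: `doc_scores` stands for Score(d, q) from any document ranker and `mentions` for Score(p, d), the strength of association between a document and an entity.

```python
from collections import defaultdict


def vote(doc_scores, mentions):
    """Score(p, q) = sum over retrieved d of Score(d, q) * Score(p, d)."""
    scores = defaultdict(float)
    for doc, doc_score in doc_scores.items():
        for entity, assoc in mentions.get(doc, {}).items():
            # Each retrieved document votes for the entities it mentions.
            scores[entity] += doc_score * assoc
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


doc_scores = {"d1": 0.5, "d2": 0.25, "d3": 0.25}
mentions = {
    "d1": {"p1": 1.0, "p2": 0.5},
    "d2": {"p2": 1.0},
    "d3": {"p2": 0.5, "p3": 1.0},
}
ranking = vote(doc_scores, mentions)
```

In this toy input p2 wins because it collects votes from all three documents, even though p1 is mentioned in the top-scoring one.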
- 25. IBM Research - Haifa
Relevance Propagation (Serdyukov 2008)
We should also consider entities that are indirectly related to the query
Relevance is propagated through the entity relationship graph
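[Figure: the query q retrieves documents d1, d2, d3, which mention p1, p2, p3; p4 and d4 are connected to the query only indirectly through the graph]
How should relevance be propagated in the graph?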
- 26. IBM Research - Haifa
Proximity in the Entity Relationship Graph - Random walks
Random walk approach
The relationship strength between two nodes is reflected by the probability that a
random surfer who starts at one node will visit the second one during the walk
Justification
The more paths that connect the two entities in the graph,
the higher the probability that the surfer will visit the target entity
The higher the relationship strength between the two
Popular Random Walk Approaches
SimRank(u,v):
How soon two random surfers (starting at u,v) are expected to meet at the same node
Random walk with Restart (RWR):
The surfer has a fixed restart probability to return to the source
Lazy Random Walk
The surfer has a fixed probability of halting the walk at each step
Effective Conductance
Only simple (cycle free) paths – treating edges as resistors
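Of the approaches listed, random walk with restart is easy to sketch with power iteration. The graph, the restart probability of 0.15 and the iteration count are illustrative assumptions; the sketch also assumes every node has at least one outgoing edge, so that probability mass is conserved.

```python
def random_walk_with_restart(graph, source, restart=0.15, iterations=50):
    """Approximate visit probabilities for a surfer who restarts at `source`."""
    prob = {node: 1.0 if node == source else 0.0 for node in graph}
    for _ in range(iterations):
        nxt = {node: 0.0 for node in graph}
        for u, edges in graph.items():
            total = sum(edges.values())
            for v, w in edges.items():
                # Follow an edge with probability proportional to its weight.
                nxt[v] += (1.0 - restart) * prob[u] * w / total
        # With probability `restart` the surfer jumps back to the source.
        nxt[source] += restart
        prob = nxt
    return prob


# Hypothetical query-document-person graph with symmetric unit weights.
graph = {
    "q": {"d1": 1.0, "d2": 1.0},
    "d1": {"q": 1.0, "p1": 1.0},
    "d2": {"q": 1.0, "p1": 1.0, "p2": 1.0},
    "p1": {"d1": 1.0, "d2": 1.0},
    "p2": {"d2": 1.0},
}
strength = random_walk_with_restart(graph, "q")
```

Here p1 is reachable from q through two documents and p2 through one, so p1 ends up with the higher visit probability, matching the "more paths" justification.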
- 27. IBM Research - Haifa
Markov Random Fields for EoS (Raviv, Carmel, Kurland, 2012)
$$Q = \langle \{q_1, \ldots, q_n\}, T \rangle$$
$$P(E \mid Q) = \sum_{P \in \{D, T, N\}} \lambda_{E_P} \, P(E_P \mid Q)$$
- 28. IBM Research - Haifa
MRF based Entity Document Scoring P(ED|Q)
We consider three types of cliques
Full independence
Sequential dependence
Full dependence
The feature function over cliques
measures how well the clique's terms represent the entity document
Based on Dirichlet smoothed language model
$$f^{T}_{D}(q_i, E_D) = \log \frac{tf(q_i, E_D) + \mu \cdot cf(q_i)/|C|}{|E_D| + \mu}$$
For dependent models we replace qi with
#1(qi..qi+k) and #uwN({qi,.. qj}) respectively
The entity document scoring function aggregates the feature functions
over all clique types
$$P(E_D \mid Q) = \sum_{I \in \{T, O, U\}} \lambda^{I}_{E_D} \sum_{c \in I_{E_D}} f^{I}_{D}(c)$$
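The single-term feature function and its use under full independence can be sketched as below. Token lists, the function names and the small value of μ are assumptions for the example, the clique-type weights λ are omitted, and the sketch assumes every query term occurs somewhere in the collection (otherwise the logarithm is undefined).

```python
import math
from collections import Counter


def dirichlet_feature(term, entity_doc, collection, mu=1000.0):
    """log((tf + mu * cf / |C|) / (|E_D| + mu)) for a single query term."""
    tf = Counter(entity_doc)[term]   # term frequency in the entity document
    cf = Counter(collection)[term]   # term frequency in the whole collection
    return math.log((tf + mu * cf / len(collection)) / (len(entity_doc) + mu))


def full_independence_score(query_terms, entity_doc, collection, mu=1000.0):
    """Sum the single-term features; dependence cliques are not modelled here."""
    return sum(dirichlet_feature(t, entity_doc, collection, mu) for t in query_terms)


collection = "nikon d40 camera review canon camera lens review".split()
doc_a = "nikon d40 camera".split()
doc_b = "canon camera lens".split()
score_a = full_independence_score(["nikon", "d40"], doc_a, collection, mu=10.0)
score_b = full_independence_score(["nikon", "d40"], doc_b, collection, mu=10.0)
```

Smoothing keeps `score_b` finite even though `doc_b` contains neither query term.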
- 29. IBM Research - Haifa
Entity type Scoring P(ET|Q)
We measure the “similarity” between the query type and the entity type
$$P(E_T \mid Q) = f_T(c) = \log \frac{e^{-\alpha\, d(Q_T, E_T)}}{\sum_{E' \in R} e^{-\alpha\, d(Q_T, E'_T)}}$$
d(QT,ET) - the type distance,
is domain dependent
In our experiments we
measured the distance in the
Wikipedia category graph
The minimal path length
between all pairs of the
query and the entity’s
page categories
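Given precomputed type distances, the type score is a log-softmax over the candidate set R. In this sketch the distances are hypothetical placeholders; computing them from a category graph (for example by shortest path) is outside the example.

```python
import math


def type_scores(distances, alpha=1.0):
    """log(exp(-alpha * d) / sum over candidates of exp(-alpha * d'))."""
    normalizer = sum(math.exp(-alpha * d) for d in distances.values())
    return {
        entity: -alpha * d - math.log(normalizer)
        for entity, d in distances.items()
    }


# Hypothetical type distances d(Q_T, E_T) for three candidate entities.
distances = {"e1": 1, "e2": 2, "e3": 4}
scores = type_scores(distances, alpha=1.0)
```

Larger values of α penalize distant types more sharply; α = 1.0 is an arbitrary choice here.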
- 30. IBM Research - Haifa
Entity Name Scoring P(EN|Q)
We measure the dependency between the query term(s) and
the entity name
Globally
Measure the proximity between the query term(s) and the entity name in
the whole collection
• We use pointwise mutual information (PMI) – the likelihood of finding
one term in proximity to another term
Locally
Measure the proximity between the query terms and the entity name in the
top retrieved documents
$$P(E_N \mid Q) = \sum_{X \in A} \lambda^{X}_{E_N} \sum_{c \in X_{E_N}} f^{X}_{N}(c)$$
$$A = \{S, T, O, U, PMI_T, PMI_O, PMI_U\}$$
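The global PMI measure can be sketched with document-level co-occurrence standing in for proximity. That substitution, the function name `pmi` and the toy collection are assumptions for illustration; a windowed proximity count would be closer to the description above.

```python
import math


def pmi(docs, term, name):
    """Pointwise mutual information of `term` and `name` over a collection.

    `docs` is a list of token sets; co-occurrence in the same document is
    used here as a coarse stand-in for proximity.
    """
    n = len(docs)
    p_term = sum(term in d for d in docs) / n
    p_name = sum(name in d for d in docs) / n
    p_both = sum(term in d and name in d for d in docs) / n
    if p_both == 0.0:
        return float("-inf")  # the two never co-occur
    return math.log(p_both / (p_term * p_name))


docs = [
    {"schumacher", "ferrari"},
    {"schumacher", "ferrari", "barrichello"},
    {"nikon", "d40"},
    {"ferrari", "monza"},
]
related = pmi(docs, "schumacher", "ferrari")
unrelated = pmi(docs, "schumacher", "nikon")
```

The local variant described above would apply the same computation to the top retrieved documents only.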
- 31. IBM Research - Haifa
Experimental Results over INEX Entity track (2007-2009)
[Figure: three bar charts (Full independence, Sequential dependence, Full dependence), each comparing S(ED), S(ED,ET), S(ED,ET,EN) and the top INEX result for 2007, 2008 and 2009; y-axis from 0 to 0.4]
Results are improved significantly when type and name scoring were added
Final results are superior to top INEX results at 2007, 2008, and comparable to 2009
Dependence models have not improved over Independence model??
- 32. IBM Research - Haifa
Exploratory EoS
When only an entity is given as input, the information need is quite fuzzy
Any related entity has a potential to be relevant
Therefore any related entity should be retrieved!
High diversity in search results (entity types, relationship types)
How can we help the user find the most relevant answers?
Iterative IR – let the user navigate and explore the ER graph
Facet search:
Categorize the search results according to their facets (entity types/attributes..)
Let the user drill down: restrict retrieved entities to a specific facet
NOTE: We still need to rank the search results in each of the facets!
Graph navigation:
Let the user explore the graph by using a retrieved entity as a pivot to a new
search
Query reformulation
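Facet search over a ranked entity list can be sketched as grouping by facet and keeping the ranking inside each group, as the note above requires. The result tuples and scores are hypothetical, and the entity type is used as the only facet.

```python
from collections import defaultdict


def facet_results(results):
    """Group (entity, facet, score) results by facet, ranked within each facet."""
    facets = defaultdict(list)
    for entity, facet, score in results:
        facets[facet].append((entity, score))
    for ranked in facets.values():
        ranked.sort(key=lambda item: item[1], reverse=True)
    return dict(facets)


# Hypothetical results for an entity query; scores are made up.
results = [
    ("Ferrari", "team", 0.4),
    ("Eddie Irvine", "person", 0.7),
    ("Rubens Barrichello", "person", 0.9),
]
facets = facet_results(results)
people = facets["person"]  # drill down: restrict results to one facet
```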
- 33. IBM Research - Haifa
Search over Social Media Data (SaND) – (Carmel 2009, Guy 2010)
SaND provides social aggregation over social
data
SaND builds an entity-entity relationship
matrix that maps a given entity to all related
entities, weighted by their relationship
strength
Direct relations of a user to:
document – as an author, tagger
and commenter
another user – as a friend or as a
manager/employee
tag – one the user used, or was tagged with by others
group – as a member/owner
Indirect relations:
Two entities are indirectly related if
both are directly related to the same
entity
The overall relationship strength between two
entities is determined by a linear combination
of their direct and indirect relationship
strengths
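The combination of direct and indirect strengths can be sketched as follows. Only the linear combination itself comes from the slide: the mixing weight `beta`, the way indirect strength is computed (summing products of direct strengths over shared neighbours), and the sample weights are assumptions for illustration, not SaND's actual formulas.

```python
def relationship_strength(direct, u, v, beta=0.7):
    """beta * direct(u, v) + (1 - beta) * indirect(u, v)."""
    direct_uv = direct.get(u, {}).get(v, 0.0)
    # Entities directly related to both u and v (excluding u and v themselves).
    shared = (set(direct.get(u, {})) & set(direct.get(v, {}))) - {u, v}
    indirect_uv = sum(direct[u][k] * direct[v][k] for k in shared)
    return beta * direct_uv + (1.0 - beta) * indirect_uv


# Hypothetical symmetric direct-relation weights between two users and a document.
direct = {
    "alice": {"doc1": 1.0, "bob": 0.5},
    "bob": {"doc1": 0.5, "alice": 0.5},
    "doc1": {"alice": 1.0, "bob": 0.5},
}
strength_ab = relationship_strength(direct, "alice", "bob", beta=0.5)
```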
- 34. IBM Research - Haifa
Search for the term ‘social’
Related People – Ranked list of
people that are related to the topic
and to the result set, in one or more
relationship types (author,
commenter, tagger, etc.)
Results contain different types of
entities – Blogs, Communities,
bookmarked documents etc..
Popular, higher ranked results appear higher in the result set.
Related Tags – Ranked tag cloud for this result set.
- 35. IBM Research - Haifa
Narrowing the search to Luis Suarez’ related results
Hovering over a result highlights the related people and tags
- 36. IBM Research - Haifa
Viewing results for query ‘social’
and person ‘Luis Suarez’
Viewing Luis’ business card, and
results related to him
- 37. IBM Research - Haifa
Summary
In this talk we raised several questions related to the concept of relevance
in EoS:
What makes an entity relevant to the user’s need?
What is the meaning of relevance in this context?
Is it the same notion of relevance used in document retrieval?
We argued that the relevance of an entity can be estimated according to the
supporting evidence provided by the search system
We discussed common EoS retrieval techniques:
Profile based approach
The Voting approach
Relevance propagation
We discussed several examples of EoS systems and how relevance
estimation can be applied in these domains
We claimed that the scale and diversity of EoS search results demand
Exploratory search techniques such as Facet search and Graph
navigation
- 38. IBM Research - Haifa
Open Questions and Challenges
Entity Similarity
While similarity plays a central role in relevance judgment
in document retrieval, entity similarity measurement
still needs to be better understood
Attribute based similarity, Evidence based similarity
Graph proximity
Hybrid approaches
The clustering hypothesis:
Are two “similar” entities likely to be relevant to the same information need?
Challenges
To what extent are relevant entities indeed similar to each other,
and according to which similarity measure?
Relevance propagation: What relationship types provide effective relevance
propagation channels?
Do your friends inherit your own expertise?
Which relationship types contribute to relevance propagation?
- 39. IBM Research - Haifa
Thank You!
Questions?
- 40. Is This Entity Relevant
to Your Needs?
David Carmel
IBM Research - Haifa, Israel