1. Personalization in Information Retrieval,
Extraction and Access
Workshop On Ontology, NLP, Personalization And IE/IR - IIT Bombay, Mumbai 15-17 July 2008
Vasudeva Varma
www.iiit.ac.in/~vasu
2. Search Engine Heat is On!
Applications of Search Technologies
Web search
Product search
Service search
Domain Search
Already a BIG Market
HUGE Opportunity
IR and IE Technologies and Personalization (C) Vasudeva Varma IIIT-H 5/30/2008
3. Agenda
Evolution of Search Engines
Information Retrieval Vs. Extraction Vs. Access
Personalization in IR, IE and IA
Applications in Personalized IA
Conclusions
4. Evolution of Search Engines
Crawling and Indexing
Topic directories
Clustering and Classification
Hyperlink analysis
Resource discovery and vertical portals
Semantic Web
???
5. Current IR engines fail – why?
Retrieval results vary widely across:
User topic
Retrieval system
Different approaches work for different systems.
There is no way to determine in advance which approach will work for
a particular query.
Solution:
Deeper analysis of the content and the query
6. Motivation for Deeper Analysis
Texts are one of the major sources of
information and knowledge.
However, they are not transparent.
They have to be systematically integrated with
other sources such as databases, numerical data,
etc.
NLP/IR/IE for better analysis
IA for better presentation
7. Agenda
Evolution of Search Engines
Information Retrieval Vs. Extraction Vs. Access
Personalization in IR, IE and IA
Applications in Personalized IA
Conclusions
8. IR vs. IE vs. IA
IR: to search and retrieve documents in response to queries for
information
Vs.
IE: to extract information that fits pre-defined database schemas or
templates, specifying the output formats
Vs.
IA: to make the required information accessible to users in their
choice of language, mode, level of detail and format
9. Characterization of Texts
IR System
Queries
Collection of Texts
10. Characterization of Texts (diagram, continued)
Diagram labels: Knowledge, Interpretation, IR System, Queries, Collection of Texts
11. Characterization of Texts (diagram, continued)
Diagram labels: Knowledge, Interpretation, Passage, IR System, Queries, Collection of Texts
12. Characterization of Texts (diagram, continued)
Diagram labels: Knowledge, Interpretation, Passage, IE System, IR System, Queries, Structures of Sentences, NLP, Collection of Texts, Texts, Templates
13. Information Access Technologies
Diagram labels: Knowledge, Interpretation, Passage, IE System, IR System
Machine Translation
Summarization
Snippet Generation
NL Generation
Visualization Tools
14. Agenda
Evolution of Search Engines
Information Retrieval Vs. Extraction Vs. Access
Personalization in IR, IE and IA
Applications in Personalized IA
Conclusions
15. Limitations of Current IR Systems
All users get the same results for a given query,
independent of:
Previous search history
Current search context
All users are treated the same
Does one size fit all?
16. Personalized Web Search
Automatic adjustment of information content, structure, and
presentation tailored to an individual user.
Characteristics: Age, Gender, Special Interest Groups, Topic
Personalize Search Results using
Personal content
Past Activities (long term and short term)
Variations:
Explicit or Implicit profile setup
Explicit or Implicit relevance feedback
Client side or server side storage of information (privacy implications)
User control over amount of personalization
17. Overview of Personalized Search
Typically a 3-step process:
1. Obtain results (n >> 10)
2. Compute similarity (results, user)
3. Re-rank the results
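The three steps above can be sketched as follows. This is a minimal illustration, not the system described in the talk: it assumes the user is represented as a bag-of-words term vector and that similarity is cosine similarity. The names `term_vector`, `cosine` and `personalized_rerank` are hypothetical.

```python
import math
from collections import Counter

def term_vector(text):
    """Bag-of-words term counts (lowercased, whitespace tokenized)."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def personalized_rerank(results, user_profile, top_k=10):
    """Steps 2 and 3: score each candidate against the profile, then re-rank.

    `results` is the output of step 1: a list of (doc_id, text) pairs in the
    engine's original order, typically many more than `top_k`.
    """
    scored = [(cosine(term_vector(text), user_profile), doc_id)
              for doc_id, text in results]
    # sorted() is stable, so ties keep the engine's original order.
    ranked = sorted(scored, key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in ranked[:top_k]]
```

For example, with a profile built from wildlife-related text, a result about the animal "jaguar" would move ahead of one about the car, even if the engine ranked the car page first.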
20. Techniques
Co-active Techniques
Pro-active Techniques
Collaborative Filtering
User Profile based Result Pruning
User Profile based Query Expansion
21. Problem Description
Personalized Search - Issues
What to use to Personalize?
How to Personalize?
When not to Personalize?
How to know whether personalization helped?
22. Problem Description
We focus on the issue “How to Personalize?”
Problem Statement
How to learn to personalize for future searches using
past search history
How to model and represent past search contexts
How to use it to improve search results
23. Solution - Outline
Model and Represent past user feedback – Learning
user profile
Use implicit feedback
Long term learning
User contexts – triples
{user,query,{relevant documents}}
Improve Search Results – Reranking
Get initial search results
Take the top few, rescore them using the user profile, and rearrange
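The {user, query, {relevant documents}} triple can be illustrated with a small data structure. The sketch below assumes that implicit feedback arrives as click records and that a clicked document is treated as relevant for that query; this is a simplification. The names `UserContext` and `contexts_from_click_log` are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field

@dataclass
class UserContext:
    """One triple: {user, query, {relevant documents}}."""
    user: str
    query: str
    relevant_docs: list = field(default_factory=list)

def contexts_from_click_log(click_log):
    """Group (user, query, clicked_doc) records into one context per (user, query).

    Assumption: a click is taken as implicit evidence of relevance.
    Duplicate clicks on the same document are recorded once.
    """
    grouped = defaultdict(list)
    for user, query, doc in click_log:
        if doc not in grouped[(user, query)]:
            grouped[(user, query)].append(doc)
    return [UserContext(u, q, docs) for (u, q), docs in grouped.items()]
```

Accumulating these contexts over many sessions gives the long-term history from which a user profile can be learned.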
24. Contributions
I Search : A suite of approaches for Personalized
Web Search
Proposed Personalized search approaches
Baseline
Basic Retrieval methods
Automatic Evaluation
Analysis of Query Log
25. Review of Personalized Search
Personalized Search
Query logs
Machine learning
Language modeling
Community based
Others
26. I Search : A suite of Techniques for
Personalized IR
Suite of Approaches???
Statistical Language modeling based approaches
Simple N-gram based methods
Noisy Channel Model based method
Machine learning based approach
Ranking SVM based method
Personalization without relevance feedback
Simple N-gram based method
27. Statistical Language Modeling based
Approaches: Overview
From user contexts, capture statistical properties
of texts
Use the same to improve search results
Different Contexts
Unigram and Bigrams
Simple N-gram based approaches
Relationship between query and document words
Noisy Channel based approach
28. Simple N-gram based approaches
N-gram: a sequence of N adjacent words
1-gram: unigram, 2-gram: bigram
Capture statistical properties of text
Single words (Unigrams)
Two adjacent words (Bigrams)
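As a small illustration of unigrams and bigrams, the sketch below extracts and counts n-grams from text. Lowercasing and whitespace tokenization are assumptions made for brevity; the function names are hypothetical.

```python
from collections import Counter

def ngrams(tokens, n):
    """All runs of n adjacent tokens, as tuples, in order of appearance."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def ngram_counts(text, n):
    """Frequency of each n-gram in a text (lowercased, whitespace tokenized)."""
    return Counter(ngrams(text.lower().split(), n))
```

Calling `ngram_counts(text, 1)` gives unigram statistics and `ngram_counts(text, 2)` gives bigram statistics for the same text.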
29. Learning user profile
Given Past search history
Hu = {(q1, rf1), (q2, rf2), …, (qn, rfn)}
rfall = concatenation of all rf
For each unigram wi
User profile
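One plausible way to turn the concatenated feedback text into a unigram profile is a maximum-likelihood estimate: each word's count divided by the total number of words. This estimator is an assumption chosen for illustration and may differ from the formula on the original slide. The function name is hypothetical.

```python
from collections import Counter

def learn_unigram_profile(history):
    """Estimate P(w | user) from past search history.

    `history` is a list of (query, relevance_feedback_text) pairs, i.e. Hu.
    Assumption: maximum-likelihood estimate over the concatenated feedback
    text, with no smoothing.
    """
    rf_all = " ".join(rf for _, rf in history)
    counts = Counter(rf_all.lower().split())
    total = sum(counts.values())
    if total == 0:
        return {}
    return {word: count / total for word, count in counts.items()}
```

In practice some smoothing would usually be added so that unseen words do not receive zero probability.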
31. Reranking
In general LM for IR
Our Approach
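As a hedged illustration of the general language-modeling idea for IR, the sketch below ranks a document by the log-likelihood of the query under a smoothed document model. Jelinek-Mercer smoothing, the `lam` and `floor` parameters, and the function name are assumptions; this is not a reconstruction of the "Our Approach" formula.

```python
import math
from collections import Counter

def query_log_likelihood(query, doc_text, background, lam=0.5, floor=1e-9):
    """log P(query | document) with Jelinek-Mercer smoothing.

    `background` maps words to probabilities (for example, collection
    statistics). `lam` must be in (0, 1] so every term has non-zero mass;
    `floor` is used for words missing from the background model.
    """
    doc_counts = Counter(doc_text.lower().split())
    doc_len = sum(doc_counts.values())
    score = 0.0
    for term in query.lower().split():
        p_doc = doc_counts[term] / doc_len if doc_len else 0.0
        p_bg = background.get(term, floor)
        score += math.log((1 - lam) * p_doc + lam * p_bg)
    return score
```

One hypothetical way to personalize with this scorer would be to pass a learned user profile as the `background` model, so that words the user has favored in the past contribute more to the score.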
32. Noisy Channel based Approach
Documents and queries belong to different information spaces
Queries – short, concise
Documents – more descriptive
Most approaches to retrieval or personalized web
search do not model this
Capture the relationship between query and document
words
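The relationship between query words and document words can be captured with word-level translation probabilities. The sketch below trains IBM Model 1 (named later in the deck as one training scheme) with EM on (query, document) pairs. It is a simplified illustration: there is no NULL word, tokens are assumed to be pre-split, and the function name is hypothetical.

```python
from collections import defaultdict

def train_model1(pairs, iterations=10):
    """EM training for IBM Model 1 without a NULL word.

    `pairs` is a list of (query_tokens, doc_tokens).
    Returns t where t[(q_word, d_word)] approximates P(q_word | d_word).
    """
    query_vocab = {q for q_tokens, _ in pairs for q in q_tokens}
    uniform = 1.0 / len(query_vocab) if query_vocab else 0.0
    t = defaultdict(lambda: uniform)  # uniform initialization
    for _ in range(iterations):
        count = defaultdict(float)  # expected co-occurrence counts
        total = defaultdict(float)  # expected counts per document word
        for q_tokens, d_tokens in pairs:
            for q in q_tokens:
                # E-step: spread q's unit of mass over the document words.
                norm = sum(t[(q, d)] for d in d_tokens)
                for d in d_tokens:
                    frac = t[(q, d)] / norm
                    count[(q, d)] += frac
                    total[d] += frac
        # M-step: renormalize per document word.
        t = defaultdict(float, {(q, d): c / total[d]
                                for (q, d), c in count.items()})
    return t
```

With pairs such as (["cheap", "flights"], ["low", "airfare"]) and (["cheap", "hotels"], ["low", "lodging"]), the probability mass for the document word "low" shifts toward the query word "cheap", because that is the only query word that co-occurs with it consistently.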
33. Machine Learning based
Approaches: Introduction
Most machine learning for IR treats it as a binary classification
problem – “relevant” vs. “non-relevant”
Click-through data
A click indicates relative relevance, not absolute relevance,
i.e., assuming clicked = relevant and unclicked = irrelevant
is wrong.
Clicks are biased
Partial relative relevance: clicked documents are more
relevant than the unclicked documents.
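The partial relative relevance idea can be turned into pairwise training data of the kind a ranking SVM consumes. The sketch below is an assumption-laden illustration: it emits one (preferred, less preferred) pair for every clicked document against every unclicked document shown for the same query. The function name is hypothetical, and feature extraction and the SVM itself are omitted.

```python
def preference_pairs(ranked_doc_ids, clicked):
    """Turn clicks on one result list into pairwise preferences.

    Returns (preferred, less_preferred) pairs: each clicked document is
    preferred over each unclicked document in the same list. No pair is
    produced between two clicked or two unclicked documents.
    """
    clicked = set(clicked)
    pairs = []
    for doc in ranked_doc_ids:
        if doc not in clicked:
            continue
        for other in ranked_doc_ids:
            if other not in clicked:
                pairs.append((doc, other))
    return pairs
```

Because only relative judgments are generated, unclicked documents are never labeled as irrelevant in absolute terms.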
34. Personalized Search without Relevance
Feedback: Introduction
Can personalization be done without relevance
feedback about which documents are relevant?
How informative are the queries posed by
users?
Is the information contained in the queries enough to
personalize?
35. Approach
Past queries of the user available
Make effective use of past queries
Simple N-gram based approach
36. Experiment Results
Language Modeling – Best Results!
Interesting framework for Personalized Search
Simple N-gram based approaches also worked well
Noisy Channel model worked best
Extracting Synthetic Queries helped
Different Training schemes
IBM Model1 Vs GIZA++
Snippet Vs Document
Machine Learning – competitive results
Different Features and weights
Without Relevance Feedback – Very encouraging results
Simple Approach worked well
Sparsity – Query log was useful
37. Agenda
Evolution of Search Engines
Information Retrieval Vs. Extraction Vs. Access
Personalization in IR, IE and IA
Applications in Personalized IA
Conclusions
Personalized Search Engine for Mobile Phones
Personalized Summarization (for Mobile Devices)
38. “Personalized” Search Engine for
mobile devices
To develop a “personalized” Search Engine for mobile devices
that will produce more relevant results based on the query
and the “context”
What do we mean by “Personalized” search?
The user will be able to configure the search interfaces (Explicit feedback)
The system will observe user behavior and customize itself to suit the user’s
needs (Implicit feedback)
What do we mean by “Context”?
User, time, location, …
Goal is to make Search accessible on Nokia mobile devices and make use of
the mobile aspects for personalization.
(C) Vasudeva Varma, IIIT Hyderabad, India
39. Scope of the Application
Client Side Server Side
40. Problem Re-Definition
Dynamic user behavior tracking
An observer that keeps track of all “relevant” user actions
Client module
Analysis of user actions
Interpret the user actions to derive user interests (categories of interests)
so that more relevant results are displayed
Construction of user profile implicitly
Implicit Supervised learning
Personalization
Based on Query
Based on User Profile
Based on other parameters such as time, location
42. Personalized Summarization:
Motivation
The success that search engine providers have found on the PC
has failed to translate to the mobile phone. Why?
Because forcing a PC-based search experience into a mobile
device falls short in a key area of usability
Search queries typically return hundreds of potential hits.
Making sense of such output is difficult.
The results may or may not be of user interest.
We are looking for a faster and easier way to access
precise information on our mobile devices.
43. Challenges
Can we offer users a simpler, friendlier and more
intuitive experience?
We aim to provide more
information with less payload, in the form of a summary
that takes into account:
context
history
preferences
device capabilities
social network
44. System Model
Search Engine
45. Summary
Current search engines are inadequate, and current
know-how is only the tip of the iceberg
IR, IE and IA areas have enjoyed huge commercial
success and have a huge growth potential
Personalization is perhaps the next big wave
Various personalization techniques are available -
yet this is a very fertile research field
The two personalization applications shown are just
examples of many possibilities.
46. Thank You – Questions?
Vasudeva Varma, IIIT Hyderabad
vv@iiit.ac.in or www.iiit.ac.in/~vasu