Search logs contain examples of frequently occurring patterns of user reformulations of queries. Intuitively, the reformulation "san francisco" → "san francisco 49ers" is semantically similar to "detroit" → "detroit lions". Likewise, "london" → "things to do in london" and "new york" → "new york tourist attractions" can also be considered similar transitions in intent. The reformulations "movies" → "new movies" and "york" → "new york", however, are clearly different despite their lexical similarities. In this paper, we study the distributed representations of queries learnt by deep neural network models, such as the Convolutional Latent Semantic Model (CLSM), and show that they can be used to represent query reformulations as vectors. These reformulation vectors exhibit favourable properties, such as mapping semantically and syntactically similar query changes closer together in the embedding space. Our work is motivated by the success of continuous space language models in capturing relationships between words and their meanings using offset vectors. We demonstrate a way to extend the same intuition to represent query reformulations.
Furthermore, we show that the distributed representations of queries and reformulations are both useful for modelling session context for query prediction tasks, such as query auto-completion (QAC) ranking. Our empirical study demonstrates that short-term (session) history context features based on these two representations improve the mean reciprocal rank (MRR) for the QAC ranking task by more than 10% over a supervised ranker baseline. Our results also show that using features based on both representations together achieves better performance than either of them individually.
Paper: http://research.microsoft.com/apps/pubs/default.aspx?id=244728
Exploring Session Context using Distributed Representations of Queries and Reformulations (SIGIR 2015)
1. Exploring Session Context using Distributed Representations of Queries and Reformulations
Bhaskar Mitra
Microsoft
(Paper: http://research.microsoft.com/apps/pubs/default.aspx?id=244728)
6. Questions
• Can we learn intuitively “meaningful” vector representations for query reformulations?
• Can we use them for modelling session context for tasks such as query auto-completion (QAC)?
8. Session Context for QAC
[Figure: For the prefix "f", a context-free QAC ranker suggests "facebook", "fandango", "forever 21", "fox news"; given the previous query "muscle cars", a session-aware ranker instead suggests "facebook", "ford", "ford mustang", "fast and furious".]
9. Session Context
What’s the more likely query after “big ben”?
Topical disambiguation (symmetrical) vs. transition likelihood (asymmetrical)
[Figure: after "big ben", candidate next queries include "big ben height" and "london clock tower".]
10. Distributed Representation
A (low-dimensional) vector representation for items (e.g., words, sentences, images, etc.) such that all the values in a vector are necessary to determine the exact item.
Imaginary example: [6 3 0 4 1 7 2 8]
Also called embeddings.
11. As opposed to…
One-hot representation scheme, where all except one of the values of the vector are zeros.
Imaginary example: [0 1 0 0 0 0 0 0]
12. For Neural Networks…
Localist Representations
• One neuron to represent each item
• One-to-one relationship
• For few items / classes only
Distributed Representations
• Multiple neurons to represent each item
• Many-to-many relationship
• For many items with shared attributes
13. Vector Algebra on Word Embeddings
Word2vec linguistic regularities
vector(“king”) – vector(“man”) + vector(“woman”) ≈ vector(“queen”)
T. Mikolov, et al. Efficient estimation of word representations in vector space. arXiv preprint, 2013.
T. Mikolov, et al. Distributed representations of words and phrases and their compositionality. NIPS, 2013.
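To make the offset-vector intuition concrete, here is a minimal sketch of the analogy operation, assuming `emb` is a hypothetical dict mapping words to unit-normalised embedding vectors (e.g., loaded from pre-trained word2vec); with real embeddings, `analogy(emb, "man", "king", "woman")` should return "queen".

```python
import numpy as np

def analogy(emb, a, b, c, topk=1):
    # Find words w maximising cosine(vec(b) - vec(a) + vec(c), vec(w)),
    # i.e. "a is to b as c is to ?". emb maps word -> unit-normalised vector.
    target = emb[b] - emb[a] + emb[c]
    target /= np.linalg.norm(target)
    scores = {w: float(v @ target) for w, v in emb.items() if w not in (a, b, c)}
    return sorted(scores, key=scores.get, reverse=True)[:topk]
```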
14. Convolutional Latent Semantic Model
• DNN trained on clickthrough data
• Maximize cosine similarity
• Tri-gram hashing over raw terms
• Convolutional-Pooling structure
P.-S. Huang, et al. Learning deep structured semantic models for web search using clickthrough data. CIKM, 2013.
Y. Shen, et al. Learning semantic representations using convolutional neural networks for web search. WWW, 2014.
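As a rough illustration of the tri-gram hashing step (a sketch of DSSM/CLSM-style word hashing, not the authors' implementation), each term is padded with boundary markers and decomposed into letter tri-grams:

```python
from collections import Counter

def letter_trigrams(query):
    # "cat" is padded to "#cat#" and hashed to {"#ca", "cat", "at#"};
    # the query becomes a sparse count vector over the tri-gram vocabulary,
    # which the convolutional layers of the model then consume.
    grams = Counter()
    for term in query.lower().split():
        padded = f"#{term}#"
        grams.update(padded[i:i + 3] for i in range(len(padded) - 2))
    return grams
```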
16. Main Contributions
• CLSM models trained on Session Pairs (SP)
• Demonstrate semantic regularities in the CLSM query embedding space
• Leverage the regularities to explicitly represent query reformulations as vectors (see the sketch below)
• Improved Mean Reciprocal Rank (MRR) for session context-aware QAC ranking by more than 10% using CLSM-based features
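A minimal sketch of the reformulation-as-vector idea, assuming `clsm_embed` is a hypothetical function returning a query's CLSM embedding: the reformulation q1 → q2 is represented by the offset between the two query vectors.

```python
import numpy as np

def reformulation_vector(clsm_embed, q_from, q_to):
    # Offset vector for the reformulation q_from -> q_to. Semantically
    # similar reformulations (e.g. "san francisco" -> "san francisco 49ers"
    # and "detroit" -> "detroit lions") should lie close together here.
    return np.asarray(clsm_embed(q_to)) - np.asarray(clsm_embed(q_from))
```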
17. Training on Session Pairs
• Pairs of consecutive queries from search sessions
• Pre-Query and Post-Query model
• Symmetric vs. Asymmetric models
[Figure: a session q1 → q2 → q3 → q4 yields the consecutive pairs (q1, q2), (q2, q3), (q3, q4).]
Advantages
1. Demonstrates higher levels of reformulation regularities (discussed next)
2. Trains on time-stamped query logs, no need for clickthrough data
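Extracting the training pairs is straightforward; a minimal sketch, assuming sessions are already segmented into ordered query lists:

```python
def session_pairs(session):
    # A session ["q1", "q2", "q3", "q4"] yields the consecutive pairs
    # (q1, q2), (q2, q3), (q3, q4); a symmetric variant would also add
    # each pair reversed.
    return list(zip(session, session[1:]))
```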
22. Session Context-Aware QAC
• Evaluation setup based on Shokouhi (SIGIR 2013):
  • Temporally separated background, train, validation and test sets
  • Sample queries and extract all possible prefixes
  • Submitted query as ground truth
• Re-rank top N suggestion candidates using a LambdaMART model
• Two testbeds: search logs from AOL & Bing
M. Shokouhi. Learning to personalize query auto-completion. SIGIR, 2013.
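For reference, MRR over QAC impressions can be computed as below (a minimal sketch; each impression pairs a re-ranked suggestion list with the query the user actually submitted):

```python
def mean_reciprocal_rank(ranked_lists, ground_truths):
    # Reciprocal rank of the submitted query within each re-ranked
    # suggestion list (0 if absent), averaged over all impressions.
    total = 0.0
    for suggestions, truth in zip(ranked_lists, ground_truths):
        rr = 0.0
        for rank, s in enumerate(suggestions, start=1):
            if s == truth:
                rr = 1.0 / rank
                break
        total += rr
    return total / len(ranked_lists)
```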
23. Features
• Non-contextual features: prefix length, suggestion length, vowels-to-alphabet ratio, contains numeric, etc.
• N-gram similarity features: character tri-gram similarity between previous queries and the suggestion candidate
• Pairwise frequency feature: pairwise frequency based on popular session pairs in the background data
• CLSM topical similarity features: CLSM similarity between previous queries and the suggestion candidate
• CLSM reformulation features: values along each dimension of the reformulation vector based on the previous query and the suggestion candidate (see the combined sketch below)
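Putting the groups together, a hypothetical feature extractor might look like the following, reusing `letter_trigrams` from the earlier sketch; `clsm_embed` and `pair_freq` are assumed lookups for illustration, not the paper's actual pipeline:

```python
import numpy as np

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-9))

def qac_features(prefix, candidate, prev_query, clsm_embed, pair_freq):
    # One feature vector per (prefix, candidate) pair, fed to LambdaMART.
    tg_prev, tg_cand = letter_trigrams(prev_query), letter_trigrams(candidate)
    overlap = sum((tg_prev & tg_cand).values()) / max(sum(tg_cand.values()), 1)
    v_prev = np.asarray(clsm_embed(prev_query))
    v_cand = np.asarray(clsm_embed(candidate))
    features = [
        len(prefix),                                 # non-contextual
        len(candidate),                              # non-contextual
        overlap,                                     # n-gram similarity
        pair_freq.get((prev_query, candidate), 0),   # pairwise frequency
        cosine(v_prev, v_cand),                      # CLSM topical similarity
    ]
    features.extend(v_cand - v_prev)                 # CLSM reformulation dims
    return np.asarray(features)
```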
31. Summary of Contributions
• CLSM models trained on Session Pairs (SP)
• Demonstrate semantic regularities in the CLSM query embedding space
• Leverage the regularities to explicitly represent query reformulations as vectors
• Improved Mean Reciprocal Rank (MRR) for session context-aware QAC ranking by more than 10% using CLSM-based features
32. Potential Future Work
• Studying search trails (White et al.) in the embedding space
• Query change retrieval model (Guan et al.) using reformulation embeddings
• Generating user embeddings for search personalization
• Studying how reformulations vary by user expertise and device