Thesis Proposal — Determining Relevance Rankings with Search Click Logs

                          Inderjeet Singh

          Supervisor: Dr. Carson Kai-Sang Leung

                            July 8, 2011


                               Abstract

    Search engines track users’ search activities in search click logs.
 These logs can be mined to gain better insight into user behavior and
 to build user behavior models. These models can then be used in a
 ranking algorithm to give better, more focused and desirable results
 to the user.
    There are two problems with the existing models. First, researchers
 have not considered trust bias while interpreting click logs. Trust bias,
 or trust factor, is the preference a user gives to certain URLs that
 he/she trusts. For example, users show preference for websites like
 wikipedia.com, Yahoo! Answers, stackoverflow.com and many others
 because they trust the documents from these URLs. The trusted
 websites can differ for people in different areas or niches. Thus,
 trust bias is an important parameter to consider when designing a
 user behavior model and using it in a ranking algorithm. Second,
 researchers have not considered user clicks on other parts of a search
 page, such as advertisements, while making their models. Interpreting
 these clicks is important because advertisements are also a part of
 search results and relevant advertisements help a user fulfill his
 information needs.
    I propose to extend the existing research to build a user behavior
 model from search click logs that overcomes the above two problems
 and then estimates the relevance of documents.



1     Introduction

Search engines are used to answer ad-hoc or specific queries. Queries can
be of two types: navigational and informational. A navigational query looks
for specific information such as a single website, web page or a single entity.
An informational query looks for information about general or niche topics.
    Search engines rank search results in decreasing order of relevance to the
query. Search engines assign scores to the documents for the user query using
a ranking function, which is derived automatically from a ranking algorithm
using training data.


Training data is a collection of query-document pairs. Each query-document
pair in the training data is represented by a set of properties of both query
and document called features. Each query-document pair is labeled accord-
ing to its relevance under categories such as perfect, excellent, good, fair or
bad. These relevance labels are assigned by humans to each query-document
pair indicating how well the document matches the query. These human
judgments are known as editorial judgments. A good editorial judgement
is important because the quality of a ranking function depends upon the
quality of the training data used.
   Ranking function efficiency depends upon two critical aspects: the con-
struction of training data with good relevance labels and the selection of
features in the feature set.
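As a concrete illustration, one training example could be represented as below; the feature names, values, and URL here are hypothetical, not taken from any particular system:

```python
# One (hypothetical) training example: a query-document pair represented
# by a feature vector plus a human-assigned relevance label.
example = {
    "query": "manitoba weather",
    "document": "http://example.com/weather/winnipeg",
    "features": {
        "text_match_score": 12.4,  # a query-document feature (illustrative)
        "url_depth": 2,            # a document-only feature
        "query_length": 2,         # a query-only feature
    },
    "label": "good",  # one of: perfect, excellent, good, fair, bad
}
```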
   Usually, the user starts examining the result snippets (the combination
of a URL and a small description of the document) for a query from top to
bottom. The probability of a user examining a result snippet is called the
examination probability. When examining, the user may find some snippets
useful. This usefulness is the perceived relevance or attractiveness of the
snippet. Eventually, the user clicks on a snippet and lands on a document.
The information fulfillment that the user gets out of that document is the
actual relevance of the document. The probability that the user will click on a
result snippet is known as the click probability. The click probability is always
less than or equal to the examination probability. The click probability for
URLs on a results page decreases from top to bottom. This decrease is known
as position bias.
   Search engines maintain logs of every user interaction. The log entries
have information about the queries, the results displayed for each query, the
number of results displayed, which results were clicked, user IP address and
timestamps. Generally, these logs are on the order of terabytes in size.
   Mining click logs gives a better insight into how a user interacts with
the search engine. For example, a click can be interpreted as a vote for
that document for a particular query. So, the information about clicks and
user interaction can be used to make user behavior models that capture user
preferences. These models can be further used to estimate the document
relevance for better search results as described in Section 2.1. The model in
this context would be a set of equations and rules describing user interactions
or actions on the search page in terms of probability values.
   Also, to date, most of the relevance labels for training data are manually
assigned by editors (humans), who can be biased at times and may not nec-
essarily represent the aggregate behavior of search engine users. Manually
drafting training data is also a time-consuming process. To overcome these
problems, training data must be labeled automatically; see Section 2.4.
   I intend to make a user behavior model that is different from existing
models in considering trust bias and clicks on other parts of a search page.
My model will closely capture user preferences and will make realistic and
flexible assumptions on user behavior — for example, a user can do any of
the following: click a single result or multiple results, go to a document and
never come back to the results, or click advertisements on a search page.
This model will then be used to estimate the actual document relevance for
a query. The relevance estimates will then be added as a feature in training
data to compute a better ranking function.
    To evaluate my model, I will first compare the relevance estimates of doc-
uments from my model with editorial judgments and then with the relevance
estimates of earlier models; see Section 5. I will look for an improved ranking
function after adding the relevance estimates from my model as a feature in
the training data.



2     Related Work

Section 2.1 describes how to make user behavior models by interpreting click
logs and then use the document relevance estimated from these models as
a feature in the training data to get a new ranking function. My research
will follow this work closely and I will try to improve upon these models.
Section 2.2 describes trust bias modeling in online communities. My user
behavior model will consider trust bias toward certain URLs while interpreting
document relevance from click logs. Section 2.3 describes modeling the rel-
evance of advertisements for queries using click logs. In my model, the rel-
evance of advertisements will be considered in the overall fulfillment of user
information needs. Section 2.4 describes a method to automatically estimate
relevance labels for query-document pairs in training data from click logs. I
will use this method to automatically generate labels for the training data,
which will also include a feature from my proposed user behavior model.


2.1    Estimating Document Relevance from User Behavior Models

Dupret and Liao [4] designed a user behavior model that estimates the ac-
tual relevance of clicked documents and not the perceived relevance. Their
model focuses on the fulfillment that a user gets while browsing and click-
ing documents. Their main assumption is that a user stops searching when
his information need is fulfilled. Dupret and Liao did not limit the number
of clicks (single or multiple) or the number of query reformulations in their
model assumptions, which makes their model quite realistic. My model will
match their solution methodology by adopting their assumptions in addition
to my own about the trust factor and other parts of the search page. I will
consider two more ranking efficiency metrics in my evaluation than they
have used; see Section 5.
   Craswell et al. [3] designed a model that explained position bias from click
logs. They assumed that the user examines the result snippets sequentially
from top to bottom and that the user’s search ends as soon as he/she clicks
a relevant document for the query. This assumption is known as the single-
click assumption. Their work also assumes that the user does not skip a
result snippet without examining it.
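The single-click assumption of Craswell et al. can be sketched as a cascade over positions: the probability of a click at a position is the probability of reaching that position times the probability of finding the document there relevant. A minimal sketch (the relevance values are illustrative, not from their data):

```python
def cascade_click_probs(relevance):
    """Probability that the (single) click lands at each position,
    under the cascade model: the user scans results top to bottom
    and clicks the first document found relevant."""
    probs = []
    examined = 1.0  # probability the user reaches this position
    for r in relevance:
        probs.append(examined * r)  # reached it and found it relevant
        examined *= (1.0 - r)       # continue only past a non-relevant result
    return probs

# The same relevance at position 1 vs. position 3 yields a lower click
# probability lower down -- this is exactly position bias.
print(cascade_click_probs([0.5, 0.3, 0.5]))  # [0.5, 0.15, 0.175]
```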



Chapelle and Zhang [2] developed a model that gives an unbiased esti-
mation of the actual relevance of a webpage, i.e., the model removes any
position bias. Chapelle and Zhang’s work extends the work of Craswell et
al. [3] with the assumption that the user will not stop searching until satisfied
with the information. They overrule Craswell et al.’s single-click assumption,
instead, assuming multiple clicks and query reformulations. Their work, how-
ever, does not consider anything about the other parts of a search page, like
sponsored results and related queries, which I am going to consider in my
work.
   Dupret and Piwowarski [5] developed a model that differs from the work
of Craswell et al. [3] in the sense that the user can skip a document without
examining it. Their focus is more on attractiveness and perceived relevance
and they model only single clicks. This model makes many assumptions,
which limits its ability to estimate actual user behavior.
   Guo et al. [7] proposed independent and dependent click models for mod-
eling multiple clicks on a result page. The independent click model assumes
that the click probability is independent for different positions of results and
the examination probability is unity for every result. This model is only
successful in explaining that the user usually clicks on the first snippet on a
result page. The dependent click model extends the idea of Craswell et al. [3]
for multiple clicks. This model describes the interdependence between clicks
and examination at different positions. The dependent click model is good
at explaining the clicks on the first and the last snippet on the result page.
These two models can also be used with click log streams, i.e., a continuous
flow of data. They do not, however, consider trust bias and other elements
of a search page, which I will work on.


2.2    Trust Bias Modeling in Online Communities

Finin et al. [6] modeled trust bias or influence in online social communi-
ties. They discussed how a popular blog or website in an online community
can influence opinions of other blogs. Their model of trust bias in online
communities can be applied to URLs in my user behavior model.


2.3    Advertisement Relevance Prediction from Search Click Logs

Raghavan and Hillard [8] proposed a model that improves the relevance of
advertisements for a query in a search engine. Earlier models ranked ad-
vertisements for a query by the number of clicks an advertisement received,
i.e., more clicks meant a better rank. Raghavan and Hillard’s model instead
interprets click logs to estimate the actual relevance of an advertisement to
the query and ranks advertisements by that estimate, not by click counts.
In my model, I will use the actual relevance of advertisements from Raghavan
and Hillard in the overall fulfillment of the user’s information need for a
query.




2.4    Automatically Estimating Relevance Labels for Training Data

Agrawal et al. [1] proposed a method that can be used to automatically
estimate relevance labels of query-document pairs from click logs. They
transformed user clicks into weighted, directed graphs and formulated the
label generation problem as an ordered graph partitioning problem. In full
generality, the problem of finding n labels is NP-hard. Agrawal et al. showed
that optimal labeling of a query-document pair can be done in linear time
by using only two labels (relevant or non-relevant). They have proposed
heuristic solutions to automatically estimate efficient labels from click logs.
This automatically labeled training data can save humans from manually
defining labels for query-document pairs.



3     Problem Description

Previous user behavior models for estimating the relevance of documents
from search click logs have not considered trust bias. Also, while building
these models, little consideration has been given to user actions such as
clicks on other parts of a search page, like advertisements. Accounting for
these factors allows a model to interpret flexible and realistic user behavior
from click logs.




4     Solution Methodology

My proposed model will be an extension of the work of Dupret and Liao [4].
In addition to their assumptions described in Section 2.1, I will include my
own based on the trust bias and clicks on other parts of the search page,
especially advertisements.
    I intend to make a model that will estimate the actual relevance of doc-
uments with respect to specific queries. My model will be a set of equations
and rules describing user interactions or actions on the search page in terms
of probability values. A user session is a set of actions that the user per-
forms on a search page to satisfy his information needs, such as examining
a result snippet, clicking on a search result or advertisement, coming back
and clicking more results in decreasing rank order, reformulating the query,
or abandoning the search.
    I will model trust bias in the form of probability equations, just like any
other user interaction. The trust bias equations come into play in my model
after the user has already examined the result snippet and found it attractive.
The user then clicks on the snippet based upon his trust in that URL. Thus,
the click probability on a URL depends on its examination probability, its
attractiveness and then the trust bias, whereas in Dupret and Liao’s model
the click probability on a URL depends only on its examination probability
and then its attractiveness. For instance, if the examination probability of
a URL is e, its attractiveness probability is a, the trust bias is t, the click
probability is c, the query is q, the document is d, the actual document
relevance is r and the fulfillment probability is f , then the click probability c
on a particular d for a specific q will depend upon the joint probability of e,
a, and t happening in sequence. Also, the actual document relevance r
depends on the fulfillment probability f from the clicked document d.
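Under the additional assumption (mine, for illustration; the proposal does not fix the exact functional form) that examination, attraction and trust act as independent stages occurring in sequence, the chained click probability is simply their product:

```python
def click_probability(e, a, t):
    """Click probability as a chain of events: the user examines the
    snippet (e), finds it attractive (a), and trusts the URL (t).
    Assumes the three stages are conditionally independent."""
    return e * a * t

# With full trust (t = 1) this reduces to the examination/attractiveness
# chain of Dupret and Liao's model; a distrusted URL (t < 1) lowers the
# click probability even for an attractive, examined snippet.
print(click_probability(0.5, 0.5, 1.0))  # 0.25
print(click_probability(0.5, 0.5, 0.4))  # 0.1
```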
    Modeling clicks on other parts of a search page, such as advertisements,
will also be done in the form of probability equations. Clicks on advertise-
ments are a form of user interaction on the search page, and advertisements
also become part of the overall fulfillment of the user’s information needs. I
am still researching solution methodologies for modeling these clicks.
    After building the model with the above-mentioned improvements, the
estimated relevance of documents for a query from my model will be com-
bined with the existing features of the training data to recompute a new
ranking function. The ranking obtained by the new function will be mea-
sured by the discounted cumulative gain (DCG) metric, a measure of ranking
effectiveness. If the metric improvement is significant when compared with
the results obtained by the ranking function used in an existing search engine
for the same query, then the document relevance from my model can be used
as a feature in training data.
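Discounted cumulative gain has several standard variants; a common graded-relevance formulation, together with its normalized form, is sketched below (the proposal does not fix a particular variant):

```python
import math

def dcg(relevances):
    """Discounted cumulative gain for a ranked list of graded relevance
    labels, using the common (2^rel - 1) / log2(i + 1) formulation,
    where i is the 1-based rank."""
    return sum((2 ** rel - 1) / math.log2(i + 2)
               for i, rel in enumerate(relevances))

def ndcg(relevances):
    """DCG normalized by the DCG of the ideal (sorted) ranking."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

# A ranking that places a 'fair' (1) document above a 'perfect' (3) one
# is penalized relative to the ideal ordering.
print(ndcg([3, 2, 1, 0]))  # 1.0 (already ideal)
print(ndcg([1, 2, 3, 0]))  # ~0.68
```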
    My challenge will be to get search click log data from a commercial search
engine. If I am unable to get such logs, I will try getting logs from a meta
search engine like metacrawler, dogpile or excite. If that is not possible, I
will implement my own meta search engine and then collect logs. If all else
fails, I will use previously released logs from a search engine.



5     Evaluation

I will use discounted cumulative gain and normalized discounted cumulative
gain to measure ranking effectiveness. I will also use precision and recall
as metrics in my evaluation. A correct result is a result retrieved by a search
engine that is relevant to the query. Precision is the ratio of correct results
to all retrieved results, and recall is the ratio of correct results to all relevant
results for the query.
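These two ratios can be computed directly from a retrieved list and a judged-relevant set; the document IDs below are hypothetical:

```python
def precision_recall(retrieved, relevant):
    """Precision and recall for one query.
    retrieved: ranked result IDs returned by the engine.
    relevant:  the set of IDs judged relevant for the query."""
    correct = len(set(retrieved) & set(relevant))
    precision = correct / len(retrieved) if retrieved else 0.0
    recall = correct / len(relevant) if relevant else 0.0
    return precision, recall

# 3 of the 4 retrieved results are correct; 3 of the 6 relevant
# documents were found.
p, r = precision_recall(["d1", "d2", "d3", "d9"],
                        {"d1", "d2", "d3", "d4", "d5", "d6"})
print(p, r)  # 0.75 0.5
```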
    A raw search click log will be pre-processed to remove duplicate queries
and noise. Noise here refers to the following queries: queries with fewer than
10 user sessions, queries with fewer than 10 results and queries with no clicks
on snippets in a user session. Queries with no clicks on snippets are removed
because most of these queries are misspelled or ambiguous. Only the first
result page will be considered because most clicks happen there. Position
bias will be removed from the dataset using editorial judgments for the
results of a query.
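One reading of these filtering rules is sketched below; the session record layout (a dict with `results` and `clicks` keys) is hypothetical, since the actual log schema is not specified here:

```python
def remove_noise(sessions_by_query, min_sessions=10, min_results=10):
    """Drop noisy queries from a click log grouped by query.
    sessions_by_query maps a query string to a list of sessions; each
    (hypothetical) session is a dict with 'results' and 'clicks' lists.
    Sessions with no clicked snippets are discarded, and a query is kept
    only if enough clean sessions with enough results remain."""
    clean = {}
    for query, sessions in sessions_by_query.items():
        kept = [s for s in sessions
                if s["clicks"] and len(s["results"]) >= min_results]
        if len(kept) >= min_sessions:
            clean[query] = kept
    return clean
```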
    After pre-processing, the dataset will be labeled automatically using the
methods described in Section 2.4. The dataset will then be split equally
into training and test datasets. The training dataset will be used to train
the ranking algorithm to generate a new ranking function, while the test
dataset will be used to measure how effectively the function ranks the results
for a user’s query.
    I will do a comparative analysis of the estimated relevance of results from
my model with the models of Dupret and Liao [4] and Guo et al. [7]. The
analysis will also compare the estimated document relevance from these three
models with the editorial judgments for the same query-document pairs. The
comparison will give an idea of how accurately these models are estimating
relevance. Results will be compared for both informational and navigational
queries.
    If the estimated document relevance from my model matches the edito-
rial judgments to a considerable extent, then these relevance estimates will
be used as a feature in the training data. After this step, the ranking algo-
rithm will be trained on that data to generate a new ranking function, which
will be used to rank the test data. Rankings will also be generated for the
same test data by the existing ranking function used by popular search en-
gines. I will then calculate the discounted cumulative gain, normalized dis-
counted cumulative gain, precision, and recall. If these metrics show consid-
erable improvement, my model can be considered successful.



6     Timeline




Task                                        Start Date               End Date

Literature review                           Sept 2010                Ongoing

Designing the model                         May 2011                 Sept 2011

Comparative analysis with existing          Oct 2011                 Dec 2011
models

Inclusion of relevance feature from my      first week of Jan 2012   last week of Jan 2012
model in the training data

Evaluation of the new ranking function      Feb 2012                 last week of April 2012
for different metric improvements

Thesis write-up                             May 2012                 mid-July 2012

7    Summary

I want to make a model that can estimate a document’s actual relevance
from click logs, after modeling trust bias and clicks on the other parts of a
search page. This model will follow some of the assumptions and solution
methodology of Dupret and Liao [4]. If successful, this model can be used as
a feature in training data to improve the ranking function of a search engine.



References

[1] Rakesh Agrawal, Alan Halverson, Krishnaram Kenthapadi, Nina Mishra,
    and Panayiotis Tsaparas. Generating labels from clicks. In Proceedings
    of the Second Web Search and Data Mining (WSDM) Conference,
    Barcelona, Spain, pages 172–181. ACM, 9–11 February 2009.

[2] Olivier Chapelle and Ye Zhang. A dynamic Bayesian network click model
   for web search and ranking. In Proceedings of the 18th International
   Conference on World Wide Web (WWW), Madrid, Spain, pages 1–10.
   ACM, 20–24 April 2009.

[3] Nick Craswell, Onno Zoeter, Michael Taylor, and Bill Ramsey. An exper-
   imental comparison of click position-bias models. In Proceedings of First
   Web Search and Data Mining (WSDM) Conference, Palo Alto, CA, USA,
   pages 87–94. ACM, 11–12 February 2008.

[4] Georges Dupret and Ciya Liao. A model to estimate intrinsic document
   relevance from the clickthrough logs of a web search engine. In Proceedings
   of Third Web Search and Data Mining (WSDM) Conference, New York
   City, NY, USA, pages 181–190. ACM, 4–6 February 2010.

[5] Georges Dupret and Benjamin Piwowarski. A user browsing model to
   predict search engine click data from past observations. In Proceedings of
   the 31st Annual International ACM SIGIR Conference on Research and
   Development in Information Retrieval, Singapore, pages 331–338. ACM,
   20–24 July 2008.




[6] Tim Finin, Anupam Joshi, Pranam Kolari, Akshay Java, Anubhav Kale,
   and Amit Karandikar. The information ecology of social media and online
   communities. AI Magazine, 29(3):77–92, 2008.

[7] Fan Guo, Chao Liu, and Yi Min Wang. Efficient multiple-click models
   in web search. In Proceedings of Second Web Search and Data Min-
   ing (WSDM) Conference, Barcelona, Spain, pages 124–131. ACM, 9–11
   February 2009.

[8] Hema Raghavan and Dustin Hillard. A relevance model based filter for
   improving ad quality. In Proceedings of the 32nd International ACM SI-
   GIR Conference on Research and Development in Information Retrieval,
   Boston, MA, USA, pages 762–763. ACM, 19–23 July 2009.




 
Neural Network Classification and its Applications in Insurance Industry
Neural Network Classification and its Applications in Insurance IndustryNeural Network Classification and its Applications in Insurance Industry
Neural Network Classification and its Applications in Insurance IndustryInderjeet Singh
 

Destacado (6)

Measuring Search Engine Quality using Spark and Python
Measuring Search Engine Quality using Spark and PythonMeasuring Search Engine Quality using Spark and Python
Measuring Search Engine Quality using Spark and Python
 
HPC with Clouds and Cloud Technologies
HPC with Clouds and Cloud TechnologiesHPC with Clouds and Cloud Technologies
HPC with Clouds and Cloud Technologies
 
Project
ProjectProject
Project
 
All Pair Shortest Path Algorithm – Parallel Implementation and Analysis
All Pair Shortest Path Algorithm – Parallel Implementation and AnalysisAll Pair Shortest Path Algorithm – Parallel Implementation and Analysis
All Pair Shortest Path Algorithm – Parallel Implementation and Analysis
 
Neural Network Classification and its Applications in Insurance Industry
Neural Network Classification and its Applications in Insurance IndustryNeural Network Classification and its Applications in Insurance Industry
Neural Network Classification and its Applications in Insurance Industry
 
Neural Network Classification and its Applications in Insurance Industry
Neural Network Classification and its Applications in Insurance IndustryNeural Network Classification and its Applications in Insurance Industry
Neural Network Classification and its Applications in Insurance Industry
 

Similar a Thesis Proposal Estimating Relevance from Search Click Logs

User search goal inference and feedback session using fast generalized – fuzz...
User search goal inference and feedback session using fast generalized – fuzz...User search goal inference and feedback session using fast generalized – fuzz...
User search goal inference and feedback session using fast generalized – fuzz...eSAT Publishing House
 
Efficient way of user search location in query processing
Efficient way of user search location in query processingEfficient way of user search location in query processing
Efficient way of user search location in query processingeSAT Publishing House
 
Vol 12 No 1 - April 2014
Vol 12 No 1 - April 2014Vol 12 No 1 - April 2014
Vol 12 No 1 - April 2014ijcsbi
 
Personalized web search using browsing history and domain knowledge
Personalized web search using browsing history and domain knowledgePersonalized web search using browsing history and domain knowledge
Personalized web search using browsing history and domain knowledgeRishikesh Pathak
 
IRJET- Analysis on Existing Methodologies of User Service Rating Prediction S...
IRJET- Analysis on Existing Methodologies of User Service Rating Prediction S...IRJET- Analysis on Existing Methodologies of User Service Rating Prediction S...
IRJET- Analysis on Existing Methodologies of User Service Rating Prediction S...IRJET Journal
 
Fuzzy Logic Based Recommender System
Fuzzy Logic Based Recommender SystemFuzzy Logic Based Recommender System
Fuzzy Logic Based Recommender SystemRSIS International
 
USER PROFILE BASED PERSONALIZED WEB SEARCH
USER PROFILE BASED PERSONALIZED WEB SEARCHUSER PROFILE BASED PERSONALIZED WEB SEARCH
USER PROFILE BASED PERSONALIZED WEB SEARCHijmpict
 
CONTENT AND USER CLICK BASED PAGE RANKING FOR IMPROVED WEB INFORMATION RETRIEVAL
CONTENT AND USER CLICK BASED PAGE RANKING FOR IMPROVED WEB INFORMATION RETRIEVALCONTENT AND USER CLICK BASED PAGE RANKING FOR IMPROVED WEB INFORMATION RETRIEVAL
CONTENT AND USER CLICK BASED PAGE RANKING FOR IMPROVED WEB INFORMATION RETRIEVALijcsa
 
A New Algorithm for Inferring User Search Goals with Feedback Sessions
A New Algorithm for Inferring User Search Goals with Feedback SessionsA New Algorithm for Inferring User Search Goals with Feedback Sessions
A New Algorithm for Inferring User Search Goals with Feedback SessionsIJERA Editor
 
TWO WAY CHAINED PACKETS MARKING TECHNIQUE FOR SECURE COMMUNICATION IN WIRELES...
TWO WAY CHAINED PACKETS MARKING TECHNIQUE FOR SECURE COMMUNICATION IN WIRELES...TWO WAY CHAINED PACKETS MARKING TECHNIQUE FOR SECURE COMMUNICATION IN WIRELES...
TWO WAY CHAINED PACKETS MARKING TECHNIQUE FOR SECURE COMMUNICATION IN WIRELES...pharmaindexing
 
Context Based Classification of Reviews Using Association Rule Mining, Fuzzy ...
Context Based Classification of Reviews Using Association Rule Mining, Fuzzy ...Context Based Classification of Reviews Using Association Rule Mining, Fuzzy ...
Context Based Classification of Reviews Using Association Rule Mining, Fuzzy ...journalBEEI
 
Amazon research memo
Amazon research memoAmazon research memo
Amazon research memoBrett Combs
 
Quest Trail: An Effective Approach for Construction of Personalized Search En...
Quest Trail: An Effective Approach for Construction of Personalized Search En...Quest Trail: An Effective Approach for Construction of Personalized Search En...
Quest Trail: An Effective Approach for Construction of Personalized Search En...Editor IJCATR
 
Ontological and clustering approach for content based recommendation systems
Ontological and clustering approach for content based recommendation systemsOntological and clustering approach for content based recommendation systems
Ontological and clustering approach for content based recommendation systemsvikramadityajakkula
 
Personalization of the Web Search
Personalization of the Web SearchPersonalization of the Web Search
Personalization of the Web SearchIJMER
 
APPLYING OPINION MINING TO ORGANIZE WEB OPINIONS
APPLYING OPINION MINING TO ORGANIZE WEB OPINIONSAPPLYING OPINION MINING TO ORGANIZE WEB OPINIONS
APPLYING OPINION MINING TO ORGANIZE WEB OPINIONSIJCSEA Journal
 
Co-Extracting Opinions from Online Reviews
Co-Extracting Opinions from Online ReviewsCo-Extracting Opinions from Online Reviews
Co-Extracting Opinions from Online ReviewsEditor IJCATR
 
Analysing the performance of Recommendation System using different similarity...
Analysing the performance of Recommendation System using different similarity...Analysing the performance of Recommendation System using different similarity...
Analysing the performance of Recommendation System using different similarity...IRJET Journal
 

Similar a Thesis Proposal Estimating Relevance from Search Click Logs (20)

User search goal inference and feedback session using fast generalized – fuzz...
User search goal inference and feedback session using fast generalized – fuzz...User search goal inference and feedback session using fast generalized – fuzz...
User search goal inference and feedback session using fast generalized – fuzz...
 
AN EFFECTIVE FRAMEWORK FOR GENERATING RECOMMENDATIONS
AN EFFECTIVE FRAMEWORK FOR GENERATING RECOMMENDATIONSAN EFFECTIVE FRAMEWORK FOR GENERATING RECOMMENDATIONS
AN EFFECTIVE FRAMEWORK FOR GENERATING RECOMMENDATIONS
 
Efficient way of user search location in query processing
Efficient way of user search location in query processingEfficient way of user search location in query processing
Efficient way of user search location in query processing
 
Vol 12 No 1 - April 2014
Vol 12 No 1 - April 2014Vol 12 No 1 - April 2014
Vol 12 No 1 - April 2014
 
Personalized web search using browsing history and domain knowledge
Personalized web search using browsing history and domain knowledgePersonalized web search using browsing history and domain knowledge
Personalized web search using browsing history and domain knowledge
 
IRJET- Analysis on Existing Methodologies of User Service Rating Prediction S...
IRJET- Analysis on Existing Methodologies of User Service Rating Prediction S...IRJET- Analysis on Existing Methodologies of User Service Rating Prediction S...
IRJET- Analysis on Existing Methodologies of User Service Rating Prediction S...
 
Fuzzy Logic Based Recommender System
Fuzzy Logic Based Recommender SystemFuzzy Logic Based Recommender System
Fuzzy Logic Based Recommender System
 
USER PROFILE BASED PERSONALIZED WEB SEARCH
USER PROFILE BASED PERSONALIZED WEB SEARCHUSER PROFILE BASED PERSONALIZED WEB SEARCH
USER PROFILE BASED PERSONALIZED WEB SEARCH
 
CONTENT AND USER CLICK BASED PAGE RANKING FOR IMPROVED WEB INFORMATION RETRIEVAL
CONTENT AND USER CLICK BASED PAGE RANKING FOR IMPROVED WEB INFORMATION RETRIEVALCONTENT AND USER CLICK BASED PAGE RANKING FOR IMPROVED WEB INFORMATION RETRIEVAL
CONTENT AND USER CLICK BASED PAGE RANKING FOR IMPROVED WEB INFORMATION RETRIEVAL
 
A New Algorithm for Inferring User Search Goals with Feedback Sessions
A New Algorithm for Inferring User Search Goals with Feedback SessionsA New Algorithm for Inferring User Search Goals with Feedback Sessions
A New Algorithm for Inferring User Search Goals with Feedback Sessions
 
TWO WAY CHAINED PACKETS MARKING TECHNIQUE FOR SECURE COMMUNICATION IN WIRELES...
TWO WAY CHAINED PACKETS MARKING TECHNIQUE FOR SECURE COMMUNICATION IN WIRELES...TWO WAY CHAINED PACKETS MARKING TECHNIQUE FOR SECURE COMMUNICATION IN WIRELES...
TWO WAY CHAINED PACKETS MARKING TECHNIQUE FOR SECURE COMMUNICATION IN WIRELES...
 
B1802021823
B1802021823B1802021823
B1802021823
 
Context Based Classification of Reviews Using Association Rule Mining, Fuzzy ...
Context Based Classification of Reviews Using Association Rule Mining, Fuzzy ...Context Based Classification of Reviews Using Association Rule Mining, Fuzzy ...
Context Based Classification of Reviews Using Association Rule Mining, Fuzzy ...
 
Amazon research memo
Amazon research memoAmazon research memo
Amazon research memo
 
Quest Trail: An Effective Approach for Construction of Personalized Search En...
Quest Trail: An Effective Approach for Construction of Personalized Search En...Quest Trail: An Effective Approach for Construction of Personalized Search En...
Quest Trail: An Effective Approach for Construction of Personalized Search En...
 
Ontological and clustering approach for content based recommendation systems
Ontological and clustering approach for content based recommendation systemsOntological and clustering approach for content based recommendation systems
Ontological and clustering approach for content based recommendation systems
 
Personalization of the Web Search
Personalization of the Web SearchPersonalization of the Web Search
Personalization of the Web Search
 
APPLYING OPINION MINING TO ORGANIZE WEB OPINIONS
APPLYING OPINION MINING TO ORGANIZE WEB OPINIONSAPPLYING OPINION MINING TO ORGANIZE WEB OPINIONS
APPLYING OPINION MINING TO ORGANIZE WEB OPINIONS
 
Co-Extracting Opinions from Online Reviews
Co-Extracting Opinions from Online ReviewsCo-Extracting Opinions from Online Reviews
Co-Extracting Opinions from Online Reviews
 
Analysing the performance of Recommendation System using different similarity...
Analysing the performance of Recommendation System using different similarity...Analysing the performance of Recommendation System using different similarity...
Analysing the performance of Recommendation System using different similarity...
 

query looks for specific information such as a single website, web page, or a single entity. An informational query looks for information about general or niche topics. Search engines rank search results in decreasing order of relevance to the query. Search engines assign scores to the documents for the user query using a ranking function, which is derived automatically by a ranking algorithm from training data.
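To make the role of the ranking function concrete, here is a minimal sketch of how a learned function could score and order documents for a query. The feature names, feature values, and weights are hypothetical; in practice the weights are learned by a ranking algorithm from labeled training data.

```python
# Minimal sketch of a ranking function: each query-document pair is
# represented by a feature vector, and documents are returned in
# decreasing order of score. All names and numbers are illustrative.

def score(features, weights):
    """Linear ranking function: weighted sum of query-document features."""
    return sum(weights[name] * value for name, value in features.items())

# Hypothetical feature vectors for two candidate documents.
candidates = {
    "doc_a": {"term_match": 0.9, "click_rate": 0.30},
    "doc_b": {"term_match": 0.6, "click_rate": 0.70},
}
weights = {"term_match": 1.0, "click_rate": 2.0}

# Rank documents in decreasing order of score, as a search engine would.
ranking = sorted(candidates, key=lambda d: score(candidates[d], weights),
                 reverse=True)
```

Adding a relevance estimate mined from click logs would simply mean one more entry in each feature vector, with its own learned weight.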
Training data is a collection of query-document pairs. Each query-document pair in the training data is represented by a set of properties of both query and document called features. Each query-document pair is labeled according to its relevance under categories such as perfect, excellent, good, fair, or bad. These relevance labels are assigned by humans to each query-document pair, indicating how well the document matches the query. These human judgments are known as editorial judgments. A good editorial judgment is important because the quality of a ranking function depends upon the quality of the training data used.

Ranking function efficiency depends upon two critical aspects: the construction of training data with good relevance labels and the selection of features in the feature set.

Usually, the user starts examining the result snippets (the combination of a URL and a small description of the document) for a query from top to bottom. The probability of a user examining a result snippet is called the examination probability. When examining, the user may find some snippets useful. This usefulness is the perceived relevance or attractiveness of the snippet. Eventually, the user clicks on a snippet and lands on a document. The information fulfillment that the user gets out of that document is the actual relevance of the document. The probability that the user will click on a result snippet is known as the click probability. The click probability is always less than or equal to the examination probability. The click probability for URLs on a results page decreases from top to bottom. This decrease is known as position bias.

Search engines maintain logs of every user interaction. The log entries have information about the queries, the results displayed for each query, the number of results displayed, which results were clicked, the user's IP address, and timestamps. Generally, these logs are on the order of terabytes in size. Mining click logs gives a better insight into how a user interacts with the search engine. For example, a click can be interpreted as a vote for that document for a particular query. The information about clicks and user interaction can thus be used to make user behavior models that capture user preferences. These models can be further used to estimate document relevance for better search results, as described in Section 2.1. A model in this context is a set of equations and rules describing user interactions or actions on the search page in terms of probability values.

Also, to date most of the relevance labels for training data are manually assigned by editors (humans), who can be biased at times and may not necessarily represent the aggregate behavior of search engine users. Manually drafting training data is also a time-consuming process. To overcome these problems, training data must be automatically labeled; see Section 2.4.

I intend to make a user behavior model that differs from existing models in considering trust bias and clicks on other parts of a search page. My model will closely capture user preferences and will make realistic and flexible assumptions about user behavior; for example, a user can do any of the following: click a single result or multiple results, go to a document and never come back to the results, or click advertisements on a search page. This model will then be used to estimate the actual document relevance for a query. The relevance estimates will then be added as a feature in the training data to compute a better ranking function.

To evaluate my model, I will first compare the relevance estimates of documents from my model with editorial judgments and then with the relevance estimates of earlier models; see Section 5. I will look for an improved ranking function after adding the relevance estimates from my model as a feature in the training data.

2 Related Work

Section 2.1 describes how to make user behavior models by interpreting click logs and then use the document relevance estimated from these models as a feature in the training data to get a new ranking function. My research will follow this work closely, and I will try to improve upon these models. Section 2.2 describes trust bias modeling in online communities. My user behavior model will consider trust bias toward certain URLs while interpreting document relevance from click logs. Section 2.3 describes modeling the relevance of advertisements for queries using click logs. In my model, the relevance of advertisements will be considered in the overall fulfillment of user information needs. Section 2.4 describes a method to automatically estimate relevance labels for query-document pairs in training data from click logs. I will use this method to automatically generate labels for the training data, which will also include a feature from my proposed user behavior model.

2.1 Estimating Document Relevance from User Behavior Models

Dupret and Liao [4] designed a user behavior model that estimates the actual relevance of clicked documents and not the perceived relevance. Their model focuses on the fulfillment that a user gets while browsing and clicking documents. Their main assumption is that a user stops searching when his/her information need is fulfilled. Dupret and Liao did not limit the number of clicks (single or multiple) or the number of query reformulations in their model assumptions, which makes their model quite realistic. My model will follow their solution methodology, adopting their assumptions in addition to my own of including the trust factor and other parts of the search page. I will consider two more ranking efficiency metrics in my evaluation over and above what they have used; see Section 5.

Craswell et al. [3] designed a model that explains position bias from click logs. They assumed that the user examines the result snippets sequentially from top to bottom and that the user's search ends as soon as he/she clicks a relevant document for the query. This assumption is known as the single-click assumption. Their work also assumes that the user does not skip a result snippet without examining it.
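Under the single-click, top-to-bottom assumptions of Craswell et al. described above (commonly called the cascade model), the click probability at each position follows directly from the perceived relevances of the snippets above it. A minimal sketch, with illustrative relevance values:

```python
# Sketch of click probabilities under the single-click cascade assumption:
# the user examines snippets top to bottom, clicks position i with
# probability r_i (its perceived relevance), and stops after the first
# click. Hence P(click at i) = r_i * prod_{j<i} (1 - r_j).

def cascade_click_probabilities(relevances):
    """Click probability per position under the single-click assumption."""
    probs, exam = [], 1.0  # exam = probability that position i is examined
    for r in relevances:
        probs.append(exam * r)
        exam *= (1.0 - r)  # examination continues only if there was no click
    return probs

# Even with identical perceived relevance at every position, click
# probability halves at each step here, which is exactly position bias.
probs = cascade_click_probabilities([0.5, 0.5, 0.5])
```

This also illustrates why the click probability can never exceed the examination probability: each click probability is the examination probability scaled down by the snippet's relevance.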
Chapelle and Zhang [2] developed a model that gives an unbiased estimation of the actual relevance of a webpage, i.e., the model removes any position bias. Chapelle and Zhang's work extends the work of Craswell et al. [3] with the assumption that the user will not stop searching until satisfied with the information. They overrule Craswell et al.'s single-click assumption, instead assuming multiple clicks and query reformulations. Their work, however, does not consider anything about the other parts of a search page, like sponsored results and related queries, which I am going to consider in my work.

Dupret and Piwowarski [5] developed a model that differs from the work of Craswell et al. [3] in the sense that the user can skip a document without examining it. Their focus is more on attractiveness and perceived relevance, and they model only single clicks. This model makes many assumptions, which limits it for estimating actual user behavior.

Guo et al. [7] proposed independent and dependent click models for modeling multiple clicks on a result page. The independent click model assumes that the click probability is independent for different positions of results and that the examination probability is unity for every result. This model is only successful in explaining that the user usually clicks on the first snippet on a result page. The dependent click model extends the idea of Craswell et al. [3] to multiple clicks. This model describes the interdependence between clicks and examination at different positions. The dependent click model is good at explaining the clicks on the first and the last snippets on the result page.
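The interdependence between clicks and examination in the dependent click model can be sketched as a simple recursion. This is my reading of the model: the first position is always examined, a click at position i is followed by further examination with some continuation probability, and a skip is followed by examination with certainty. The relevance and continuation values below are illustrative, not from the cited work.

```python
# Sketch of the examination chain in a dependent click model: position 1
# is always examined; after a click at position i the user continues with
# probability lam[i]; after a skip, the user continues with certainty.
# So e[0] = 1 and e[i+1] = e[i] * (r[i] * lam[i] + (1 - r[i])).

def dcm_examination_probabilities(r, lam):
    """Examination probability per position, given relevances r and
    post-click continuation probabilities lam (illustrative parameters)."""
    exams = [1.0]
    for r_i, lam_i in zip(r, lam):
        exams.append(exams[-1] * (r_i * lam_i + (1.0 - r_i)))
    return exams[:-1]  # one examination probability per position

exams = dcm_examination_probabilities(r=[0.5, 0.4, 0.3],
                                      lam=[0.6, 0.6, 0.6])
# Click probability at each position is examination times relevance, so
# clicks at different positions are interdependent, unlike the
# independent click model where every examination probability is 1.
clicks = [e * r_i for e, r_i in zip(exams, [0.5, 0.4, 0.3])]
```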
These two models can also be used with click log streams, i.e., a continuous flow of data. They do not, however, consider trust bias and other elements of a search page, which I will work on.

2.2 Trust Bias Modeling in Online Communities

Finin et al. [6] modeled trust bias, or influence, in online social communities. They discussed how a popular blog or website in an online community can influence the opinions of other blogs. Their model of trust bias in online communities can be applied to URLs in my user behavior model.

2.3 Advertisement Relevance Prediction from Search Click Logs

Raghavan and Hillard [8] proposed a model that improves the relevance of advertisements shown for a query in a search engine. Earlier models ranked advertisements for a query by the number of clicks each advertisement received, i.e., an advertisement got a better rank simply by attracting more clicks. Raghavan and Hillard's model instead interprets click logs to estimate the actual relevance of an advertisement to the query and ranks advertisements by that relevance. In my model, I will apply the actual relevance of advertisements from Raghavan and Hillard to the overall fulfillment of the user's information need for a query.
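How advertisement relevance should feed into overall fulfillment is a design question this proposal leaves open. Purely as a hypothetical sketch (the combination rule and the numbers are my assumptions, not part of Raghavan and Hillard's model), one simple candidate is to treat a relevant advertisement click as an independent, additional chance of satisfying the information need:

```python
# Hypothetical sketch only: a relevant advertisement click as an
# independent second chance of fulfilling the information need. The
# independence assumption and the combination rule are illustrative
# assumptions, not part of Raghavan and Hillard's model.

def fulfillment(f_doc, f_ad):
    """P(information need fulfilled) when an organic document and an
    advertisement contribute independently to fulfillment."""
    return 1.0 - (1.0 - f_doc) * (1.0 - f_ad)

combined = fulfillment(0.7, 0.2)
```

Under this rule an advertisement click can only raise the estimated fulfillment, never lower it, which matches the intuition that relevant advertisements are part of the search results rather than a distraction.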
2.4 Automatically Estimating Relevance Labels for Training Data

Agrawal et al. [1] proposed a method that automatically estimates relevance labels of query-document pairs from click logs. They transformed user clicks into weighted, directed graphs and formulated label generation as an ordered graph partitioning problem. In full generality, the problem of finding n labels is NP-hard, but Agrawal et al. showed that optimal labeling of a query-document pair can be done in linear time when only two labels (relevant or non-relevant) are used. They proposed heuristic solutions to automatically estimate efficient labels from click logs. Such automatically labeled training data can save humans from manually defining labels for query-document pairs.

3 Problem Description

Previous user behavior models for estimating the relevance of documents from search click logs have not considered trust bias. In addition, little consideration has been given to clicks on other parts of a search page, such as advertisements. Modeling both would allow click logs to be interpreted in terms of more flexible and realistic user behavior.
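To make the trust-bias gap concrete before turning to the solution methodology: if a click requires that a snippet be examined, found attractive, and trusted, in that order, a simple multiplicative factorization separates the proposed model from the baseline. The multiplicative form and the probability values here are illustrative assumptions; the exact equations are developed in Section 4.

```python
# Illustrative sketch of the proposed modification: a click requires that
# the snippet be examined, found attractive, and trusted, in sequence.
# The multiplicative form and the numbers are assumptions for illustration.

def click_prob_baseline(e, a):
    """Dupret-Liao style: P(click) = P(examined) * P(attractive | examined)."""
    return e * a

def click_prob_with_trust(e, a, t):
    """Proposed: the same chain with an extra trust-bias factor for the URL."""
    return e * a * t

baseline = click_prob_baseline(0.8, 0.5)
with_trust = click_prob_with_trust(0.8, 0.5, 0.9)
```

Since the trust factor is at most one, an attractive snippet from an untrusted URL receives a lower predicted click probability than the baseline assigns, which is the behavior the trust bias is meant to capture.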
4 Solution Methodology

My proposed model will be an extension of the work of Dupret and Liao [4]. In addition to their assumptions described in Section 2.1, I will include my own based on trust bias and clicks on other parts of the search page, especially advertisements.

I intend to make a model that estimates the actual relevance of documents with respect to specific queries. My model will be a set of equations and rules describing user interactions on the search page in terms of probability values. A user session is the set of actions that the user performs on a search page to satisfy his/her information needs, such as examining a result snippet, clicking on a search result or advertisement, coming back and clicking more results in decreasing ranking order, reformulating the query, or abandoning the search.

I will model trust bias in the form of probability equations, just like any other user interaction. The trust bias equations come into play after the user has already examined the result snippet and found it attractive; the user then clicks on the snippet based upon his/her trust in that URL. Thus, the click probability on a URL depends on its examination probability, its attractiveness, and then the trust bias, whereas in Dupret and Liao's model the click probability on a URL depends only on its examination probability and then its attractiveness. For instance, if the examination probability of a URL is e, its attractiveness probability is a, the trust bias is t, the click probability is c, the query is q, the document is d, the actual document relevance is r, and the fulfillment probability is f, then the click probability c on a particular d for a specific q will now depend upon the joint probability of e, a, and t occurring in sequence. Also, the actual document relevance r depends on the fulfillment probability f obtained from the clicked document d.

Clicks on other parts of a search page, such as advertisements, will likewise be modeled in the form of probability equations. Clicks on advertisements are a form of user interaction on the search page, and advertisements also contribute to the overall fulfillment of the user's information needs. I am still researching solution methodologies for modeling these clicks.

After building the model with the above-mentioned improvements, the estimated relevance of documents for a query from my model will be combined with the existing features of the training data to recompute a new ranking function. The ranking obtained by the new function will be measured by the discounted cumulative gain metric, which measures ranking effectiveness. If the metric improvement is significant compared with the results obtained by the ranking function used in an existing search engine for the same query, then the document relevance from my model can be used as a feature in training data.

My challenge will be to get search click log data from a commercial search engine. If I am unable to get such logs, I will try getting logs from a meta search engine like metacrawler, dogpile, or excite. If that is not possible, I will implement my own meta search engine and then collect logs. If all fails,
I will use previously released logs from a search engine.

5 Evaluation

I will use the discounted cumulative gain and the normalized discounted cumulative gain to measure ranking effectiveness. I will also use precision and recall as metrics in my evaluation. A correct result is a retrieved result that is relevant to the query; precision is the ratio of correct results to all retrieved results, and recall is the ratio of correct results to all relevant results for the query.

A raw search click log will be pre-processed to remove duplicate queries and noise. Noise here refers to the following queries: queries with fewer than 10 user sessions, queries with fewer than 10 results, and queries with no clicks on snippets in a user session. Queries with no clicks on snippets are removed because most of these queries are misspelled or ambiguous. Only the first result page will be considered because most clicks happen there. Position bias will be removed from the dataset using editorial judgments for the results of a query.

After pre-processing, the dataset will be labeled automatically using the methods described in Section 2.4. The dataset will then be split equally into training and test datasets. The training dataset will be used to train the ranking algorithm to generate a new ranking function, while the test dataset will be used to measure how effectively the function now ranks the results
for a query issued by the user.

I will do a comparative analysis of the estimated relevance of results from my model against the models of Dupret and Liao [4] and Guo et al. [7]. The analysis will also compare the estimated document relevance from these three models with the editorial judgments for the same query-document pairs, which will give an idea of how accurately each model estimates relevance. Results will be compared for both informational and navigational queries.

If the estimated document relevance from my model matches the editorial judgments to a considerable extent, then these relevance estimates will be used as a feature in the training data. After this step, the ranking algorithm will be trained on the above data to generate a new ranking function, which will be used to rank the test data. Rankings will also be generated for the same test data by the existing ranking function used by popular search engines. I will then calculate the discounted cumulative gain, normalized discounted cumulative gain, precision, and recall. If these metrics show considerable improvement, my model can be considered successful.

6 Timeline
Task                                                                     | Start Date             | End Date
Literature review                                                        | Sept 2010              | Ongoing
Designing the model                                                      | May 2011               | Sept 2011
Comparative analysis with existing models                                | Oct 2011               | Dec 2011
Inclusion of relevance feature from my model in the training data        | first week of Jan 2012 | last week of Jan 2012
Evaluation of the new ranking function for different metric improvements | Feb 2012               | last week of April 2012
Thesis write-up                                                          | May 2012               | mid July 2012

7 Summary

I want to make a model that can estimate a document's actual relevance from click logs, after modeling trust bias and clicks on the other parts of a search page. This model will follow some of the assumptions and the solution methodology of Dupret and Liao [4]. If successful, the model's relevance estimates can be used as a feature in training data to improve the ranking function of a search engine.

References

[1] Rakesh Agrawal, Alan Halverson, Krishnaram Kenthapadi, Nina Mishra, and Panayiotis Tsaparas. Generating labels from clicks. In Proceedings of the Second Web Search and Data Mining (WSDM) Conference, Barcelona, Spain, pages 172–181. ACM, 9–11 February 2009.

[2] Olivier Chapelle and Ya Zhang. A dynamic Bayesian network click model for web search ranking. In Proceedings of the 18th International Conference on World Wide Web (WWW), Madrid, Spain, pages 1–10. ACM, 20–24 April 2009.

[3] Nick Craswell, Onno Zoeter, Michael Taylor, and Bill Ramsey. An experimental comparison of click position-bias models. In Proceedings of the First Web Search and Data Mining (WSDM) Conference, Palo Alto, CA, USA, pages 87–94. ACM, 11–12 February 2008.

[4] Georges Dupret and Ciya Liao. A model to estimate intrinsic document relevance from the clickthrough logs of a web search engine. In Proceedings of the Third Web Search and Data Mining (WSDM) Conference, New York City, NY, USA, pages 181–190. ACM, 4–6 February 2010.

[5] Georges Dupret and Benjamin Piwowarski. A user browsing model to predict search engine click data from past observations. In Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Singapore, pages 331–338. ACM, 20–24 July 2008.

[6] Tim Finin, Anupam Joshi, Pranam Kolari, Akshay Java, Anubhav Kale, and Amit Karandikar. The information ecology of social media and online communities. AI Magazine, 29(3):77–92, 2008.

[7] Fan Guo, Chao Liu, and Yi-Min Wang. Efficient multiple-click models in web search. In Proceedings of the Second Web Search and Data Mining (WSDM) Conference, Barcelona, Spain, pages 124–131. ACM, 9–11 February 2009.

[8] Hema Raghavan and Dustin Hillard. A relevance model based filter for improving ad quality. In Proceedings of the 32nd International ACM SIGIR Conference on Research and Development in Information Retrieval, Boston, MA, USA, pages 762–763. ACM, 19–23 July 2009.