Thesis Proposal — Determining Relevance Rankings with Search Click Logs

                          Inderjeet Singh

          Supervisor: Dr. Carson Kai-Sang Leung

                            July 8, 2011


                               Abstract

    Search engines track users’ search activities in search click logs.
 These logs can be mined to gain better insight into user behavior and
 to build user behavior models. These models can then be used in a
 ranking algorithm to give better, more focused and desirable results
 to the user.
    There are two problems with the existing models. First, researchers
 have not considered trust bias while interpreting click logs. Trust bias,
 or trust factor, is the preference a user gives to certain URLs that
 he/she trusts. For example, users show preference for websites like
 wikipedia.com, Yahoo! Answers, stackoverflow.com and many others
 because they trust the documents from these URLs. The trusted
 websites can differ for people in different areas or niches. Thus,
 trust bias is an important parameter to consider when designing a
 user behavior model and using it in a ranking algorithm. Second,
 researchers have not considered user clicks on other parts of a search
 page, such as advertisements, while making their models. Interpreting
 these clicks is important because advertisements are also a part of
 search results and relevant advertisements help a user fulfill his
 information needs.
    I propose to extend the existing research to build a user behavior
 model from search click logs that overcomes the above two problems
 and then estimates the relevance of documents.



1     Introduction

Search engines are used to answer ad-hoc or specific queries. Queries can
be of two types: navigational and informational. A navigational query looks
for specific information such as a single website, web page or a single entity.
An informational query looks for information about general or niche topics.
    Search engines rank search results in decreasing order of relevance to the
query. Search engines assign scores to the documents for the user query using
a ranking function, which is derived automatically from a ranking algorithm
using training data.


Training data is a collection of query-document pairs. Each query-document
pair in the training data is represented by a set of properties of both query
and document called features. Each query-document pair is labeled accord-
ing to its relevance under categories such as perfect, excellent, good, fair or
bad. These relevance labels are assigned by humans to each query-document
pair indicating how well the document matches the query. These human
judgments are known as editorial judgments. A good editorial judgement
is important because the quality of a ranking function depends upon the
quality of the training data used.
   Ranking function efficiency depends upon two critical aspects: the con-
struction of training data with good relevance labels and the selection of
features in the feature set.
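As a concrete illustration, one training example could be represented as below; the feature names, values, and URL here are hypothetical, not taken from any particular system:

```python
# One (hypothetical) training example: a query-document pair represented
# by a feature vector plus a human-assigned relevance label.
example = {
    "query": "manitoba weather",
    "document": "http://example.com/weather/winnipeg",
    "features": {
        "text_match_score": 12.4,  # a query-document feature (illustrative)
        "url_depth": 2,            # a document-only feature
        "query_length": 2,         # a query-only feature
    },
    "label": "good",  # one of: perfect, excellent, good, fair, bad
}
```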
   Usually, the user starts examining the result snippets (the combination
of a URL and a small description of the document) for a query from top to
bottom. The probability of a user examining a result snippet is called the
examination probability. When examining, the user may find some snippets
useful. This usefulness is the perceived relevance or attractiveness of the
snippet. Eventually, the user clicks on a snippet and lands on a document.
The information fulfillment that the user gets out of that document is the
actual relevance of the document. The probability that the user will click on a
result snippet is known as the click probability. The click probability is always
less than or equal to the examination probability. The click probability for
URLs on a results page decreases from top to bottom. This decrease is known
as position bias.
   Search engines maintain logs of every user interaction. The log entries
have information about the queries, the results displayed for each query, the
number of results displayed, which results were clicked, user IP address and
timestamps. Generally, these logs are on the order of terabytes in size.
   Mining click logs gives a better insight into how a user interacts with
the search engine. For example, a click can be interpreted as a vote for
that document for a particular query. So, the information about clicks and
user interaction can be used to make user behavior models that capture user
preferences. These models can be further used to estimate the document
relevance for better search results as described in Section 2.1. The model in
this context would be a set of equations and rules describing user interactions
or actions on the search page in terms of probability values.
   Also, to date, most of the relevance labels for training data are manually
assigned by editors (humans), who can be biased at times and may not nec-
essarily represent the aggregate behavior of search engine users. Manually
drafting training data is also a time-consuming process. To overcome these
problems, training data must be labeled automatically; see Section 2.4.
   I intend to make a user behavior model that is different from existing
models in considering trust bias and clicks on other parts of a search page.
My model will closely capture user preferences and will make realistic and
flexible assumptions on user behavior — for example, a user can do any of
the following: click a single result or multiple results, go to a document and
never come back to the results, or click advertisements on a search page.
This model will then be used to estimate the actual document relevance for
a query. The relevance estimates will then be added as a feature in training
data to compute a better ranking function.
    To evaluate my model, I will first compare the relevance estimates of doc-
uments from my model with editorial judgments and then with the relevance
estimates of earlier models; see Section 5. I will look for an improved ranking
function after adding the relevance estimates from my model as a feature in
the training data.



2     Related Work

Section 2.1 describes how to make user behavior models by interpreting click
logs and then use the document relevance estimated from these models as
a feature in the training data to get a new ranking function. My research
will follow this work closely and I will try to improve upon these models.
Section 2.2 describes trust bias modeling in online communities. My user
behavior model will consider trust bias toward certain URLs while interpreting
document relevance from click logs. Section 2.3 describes modeling the rel-
evance of advertisements for queries using click logs. In my model, the rel-
evance of advertisements will be considered in the overall fulfillment of user
information needs. Section 2.4 describes a method to automatically estimate
relevance labels for query-document pairs in training data from click logs. I
will use this method to automatically generate labels for the training data,
which will also include a feature from my proposed user behavior model.


2.1    Estimating Document Relevance from User Behavior Models

Dupret and Liao [4] designed a user behavior model that estimates the ac-
tual relevance of clicked documents and not the perceived relevance. Their
model focuses on the fulfillment that a user gets while browsing and click-
ing documents. Their main assumption is that a user stops searching when
his information need is fulfilled. Dupret and Liao did not limit the number
of clicks (single or multiple) or the number of query reformulations in their
model assumptions, which makes their model quite realistic. My model will
match their solution methodology by adopting their assumptions in addition
to my own about the trust factor and other parts of the search page. I will
consider two more ranking efficiency metrics in my evaluation than they
have used; see Section 5.
   Craswell et al. [3] designed a model that explained position bias from click
logs. They assumed that the user examines the result snippets sequentially
from top to bottom and that the user’s search ends as soon as he/she clicks
a relevant document for the query. This assumption is known as the single-
click assumption. Their work also assumes that the user does not skip a
result snippet without examining it.
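The single-click assumption of Craswell et al. can be sketched as a cascade over positions: the probability of a click at a position is the probability of reaching that position times the probability of finding the document there relevant. A minimal sketch (the relevance values are illustrative, not from their data):

```python
def cascade_click_probs(relevance):
    """Probability that the (single) click lands at each position,
    under the cascade model: the user scans results top to bottom
    and clicks the first document found relevant."""
    probs = []
    examined = 1.0  # probability the user reaches this position
    for r in relevance:
        probs.append(examined * r)  # reached it and found it relevant
        examined *= (1.0 - r)       # continue only past a non-relevant result
    return probs

# The same relevance at position 1 vs. position 3 yields a lower click
# probability lower down -- this is exactly position bias.
print(cascade_click_probs([0.5, 0.3, 0.5]))  # [0.5, 0.15, 0.175]
```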



Chapelle and Zhang [2] developed a model that gives an unbiased esti-
mation of the actual relevance of a webpage, i.e., the model removes any
position bias. Chapelle and Zhang’s work extends the work of Craswell et
al. [3] with the assumption that the user will not stop searching until satisfied
with the information. They overrule Craswell et al.’s single-click assumption,
instead, assuming multiple clicks and query reformulations. Their work, how-
ever, does not consider anything about the other parts of a search page, like
sponsored results and related queries, which I am going to consider in my
work.
   Dupret and Piwowarski [5] developed a model that differs from the work
of Craswell et al. [3] in the sense that the user can skip a document without
examining it. Their focus is more on attractiveness and perceived relevance
and they model only single clicks. This model makes many assumptions,
which limits its ability to estimate actual user behavior.
   Guo et al. [7] proposed independent and dependent click models for mod-
eling multiple clicks on a result page. The independent click model assumes
that the click probability is independent for different positions of results and
the examination probability is unity for every result. This model is only
successful in explaining that the user usually clicks on the first snippet on a
result page. The dependent click model extends the idea of Craswell et al. [3]
for multiple clicks. This model describes the interdependence between clicks
and examination at different positions. The dependent click model is good
at explaining the clicks on the first and the last snippet on the result page.
These two models can also be used with click log streams, i.e., a continuous
flow of data. They do not, however, consider trust bias and other elements
of a search page, which I will work on.


2.2    Trust Bias Modeling in Online Communities

Finin et al. [6] modeled trust bias or influence in online social communi-
ties. They discussed how a popular blog or website in an online community
can influence opinions of other blogs. Their model of trust bias in online
communities can be applied to URLs in my user behavior model.


2.3    Advertisement Relevance Prediction from Search Click Logs

Raghavan and Hillard [8] proposed a model that improves the relevance of
advertisements for a query in a search engine. Earlier models ranked ad-
vertisements for a query by the number of clicks an advertisement received,
i.e., more clicks meant a better rank. Raghavan and Hillard’s model instead
interprets click logs to estimate the actual relevance of an advertisement to
the query and ranks advertisements by that estimate, not by click counts.
In my model, I will use the actual relevance of advertisements from Raghavan
and Hillard in the overall fulfillment of the user’s information need for a
query.




2.4    Automatically Estimating Relevance Labels for Training Data

Agrawal et al. [1] proposed a method that can be used to automatically
estimate relevance labels of query-document pairs from click logs. They
transformed user clicks into weighted, directed graphs and formulated the
label generation problem as an ordered graph partitioning problem. In full
generality, the problem of finding n labels is NP-hard. Agrawal et al. showed
that optimal labeling of a query-document pair can be done in linear time
by using only two labels (relevant or non-relevant). They have proposed
heuristic solutions to automatically estimate efficient labels from click logs.
This automatically labeled training data can save humans from manually
defining labels for query-document pairs.



3     Problem Description

Previous user behavior models for estimating the relevance of documents
from search click logs have not considered trust bias. Also, while building
these models, little consideration has been given to user actions such as
clicks on other parts of a search page, like advertisements. Accounting for
these factors allows a model to interpret flexible and realistic user behavior
from click logs.




4     Solution Methodology

My proposed model will be an extension of the work of Dupret and Liao [4].
In addition to their assumptions described in Section 2.1, I will include my
own based on the trust bias and clicks on other parts of the search page,
especially advertisements.
    I intend to make a model that will estimate the actual relevance of doc-
uments with respect to specific queries. My model will be a set of equations
and rules describing user interactions or actions on the search page in terms
of probability values. A user session is a set of actions that the user per-
forms on a search page to satisfy his information needs, such as examining
a result snippet, clicking on a search result or advertisement, coming back
and clicking more results in decreasing rank order, reformulating the query,
or abandoning the search.
    I will model trust bias in the form of probability equations, just like any
other user interaction. The trust bias equations come into play in my model
after the user has already examined the result snippet and found it attractive.
The user then clicks on the snippet based upon his trust in that URL. Thus,
the click probability on a URL depends on its examination probability, its
attractiveness and then the trust bias, whereas in Dupret and Liao’s model
the click probability on a URL depends only on its examination probability
and then its attractiveness. For instance, if the examination probability of
a URL is e, its attractiveness probability is a, the trust bias is t, the click
probability is c, the query is q, the document is d, the actual document
relevance is r and the fulfillment probability is f , then the click probability c
on a particular d for a specific q will depend upon the joint probability of e,
a, and t happening in sequence. Also, the actual document relevance r
depends on the fulfillment probability f from the clicked document d.
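Under the additional assumption (mine, for illustration; the proposal does not fix the exact functional form) that examination, attraction and trust act as independent stages occurring in sequence, the chained click probability is simply their product:

```python
def click_probability(e, a, t):
    """Click probability as a chain of events: the user examines the
    snippet (e), finds it attractive (a), and trusts the URL (t).
    Assumes the three stages are conditionally independent."""
    return e * a * t

# With full trust (t = 1) this reduces to the examination/attractiveness
# chain of Dupret and Liao's model; a distrusted URL (t < 1) lowers the
# click probability even for an attractive, examined snippet.
print(click_probability(0.5, 0.5, 1.0))  # 0.25
print(click_probability(0.5, 0.5, 0.4))  # 0.1
```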
    Modeling clicks on other parts of a search page, such as advertisements,
will also be done in the form of probability equations. Clicks on advertise-
ments are a form of user interaction on the search page, and advertisements
also become part of the overall fulfillment of the user’s information needs. I
am still researching solution methodologies for modeling these clicks.
    After building the model with the above-mentioned improvements, the
estimated relevance of documents for a query from my model will be com-
bined with the existing features of the training data to recompute a new
ranking function. The ranking obtained by the new function will be mea-
sured by the discounted cumulative gain (DCG) metric, a measure of ranking
effectiveness. If the metric improvement is significant when compared with
the results obtained by the ranking function used in an existing search engine
for the same query, then the document relevance from my model can be used
as a feature in training data.
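Discounted cumulative gain has several standard variants; a common graded-relevance formulation, together with its normalized form, is sketched below (the proposal does not fix a particular variant):

```python
import math

def dcg(relevances):
    """Discounted cumulative gain for a ranked list of graded relevance
    labels, using the common (2^rel - 1) / log2(i + 1) formulation,
    where i is the 1-based rank."""
    return sum((2 ** rel - 1) / math.log2(i + 2)
               for i, rel in enumerate(relevances))

def ndcg(relevances):
    """DCG normalized by the DCG of the ideal (sorted) ranking."""
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

# A ranking that places a 'fair' (1) document above a 'perfect' (3) one
# is penalized relative to the ideal ordering.
print(ndcg([3, 2, 1, 0]))  # 1.0 (already ideal)
print(ndcg([1, 2, 3, 0]))  # ~0.68
```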
    My challenge will be to get search click log data from a commercial search
engine. If I am unable to get such logs, I will try getting logs from a meta
search engine like metacrawler, dogpile or excite. If that is not possible, I
will implement my own meta search engine and then collect logs. If all else
fails, I will use previously released logs from a search engine.



5     Evaluation

I will use discounted cumulative gain and normalized discounted cumulative
gain to measure ranking effectiveness. I will also use precision and recall
as metrics in my evaluation. A correct result is a result retrieved by a search
engine that is relevant to the query. Precision is the ratio of correct results
to all retrieved results, and recall is the ratio of correct results to all relevant
results for the query.
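These two ratios can be computed directly from a retrieved list and a judged-relevant set; the document IDs below are hypothetical:

```python
def precision_recall(retrieved, relevant):
    """Precision and recall for one query.
    retrieved: ranked result IDs returned by the engine.
    relevant:  the set of IDs judged relevant for the query."""
    correct = len(set(retrieved) & set(relevant))
    precision = correct / len(retrieved) if retrieved else 0.0
    recall = correct / len(relevant) if relevant else 0.0
    return precision, recall

# 3 of the 4 retrieved results are correct; 3 of the 6 relevant
# documents were found.
p, r = precision_recall(["d1", "d2", "d3", "d9"],
                        {"d1", "d2", "d3", "d4", "d5", "d6"})
print(p, r)  # 0.75 0.5
```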
    A raw search click log will be pre-processed to remove duplicate queries
and noise. Noise here refers to the following queries: queries with fewer than
10 user sessions, queries with fewer than 10 results and queries with no clicks
on snippets in a user session. Queries with no clicks on snippets are removed
because most of these queries are misspelled or ambiguous. Only the first
result page will be considered because most clicks happen there. Position
bias will be removed from the dataset using editorial judgments for the
results of a query.
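One reading of these filtering rules is sketched below; the session record layout (a dict with `results` and `clicks` keys) is hypothetical, since the actual log schema is not specified here:

```python
def remove_noise(sessions_by_query, min_sessions=10, min_results=10):
    """Drop noisy queries from a click log grouped by query.
    sessions_by_query maps a query string to a list of sessions; each
    (hypothetical) session is a dict with 'results' and 'clicks' lists.
    Sessions with no clicked snippets are discarded, and a query is kept
    only if enough clean sessions with enough results remain."""
    clean = {}
    for query, sessions in sessions_by_query.items():
        kept = [s for s in sessions
                if s["clicks"] and len(s["results"]) >= min_results]
        if len(kept) >= min_sessions:
            clean[query] = kept
    return clean
```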
    After pre-processing, the dataset will be labeled automatically using the
methods described in Section 2.4. The dataset will then be split equally
into training and test datasets. The training dataset will be used to train
the ranking algorithm to generate a new ranking function, while the test
dataset will be used to measure how effectively the function ranks the results
for a user’s query.
    I will do a comparative analysis of the estimated relevance of results from
my model with the models of Dupret and Liao [4] and Guo et al. [7]. The
analysis will also compare the estimated document relevance from these three
models with the editorial judgments for the same query-document pairs. The
comparison will give an idea of how accurately these models are estimating
relevance. Results will be compared for both informational and navigational
queries.
    If the estimated document relevance from my model matches the edito-
rial judgments to a considerable extent, then these relevance estimates will
be used as a feature in the training data. After this step, the ranking algo-
rithm will be trained on that data to generate a new ranking function, which
will be used to rank the test data. Rankings will also be generated for the
same test data by the existing ranking function used by popular search en-
gines. I will then calculate the discounted cumulative gain, normalized dis-
counted cumulative gain, precision, and recall. If these metrics show consid-
erable improvement, my model can be considered successful.



6     Timeline




Task                                        Start Date               End Date

Literature review                           Sept 2010                Ongoing

Designing the model                         May 2011                 Sept 2011

Comparative analysis with existing          Oct 2011                 Dec 2011
models

Inclusion of relevance feature from my      first week of Jan 2012   last week of Jan 2012
model in the training data

Evaluation of the new ranking function      Feb 2012                 last week of April 2012
for different metric improvements

Thesis write-up                             May 2012                 mid-July 2012

7    Summary

I want to make a model that can estimate a document’s actual relevance
from click logs, after modeling trust bias and clicks on the other parts of a
search page. This model will follow some of the assumptions and solution
methodology of Dupret and Liao [4]. If successful, this model can be used as
a feature in training data to improve the ranking function of a search engine.



References

[1] Rakesh Agrawal, Alan Halverson, Krishnaram Kenthapadi, Nina Mishra,
    and Panayiotis Tsaparas. Generating labels from clicks. In Proceedings
    of the Second Web Search and Data Mining (WSDM) Conference,
    Barcelona, Spain, pages 172–181. ACM, 9–11 February 2009.

[2] Olivier Chapelle and Ye Zhang. A dynamic Bayesian network click model
   for web search and ranking. In Proceedings of the 18th International
   Conference on World Wide Web (WWW), Madrid, Spain, pages 1–10.
   ACM, 20–24 April 2009.

[3] Nick Craswell, Onno Zoeter, Michael Taylor, and Bill Ramsey. An exper-
   imental comparison of click position-bias models. In Proceedings of First
   Web Search and Data Mining (WSDM) Conference, Palo Alto, CA, USA,
   pages 87–94. ACM, 11–12 February 2008.

[4] Georges Dupret and Ciya Liao. A model to estimate intrinsic document
   relevance from the clickthrough logs of a web search engine. In Proceedings
   of Third Web Search and Data Mining (WSDM) Conference, New York
   City, NY, USA, pages 181–190. ACM, 4–6 February 2010.

[5] Georges Dupret and Benjamin Piwowarski. A user browsing model to
   predict search engine click data from past observations. In Proceedings of
   the 31st Annual International ACM SIGIR Conference on Research and
   Development in Information Retrieval, Singapore, pages 331–338. ACM,
   20–24 July 2008.




[6] Tim Finin, Anupam Joshi, Pranam Kolari, Akshay Java, Anubhav Kale,
   and Amit Karandikar. The information ecology of social media and online
   communities. AI Magazine, 29(3):77–92, 2008.

[7] Fan Guo, Chao Liu, and Yi Min Wang. Efficient multiple-click models
   in web search. In Proceedings of Second Web Search and Data Min-
   ing (WSDM) Conference, Barcelona, Spain, pages 124–131. ACM, 9–11
   February 2009.

[8] Hema Raghavan and Dustin Hillard. A relevance model based filter for
   improving ad quality. In Proceedings of the 32nd International ACM SI-
   GIR Conference on Research and Development in Information Retrieval,
   Boston, MA, USA, pages 762–763. ACM, 19–23 July 2009.




 
Neural Network Classification and its Applications in Insurance Industry
Neural Network Classification and its Applications in Insurance IndustryNeural Network Classification and its Applications in Insurance Industry
Neural Network Classification and its Applications in Insurance IndustryInderjeet Singh
 

Destacado (6)

Measuring Search Engine Quality using Spark and Python
Measuring Search Engine Quality using Spark and PythonMeasuring Search Engine Quality using Spark and Python
Measuring Search Engine Quality using Spark and Python
 
HPC with Clouds and Cloud Technologies
HPC with Clouds and Cloud TechnologiesHPC with Clouds and Cloud Technologies
HPC with Clouds and Cloud Technologies
 
Project
ProjectProject
Project
 
All Pair Shortest Path Algorithm – Parallel Implementation and Analysis
All Pair Shortest Path Algorithm – Parallel Implementation and AnalysisAll Pair Shortest Path Algorithm – Parallel Implementation and Analysis
All Pair Shortest Path Algorithm – Parallel Implementation and Analysis
 
Neural Network Classification and its Applications in Insurance Industry
Neural Network Classification and its Applications in Insurance IndustryNeural Network Classification and its Applications in Insurance Industry
Neural Network Classification and its Applications in Insurance Industry
 
Neural Network Classification and its Applications in Insurance Industry
Neural Network Classification and its Applications in Insurance IndustryNeural Network Classification and its Applications in Insurance Industry
Neural Network Classification and its Applications in Insurance Industry
 

Similar a Thesis Proposal Estimating Relevance from Search Click Logs

User search goal inference and feedback session using fast generalized – fuzz...
User search goal inference and feedback session using fast generalized – fuzz...User search goal inference and feedback session using fast generalized – fuzz...
User search goal inference and feedback session using fast generalized – fuzz...eSAT Publishing House
 
Efficient way of user search location in query processing
Efficient way of user search location in query processingEfficient way of user search location in query processing
Efficient way of user search location in query processingeSAT Publishing House
 
Vol 12 No 1 - April 2014
Vol 12 No 1 - April 2014Vol 12 No 1 - April 2014
Vol 12 No 1 - April 2014ijcsbi
 
Personalized web search using browsing history and domain knowledge
Personalized web search using browsing history and domain knowledgePersonalized web search using browsing history and domain knowledge
Personalized web search using browsing history and domain knowledgeRishikesh Pathak
 
IRJET- Analysis on Existing Methodologies of User Service Rating Prediction S...
IRJET- Analysis on Existing Methodologies of User Service Rating Prediction S...IRJET- Analysis on Existing Methodologies of User Service Rating Prediction S...
IRJET- Analysis on Existing Methodologies of User Service Rating Prediction S...IRJET Journal
 
Fuzzy Logic Based Recommender System
Fuzzy Logic Based Recommender SystemFuzzy Logic Based Recommender System
Fuzzy Logic Based Recommender SystemRSIS International
 
USER PROFILE BASED PERSONALIZED WEB SEARCH
USER PROFILE BASED PERSONALIZED WEB SEARCHUSER PROFILE BASED PERSONALIZED WEB SEARCH
USER PROFILE BASED PERSONALIZED WEB SEARCHijmpict
 
CONTENT AND USER CLICK BASED PAGE RANKING FOR IMPROVED WEB INFORMATION RETRIEVAL
CONTENT AND USER CLICK BASED PAGE RANKING FOR IMPROVED WEB INFORMATION RETRIEVALCONTENT AND USER CLICK BASED PAGE RANKING FOR IMPROVED WEB INFORMATION RETRIEVAL
CONTENT AND USER CLICK BASED PAGE RANKING FOR IMPROVED WEB INFORMATION RETRIEVALijcsa
 
A New Algorithm for Inferring User Search Goals with Feedback Sessions
A New Algorithm for Inferring User Search Goals with Feedback SessionsA New Algorithm for Inferring User Search Goals with Feedback Sessions
A New Algorithm for Inferring User Search Goals with Feedback SessionsIJERA Editor
 
TWO WAY CHAINED PACKETS MARKING TECHNIQUE FOR SECURE COMMUNICATION IN WIRELES...
TWO WAY CHAINED PACKETS MARKING TECHNIQUE FOR SECURE COMMUNICATION IN WIRELES...TWO WAY CHAINED PACKETS MARKING TECHNIQUE FOR SECURE COMMUNICATION IN WIRELES...
TWO WAY CHAINED PACKETS MARKING TECHNIQUE FOR SECURE COMMUNICATION IN WIRELES...pharmaindexing
 
Context Based Classification of Reviews Using Association Rule Mining, Fuzzy ...
Context Based Classification of Reviews Using Association Rule Mining, Fuzzy ...Context Based Classification of Reviews Using Association Rule Mining, Fuzzy ...
Context Based Classification of Reviews Using Association Rule Mining, Fuzzy ...journalBEEI
 
Amazon research memo
Amazon research memoAmazon research memo
Amazon research memoBrett Combs
 
Quest Trail: An Effective Approach for Construction of Personalized Search En...
Quest Trail: An Effective Approach for Construction of Personalized Search En...Quest Trail: An Effective Approach for Construction of Personalized Search En...
Quest Trail: An Effective Approach for Construction of Personalized Search En...Editor IJCATR
 
Ontological and clustering approach for content based recommendation systems
Ontological and clustering approach for content based recommendation systemsOntological and clustering approach for content based recommendation systems
Ontological and clustering approach for content based recommendation systemsvikramadityajakkula
 
Personalization of the Web Search
Personalization of the Web SearchPersonalization of the Web Search
Personalization of the Web SearchIJMER
 
APPLYING OPINION MINING TO ORGANIZE WEB OPINIONS
APPLYING OPINION MINING TO ORGANIZE WEB OPINIONSAPPLYING OPINION MINING TO ORGANIZE WEB OPINIONS
APPLYING OPINION MINING TO ORGANIZE WEB OPINIONSIJCSEA Journal
 
Co-Extracting Opinions from Online Reviews
Co-Extracting Opinions from Online ReviewsCo-Extracting Opinions from Online Reviews
Co-Extracting Opinions from Online ReviewsEditor IJCATR
 
Analysing the performance of Recommendation System using different similarity...
Analysing the performance of Recommendation System using different similarity...Analysing the performance of Recommendation System using different similarity...
Analysing the performance of Recommendation System using different similarity...IRJET Journal
 

Similar a Thesis Proposal Estimating Relevance from Search Click Logs (20)

User search goal inference and feedback session using fast generalized – fuzz...
User search goal inference and feedback session using fast generalized – fuzz...User search goal inference and feedback session using fast generalized – fuzz...
User search goal inference and feedback session using fast generalized – fuzz...
 
AN EFFECTIVE FRAMEWORK FOR GENERATING RECOMMENDATIONS
AN EFFECTIVE FRAMEWORK FOR GENERATING RECOMMENDATIONSAN EFFECTIVE FRAMEWORK FOR GENERATING RECOMMENDATIONS
AN EFFECTIVE FRAMEWORK FOR GENERATING RECOMMENDATIONS
 
Efficient way of user search location in query processing
Efficient way of user search location in query processingEfficient way of user search location in query processing
Efficient way of user search location in query processing
 
Vol 12 No 1 - April 2014
Vol 12 No 1 - April 2014Vol 12 No 1 - April 2014
Vol 12 No 1 - April 2014
 
Personalized web search using browsing history and domain knowledge
Personalized web search using browsing history and domain knowledgePersonalized web search using browsing history and domain knowledge
Personalized web search using browsing history and domain knowledge
 
IRJET- Analysis on Existing Methodologies of User Service Rating Prediction S...
IRJET- Analysis on Existing Methodologies of User Service Rating Prediction S...IRJET- Analysis on Existing Methodologies of User Service Rating Prediction S...
IRJET- Analysis on Existing Methodologies of User Service Rating Prediction S...
 
Fuzzy Logic Based Recommender System
Fuzzy Logic Based Recommender SystemFuzzy Logic Based Recommender System
Fuzzy Logic Based Recommender System
 
USER PROFILE BASED PERSONALIZED WEB SEARCH
USER PROFILE BASED PERSONALIZED WEB SEARCHUSER PROFILE BASED PERSONALIZED WEB SEARCH
USER PROFILE BASED PERSONALIZED WEB SEARCH
 
CONTENT AND USER CLICK BASED PAGE RANKING FOR IMPROVED WEB INFORMATION RETRIEVAL
CONTENT AND USER CLICK BASED PAGE RANKING FOR IMPROVED WEB INFORMATION RETRIEVALCONTENT AND USER CLICK BASED PAGE RANKING FOR IMPROVED WEB INFORMATION RETRIEVAL
CONTENT AND USER CLICK BASED PAGE RANKING FOR IMPROVED WEB INFORMATION RETRIEVAL
 
A New Algorithm for Inferring User Search Goals with Feedback Sessions
A New Algorithm for Inferring User Search Goals with Feedback SessionsA New Algorithm for Inferring User Search Goals with Feedback Sessions
A New Algorithm for Inferring User Search Goals with Feedback Sessions
 
TWO WAY CHAINED PACKETS MARKING TECHNIQUE FOR SECURE COMMUNICATION IN WIRELES...
TWO WAY CHAINED PACKETS MARKING TECHNIQUE FOR SECURE COMMUNICATION IN WIRELES...TWO WAY CHAINED PACKETS MARKING TECHNIQUE FOR SECURE COMMUNICATION IN WIRELES...
TWO WAY CHAINED PACKETS MARKING TECHNIQUE FOR SECURE COMMUNICATION IN WIRELES...
 
B1802021823
B1802021823B1802021823
B1802021823
 
Context Based Classification of Reviews Using Association Rule Mining, Fuzzy ...
Context Based Classification of Reviews Using Association Rule Mining, Fuzzy ...Context Based Classification of Reviews Using Association Rule Mining, Fuzzy ...
Context Based Classification of Reviews Using Association Rule Mining, Fuzzy ...
 
Amazon research memo
Amazon research memoAmazon research memo
Amazon research memo
 
Quest Trail: An Effective Approach for Construction of Personalized Search En...
Quest Trail: An Effective Approach for Construction of Personalized Search En...Quest Trail: An Effective Approach for Construction of Personalized Search En...
Quest Trail: An Effective Approach for Construction of Personalized Search En...
 
Ontological and clustering approach for content based recommendation systems
Ontological and clustering approach for content based recommendation systemsOntological and clustering approach for content based recommendation systems
Ontological and clustering approach for content based recommendation systems
 
Personalization of the Web Search
Personalization of the Web SearchPersonalization of the Web Search
Personalization of the Web Search
 
APPLYING OPINION MINING TO ORGANIZE WEB OPINIONS
APPLYING OPINION MINING TO ORGANIZE WEB OPINIONSAPPLYING OPINION MINING TO ORGANIZE WEB OPINIONS
APPLYING OPINION MINING TO ORGANIZE WEB OPINIONS
 
Co-Extracting Opinions from Online Reviews
Co-Extracting Opinions from Online ReviewsCo-Extracting Opinions from Online Reviews
Co-Extracting Opinions from Online Reviews
 
Analysing the performance of Recommendation System using different similarity...
Analysing the performance of Recommendation System using different similarity...Analysing the performance of Recommendation System using different similarity...
Analysing the performance of Recommendation System using different similarity...
 

query looks for specific information such as a single website, web page, or a single entity. An informational query looks for information about general or niche topics. Search engines rank search results in decreasing order of relevance to the query. Search engines assign scores to the documents for the user query using a ranking function, which is derived automatically by a ranking algorithm from training data.
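To make the role of the ranking function concrete, here is a minimal sketch of how a learned function could score and order documents for a query. The feature names, feature values, and weights are hypothetical; in practice the weights are learned by a ranking algorithm from labeled training data.

```python
# Minimal sketch of a ranking function: each query-document pair is
# represented by a feature vector, and documents are returned in
# decreasing order of score. All names and numbers are illustrative.

def score(features, weights):
    """Linear ranking function: weighted sum of query-document features."""
    return sum(weights[name] * value for name, value in features.items())

# Hypothetical feature vectors for two candidate documents.
candidates = {
    "doc_a": {"term_match": 0.9, "click_rate": 0.30},
    "doc_b": {"term_match": 0.6, "click_rate": 0.70},
}
weights = {"term_match": 1.0, "click_rate": 2.0}

# Rank documents in decreasing order of score, as a search engine would.
ranking = sorted(candidates, key=lambda d: score(candidates[d], weights),
                 reverse=True)
```

Adding a relevance estimate mined from click logs would simply mean one more entry in each feature vector, with its own learned weight.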
Training data is a collection of query-document pairs. Each query-document pair in the training data is represented by a set of properties of both query and document called features. Each query-document pair is labeled according to its relevance under categories such as perfect, excellent, good, fair, or bad. These relevance labels are assigned by humans to each query-document pair, indicating how well the document matches the query. These human judgments are known as editorial judgments. A good editorial judgment is important because the quality of a ranking function depends upon the quality of the training data used.

Ranking function efficiency depends upon two critical aspects: the construction of training data with good relevance labels and the selection of features in the feature set.

Usually, the user starts examining the result snippets (the combination of a URL and a small description of the document) for a query from top to bottom. The probability of a user examining a result snippet is called the examination probability. When examining, the user may find some snippets useful. This usefulness is the perceived relevance or attractiveness of the snippet. Eventually, the user clicks on a snippet and lands on a document. The information fulfillment that the user gets out of that document is the actual relevance of the document. The probability that the user will click on a result snippet is known as the click probability. The click probability is always less than or equal to the examination probability. The click probability for URLs on a results page decreases from top to bottom. This decrease is known as position bias.

Search engines maintain logs of every user interaction. The log entries have information about the queries, the results displayed for each query, the number of results displayed, which results were clicked, the user's IP address, and timestamps. Generally, these logs are on the order of terabytes in size. Mining click logs gives a better insight into how a user interacts with the search engine. For example, a click can be interpreted as a vote for that document for a particular query. The information about clicks and user interaction can thus be used to make user behavior models that capture user preferences. These models can be further used to estimate document relevance for better search results, as described in Section 2.1. A model in this context is a set of equations and rules describing user interactions or actions on the search page in terms of probability values.

Also, to date most of the relevance labels for training data are manually assigned by editors (humans), who can be biased at times and may not necessarily represent the aggregate behavior of search engine users. Manually drafting training data is also a time-consuming process. To overcome these problems, training data must be automatically labeled; see Section 2.4.

I intend to make a user behavior model that differs from existing models in considering trust bias and clicks on other parts of a search page. My model will closely capture user preferences and will make realistic and flexible assumptions about user behavior; for example, a user can do any of the following: click a single result or multiple results, go to a document and never come back to the results, or click advertisements on a search page. This model will then be used to estimate the actual document relevance for a query. The relevance estimates will then be added as a feature in the training data to compute a better ranking function.

To evaluate my model, I will first compare the relevance estimates of documents from my model with editorial judgments and then with the relevance estimates of earlier models; see Section 5. I will look for an improved ranking function after adding the relevance estimates from my model as a feature in the training data.

2 Related Work

Section 2.1 describes how to make user behavior models by interpreting click logs and then use the document relevance estimated from these models as a feature in the training data to get a new ranking function. My research will follow this work closely, and I will try to improve upon these models. Section 2.2 describes trust bias modeling in online communities. My user behavior model will consider trust bias toward certain URLs while interpreting document relevance from click logs. Section 2.3 describes modeling the relevance of advertisements for queries using click logs. In my model, the relevance of advertisements will be considered in the overall fulfillment of user information needs. Section 2.4 describes a method to automatically estimate relevance labels for query-document pairs in training data from click logs. I will use this method to automatically generate labels for the training data, which will also include a feature from my proposed user behavior model.

2.1 Estimating Document Relevance from User Behavior Models

Dupret and Liao [4] designed a user behavior model that estimates the actual relevance of clicked documents and not the perceived relevance. Their model focuses on the fulfillment that a user gets while browsing and clicking documents. Their main assumption is that a user stops searching when his/her information need is fulfilled. Dupret and Liao did not limit the number of clicks (single or multiple) or the number of query reformulations in their model assumptions, which makes their model quite realistic. My model will follow their solution methodology, adopting their assumptions in addition to my own of including the trust factor and other parts of the search page. I will consider two more ranking efficiency metrics in my evaluation over and above what they have used; see Section 5.

Craswell et al. [3] designed a model that explains position bias from click logs. They assumed that the user examines the result snippets sequentially from top to bottom and that the user's search ends as soon as he/she clicks a relevant document for the query. This assumption is known as the single-click assumption. Their work also assumes that the user does not skip a result snippet without examining it.
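Under the single-click, top-to-bottom assumptions of Craswell et al. described above (commonly called the cascade model), the click probability at each position follows directly from the perceived relevances of the snippets above it. A minimal sketch, with illustrative relevance values:

```python
# Sketch of click probabilities under the single-click cascade assumption:
# the user examines snippets top to bottom, clicks position i with
# probability r_i (its perceived relevance), and stops after the first
# click. Hence P(click at i) = r_i * prod_{j<i} (1 - r_j).

def cascade_click_probabilities(relevances):
    """Click probability per position under the single-click assumption."""
    probs, exam = [], 1.0  # exam = probability that position i is examined
    for r in relevances:
        probs.append(exam * r)
        exam *= (1.0 - r)  # examination continues only if there was no click
    return probs

# Even with identical perceived relevance at every position, click
# probability halves at each step here, which is exactly position bias.
probs = cascade_click_probabilities([0.5, 0.5, 0.5])
```

This also illustrates why the click probability can never exceed the examination probability: each click probability is the examination probability scaled down by the snippet's relevance.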
Chapelle and Zhang [2] developed a model that gives an unbiased estimation of the actual relevance of a webpage, i.e., the model removes any position bias. Chapelle and Zhang's work extends the work of Craswell et al. [3] with the assumption that the user will not stop searching until satisfied with the information. They overrule Craswell et al.'s single-click assumption, instead assuming multiple clicks and query reformulations. Their work, however, does not consider anything about the other parts of a search page, like sponsored results and related queries, which I am going to consider in my work.

Dupret and Piwowarski [5] developed a model that differs from the work of Craswell et al. [3] in the sense that the user can skip a document without examining it. Their focus is more on attractiveness and perceived relevance, and they model only single clicks. This model makes many assumptions, which limits it for estimating actual user behavior.

Guo et al. [7] proposed independent and dependent click models for modeling multiple clicks on a result page. The independent click model assumes that the click probability is independent for different positions of results and that the examination probability is unity for every result. This model is only successful in explaining that the user usually clicks on the first snippet on a result page. The dependent click model extends the idea of Craswell et al. [3] to multiple clicks. This model describes the interdependence between clicks and examination at different positions. The dependent click model is good at explaining the clicks on the first and the last snippets on the result page.
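The interdependence between clicks and examination in the dependent click model can be sketched as a simple recursion. This is my reading of the model: the first position is always examined, a click at position i is followed by further examination with some continuation probability, and a skip is followed by examination with certainty. The relevance and continuation values below are illustrative, not from the cited work.

```python
# Sketch of the examination chain in a dependent click model: position 1
# is always examined; after a click at position i the user continues with
# probability lam[i]; after a skip, the user continues with certainty.
# So e[0] = 1 and e[i+1] = e[i] * (r[i] * lam[i] + (1 - r[i])).

def dcm_examination_probabilities(r, lam):
    """Examination probability per position, given relevances r and
    post-click continuation probabilities lam (illustrative parameters)."""
    exams = [1.0]
    for r_i, lam_i in zip(r, lam):
        exams.append(exams[-1] * (r_i * lam_i + (1.0 - r_i)))
    return exams[:-1]  # one examination probability per position

exams = dcm_examination_probabilities(r=[0.5, 0.4, 0.3],
                                      lam=[0.6, 0.6, 0.6])
# Click probability at each position is examination times relevance, so
# clicks at different positions are interdependent, unlike the
# independent click model where every examination probability is 1.
clicks = [e * r_i for e, r_i in zip(exams, [0.5, 0.4, 0.3])]
```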
These two models can also be used with click log streams, i.e., a continuous flow of data. They do not, however, consider trust bias and other elements of a search page, which I will work on.

2.2 Trust Bias Modeling in Online Communities

Finin et al. [6] modeled trust bias, or influence, in online social communities. They discussed how a popular blog or website in an online community can influence the opinions of other blogs. Their model of trust bias in online communities can be applied to URLs in my user behavior model.

2.3 Advertisement Relevance Prediction from Search Click Logs

Raghavan and Hillard [8] proposed a model that improves the relevance of advertisements shown for a query in a search engine. Earlier models ranked advertisements for a query by the number of clicks each advertisement received, i.e., an advertisement got a better rank simply by attracting more clicks. Raghavan and Hillard's model instead interprets click logs to estimate the actual relevance of an advertisement to the query and ranks advertisements by that relevance. In my model, I will apply the actual relevance of advertisements from Raghavan and Hillard to the overall fulfillment of the user's information need for a query.
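How advertisement relevance should feed into overall fulfillment is a design question this proposal leaves open. Purely as a hypothetical sketch (the combination rule and the numbers are my assumptions, not part of Raghavan and Hillard's model), one simple candidate is to treat a relevant advertisement click as an independent, additional chance of satisfying the information need:

```python
# Hypothetical sketch only: a relevant advertisement click as an
# independent second chance of fulfilling the information need. The
# independence assumption and the combination rule are illustrative
# assumptions, not part of Raghavan and Hillard's model.

def fulfillment(f_doc, f_ad):
    """P(information need fulfilled) when an organic document and an
    advertisement contribute independently to fulfillment."""
    return 1.0 - (1.0 - f_doc) * (1.0 - f_ad)

combined = fulfillment(0.7, 0.2)
```

Under this rule an advertisement click can only raise the estimated fulfillment, never lower it, which matches the intuition that relevant advertisements are part of the search results rather than a distraction.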
2.4 Automatically Estimating Relevance Labels for Training Data

Agrawal et al. [1] proposed a method that automatically estimates relevance labels of query-document pairs from click logs. They transformed user clicks into weighted, directed graphs and formulated label generation as an ordered graph partitioning problem. In full generality, the problem of finding n labels is NP-hard, but Agrawal et al. showed that optimal labeling of a query-document pair can be done in linear time when only two labels (relevant or non-relevant) are used. They proposed heuristic solutions to automatically estimate efficient labels from click logs. Such automatically labeled training data can save humans from manually defining labels for query-document pairs.

3 Problem Description

Previous user behavior models for estimating the relevance of documents from search click logs have not considered trust bias. In addition, little consideration has been given to clicks on other parts of a search page, such as advertisements. Modeling both would allow click logs to be interpreted in terms of more flexible and realistic user behavior.
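To make the trust-bias gap concrete before turning to the solution methodology: if a click requires that a snippet be examined, found attractive, and trusted, in that order, a simple multiplicative factorization separates the proposed model from the baseline. The multiplicative form and the probability values here are illustrative assumptions; the exact equations are developed in Section 4.

```python
# Illustrative sketch of the proposed modification: a click requires that
# the snippet be examined, found attractive, and trusted, in sequence.
# The multiplicative form and the numbers are assumptions for illustration.

def click_prob_baseline(e, a):
    """Dupret-Liao style: P(click) = P(examined) * P(attractive | examined)."""
    return e * a

def click_prob_with_trust(e, a, t):
    """Proposed: the same chain with an extra trust-bias factor for the URL."""
    return e * a * t

baseline = click_prob_baseline(0.8, 0.5)
with_trust = click_prob_with_trust(0.8, 0.5, 0.9)
```

Since the trust factor is at most one, an attractive snippet from an untrusted URL receives a lower predicted click probability than the baseline assigns, which is the behavior the trust bias is meant to capture.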
4 Solution Methodology

My proposed model will be an extension of the work of Dupret and Liao [4]. In addition to their assumptions described in Section 2.1, I will include my own based on trust bias and clicks on other parts of the search page, especially advertisements.

I intend to make a model that estimates the actual relevance of documents with respect to specific queries. My model will be a set of equations and rules describing user interactions on the search page in terms of probability values. A user session is the set of actions that the user performs on a search page to satisfy his/her information needs, such as examining a result snippet, clicking on a search result or advertisement, coming back and clicking more results in decreasing ranking order, reformulating the query, or abandoning the search.

I will model trust bias in the form of probability equations, just like any other user interaction. The trust bias equations come into play after the user has already examined the result snippet and found it attractive; the user then clicks on the snippet based upon his/her trust in that URL. Thus, the click probability on a URL depends on its examination probability, its attractiveness, and then the trust bias, whereas in Dupret and Liao's model the click probability on a URL depends only on its examination probability and then its attractiveness. For instance, if the examination probability of a URL is e, its attractiveness probability is a, the trust bias is t, the click probability is c, the query is q, the document is d, the actual document relevance is r, and the fulfillment probability is f, then the click probability c on a particular d for a specific q will now depend upon the joint probability of e, a, and t occurring in sequence. Also, the actual document relevance r depends on the fulfillment probability f obtained from the clicked document d.

Clicks on other parts of a search page, such as advertisements, will likewise be modeled in the form of probability equations. Clicks on advertisements are a form of user interaction on the search page, and advertisements also contribute to the overall fulfillment of the user's information needs. I am still researching solution methodologies for modeling these clicks.

After building the model with the above-mentioned improvements, the estimated relevance of documents for a query from my model will be combined with the existing features of the training data to recompute a new ranking function. The ranking obtained by the new function will be measured by the discounted cumulative gain metric, which measures ranking effectiveness. If the metric improvement is significant compared with the results obtained by the ranking function used in an existing search engine for the same query, then the document relevance from my model can be used as a feature in training data.

My challenge will be to get search click log data from a commercial search engine. If I am unable to get such logs, I will try getting logs from a meta search engine like metacrawler, dogpile, or excite. If that is not possible, I will implement my own meta search engine and then collect logs. If all fails,
I will use previously released logs from a search engine.

5 Evaluation

I will use the discounted cumulative gain and the normalized discounted cumulative gain to measure ranking effectiveness. I will also use precision and recall as metrics in my evaluation. A correct result is a retrieved result that is relevant to the query; precision is the ratio of correct results to all retrieved results, and recall is the ratio of correct results to all relevant results for the query.

A raw search click log will be pre-processed to remove duplicate queries and noise. Noise here refers to the following queries: queries with fewer than 10 user sessions, queries with fewer than 10 results, and queries with no clicks on snippets in a user session. Queries with no clicks on snippets are removed because most of these queries are misspelled or ambiguous. Only the first result page will be considered because most clicks happen there. Position bias will be removed from the dataset using editorial judgments for the results of a query.

After pre-processing, the dataset will be labeled automatically using the methods described in Section 2.4. The dataset will then be split equally into training and test datasets. The training dataset will be used to train the ranking algorithm to generate a new ranking function, while the test dataset will be used to measure how effectively the function now ranks the results
for a query issued by the user.

I will do a comparative analysis of the estimated relevance of results from my model against the models of Dupret and Liao [4] and Guo et al. [7]. The analysis will also compare the estimated document relevance from these three models with the editorial judgments for the same query-document pairs, which will give an idea of how accurately each model estimates relevance. Results will be compared for both informational and navigational queries.

If the estimated document relevance from my model matches the editorial judgments to a considerable extent, then these relevance estimates will be used as a feature in the training data. After this step, the ranking algorithm will be trained on the above data to generate a new ranking function, which will be used to rank the test data. Rankings will also be generated for the same test data by the existing ranking function used by popular search engines. I will then calculate the discounted cumulative gain, normalized discounted cumulative gain, precision, and recall. If these metrics show considerable improvement, my model can be considered successful.

6 Timeline
Task                                                                     | Start Date             | End Date
Literature review                                                        | Sept 2010              | Ongoing
Designing the model                                                      | May 2011               | Sept 2011
Comparative analysis with existing models                                | Oct 2011               | Dec 2011
Inclusion of relevance feature from my model in the training data        | first week of Jan 2012 | last week of Jan 2012
Evaluation of the new ranking function for different metric improvements | Feb 2012               | last week of April 2012
Thesis write-up                                                          | May 2012               | mid July 2012

7 Summary

I want to make a model that can estimate a document's actual relevance from click logs, after modeling trust bias and clicks on the other parts of a search page. This model will follow some of the assumptions and the solution methodology of Dupret and Liao [4]. If successful, the model's relevance estimates can be used as a feature in training data to improve the ranking function of a search engine.

References

[1] Rakesh Agrawal, Alan Halverson, Krishnaram Kenthapadi, Nina Mishra, and Panayiotis Tsaparas. Generating labels from clicks. In Proceedings of the Second Web Search and Data Mining (WSDM) Conference, Barcelona, Spain, pages 172–181. ACM, 9–11 February 2009.

[2] Olivier Chapelle and Ya Zhang. A dynamic Bayesian network click model for web search ranking. In Proceedings of the 18th International Conference on World Wide Web (WWW), Madrid, Spain, pages 1–10. ACM, 20–24 April 2009.

[3] Nick Craswell, Onno Zoeter, Michael Taylor, and Bill Ramsey. An experimental comparison of click position-bias models. In Proceedings of the First Web Search and Data Mining (WSDM) Conference, Palo Alto, CA, USA, pages 87–94. ACM, 11–12 February 2008.

[4] Georges Dupret and Ciya Liao. A model to estimate intrinsic document relevance from the clickthrough logs of a web search engine. In Proceedings of the Third Web Search and Data Mining (WSDM) Conference, New York City, NY, USA, pages 181–190. ACM, 4–6 February 2010.

[5] Georges Dupret and Benjamin Piwowarski. A user browsing model to predict search engine click data from past observations. In Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Singapore, pages 331–338. ACM, 20–24 July 2008.

[6] Tim Finin, Anupam Joshi, Pranam Kolari, Akshay Java, Anubhav Kale, and Amit Karandikar. The information ecology of social media and online communities. AI Magazine, 29(3):77–92, 2008.

[7] Fan Guo, Chao Liu, and Yi-Min Wang. Efficient multiple-click models in web search. In Proceedings of the Second Web Search and Data Mining (WSDM) Conference, Barcelona, Spain, pages 124–131. ACM, 9–11 February 2009.

[8] Hema Raghavan and Dustin Hillard. A relevance model based filter for improving ad quality. In Proceedings of the 32nd International ACM SIGIR Conference on Research and Development in Information Retrieval, Boston, MA, USA, pages 762–763. ACM, 19–23 July 2009.