Current Approaches in Search Result Diversification
Mario Sangiorgio
Presentation outline
Problem definition

What is diversity?

The relevance/diversity trade-off

Performance evaluation

Open issues and conclusions
Why is result diversification needed?

A couple of real-life examples:
  Ambiguous query: Flash
  Unambiguous query: Nuclear power plant
Problem definition


Search result diversification is an optimization problem: from the set of
all relevant results, select the subset of k items that is both as relevant
and as diverse as possible.
What is needed

Relevance measure
Diversity measure
Diversification objective
The result diversification process

Items are ranked by relevance
Diversity is measured
The two measures are used to get the final ranking
What is diversity?
How can items be diverse?

Word sense diversity, from ambiguous queries
Information source diversity, from unambiguous queries
Measures of diversity

Diversity is tightly coupled with the concept of similarity

To address the different aspects of the problem, several measures emerged:
  Semantic distance
  Categorical distance
  Novel information
Semantic distance
Diversifies on content dissimilarity

Uses the min-hashing scheme to get the sketch of a document:
$$S_d = \{ MH_{h_1}(d), \ldots, MH_{h_n}(d) \}$$

Distance is computed from Jaccard similarity:
$$\mathrm{sim}(u, v) = \frac{|S_u \cap S_v|}{|S_u \cup S_v|}, \qquad d(u, v) = 1 - \mathrm{sim}(u, v)$$

Does not work well when document lengths differ greatly or the sketch size is small
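
The sketch and distance above can be illustrated with a minimal Python sketch. The slide does not say how documents are tokenised or which hash functions are used, so the word shingles, the seeded SHA-1 hashes, and the names `shingles`, `sketch` and `semantic_distance` are all assumptions made for illustration.

```python
import hashlib


def shingles(text, k=2):
    """Return the set of word k-grams of a text (assumed tokenisation)."""
    words = text.lower().split()
    if len(words) < k:
        return {" ".join(words)}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def _hash(seed, item):
    """One member h_seed of a family of hash functions (assumed: seeded SHA-1)."""
    data = f"{seed}:{item}".encode("utf-8")
    return int.from_bytes(hashlib.sha1(data).digest()[:8], "big")


def sketch(text, n=64, k=2):
    """S_d: for each of n hash functions, keep the minimum hash over the shingles.

    Each entry is tagged with its hash-function index so that set
    intersection only matches minima produced by the same function.
    """
    items = shingles(text, k)
    return {(seed, min(_hash(seed, s) for s in items)) for seed in range(n)}


def semantic_distance(su, sv):
    """d(u, v) = 1 - |S_u ∩ S_v| / |S_u ∪ S_v| computed on the sketches."""
    return 1.0 - len(su & sv) / len(su | sv)


if __name__ == "__main__":
    a = sketch("adobe flash player download for windows")
    b = sketch("adobe flash player download for mac")
    c = sketch("the flash is a comic book superhero")
    print(semantic_distance(a, b))  # near-duplicates: smaller distance
    print(semantic_distance(a, c))  # unrelated content: larger distance
```

A larger `n` gives a more stable estimate at the cost of more hashing, which is one way to read the slide's warning about small sketch sizes.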
Categorical distance
Emphasizes word sense diversification

It is based on metadata (taxonomy)

The measure is a weighted tree distance:
$$d(u, v) = \sum_{i=\mathrm{lca}(u, v)}^{l(u)} \frac{1}{2^{e(i-1)}} + \sum_{i=\mathrm{lca}(u, v)}^{l(v)} \frac{1}{2^{e(i-1)}}$$
where l(x) is the depth of node x in the taxonomy and lca(u, v) is the lowest common ancestor of u and v.

Examples of taxonomies:
  /Top/Health vs /Top/Finance
  /Top/Sport/Racing vs /Top/Sport/Football
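
A small Python sketch of a weighted tree distance on taxonomy paths like the ones above may help. The indexing is an assumption: the root category has depth 1, and each sum starts one level below the lowest common ancestor so that a node is at distance 0 from itself. The function name and the default value of the exponent parameter `e` are hypothetical.

```python
def weighted_tree_distance(path_u, path_v, e=1.0):
    """Weighted tree distance between two taxonomy paths such as '/Top/Health'.

    Assumed conventions: the first path component has depth 1, and every
    edge ending at depth i contributes 1 / 2**(e * (i - 1)).
    """
    u = [part for part in path_u.split("/") if part]
    v = [part for part in path_v.split("/") if part]

    # Depth of the lowest common ancestor = length of the shared prefix.
    lca_depth = 0
    for a, b in zip(u, v):
        if a != b:
            break
        lca_depth += 1

    def branch(depth):
        # Sum the edge weights from just below the LCA down to the node.
        return sum(1.0 / 2 ** (e * (i - 1)) for i in range(lca_depth + 1, depth + 1))

    return branch(len(u)) + branch(len(v))


if __name__ == "__main__":
    print(weighted_tree_distance("/Top/Health", "/Top/Finance"))
    print(weighted_tree_distance("/Top/Sport/Racing", "/Top/Sport/Football"))
```

With `e = 1.0` the two sibling categories under /Top/Sport come out closer to each other than /Top/Health and /Top/Finance, because edges deeper in the tree weigh less. With `e = 0` every edge weighs 1 and the measure reduces to the plain path length.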
Novel information
Diversifies on content dissimilarity in a general sense. Good for subtopics

Results are represented with unigram language models (a standard tool in
natural language processing)

For each document, the Kullback-Leibler divergence measures how much novel
information it brings into the set

That is, how many extra bits are needed to describe the new document using
only the documents already selected
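
As an illustration of the idea, the following Python sketch scores a candidate by the Kullback-Leibler divergence between its unigram model and the model of the already selected documents. The add-`mu` smoothing, the whitespace tokenisation and the function names are assumptions; the slide does not specify how the language models are estimated.

```python
import math
from collections import Counter


def unigram_model(texts, vocab, mu=1.0):
    """Smoothed unigram distribution over vocab (assumed: add-mu smoothing)."""
    counts = Counter(word for text in texts for word in text.lower().split())
    total = sum(counts[word] for word in vocab) + mu * len(vocab)
    return {word: (counts[word] + mu) / total for word in vocab}


def novelty(candidate, selected, mu=1.0):
    """KL(candidate || selected) in bits.

    Reads as: extra bits needed to encode the candidate's words when using
    a code built from the already selected documents.
    """
    vocab = set(candidate.lower().split())
    for text in selected:
        vocab |= set(text.lower().split())
    p = unigram_model([candidate], vocab, mu)
    q = unigram_model(selected, vocab, mu)
    return sum(p[word] * math.log2(p[word] / q[word]) for word in vocab)


if __name__ == "__main__":
    chosen = ["flash player adobe"]
    print(novelty("flash player download", chosen))   # mostly known words
    print(novelty("flash superhero comics", chosen))  # more new words
```

Smoothing keeps every probability strictly positive, so the divergence stays finite even when the candidate contains words that no selected document has.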
Diversity measures: open issues

  Some aspects not taken into account:

    intrinsic properties of the document

          genre of the document

       sentiment regarding the topic
The relevance/diversity optimization problem
Diversification objectives
It has been proved impossible to find a function
       that has all the required properties:

               scale invariance
                 consistency
                   richness
                    stability
     independence of irrelevant attributes
                 monotonicity
            strength of relevance
            strength of similarity
Diversification objectives
Several functions proposed:
  Max sum (no stability)
  Max min (no consistency nor stability)
  Max sum of max score (maximizes relevancy and then diversity)
  Mono objective (no consistency)
  Max product (it is based on the already chosen results)
  Categorical (results have to cover a set of categories)
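
To make two of these objectives concrete, here is a hedged Python sketch of max-sum and max-min in the bi-criteria form used in the axiomatic literature, with a relevance weight `w(u)`, a distance `d(u, v)` and a trade-off parameter `lam`. The exact scaling factors, the toy data and the brute-force `best_subset` helper are assumptions for illustration only.

```python
from itertools import combinations


def max_sum(S, w, d, lam=1.0):
    """Max-sum objective: total relevance plus total pairwise distance.

    Assumed scaling: (k - 1) * sum of w plus 2 * lam * sum of d over pairs.
    """
    k = len(S)
    relevance = (k - 1) * sum(w[u] for u in S)
    diversity = 2 * lam * sum(d(u, v) for u, v in combinations(S, 2))
    return relevance + diversity


def max_min(S, w, d, lam=1.0):
    """Max-min objective: worst relevance plus lam * smallest pairwise distance.

    Needs at least two items in S, otherwise there is no pair to compare.
    """
    relevance = min(w[u] for u in S)
    diversity = min(d(u, v) for u, v in combinations(S, 2))
    return relevance + lam * diversity


def best_subset(items, k, objective, w, d, lam=1.0):
    """Exhaustive search over all k-subsets; only feasible for tiny inputs."""
    return max(combinations(items, k), key=lambda S: objective(S, w, d, lam))


if __name__ == "__main__":
    w = {"a": 0.9, "b": 0.8, "c": 0.5}  # toy relevance scores
    dist = {
        frozenset("ab"): 0.1,
        frozenset("ac"): 0.9,
        frozenset("bc"): 0.8,
    }

    def d(u, v):
        return dist[frozenset((u, v))]

    print(best_subset(["a", "b", "c"], 2, max_sum, w, d))
    print(best_subset(["a", "b", "c"], 2, max_min, w, d))
    print(best_subset(["a", "b", "c"], 2, max_sum, w, d, lam=0.0))
```

On this toy data both objectives prefer the pair (a, c) over the two most relevant but near-identical items a and b. Setting `lam` to 0 switches diversity off and returns (a, b).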
Diversification algorithms
Finding the best solution is an NP-hard problem

The algorithm depends on the objective function:
  Approximation
  Greedy

Open issues:
  Is off-line pre-computation applicable?
  Are there efficient data structures?
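
A greedy strategy can be sketched as follows. This is a simple incremental heuristic in the style of maximal marginal relevance, chosen here for brevity; it is not claimed to be the specific approximation algorithm analysed in the referenced papers. The scoring rule, the parameter `lam` and the toy data are assumptions.

```python
def greedy_diversify(items, k, w, d, lam=0.5):
    """Pick up to k items one at a time.

    Start from the most relevant item, then repeatedly add the candidate
    that maximises (1 - lam) * relevance + lam * distance to the closest
    item already selected.
    """
    if k <= 0 or not items:
        return []
    remaining = sorted(items, key=lambda u: w[u], reverse=True)
    selected = [remaining.pop(0)]
    while remaining and len(selected) < k:
        best = max(
            remaining,
            key=lambda u: (1 - lam) * w[u] + lam * min(d(u, s) for s in selected),
        )
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    w = {"a": 0.9, "b": 0.8, "c": 0.5}  # toy relevance scores
    dist = {
        frozenset("ab"): 0.1,
        frozenset("ac"): 0.9,
        frozenset("bc"): 0.8,
    }

    def d(u, v):
        return dist[frozenset((u, v))]

    print(greedy_diversify(["a", "b", "c"], 2, w, d))           # ['a', 'c']
    print(greedy_diversify(["a", "b", "c"], 2, w, d, lam=0.0))  # ['a', 'b']
```

Each step evaluates every remaining candidate against every selected item, so the cost grows with the product of the pool size and k. That is where the open questions about pre-computation and data structures come in.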
How to evaluate diversity in search
Data sets for the evaluation
Full text
  TREC Interactive
  Top results from a commercial search engine

Structured data
  Taxonomies (Open Directory Project)

Ground truth
  Wikipedia disambiguation pages
  Judgements from Amazon Mechanical Turk

There is a need for task-specific standard data sets
Benchmarks
Adaptation from existing metrics:
  Alpha-nDCG: normalized discounted cumulative gain
  Subtopic recall and precision: number of subtopics covered
  User intent: the results distribution should reflect what the user is asking for
  Comparison against the optimum
Alpha-nDCG

Based on information nuggets (a nugget is an answer to a question)

A document is relevant when it contains a nugget needed by the user

Quality of results is graded by human assessors

The more nuggets the result set contains, the better
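
A minimal Python sketch of the metric follows, assuming binary nugget judgements, a gain of (1 - alpha) raised to the number of earlier documents that already contained the nugget, and a log2(1 + rank) discount. Computing the true ideal ranking is expensive, so the normalisation below uses a greedy ordering, which is an approximation. The function names and toy judgements are hypothetical.

```python
import math


def alpha_dcg(ranking, nuggets, alpha=0.5, depth=None):
    """Discounted gain where a repeated nugget is worth (1 - alpha) times less
    each time it has already appeared higher in the ranking."""
    seen = {}
    score = 0.0
    for rank, doc in enumerate(ranking[:depth], start=1):
        gain = 0.0
        for nugget in nuggets.get(doc, ()):
            gain += (1 - alpha) ** seen.get(nugget, 0)
            seen[nugget] = seen.get(nugget, 0) + 1
        score += gain / math.log2(1 + rank)
    return score


def greedy_ideal(pool, nuggets, alpha=0.5):
    """Approximate ideal ordering: always take the document with the
    highest remaining gain (assumption: greedy stands in for the optimum)."""
    remaining = list(pool)
    seen = {}
    ideal = []
    while remaining:
        best = max(
            remaining,
            key=lambda doc: sum(
                (1 - alpha) ** seen.get(n, 0) for n in nuggets.get(doc, ())
            ),
        )
        ideal.append(best)
        remaining.remove(best)
        for n in nuggets.get(best, ()):
            seen[n] = seen.get(n, 0) + 1
    return ideal


def alpha_ndcg(ranking, nuggets, alpha=0.5, depth=None, pool=None):
    """alpha-DCG of the ranking divided by that of the greedy ideal ordering."""
    ideal = greedy_ideal(ranking if pool is None else pool, nuggets, alpha)
    denom = alpha_dcg(ideal, nuggets, alpha, depth)
    return alpha_dcg(ranking, nuggets, alpha, depth) / denom if denom else 0.0


if __name__ == "__main__":
    judged = {"d1": {"a"}, "d2": {"a"}, "d3": {"b"}}  # toy nugget judgements
    print(alpha_ndcg(["d1", "d3", "d2"], judged))  # covers both nuggets early
    print(alpha_ndcg(["d1", "d2", "d3"], judged))  # repeats nugget "a" first
```

Setting `alpha` to 0 removes the redundancy penalty, so the score then behaves like ordinary nDCG computed on nugget counts.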
Subtopic recall and precision

Is the result set exhaustive?
$$\text{s-recall at } k = \frac{\text{number of subtopics covered by the first } k \text{ documents}}{\text{total number of subtopics}}$$

Is the result set efficient?
$$\text{s-precision at } r = \frac{\mathrm{minRank}(S_{opt}, r)}{\mathrm{minRank}(S, r)}$$
Conclusions


Diversification can really improve the quality of search results

There is still some work to do in order to achieve good results in all
possible scenarios
Open issues

There is room for improvement in defining new diversity types and metrics

Ranking functions should take diversity into account from the beginning,
in an integrated process

Data sets to evaluate each notion of diversity should be built
References
Minack, E., Demartini, G., Nejdl, W.: Current Approaches to Search Result Diversification. In: Proceedings of ISWC '09

Gollapudi, S., Sharma, A.: An Axiomatic Approach for Result Diversification. In: Proceedings of WWW '09

Zhai, C.X., Cohen, W.W., Lafferty, J.: Beyond Independent Relevance: Methods and Evaluation Metrics for Subtopic Retrieval. In: Proceedings of SIGIR '03

Agrawal, R., Gollapudi, S., Halverson, A., Ieong, S.: Diversifying Search Results. In: Proceedings of WSDM '09

Clough, P., Sanderson, M., Abouammoh, M., Navarro, S., Paramita, M.: Multiple Approaches to Analysing Query Diversity. In: Proceedings of SIGIR '09

Clarke, C.L., Kolla, M., Cormack, G.V., Vechtomova, O., Ashkan, A., Büttcher, S., MacKinnon, I.: Novelty and Diversity in Information Retrieval Evaluation. In: Proceedings of SIGIR '08
Current Approaches in Search Result Diversification

  • 1. Current Approaches in Search Result Diversification Mario Sangiorgio
  • 2. Presentation outline: Problem definition; What is diversity?; The relevance/diversity trade-off; Performance evaluation; Open issues and conclusions
  • 3. Why is result diversification needed? A couple of real-life examples
  • 6. Problem definition: search result diversification is an optimization problem that aims to find, among all relevant results, the subset of k items that is both most relevant and most diverse.
  • 7. What is needed: a relevance measure, a diversity measure, and a diversification objective
  • 8. The result diversification process: items are ranked by relevance; diversity is measured; the two measures are combined to get the final ranking
  • 10. How can items be diverse? Word sense diversity, from ambiguous queries; information source diversity, from unambiguous queries
  • 11. Measures of diversity: diversity is tightly coupled with the concept of similarity. To address the different aspects of the problem, several measures have emerged: semantic distance, categorical distance, novel information
  • 12. Semantic distance: diversifies on content dissimilarity. Uses the min-hashing scheme to get the sketch of a document, $S_d = \{MH_{h_1}(d), \ldots, MH_{h_n}(d)\}$. Distance is computed from the Jaccard similarity: $sim(u, v) = \frac{|S_u \cap S_v|}{|S_u \cup S_v|}$, $d(u, v) = 1 - sim(u, v)$. Does not work well when the documents differ too much in length or when the sketch size is small
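
The semantic distance above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the word 3-gram tokenization (`shingles`), the seeded BLAKE2b hash family (`_hash`), and the sketch size `n=64` are all choices made here, not taken from the slides. The sketch is stored as a set of (hash index, minimum value) pairs so that the Jaccard formula from the slide applies directly.

```python
import hashlib


def shingles(text, k=3):
    """Split a text into a set of word k-grams (an assumed tokenization)."""
    words = text.lower().split()
    if len(words) <= k:
        return {" ".join(words)}
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def _hash(seed, token):
    """Seeded 64-bit hash; stands in for the hash function h_seed (assumption)."""
    data = f"{seed}:{token}".encode("utf-8")
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_sketch(tokens, n=64):
    """S_d: for each of n hash functions, keep the minimum hash over the tokens."""
    return {(seed, min(_hash(seed, t) for t in tokens)) for seed in range(n)}


def semantic_distance(sketch_u, sketch_v):
    """d(u, v) = 1 - |S_u & S_v| / |S_u | S_v|, computed on the sketches."""
    union = sketch_u | sketch_v
    if not union:
        return 0.0
    return 1.0 - len(sketch_u & sketch_v) / len(union)
```

With a small `n` the estimate becomes coarse, which is one way to see the sketch-size limitation mentioned on the slide.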
  • 13. Categorical distance: emphasizes word sense diversification. It is based on metadata (a taxonomy). The measure is a weighted tree distance: $d(u, v) = \sum_{i=lca(u,v)}^{l(u)} \frac{1}{2^{e(i-1)}} + \sum_{i=lca(u,v)}^{l(v)} \frac{1}{2^{e(i-1)}}$. Examples of taxonomies: /Top/Health vs /Top/Finance; /Top/Sport/Racing vs /Top/Sport/Football
  • 14. Novel information: diversifies in a general sense, based on content dissimilarity; good for subtopics. Results are represented with unigram language models (as used in natural language processing). For each document, the Kullback-Leibler divergence is used to evaluate how much novel information it brings into the set, that is, how many extra bits are needed to describe the new document using only the documents already selected
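
A rough sketch of the novelty idea above, assuming additively smoothed unigram models over a shared vocabulary. The smoothing constant `mu`, the pooling of the already selected documents into one model, and all function names are assumptions made for this illustration; the slides do not specify them.

```python
import math
from collections import Counter


def unigram_model(tokens, vocabulary, mu=0.5):
    """Unigram language model with additive smoothing (mu is an assumed constant)."""
    counts = Counter(tokens)
    total = len(tokens) + mu * len(vocabulary)
    return {w: (counts[w] + mu) / total for w in vocabulary}


def kl_divergence_bits(p, q):
    """KL(p || q) in bits: extra bits needed to encode p using a code built for q."""
    return sum(p[w] * math.log2(p[w] / q[w]) for w in p if p[w] > 0)


def novelty(candidate_tokens, selected_docs_tokens, mu=0.5):
    """Novel information a candidate brings relative to the already selected set."""
    pooled = [t for doc in selected_docs_tokens for t in doc]
    vocabulary = set(candidate_tokens) | set(pooled)
    p = unigram_model(candidate_tokens, vocabulary, mu)
    q = unigram_model(pooled, vocabulary, mu)
    return kl_divergence_bits(p, q)
```

A candidate that repeats the selected documents scores close to zero, while a candidate built from unseen words scores higher.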
  • 15. Diversity measures: open issues. Some aspects are not taken into account: intrinsic properties of the document; genre of the document; sentiment regarding the topic
  • 17. Diversification objectives: it has been proved impossible to find a function that has all the required properties: scale invariance, consistency, richness, stability, independence of irrelevant attributes, monotonicity, strength of relevance, strength of similarity
  • 18. Diversification objectives: several functions have been proposed: Max sum (no stability); Max min (no consistency nor stability); Max sum of max score (maximizes relevancy and then diversity); Mono objective (no consistency); Categorical (results have to cover a set of categories); Max product (it is based on the already chosen results)
  • 19. Diversification algorithms: finding the best solution is an NP-hard problem. The algorithm depends on the objective function: approximation or greedy. Open issues: Is off-line pre-computation applicable? Are there efficient data structures?
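
To make the greedy option concrete, here is a minimal sketch of one possible greedy heuristic. It assumes a sum-style objective (total relevance plus `lam` times the sum of pairwise distances) and picks, at each step, the item with the largest marginal gain. The function name, the `lam` trade-off parameter, and the objective itself are assumptions for illustration, not the specific algorithms from the cited papers.

```python
def greedy_diversify(items, relevance, distance, k, lam=1.0):
    """Greedily pick k items, trading relevance against distance to picked items.

    relevance: dict mapping item -> relevance score
    distance:  function (item, item) -> dissimilarity
    lam:       assumed trade-off weight; 0 means pure relevance ranking
    """
    selected = []
    remaining = list(items)
    while remaining and len(selected) < k:
        def gain(item):
            # Marginal gain of adding `item` to the current selection.
            return relevance[item] + lam * sum(distance(item, s) for s in selected)

        best = max(remaining, key=gain)
        selected.append(best)
        remaining.remove(best)
    return selected
```

With `lam=0` the result is the plain relevance ranking; raising `lam` pushes near-duplicates of already selected items down the list.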
  • 20. How to evaluate diversity in search
  • 21. Data sets for the evaluation. Full text: TREC Interactive; top results from a commercial search engine. Structured data: taxonomies (Open Directory Project). Ground truth: Wikipedia disambiguation pages; judgements from Amazon Mechanical Turk. There is a need for task-specific standard datasets
  • 22. Benchmarks: adaptation from existing metrics. Alpha-NDCG: normalized discounted cumulative gain. Subtopic recall and precision: number of subtopics covered. User intent: results distribution should reflect what the user is asking for. Comparison against the optimum
  • 23. Alpha-nDCG: based on information nuggets (answers to a question). A document is relevant when it contains a nugget needed by the user. Quality of results is graded by human assessors. The more nuggets in the set, the better
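
The nugget-based scoring can be sketched as follows. This is a simplified illustration that assumes binary nugget judgements, a `log2(1 + rank)` discount, and a greedy reordering of the same documents as the ideal ranking used for normalization. The names `alpha_dcg`, `greedy_ideal`, and `alpha_ndcg`, as well as the `nuggets` mapping from document to nugget set, are hypothetical.

```python
import math


def _gain(doc, nuggets, seen, alpha):
    """Each repeated nugget is worth (1 - alpha) times less than its previous occurrence."""
    return sum((1 - alpha) ** seen.get(n, 0) for n in nuggets.get(doc, ()))


def _mark_seen(doc, nuggets, seen):
    for n in nuggets.get(doc, ()):
        seen[n] = seen.get(n, 0) + 1


def alpha_dcg(ranking, nuggets, alpha=0.5):
    """Discounted cumulative gain with novelty-aware gains."""
    seen, score = {}, 0.0
    for rank, doc in enumerate(ranking, start=1):
        score += _gain(doc, nuggets, seen, alpha) / math.log2(1 + rank)
        _mark_seen(doc, nuggets, seen)
    return score


def greedy_ideal(docs, nuggets, alpha=0.5):
    """Greedy approximation of the ideal ordering (an assumption of this sketch)."""
    remaining, seen, ideal = list(docs), {}, []
    while remaining:
        best = max(remaining, key=lambda d: _gain(d, nuggets, seen, alpha))
        ideal.append(best)
        remaining.remove(best)
        _mark_seen(best, nuggets, seen)
    return ideal


def alpha_ndcg(ranking, nuggets, alpha=0.5):
    ideal_score = alpha_dcg(greedy_ideal(ranking, nuggets, alpha), nuggets, alpha)
    return alpha_dcg(ranking, nuggets, alpha) / ideal_score if ideal_score > 0 else 0.0
```

A ranking that places documents with new nuggets first scores higher than one that repeats the same nugget early.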
  • 24. Subtopic recall and precision. Is the result set exhaustive? $\text{s-recall at } k = \frac{\text{number of subtopics covered by the first } k \text{ documents}}{\text{total number of subtopics}}$. Is the result set efficient? $\text{s-precision at } r = \frac{minRank(S_{opt}, r)}{minRank(S, r)}$
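
Both measures can be sketched directly from the definitions above. The sketch assumes that `subtopics` maps each document to the set of subtopics it covers and that the caller supplies a ranking presumed optimal for `s_precision`, since finding the true optimum is itself a hard problem. All names are hypothetical.

```python
def s_recall(ranking, subtopics, total_subtopics, k):
    """Fraction of all subtopics covered by the first k documents."""
    covered = set()
    for doc in ranking[:k]:
        covered |= subtopics.get(doc, set())
    return len(covered) / total_subtopics


def min_rank(ranking, subtopics, total_subtopics, r):
    """Smallest rank at which the ranking reaches s-recall r, or None if it never does."""
    for k in range(1, len(ranking) + 1):
        if s_recall(ranking, subtopics, total_subtopics, k) >= r:
            return k
    return None


def s_precision(ranking, optimal_ranking, subtopics, total_subtopics, r):
    """minRank(S_opt, r) / minRank(S, r); optimal_ranking is supplied by the caller."""
    best = min_rank(optimal_ranking, subtopics, total_subtopics, r)
    actual = min_rank(ranking, subtopics, total_subtopics, r)
    if best is None or actual is None:
        return 0.0
    return best / actual
```

An s-precision of 1 means the system reaches the requested recall level as early as the reference ranking does.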
  • 25. Conclusions: diversification can really improve the quality of search results. There is still some work to do in order to achieve good results in all possible scenarios
  • 26. Open issues: there is room for improvement in defining new diversity types and metrics. Ranking functions should take diversity into account from the beginning, in an integrated process. Datasets to evaluate each notion of diversity should be built
  • 27. References: Minack, E., Demartini, G., Nejdl, W.: Current Approaches to Search Result Diversification. In: Proceedings of ISWC '09. Gollapudi, S., Sharma, A.: An Axiomatic Approach for Result Diversification. In: Proceedings of WWW '09. Zhai, C.X., Cohen, W.W., Lafferty, J.: Beyond Independent Relevance: Methods and Evaluation Metrics for Subtopic Retrieval. In: Proceedings of SIGIR '03. Agrawal, R., Gollapudi, S., Halverson, A., Ieong, S.: Diversifying Search Results. In: Proceedings of WSDM '09. Clough, P., Sanderson, M., Abouammoh, M., Navarro, S., Paramita, M.: Multiple Approaches to Analysing Query Diversity. In: Proceedings of SIGIR '09. Clarke, C.L., Kolla, M., Cormack, G.V., Vechtomova, O., Ashkan, A., Büttcher, S., MacKinnon, I.: Novelty and Diversity in Information Retrieval Evaluation. In: Proceedings of SIGIR '08.