CLEF’10: Conference on Multilingual and Multimodal
                     Information Access Evaluation
                        September 20-23, Padua, Italy




                     Tie-Breaking Bias:
           Effect of an Uncontrolled Parameter
           on Information Retrieval Evaluation

           Guillaume Cabanac, Gilles Hubert,
         Mohand Boughanem, Claude Chrisment
Effect of the Tie-Breaking Bias                                        G. Cabanac et al.



                                   Outline
1. Motivation                     A tale about two TREC participants

2. Context                        IRS effectiveness evaluation

    Issue                         Tie-breaking bias effects

3. Contribution                   Reordering strategies

4. Experiments                    Impact of the tie-breaking bias

5. Conclusion and Future Works


1. Motivation  Tie-breaking bias illustration                                              G. Cabanac et al.


A tale about two TREC participants                                         (1/2)


Topic 031 “satellite launch contracts”                             5 relevant documents




                 Chris                                                         Ellen
                                         one single difference




   C = ( , 0.8), ( , 0.8), ( , 0.5)                            E = ( , 0.8), ( , 0.8), ( , 0.5)




               unlucky                                                         lucky
                              Why such a huge difference?
1. Motivation  Tie-breaking bias illustration                                            G. Cabanac et al.


A tale about two TREC participants                                      (2/2)
                Chris                                                       Ellen
                                      one single difference




 C = ( , 0.8), ( , 0.8), ( , 0.5)                           E = ( , 0.8), ( , 0.8), ( , 0.5)




      After 15 days of hard work




                Only difference: the name of one document
2. Context & issue  Tie-breaking bias                                          G. Cabanac et al.


Measuring the effectiveness of IRSs
    User-centered vs. System-focused                          [Spärck Jones & Willett, 1997]


    Evaluation campaigns
         1958     Cranfield                                           UK
         1992     TREC         Text Retrieval Conference              USA
         1999     NTCIR        NII Test Collection for IR Systems     Japan
         2001     CLEF         Cross-Language Evaluation Forum        Europe
         …

    “Cranfield” methodology
         Task
       Test collection
           Corpus

           Topics

           Qrels

       Measures: MAP, P@X ... using trec_eval                       [Voorhees, 2007]
2. Context & issue  Tie-breaking bias                                           G. Cabanac et al.


    Runs are reordered prior to their evaluation
Qrels = qid, iter, docno, rel            Run = qid, iter, docno, rank, sim, run_id




                                                      ( , 0.8), ( , 0.8), ( , 0.5)

                                                        Reordering by trec_eval
                                                     qid asc, sim desc, docno desc


                                                      ( , 0.8), ( , 0.8), ( , 0.5)



               Effectiveness measure = f (intrinsic_quality, luck)
                                     MAP, P@X, MRR…
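The reordering rule on this slide (qid asc, sim desc, docno desc) can be sketched in a few lines of Python. This is only an illustration of the sort keys listed above, not the trec_eval source; the `RunLine` type, the function name and the document names are hypothetical.

```python
from typing import NamedTuple


class RunLine(NamedTuple):
    """One line of a run, reduced to the fields used for reordering."""
    qid: str
    docno: str
    sim: float


def conventional_order(run):
    """Reorder by qid asc, sim desc, docno desc; the submitted rank is ignored."""
    # Python's sort is stable, so apply the least significant key first.
    by_docno = sorted(run, key=lambda r: r.docno, reverse=True)
    by_sim = sorted(by_docno, key=lambda r: r.sim, reverse=True)
    return sorted(by_sim, key=lambda r: r.qid)


# Hypothetical run: two documents tied at 0.8, submitted with AP8 first.
submitted = [
    RunLine("031", "AP8", 0.8),
    RunLine("031", "WSJ5", 0.8),
    RunLine("031", "FT1", 0.5),
]
reordered = [r.docno for r in conventional_order(submitted)]
```

With these made-up names, the tie at 0.8 is broken by document name in descending order, so WSJ5 moves ahead of AP8 regardless of the order in which the system submitted them.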
3. Contribution  Reordering strategies                                         G. Cabanac et al.


Consequences of run reordering
    Measures of effectiveness for an IRS s
       RR(s,t)        1/rank of the 1st relevant document, for topic t
       P(s,t,d)       precision at document d, for topic t
       AP(s,t)        average precision for topic t
       MAP(s)         mean average precision
       These measures are sensitive to document rank
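As a rough sketch of why these measures depend on document rank, the standard textbook definitions can be written as follows. The function names and the toy document identifiers are assumptions made for illustration, and precision is computed at a rank cutoff k rather than at a named document d.

```python
def reciprocal_rank(ranking, relevant):
    """1/rank of the first relevant document, 0.0 if none is retrieved."""
    for rank, docno in enumerate(ranking, start=1):
        if docno in relevant:
            return 1.0 / rank
    return 0.0


def precision_at(ranking, relevant, k):
    """Fraction of the top-k documents that are relevant."""
    return sum(1 for docno in ranking[:k] if docno in relevant) / k


def average_precision(ranking, relevant):
    """Mean of the precision values observed at each relevant document."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, docno in enumerate(ranking, start=1):
        if docno in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def mean_average_precision(rankings, qrels):
    """Average AP over the topics present in the run (dict: topic -> ranking)."""
    aps = [average_precision(r, qrels.get(t, set())) for t, r in rankings.items()]
    return sum(aps) / len(aps)


# Swapping two documents changes RR and AP although the retrieved set is the same.
relevant = {"d1"}
ap_first = average_precision(["d1", "d2", "d3"], relevant)
ap_second = average_precision(["d2", "d1", "d3"], relevant)
```

In this toy case AP drops from 1.0 to 0.5 when the single relevant document moves from rank 1 to rank 2, which is exactly the kind of shift a tie-breaking rule can cause.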

 Tie-breaking bias

               Ellen



               Chris


         Is the Wall Street Journal collection more relevant than Associated Press?

             Problem 1         comparing 2 systems            AP(s1, t) vs. AP(s2, t)
             Problem 2         comparing 2 topics             AP(s, t1) vs. AP(s, t2)
3. Contribution  Reordering strategies                                       G. Cabanac et al.


      Alternative unbiased reordering strategies

                   ex aequo




ex aequo




       Conventional reordering (TREC)
                Ties sorted Z → A            qid asc, sim desc,            docno desc

       Realistic reordering
                Relevant docs last           qid asc, sim desc, rel asc,   docno desc

       Optimistic reordering
                Relevant docs first          qid asc, sim desc, rel desc,  docno desc
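The three sort orders listed on this slide can be sketched as one function with a strategy switch. This is a minimal illustration under assumed data shapes (a run as `(qid, docno, sim)` tuples, qrels as a set of relevant `(qid, docno)` pairs); the function name and the sample documents are hypothetical.

```python
def reorder(run, qrels, strategy="conventional"):
    """Reorder a run with one of the three tie-breaking strategies.

    run:   list of (qid, docno, sim) tuples
    qrels: set of (qid, docno) pairs judged relevant
    """
    def rel(line):
        qid, docno, _sim = line
        return 1 if (qid, docno) in qrels else 0

    # Stable sorts applied from the least to the most significant key.
    ordered = sorted(run, key=lambda l: l[1], reverse=True)   # docno desc
    if strategy == "realistic":
        ordered = sorted(ordered, key=rel)                    # rel asc
    elif strategy == "optimistic":
        ordered = sorted(ordered, key=rel, reverse=True)      # rel desc
    elif strategy != "conventional":
        raise ValueError(f"unknown strategy: {strategy}")
    ordered = sorted(ordered, key=lambda l: l[2], reverse=True)  # sim desc
    return sorted(ordered, key=lambda l: l[0])                   # qid asc


# Hypothetical topic with a tie at 0.8; only WSJ5 is relevant.
run = [("031", "AP8", 0.8), ("031", "WSJ5", 0.8), ("031", "FT1", 0.5)]
qrels = {("031", "WSJ5")}
orders = {
    s: [docno for _, docno, _ in reorder(run, qrels, s)]
    for s in ("conventional", "realistic", "optimistic")
}
```

In this example the relevant document happens to have the alphabetically greatest name, so the conventional order coincides with the optimistic one, while the realistic order pushes it to rank 2.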
4. Experiments  Impact of the tie-breaking bias                                             G. Cabanac et al.


Effect of the tie-breaking bias
    Study of 4 TREC tasks
    [Figure: timeline of the studied TREC tasks (routing, adhoc, filtering, web); year axis marks 1993, 1997, 1998, 1999, 2000, 2002, 2004, 2009]

        22 editions
        1360 runs
        3 GB of data from trec.nist.gov




    Assessing the effect of tie-breaking
          Proportion of document ties → How frequent is the bias?
          Effect on measure values
               Top 3 observed differences
               Observed difference in %
               Significance of the observed difference: Student’s t-test (paired, one-tailed)
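To make the significance check concrete, here is a small sketch of the paired t statistic computed over per-topic measure values under two reordering strategies. It uses only the standard library and stops at the statistic; turning it into a one-tailed p-value needs the Student t distribution with n - 1 degrees of freedom, which is left out. The function name and the sample values are made up for illustration.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(values_a, values_b):
    """t statistic of the paired differences values_a[i] - values_b[i]."""
    if len(values_a) != len(values_b) or len(values_a) < 2:
        raise ValueError("need two samples of equal length >= 2")
    diffs = [a - b for a, b in zip(values_a, values_b)]
    spread = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if spread == 0:
        raise ValueError("all paired differences are identical")
    return mean(diffs) / (spread / sqrt(len(diffs)))


# Made-up AP values for three topics under two strategies.
ap_conventional = [0.5, 0.6, 0.7]
ap_realistic = [0.4, 0.4, 0.4]
t_value = paired_t_statistic(ap_conventional, ap_realistic)
```

For a one-tailed test of "conventional > realistic", the statistic is compared against the upper critical value of the t distribution with n - 1 degrees of freedom.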
4. Experiments  Impact of the tie-breaking bias   G. Cabanac et al.


Ties demographics
    89.6% of the runs comprise ties

    Ties are present all along the runs




4. Experiments  Impact of the tie-breaking bias                                                G. Cabanac et al.


    Proportion of tied documents in submitted runs




On average, 25.2% of a result list consists of tied documents
On average, a group of tied documents contains 10.6 documents
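One plausible way to compute such tie statistics for a single result list is sketched below. How the authors counted ties exactly is not stated on the slide, so the definition used here (a tie is a group of at least two documents with exactly equal similarity) is an assumption, as are the function name and the sample scores.

```python
from itertools import groupby


def tie_statistics(sims):
    """Return (proportion of tied documents, mean size of a tied group).

    sims: similarity scores of the documents retrieved for one topic.
    A tied group is a set of at least two documents with the same score.
    """
    if not sims:
        return 0.0, 0.0
    # Sorting puts equal scores next to each other so groupby can count them.
    group_sizes = [len(list(g)) for _, g in groupby(sorted(sims, reverse=True))]
    tied_groups = [n for n in group_sizes if n > 1]
    tied_docs = sum(tied_groups)
    proportion = tied_docs / len(sims)
    mean_size = tied_docs / len(tied_groups) if tied_groups else 0.0
    return proportion, mean_size


# Made-up scores: one tie of two documents among four results.
proportion, mean_size = tie_statistics([0.8, 0.8, 0.5, 0.2])
```

Averaging these two numbers over all topics and runs would give figures comparable in kind to the 25.2% and 10.6 reported above, under the stated assumption about what counts as a tie.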
4. Experiments  Impact of the tie-breaking bias   G. Cabanac et al.


Effect on Reciprocal Rank (RR)




4. Experiments  Impact of the tie-breaking bias   G. Cabanac et al.


Effect on Average Precision (AP)




4. Experiments  Impact of the tie-breaking bias                      G. Cabanac et al.


Effect on Mean Average Precision (MAP)




                                                    Difference of ranks computed
                                                    on MAP not significant
                                                    (Kendall’s τ)
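The comparison of system rankings can be illustrated with a plain Kendall's τ (the tau-a variant, without tie correction) between the MAP values obtained under two reordering strategies. This is a sketch with made-up system names and scores, not the computation used in the study.

```python
from itertools import combinations


def kendall_tau(scores_a, scores_b):
    """Kendall's tau-a between two {system: score} mappings over the same systems."""
    systems = sorted(scores_a)
    if set(systems) != set(scores_b) or len(systems) < 2:
        raise ValueError("both mappings must cover the same >= 2 systems")
    concordant = discordant = 0
    for s, t in combinations(systems, 2):
        # Positive product: the pair is ordered the same way in both rankings.
        product = (scores_a[s] - scores_a[t]) * (scores_b[s] - scores_b[t])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    n_pairs = len(systems) * (len(systems) - 1) // 2
    return (concordant - discordant) / n_pairs


# Made-up MAP values for three systems under two strategies.
map_conventional = {"s1": 0.30, "s2": 0.25, "s3": 0.20}
map_realistic = {"s1": 0.28, "s2": 0.26, "s3": 0.19}
tau = kendall_tau(map_conventional, map_realistic)
```

A value of 1.0 means both strategies rank the systems identically; each swapped pair lowers τ.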
4. Experiments  Impact of the tie-breaking bias                             G. Cabanac et al.


What we learnt: Beware of tie-breaking for AP
    Small effect on MAP, larger effect on AP

    Measure bounds: AP_Realistic ≤ AP_Conventional ≤ AP_Optimistic

                                                                      padre1, adhoc’94




    Failure analysis for the ranking process
         Error bar = element of chance → potential for improvement
4. Experiments  Impact of the tie-breaking bias                                          G. Cabanac et al.


     Related works in IR evaluation


Topics reliability?
   [Buckley & Voorhees, 2000]      25
   [Voorhees & Buckley, 2002]      error rate
   [Voorhees, 2009]                n collections

Qrels reliability?
   [Voorhees, 1998]                quality
   [Al-Maskari et al., 2008]       TREC vs. TREC

                                                                                             [Voorhees, 2007]

Measures reliability?
   [Buckley & Voorhees, 2000]      MAP
   [Sakai, 2008]                   ‘system bias’
   [Moffat & Zobel, 2008]          new measures
   [Raghavan et al., 1989]         Precall
   [McSherry & Najork, 2008]       Tied scores
   [Cabanac et al., 2010]          tie-breaking bias

Pooling reliability?
   [Zobel, 1998]                   approximation
   [Sanderson & Joho, 2004]        manual
   [Buckley et al., 2007]          size adaptation
Impact of the “tie-breaking bias” on IR evaluations     G. Cabanac et al.


Conclusions and future works
    Context: IR evaluation
         TREC and other campaigns based on trec_eval

    Contributions
       Measure = f (intrinsic_quality, luck) → tie-breaking bias

         Measure bounds (realistic ≤ conventional ≤ optimistic)

         Study of the tie-breaking bias effect
             (conventional, realistic) for RR, AP and MAP

             Strong correlation, yet significant difference

             No difference on system rankings (based on MAP)

    Future works
       Study of other / more recent evaluation campaigns
       Reordering-free measures
       Finer grained analyses: finding vs. ranking
                 Thank you

More Related Content

Viewers also liked

Pi save sense clapet pour ventouses
Pi save sense clapet pour ventousesPi save sense clapet pour ventouses
Pi save sense clapet pour ventousesEUROPAGES
 
Facebook Alben auf der eigenen Website mit Composite C1 darstellen
Facebook Alben auf der eigenen Website mit Composite C1 darstellenFacebook Alben auf der eigenen Website mit Composite C1 darstellen
Facebook Alben auf der eigenen Website mit Composite C1 darstellenCGN Cloud Company
 
Fast Esp搜索系统
Fast Esp搜索系统Fast Esp搜索系统
Fast Esp搜索系统xiaochawan
 
Culturile africii
Culturile africiiCulturile africii
Culturile africiigruianul
 
Far West – Juillet 2008
Far West – Juillet 2008Far West – Juillet 2008
Far West – Juillet 2008tux2600
 
Guarrate presentation
Guarrate presentationGuarrate presentation
Guarrate presentationLaura Galan
 
Adivina A Quien Pertenece Esta Mansion
Adivina A Quien Pertenece Esta MansionAdivina A Quien Pertenece Esta Mansion
Adivina A Quien Pertenece Esta MansionNieve11
 
Valencia cap. II
Valencia cap. IIValencia cap. II
Valencia cap. IIgabriel
 
evaluation in infomation retrival
evaluation in infomation retrivalevaluation in infomation retrival
evaluation in infomation retrivaljetaime
 
Information Retrieval, Encoding, Indexing, Big Table. Lecture 6 - Indexing
Information Retrieval, Encoding, Indexing, Big Table. Lecture 6  - IndexingInformation Retrieval, Encoding, Indexing, Big Table. Lecture 6  - Indexing
Information Retrieval, Encoding, Indexing, Big Table. Lecture 6 - IndexingSean Golliher
 
Evaluation in Information Retrieval
Evaluation in Information RetrievalEvaluation in Information Retrieval
Evaluation in Information RetrievalDishant Ailawadi
 
Database retrieval system and related semantic web application
Database retrieval system and related semantic web applicationDatabase retrieval system and related semantic web application
Database retrieval system and related semantic web applicationShailendra Kumar
 
Chapter 8 : Evaluation in Information Retrieval
Chapter 8 : Evaluation in Information RetrievalChapter 8 : Evaluation in Information Retrieval
Chapter 8 : Evaluation in Information RetrievalJoongjin Bae
 
Ppt evaluation of information retrieval system
Ppt evaluation of information retrieval systemPpt evaluation of information retrieval system
Ppt evaluation of information retrieval systemsilambu111
 

Viewers also liked (20)

Pi save sense clapet pour ventouses
Pi save sense clapet pour ventousesPi save sense clapet pour ventouses
Pi save sense clapet pour ventouses
 
Playing test
Playing testPlaying test
Playing test
 
Facebook Alben auf der eigenen Website mit Composite C1 darstellen
Facebook Alben auf der eigenen Website mit Composite C1 darstellenFacebook Alben auf der eigenen Website mit Composite C1 darstellen
Facebook Alben auf der eigenen Website mit Composite C1 darstellen
 
Fast Esp搜索系统
Fast Esp搜索系统Fast Esp搜索系统
Fast Esp搜索系统
 
Nazca
NazcaNazca
Nazca
 
Culturile africii
Culturile africiiCulturile africii
Culturile africii
 
Far West – Juillet 2008
Far West – Juillet 2008Far West – Juillet 2008
Far West – Juillet 2008
 
Guarrate presentation
Guarrate presentationGuarrate presentation
Guarrate presentation
 
1 Sam 3.docx
1 Sam 3.docx1 Sam 3.docx
1 Sam 3.docx
 
Abcom
AbcomAbcom
Abcom
 
Adivina A Quien Pertenece Esta Mansion
Adivina A Quien Pertenece Esta MansionAdivina A Quien Pertenece Esta Mansion
Adivina A Quien Pertenece Esta Mansion
 
Ice 2007
Ice 2007Ice 2007
Ice 2007
 
Valencia cap. II
Valencia cap. IIValencia cap. II
Valencia cap. II
 
evaluation in infomation retrival
evaluation in infomation retrivalevaluation in infomation retrival
evaluation in infomation retrival
 
Information Retrieval, Encoding, Indexing, Big Table. Lecture 6 - Indexing
Information Retrieval, Encoding, Indexing, Big Table. Lecture 6  - IndexingInformation Retrieval, Encoding, Indexing, Big Table. Lecture 6  - Indexing
Information Retrieval, Encoding, Indexing, Big Table. Lecture 6 - Indexing
 
Ir 08
Ir   08Ir   08
Ir 08
 
Evaluation in Information Retrieval
Evaluation in Information RetrievalEvaluation in Information Retrieval
Evaluation in Information Retrieval
 
Database retrieval system and related semantic web application
Database retrieval system and related semantic web applicationDatabase retrieval system and related semantic web application
Database retrieval system and related semantic web application
 
Chapter 8 : Evaluation in Information Retrieval
Chapter 8 : Evaluation in Information RetrievalChapter 8 : Evaluation in Information Retrieval
Chapter 8 : Evaluation in Information Retrieval
 
Ppt evaluation of information retrieval system
Ppt evaluation of information retrieval systemPpt evaluation of information retrieval system
Ppt evaluation of information retrieval system
 

Similar to Impact of Tie-Breaking Bias on Information Retrieval Evaluation

A Hybrid Method of CART and Artificial Neural Network for Short Term Load For...
A Hybrid Method of CART and Artificial Neural Network for Short Term Load For...A Hybrid Method of CART and Artificial Neural Network for Short Term Load For...
A Hybrid Method of CART and Artificial Neural Network for Short Term Load For...Salford Systems
 
Ghotra icse
Ghotra icseGhotra icse
Ghotra icseSAIL_QU
 
Kernel based approaches in drug target interaction prediction
Kernel based approaches in drug target interaction predictionKernel based approaches in drug target interaction prediction
Kernel based approaches in drug target interaction predictionXinyi Z.
 
3 article azojete vol 7 24 33
3 article azojete vol 7 24 333 article azojete vol 7 24 33
3 article azojete vol 7 24 33Oyeniyi Samuel
 
PARTITION SORT REVISITED: RECONFIRMING THE ROBUSTNESS IN AVERAGE CASE AND MUC...
PARTITION SORT REVISITED: RECONFIRMING THE ROBUSTNESS IN AVERAGE CASE AND MUC...PARTITION SORT REVISITED: RECONFIRMING THE ROBUSTNESS IN AVERAGE CASE AND MUC...
PARTITION SORT REVISITED: RECONFIRMING THE ROBUSTNESS IN AVERAGE CASE AND MUC...IJCSEA Journal
 
Item Response Theory in Constructing Measures
Item Response Theory in Constructing MeasuresItem Response Theory in Constructing Measures
Item Response Theory in Constructing MeasuresCarlo Magno
 
INFLUENCE OF PRIORS OVER MULTITYPED OBJECT IN EVOLUTIONARY CLUSTERING
INFLUENCE OF PRIORS OVER MULTITYPED OBJECT IN EVOLUTIONARY CLUSTERINGINFLUENCE OF PRIORS OVER MULTITYPED OBJECT IN EVOLUTIONARY CLUSTERING
INFLUENCE OF PRIORS OVER MULTITYPED OBJECT IN EVOLUTIONARY CLUSTERINGcscpconf
 
Influence of priors over multityped object in evolutionary clustering
Influence of priors over multityped object in evolutionary clusteringInfluence of priors over multityped object in evolutionary clustering
Influence of priors over multityped object in evolutionary clusteringcsandit
 
Quality By Design
Quality By DesignQuality By Design
Quality By Designrealmayank
 
A New Model for Credit Approval Problems a Neuro Genetic System with Quantum ...
A New Model for Credit Approval Problems a Neuro Genetic System with Quantum ...A New Model for Credit Approval Problems a Neuro Genetic System with Quantum ...
A New Model for Credit Approval Problems a Neuro Genetic System with Quantum ...Anderson Pinho
 
07.12.2012 - Aprajit Mahajan
07.12.2012 - Aprajit Mahajan07.12.2012 - Aprajit Mahajan
07.12.2012 - Aprajit MahajanAMDSeminarSeries
 
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...ijceronline
 
A Validation of Object-Oriented Design Metrics as Quality Indicators
A Validation of Object-Oriented Design Metrics as Quality IndicatorsA Validation of Object-Oriented Design Metrics as Quality Indicators
A Validation of Object-Oriented Design Metrics as Quality Indicatorsvie_dels
 
Optimum failure truncated testing strategies
Optimum failure truncated testing strategies Optimum failure truncated testing strategies
Optimum failure truncated testing strategies ASQ Reliability Division
 
Mining at scale with latent factor models for matrix completion
Mining at scale with latent factor models for matrix completionMining at scale with latent factor models for matrix completion
Mining at scale with latent factor models for matrix completionFabio Petroni, PhD
 

Similar to Impact of Tie-Breaking Bias on Information Retrieval Evaluation (20)

A Hybrid Method of CART and Artificial Neural Network for Short Term Load For...
A Hybrid Method of CART and Artificial Neural Network for Short Term Load For...A Hybrid Method of CART and Artificial Neural Network for Short Term Load For...
A Hybrid Method of CART and Artificial Neural Network for Short Term Load For...
 
Ghotra icse
Ghotra icseGhotra icse
Ghotra icse
 
Fulltext
FulltextFulltext
Fulltext
 
PhD defense talk slides
PhD  defense talk slidesPhD  defense talk slides
PhD defense talk slides
 
Bb25322324
Bb25322324Bb25322324
Bb25322324
 
Kernel based approaches in drug target interaction prediction
Kernel based approaches in drug target interaction predictionKernel based approaches in drug target interaction prediction
Kernel based approaches in drug target interaction prediction
 
Yahoo search-study
Yahoo search-studyYahoo search-study
Yahoo search-study
 
3 article azojete vol 7 24 33
3 article azojete vol 7 24 333 article azojete vol 7 24 33
3 article azojete vol 7 24 33
 
PARTITION SORT REVISITED: RECONFIRMING THE ROBUSTNESS IN AVERAGE CASE AND MUC...
PARTITION SORT REVISITED: RECONFIRMING THE ROBUSTNESS IN AVERAGE CASE AND MUC...PARTITION SORT REVISITED: RECONFIRMING THE ROBUSTNESS IN AVERAGE CASE AND MUC...
PARTITION SORT REVISITED: RECONFIRMING THE ROBUSTNESS IN AVERAGE CASE AND MUC...
 
Item Response Theory in Constructing Measures
Item Response Theory in Constructing MeasuresItem Response Theory in Constructing Measures
Item Response Theory in Constructing Measures
 
INFLUENCE OF PRIORS OVER MULTITYPED OBJECT IN EVOLUTIONARY CLUSTERING
INFLUENCE OF PRIORS OVER MULTITYPED OBJECT IN EVOLUTIONARY CLUSTERINGINFLUENCE OF PRIORS OVER MULTITYPED OBJECT IN EVOLUTIONARY CLUSTERING
INFLUENCE OF PRIORS OVER MULTITYPED OBJECT IN EVOLUTIONARY CLUSTERING
 
Influence of priors over multityped object in evolutionary clustering
Influence of priors over multityped object in evolutionary clusteringInfluence of priors over multityped object in evolutionary clustering
Influence of priors over multityped object in evolutionary clustering
 
Quality By Design
Quality By DesignQuality By Design
Quality By Design
 
A New Model for Credit Approval Problems a Neuro Genetic System with Quantum ...
A New Model for Credit Approval Problems a Neuro Genetic System with Quantum ...A New Model for Credit Approval Problems a Neuro Genetic System with Quantum ...
A New Model for Credit Approval Problems a Neuro Genetic System with Quantum ...
 
07.12.2012 - Aprajit Mahajan
07.12.2012 - Aprajit Mahajan07.12.2012 - Aprajit Mahajan
07.12.2012 - Aprajit Mahajan
 
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...IJCER (www.ijceronline.com) International Journal of computational Engineerin...
IJCER (www.ijceronline.com) International Journal of computational Engineerin...
 
A Validation of Object-Oriented Design Metrics as Quality Indicators
A Validation of Object-Oriented Design Metrics as Quality IndicatorsA Validation of Object-Oriented Design Metrics as Quality Indicators
A Validation of Object-Oriented Design Metrics as Quality Indicators
 
Deg rbn eccs
Deg rbn eccsDeg rbn eccs
Deg rbn eccs
 
Optimum failure truncated testing strategies
Optimum failure truncated testing strategies Optimum failure truncated testing strategies
Optimum failure truncated testing strategies
 
Mining at scale with latent factor models for matrix completion
Mining at scale with latent factor models for matrix completionMining at scale with latent factor models for matrix completion
Mining at scale with latent factor models for matrix completion
 

More from Guillaume Cabanac

Adoption de l’identifiant ORCID : le cas des universités toulousaines
Adoption de l’identifiant ORCID : le cas des universités toulousainesAdoption de l’identifiant ORCID : le cas des universités toulousaines
Adoption de l’identifiant ORCID : le cas des universités toulousainesGuillaume Cabanac
 
Dépollution de la littérature scientifique : traque d’expression torturées ...
Dépollution de la littérature scientifique : traque d’expression torturées ...Dépollution de la littérature scientifique : traque d’expression torturées ...
Dépollution de la littérature scientifique : traque d’expression torturées ...Guillaume Cabanac
 
Valoriser le capital documentaire (en sommeil) d’une organisation : exploitat...
Valoriser le capital documentaire (en sommeil) d’une organisation : exploitat...Valoriser le capital documentaire (en sommeil) d’une organisation : exploitat...
Valoriser le capital documentaire (en sommeil) d’une organisation : exploitat...Guillaume Cabanac
 
Comment analyser une mobilisation collective dans les réseaux socionumériques...
Comment analyser une mobilisation collective dans les réseaux socionumériques...Comment analyser une mobilisation collective dans les réseaux socionumériques...
Comment analyser une mobilisation collective dans les réseaux socionumériques...Guillaume Cabanac
 
Gender as a Variable to Study Academic Writing
Gender as a Variable to Study Academic WritingGender as a Variable to Study Academic Writing
Gender as a Variable to Study Academic WritingGuillaume Cabanac
 
Prospection de textes scientifiques : vision prospective
Prospection de textes scientifiques : vision prospectiveProspection de textes scientifiques : vision prospective
Prospection de textes scientifiques : vision prospectiveGuillaume Cabanac
 
Questionner le texte scientifique pour caractériser la science et l'innovation
Questionner le texte scientifique pour caractériser la science et l'innovationQuestionner le texte scientifique pour caractériser la science et l'innovation
Questionner le texte scientifique pour caractériser la science et l'innovationGuillaume Cabanac
 
Le carnet de l'avent de la sociologie francophone sur Twitter : réseaux et al...
Le carnet de l'avent de la sociologie francophone sur Twitter : réseaux et al...Le carnet de l'avent de la sociologie francophone sur Twitter : réseaux et al...
Le carnet de l'avent de la sociologie francophone sur Twitter : réseaux et al...Guillaume Cabanac
 
Interroger le texte scientifique
Interroger le texte scientifiqueInterroger le texte scientifique
Interroger le texte scientifiqueGuillaume Cabanac
 
The promises of web scrapping: Mining the web for relational data about artists
The promises of web scrapping: Mining the web for relational data about artistsThe promises of web scrapping: Mining the web for relational data about artists
The promises of web scrapping: Mining the web for relational data about artistsGuillaume Cabanac
 
Émergence de l’open access « gris » : LibGen et Sci-Hub comme filières clande...
Émergence de l’open access « gris » : LibGen et Sci-Hub comme filières clande...Émergence de l’open access « gris » : LibGen et Sci-Hub comme filières clande...
Émergence de l’open access « gris » : LibGen et Sci-Hub comme filières clande...Guillaume Cabanac
 
Confrontation à la perception humaine de mesures de similarité entre membres
Confrontation à la perception humaine de mesures de similarité entre membres Confrontation à la perception humaine de mesures de similarité entre membres
Confrontation à la perception humaine de mesures de similarité entre membres Guillaume Cabanac
 
« T'as pensé à retweeter mon article ? » Enjeux, limites et critique de la bi...
« T'as pensé à retweeter mon article ? » Enjeux, limites et critique de la bi...« T'as pensé à retweeter mon article ? » Enjeux, limites et critique de la bi...
« T'as pensé à retweeter mon article ? » Enjeux, limites et critique de la bi...Guillaume Cabanac
 
Émergence de l’open access « gris » : LibGen et Sci-Hub
Émergence de l’open access « gris » : LibGen et Sci-HubÉmergence de l’open access « gris » : LibGen et Sci-Hub
Émergence de l’open access « gris » : LibGen et Sci-HubGuillaume Cabanac
 
Sur les étagères des bibliothèques numériques clandestines:
Sur les étagères des bibliothèques numériques clandestines: Sur les étagères des bibliothèques numériques clandestines:
Sur les étagères des bibliothèques numériques clandestines: Guillaume Cabanac
 
Les altmetrics : estimer l'engouement pour la recherche sur les médias sociaux
Les altmetrics : estimer l'engouement pour la recherche sur les médias sociauxLes altmetrics : estimer l'engouement pour la recherche sur les médias sociaux
Les altmetrics : estimer l'engouement pour la recherche sur les médias sociauxGuillaume Cabanac
 
A Journey in Scientometrics: quantitative studies of science at the crossroad...
A Journey in Scientometrics: quantitative studies of science at the crossroad...A Journey in Scientometrics: quantitative studies of science at the crossroad...
A Journey in Scientometrics: quantitative studies of science at the crossroad...Guillaume Cabanac
 
Bibliogifts ? Les bibliothèques clandestines de l'édition scientifique
Bibliogifts ? Les bibliothèques clandestines de l'édition scientifiqueBibliogifts ? Les bibliothèques clandestines de l'édition scientifique
Bibliogifts ? Les bibliothèques clandestines de l'édition scientifiqueGuillaume Cabanac
 
Le renfort des liens forts - dynamique relationnelle du coauthorship
Le renfort des liens forts - dynamique relationnelle du coauthorshipLe renfort des liens forts - dynamique relationnelle du coauthorship
Le renfort des liens forts - dynamique relationnelle du coauthorshipGuillaume Cabanac
 


Impact of Tie-Breaking Bias on Information Retrieval Evaluation

  • 1. CLEF’10: Conference on Multilingual and Multimodal Information Access Evaluation, September 20-23, Padua, Italy. Tie-Breaking Bias: Effect of an Uncontrolled Parameter on Information Retrieval Evaluation. Guillaume Cabanac, Gilles Hubert, Mohand Boughanem, Claude Chrisment.
  • 2. Effect of the Tie-Breaking Bias (G. Cabanac et al.). Outline: 1. Motivation: a tale about two TREC participants. 2. Context: IRS effectiveness evaluation; Issue: tie-breaking bias effects. 3. Contribution: reordering strategies. 4. Experiments: impact of the tie-breaking bias. 5. Conclusion and future work.
  • 4. 1. Motivation: tie-breaking bias illustration. A tale about two TREC participants (1/2). Topic 031 “satellite launch contracts”, 5 relevant documents. Chris vs. Ellen, one single difference: C = ( , 0.8), ( , 0.8), ( , 0.5); E = ( , 0.8), ( , 0.8), ( , 0.5). Unlucky vs. lucky. Why such a huge difference?
  • 5. 1. Motivation: tie-breaking bias illustration. A tale about two TREC participants (2/2). Chris vs. Ellen, one single difference: C = ( , 0.8), ( , 0.8), ( , 0.5); E = ( , 0.8), ( , 0.8), ( , 0.5). After 15 days of hard work, the only difference is the name of one document.
  • 7. 2. Context & issue: tie-breaking bias. Measuring the effectiveness of IRSs. User-centered vs. system-focused [Spärck Jones & Willett, 1997]. Evaluation campaigns: 1958 Cranfield (UK); 1992 TREC, Text Retrieval Conference (USA); 1999 NTCIR, NII Test Collection for IR Systems (Japan); 2001 CLEF, Cross-Language Evaluation Forum (Europe); … “Cranfield” methodology: task; test collection (corpus, topics, qrels); measures (MAP, P@X, ...) computed using trec_eval [Voorhees, 2007].
  • 8. 2. Context & issue: tie-breaking bias. Runs are reordered prior to their evaluation. Qrels = (qid, iter, docno, rel). Run = (qid, iter, docno, rank, sim, run_id), e.g. ( , 0.8), ( , 0.8), ( , 0.5). Reordering by trec_eval: qid asc, sim desc, docno desc, giving ( , 0.8), ( , 0.8), ( , 0.5). Effectiveness measure = f(intrinsic_quality, luck), for MAP, P@X, MRR…
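
The reordering described on slide 8 can be sketched in Python as follows. This is only an illustration of the stated sort order (qid asc, sim desc, docno desc), not trec_eval's actual code; the helper name `conventional_order`, the 3-tuple layout and the document identifiers are hypothetical.

```python
# Each run entry is a (qid, docno, sim) tuple; the docnos are made-up examples.
def conventional_order(run):
    """Sort by qid ascending, sim descending, docno descending."""
    # Python's sort is stable, so we sort by the least significant key first.
    by_docno = sorted(run, key=lambda e: e[1], reverse=True)     # docno desc
    by_sim = sorted(by_docno, key=lambda e: e[2], reverse=True)  # sim desc
    return sorted(by_sim, key=lambda e: e[0])                    # qid asc

run = [("031", "AP-A", 0.8), ("031", "WSJ-B", 0.8), ("031", "AP-C", 0.5)]
ordered = conventional_order(run)
ordered_docnos = [docno for _, docno, _ in ordered]
print(ordered_docnos)
```

Within the group tied at 0.8, the document whose name sorts last alphabetically comes first, whatever order the system submitted.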
  • 10. 3. Contribution: reordering strategies. Consequences of run reordering. Measures of effectiveness for an IRS s, all sensitive to document rank: RR(s,t), 1/rank of the 1st relevant document, for topic t; P(s,t,d), precision at document d, for topic t; AP(s,t), average precision for topic t; MAP(s), mean average precision. Tie-breaking bias (Ellen vs. Chris): is the Wall Street Journal collection more relevant than Associated Press? Problem 1, comparing 2 systems: AP(s1, t) vs. AP(s2, t). Problem 2, comparing 2 topics: AP(s, t1) vs. AP(s, t2).
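
A minimal sketch of the rank-sensitive measures listed on slide 10, assuming binary relevance and the usual definition of AP (sum of precisions at the ranks of relevant documents, divided by the number of relevant documents). The function names and the toy rankings are hypothetical.

```python
def reciprocal_rank(ranked_docnos, relevant):
    """RR: 1/rank of the first relevant document, 0 if none is retrieved."""
    for rank, docno in enumerate(ranked_docnos, start=1):
        if docno in relevant:
            return 1.0 / rank
    return 0.0

def average_precision(ranked_docnos, relevant):
    """AP: mean of the precision values observed at each relevant document."""
    hits = 0
    precision_sum = 0.0
    for rank, docno in enumerate(ranked_docnos, start=1):
        if docno in relevant:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant) if relevant else 0.0

def mean_average_precision(per_topic):
    """MAP: mean of AP over (ranking, relevant set) pairs, one per topic."""
    aps = [average_precision(ranking, rel) for ranking, rel in per_topic]
    return sum(aps) / len(aps)

relevant = {"d1"}
lucky = ["d1", "d2", "d3"]    # the relevant document wins the tie
unlucky = ["d2", "d1", "d3"]  # same documents, tie broken the other way
print(reciprocal_rank(lucky, relevant), reciprocal_rank(unlucky, relevant))
print(average_precision(lucky, relevant), average_precision(unlucky, relevant))
```

Swapping two tied documents halves both RR and AP in this toy case, which is the kind of rank sensitivity the slide points at.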
  • 11. 3. Contribution: reordering strategies. Alternative unbiased reordering strategies for tied (ex aequo) documents. Conventional reordering (TREC), ties sorted Z → A: qid asc, sim desc, docno desc. Realistic reordering, relevant docs last: qid asc, sim desc, rel asc, docno desc. Optimistic reordering, relevant docs first: qid asc, sim desc, rel desc, docno desc.
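
The three strategies of slide 11 differ only in one extra sort key, which the sketch below makes explicit. It assumes binary relevance, treats unjudged documents as non-relevant, and uses hypothetical names (`reorder`, the single-letter docnos); it is not the authors' implementation.

```python
def reorder(run, qrels, strategy="conventional"):
    """Reorder a run of (qid, docno, sim) tuples with the given strategy."""
    def rel(entry):
        # Assumption: a document missing from the qrels counts as non-relevant.
        return qrels.get((entry[0], entry[1]), 0)

    # Stable sorts, applied from the least to the most significant key.
    ordered = sorted(run, key=lambda e: e[1], reverse=True)      # docno desc
    if strategy == "realistic":
        ordered = sorted(ordered, key=rel)                       # rel asc
    elif strategy == "optimistic":
        ordered = sorted(ordered, key=rel, reverse=True)         # rel desc
    elif strategy != "conventional":
        raise ValueError(f"unknown strategy: {strategy}")
    ordered = sorted(ordered, key=lambda e: e[2], reverse=True)  # sim desc
    return sorted(ordered, key=lambda e: e[0])                   # qid asc

# Three documents tied at 0.8; only "B" is relevant.
run = [("031", "A", 0.8), ("031", "B", 0.8), ("031", "C", 0.8), ("031", "D", 0.5)]
qrels = {("031", "B"): 1}

rankings = {
    s: [docno for _, docno, _ in reorder(run, qrels, s)]
    for s in ("realistic", "conventional", "optimistic")
}
# Rank of the relevant document under each strategy.
ranks = {s: docs.index("B") + 1 for s, docs in rankings.items()}
print(rankings)
print(ranks)
```

With a single relevant document, AP equals 1/rank, so the realistic and optimistic orders bracket the conventional one in this example.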
  • 13. 4. Experiments: impact of the tie-breaking bias. Effect of the tie-breaking bias. Study of 4 TREC tasks (routing, filtering, web, adhoc; timeline marks: 1993, 1997, 1998, 1999, 2000, 2002, 2004, 2009): 22 editions, 1360 runs, 3 GB of data from trec.nist.gov. Assessing the effect of tie-breaking: proportion of document ties (how frequent is the bias?); effect on measure values (top 3 observed differences; observed difference in %; significance of the observed difference: Student’s t-test, paired, one-sided).
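
The paired Student's t-test mentioned on slide 13 compares two measure values per topic. A minimal sketch of the test statistic using only the standard library is given below; the per-topic AP values are made up for illustration and are not the authors' data.

```python
import math
from statistics import mean, stdev

def paired_t_statistic(x, y):
    """Return (t, degrees of freedom) for paired samples x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two samples of equal length >= 2")
    diffs = [a - b for a, b in zip(x, y)]
    sd = stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    return mean(diffs) / (sd / math.sqrt(len(diffs))), len(diffs) - 1

# Hypothetical per-topic AP under two reordering strategies.
ap_conventional = [0.50, 0.40, 0.30, 0.60]
ap_realistic = [0.45, 0.38, 0.30, 0.50]

t, df = paired_t_statistic(ap_conventional, ap_realistic)
print(t, df)
```

For a one-sided test, the p-value is the upper-tail probability of `t` under a Student distribution with `df` degrees of freedom, which a statistics package can provide.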
  • 14. 4. Experiments: impact of the tie-breaking bias. Ties demographics: 89.6% of the runs comprise ties; ties are present all along the runs.
  • 15. 4. Experiments: impact of the tie-breaking bias. Proportion of tied documents in submitted runs: on average, 25.2% of a result list consists of tied documents; on average, a group of tied documents contains 10.6 documents.
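
One way to compute statistics like those on slide 15 for a single result list is sketched below. It assumes a tie is a run of at least two consecutive documents with equal sim once the list is sorted by score; the authors' exact counting rules are not given in the slides, and `tie_stats` is a hypothetical helper.

```python
from itertools import groupby

def tie_stats(sims):
    """sims: scores of one result list, sorted by descending score.

    Returns (proportion of tied documents, average size of a tied group).
    """
    group_sizes = [len(list(group)) for _, group in groupby(sims)]
    tied_groups = [size for size in group_sizes if size > 1]
    tied_docs = sum(tied_groups)
    proportion = tied_docs / len(sims) if sims else 0.0
    avg_group = tied_docs / len(tied_groups) if tied_groups else 0.0
    return proportion, avg_group

sims = [0.9, 0.8, 0.8, 0.8, 0.5, 0.4, 0.4, 0.1]
proportion, avg_group = tie_stats(sims)
print(proportion, avg_group)
```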
  • 16. 4. Experiments: impact of the tie-breaking bias. Effect on Reciprocal Rank (RR).
  • 17. 4. Experiments: impact of the tie-breaking bias. Effect on Average Precision (AP).
  • 18. 4. Experiments: impact of the tie-breaking bias. Effect on Mean Average Precision (MAP). Difference of ranks computed on MAP not significant (Kendall’s τ).
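
Kendall's τ, used on slide 18 to compare system rankings, counts how many pairs of systems are ordered the same way by two sets of scores. The sketch below implements the simple τ-a variant without tie correction, which is an assumption since the slides do not say which variant was used; the MAP values are invented.

```python
from itertools import combinations

def kendall_tau_a(scores_a, scores_b):
    """Kendall's tau-a between two score lists covering the same systems."""
    n = len(scores_a)
    if n != len(scores_b) or n < 2:
        raise ValueError("need two score lists of equal length >= 2")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        product = (scores_a[i] - scores_a[j]) * (scores_b[i] - scores_b[j])
        if product > 0:
            concordant += 1   # both measures order the pair the same way
        elif product < 0:
            discordant += 1   # the two measures disagree on this pair
    return (concordant - discordant) / (n * (n - 1) // 2)

# Hypothetical MAP of four systems under two reordering strategies.
map_conventional = [0.30, 0.25, 0.20, 0.10]
map_realistic = [0.28, 0.20, 0.24, 0.09]  # systems 2 and 3 swap places

tau = kendall_tau_a(map_conventional, map_realistic)
print(tau)
```

A value close to 1 means the two strategies rank the systems almost identically, which matches the slide's observation that system rankings based on MAP barely change.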
  • 19. 4. Experiments: impact of the tie-breaking bias. What we learnt: beware of tie-breaking for AP. Small effect on MAP, larger effect on AP. Measure bounds: AP_Realistic ≤ AP_Conventional ≤ AP_Optimistic (example: padre1, adhoc’94). Failure analysis for the ranking process: error bar = element of chance; potential for improvement.
  • 20. 4. Experiments: impact of the tie-breaking bias. Related works in IR evaluation. Topics reliability? [Buckley & Voorhees, 2000] 25; [Voorhees & Buckley, 2002] error rate; [Voorhees, 2009] n collections. Qrels reliability? [Voorhees, 1998] quality; [Al-Maskari et al., 2008] TREC vs. TREC; [Voorhees, 2007]. Measures reliability? [Buckley & Voorhees, 2000] MAP; [Sakai, 2008] ‘system bias’; [Moffat & Zobel, 2008] new measures; [Raghavan et al., 1989] Precall. Pooling reliability? [McSherry & Najork, 2008] tied scores; [Zobel, 1998] approximation; [Sanderson & Joho, 2004] manual; [Cabanac et al., 2010] tie-breaking bias; [Buckley et al., 2007] size adaptation.
  • 22. Conclusions and future work. Context: IR evaluation; TREC and other campaigns based on trec_eval. Contributions: Measure = f(intrinsic_quality, luck), i.e. the tie-breaking bias; measure bounds (realistic ≤ conventional ≤ optimistic); study of the tie-breaking bias effect (conventional vs. realistic) for RR, AP and MAP: strong correlation, yet significant difference; no difference on system rankings (based on MAP). Future work: study of other / more recent evaluation campaigns; reordering-free measures; finer-grained analyses: finding vs. ranking.
  • 23. Thank you.