Impact of Tie-Breaking Bias on Information Retrieval Evaluation

CLEF’10: Conference on Multilingual and Multimodal
Information Access Evaluation
September 20-23, Padua, Italy

Tie-Breaking Bias:
Effect of an Uncontrolled Parameter
on Information Retrieval Evaluation

Guillaume Cabanac, Gilles Hubert,
Mohand Boughanem, Claude Chrisment

Effect of the Tie-Breaking Bias G. Cabanac et al.

Outline
1. Motivation A tale about two TREC participants

2. Context IRS effectiveness evaluation

Issue Tie-breaking bias effects

3. Contribution Reordering strategies

4. Experiments Impact of the tie-breaking bias

5. Conclusion and Future Work


1. Motivation: Tie-breaking bias illustration

A tale about two TREC participants (1/2)

Topic 031 “satellite launch contracts”: 5 relevant documents

Chris (unlucky) vs. Ellen (lucky): one single difference between their runs

C = ( , 0.8), ( , 0.8), ( , 0.5)
E = ( , 0.8), ( , 0.8), ( , 0.5)

Why such a huge difference?


A tale about two TREC participants (2/2)
Chris vs. Ellen: one single difference between their runs

C = ( , 0.8), ( , 0.8), ( , 0.5)
E = ( , 0.8), ( , 0.8), ( , 0.5)

After 15 days of hard work: the only difference is the name of one document



2. Context & issue: Tie-breaking bias

Measuring the effectiveness of IRSs
- User-centered vs. system-focused [Spärck Jones & Willett, 1997]
- Evaluation campaigns
  - 1958: Cranfield (UK)
  - 1992: TREC, Text Retrieval Conference (USA)
  - 1999: NTCIR, NII Test Collection for IR Systems (Japan)
  - 2001: CLEF, Cross-Language Evaluation Forum (Europe)
  - …
- “Cranfield” methodology
  - Task
  - Test collection: corpus, topics, qrels
  - Measures: MAP, P@X, ... computed using trec_eval [Voorhees, 2007]


Runs are reordered prior to their evaluation

Qrels = qid, iter, docno, rel
Run = qid, iter, docno, rank, sim, run_id

Submitted run: ( , 0.8), ( , 0.8), ( , 0.5)
Reordering by trec_eval: qid asc, sim desc, docno desc
Reordered run: ( , 0.8), ( , 0.8), ( , 0.5)

Effectiveness measure (MAP, P@X, MRR…) = f(intrinsic_quality, luck)
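
The reordering rule on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not trec_eval's actual code: the `RunLine` type, the `parse_run` and `conventional_order` helpers, the sample document names and the `Q0`/`demo` field values are all hypothetical. Only the six run fields and the sort keys (qid asc, sim desc, docno desc) come from the slide.

```python
from typing import NamedTuple


class RunLine(NamedTuple):
    """The three run fields used by the sort keys on the slide."""
    qid: str
    docno: str
    sim: float


def parse_run(text):
    """Parse whitespace-separated lines: qid, iter, docno, rank, sim, run_id."""
    lines = []
    for raw in text.splitlines():
        if not raw.strip():
            continue
        qid, _iter, docno, _rank, sim, _run_id = raw.split()
        # The submitted rank is dropped: it is not one of the sort keys.
        lines.append(RunLine(qid, docno, float(sim)))
    return lines


def conventional_order(run):
    """Sort by qid asc, sim desc, docno desc using stable multi-pass sorting."""
    ordered = sorted(run, key=lambda r: r.docno, reverse=True)    # least significant key
    ordered = sorted(ordered, key=lambda r: r.sim, reverse=True)
    return sorted(ordered, key=lambda r: r.qid)                   # most significant key


# Hypothetical sample run: two documents tied at 0.8.
SAMPLE = """\
031 Q0 AP1 1 0.8 demo
031 Q0 WSJ1 2 0.8 demo
031 Q0 FT1 3 0.5 demo
"""
reordered = [r.docno for r in conventional_order(parse_run(SAMPLE))]
```

In this toy run the two documents tied at 0.8 swap places: `WSJ1` moves ahead of `AP1` purely because of its name.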



3. Contribution: Reordering strategies

Consequences of run reordering
- Measures of effectiveness for an IRS s (sensitive to document rank)
  - RR(s,t): 1/rank of the first relevant document, for topic t
  - P(s,t,d): precision at document d, for topic t
  - AP(s,t): average precision for topic t
  - MAP(s): mean average precision
- Tie-breaking bias (illustrated with Ellen's and Chris's runs)
  - Is the Wall Street Journal collection more relevant than Associated Press?
  - Problem 1, comparing 2 systems: AP(s1, t) vs. AP(s2, t)
  - Problem 2, comparing 2 topics: AP(s, t1) vs. AP(s, t2)
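
To make the rank sensitivity of these measures concrete, here is a minimal sketch of RR, precision at a cutoff, AP and MAP. The function names and the toy document ids are hypothetical, and `precision_at` takes a rank cutoff `k` rather than a document `d`, which is an assumption made for simplicity. The two rankings differ only in the order of two tied documents.

```python
def reciprocal_rank(ranking, relevant):
    """1 / rank of the first relevant document, or 0.0 if none is retrieved."""
    for rank, docno in enumerate(ranking, start=1):
        if docno in relevant:
            return 1.0 / rank
    return 0.0


def precision_at(ranking, relevant, k):
    """Fraction of relevant documents among the top k."""
    return sum(1 for docno in ranking[:k] if docno in relevant) / k


def average_precision(ranking, relevant):
    """Mean of the precision values at each relevant document retrieved."""
    hits, total = 0, 0.0
    for rank, docno in enumerate(ranking, start=1):
        if docno in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(rankings_by_topic, relevant_by_topic):
    """Mean of AP over all topics."""
    aps = [average_precision(ranking, relevant_by_topic[topic])
           for topic, ranking in rankings_by_topic.items()]
    return sum(aps) / len(aps)


# Hypothetical toy data: d_rel and d_non are tied, so either order is possible.
relevant = {"d_rel"}
lucky = ["d_rel", "d_non", "d_other"]
unlucky = ["d_non", "d_rel", "d_other"]

rr_lucky, rr_unlucky = reciprocal_rank(lucky, relevant), reciprocal_rank(unlucky, relevant)
ap_lucky, ap_unlucky = average_precision(lucky, relevant), average_precision(unlucky, relevant)
```

With a single relevant document, breaking the tie the other way halves both RR and AP in this toy case.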


Alternative unbiased reordering strategies

[Figure: groups of tied (ex aequo) documents in a run]

- Conventional reordering (TREC)
  - Ties sorted Z → A: qid asc, sim desc, docno desc
- Realistic reordering
  - Relevant docs last: qid asc, sim desc, rel asc, docno desc
- Optimistic reordering
  - Relevant docs first: qid asc, sim desc, rel desc, docno desc
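
The three strategies differ only in one extra sort key, which the following sketch tries to show. It is an illustration, not the authors' implementation: the `reorder` function, the tuple layout, the `qrels` dictionary and the document names are hypothetical, and unjudged documents are assumed to have relevance 0.

```python
def reorder(run, qrels, strategy="conventional"):
    """Reorder a run made of (qid, docno, sim) tuples.

    qrels maps (qid, docno) to a relevance value; missing pairs count as 0
    (an assumption of this sketch).
    """
    def rel(line):
        return qrels.get((line[0], line[1]), 0)

    # Stable sorts applied from the least to the most significant key.
    ordered = sorted(run, key=lambda l: l[1], reverse=True)      # docno desc
    if strategy == "realistic":
        ordered = sorted(ordered, key=rel)                       # rel asc: relevant docs last
    elif strategy == "optimistic":
        ordered = sorted(ordered, key=rel, reverse=True)         # rel desc: relevant docs first
    elif strategy != "conventional":
        raise ValueError(f"unknown strategy: {strategy}")
    ordered = sorted(ordered, key=lambda l: l[2], reverse=True)  # sim desc
    return sorted(ordered, key=lambda l: l[0])                   # qid asc


# Hypothetical run: WSJ1 (relevant) and AP1 (not relevant) are tied at 0.8.
run = [("031", "AP1", 0.8), ("031", "WSJ1", 0.8), ("031", "FT1", 0.5)]
qrels = {("031", "WSJ1"): 1}

conventional = [d for _, d, _ in reorder(run, qrels, "conventional")]
realistic = [d for _, d, _ in reorder(run, qrels, "realistic")]
optimistic = [d for _, d, _ in reorder(run, qrels, "optimistic")]
```

Here the conventional order happens to match the optimistic one, because the relevant document has the alphabetically larger name. The realistic order pushes it behind the tied non-relevant document.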



4. Experiments: Impact of the tie-breaking bias

Effect of the tie-breaking bias
- Study of 4 TREC tasks: adhoc, routing, filtering, web
  [Timeline figure: task editions; years marked 1993, 1997, 1998, 1999, 2000, 2002, 2004, 2009]
  - 22 editions
  - 1360 runs
  - 3 GB of data from trec.nist.gov
- Assessing the effect of tie-breaking
  - Proportion of document ties → How frequent is the bias?
  - Effect on measure values
    - Top 3 observed differences
    - Observed difference in %
    - Significance of the observed difference: Student’s t-test (paired, one-tailed)
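
As a rough sketch of the significance test named above, the paired t statistic can be computed from per-topic values obtained under two reordering strategies. The function name and the sample values are hypothetical. Turning the statistic into a one-tailed p-value additionally requires the Student t distribution with n - 1 degrees of freedom, which is left out here.

```python
import math
import statistics


def paired_t_statistic(values_a, values_b):
    """t statistic of a paired Student's t-test on two equal-length samples."""
    if len(values_a) != len(values_b) or len(values_a) < 2:
        raise ValueError("need two samples of equal length, with at least 2 pairs")
    diffs = [a - b for a, b in zip(values_a, values_b)]
    sd = statistics.stdev(diffs)  # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    return statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))


# Hypothetical per-topic AP values for one run under two strategies.
ap_conventional = [0.50, 0.60, 0.70]
ap_realistic = [0.40, 0.40, 0.40]
t_value = paired_t_statistic(ap_conventional, ap_realistic)
```

A large positive `t_value` suggests that the first strategy yields systematically higher values than the second on the same topics.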


Ties demographics
- 89.6% of the runs comprise ties
- Ties are present all along the runs


Proportion of tied documents in submitted runs
- On average, 25.2% of a result list consists of tied documents
- On average, a group of tied documents contains 10.6 documents
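
The two statistics above could be computed per result list along these lines. This is a sketch under an assumption about the counting rule, which the slides do not spell out: a document is taken to be tied when at least one other document in the same list has the same sim value. The function name and sample scores are hypothetical.

```python
from itertools import groupby


def tie_statistics(sims):
    """Return (proportion of tied documents, mean size of tied groups).

    sims holds the similarity scores of one result list for one topic.
    """
    # Sorting puts equal scores next to each other so groupby can count them.
    group_sizes = [len(list(group)) for _, group in groupby(sorted(sims, reverse=True))]
    tied_groups = [size for size in group_sizes if size > 1]
    tied_docs = sum(tied_groups)
    proportion = tied_docs / len(sims) if sims else 0.0
    mean_group_size = tied_docs / len(tied_groups) if tied_groups else 0.0
    return proportion, mean_group_size


# Hypothetical result list: one group of 2 ties and one group of 3 ties.
proportion, mean_group_size = tie_statistics([0.8, 0.8, 0.5, 0.3, 0.3, 0.3])
```

For this toy list, 5 of the 6 documents are tied and the average tied group holds 2.5 documents.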


Effect on Reciprocal Rank (RR)



Effect on Average Precision (AP)



Effect on Mean Average Precision (MAP)

Difference between the system rankings computed on MAP: not significant (Kendall’s τ)
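
For readers unfamiliar with Kendall’s τ, the sketch below compares two system rankings induced by MAP values, for instance MAP under conventional and under realistic reordering. It computes the simple tau-a variant, which does not correct for tied scores. Whether the study used this exact variant is not stated on the slide, and the function name and MAP values are hypothetical.

```python
from itertools import combinations


def kendall_tau_a(scores_a, scores_b):
    """Kendall's tau-a between two {system: score} mappings over the same systems."""
    systems = list(scores_a)
    if len(systems) < 2:
        raise ValueError("need at least two systems")
    concordant = discordant = 0
    for s1, s2 in combinations(systems, 2):
        # A pair is concordant when both measures order the two systems the same way.
        product = (scores_a[s1] - scores_a[s2]) * (scores_b[s1] - scores_b[s2])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    pairs = len(systems) * (len(systems) - 1) // 2
    return (concordant - discordant) / pairs


# Hypothetical MAP values for three systems under two reordering strategies.
map_conventional = {"s1": 0.30, "s2": 0.25, "s3": 0.10}
map_realistic = {"s1": 0.28, "s2": 0.20, "s3": 0.09}
tau = kendall_tau_a(map_conventional, map_realistic)
```

A value of 1.0 means both strategies rank the systems identically, which is the situation the slide describes for MAP.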


What we learnt: Beware of tie-breaking for AP
- Small effect on MAP, larger effect on AP
- Measure bounds: AP_Realistic ≤ AP_Conventional ≤ AP_Optimistic
  [Figure: run padre1, adhoc’94]
- Failure analysis for the ranking process
  - Error bar = element of chance → potential for improvement


Related work in IR evaluation

Topics reliability?
- [Buckley & Voorhees, 2000] 25
- [Voorhees & Buckley, 2002] error rate
- [Voorhees, 2009] n collections

Qrels reliability?
- [Voorhees, 1998] quality
- [Al-Maskari et al., 2008] TREC vs. TREC

[Voorhees, 2007]

Measures reliability?
- [Buckley & Voorhees, 2000] MAP
- [Sakai, 2008] ‘system bias’
- [Moffat & Zobel, 2008] new measures
- [Raghavan et al., 1989] Precall
- [McSherry & Najork, 2008] tied scores
- [Cabanac et al., 2010] tie-breaking bias

Pooling reliability?
- [Zobel, 1998] approximation
- [Sanderson & Joho, 2004] manual
- [Buckley et al., 2007] size adaptation


Outline






21


Conclusions and future work
- Context: IR evaluation
  - TREC and other campaigns based on trec_eval
- Contributions
  - Measure = f(intrinsic_quality, luck) → tie-breaking bias
  - Measure bounds (realistic ≤ conventional ≤ optimistic)
  - Study of the tie-breaking bias effect
    - (conventional, realistic) for RR, AP and MAP
    - Strong correlation, yet significant difference
    - No difference on system rankings (based on MAP)
- Future work
  - Study of other / more recent evaluation campaigns
  - Reordering-free measures
  - Finer-grained analyses: finding vs. ranking


Thank you
