In many domains of information retrieval, system estimates of document relevance are based on multidimensional quality criteria that have to be accommodated in a unidimensional result ranking. Current solutions to this challenge are often inconsistent with the formal probabilistic framework in which constituent scores were estimated, or use sophisticated learning methods that make it difficult for humans to understand the origin of the final ranking. To address these issues, we introduce the use of copulas, a powerful statistical framework for modeling complex multi-dimensional dependencies, to information retrieval tasks. We provide a formal background to copulas and demonstrate their effectiveness on standard IR tasks such as combining multidimensional relevance estimates and fusion of results from multiple search engines. We introduce copula-based versions of standard relevance estimators and fusion methods and show that these lead to significant performance improvements on several tasks, as evaluated on large-scale standard corpora, compared to their non-copula counterparts. We also investigate criteria for understanding the likely effect of using copula models in a given retrieval scenario.
This work, carried out together with Arjen P. de Vries and Kevyn Collins-Thompson, has been accepted for full oral presentation at the 36th Annual International ACM Conference on Research and Development in Information Retrieval (SIGIR) in Dublin, Ireland. The full version of this paper is available at: http://dl.acm.org/citation.cfm?id=2484066
2. Copulas – What is it all about?
• Assume two sufficiently different commodities
• Rare elemental metals
• Pork bellies
• No apparent correlations
[Figure: series for Rare Earths and Pork Bellies, axis from 0 to 6]
3. Copulas – What is it all about?
• Two seemingly independent variables
• Yet, for rare extreme cases, there are co-movements
• “Tail dependencies”
• Copulas decouple observations and dependencies
• IR models are good at estimating marginals
• Copulas are good at combining them
4. Overview
1. Non-linear Dependency Structures in IR
2. Copulas – Intuition & Background
3. Multivariate Relevance Estimation
4. When to use them?
5. Score Fusion
6. Conclusion & Future Directions
6. Multivariate Relevance Modelling
• IR Systems index and retrieve a growing variety of document types
• Many structured, or at least “complex”
• Single-criteria relevance frameworks do not perform well
• Multi-criteria models tend to be either:
a) Naïve (e.g., independence assumption), or,
b) Hard to qualitatively interpret for humans (e.g., L2R)
7. Non-Linear Dependencies
• Non-linear dependency structures are still a challenge
• TREC 2010 Faceted Blog Distillation Task, Topic 1171, “mysql”
• Relevance Criteria:
• Topicality
• Subjectivity
11. Non-Linear Dependencies
• Pearson’s ρ = 0.18
• So, there is no real dependency
• …right?
• In the lower third of the scale, we note ρ = 0.37
• And in the upper third, it turns to ρ = -0.4
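The effect described above, a weak global correlation hiding opposite-signed correlations in the tails, can be checked with a few lines of Python. This is a minimal sketch on synthetic data, not the TREC scores: the helper names `pearson` and `pearson_by_thirds` are hypothetical, and "thirds of the scale" is assumed here to mean thirds of the value range of the first criterion.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def pearson_by_thirds(xs, ys):
    """Global rho plus rho within the lower and upper third of the xs range.

    Assumption: the thirds are taken over the value range of xs.
    """
    lo, hi = min(xs), max(xs)
    cut_low = lo + (hi - lo) / 3
    cut_high = hi - (hi - lo) / 3
    pairs = list(zip(xs, ys))
    lower = [(x, y) for x, y in pairs if x <= cut_low]
    upper = [(x, y) for x, y in pairs if x >= cut_high]
    return {
        "all": pearson(xs, ys),
        "lower": pearson(*zip(*lower)),
        "upper": pearson(*zip(*upper)),
    }

# Synthetic toy data (not from the slides): rising in the lower third,
# flat in the middle, falling in the upper third.
xs = [i / 29 for i in range(30)]
ys = [x if x <= 1 / 3 else (1 - x if x >= 2 / 3 else 0.3) for x in xs]
result = pearson_by_thirds(xs, ys)
```

On this toy data the global coefficient is close to zero while the two tails are strongly correlated with opposite signs, which is the kind of structure a single linear correlation cannot express.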
13. Copulas (from copulare, to join)
• Copulas model complex non-linear dependencies between variables that simple correlations can't capture
• Decouple marginal distributions from dependency structure
• Approximate joint multivariate distributions
• Applied previously in portfolio and risk management, meteorology, river flooding predictions, …
14. Formal Basics
• Given a k-dimensional rv
• Map to unit cube
• Describe joint cdf with copula
• Isolation of a component
• Copula’s zero
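In standard notation, the five points above correspond to the usual copula definitions following Sklar's theorem. The symbols $X$, $F_i$, $U_i$ and $C$ are assumptions chosen for this sketch and are not taken from the slide.

```latex
\begin{align}
X &= (X_1, \dots, X_k), \qquad X_i \sim F_i
  && \text{($k$-dimensional random vector with marginal cdfs $F_i$)} \\
U_i &= F_i(X_i) \in [0, 1]
  && \text{(map to the unit cube)} \\
F(x_1, \dots, x_k) &= C\bigl(F_1(x_1), \dots, F_k(x_k)\bigr)
  && \text{(joint cdf expressed through the copula $C$)} \\
C(1, \dots, 1, u_i, 1, \dots, 1) &= u_i
  && \text{(isolation of a component)} \\
C(u_1, \dots, u_k) &= 0 \quad \text{if } u_i = 0 \text{ for any } i
  && \text{(the copula's zero)}
\end{align}
```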
15. Closing the circle
• Recall the example TREC topic 1171
• Linear combination: AP = 0.14, below collection average (0.25)
• Fit Clayton copula to model joint relevance distribution
• AP rises to 0.22
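For reference, the standard bivariate Clayton copula and its tail-dependence coefficients are given below. The slide does not state which parameterization was fitted, so the form with a single parameter $\theta > 0$ is an assumption.

```latex
\begin{align}
C_\theta(u, v) &= \bigl(u^{-\theta} + v^{-\theta} - 1\bigr)^{-1/\theta},
  \qquad \theta > 0 \\
\lambda_L &= 2^{-1/\theta}, \qquad \lambda_U = 0
\end{align}
```

The Clayton family has lower-tail dependence only, so it emphasizes joint behavior among low scores on both criteria.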
17. Joint Relevance Estimation
• Estimate marginal distributions from data
• Estimate copula fitting parameters to maximize posterior probability of observing data
• Use copula to represent joint probability of relevance
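The three steps above can be sketched in plain Python for the two-dimensional Clayton case. This is a sketch under stated assumptions rather than the authors' implementation: marginals are estimated with the rescaled empirical CDF, the parameter is chosen by a grid search over the likelihood (which matches maximizing the posterior only under a flat prior), and the copula value C(u, v) is used directly as the joint score. All function names and the example scores are hypothetical.

```python
import math

def pseudo_observations(values):
    """Map raw scores into (0, 1) via the rescaled empirical CDF, rank / (n + 1)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0] * n
    for rank, idx in enumerate(order, start=1):
        ranks[idx] = rank
    return [r / (n + 1) for r in ranks]

def clayton_cdf(u, v, theta):
    """Bivariate Clayton copula C(u, v) for theta > 0."""
    return (u ** -theta + v ** -theta - 1.0) ** (-1.0 / theta)

def clayton_log_density(u, v, theta):
    """Log of the Clayton copula density c(u, v) for theta > 0."""
    s = u ** -theta + v ** -theta - 1.0
    return (math.log1p(theta)
            - (theta + 1.0) * (math.log(u) + math.log(v))
            - (2.0 + 1.0 / theta) * math.log(s))

def fit_clayton(us, vs, grid=None):
    """Choose theta from a grid by maximum likelihood (a simple stand-in optimizer)."""
    grid = grid or [0.05 * i for i in range(1, 201)]  # theta in (0, 10]
    def log_likelihood(theta):
        return sum(clayton_log_density(u, v, theta) for u, v in zip(us, vs))
    return max(grid, key=log_likelihood)

# Toy scores for eight documents (made up for illustration).
topicality = [0.10, 0.35, 0.20, 0.80, 0.55, 0.90, 0.05, 0.60]
subjectivity = [0.15, 0.30, 0.10, 0.70, 0.65, 0.85, 0.20, 0.40]

us = pseudo_observations(topicality)      # step 1: marginals
vs = pseudo_observations(subjectivity)
theta = fit_clayton(us, vs)               # step 2: copula parameter
joint = [clayton_cdf(u, v, theta) for u, v in zip(us, vs)]  # step 3: joint score
```

Ranking documents by `joint` then yields a single result list from the two criteria. A grid search keeps the sketch dependency-free; a numerical optimizer would be the more usual choice in practice.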
18. Joint Relevance Estimation
• We study three different scenarios:
• Opinionated blog posts
• Personalized bookmarks
• Child-friendly websites
• Use original training portion of the corpora where available
• A 90/10 split otherwise
19. Results I – Opinionated Blog Posts
• TREC Blogs08 dataset
• 1.3 M documents
• Relevance dimensions: Topicality & Subjectivity
• Significantly higher performance than linear combination model
20. Results II – Personalized Bookmarks
• Dataset by Vallet & Castells
• 339k documents
• Relevance Dimensions: Topicality & Personal relevance
• Significant performance gains in some metrics
21. Results III – Child-friendly Websites
• Dataset from the PuppyIR project (http://puppyir.eu)
• 22k documents
• Relevance Dimensions: Topicality & Child-suitability
• Worse-than-baseline performance
23. When to use them?
• Previously: Strongly varying performance for different settings
• Is there a way of predicting the merit?
• Recall: copulas model tail dependencies between dimensions
25. Measuring Tail Dependencies
• According to Frees and Valdez 1998: IL and IU measure strength of lower and upper tail dependencies
• Anderson-Darling test of goodness-of-fit between copula and observed data

Domain                   Frees tail index   Anderson-Darling   Actual retrieval performance
Opinionated Blogs        IL = 0.07          0.67               Copulas > linear
Personalized Bookmarks   IU = 0.49          0.47               Copulas = linear
Child-friendly Websites  IL = IU = 0        0.046              Copulas < linear
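To get a rough feel for tail dependence in a given collection, a simple empirical estimate can be computed from rank-transformed scores. Note that this is a swapped-in, simpler diagnostic and not the Frees and Valdez index reported in the table: it estimates P(U ≤ q, V ≤ q) / q for the lower tail and the mirrored quantity for the upper tail at a fixed quantile q. The function name and the synthetic data are hypothetical.

```python
def empirical_tail_dependence(us, vs, q=0.1):
    """Return (lower, upper) empirical tail-dependence estimates at quantile q.

    us and vs are expected to be pseudo-observations in (0, 1).
    lower = P(U <= q, V <= q) / q
    upper = P(U > 1 - q, V > 1 - q) / q
    """
    n = len(us)
    lower_hits = sum(1 for u, v in zip(us, vs) if u <= q and v <= q)
    upper_hits = sum(1 for u, v in zip(us, vs) if u > 1 - q and v > 1 - q)
    return lower_hits / (n * q), upper_hits / (n * q)

# Synthetic example: the lowest 10% of both criteria move together,
# while the remaining observations are paired in reverse order.
us = [(i + 0.5) / 100 for i in range(100)]
vs = us[:10] + list(reversed(us[10:]))
lower, upper = empirical_tail_dependence(us, vs, q=0.1)
```

Values near 1 indicate that extreme scores on one criterion almost always coincide with extreme scores on the other; values near 0 indicate no co-movement in that tail.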
27. Score Fusion
• A different angle on relevance estimation
• Combine individual retrieval system scores instead of modelling relevance from content criteria
• In this setting, submissions to historic TRECs serve as criteria
• We randomly draw k individual runs and combine them using copulas
29. Results – TREC 4
• Results are averaged across 200 randomizations per setting of k
• Relative improvements over the best, worst and median fused run in terms of percentages of MAP
• Small but consistent improvements over non-copula fusion baselines
30. Robustness - CombSUM
• Fusion approaches are often sensitive to weak contributions
• We control the number of weak submissions added to the fusion
• Copulas’ explicit modeling of dependency structure is more robust
31. Robustness - CombMNZ
• Fusion approaches are often sensitive to weak contributions
• We control the number of weak submissions added to the fusion
• Copulas’ explicit modeling of dependency structure is more robust
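For readers unfamiliar with the two non-copula baselines named in the robustness slides, CombSUM and CombMNZ can be sketched as follows. This covers the baselines only; the copula-based variants are not specified in this text and are therefore not shown. Min-max normalization per run is an assumption, and the function names and example runs are hypothetical.

```python
from collections import defaultdict

def min_max_normalize(run):
    """Rescale one run's scores (dict: doc_id -> score) to [0, 1]."""
    lo, hi = min(run.values()), max(run.values())
    span = hi - lo
    return {doc: (score - lo) / span if span else 0.0
            for doc, score in run.items()}

def comb_sum(runs):
    """CombSUM: sum of the normalized scores a document receives across runs."""
    fused = defaultdict(float)
    for run in runs:
        for doc, score in min_max_normalize(run).items():
            fused[doc] += score
    return dict(fused)

def comb_mnz(runs):
    """CombMNZ: CombSUM multiplied by the number of runs that retrieved the document."""
    sums = comb_sum(runs)
    hits = defaultdict(int)
    for run in runs:
        for doc in run:
            hits[doc] += 1
    return {doc: sums[doc] * hits[doc] for doc in sums}

# Two toy runs (made up for illustration).
run_a = {"d1": 3.0, "d2": 1.0, "d3": 2.0}
run_b = {"d1": 0.2, "d3": 0.8, "d4": 0.5}
fused_sum = comb_sum([run_a, run_b])
fused_mnz = comb_mnz([run_a, run_b])
ranking = sorted(fused_mnz, key=fused_mnz.get, reverse=True)
```

Both baselines add up whatever each run contributes, which is why a handful of weak runs can shift the fused ranking noticeably.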
33. Conclusion
• Copulas decouple observations and dependencies
• IR models are good at estimating marginals
• Copulas are good at combining them
• We use them for multivariate relevance estimation
• Strongly scenario-dependent performance
• Tail indices & goodness of fit tests as estimators of expected performance
• Copulas for score fusion
• Robust to outliers
34. The Road Ahead
• Currently, we use single copulas for relevance modelling
• Copula mixtures and composite Archimedean copulas for higher accuracy
• Here, we use pre-existing copula families and fit them to data
• Instead, can we formalize copulas from scratch to include domain knowledge?
• So far, we explored two-dimensional relevance spaces
• What happens as we move into higher-order systems?