2. Traditional robustness
Definition of contamination
Simple examples
Weighted representation
Independent Contamination
The Idea
Why traditional robust estimates don’t work
Naive approaches
Cell-weighting approach
3. The Problem (aka Disclaimer) and Terminology
Estimation of the mean vector $\mu$ and covariance matrix $\Sigma$ of a
supposedly i.i.d. multivariate sample $x_1, \dots, x_n \in \mathbb{R}^p$.
Data matrix
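\[
X = \begin{pmatrix} x_1^\top \\ x_2^\top \\ \vdots \\ x_n^\top \end{pmatrix}
  = \begin{pmatrix}
      x_{11} & x_{12} & \dots  & x_{1p} \\
      x_{21} & x_{22} & \dots  & x_{2p} \\
      \vdots & \vdots & \ddots & \vdots \\
      x_{n1} & x_{n2} & \dots  & x_{np}
    \end{pmatrix}
\]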
Vectors $x_i \in \mathbb{R}^p$: data cases
Values $x_{ij} \in \mathbb{R}$: data values or cells
4. Types of error in Statistics
1. Usual statistical error.
Every observation is moderately affected
$X_{obs} = X_{mean} + e$, with $e \sim N(0, \sigma^2)$,
where the variance of $e$ determines the quality of the data.
2. Contamination.
Some observations are ruined:
\[
X_{obs} = \begin{cases}
  X_{good},     & \text{usually} \\
  X_{horrible}, & \text{sometimes.}
\end{cases}
\]
Typically comes on top of the usual error:
$X_{good} = X_{mean} + e$.
5. Mixture contamination model
Observed data come from the mixture distribution
\[
F = (1 - \varepsilon) F_0(\theta) + \varepsilon H
\]
$F_0(\theta)$ is the distribution of interest
$H$ is an arbitrary unknown nuisance distribution.
Equivalently
\[
X = (1 - B) X_{good} + B X_{horrible},
\]
where $B$ is a Bernoulli($\varepsilon$) indicator.
Estimate $T(F)$: feed data from $F$, obtain estimates for $\theta$.
Breakdown point
\[
\varepsilon_{BP}(T) = \sup\Big\{ \varepsilon : \sup_{H} T(F(\theta, \varepsilon, H)) < \infty \Big\},
\]
that is, the largest $\varepsilon$ for which $T$ can still isolate $F_0$ from $H$.
Maximum achievable (and desirable):
$\varepsilon_{BP}(T) \le 0.5$.
6. Examples: simple robust estimates
Location
Median: $x_{(n/2)}$
Trimmed mean: $\frac{1}{n(1-\delta)} \sum_{i=n\delta/2}^{n(1-\delta/2)} x_{(i)}$, with $\delta \in (0, 1)$.
Scale
MAD: $\operatorname{Median}_i \, |x_i - \operatorname{Median}_j \, x_j|$
IQR: $x_{(3n/4)} - x_{(n/4)}$
Regression
LMS: $\arg\min_{\beta} \operatorname{Median}_i \, (y_i - \beta^\top x_i)^2$
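The location and scale estimates above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function names are hypothetical, the order-statistic indices use a simple truncation convention, the MAD carries no consistency factor, and the IQR is taken as the upper quartile minus the lower quartile.

```python
from statistics import median


def trimmed_mean(xs, delta):
    """Mean of the sample after dropping a fraction delta/2 from each tail."""
    s = sorted(xs)
    n = len(s)
    k = int(n * delta / 2)      # points trimmed per tail (assumed convention)
    kept = s[k:n - k]
    return sum(kept) / len(kept)


def mad(xs):
    """Median absolute deviation from the median (no consistency factor)."""
    m = median(xs)
    return median(abs(x - m) for x in xs)


def iqr(xs):
    """Upper quartile minus lower quartile, from simple order statistics."""
    s = sorted(xs)
    n = len(s)
    return s[(3 * n) // 4] - s[n // 4]


data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 1000.0]  # one gross outlier

print(sum(data) / len(data))     # 128.5, the sample mean is dragged away
print(median(data))              # 4.5
print(trimmed_mean(data, 0.25))  # 4.5
print(mad(data))                 # 2.0
print(iqr(data))                 # 4.0
```

A single gross outlier moves the sample mean far from the bulk of the data, while the robust estimates stay put.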
7. Examples: multivariate robust estimates
Minimum Covariance Determinant (MCD) by Rousseeuw (1985):
minimize determinant of sample covariance of 50% of data points:
[Figure: plot with legend entries "Sample Covariance", "MCD", and "Clean"]
8. Weighted representation
Many robust estimates can be represented as weighted versions of
familiar estimates
\[
\hat\mu = \frac{\sum_{i=1}^{n} w_i x_i}{\sum_{i=1}^{n} w_i},
\qquad
\hat\Sigma = \frac{\sum_{i=1}^{n} w_i (x_i - \hat\mu)(x_i - \hat\mu)^\top}{\sum_{i=1}^{n} w_i},
\]
with weights depending on the estimates themselves
\[
w_i = w(\mathrm{MD}(x_i; \hat\mu, \hat\Sigma)),
\]
where Mahalanobis Distances are given by
\[
\mathrm{MD}(x_i; \hat\mu, \hat\Sigma) = (x_i - \hat\mu)^\top \hat\Sigma^{-1} (x_i - \hat\mu).
\]
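One reweighting step of this representation can be sketched as follows. Everything named here is an assumption made for illustration: the hard-rejection weight function, its cutoff of 5.99 (roughly the 95% chi-square quantile with 2 degrees of freedom), the restriction of the distance to two dimensions, and the starting values, which in practice would come from an initial robust estimate.

```python
def weighted_mean_cov(xs, ws):
    """Weighted mean vector and covariance matrix of p-dimensional points."""
    p = len(xs[0])
    sw = sum(ws)
    mu = [sum(w * x[k] for w, x in zip(ws, xs)) / sw for k in range(p)]
    cov = [[sum(w * (x[a] - mu[a]) * (x[b] - mu[b]) for w, x in zip(ws, xs)) / sw
            for b in range(p)] for a in range(p)]
    return mu, cov


def md(x, mu, cov):
    """(x - mu)' cov^{-1} (x - mu) for a 2x2 covariance matrix."""
    det = cov[0][0] * cov[1][1] - cov[0][1] * cov[1][0]
    d0, d1 = x[0] - mu[0], x[1] - mu[1]
    return (cov[1][1] * d0 * d0 - 2 * cov[0][1] * d0 * d1 + cov[0][0] * d1 * d1) / det


def hard_rejection_weight(d, cutoff=5.99):
    """Hypothetical weight function: 1 inside the cutoff, 0 outside."""
    return 1.0 if d <= cutoff else 0.0


def reweight_step(xs, mu, cov):
    """Compute weights from the current estimates, then update the estimates."""
    ws = [hard_rejection_weight(md(x, mu, cov)) for x in xs]
    return weighted_mean_cov(xs, ws), ws


# A 3x3 grid of clean points plus one gross outlier.
grid = [(a, b) for a in (-1.0, 0.0, 1.0) for b in (-1.0, 0.0, 1.0)]
xs = grid + [(10.0, 10.0)]

# Assumed starting values standing in for an initial robust estimate.
(mu, cov), ws = reweight_step(xs, [0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])

print(ws[-1])  # 0.0, the outlier receives zero weight
print(mu)      # [0.0, 0.0]
print(cov)     # diagonal entries 2/3, off-diagonal entries 0
```

Because the weights depend on the estimates and the estimates on the weights, a full procedure would repeat `reweight_step` until the weights stop changing.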
9. Contaminated cells not cases
[Figure: two panels, "Traditional Contamination" and "Independent Contamination", with ε = 10%]
10. Generalized Contamination
Data entry errors, hardware malfunctions, etc.
Can be expressed as
$X_j = (1 - B_j)(X_{Good})_j + B_j (X_{Horrible})_j$, for $j = 1, \dots, p$,
or, in matrix form, as
$X = (1 - B) X_{Good} + B X_{Horrible}$,
where $B = (B_1, \dots, B_p)$ is a vector of Bernoulli r.v.'s (products taken componentwise)
The dependence structure of $B$ is important
We will assume Independent Contamination: all $B_j$ are
independent of each other and of the X's.
Also: $P[B_j = 1] = \varepsilon$ for simplicity.
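The Independent Contamination model can be simulated directly from the componentwise formula. The sketch below is illustrative only: the function name, the sample sizes, the seed, and the choice of 100.0 as the "horrible" value are all assumptions, not part of the model.

```python
import random


def contaminate_cells(x_good, x_horrible, eps, rng):
    """Apply X_j = (1 - B_j) * good_j + B_j * horrible_j, B_j ~ Bernoulli(eps) i.i.d."""
    out, flags = [], []
    for good_row, bad_row in zip(x_good, x_horrible):
        b = [1 if rng.random() < eps else 0 for _ in good_row]
        out.append([(1 - bj) * g + bj * h for bj, g, h in zip(b, good_row, bad_row)])
        flags.append(b)
    return out, flags


rng = random.Random(0)          # fixed seed so the run is reproducible
n, p, eps = 2000, 5, 0.1

x_good = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
x_horrible = [[100.0] * p for _ in range(n)]   # hypothetical nuisance values

x, b = contaminate_cells(x_good, x_horrible, eps, rng)

cell_rate = sum(map(sum, b)) / (n * p)                        # close to eps
clean_case_rate = sum(1 for row in b if sum(row) == 0) / n    # close to (1 - eps)^p

print(cell_rate, clean_case_rate, (1 - eps) ** p)
```

Only about 10% of the cells are contaminated, yet the share of fully clean cases is much smaller than 90%, since one bad cell is enough to spoil a case.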
11. Number of clean cases
Each contaminated case will appear as an outlier if diagnosed with MDs
$P[\text{case is clean}] = (1 - \varepsilon)^p$
e.g. with ε = 0.05 and p = 20, only about 36% of cases are clean
waste of data
the fraction of contaminated cases (about 64%) exceeds the breakdown point of traditional robust estimates.
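The probability of a clean case is easy to tabulate. The short sketch below (helper names are hypothetical) evaluates $(1 - \varepsilon)^p$ for several dimensions and finds the smallest $p$ at which fewer than half of the cases are expected to be clean, which is where a 50% breakdown point is no longer enough.

```python
def prob_clean_case(eps, p):
    """P[case is clean] = (1 - eps)^p under independent cell contamination."""
    return (1.0 - eps) ** p


def smallest_p_below_half(eps):
    """Smallest p with fewer than half of the cases expected clean (needs 0 < eps < 1)."""
    p = 1
    while prob_clean_case(eps, p) >= 0.5:
        p += 1
    return p


for p in (2, 5, 10, 20, 50):
    print(p, prob_clean_case(0.05, p))

print(smallest_p_below_half(0.05))  # 14
```

With ε = 0.05 the clean fraction is roughly 0.36 at p = 20, and it already falls below one half at p = 14.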
12. Affine-equivariance
Definition: if data set $Y = A + XB$ (with $A$ the same shift applied to every case), then
\[
\hat\mu(Y) = A + B^\top \hat\mu(X),
\qquad
\hat\Sigma(Y) = B^\top \hat\Sigma(X) B.
\]
Desirable: easy to study, etc.; if we know how an estimate behaves on X,
then we know how it behaves on Y, and vice versa
Most "respectable" robust estimates are affine-equivariant (A-E)
Alqallaf et al (2009) prove that reasonable A-E
estimates cannot be robust against Independent Contamination (IC)
13. Affine Transformation of Contaminated Data
[Figure: three panels, "Original", "Contaminated", and "Transformed", illustrating X → Y = XB]
14. Pairwise approach
$P[\text{pair of variables is clean}] = (1 - \varepsilon)^2 \gg (1 - \varepsilon)^p$
Estimate all elements $\hat\Sigma_{ab}$, for $a, b = 1, \dots, p$, separately
Problem: multivariate structure is damaged/destroyed
Particular problem: may not be positive-definite.
May or may not be a problem. Usually is.
Studied to some extent by Alqallaf (2003, PhD thesis)
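The loss of positive-definiteness is easy to reproduce. The sketch below is an illustration under stated assumptions, not the estimator studied in the cited work: it treats flagged cells as unavailable (NaN), computes each Pearson correlation from the cases where both cells are available, and uses a small constructed data set in which every pair of variables is observed on a different group of cases.

```python
from math import isnan, sqrt

NA = float("nan")   # stands for a cell that was flagged and dropped


def pairwise_corr(rows, a, b):
    """Pearson correlation of variables a and b over cases where both cells are available."""
    pairs = [(r[a], r[b]) for r in rows if not (isnan(r[a]) or isnan(r[b]))]
    n = len(pairs)
    ma = sum(x for x, _ in pairs) / n
    mb = sum(y for _, y in pairs) / n
    sab = sum((x - ma) * (y - mb) for x, y in pairs)
    saa = sum((x - ma) ** 2 for x, _ in pairs)
    sbb = sum((y - mb) ** 2 for _, y in pairs)
    return sab / sqrt(saa * sbb)


rows = [
    [1.0, 1.0, NA], [2.0, 2.0, NA], [3.0, 3.0, NA],   # variables 1 and 2 available
    [NA, 1.0, 1.0], [NA, 2.0, 2.0], [NA, 3.0, 3.0],   # variables 2 and 3 available
    [1.0, NA, 3.0], [2.0, NA, 2.0], [3.0, NA, 1.0],   # variables 1 and 3 available
]
p = 3
R = [[pairwise_corr(rows, a, b) for b in range(p)] for a in range(p)]

# A matrix is positive semi-definite only if v' R v >= 0 for every vector v.
v = [1.0, -1.0, 1.0]
quad = sum(v[a] * R[a][b] * v[b] for a in range(p) for b in range(p))

print(R)     # [[1.0, 1.0, -1.0], [1.0, 1.0, 1.0], [-1.0, 1.0, 1.0]]
print(quad)  # -3.0
```

Each entry of `R` is a perfectly valid correlation on its own, but the entries contradict each other jointly, so the assembled matrix has a negative quadratic form.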
15. Detecting cells
Some are obvious: univariate outliers
Some only show up with respect to other cells: structural
outliers
Van Aelst et al (2009) use Stahel-Donoho projections
Little and Smith (1987) used partial Mahalanobis distances:
if $\mathrm{MD}(x; \hat\mu, \hat\Sigma)$ is large,
consider $\mathrm{MD}(x_{-j}; \hat\mu, \hat\Sigma)$ for all $j = 1, \dots, p$.
Mike explores MD-approach and iterative estimation of
covariances in his thesis.
16. Weighted estimate with cell weights
Van Aelst et al (2009) proposed a weighted estimate, but it is
pairwise and not symmetric positive-definite (SPD)
Mike knows how to deal with zero weights: remove the values
and treat them as MCAR. Then do MLE via EM, for example.
A proper cell-weighted estimate is still to be developed.