4. Markov Random Field Models for Computer Vision
Model:
- discrete or continuous variables?
- discrete or continuous space?
- dependence between variables?
- …
Inference:
- Graph Cut (GC)
- Belief Propagation (BP)
- Tree-Reweighted Message Passing (TRW)
- Iterated Conditional Modes (ICM)
- Cutting-plane
- Dual-decomposition
- …
Learning:
- Exhaustive search (grid search)
- Pseudo-likelihood approximation
- Training in pieces
- Max-margin
- …
Applications:
- 2D/3D image segmentation
- Object recognition
- 3D reconstruction
- Stereo matching
- Image denoising
- Texture synthesis
- Pose estimation
- Panoramic stitching
- …
5. Recap: Image Segmentation
Input: image z
Maximum-a-posteriori (MAP): x* = argmax_x P(x|z) = argmin_x E(x)
P(x|z) ~ P(z|x) P(x)   (posterior; likelihood; prior)
P(x|z) ~ exp{-E(x)}    (Gibbs distribution)
E: {0,1}^n → R
E(x) = ∑i θi(xi) + w ∑(i,j)∈N4 θij(xi,xj)   (energy: unary term + pairwise term)
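To make the energy concrete: a minimal sketch (mine, not from the tutorial) that evaluates E(x) for a binary labelling on a 4-connected grid, assuming the Ising-style pairwise term |xi-xj|; all names are illustrative.

    import numpy as np

    def energy(x, unary, w=1.0):
        """E(x) = sum_i theta_i(x_i) + w * sum_{(i,j) in N4} |x_i - x_j|.
        x:     (H, W) labelling with entries in {0, 1}
        unary: (H, W, 2) array, unary[i, j, k] = theta_i(x_i = k)
        The Ising term |x_i - x_j| is one concrete choice of theta_ij."""
        h, wd = x.shape
        e = unary[np.arange(h)[:, None], np.arange(wd)[None, :], x].sum()
        e += w * np.abs(np.diff(x, axis=0)).sum()   # vertical N4 edges
        e += w * np.abs(np.diff(x, axis=1)).sum()   # horizontal N4 edges
        return e

    rng = np.random.default_rng(0)
    unary = rng.random((4, 4, 2))                   # toy unary costs
    print(energy(np.zeros((4, 4), dtype=int), unary, w=2.0))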
6. Min-Marginals
(uncertainty of the MAP solution)
Definition: ψv;i = min_{x: xv=i} E(x)
[Figure: input image, MAP solution, and min-marginals of the foreground label (bright = very certain)]
Can be used in several ways:
• Insights into the model
• For optimization (TRW, comes later)
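The definition can be made concrete by brute force on a toy problem: the sketch below (illustrative only; the 3-variable chain energy is hypothetical) enumerates all labellings to obtain ψv;i, whereas real systems use message passing such as TRW.

    import itertools
    import numpy as np

    def min_marginals(energy_fn, n):
        """psi[v, i] = min over all x in {0,1}^n with x_v = i of E(x).
        Brute force -- only feasible for tiny n."""
        psi = np.full((n, 2), np.inf)
        for x in itertools.product([0, 1], repeat=n):
            e = energy_fn(x)
            for v, label in enumerate(x):
                psi[v, label] = min(psi[v, label], e)
        return psi

    # Hypothetical 3-variable chain energy: unaries + Ising pairwise terms
    E = lambda x: 0.4*x[0] + 0.9*(1 - x[1]) + 0.3*x[2] \
                  + abs(x[0] - x[1]) + abs(x[1] - x[2])
    print(min_marginals(E, 3))   # rows: variables; columns: labels 0/1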
7. Introducing Factor Graphs
Write probability distributions as graphical models:
- Directed graphical models
- Undirected graphical models (… what Andrew Blake used)
- Factor graphs
References:
- Pattern Recognition and Machine Learning [Bishop '06, book, chapter 8]
- several lectures at the Machine Learning Summer School 2009 (see video lectures)
8. Factor Graphs
P(x) ~ θ(x1,x2,x3) θ(x2,x4) θ(x3,x4) θ(x3,x5)   "4 factors"
P(x) ~ exp{-E(x)}   (Gibbs distribution)
E(x) = θ(x1,x2,x3) + θ(x2,x4) + θ(x3,x4) + θ(x3,x5)
[Factor graph over the unobserved/latent/hidden variables x1…x5: a factor node connects exactly the variables that are in the same factor]
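Since a factor graph directly reflects the given energy, it can be stored as nothing more than a list of (scope, table) pairs. A sketch of the four factors above, with made-up random tables:

    import numpy as np

    # Factor graph of slide 8, variables x1..x5 (0-indexed here).
    # Each factor is (variable indices, energy table over those variables).
    rng = np.random.default_rng(1)
    factors = [
        ((0, 1, 2), rng.random((2, 2, 2))),  # theta(x1,x2,x3), arity 3
        ((1, 3),    rng.random((2, 2))),     # theta(x2,x4)
        ((2, 3),    rng.random((2, 2))),     # theta(x3,x4)
        ((2, 4),    rng.random((2, 2))),     # theta(x3,x5)
    ]

    def E(x):
        """E(x) = sum of factor energies; P(x) ~ exp{-E(x)} (Gibbs)."""
        return sum(table[tuple(x[v] for v in scope)]
                   for scope, table in factors)

    x = (0, 1, 1, 0, 1)
    print(E(x), np.exp(-E(x)))   # energy and unnormalized probability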
9. Definition "Order"
Definition "Order": the arity (number of variables) of the largest factor.
P(x) ~ θ(x1,x2,x3) θ(x2,x4) θ(x3,x4) θ(x3,x5)
(θ(x1,x2,x3) has arity 3; the remaining factors have arity 2)
[Figure: the corresponding factor graph, with order 3]
10. Examples - Order
4-connected pairwise MRF:  E(x) = ∑(i,j)∈N4 θij(xi,xj)   (order 2, "pairwise energy")
Higher(8)-connected pairwise MRF:  E(x) = ∑(i,j)∈N8 θij(xi,xj)   (order 2, "pairwise energy")
Higher-order MRF:  E(x) = ∑(i,j)∈N4 θij(xi,xj) + θ(x1,…,xn)   (order n, "higher-order energy")
11. Example: Image segmentation
P(x|z) ~ exp{-E(x)}
E(x) = ∑i θi(xi,zi) + ∑(i,j)∈N4 θij(xi,xj)
[Factor graph: observed variables zi (image data) enter the unary factors of the unobserved (latent) variables xi; pairwise factors link neighbouring xi, xj]
13. Simplest inference technique: ICM (Iterated Conditional Modes)
Goal: x* = argmin_x E(x)
Fix all variables except one (here x1) and minimize over it; only the terms of its Markov blanket matter:
E(x) = θ12(x1,x2) + θ13(x1,x3) + θ14(x1,x4) + θ15(x1,x5) + …
[Figure: grid with centre variable x1 and its neighbours x2…x5; filled circles mean observed. Results: ICM vs. global minimum]
ICM can get stuck in local minima!
Simulated Annealing: accept a move even if the energy increases (with a certain probability).
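A minimal ICM sketch for a binary 4-connected grid (assuming an Ising pairwise term; the names are mine): each step conditions one pixel on its Markov blanket and greedily takes the cheaper label, which is exactly why it can get stuck in local minima.

    import numpy as np

    def icm(unary, w=1.0, sweeps=10):
        """Iterated Conditional Modes for
        E(x) = sum_i theta_i(x_i) + w * sum_{(i,j) in N4} |x_i - x_j|."""
        h, wd = unary.shape[:2]
        x = unary.argmin(axis=2)              # per-pixel (WTA) initialization
        for _ in range(sweeps):
            for i in range(h):
                for j in range(wd):
                    nbrs = [x[a, b] for a, b in
                            ((i-1, j), (i+1, j), (i, j-1), (i, j+1))
                            if 0 <= a < h and 0 <= b < wd]
                    # local energy of pixel (i, j) for each label, rest fixed
                    cost = [unary[i, j, k] + w * sum(abs(k - n) for n in nbrs)
                            for k in (0, 1)]
                    x[i, j] = int(np.argmin(cost))
        return x

    rng = np.random.default_rng(0)
    print(icm(rng.random((5, 5, 2)), w=0.5))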
15. Stereo matching
[Figure: left image (a), right image (b), ground-truth depth; disparities range from d=0 to d=4]
• Images rectified
• Ignore occlusion for now
Energy: E(d): {0,…,D-1}^n → R
Labels: di (depth/shift)
16. Stereo matching - Energy
Energy: E(d): {0,…,D-1}^n → R
E(d) = ∑i θi(di) + ∑(i,j)∈N4 θij(di,dj)
Unary: θi(di) = |li - r(i-di)|   ("SAD: sum of absolute differences"; many others possible, NCC, …)
[Figure: pixel i in the left image matched against pixel i-di in the right image; here di = 2]
Pairwise: θij(di,dj) = g(|di-dj|)
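The SAD unary is usually computed for all pixels and disparities at once as a cost volume. A sketch, assuming rectified grayscale images stored as plain arrays (function and variable names are mine):

    import numpy as np

    def sad_cost_volume(left, right, D):
        """Unary stereo costs theta_i(d) = |l_i - r_{i-d}| for d = 0..D-1.
        left, right: (H, W) grayscale images; returns an (H, W, D) volume."""
        h, w = left.shape
        vol = np.full((h, w, D), np.inf)
        for d in range(D):
            vol[:, d:, d] = np.abs(left[:, d:] - right[:, :w - d])
        return vol

    left = np.random.default_rng(0).random((4, 8))
    right = np.roll(left, 2, axis=1)        # toy pair with true disparity 2
    wta = sad_cost_volume(left, right, D=4).argmin(axis=2)
    print(wta)   # pixel-independent (WTA) depth; no pairwise term yet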
17. Stereo matching - prior
θij(di,dj) = g(|di-dj|)
[Plot: cost g as a function of |di-dj|; linear, no truncation (global minimum computable)]
[Olga Veksler, PhD thesis; Daniel Cremers et al.]
18. Stereo matching - prior
θij(di,dj) = g(|di-dj|)
[Plot: cost g as a function of |di-dj|; without truncation (global minimum computable) and with truncation (NP-hard optimization)]
Discontinuity preserving potentials [Blake & Zisserman '83, '87]
[Olga Veksler, PhD thesis; Daniel Cremers et al.]
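The two priors differ only in whether g is truncated; a tiny illustration (the parameters lam and tau are my placeholders):

    def g_linear(delta, lam=1.0):
        """No truncation: convex, globally optimizable, but over-smooths
        depth edges."""
        return lam * abs(delta)

    def g_truncated(delta, lam=1.0, tau=2):
        """Truncated linear: discontinuity preserving; caps the penalty so
        large depth jumps are not over-penalized (NP-hard to optimize)."""
        return lam * min(abs(delta), tau)

    print([g_linear(d) for d in range(5)])      # 0, 1, 2, 3, 4
    print([g_truncated(d) for d in range(5)])   # 0, 1, 2, 2, 2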
19. Stereo matching
see http://vision.middlebury.edu/stereo/
[Results: no MRF, pixel-independent (WTA); no horizontal links, efficient since independent chains; pairwise MRF [Boykov et al. '01]; ground truth]
20. Texture synthesis
[Figure: input texture and synthesized output. Good case: neighbouring output pixels i, j taken from sources a and b agree along the seam; bad case: they disagree]
E: {0,1}^n → R
E(x) = ∑(i,j)∈N4 |xi-xj| [ |ai-bi| + |aj-bj| ]
(xi ∈ {0,1} selects which source image, a or b, pixel i is copied from)
[Kwatra et al., SIGGRAPH '03]
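Reading the energy as a seam cost: an N4 edge is only paid where the labelling switches source (|xi-xj| = 1), and it is cheap where the two source images agree. A sketch of evaluating it (array names are mine):

    import numpy as np

    def seam_energy(x, a, b):
        """E(x) = sum_{(i,j) in N4} |x_i - x_j| * (|a_i - b_i| + |a_j - b_j|).
        x in {0,1} says which source image (a or b) each pixel is copied from;
        label changes are only cheap where the two sources agree."""
        d = np.abs(a - b)
        e = (np.abs(np.diff(x, axis=1)) * (d[:, :-1] + d[:, 1:])).sum()
        e += (np.abs(np.diff(x, axis=0)) * (d[:-1, :] + d[1:, :])).sum()
        return e

    rng = np.random.default_rng(0)
    a, b = rng.random((6, 6)), rng.random((6, 6))
    x = np.tile(np.arange(6) > 2, (6, 1)).astype(int)  # seam at column 3
    print(seam_energy(x, a, b))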
25. Recap: 4-connected MRFs
• A lot of useful vision systems are based on 4-connected pairwise MRFs.
• Possible reason (see inference part): a lot of fast and good (globally optimal) inference methods exist.
27. Why larger connectivity?
We have seen…
• "Knock-on" effect (each pixel influences every other pixel)
• Many good systems
What is missing:
1. Modelling real-world texture (images)
2. Reduce discretization artefacts
3. Encode complex prior knowledge
4. Use non-local parameters
28. Reason 1: Texture modelling
[Figure: training images; test image; test image (60% noise). Result panels: 4-connected MRF, 4-connected MRF, 9-connected MRF (7 attractive; 2 repulsive neighbours)]
30. Reason 2: Discretization artefacts
[Figure: two example paths; lengths under the Euclidean / 4-connected / 8-connected metrics: 5.65 / 6.28 / 5.08 and 8 / 6.28 / 6.75]
Larger connectivity can model the true Euclidean length (other metrics are also possible).
[Boykov et al. '03, '05]
31. Reason 2: Discretization artefacts
[Results: 4-connected Euclidean; 8-connected Euclidean; 8-connected geodesic]
Higher connectivity can model the true Euclidean length. [Boykov et al. '03, '05]
33. Reason 3: Encode complex prior knowledge: Stereo with occlusion
E(d): {1,…,D}^(2n) → R
Each pixel is connected to the D pixels in the other image it could match.
[Figure: pairwise factor θlr(dl,dr) between a left-view disparity dl and a right-view disparity dr, with entries such as d=1 (∞ cost), d=10 (match), d=20 (0 cost)]
34. Stereo with occlusion
[Results: ground truth; stereo with occlusion [Kolmogorov et al. '02]; stereo without occlusion [Boykov et al. '01]]
35. Reason 4: Use Non-local parameters: Interactive Segmentation (GrabCut)
[Results: interactive segmentation [Boykov and Jolly '01]; GrabCut [Rother et al. '04]]
37. Reason 4: Use Non-local parameters: Interactive Segmentation (GrabCut)
Model the segmentation and the colour model w jointly:
E(x,w): {0,1}^n × {GMMs} → R
E(x,w) = ∑i θi(xi,w) + ∑(i,j)∈N4 θij(xi,xj)
An object is a compact set of colours.
[Figure: foreground and background colour distributions in colour space]
[Rother et al., SIGGRAPH '04]
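The slides leave the optimization implicit; in GrabCut-style systems E(x,w) is typically minimized by alternation: fit w for fixed x, then re-solve x for fixed w. The sketch below is a loose illustration of that loop with two simplifying assumptions, flagged in the comments: a single Gaussian per class instead of a GMM, and a per-pixel argmin standing in for the graph-cut step.

    import numpy as np

    def fit_color_model(z, x):
        """Step 1: given segmentation x, fit w. Simplification: one diagonal
        Gaussian per class instead of a full GMM; assumes both classes are
        non-empty."""
        return [(z[x == k].mean(axis=0), z[x == k].var(axis=0) + 1e-3)
                for k in (0, 1)]

    def unaries(z, w):
        """Step 2: theta_i(x_i = k, w) = -log N(z_i; mu_k, sigma_k^2)."""
        return np.stack(
            [0.5 * (((z - mu)**2) / var + np.log(2*np.pi*var)).sum(axis=-1)
             for mu, var in w], axis=-1)

    def grabcut_like(z, x0, iters=5, solve=None):
        """Alternate: fit w given x, then minimize E(x, w) given w.
        `solve` should be a pairwise-MRF solver (graph cut in GrabCut);
        defaults to per-pixel argmin, i.e. the pairwise term is dropped."""
        x = x0
        for _ in range(iters):
            w = fit_color_model(z, x)
            u = unaries(z, w)
            x = solve(u) if solve else u.argmin(axis=-1)
        return x

    rng = np.random.default_rng(0)
    z = rng.random((8, 8, 3)); x0 = (z[..., 0] > 0.5).astype(int)
    print(grabcut_like(z, x0).sum(), "foreground pixels")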
38. Reason 4: Use Non-local parameters: Segmentation and Recognition
Goal: segment the test image.
Large set of example segmentations T(1), T(2), T(3), … (up to 2,000,000 exemplars)
E(x,w): {0,1}^n × {exemplars} → R
E(x,w) = ∑i |T(w)i - xi| + ∑(i,j)∈N4 θij(xi,xj)
(the unary part is the "Hamming distance" to the chosen exemplar T(w))
[Lempitsky et al., ECCV '08]
39. Reason 4: Use Non-local parameters: Segmentation and Recognition
UIUC dataset; 98.8% accuracy
[Lempitsky et al., ECCV '08]
41. Why Higher-order Functions?
In general θ(x1,x2,x3) ≠ θ(x1,x2) + θ(x1,x3) + θ(x2,x3)
Reasons for higher-order MRFs:
1. Even better image (texture) models:
– Field of Experts [FoE, Roth et al. '05]
– Curvature [Woodford et al. '08]
2. Use global priors:
– Connectivity [Vicente et al. '08, Nowozin et al. '09]
– Encode better training statistics [Woodford et al. '09]
42. Reason 1: Better Texture Modelling
[Figure: training images; test image; test image (60% noise). Result panels: pairwise 9-connected MRF (higher-order structure not preserved) vs. higher-order MRF]
[Rother et al., CVPR '09]
43. Reason 2: Use global Prior
Foreground object must be connected:
E(x) = P(x) + h(x),  with h(x) = ∞ if the foreground of x is not 4-connected, 0 otherwise
[Results: user input; standard MRF (removes noise (+), but shrinks the boundary (-)); with connectivity prior]
[Vicente et al. '08; Nowozin et al. '09]
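For intuition, h(x) can be checked with a flood fill; a sketch (mine, following the definition above) that returns ∞ as soon as the foreground splits into more than one 4-connected component:

    from collections import deque
    import numpy as np

    def h(x):
        """Global connectivity prior: infinity if the foreground of x is not
        one 4-connected component, 0 otherwise."""
        fg = list(zip(*np.nonzero(x)))
        if not fg:
            return 0.0
        seen, queue = {fg[0]}, deque([fg[0]])
        while queue:                       # BFS flood fill from one fg pixel
            i, j = queue.popleft()
            for a, b in ((i-1, j), (i+1, j), (i, j-1), (i, j+1)):
                if (a, b) not in seen and 0 <= a < x.shape[0] \
                        and 0 <= b < x.shape[1] and x[a, b]:
                    seen.add((a, b)); queue.append((a, b))
        return 0.0 if len(seen) == len(fg) else np.inf

    print(h(np.array([[1, 1, 0], [0, 0, 0], [0, 0, 1]])))  # inf: 2 components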
44. Reason 2: Use global Prior
What is the prior of a MAP-MRF solution?
Training image: 60% black, 40% white.
MAP: prior(x) = 0.6^8 ≈ 0.017.  Others are less likely, e.g. prior(x) = 0.6^5 · 0.4^3 ≈ 0.005.
The MRF is a bad prior, since the input marginal statistics are ignored!
Introduce a global term which controls the global statistics:
[Results: noisy input; ground truth; pairwise MRF vs. global gradient prior, with increasing prior strength]
[Woodford et al., ICCV '09] (see poster on Friday)
45. Summary
• Introduced factor graph notation
• Categorization of models in computer vision:
– 4-connected MRFs
– Highly-connected MRFs
– Higher-order MRFs
… all useful models, but how do I optimize them?
46. Course Program
9:30-10:00   Introduction (Andrew Blake)
10:00-11:00  Discrete Models in Computer Vision (Carsten Rother)
15 min coffee break
11:15-12:30  Message Passing: DP, TRW, LP relaxation (Pawan Kumar)
12:30-13:00  Quadratic pseudo-boolean optimization (Pushmeet Kohli)
1 hour lunch break
14:00-15:00  Transformation and move-making methods (Pushmeet Kohli)
15:00-15:30  Speed and Efficiency (Pushmeet Kohli)
15 min coffee break
15:45-16:15  Comparison of Methods (Carsten Rother)
16:15-17:30  Recent Advances: Dual-decomposition, higher-order, etc. (Carsten Rother + Pawan Kumar)
All material will be online (after the conference):
http://research.microsoft.com/en-us/um/cambridge/projects/tutorial/
49. Markov Property
• Markov property: each variable is only connected to a few others, i.e. many pixels are conditionally independent.
• This makes inference easier (possible at all).
• But still… every pixel can influence any other pixel (knock-on effect).
50. Recap: Factor Graphs
• Factor graphs are a very good representation since they directly reflect the given energy.
• The MRF (Markov) property means many pixels are conditionally independent.
• Still… all pixels influence each other (knock-on effect).
51. Interactive Segmentation - Tutorial example
Goal: given the image z ∈ (R,G,B)^n, infer the labelling x ∈ {0,1}^n.
Given z and the unknown (latent) variables x:
P(x|z) = P(z|x) P(x) / P(z) ~ P(z|x) P(x)
(posterior probability; likelihood (data-dependent); prior (data-independent))
Maximum a posteriori (MAP): x* = argmax_x P(x|z)
52. Likelihood   P(x|z) ~ P(z|x) P(x)
[Figure: foreground and background colour distributions plotted in red-green colour space]
53. Likelihood   P(x|z) ~ P(z|x) P(x)
[Figure: per-pixel log-likelihoods log P(zi|xi=0) and log P(zi|xi=1)]
Maximum likelihood: x* = argmax_x P(z|x) = argmax_x ∏i P(zi|xi)
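Because there is no prior yet, the maximization decouples over pixels; a minimal sketch (with hypothetical log-likelihood arrays) of the resulting per-pixel decision:

    import numpy as np

    def max_likelihood_seg(log_p_bg, log_p_fg):
        """x* = argmax_x prod_i P(z_i | x_i): with no prior, each pixel
        independently picks its more likely label."""
        return (log_p_fg > log_p_bg).astype(int)

    rng = np.random.default_rng(0)
    lp0, lp1 = rng.normal(size=(4, 4)), rng.normal(size=(4, 4))
    print(max_likelihood_seg(lp0, lp1))   # noisy: no spatial smoothing yet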
54. Prior   P(x|z) ~ P(z|x) P(x)
P(x) = 1/f ∏(i,j)∈N θij(xi,xj)
f = ∑x ∏(i,j)∈N θij(xi,xj)   "partition function"
θij(xi,xj) = exp{-|xi-xj|}   "Ising prior"
(exp{-1} = 0.36; exp{0} = 1)
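On a toy chain the partition function f can still be summed by brute force over all 2^n labellings, which also shows why f is intractable for real images; a small illustration (the chain and all names are mine):

    import itertools
    import numpy as np

    # Ising prior on a chain of n binary variables:
    # P(x) = (1/f) * prod_i exp{-|x_i - x_{i+1}|}
    n = 4
    def unnorm(x):
        return np.prod([np.exp(-abs(x[i] - x[i+1])) for i in range(n - 1)])

    f = sum(unnorm(x) for x in itertools.product([0, 1], repeat=n))
    x = (0, 0, 1, 1)              # one label change: one factor exp{-1} = 0.36
    print(unnorm(x) / f, "with partition function f =", f)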
55. Posterior distribution
P(x|z) ~ P(z|x) P(x)
Posterior "Gibbs" distribution:
θi(xi,zi) = -log P(zi|xi=1) xi - log P(zi|xi=0) (1-xi)   (likelihood)
θij(xi,xj) = |xi-xj|   (prior)
P(x|z) = 1/f(z,w) exp{-E(x,z,w)}
f(z,w) = ∑x exp{-E(x,z,w)}
E(x,z,w) = ∑i θi(xi,zi) + w ∑(i,j) θij(xi,xj)   (energy: unary terms + pairwise terms)
Note: the likelihood can be an arbitrary function of the data.
56. Energy minimization
P(x|z) = 1/f(z,w) exp{-E(x,z,w)}
-log P(x|z) = -log(1/f(z,w)) + E(x,z,w)
f(z,w) = ∑x exp{-E(x,z,w)}
Since f(z,w) does not depend on x:  x* = argmin_x E(x,z,w)
MAP is the same as the minimum energy.
[Figure: energy landscape with the MAP / global minimum of E and the ML solution marked]
57. Weight prior and likelihood
E(x,z,w) = ∑i θi(xi,zi) + w ∑(i,j) θij(xi,xj)
[Figure: segmentation results for w = 0, 10, 40, 200]
58. Moving away from a pure prior …
E(x,z,w) = ∑i θi(xi,zi) + w ∑(i,j) θij(xi,xj,zi,zj)
θij(xi,xj,zi,zj) = |xi-xj| exp{-β||zi-zj||²}
β = (2 Mean(||zi-zj||²))^(-1)
[Plot: pairwise cost as a function of ||zi-zj||²: constant Ising cost vs. decaying contrast cost]
"Going from a Markov random field to a conditional random field."
59. Tree vs. loopy graphs
Markov blanket of xi: all variables that are in the same factor as xi.
Loopy graphs:
- MAP is (in general) NP-hard (see inference part)
- Marginals P(xi) are also NP-hard
Trees and chains [Felzenszwalb, Huttenlocher '01]:
• MAP is tractable
• Marginals, e.g. P(foot), are tractable
64. Conditional Random Field (CRF)
E(x) = ∑i θi(xi,zi) + ∑(i,j)∈N4 θij(xi,xj,zi,zj)
with θij(xi,xj,zi,zj) = |xi-xj| exp(-β||zi-zj||²)
[Factor graph: observed nodes zi, zj enter both the unary and the pairwise factors of xi, xj; plot of the constant Ising cost vs. the decaying contrast cost over ||zi-zj||²]
Definition CRF: all factors may depend on the data z.
This is no problem for inference (but it makes parameter learning harder).
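A sketch of the contrast term in practice (array names are mine; β follows the formula from the "Moving away from a pure prior" slide): edges across strong colour contrast get small pairwise weight, so the segmentation boundary prefers to follow image edges.

    import numpy as np

    def contrast_weights(z):
        """CRF pairwise modulation exp(-beta * ||z_i - z_j||^2) for the
        horizontal N4 edges, with beta = (2 * mean ||z_i - z_j||^2)^(-1)."""
        d2 = ((z[:, 1:] - z[:, :-1]) ** 2).sum(axis=-1)   # ||z_i - z_j||^2
        beta = 1.0 / (2.0 * d2.mean() + 1e-8)
        return np.exp(-beta * d2)

    z = np.random.default_rng(0).random((5, 6, 3))        # toy RGB image
    print(contrast_weights(z).round(2))                   # weights in (0, 1]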