20 cv mil_models_for_words

Computer vision: models,
learning and inference
Chapter 20
Models for visual words

Please send errata to s.prince@cs.ucl.ac.uk

Visual words

• Most models treat data as continuous
• Likelihood based on normal distribution
• Visual words = discrete representation of
image
• Likelihood based on categorical distribution
• Useful for difficult tasks such as scene
recognition and object recognition

Computer vision: models, learning and inference. ©2011 Simon J.D. Prince

Motivation: scene recognition


Structure

• Computing visual words
• Bag of words model
• Latent Dirichlet allocation
• Single author-topic model
• Constellation model
• Scene model
• Applications


Computing dictionary of visual words

1. For every one of the I training images, select a
set of J_i spatial locations.
• Interest points
• Regular grid
2. Compute a descriptor at each spatial location in
each image
3. Cluster all of these descriptor vectors into K
groups using a method such as the K-Means
algorithm
4. The means of the K clusters are used as the K
prototype vectors in the dictionary.
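
The clustering step (steps 3 and 4) can be sketched as follows. This is a minimal illustration, not the author's implementation: it assumes the descriptors from all training images have already been computed and pooled into one list, and it uses a plain K-means loop. The names build_dictionary, nearest_index and toy_descriptors are hypothetical, and the data are made up.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def nearest_index(descriptor, prototypes):
    """Index of the prototype closest to the descriptor."""
    return min(range(len(prototypes)),
               key=lambda k: squared_distance(descriptor, prototypes[k]))


def build_dictionary(descriptors, K, iterations=20, seed=0):
    """Cluster descriptors into K groups with plain K-means.

    Returns the K cluster means, which serve as the prototype vectors.
    """
    rng = random.Random(seed)
    # Initialise the means with K descriptors drawn without replacement.
    means = [list(d) for d in rng.sample(descriptors, K)]
    for _ in range(iterations):
        # Assignment step: group descriptors by their nearest mean.
        groups = [[] for _ in range(K)]
        for d in descriptors:
            groups[nearest_index(d, means)].append(d)
        # Update step: move each mean to the centroid of its group.
        for k, group in enumerate(groups):
            if group:  # an empty cluster keeps its previous mean
                dim = len(group[0])
                means[k] = [sum(d[i] for d in group) / len(group)
                            for i in range(dim)]
    return means


# Toy 2-D "descriptors" pooled from all training images (made-up data).
toy_descriptors = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2),
                   (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
dictionary = build_dictionary(toy_descriptors, K=2)
```

Real descriptors are much higher dimensional and far more numerous, so in practice a library K-means would normally replace this loop.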

Encoding images as visual words
1. Select a set of J spatial locations in the image using the same
method as for the dictionary
2. Compute the descriptor at each of the J spatial locations.
3. Compare each descriptor to the set of K prototype
descriptors in the dictionary
4. Assign a discrete index to this location that corresponds to
the index of the closest word in the dictionary.

End result:

Each location is described by a discrete feature index together with its x and y position
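
The encoding steps above can be sketched as a nearest-prototype lookup. This is an illustrative sketch that assumes Euclidean distance between descriptors; encode_image and the toy values are hypothetical names and data, not taken from the slides.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def encode_image(locations, descriptors, dictionary):
    """Return one (word_index, x, y) triple per spatial location."""
    words = []
    for (x, y), descriptor in zip(locations, descriptors):
        # Index of the closest prototype in the dictionary.
        f = min(range(len(dictionary)),
                key=lambda k: squared_distance(descriptor, dictionary[k]))
        words.append((f, x, y))
    return words


# Hypothetical dictionary with K = 3 prototypes and J = 3 image locations.
toy_dictionary = [(0.0, 0.0), (1.0, 1.0), (4.0, 0.0)]
toy_locations = [(10, 20), (30, 40), (50, 60)]
toy_descriptors = [(0.9, 1.2), (3.5, 0.1), (0.1, -0.2)]
encoded = encode_image(toy_locations, toy_descriptors, toy_dictionary)
# encoded == [(1, 10, 20), (2, 30, 40), (0, 50, 60)]
```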



Bag of words model
Key idea:

• Abandon all spatial information
• Just represent image by relative frequency
(histogram) of words from dictionary

where


Bag of words


Structure
Learning (MAP solution):

Inference:
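
The equations for this slide did not survive extraction. The sketch below shows one common form of bag of words learning and inference, under stated assumptions: each class n has categorical parameters over the K words, the prior over them is a symmetric Dirichlet with parameter alpha, the MAP estimate is (count + alpha - 1) normalized over words, and inference applies Bayes' rule over classes. All function names and data are hypothetical.

```python
import math


def word_histogram(words, K):
    """Count how often each of the K dictionary words occurs."""
    counts = [0] * K
    for f in words:
        counts[f] += 1
    return counts


def learn_map(images_by_class, K, alpha=2.0):
    """MAP categorical parameters per class.

    Assumes a symmetric Dirichlet prior; alpha > 1 keeps every
    probability strictly positive.
    """
    lambdas = []
    for images in images_by_class:
        totals = [0] * K
        for words in images:
            for k, t in enumerate(word_histogram(words, K)):
                totals[k] += t
        numer = [n + alpha - 1.0 for n in totals]
        lambdas.append([v / sum(numer) for v in numer])
    return lambdas


def class_posterior(words, lambdas, priors):
    """Posterior over classes for one image, computed in log space."""
    counts = word_histogram(words, len(lambdas[0]))
    log_joint = [math.log(p) + sum(t * math.log(l) for t, l in zip(counts, lam))
                 for lam, p in zip(lambdas, priors)]
    top = max(log_joint)  # subtract the maximum for numerical stability
    unnorm = [math.exp(v - top) for v in log_joint]
    return [u / sum(unnorm) for u in unnorm]


# Made-up training data: two classes, K = 3 words.
training = [[[0, 0, 1], [0, 1, 0]],   # class 0 images
            [[2, 2, 1], [2, 2, 2]]]   # class 1 images
lambdas = learn_map(training, K=3)
posterior = class_posterior([0, 0, 2], lambdas, priors=[0.5, 0.5])
```

Note that spatial positions never enter the computation: only the word counts of each image are used.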


Bag of words for object recognition


Problems with bag of words




Latent Dirichlet allocation
• Describes relative frequency of visual words in a
single image (no world term)
• Words not generated independently (connected by
hidden variable)
• Analogy to text documents
– Each image contains mixture of several topics (parts)
– Each topic induces a distribution over words


Generative equations

Marginal distribution over features

Conjugate priors over parameters
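
The generative equations are missing from the extracted text. As a hedged illustration of the process the bullets describe, the sketch below assumes the standard LDA form: each image draws its own distribution over M parts from a Dirichlet prior, each part has a distribution over K words drawn from a second Dirichlet prior, and each word is generated by first drawing a part label and then a word from that part. The names and hyperparameter values are hypothetical.

```python
import random


def sample_dirichlet(alphas, rng):
    """Draw from a Dirichlet by normalising independent gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alphas]
    total = sum(draws)
    return [d / total for d in draws]


def sample_categorical(probs, rng):
    """Draw an index with the given probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


def generate_corpus(I, J, M, K, alpha=0.5, beta=0.5, seed=0):
    """Generate I images of J words each from an LDA-style model."""
    rng = random.Random(seed)
    # One word distribution per part, shared by all images.
    lambdas = [sample_dirichlet([beta] * K, rng) for _ in range(M)]
    corpus = []
    for _ in range(I):
        # Each image has its own mixture over parts.
        pi = sample_dirichlet([alpha] * M, rng)
        # Hidden part label for each word, then the word itself.
        parts = [sample_categorical(pi, rng) for _ in range(J)]
        words = [sample_categorical(lambdas[p], rng) for p in parts]
        corpus.append((parts, words))
    return corpus, lambdas


corpus, lambdas = generate_corpus(I=3, J=5, M=2, K=4)
```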


Learning LDA model
• Part labels p are hidden variables
• If we knew them then it would be easy to estimate the
parameters

• How about the EM algorithm? Unfortunately, the parts within an
image are not independent


Learning
Strategy:

1. Write an expression for posterior distribution
over part labels
2. Draw samples from posterior using MCMC
3. Use samples to estimate parameters


1. Posterior over part labels

The denominator is intractable, but the two terms in the numerator
can be computed in closed form


2. Draw samples from posterior
Gibbs sampling: fix all part labels except one and sample
from the conditional distribution

This can be computed in closed form



The samples are substituted for the true part labels in the update
equations
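
The three-step strategy can be sketched as below. The conditional distribution itself is missing from the extracted slides, so this sketch assumes the usual form obtained when the parameters are marginalized out (often called collapsed Gibbs sampling) with symmetric Dirichlet priors alpha and beta: the probability of part m for one word is proportional to how often that word carries label m elsewhere, times how often label m occurs elsewhere in the same image. The name gibbs_lda and the toy data are hypothetical.

```python
import random


def gibbs_lda(images, M, K, alpha=1.0, beta=1.0, sweeps=200, seed=0):
    """Gibbs sampling over LDA part labels.

    images: list of word-index lists. Returns (pi, lam) estimated from the
    final sample: per-image part probabilities and per-part word probabilities.
    """
    rng = random.Random(seed)
    # Random initial part label for every word.
    labels = [[rng.randrange(M) for _ in words] for words in images]
    # part_word[m][k]: times word k carries label m (whole training set).
    # image_part[i][m]: times label m appears in image i.
    part_word = [[0] * K for _ in range(M)]
    part_total = [0] * M
    image_part = [[0] * M for _ in images]
    for i, words in enumerate(images):
        for f, p in zip(words, labels[i]):
            part_word[p][f] += 1
            part_total[p] += 1
            image_part[i][p] += 1

    for _ in range(sweeps):
        for i, words in enumerate(images):
            for j, f in enumerate(words):
                # Remove the current label so the counts exclude this word.
                p = labels[i][j]
                part_word[p][f] -= 1
                part_total[p] -= 1
                image_part[i][p] -= 1
                # Conditional probability of each part, up to a constant.
                weights = [(part_word[m][f] + beta) / (part_total[m] + K * beta)
                           * (image_part[i][m] + alpha) for m in range(M)]
                p = rng.choices(range(M), weights=weights, k=1)[0]
                # Record the new label and restore the counts.
                labels[i][j] = p
                part_word[p][f] += 1
                part_total[p] += 1
                image_part[i][p] += 1

    # Treat the sampled labels as if they were observed.
    lam = [[(part_word[m][k] + beta) / (part_total[m] + K * beta)
            for k in range(K)] for m in range(M)]
    pi = [[(image_part[i][m] + alpha) / (len(words) + M * alpha)
           for m in range(M)] for i, words in enumerate(images)]
    return pi, lam


# Made-up data: three images over a dictionary of K = 4 words.
toy_images = [[0, 0, 1, 0, 1, 0], [2, 3, 3, 2, 3, 2], [0, 1, 0, 2, 3, 2]]
pi, lam = gibbs_lda(toy_images, M=2, K=4, sweeps=50)
```

For brevity this uses only the last sample; averaging the estimates over several well-separated samples after a burn-in period would be the more careful choice.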




Single author-topic model


Single author-topic model


Learning

Likelihood same as before, prior becomes


Learning



Inference
Likelihood that words in this image are due to
category n

Compute posterior over categories
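
The inference equations are missing from the extracted text. The sketch below assumes the following form: category n has its own part probabilities pi[n], the per-part word probabilities lam are shared across categories, each word's likelihood is obtained by summing over its hidden part label, and the posterior over categories follows from Bayes' rule. All names and numbers are hypothetical.

```python
import math


def category_log_likelihood(words, pi_n, lam):
    """log Pr(words | category n), summing each word over its part label."""
    return sum(math.log(sum(pi_n[m] * lam[m][f] for m in range(len(lam))))
               for f in words)


def category_posterior(words, pi, lam, priors):
    """Posterior over categories for one image, computed in log space."""
    log_joint = [math.log(pr) + category_log_likelihood(words, pi_n, lam)
                 for pi_n, pr in zip(pi, priors)]
    top = max(log_joint)  # subtract the maximum for numerical stability
    unnorm = [math.exp(v - top) for v in log_joint]
    total = sum(unnorm)
    return [u / total for u in unnorm]


# Made-up parameters: 2 categories, M = 2 parts, K = 2 words.
toy_lam = [[0.9, 0.1], [0.1, 0.9]]   # word probabilities per part
toy_pi = [[0.8, 0.2], [0.2, 0.8]]    # part probabilities per category
posterior = category_posterior([0, 0], toy_pi, toy_lam, priors=[0.5, 0.5])
```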




Constellation model


Learning

Prior same as before, likelihood becomes


Learning


Part and word probabilities as before

Inference
Likelihood that words in this image are due to
category n

Compute posterior over categories
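
The likelihood equation is missing from the extracted text. The sketch below assumes that the constellation model extends the previous likelihood by giving each part a normal distribution over word position, so that each observation contributes the sum over parts of part probability times word probability times position density. A diagonal covariance is used here purely to keep the example short. All names and numbers are hypothetical.

```python
import math


def normal_pdf_diag(x, mean, var):
    """Normal density with diagonal covariance (a simplifying assumption)."""
    density = 1.0
    for xi, mi, vi in zip(x, mean, var):
        density *= (math.exp(-0.5 * (xi - mi) ** 2 / vi)
                    / math.sqrt(2.0 * math.pi * vi))
    return density


def constellation_log_likelihood(features, pi_n, lam, means, variances):
    """log Pr(words and positions | category n).

    features: list of (word_index, (x, y)) pairs for one image.
    """
    total = 0.0
    for f, xy in features:
        # Sum over the hidden part label of this observation.
        total += math.log(sum(
            pi_n[m] * lam[m][f] * normal_pdf_diag(xy, means[m], variances[m])
            for m in range(len(lam))))
    return total


# Made-up single-part example: one word observed at the part's mean position.
log_lik = constellation_log_likelihood(
    features=[(0, (0.0, 0.0))],
    pi_n=[1.0],
    lam=[[0.5, 0.5]],
    means=[(0.0, 0.0)],
    variances=[(1.0, 1.0)])
```

The posterior over categories is then obtained from these log likelihoods and the category priors by Bayes' rule, as in the single author-topic model.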


Learning




Scene model




Video Google


Action recognition

Spatio-temporal bag of words model 91.8% classification


Action recognition

