03: Probability Distributions
- 3. Why model these complicated quantities?
Because we need probability distributions over model parameters as well as
over data and world state. Hence, some of the distributions describe the
parameters of the others:
Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 3
- 4. Why model these complicated quantities?
Because we need probability distributions over model parameters as well as
over data and world state. Hence, some of the distributions describe the
parameters of the others:
Example: the mean and variance parameters of a normal distribution are themselves
modelled by a normal inverse gamma distribution, with one factor modelling the
mean and the other the variance.
- 5. Bernoulli Distribution
Pr(y = 1) = λ, Pr(y = 0) = 1 − λ
or
Pr(y) = λ^y (1 − λ)^(1 − y)
For short we write: Pr(y) = Bern_y[λ]
Bernoulli distribution describes a situation with only two possible
outcomes, y = 0 (failure) or y = 1 (success).
Takes a single parameter λ ∈ [0, 1].
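As an illustrative sketch (plain Python; the function name is my own), the Bernoulli pmf can be evaluated directly from the single parameter λ:

```python
def bernoulli_pmf(y, lam):
    # Pr(y) = lam^y * (1 - lam)^(1 - y) for y in {0, 1}, with 0 <= lam <= 1
    assert y in (0, 1) and 0.0 <= lam <= 1.0
    return lam ** y * (1.0 - lam) ** (1 - y)

print(bernoulli_pmf(1, 0.3))  # Pr(y = 1) = 0.3
print(bernoulli_pmf(0, 0.3))  # Pr(y = 0) = 0.7
```

The two probabilities sum to one for any valid λ, which is the whole content of the distribution.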
- 6. Beta Distribution
Defined over λ ∈ [0, 1] (i.e., over the parameter of the Bernoulli):
Pr(λ) = Γ(α + β) / (Γ(α) Γ(β)) · λ^(α − 1) (1 − λ)^(β − 1)
• Two parameters α, β, both > 0. For short we write: Pr(λ) = Beta_λ[α, β]
• Mean depends on relative values: E[λ] = α / (α + β)
• Concentration depends on magnitude α + β
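A minimal sketch of the Beta density and its mean, using only the standard-library gamma function (function names are my own):

```python
from math import gamma

def beta_pdf(lam, alpha, beta):
    # Pr(lam) = Gamma(a + b) / (Gamma(a) * Gamma(b)) * lam^(a-1) * (1-lam)^(b-1)
    const = gamma(alpha + beta) / (gamma(alpha) * gamma(beta))
    return const * lam ** (alpha - 1) * (1.0 - lam) ** (beta - 1)

def beta_mean(alpha, beta):
    # E[lam] = alpha / (alpha + beta): depends only on the relative values
    return alpha / (alpha + beta)

print(beta_pdf(0.5, 1.0, 1.0))  # Beta[1,1] is uniform, so the density is 1.0
print(beta_mean(2.0, 6.0))      # 0.25
```

Note that Beta[2, 6] and Beta[20, 60] share the same mean 0.25; the larger magnitude α + β only concentrates the density around it.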
- 7. Categorical Distribution
Pr(y = k) = λ_k
or we can think of the data as a vector with all elements zero except the
kth, e.g. y = [0, 0, 0, 1, 0]
For short we write: Pr(y) = Cat_y[λ]
Categorical distribution describes a situation with K possible
outcomes y = 1, …, K.
Takes K parameters λ_1, …, λ_K where λ_k ≥ 0 and Σ_k λ_k = 1.
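A sketch (helper names are my own, with zero-based indices k = 0..K−1) of evaluating and sampling a categorical variable, including the one-hot vector view:

```python
import random

def categorical_pmf(k, lam):
    # Pr(y = k) = lam[k], where the entries of lam sum to 1
    return lam[k]

def one_hot(k, K):
    # vector with all elements zero except the kth, e.g. k=3, K=5 -> [0,0,0,1,0]
    return [1 if i == k else 0 for i in range(K)]

def categorical_sample(lam, rng=random):
    # inverse-CDF sampling: walk the cumulative sum until it exceeds u
    u = rng.random()
    cum = 0.0
    for k, p in enumerate(lam):
        cum += p
        if u < cum:
            return k
    return len(lam) - 1  # guard against rounding when u is close to 1

print(one_hot(3, 5))  # [0, 0, 0, 1, 0]
```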
- 9. Univariate Normal Distribution
Pr(y) = (1 / √(2πσ²)) exp(−(y − μ)² / (2σ²))
For short we write: Pr(y) = Norm_y[μ, σ²]
Univariate normal distribution describes a single continuous variable.
Takes 2 parameters: the mean μ and the variance σ² > 0.
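The density above can be sketched directly in Python (the function name is my own); note it peaks at the mean, where its value is 1/√(2πσ²):

```python
from math import exp, pi, sqrt

def normal_pdf(y, mu, sigma2):
    # Pr(y) = 1 / sqrt(2 * pi * sigma^2) * exp(-(y - mu)^2 / (2 * sigma^2))
    # sigma2 is the variance and must be > 0
    return exp(-(y - mu) ** 2 / (2.0 * sigma2)) / sqrt(2.0 * pi * sigma2)

print(normal_pdf(0.0, 0.0, 1.0))  # peak of the standard normal, ~0.3989
```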
- 10. Normal Inverse Gamma Distribution
Defined on 2 variables: the mean μ and the variance σ² > 0
or for short: Pr(μ, σ²) = NormInvGam_{μ,σ²}[α, β, γ, δ]
Four parameters: positive scalars α, β, γ and a scalar δ.
- 11. Multivariate Normal Distribution
Pr(y) = (1 / ((2π)^(D/2) |Σ|^(1/2))) exp(−(1/2)(y − μ)ᵀ Σ⁻¹ (y − μ))
For short we write: Pr(y) = Norm_y[μ, Σ]
Multivariate normal distribution describes multiple continuous
variables. Takes 2 parameters:
• a vector μ containing the mean position
• a symmetric “positive definite” covariance matrix Σ
Positive definite: zᵀΣz is positive for any real vector z ≠ 0
- 12. Types of covariance
Covariance matrix has three forms, termed spherical, diagonal and full
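The three forms can be sketched numerically (numpy assumed available; the values are illustrative, not from the slides):

```python
import numpy as np

sigma2 = 2.0
variances = np.array([2.0, 0.5])

spherical = sigma2 * np.eye(2)        # sigma^2 * I: same variance in every direction
diagonal = np.diag(variances)         # per-dimension variances, axis-aligned ellipse
rot = np.array([[0.8, -0.6],
                [0.6,  0.8]])         # a rotation matrix
full = rot @ diagonal @ rot.T         # general symmetric positive definite matrix

# positive definite: z^T Sigma z > 0 for any nonzero real z
z = np.array([1.0, -1.0])
print(float(z @ full @ z) > 0.0)  # True
```

Spherical and diagonal matrices are special cases of the full form; the full form additionally allows correlated (rotated) ellipses.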
- 13. Normal Inverse Wishart
Defined on two variables: a mean vector μ and a symmetric positive definite
matrix Σ
or for short: Pr(μ, Σ) = NorIWis_{μ,Σ}[α, Ψ, γ, δ]
Has four parameters:
• a positive scalar α
• a positive definite matrix Ψ
• a positive scalar γ
• a vector δ
- 14. Samples from Normal Inverse Wishart
- 15. Conjugate Distributions
The pairs of distributions discussed have a special
relationship: they are conjugate distributions
• Beta is conjugate to Bernoulli
• Dirichlet is conjugate to categorical
• Normal inverse gamma is conjugate to univariate
normal
• Normal inverse Wishart is conjugate to
multivariate normal
- 16. Conjugate Distributions
When we take the product of a distribution and its conjugate, the
result has the same form as the conjugate.
For example, consider the case where Pr(y|λ) = Bern_y[λ] and Pr(λ) = Beta_λ[α, β]; then
Bern_y[λ] · Beta_λ[α, β] = κ · Beta_λ[α + y, β + 1 − y]
where κ is a constant and Beta_λ[α + y, β + 1 − y] is a new Beta distribution.
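The Bernoulli-Beta case can be checked numerically: the product Bern_y[λ] · Beta_λ[α, β] equals a constant κ times Beta_λ[α + y, β + 1 − y] at every λ (a sketch with my own function names; κ comes from matching the Beta normalizers):

```python
from math import gamma

def beta_pdf(lam, a, b):
    return gamma(a + b) / (gamma(a) * gamma(b)) * lam ** (a - 1) * (1 - lam) ** (b - 1)

def bern_pmf(y, lam):
    return lam ** y * (1 - lam) ** (1 - y)

a, b, y = 2.0, 3.0, 1
# constant kappa: ratio of the old and new Beta normalizing constants
kappa = (gamma(a + b) * gamma(a + y) * gamma(b + 1 - y)) / (gamma(a) * gamma(b) * gamma(a + b + 1))

for lam in (0.2, 0.5, 0.8):
    lhs = bern_pmf(y, lam) * beta_pdf(lam, a, b)
    rhs = kappa * beta_pdf(lam, a + y, b + 1 - y)
    assert abs(lhs - rhs) < 1e-12

print(kappa)  # 0.4 for these parameter values
```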
- 17. Example proof
When we take the product of a distribution and its conjugate, the
result has the same form as the conjugate.
- 18. Bayes’ Rule Terminology
Pr(y|x) = Pr(x|y) Pr(y) / Pr(x)
• Likelihood Pr(x|y): the propensity for observing a certain value of x
given a certain value of y
• Prior Pr(y): what we know about y before seeing x
• Posterior Pr(y|x): what we know about y after seeing x
• Evidence Pr(x): a constant that ensures the left-hand side is a valid
distribution
- 19. Importance of the Conjugate Relation 1
• Learning parameters:
1. Choose a prior that is conjugate to the likelihood.
2. This implies that the posterior must have the same form as the
conjugate prior distribution.
3. The posterior must be a distribution, which implies that the evidence
must equal the constant from the conjugate relation.
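For the Bernoulli likelihood with a Beta prior, step 2 reduces to a closed-form parameter update: after observing y_1, …, y_N the posterior over λ is Beta_λ[α + Σy, β + N − Σy]. A minimal sketch (the helper name is my own):

```python
def beta_bernoulli_posterior(alpha, beta, data):
    # conjugate update: Beta prior + Bernoulli observations -> Beta posterior
    n_success = sum(data)
    return alpha + n_success, beta + len(data) - n_success

# uniform Beta[1,1] prior, then observe three successes and one failure
print(beta_bernoulli_posterior(1.0, 1.0, [1, 1, 0, 1]))  # (4.0, 2.0)
```

No integration is ever performed; conjugacy turns learning into bookkeeping on the prior's parameters.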
- 20. Importance of the Conjugate Relation 2
• Marginalizing over parameters:
1. The prior is chosen conjugate to the other term.
2. The integral becomes easy: the product becomes a constant times a
distribution.
Integral of constant times probability distribution
= constant times integral of probability distribution
= constant × 1 = constant
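For the Beta and Bernoulli pair, that constant is the evidence in closed form; a crude midpoint-rule integration (a sketch; the function names are my own) confirms it matches the integral:

```python
from math import gamma

def evidence_bern(y, a, b):
    # closed-form integral of Bern_y[lam] * Beta_lam[a, b] over lam in [0, 1]
    return (gamma(a + b) * gamma(a + y) * gamma(b + 1 - y)) / (gamma(a) * gamma(b) * gamma(a + b + 1))

def evidence_numeric(y, a, b, n=20000):
    # midpoint-rule approximation of the same integral, for comparison
    norm = gamma(a + b) / (gamma(a) * gamma(b))
    total = 0.0
    for i in range(n):
        lam = (i + 0.5) / n
        total += lam ** y * (1 - lam) ** (1 - y) * norm * lam ** (a - 1) * (1 - lam) ** (b - 1)
    return total / n

print(evidence_bern(1, 1.0, 1.0))  # 0.5: under a uniform prior both outcomes are equally likely
```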
- 21. Conclusions
• Presented four distributions which model useful quantities
• Presented four other distributions which model the
parameters of the first four
• They are paired in a special way: the second set is conjugate to the
first
• In the following material we’ll see that this relationship is
very useful