4. 4
Before we start with Bayes ...
● Another perspective on conditional probability
● Conditional probability via growing trimmed trees
● https://www.youtube.com/watch?v=Zxm4Xxvzohk
6. 6
Bayes Theorem
● P(A|B) is conditional probability of observing A
given B is true
● P(B|A) is conditional probability of observing B
given A is true
● P(A) and P(B) are probabilities of A and B without
conditioning on each other
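The three quantities above combine into the theorem itself. The formula is not preserved in this transcript, so it is restated here in its standard form:

```latex
P(A \mid B) = \frac{P(B \mid A)\, P(A)}{P(B)}, \qquad P(B) > 0
```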
13. 13
Visualize Bayes Theorem
● Given that someone has cancer, what is the probability that said
person had a positive test?
14. 14
Example: Fake coin
● Two coins
– One fair
– One unfair
● What is the probability of having the fair coin
after flipping Heads?
CC image courtesy of user pagedooley on Flickr
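A minimal sketch of the fake-coin question. The slide does not state how the unfair coin behaves, so the code assumes (hypothetically) that it always lands Heads and that either coin is picked with probability 0.5. The function name `posterior_fair` is an illustration only.

```python
def posterior_fair(prior_fair, p_heads_fair=0.5, p_heads_unfair=1.0):
    """P(fair | Heads) via Bayes' theorem.

    p_heads_unfair=1.0 is an assumption (a two-headed coin);
    the slide does not specify the bias of the unfair coin.
    """
    # P(Heads) = P(Heads | fair) P(fair) + P(Heads | unfair) P(unfair)
    evidence = p_heads_fair * prior_fair + p_heads_unfair * (1.0 - prior_fair)
    return p_heads_fair * prior_fair / evidence


# Either coin is equally likely before the flip (assumed prior of 0.5).
after_one_head = posterior_fair(prior_fair=0.5)
print(f"P(fair | Heads) = {after_one_head:.4f}")
```

Under these assumptions the probability of holding the fair coin drops from 1/2 to 1/3 after a single Heads.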
17. 17
Update of beliefs
● Allows new evidence to update beliefs
● Prior can also be posterior of previous update
18. 18
Example: Fake coin
● Belief update
● What is the probability of having the fair coin after another Heads,
given that we have already seen one Heads?
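The belief update can be sketched as a loop in which each posterior becomes the prior for the next flip. As before, the bias of the unfair coin is not given on the slide; the code assumes a coin that always lands Heads, and the helper name `update_belief` is hypothetical.

```python
def update_belief(prior_fair, flip, p_heads_fair=0.5, p_heads_unfair=1.0):
    """Return P(fair | flip), using prior_fair as the current belief.

    p_heads_unfair=1.0 is an assumed value, not taken from the slides.
    """
    if flip == "H":
        like_fair, like_unfair = p_heads_fair, p_heads_unfair
    else:
        like_fair, like_unfair = 1.0 - p_heads_fair, 1.0 - p_heads_unfair
    evidence = like_fair * prior_fair + like_unfair * (1.0 - prior_fair)
    return like_fair * prior_fair / evidence


belief = 0.5  # assumed starting prior: both coins equally likely
history = []
for flip in "HH":
    # The posterior of the previous update is the prior of this one.
    belief = update_belief(belief, flip)
    history.append(belief)

print(history)
```

With these assumptions the belief in the fair coin moves from 1/2 to 1/3 after the first Heads and to 1/5 after the second.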
24. 24
Prior
● Prior belief about parameter(s)
● Conjugate prior
– Posterior belongs to the same distribution family as the prior
– Beta distribution is conjugate to the binomial likelihood
● Beta prior
25. 25
Beta distribution
● Continuous probability distribution
● Interval [0,1]
● Two shape parameters: α and β
– If >= 1, interpret as pseudo counts
– α corresponds to the count of heads, β to the count of tails
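To make the two shape parameters concrete, here is a small sketch of the Beta density using only the Python standard library. The function name `beta_pdf` is an illustration; for simplicity it evaluates the density on the open interval (0, 1) only.

```python
import math


def beta_pdf(x, alpha, beta):
    """Density of Beta(alpha, beta) at x, for 0 < x < 1."""
    if not 0.0 < x < 1.0:
        return 0.0  # simplification: endpoints are not handled
    # log of the normalising constant 1 / B(alpha, beta)
    log_norm = math.lgamma(alpha + beta) - math.lgamma(alpha) - math.lgamma(beta)
    log_density = (
        log_norm
        + (alpha - 1.0) * math.log(x)
        + (beta - 1.0) * math.log(1.0 - x)
    )
    return math.exp(log_density)


# Beta(1, 1) is flat; larger pseudo counts concentrate the density.
for a, b in [(1, 1), (2, 2), (50, 10)]:
    print(a, b, round(beta_pdf(0.5, a, b), 6))
```

Beta(1, 1) gives density 1 everywhere on the interval, which is why it serves as the uniform prior.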
31. 31
Posterior
● Posterior also Beta distribution
● For the exact derivation:
http://www.cs.cmu.edu/~10701/lecture/technote2_betabinomial.pdf
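In outline, the standard conjugacy argument runs as follows. The symbols θ (probability of heads), k (number of heads) and n (number of flips) are notation chosen here; the linked technical note may use different symbols.

```latex
\begin{aligned}
p(\theta \mid k, n)
  &\propto \underbrace{\theta^{k}(1-\theta)^{n-k}}_{\text{binomial likelihood}}
     \cdot \underbrace{\theta^{\alpha-1}(1-\theta)^{\beta-1}}_{\mathrm{Beta}(\alpha,\beta)\ \text{prior}} \\
  &= \theta^{\alpha+k-1}(1-\theta)^{\beta+n-k-1} \\
\Rightarrow\quad \theta \mid k, n &\sim \mathrm{Beta}(\alpha + k,\ \beta + n - k)
\end{aligned}
```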
32. 32
Posterior
● Assume
– Binomial p = 0.4
– Uniform Beta prior: α=1 and β=1
– 200 random variates from binomial distribution (Heads=80)
– Update posterior
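The update step for this example can be sketched in a few lines. With a Beta prior and binomial data, updating amounts to adding the observed counts to the shape parameters. The function name `update_beta` is hypothetical.

```python
def update_beta(alpha, beta, heads, tails):
    """Conjugate update: Beta(alpha, beta) prior plus observed counts."""
    return alpha + heads, beta + tails


# Uniform prior Beta(1, 1); 200 flips with 80 heads, as on the slide.
alpha_post, beta_post = update_beta(1, 1, heads=80, tails=200 - 80)
posterior_mean = alpha_post / (alpha_post + beta_post)

print(f"Posterior: Beta({alpha_post}, {beta_post})")
print(f"Posterior mean: {posterior_mean:.4f}")
```

The posterior is Beta(81, 121), with a mean close to the binomial p = 0.4 that generated the data.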
33. 33
Posterior
● Assume
– Binomial p = 0.4
– Biased Beta prior: α=50 and β=10
– 200 random variates from binomial distribution (Heads=80)
– Update posterior
34. 34
Posterior
● Posterior mean is a convex combination of the prior mean and the data
● The stronger our prior belief, the more data we
need to overrule the prior
● The less prior belief we have, the quicker the
data overrules the prior
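The convex combination can be checked numerically for the two priors used in the previous examples. The sketch below writes the posterior mean as w · (prior mean) + (1 − w) · (observed frequency), with w = (α + β) / (α + β + n); the function name is an illustration only.

```python
def posterior_mean_as_mixture(alpha, beta, heads, n):
    """Return (posterior mean, weight placed on the prior mean)."""
    prior_mean = alpha / (alpha + beta)
    observed_frequency = heads / n  # maximum likelihood estimate
    # The prior's weight shrinks as the number of observations n grows.
    weight = (alpha + beta) / (alpha + beta + n)
    mean = weight * prior_mean + (1.0 - weight) * observed_frequency
    return mean, weight


mean_uniform, w_uniform = posterior_mean_as_mixture(1, 1, heads=80, n=200)
mean_biased, w_biased = posterior_mean_as_mixture(50, 10, heads=80, n=200)

print(f"Beta(1,1):   weight on prior {w_uniform:.3f}, mean {mean_uniform:.3f}")
print(f"Beta(50,10): weight on prior {w_biased:.3f}, mean {mean_biased:.3f}")
```

The uniform prior carries about 1% of the weight, while the Beta(50, 10) prior carries about 23%, which pulls the posterior mean from 0.4 up to 0.5.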
35. 36
So is the coin fair?
● Examine posterior
– 95% highest density interval (HDI) of the posterior
– ROPE [1]: Region of practical equivalence for null hypothesis
– Fair coin: [0.45,0.55]
● 95% HDI: (0.33, 0.47)
● Cannot reject the null: the HDI overlaps the ROPE
● With more samples we could
[1] Kruschke, John. Doing Bayesian data analysis: A tutorial
with R, JAGS, and Stan. Academic Press, 2014.
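A rough way to reproduce the interval is to sample from the posterior and take the narrowest window that holds 95% of the draws. The sketch assumes the Beta(81, 121) posterior that results from the uniform prior and 80 heads in 200 flips; the sample size, the seed and the helper name `sample_hdi` are arbitrary choices.

```python
import random


def sample_hdi(samples, mass=0.95):
    """Narrowest interval containing `mass` of the samples."""
    ordered = sorted(samples)
    n = len(ordered)
    window = int(mass * n)
    # Slide a window over the sorted draws and keep the shortest one.
    start = min(range(n - window), key=lambda i: ordered[i + window] - ordered[i])
    return ordered[start], ordered[start + window]


rng = random.Random(42)
draws = [rng.betavariate(81, 121) for _ in range(50_000)]
low, high = sample_hdi(draws)

rope = (0.45, 0.55)  # region of practical equivalence for a fair coin
overlaps_rope = low <= rope[1] and high >= rope[0]

print(f"95% HDI: ({low:.2f}, {high:.2f}), overlaps ROPE: {overlaps_rope}")
```

Because the upper end of the interval lies inside the ROPE, the HDI is not entirely outside it, so the null value is not rejected under this decision rule.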
37. 38
Bayesian Model Comparison
● Bayes factors [1]
● Ratio of marginal likelihoods
● Interpretation table by Kass & Raftery [1]
● >100 → decisive evidence against M2
[1] Kass, Robert E., and Adrian E. Raftery. "Bayes factors."
Journal of the American Statistical Association 90.430 (1995): 773-795.
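Written out, the ratio of marginal likelihoods takes the following standard form. D denotes the data and θ_i the parameters of model M_i; the subscript convention B_12 (evidence for M1 over M2) is assumed here to match the "against M2" reading above.

```latex
B_{12} = \frac{P(D \mid M_1)}{P(D \mid M_2)},
\qquad
P(D \mid M_i) = \int P(D \mid \theta_i, M_i)\, P(\theta_i \mid M_i)\, d\theta_i
```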
38. 39
So is the coin fair?
● Null hypothesis
● Alternative hypothesis
– Anything is possible
– Beta(1,1)
● Bayes factor
39. 40
So is the coin fair?
● n = 200
● k = 80
● Bayes factor
● (Decent) preference for alt. hypothesis
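The numerical value of the Bayes factor is not preserved in this transcript, so the following sketch recomputes it under stated assumptions: the null hypothesis is a point hypothesis p = 0.5 (the value used in the binomial test later in the deck) and the alternative is the Beta(1, 1) prior from the previous slide.

```python
import math

n, k = 200, 80

# Null: p is exactly 0.5, so the marginal likelihood is the binomial pmf.
marginal_null = math.comb(n, k) * 0.5 ** n

# Alternative: p ~ Beta(1, 1). Integrating the binomial pmf over a
# uniform prior gives 1 / (n + 1) for every possible k.
marginal_alt = 1.0 / (n + 1)

bayes_factor = marginal_alt / marginal_null
print(f"Bayes factor (alternative vs. null): {bayes_factor:.2f}")
```

Under these assumptions the factor comes out at a single-digit value above 3, which is in line with a moderate rather than decisive preference for the alternative.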
40. 41
Other priors
● Priors can encode hypotheses (theories)
● Biased hypothesis: Beta(101,11)
● Haldane prior, approximated by Beta(0.001, 0.001)
– U-shaped
– high probability mass near p=0 and p=1
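To compare how such priors fare against the data, one option is to compute the Beta-binomial marginal likelihood for each and relate it to a point null. This is a sketch under assumptions: the data are the 80 heads in 200 flips used elsewhere in the deck, the null is p = 0.5, and all function names are hypothetical.

```python
import math


def log_beta_fn(a, b):
    """Log of the Beta function B(a, b)."""
    return math.lgamma(a) + math.lgamma(b) - math.lgamma(a + b)


def log_choose(n, k):
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)


def log_marginal_beta(k, n, a, b):
    """Log marginal likelihood of k heads in n flips under a Beta(a, b) prior."""
    return log_choose(n, k) + log_beta_fn(k + a, n - k + b) - log_beta_fn(a, b)


def log_marginal_point(k, n, p=0.5):
    """Log likelihood of k heads in n flips for a fixed p (point hypothesis)."""
    return log_choose(n, k) + k * math.log(p) + (n - k) * math.log(1.0 - p)


n, k = 200, 80
priors = {"uniform": (1, 1), "biased": (101, 11), "haldane": (0.001, 0.001)}
log_null = log_marginal_point(k, n)

# Log Bayes factor of each prior-encoded hypothesis against the null.
log_bf = {name: log_marginal_beta(k, n, a, b) - log_null
          for name, (a, b) in priors.items()}

for name, value in log_bf.items():
    print(f"{name:8s} log Bayes factor vs. null: {value:8.2f}")
```

For these data the uniform prior is the only one of the three that beats the null; the biased prior, which expects mostly heads, is penalised most heavily.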
41. 42
Frequentist approach
● So is the coin fair?
● Binomial test with null p=0.5
– one-tailed
– 0.0028
● Chi² test
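The one-tailed binomial test can be sketched by summing the lower tail of the binomial distribution directly. The data (80 heads in 200 flips) are assumed from the earlier slides, and the function name is an illustration.

```python
import math


def binomial_lower_tail(k, n, p=0.5):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(
        math.comb(n, i) * p ** i * (1.0 - p) ** (n - i)
        for i in range(k + 1)
    )


# One-tailed p-value: probability of 80 or fewer heads under a fair coin.
p_value = binomial_lower_tail(80, 200)
print(f"one-tailed p-value: {p_value:.4f}")
```

At the conventional 0.05 level this p-value leads to rejecting the null of a fair coin, in contrast to the HDI and ROPE analysis, which did not reject it.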
43. 44
Bayesian prediction
● Posterior predictive distribution
● Distribution of not yet observed data (test)
conditioned on the observed data (train)
● Frequentist: MLE
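In standard notation the posterior predictive distribution averages the model's prediction over the posterior, whereas the frequentist plug-in prediction uses the single MLE value. The symbols below (new observation x̃, data D, parameter θ) are chosen for this sketch; the second line is the special case of a Beta(α, β) prior with k heads in n flips.

```latex
p(\tilde{x} \mid D) = \int p(\tilde{x} \mid \theta)\, p(\theta \mid D)\, d\theta

P(\tilde{x} = \text{Heads} \mid D) = \frac{\alpha + k}{\alpha + \beta + n},
\qquad
\hat{\theta}_{\text{MLE}} = \frac{k}{n}
```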
44. 45
Alternative Bayesian Inference
● Marginal likelihood is often not easy to evaluate
– No analytical solution
– Numerical integration expensive
● Alternatives
– Monte Carlo integration
  ● Markov Chain Monte Carlo (MCMC)
  ● Gibbs sampling
  ● Metropolis-Hastings algorithm
– Laplace approximation
– Variational Bayes
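As an illustration of one of these alternatives, here is a minimal random-walk Metropolis-Hastings sketch for the coin example. It only needs the posterior up to a constant, so the marginal likelihood never has to be computed. The data (80 heads in 200 flips), the uniform prior, the step size, the burn-in length and the seed are all assumptions made for this example.

```python
import math
import random


def log_unnormalized_posterior(p, k=80, n=200):
    """Log of likelihood times a uniform prior, up to an additive constant."""
    if not 0.0 < p < 1.0:
        return -math.inf
    return k * math.log(p) + (n - k) * math.log(1.0 - p)


def metropolis_hastings(n_samples=20_000, step=0.05, start=0.5, seed=0):
    rng = random.Random(seed)
    current = start
    current_logp = log_unnormalized_posterior(current)
    samples = []
    for _ in range(n_samples):
        # Symmetric Gaussian proposal, so the Hastings correction cancels.
        proposal = current + rng.gauss(0.0, step)
        proposal_logp = log_unnormalized_posterior(proposal)
        accept_prob = math.exp(min(0.0, proposal_logp - current_logp))
        if rng.random() < accept_prob:
            current, current_logp = proposal, proposal_logp
        samples.append(current)
    return samples


samples = metropolis_hastings()
kept = samples[2_000:]  # discard burn-in
posterior_mean = sum(kept) / len(kept)
print(f"MCMC posterior mean: {posterior_mean:.3f} (exact: {81 / 202:.3f})")
```

For this conjugate model the exact posterior is known, which makes it a convenient check that the sampler behaves sensibly before applying the same idea to models without an analytical solution.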