Introduction to Bayesian Statistics

Machine Learning and Data Mining
Philipp Singer
CC image courtesy of user mattbuck007 on Flickr

3
Conditional Probability
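● Probability of event A given that event B is true, written P(A|B)
● Defined as P(A|B) = P(A ∩ B) / P(B), for P(B) > 0
● Example: P(cough|cold) > P(cough), since knowing that someone has a cold raises the probability that they cough
● Fundamental in probability theory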

4
Before we start with Bayes ...
● Another perspective on conditional probability
● Conditional probability via growing trimmed trees
● https://www.youtube.com/watch?v=Zxm4Xxvzohk

6
Bayes Theorem
● P(A|B) = P(B|A) P(A) / P(B)
● P(A|B) is the conditional probability of observing A given that B is true
● P(B|A) is the conditional probability of observing B given that A is true
● P(A) and P(B) are the probabilities of A and B without conditioning on each other

7
Visualize Bayes Theorem
Source: https://oscarbonilla.com/2009/05/visualizing-bayes-theorem/
[Figure: a rectangle of all possible outcomes, containing some event]

8
[Figure: all people in the study, with the subset of people having cancer]

9
[Figure: all people in the study, with the subset of people whose screening test is positive]

10
[Figure: the overlap of both subsets, i.e. people having a positive screening test and cancer]

11
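● Given that the test is positive, what is the probability that the person has cancer?
● P(cancer|positive) = P(positive ∩ cancer) / P(positive) = P(positive|cancer) P(cancer) / P(positive)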

13
● Given that someone has cancer, what is the probability that the person had a positive test?
● P(positive|cancer) = P(positive ∩ cancer) / P(cancer)
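Both questions can be answered by counting the regions in the diagrams. The minimal Python sketch below uses hypothetical study counts (the slides give none) to compute each conditional probability directly, then checks that Bayes' theorem recovers P(cancer|positive) from P(positive|cancer).

```python
# Hypothetical study counts (NOT from the slides), for illustration only.
total = 1000               # all people in the study
cancer = 10                # people having cancer
positive = 105             # people whose screening test is positive
cancer_and_positive = 9    # people having a positive test and cancer

# P(cancer | positive): restrict attention to the positive-test region
p_cancer_given_pos = cancer_and_positive / positive

# P(positive | cancer): restrict attention to the cancer region
p_pos_given_cancer = cancer_and_positive / cancer

# Bayes' theorem turns one conditional probability into the other
p_cancer = cancer / total
p_pos = positive / total
bayes = p_pos_given_cancer * p_cancer / p_pos

print(f"P(cancer|positive) = {p_cancer_given_pos:.3f}")
print(f"P(positive|cancer) = {p_pos_given_cancer:.3f}")
print(f"via Bayes' theorem: {bayes:.3f}")
```

With these made-up counts, a positive test raises the probability of cancer from 1% to under 9%, even though 90% of people with cancer test positive.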

14
Example: Fake coin
● Two coins
– One fair
– One unfair
● What is the probability that we hold the fair coin, given that a flip shows Heads?
CC image courtesy of user pagedooley on Flickr
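The slide does not specify how the unfair coin behaves. The sketch below assumes that it is two-headed (always shows Heads) and that each coin is picked with probability 1/2; both are assumptions for illustration. Exact fractions keep the arithmetic readable.

```python
from fractions import Fraction

# Assumptions (not stated on the slide): each coin is picked with
# probability 1/2, and the unfair coin is two-headed.
prior_fair = Fraction(1, 2)
p_heads_fair = Fraction(1, 2)
p_heads_unfair = Fraction(1)

# Evidence P(Heads), by the law of total probability over both coins
p_heads = p_heads_fair * prior_fair + p_heads_unfair * (1 - prior_fair)

# Bayes' theorem: P(fair | Heads) = P(Heads | fair) P(fair) / P(Heads)
posterior_fair = p_heads_fair * prior_fair / p_heads
print(posterior_fair)  # 1/3
```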

17
Update of beliefs
● Bayes' theorem lets new evidence update our beliefs
● The prior of one update can be the posterior of a previous update

18
Example: Fake coin
● Belief update
● What is the probability that we hold the fair coin if we flip Heads again, after having already seen one Heads?
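Sequential updating can be sketched by feeding each posterior back in as the next prior. As an assumption for illustration, the unfair coin is taken to be two-headed and both coins start out equally likely; the function name `update` is hypothetical.

```python
from fractions import Fraction

# Assumption: the unfair coin is two-headed; the fair coin shows Heads half the time.
P_HEADS = {"fair": Fraction(1, 2), "unfair": Fraction(1)}

def update(prior_fair, outcome):
    """Return P(fair | outcome) given the current belief P(fair)."""
    like_fair = P_HEADS["fair"] if outcome == "H" else 1 - P_HEADS["fair"]
    like_unfair = P_HEADS["unfair"] if outcome == "H" else 1 - P_HEADS["unfair"]
    evidence = like_fair * prior_fair + like_unfair * (1 - prior_fair)
    return like_fair * prior_fair / evidence

belief = Fraction(1, 2)              # initial prior: both coins equally likely
for flip in ["H", "H"]:
    belief = update(belief, flip)    # the posterior becomes the next prior
    print(flip, belief)
# prints "H 1/3", then "H 1/5"
```

A single Tails would settle the question: under this assumption `update(Fraction(1, 2), "T")` returns 1, because a two-headed coin cannot show Tails.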

20
Source: https://xkcd.com/1132/

21
Bayesian Inference
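● Statistical inference of parameters θ from data D, combined with additional knowledge (the prior)
● P(θ|D) = P(D|θ) P(θ) / P(D)
– θ: parameters
– D: data
– P(θ): prior, encoding additional knowledge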
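The posterior can be approximated on a grid of parameter values by multiplying likelihood and prior, then normalizing. This sketch assumes hypothetical data (7 Heads in 10 flips), a binomial likelihood, and a flat prior; it is an illustration, not the method used later in the slides.

```python
from math import comb

# Hypothetical data: 7 Heads in 10 flips; theta is the probability of Heads.
n, k = 10, 7
grid = [i / 100 for i in range(101)]          # candidate values of theta
prior = [1.0 for _ in grid]                   # flat prior (no extra knowledge)
likelihood = [comb(n, k) * t**k * (1 - t) ** (n - k) for t in grid]

# Posterior ∝ likelihood × prior; dividing by the sum plays the role of P(D)
unnormalized = [lik * pr for lik, pr in zip(likelihood, prior)]
evidence = sum(unnormalized)
posterior = [u / evidence for u in unnormalized]

best = grid[posterior.index(max(posterior))]
print(f"most probable theta on the grid: {best:.2f}")  # 0.70
```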

22
Coin flip example
● Flip a coin several times
● Is it fair?
● Let's use Bayesian inference

23
Binomial model
● Probability p of flipping Heads
● Probability of flipping Tails: 1 - p
● Binomial model: P(k Heads in n flips | p) = C(n, k) p^k (1 - p)^(n - k)

24
Prior
● Prior belief about the parameter(s)
● Conjugate prior
– The posterior belongs to the same distribution family as the prior
– The Beta distribution is conjugate to the binomial likelihood
● Beta prior: p ~ Beta(α, β)

25
Beta distribution
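● Continuous probability distribution on the interval [0,1]
● Density: f(p; α, β) = p^(α-1) (1 - p)^(β-1) / B(α, β), where B is the Beta function
● Two shape parameters: α and β
– If >= 1, they can be interpreted as pseudo counts
– α refers to flipping Heads, β to flipping Tails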

31
Posterior
● The posterior is also a Beta distribution: Beta(α + k, β + n - k) after k Heads in n flips
● For the exact derivation:
http://www.cs.cmu.edu/~10701/lecture/technote2_betabinomial.pdf

32
Posterior
● Assume
– Binomial p = 0.4
– Uniform Beta prior: α=1 and β=1
– 200 random coin flips with p = 0.4 (Heads = 80)
– Update the posterior: Beta(1 + 80, 1 + 120) = Beta(81, 121)

33
Posterior
● Assume
– Binomial p = 0.4
– Biased Beta prior: α=50 and β=10
– 200 random coin flips with p = 0.4 (Heads = 80)
– Update the posterior: Beta(50 + 80, 10 + 120) = Beta(130, 130)
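Because the Beta prior is conjugate, the update reduces to adding counts. This sketch reproduces both scenarios from the slides (80 Heads in 200 flips, uniform and biased priors); the helper name `beta_binomial_update` is hypothetical.

```python
def beta_binomial_update(alpha, beta, heads, tails):
    """Conjugate update: Beta(alpha, beta) prior plus coin-flip counts gives a Beta posterior."""
    return alpha + heads, beta + tails

n, k = 200, 80  # 200 flips, 80 Heads, as on the slides
for name, (a, b) in {"uniform": (1, 1), "biased": (50, 10)}.items():
    a_post, b_post = beta_binomial_update(a, b, k, n - k)
    mean = a_post / (a_post + b_post)
    print(f"{name}: Beta({a_post}, {b_post}), posterior mean {mean:.3f}")
# uniform: Beta(81, 121), posterior mean 0.401
# biased: Beta(130, 130), posterior mean 0.500
```

The biased prior (about 50 pseudo Heads and 10 pseudo Tails) pulls the posterior mean all the way to 0.5, even though the data alone point to 0.4.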

34
Posterior
● The posterior mean is a convex combination of the prior mean and the observed proportion of Heads
● The stronger our prior belief, the more data we need to overrule the prior
● The weaker our prior belief, the faster the data overrule the prior
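For a Beta(α, β) prior and k Heads in n flips, the posterior mean equals w · α/(α+β) + (1 - w) · k/n with weight w = (α+β)/(α+β+n). The sketch below checks this identity for the two priors from the slides and a hypothetical much stronger prior, Beta(500, 100).

```python
def posterior_mean_parts(alpha, beta, n, k):
    """Return the prior weight w and the posterior mean computed two ways."""
    prior_mean = alpha / (alpha + beta)
    mle = k / n
    w = (alpha + beta) / (alpha + beta + n)      # weight on the prior grows with alpha + beta
    combined = w * prior_mean + (1 - w) * mle    # convex combination
    direct = (alpha + k) / (alpha + beta + n)    # mean of Beta(alpha + k, beta + n - k)
    return w, combined, direct

# Beta(500, 100) is a hypothetical strong prior, added for contrast
for alpha, beta in [(1, 1), (50, 10), (500, 100)]:
    w, combined, direct = posterior_mean_parts(alpha, beta, n=200, k=80)
    print(f"Beta({alpha},{beta}): prior weight {w:.2f}, posterior mean {direct:.3f}")
```

The prior weight goes from about 0.01 to 0.23 to 0.75, which is the "stronger prior needs more data" effect in numbers.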

36
So is the coin fair?
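● Examine the posterior
– 95% highest density interval (HDI)
– ROPE [1]: region of practical equivalence around the null hypothesis
– Fair coin: ROPE [0.45,0.55]
● 95% HDI: (0.33, 0.47)
● The HDI overlaps the ROPE, so we cannot reject the null
● With more samples the HDI narrows; once it lies entirely outside the ROPE, we can reject the null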
[1] Kruschke, John. Doing Bayesian data analysis: A tutorial
with R, JAGS, and Stan. Academic Press, 2014.
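The HDI can be approximated by collecting grid cells from the highest posterior density downward until 95% of the mass is covered. This stdlib-only sketch applies it to the Beta(81, 121) posterior from the uniform-prior example and compares the result with the ROPE from the slide. The larger sample (800 Heads in 2000 flips) is hypothetical, chosen only to show the interval narrowing.

```python
import math

def beta_log_pdf(p, a, b):
    """Log density of Beta(a, b) at p in (0, 1)."""
    return ((a - 1) * math.log(p) + (b - 1) * math.log(1 - p)
            + math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b))

def beta_hdi(a, b, mass=0.95, grid_size=100_000):
    """Grid approximation of the highest density interval of a unimodal Beta(a, b)."""
    width = 1.0 / grid_size
    points = [(i + 0.5) * width for i in range(grid_size)]
    weights = [math.exp(beta_log_pdf(p, a, b)) * width for p in points]
    covered, inside = 0.0, []
    # Add cells from highest to lowest density until `mass` is covered
    for i in sorted(range(grid_size), key=weights.__getitem__, reverse=True):
        inside.append(points[i])
        covered += weights[i]
        if covered >= mass:
            break
    return min(inside), max(inside)

rope = (0.45, 0.55)                  # ROPE for a fair coin, from the slide
lo, hi = beta_hdi(81, 121)           # Beta(1,1) prior, 80 Heads in 200 flips
overlaps = hi >= rope[0] and lo <= rope[1]
print(f"95% HDI: ({lo:.2f}, {hi:.2f}); overlaps ROPE: {overlaps}")

# Hypothetical larger sample: 800 Heads in 2000 flips
lo_big, hi_big = beta_hdi(801, 1201)
print(f"95% HDI with more data: ({lo_big:.3f}, {hi_big:.3f})")
```

In the second case the whole interval lies below 0.45, so under the HDI+ROPE rule the null would be rejected.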

37
Bayesian Model Comparison
● Parameters are marginalized out
● Evidence (marginal likelihood): the average of the likelihood weighted by the prior
P(D|M) = ∫ P(D|θ, M) P(θ|M) dθ
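For the Beta-binomial model the evidence has the closed form C(n, k) B(k + α, n - k + β) / B(α, β). The sketch below compares it with a direct numerical average of the likelihood weighted by the prior (midpoint rule), using the 80-out-of-200 data and the two priors from the slides. Working in log space avoids overflow in C(200, 80).

```python
from math import comb, exp, lgamma, log

def log_beta_fn(a, b):
    """log B(a, b), the log of the Beta function."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def evidence_closed_form(n, k, a, b):
    """P(D | M) for k Heads in n flips with a Beta(a, b) prior on p."""
    return exp(log(comb(n, k)) + log_beta_fn(k + a, n - k + b) - log_beta_fn(a, b))

def evidence_numeric(n, k, a, b, steps=100_000):
    """Average the binomial likelihood over p, weighted by the Beta prior density."""
    log_coef = log(comb(n, k))
    log_norm = log_beta_fn(a, b)
    total = 0.0
    for i in range(steps):
        p = (i + 0.5) / steps
        log_prior = (a - 1) * log(p) + (b - 1) * log(1 - p) - log_norm
        log_lik = log_coef + k * log(p) + (n - k) * log(1 - p)
        total += exp(log_lik + log_prior)
    return total / steps

for a, b in [(1, 1), (50, 10)]:
    print(f"Beta({a},{b}): closed form {evidence_closed_form(200, 80, a, b):.6g}, "
          f"numeric {evidence_numeric(200, 80, a, b):.6g}")
```

With the uniform prior the evidence is exactly 1/(n + 1) = 1/201, whatever the number of Heads.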

38
Bayesian Model Comparison
● Bayes factors [1]
● Ratio of marginal likelihoods: K = P(D|M1) / P(D|M2)
● Interpretation table by Kass & Raftery [1]
– 1 to 3.2: not worth more than a bare mention
– 3.2 to 10: substantial
– 10 to 100: strong
– >100: decisive evidence against M2
[1] Kass, Robert E., and Adrian E. Raftery. "Bayes factors."
Journal of the American Statistical Association 90.430 (1995): 773-795.

39
● Null hypothesis H0: the coin is fair, p = 0.5
● Alternative hypothesis H1
– Anything is possible
– Uniform prior Beta(1,1) on p
● Bayes factor: K = P(D|H1) / P(D|H0)

40
● n = 200 flips
● k = 80 Heads
● Bayes factor K ≈ 4.85
● (Decent) preference for the alternative hypothesis: "substantial" on the Kass & Raftery scale
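The Bayes factor compares the point null p = 0.5 with the Beta(1,1) alternative. This sketch computes both marginal likelihoods in log space for n = 200 and k = 80; the helper names are hypothetical.

```python
from math import comb, exp, lgamma, log

def log_beta_fn(x, y):
    """log B(x, y), the log of the Beta function."""
    return lgamma(x) + lgamma(y) - lgamma(x + y)

def log_evidence_beta(n, k, a, b):
    """log P(D | H) when p ~ Beta(a, b) (Beta-binomial marginal likelihood)."""
    return log(comb(n, k)) + log_beta_fn(k + a, n - k + b) - log_beta_fn(a, b)

def log_evidence_point(n, k, p):
    """log P(D | H) when p is fixed (point null hypothesis)."""
    return log(comb(n, k)) + k * log(p) + (n - k) * log(1 - p)

n, k = 200, 80
bf = exp(log_evidence_beta(n, k, 1, 1) - log_evidence_point(n, k, 0.5))
print(f"Bayes factor K = P(D|H1) / P(D|H0): {bf:.2f}")
```

The value lands between 3.2 and 10, which Kass & Raftery call substantial evidence. Other priors, such as Beta(101, 11) or Beta(0.001, 0.001), can be plugged into `log_evidence_beta` the same way.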

41
Other priors
● Priors can encode hypotheses (theories)
● Biased hypothesis: Beta(101,11)
● Haldane prior: Beta(0.001, 0.001)
– U-shaped
– Puts most probability mass near p = 0 or p = 1

42
Frequentist approach
● So is the coin fair?
● Binomial test with null hypothesis p = 0.5
– one-tailed
– p-value: 0.0028
● Chi² test
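Both frequentist tests can be computed with the standard library. The sketch below computes the exact one-tailed binomial test P(X <= 80) under p = 0.5, and a chi-squared goodness-of-fit test against 100/100 expected counts. The slide gives no chi-squared result; the value here is computed, not quoted, and that test is inherently two-sided.

```python
from math import comb, erfc, sqrt

n, k = 200, 80

# Exact one-tailed binomial test: P(X <= k) under H0: p = 0.5
p_binom = sum(comb(n, i) for i in range(k + 1)) / 2**n

# Chi-squared goodness of fit against equal expected counts (1 degree of freedom)
expected = n / 2
chi2 = ((k - expected) ** 2 + ((n - k) - expected) ** 2) / expected
# For 1 degree of freedom, P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2))
p_chi2 = erfc(sqrt(chi2 / 2))

print(f"binomial test (one-tailed): p = {p_binom:.4f}")
print(f"chi-squared test: chi2 = {chi2:.1f}, p = {p_chi2:.4f}")
```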

43
Posterior prediction
● Posterior mean: (α + k) / (α + β + n)
● For large data, it converges to the MLE k / n
● MAP: maximum a posteriori estimate
– Bayesian point estimator
– Uses the posterior mode: (α + k - 1) / (α + β + n - 2)
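The three point estimates can be compared directly from the Beta posterior. This sketch assumes a Beta(1,1) prior and a 40% Heads rate at three sample sizes; only n = 200 comes from the slides, the other two are hypothetical. With a uniform prior the MAP coincides with the MLE, and the posterior mean approaches both as n grows.

```python
def point_estimates(alpha, beta, n, k):
    """Posterior mean, MAP (posterior mode) and MLE for k Heads in n flips."""
    a, b = alpha + k, beta + n - k       # posterior Beta(a, b)
    mean = a / (a + b)
    mode = (a - 1) / (a + b - 2)         # MAP; valid when a > 1 and b > 1
    mle = k / n
    return mean, mode, mle

for n, k in [(10, 4), (200, 80), (20000, 8000)]:
    mean, map_est, mle = point_estimates(1, 1, n, k)
    print(f"n={n}: mean={mean:.4f}, MAP={map_est:.4f}, MLE={mle:.4f}")
```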

44
Bayesian prediction
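● Posterior predictive distribution: P(x̃|D) = ∫ P(x̃|θ) P(θ|D) dθ
● Distribution of unobserved observations conditioned on the observed data (e.g., predicting test data from training data)
● Frequentist counterpart: plug in the MLE, P(x̃|θ_MLE)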
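For the coin, the posterior predictive of j Heads in m future flips is Beta-binomial: C(m, j) B(j + a, m - j + b) / B(a, b) for a Beta(a, b) posterior. The sketch uses the Beta(81, 121) posterior from the uniform-prior example and, for contrast, the plug-in binomial with the MLE 0.4. The choice of m = 10 future flips is arbitrary.

```python
from math import comb, exp, lgamma

def log_beta_fn(x, y):
    """log B(x, y), the log of the Beta function."""
    return lgamma(x) + lgamma(y) - lgamma(x + y)

def posterior_predictive(j, m, a, b):
    """P(j Heads in m future flips | data) under a Beta(a, b) posterior."""
    return comb(m, j) * exp(log_beta_fn(j + a, m - j + b) - log_beta_fn(a, b))

def variance(pmf):
    """Variance of a distribution over 0..len(pmf)-1."""
    mean = sum(j * p for j, p in enumerate(pmf))
    return sum((j - mean) ** 2 * p for j, p in enumerate(pmf))

a_post, b_post = 81, 121   # Beta(1,1) prior updated with 80 Heads in 200 flips
m = 10
bayes = [posterior_predictive(j, m, a_post, b_post) for j in range(m + 1)]
plug_in = [comb(m, j) * 0.4**j * 0.6 ** (m - j) for j in range(m + 1)]  # MLE plug-in

print("P(next flip is Heads) =", round(posterior_predictive(1, 1, a_post, b_post), 4))
print(f"variance: Bayesian {variance(bayes):.3f}, plug-in {variance(plug_in):.3f}")
```

The Bayesian predictive is slightly wider than the plug-in one, because it also accounts for the remaining uncertainty about p.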

45
Alternative Bayesian Inference
● Often the marginal likelihood is not easy to evaluate
– No analytical solution
– Numerical integration is expensive
● Alternatives
– Monte Carlo integration
● Markov Chain Monte Carlo (MCMC)
● Gibbs sampling
● Metropolis-Hastings algorithm
– Laplace approximation
– Variational Bayes
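The Metropolis-Hastings algorithm only needs the unnormalized posterior, because the marginal likelihood cancels in the acceptance ratio. This sketch samples the coin posterior (Beta(1,1) prior, 80 Heads in 200 flips) with a Gaussian random-walk proposal. The step size, chain length, burn-in, and seed are arbitrary choices, not tuned values.

```python
import math
import random

def log_posterior(p, n=200, k=80, a=1.0, b=1.0):
    """Unnormalized log posterior: binomial likelihood times Beta(a, b) prior."""
    if not 0.0 < p < 1.0:
        return -math.inf
    return (k + a - 1) * math.log(p) + (n - k + b - 1) * math.log(1 - p)

def metropolis_hastings(steps=20_000, step_size=0.05, burn_in=1_000, seed=0):
    rng = random.Random(seed)
    p = 0.5                                        # starting point
    current = log_posterior(p)
    samples = []
    for _ in range(steps):
        proposal = p + rng.gauss(0.0, step_size)   # symmetric random-walk proposal
        candidate = log_posterior(proposal)
        # Accept with probability min(1, posterior ratio); P(D) cancels out
        if rng.random() < math.exp(min(0.0, candidate - current)):
            p, current = proposal, candidate
        samples.append(p)
    return samples[burn_in:]

samples = metropolis_hastings()
print(f"MCMC posterior mean: {sum(samples) / len(samples):.3f} (exact: {81 / 202:.3f})")
```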

46
Bayesian (Machine) Learning

47
Bayesian Models
● Example: Markov chain model
– Dirichlet prior, categorical likelihood
● Bayesian networks
● Topic models (LDA)
● Hierarchical Bayesian models
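The Markov chain model generalizes the coin: each row of the transition matrix is categorical with a Dirichlet prior, so the posterior mean of each row is "observed transition counts plus pseudo counts", normalized. This sketch assumes a symmetric Dirichlet(1) prior per row and a hypothetical state sequence; the function name is also hypothetical.

```python
from collections import Counter

def posterior_transitions(sequence, states, alpha=1.0):
    """Posterior mean transition probabilities, symmetric Dirichlet(alpha) prior per row."""
    counts = Counter(zip(sequence, sequence[1:]))    # observed (from, to) transitions
    probs = {}
    for s in states:
        row_total = sum(counts[(s, t)] for t in states) + alpha * len(states)
        probs[s] = {t: (counts[(s, t)] + alpha) / row_total for t in states}
    return probs

# Hypothetical sequence of visited states (NOT from the slides)
seq = ["A", "B", "A", "B", "B", "C", "A", "B"]
probs = posterior_transitions(seq, states=["A", "B", "C"])
for s, row in probs.items():
    print(s, {t: round(v, 3) for t, v in row.items()})
```

Unseen transitions such as A to C still get nonzero probability from the pseudo counts, in the same way that a Beta prior keeps a coin's estimate away from exactly 0 or 1.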

48
Generalized Linear Model
● Multiple linear regression
● Logistic regression
● Bayesian ANOVA

49
Bayesian Statistical Tests
● Alternatives to frequentist approaches
● Bayesian correlation
● Bayesian t-test

50
Questions?
Philipp Singer
philipp.singer@gesis.org
Image credit: talk of Mike West: http://www2.stat.duke.edu/~mw/ABS04/Lecture_Slides/4.Stats_Regression.pdf