Accelerating Metropolis-Hastings with
Lightweight Inference Compilation
Feynman Liang, Nim Arora, Nazanin Tehrani, Yucen Li, Michael
Tingley, Erik Meijer
November 11, 2020
Facebook Probability, Facebook AI Infrastructure, UC Berkeley Mahoney Lab
Table of contents
Background
Probabilistic Programming
Bayesian Inference
Inference compilation
SIS in imperative PPLs
Lightweight Inference Compilation for MCMC
Results
Future directions
1
Background
Two competing philosophies
[van de Meent et al., 2018] To build machines that can reason, random variables and probabilistic calculations are:
• Probabilistic ML: an engineering requirement [Tenenbaum et al., 2011, Ghahramani, 2015]
• Deep Learning: irrelevant [LeCun et al., 2015, Goodfellow et al., 2016]
2
Background
Probabilistic Programming
Probabilistic programming languages (PPLs)
Just as programming beyond the simplest algorithms requires tools for abstraction and composition, complex probabilistic modeling requires new progress in model representation—probabilistic programming languages.
[Goodman, 2013]
3
Abstractions over deterministic computations
Low Level Assembly
mov dx, msg
; ah=9 - "print string" sub-function
mov ah, 9
int 0x21

; "exit" sub-function
mov ah, 0x4c
int 0x21

msg db 'Hello!', 0x0d, 0x0a, '$'
High Level Python
print("Hello!")
4
Abstractions over probabilistic computations
Figure 1: [Koller and Friedman, 2009]
d ~ Bernoulli
i ~ Normal
g ~ Categorical(fn(d, i))
s ~ Normal(fn(i))
l ~ Bernoulli(fn(g))
Generative model:
P(D, I, G, S, L) = P(D) P(I) P(G | D, I) P(S | I) P(L | G)
Question: Given a student’s recommendation letter and SAT score, what should I expect their intelligence to be?
Using a PPL: infer(i, {l=Good, s=800})
5
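For concreteness, a minimal forward-sampling sketch of this student network in plain Python follows; the particular link functions fn(d, i), fn(i), fn(g) below are illustrative assumptions, not the tables from [Koller and Friedman, 2009].

import numpy as np

rng = np.random.default_rng(0)

def sample_student():
    """One joint sample (d, i, g, s, l) from the generative model above."""
    d = rng.binomial(1, 0.4)                 # d ~ Bernoulli: course difficulty
    i = rng.normal(0.0, 1.0)                 # i ~ Normal: student intelligence
    logits = np.array([i - d, 0.0, d - i])   # g ~ Categorical(fn(d, i)): grades A/B/C
    probs = np.exp(logits) / np.exp(logits).sum()
    g = rng.choice(3, p=probs)
    s = rng.normal(500.0 + 100.0 * i, 50.0)  # s ~ Normal(fn(i)): SAT score
    l = rng.binomial(1, 0.9 if g == 0 else 0.2)  # l ~ Bernoulli(fn(g)): strong letter
    return dict(d=d, i=i, g=g, s=s, l=l)

print(sample_student())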
Background
Bayesian Inference
Bayesian Inference Basics
Latent Variables X
Observed Variables Y
Prior P(X)
Likelihood P(Y | X)
Goal: Approximate the posterior P(X | Y)
Figure 2: [van de Meent et al., 2018]
X                                     Y
intelligence                          letter and grade
scene description                     image
simulation                            simulator output
program source code                   program return value
policy prior and world simulator      rewards
cognitive decision making process     observed behavior
Table 1: [van de Meent et al., 2018]
6
Why only approximate?
P(X | Y) = P(Y | X) P(X) / P(Y) = P(Y | X) P(X) / ∫_X P(Y | X) P(X) dX
The marginal likelihood P(Y) (i.e. partition function) is a high-dimensional integral, tractable only for a small family of conjugate prior/likelihood pairs.
7
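As a sanity check on the "tractable only for conjugate pairs" claim, here is a small sketch (an illustration of my own, not from the slides) comparing the closed-form marginal likelihood of a conjugate normal-normal model with brute-force numerical integration; for non-conjugate models only the latter is available, and it scales badly with dimension.

import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

# Conjugate model: x ~ N(0, s0), y | x ~ N(x, s)   (s0, s are standard deviations)
s0, s, y = 2.0, 0.1, 0.25

# Closed form: marginalizing x gives y ~ N(0, sqrt(s0**2 + s**2))
closed_form = norm.pdf(y, loc=0.0, scale=np.sqrt(s0**2 + s**2))

# Brute force: P(y) = ∫ P(y | x) P(x) dx
numeric, _ = quad(lambda x: norm.pdf(y, x, s) * norm.pdf(x, 0.0, s0), -np.inf, np.inf)

print(closed_form, numeric)  # agree to numerical precision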
How to approximate?
Variational Inference
Let qφ be a tractable parametric family (e.g. Gaussian mean-field qφ(X) = ∏_{i=1}^d N(X_i | φ_{1,i}, φ_{2,i})) and solve
argmin_φ KL(qφ(X) ‖ P(X | Y))
= argmax_φ E_{qφ}[log P(X | Y) − log qφ(X)]
= argmax_φ E_{qφ}[log P(X, Y) − log qφ(X)]
Monte Carlo
Sample X_i iid ∼ P(X | Y). Then
E[g(X) | Y] = ∫ g(X) P(X | Y) dX ≈ (1/N) ∑_{i=1}^N g(X_i)
8
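A minimal sketch of the Monte Carlo estimate on the right, assuming (purely for illustration) that exact posterior samples are available; obtaining those samples is the hard part the rest of the talk addresses.

import numpy as np

rng = np.random.default_rng(0)

# Toy posterior we can sample exactly: X | Y ~ N(1.0, 0.5)
posterior_samples = rng.normal(1.0, 0.5, size=100_000)

g = lambda x: x**2                       # any test function g
estimate = g(posterior_samples).mean()   # (1/N) * sum_i g(X_i)
print(estimate)                          # ≈ E[X^2 | Y] = 1.0**2 + 0.5**2 = 1.25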
Inference compilation
Imperative vs Declarative PPLs
Imperative: Evaluation-based, samples (linear) execution traces (Pyro, Church, WebPPL)
(begin
  (define geometric
    (lambda (p)
      (if (flip p)
          1
          (+ 1 (geometric p)))))
  (geometric .3))
Figure 2: [Wingate et al., 2011] 9
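The same Church program written as an ordinary Python generative function (a sketch; flip is implemented directly with NumPy rather than a PPL sample primitive):

import numpy as np

rng = np.random.default_rng(0)

def geometric(p):
    """Number of Bernoulli(p) flips up to and including the first success."""
    if rng.random() < p:      # (flip p)
        return 1
    return 1 + geometric(p)

print([geometric(0.3) for _ in range(10)])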
Inference compilation
SIS in imperative PPLs
Sequential importance sampling (SIS)
E_{P(X|Y)}[g(X)] = E_{P(X|Y)}[(q(X)/q(X)) g(X)] = E_q[(P(X, Y)/q(X)) g(X)] / P(Y)
≈ (1/N) ∑_{i=1}^N (P(X_i, Y)/q(X_i)) g(X_i) / P(Y) ≈ ∑_{i=1}^N (P(X_i, Y)/q(X_i)) g(X_i) / ∑_{i=1}^N (P(X_i, Y)/q(X_i))
where X_i ∼ q from proposal distribution q
Rate: Var_q[(P(X, Y)/q(X)) g(X)]^{−1/2} [Yuan and Druzdzel, 2007]
10
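A minimal sketch of the self-normalized estimator above for a conjugate normal-normal toy model (my own illustrative choice), using the prior as the proposal q, i.e. likelihood weighting:

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
s0, s, y_obs = 2.0, 0.1, 0.25                # x ~ N(0, s0), y | x ~ N(x, s)
N = 50_000

x = rng.normal(0.0, s0, size=N)              # X_i ~ q = P(X) (likelihood weighting)
log_w = norm.logpdf(y_obs, loc=x, scale=s)   # w_i ∝ P(X_i, Y)/q(X_i) = P(Y | X_i)
w = np.exp(log_w - log_w.max())
w /= w.sum()

posterior_mean = np.sum(w * x)               # self-normalized estimate of E[X | Y]
exact = (s0**2 / (s0**2 + s**2)) * y_obs     # conjugate closed form
print(posterior_mean, exact)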
SIS of execution traces
1. Execute the probabilistic program forwards
2. At each latent variable Xk (i.e. sample statement),
sample q(Xk), assign value, multiply node
importance weight into trace
3. At each observed random variable (i.e. observe
statement), multiply likelihood P(Yk | X1:k, Y1:k) into
trace
Problem: Myopic choices from sampling q(X) early in the
trace may result in low importance weights (poor
explanation of Y) later.
11
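A bare-bones sketch of steps 1-3, assuming hypothetical sample/observe primitives that accumulate a per-trace log importance weight (the Trace class and the model, which mirrors the magnitude example used later in the deck, are illustrative, not a real PPL API):

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

class Trace:
    """Accumulates the log importance weight of one execution (steps 2 and 3)."""
    def __init__(self):
        self.log_w = 0.0
    def sample(self, prior, proposal):              # step 2: latent variable
        x = proposal.rvs(random_state=rng)
        self.log_w += prior.logpdf(x) - proposal.logpdf(x)
        return x
    def observe(self, value, likelihood):           # step 3: observed variable
        self.log_w += likelihood.logpdf(value)

def magnitude_model(trace, obs):
    # proposals here ignore obs (myopic); a smarter q would condition on it
    x = trace.sample(norm(0, 10), proposal=norm(0, 10))
    y = trace.sample(norm(0, 10), proposal=norm(0, 10))
    trace.observe(obs**2, likelihood=norm(x**2 + y**2, 0.1))
    return x

traces = []
for _ in range(5_000):
    t = Trace()
    traces.append((magnitude_model(t, obs=5.0), t.log_w))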
Constructing a proposal distribution
How to choose q?
Likelihood-Weighting [Russell and Norvig, 2002]: q(X) = P(X)
Direct sampling: q(X) = P(X | Y), optimal
Key Idea: Account for Y when constructing q. Can we exploit access to P(X, Y) to build a proposal q "close" to P(X | Y)?
12
Intuition for inference compilation
(Figure: left, the generative model p(x, y); right, the posterior density p(x | y = 0.25) together with samples x ∼ p(x | y = 0.25) and the compiled proposals q(x; φ(y = 0.25), K = 1) and q(x; φ(y = 0.25), K = 2).)
13
Trace-based inference compilation (IC)
• Construct DNN with parameters φ mapping
observations Y (amortized inference,
[Goodman, 2013]) and execution prefix to proposal
distribution qφ(· | Y)
• Train qφ against forward samples from the
probabilistic program p(x, y) (inference compilation)
Figure 3: [Le et al., 2017]
14
Trace-based inference compilation (IC)
Figure 4: [Le et al., 2017]
15
Sensitivity to nuisance random variables
def magnitude(obs, M):
    x = sample(Normal(0, 10))
    [sample(Normal(0, 10)) for _ in range(M)]  # extend trace with nuisance variables
    y = sample(Normal(0, 10))
    observe(obs**2, likelihood=Normal(x**2 + y**2, 0.1))
    return x, y
Figure 5: [Harvey et al., 2019] 16
Inference compilation
Lightweight Inference Compilation for
MCMC
Imperative vs Declarative PPLs
Declarative: Graph-based, samples instantiated graphical
models (i.e. worlds) (BUGS, BLOG, Stan, beanmachine)
Figure 6: [Blei et al., 2003]
Key Idea: Markov blanket available in declarative PPL
17
MCMC sampling of graphical models
Metropolis-within-Gibbs / Lightweight MH
[Wingate et al., 2011]:
• Initialize minimal self-supporting world consistent
with Y
• Repeat:
• Pick single random unobserved node Xi
• Sample proposal q(Xi) to propose new value
• Accept with probability α and revert otherwise
Theorem ([Hastings, 1970])
With appropriately chosen α, the above algorithm yields
a Markov Chain with the posterior as the invariant
distribution.
18
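A compact sketch of the loop above (single-site Metropolis-within-Gibbs with a random-walk proposal, my own minimal illustration) on a two-latent magnitude model; a Gibbs or LIC proposal would replace the random-walk step:

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
obs = 5.0

def log_joint(world):
    x, y = world["x"], world["y"]
    return (norm.logpdf(x, 0, 10) + norm.logpdf(y, 0, 10)
            + norm.logpdf(obs**2, x**2 + y**2, 0.1))

world = {"x": 1.0, "y": 1.0}                # initial world consistent with Y
samples = []
for _ in range(10_000):
    node = rng.choice(["x", "y"])           # pick a single unobserved node
    proposal = dict(world)
    proposal[node] = world[node] + rng.normal(0, 0.5)    # q: random-walk proposal
    log_alpha = log_joint(proposal) - log_joint(world)   # symmetric q cancels in alpha
    if np.log(rng.random()) < log_alpha:
        world = proposal                    # accept; otherwise revert (keep old world)
    samples.append(world["x"])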
MH proposal distributions
Different q(·) =⇒ different MCMC algorithms
• Random walk MH: q(·) isotropic Gaussian
• Newtonian Monte Carlo: q(·) Gaussian with empirical Fisher information precision
• Hamiltonian Monte Carlo: q(·) integrates an iso-Hamiltonian system
• Lightweight Inference Compilation: q(· | MB(Xi)) a neural network function of the Markov blanket
Theorem ([Pearl, 1987])
Gibbs proposals q(Xi) = P(Xi | X−i) = P(Xi | MB(Xi)) have acceptance probability 1.
∴ MB(Xi) is a minimal sufficient input for constructing the proposal distribution 19
LIC artifacts for student network
q(d|g, i)
q(i|g, d, s)
q(g|i, d, l)
q(s|i)
q(l|g)
20
LIC proposer for grade
21
Compare against SIS IC’s (non-minimal) proposer
q(d|observations)
q(i|d, observations)
q(g|i, d, observations)
q(s|g, i, d, observations)
q(l|s, g, i, d, observations)
22
Results
Recovering conjugate expressions in normal-normal
x ∼ N(0, 2), y | x ∼ N(x, 0.1)
Know (conjugacy): x | y ∼ N(0.999 y, 0.0001)
(Figure: "Learning a conjugate model's posterior": the closed-form posterior mean and the mean of the LIC proposer, plotted against the observed value y.)
23
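A small sketch of the closed-form posterior that LIC is expected to recover, assuming the 2 and 0.1 above denote standard deviations (if they are variances the constants change, but the structure is the same):

import numpy as np

s0, s = 2.0, 0.1                              # prior sd of x, likelihood sd of y | x

# Conjugate normal-normal posterior: x | y ~ N(c * y, tau)
c = s0**2 / (s0**2 + s**2)                    # ≈ 0.998, the coefficient LIC should learn
tau = np.sqrt(s0**2 * s**2 / (s0**2 + s**2))  # posterior standard deviation

for y in [-10.0, 0.0, 10.0]:
    print(y, c * y, tau)    # closed-form mean to compare against the LIC proposer's mean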
GMM Mode Escape
(Figure: top, the generative model p(x, y) and the posterior density p(x | y = 0.25) with samples x ∼ p(x | y = 0.25) and proposals q(x; φ(y = 0.25), K = 1), K = 2; bottom, posterior density estimates from Adaptive HMC [Hoffman and Gelman, 2014], Adaptive RWMH [Garthwaite et al., 2016], Ground Truth, Inference Compilation (this paper), NMC (Arora 2020), and NUTS (Stan defaults).)
24
Robustness to nuisance random variables
def magnitude(obs):
    x = sample(Normal(0, 10))
    for _ in range(100):
        nuisance = sample(Normal(0, 10))
    y = sample(Normal(0, 10))
    observe(
        obs**2,
        likelihood=Normal(x**2 + y**2, 0.1))
    return x

class NuisanceModel:
    @random_variable
    def x(self):
        return dist.Normal(0, 10)
    @random_variable
    def nuisance(self, i):
        return dist.Normal(0, 10)
    @random_variable
    def y(self):
        return dist.Normal(0, 10)
    @random_variable
    def noisy_sq_length(self):
        return dist.Normal(
            self.x()**2 + self.y()**2,
            0.1)

Method               # params    compile time    ESS
LIC (this paper)     3,358       44 sec.         49.75
[Le et al., 2017]    21,952      472 sec.        10.99
25
Bayesian Logistic Regression
β ∼ N_{d+1}(0_{d+1}, diag(10, 2.5 · 1_d))
y_i | x_i iid∼ Bernoulli(σ(βᵀ x_i)) where σ(t) = (1 + e^{−t})^{−1}
(Figure: LIC, NMC, RWMH, and NUTS compared on compile time, inference time, predictive log-likelihood (PLL), effective sample size (ESS), and Rhat.)
26
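A sketch of this model's unnormalized posterior in plain PyTorch, treating the 10 and 2.5 above as prior standard deviations; the synthetic data and sizes are illustrative placeholders, and an MH sampler (or LIC) would target this log joint:

import torch
import torch.distributions as dist

torch.manual_seed(0)

# Illustrative synthetic data; the first column of xs is the intercept.
n, d = 100, 5
xs = torch.cat([torch.ones(n, 1), torch.randn(n, d)], dim=1)
beta_true = torch.randn(d + 1)
ys = torch.bernoulli(torch.sigmoid(xs @ beta_true))

def log_joint(beta):
    """log P(beta, y | x): the unnormalized posterior an MH sampler targets."""
    prior_scale = torch.cat([torch.tensor([10.0]), 2.5 * torch.ones(d)])
    log_prior = dist.Normal(torch.zeros(d + 1), prior_scale).log_prob(beta).sum()
    log_lik = dist.Bernoulli(logits=xs @ beta).log_prob(ys).sum()
    return log_prior + log_lik

print(log_joint(torch.zeros(d + 1)))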
n-Schools
β0 ∼ StudentT(3, 0, 10)
τ_i ∼ HalfCauchy(σ_i) for i ∈ {district, state, type}
β_{i,j} ∼ N(0, τ_i) for i ∈ {district, state, type}, j ∈ [n_i]
y_k ∼ N(β0 + Σ_i β_{i,j_k}, σ_k)
27
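A forward-simulation sketch of the n-Schools generative model above; the group sizes and σ values here are illustrative assumptions, not the benchmark's settings.

import torch
import torch.distributions as dist

torch.manual_seed(0)
groups = {"district": 8, "state": 5, "type": 3}      # illustrative n_i per group
sigma_group = {"district": 1.0, "state": 1.0, "type": 1.0}

beta0 = dist.StudentT(3, 0.0, 10.0).sample()
tau = {i: dist.HalfCauchy(sigma_group[i]).sample() for i in groups}
beta = {i: dist.Normal(0.0, tau[i]).sample((n,)) for i, n in groups.items()}

def sample_y(j, sigma_k=1.0):
    """One school's observation y_k given its group memberships j[i]."""
    mean = beta0 + sum(beta[i][j[i]] for i in groups)
    return dist.Normal(mean, sigma_k).sample()

print(sample_y({"district": 0, "state": 1, "type": 2}))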
n-Schools (results)
(Figure: LIC, NMC, RWMH, and NUTS compared on compile time (sec), inference time (sec), PLL, ESS, and Rhat.)
27
Future directions
Adaptive LIC
Problem: forward samples from P(X, Y) may not represent Y at inference time.
RWMH [Garthwaite et al., 2016] and HMC [Hoffman and Gelman, 2014] both have adaptive variants.
Idea: Perform MH with LIC to draw posterior samples x^(m) ∼ P(x | y = obs), then hill-climb the LIC artifacts on the inclusive KL to the conditional (rather than joint) posterior:
argmin_φ D_KL(p(x | y = obs) ‖ q(x | y = obs; φ)) ≈ argmax_φ ∑_{m=1}^N log q(x^(m) | y = obs; φ)
28
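A minimal sketch of that hill-climbing step, assuming for illustration that the LIC proposer for a single site is a Gaussian whose parameters φ are fit by gradient ascent on ∑_m log q(x^(m); φ) over posterior samples collected during MH (everything below is a stand-in for the real LIC artifact):

import torch

torch.manual_seed(0)
posterior_samples = torch.randn(1000) * 0.3 + 1.5    # stand-in for x^(m) ~ P(x | y = obs)

mu = torch.zeros(1, requires_grad=True)              # φ = (mu, log_sigma)
log_sigma = torch.zeros(1, requires_grad=True)
opt = torch.optim.Adam([mu, log_sigma], lr=0.05)

for _ in range(500):
    q = torch.distributions.Normal(mu, log_sigma.exp())
    loss = -q.log_prob(posterior_samples).mean()     # inclusive KL up to a constant
    opt.zero_grad()
    loss.backward()
    opt.step()

print(mu.item(), log_sigma.exp().item())             # ≈ 1.5, 0.3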
IAF density estimators
Problem: GMM in LIC may provide poor approximations
Idea: Parameterize IAFs [Kingma et al., 2016] with LIC
outputs
Figure 7: Neal’s funnel (left), with density approximations by a 7-component isotropic GMM (middle) and a 7-layer IAF (right)
29
Heavy-tailed density estimators
Problem: GMMs and standard IAFs (Lipschitz functions of Gaussians) remain sub-Gaussian, but n-Schools is heavy-tailed.
Idea: IAFs with a heavy-tailed base distribution
Figure 8: IAF density estimation of a Cauchy(−2, 1) (left) and the corresponding K-S statistics when using a Normal (top right) and StudentT (bottom right) base distribution
30
Thank You!
31
References i
Blei, D. M., Ng, A. Y., and Jordan, M. I. (2003).
Latent Dirichlet allocation.
Journal of Machine Learning Research, 3(Jan):993–1022.
Garthwaite, P. H., Fan, Y., and Sisson, S. A. (2016).
Adaptive optimal scaling of Metropolis–Hastings
algorithms using the Robbins–Monro process.
Communications in Statistics-Theory and Methods,
45(17):5098–5111.
32
References ii
Ghahramani, Z. (2015).
Probabilistic machine learning and artificial
intelligence.
Nature, 521(7553):452–459.
Goodfellow, I., Bengio, Y., Courville, A., and Bengio, Y.
(2016).
Deep learning.
MIT press Cambridge.
33
References iii
Goodman, N. D. (2013).
The principles and practice of probabilistic
programming.
ACM SIGPLAN Notices, 48(1):399–402.
Harvey, W., Munk, A., Baydin, A. G., Bergholm, A., and
Wood, F. (2019).
Attention for inference compilation.
arXiv preprint arXiv:1910.11961.
Hastings, W. K. (1970).
Monte Carlo sampling methods using Markov chains
and their applications.
Biometrika, 57(1):97–109.
34
References iv
Hoffman, M. D. and Gelman, A. (2014).
The No-U-Turn Sampler: adaptively setting path
lengths in Hamiltonian Monte Carlo.
J. Mach. Learn. Res., 15(1):1593–1623.
Kingma, D. P., Salimans, T., Jozefowicz, R., Chen, X.,
Sutskever, I., and Welling, M. (2016).
Improved variational inference with inverse
autoregressive flow.
In Advances in neural information processing
systems, pages 4743–4751.
35
References v
Koller, D. and Friedman, N. (2009).
Probabilistic graphical models: principles and
techniques.
MIT press.
Le, T. A., Baydin, A. G., and Wood, F. (2017).
Inference compilation and universal probabilistic
programming.
In Artificial Intelligence and Statistics, pages
1338–1348.
36
References vi
LeCun, Y., Bengio, Y., and Hinton, G. (2015).
Deep learning.
Nature, 521(7553):436–444.
Russell, S. and Norvig, P. (2002).
Artificial Intelligence: A Modern Approach.
Prentice Hall.
Pearl, J. (1987).
Evidential reasoning using stochastic simulation of
causal models.
Artificial Intelligence, 32(2):245–257.
37
References vii
Tenenbaum, J. B., Kemp, C., Griffiths, T. L., and
Goodman, N. D. (2011).
How to grow a mind: Statistics, structure, and
abstraction.
Science, 331(6022):1279–1285.
van de Meent, J.-W., Paige, B., Yang, H., and Wood, F.
(2018).
An introduction to probabilistic programming.
arXiv preprint arXiv:1809.10756.
38
References viii
Wingate, D., Stuhlmüller, A., and Goodman, N. (2011).
Lightweight implementations of probabilistic
programming languages via transformational
compilation.
In Proceedings of the Fourteenth International
Conference on Artificial Intelligence and Statistics,
pages 770–778.
39
References ix
Yuan, C. and Druzdzel, M. J. (2007).
Theoretical analysis and practical insights on
importance sampling in bayesian networks.
International Journal of Approximate Reasoning,
46(2):320–333.
40

Más contenido relacionado

La actualidad más candente

Giáo trình Phân tích và thiết kế giải thuật - CHAP 8
Giáo trình Phân tích và thiết kế giải thuật - CHAP 8Giáo trình Phân tích và thiết kế giải thuật - CHAP 8
Giáo trình Phân tích và thiết kế giải thuật - CHAP 8Nguyễn Công Hoàng
 
Ee693 sept2014midsem
Ee693 sept2014midsemEe693 sept2014midsem
Ee693 sept2014midsemGopi Saiteja
 
t-tests in R - Lab slides for UGA course FANR 6750
t-tests in R - Lab slides for UGA course FANR 6750t-tests in R - Lab slides for UGA course FANR 6750
t-tests in R - Lab slides for UGA course FANR 6750richardchandler
 
better together? statistical learning in models made of modules
better together? statistical learning in models made of modulesbetter together? statistical learning in models made of modules
better together? statistical learning in models made of modulesChristian Robert
 
Principal Components Analysis, Calculation and Visualization
Principal Components Analysis, Calculation and VisualizationPrincipal Components Analysis, Calculation and Visualization
Principal Components Analysis, Calculation and VisualizationMarjan Sterjev
 
An algorithm for generating new mandelbrot and julia sets
An algorithm for generating new mandelbrot and julia setsAn algorithm for generating new mandelbrot and julia sets
An algorithm for generating new mandelbrot and julia setsAlexander Decker
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine LearningBig_Data_Ukraine
 
3rd NIPS Workshop on PROBABILISTIC PROGRAMMING
3rd NIPS Workshop on PROBABILISTIC PROGRAMMING3rd NIPS Workshop on PROBABILISTIC PROGRAMMING
3rd NIPS Workshop on PROBABILISTIC PROGRAMMINGChristian Robert
 
Data-Driven Recommender Systems
Data-Driven Recommender SystemsData-Driven Recommender Systems
Data-Driven Recommender Systemsrecsysfr
 
Linear models for classification
Linear models for classificationLinear models for classification
Linear models for classificationSung Yub Kim
 
Variational Autoencoder
Variational AutoencoderVariational Autoencoder
Variational AutoencoderMark Chang
 

La actualidad más candente (20)

Np cooks theorem
Np cooks theoremNp cooks theorem
Np cooks theorem
 
Daa unit 4
Daa unit 4Daa unit 4
Daa unit 4
 
Lecture11 xing
Lecture11 xingLecture11 xing
Lecture11 xing
 
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
 
Solution 3.
Solution 3.Solution 3.
Solution 3.
 
Lec4
Lec4Lec4
Lec4
 
Algorithms DM
Algorithms DMAlgorithms DM
Algorithms DM
 
Giáo trình Phân tích và thiết kế giải thuật - CHAP 8
Giáo trình Phân tích và thiết kế giải thuật - CHAP 8Giáo trình Phân tích và thiết kế giải thuật - CHAP 8
Giáo trình Phân tích và thiết kế giải thuật - CHAP 8
 
Ee693 sept2014midsem
Ee693 sept2014midsemEe693 sept2014midsem
Ee693 sept2014midsem
 
t-tests in R - Lab slides for UGA course FANR 6750
t-tests in R - Lab slides for UGA course FANR 6750t-tests in R - Lab slides for UGA course FANR 6750
t-tests in R - Lab slides for UGA course FANR 6750
 
better together? statistical learning in models made of modules
better together? statistical learning in models made of modulesbetter together? statistical learning in models made of modules
better together? statistical learning in models made of modules
 
Chapter 3 ds
Chapter 3 dsChapter 3 ds
Chapter 3 ds
 
Principal Components Analysis, Calculation and Visualization
Principal Components Analysis, Calculation and VisualizationPrincipal Components Analysis, Calculation and Visualization
Principal Components Analysis, Calculation and Visualization
 
An algorithm for generating new mandelbrot and julia sets
An algorithm for generating new mandelbrot and julia setsAn algorithm for generating new mandelbrot and julia sets
An algorithm for generating new mandelbrot and julia sets
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learning
 
3rd NIPS Workshop on PROBABILISTIC PROGRAMMING
3rd NIPS Workshop on PROBABILISTIC PROGRAMMING3rd NIPS Workshop on PROBABILISTIC PROGRAMMING
3rd NIPS Workshop on PROBABILISTIC PROGRAMMING
 
Data-Driven Recommender Systems
Data-Driven Recommender SystemsData-Driven Recommender Systems
Data-Driven Recommender Systems
 
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Applie...
 
Linear models for classification
Linear models for classificationLinear models for classification
Linear models for classification
 
Variational Autoencoder
Variational AutoencoderVariational Autoencoder
Variational Autoencoder
 

Similar a Accelerating Metropolis Hastings with Lightweight Inference Compilation

Random Matrix Theory and Machine Learning - Part 4
Random Matrix Theory and Machine Learning - Part 4Random Matrix Theory and Machine Learning - Part 4
Random Matrix Theory and Machine Learning - Part 4Fabian Pedregosa
 
Iwsm2014 an analogy-based approach to estimation of software development ef...
Iwsm2014   an analogy-based approach to estimation of software development ef...Iwsm2014   an analogy-based approach to estimation of software development ef...
Iwsm2014 an analogy-based approach to estimation of software development ef...Nesma
 
Statement of stochastic programming problems
Statement of stochastic programming problemsStatement of stochastic programming problems
Statement of stochastic programming problemsSSA KPI
 
Probability cheatsheet
Probability cheatsheetProbability cheatsheet
Probability cheatsheetSuvrat Mishra
 
An Importance Sampling Approach to Integrate Expert Knowledge When Learning B...
An Importance Sampling Approach to Integrate Expert Knowledge When Learning B...An Importance Sampling Approach to Integrate Expert Knowledge When Learning B...
An Importance Sampling Approach to Integrate Expert Knowledge When Learning B...NTNU
 
Pert 05 aplikasi clustering
Pert 05 aplikasi clusteringPert 05 aplikasi clustering
Pert 05 aplikasi clusteringaiiniR
 
Probability cheatsheet
Probability cheatsheetProbability cheatsheet
Probability cheatsheetJoachim Gwoke
 

Similar a Accelerating Metropolis Hastings with Lightweight Inference Compilation (20)

QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
QMC Program: Trends and Advances in Monte Carlo Sampling Algorithms Workshop,...
 
Lecture12 xing
Lecture12 xingLecture12 xing
Lecture12 xing
 
Bayesian_Decision_Theory-3.pdf
Bayesian_Decision_Theory-3.pdfBayesian_Decision_Theory-3.pdf
Bayesian_Decision_Theory-3.pdf
 
Random Matrix Theory and Machine Learning - Part 4
Random Matrix Theory and Machine Learning - Part 4Random Matrix Theory and Machine Learning - Part 4
Random Matrix Theory and Machine Learning - Part 4
 
Iwsm2014 an analogy-based approach to estimation of software development ef...
Iwsm2014   an analogy-based approach to estimation of software development ef...Iwsm2014   an analogy-based approach to estimation of software development ef...
Iwsm2014 an analogy-based approach to estimation of software development ef...
 
A basic introduction to learning
A basic introduction to learningA basic introduction to learning
A basic introduction to learning
 
QMC: Transition Workshop - Approximating Multivariate Functions When Function...
QMC: Transition Workshop - Approximating Multivariate Functions When Function...QMC: Transition Workshop - Approximating Multivariate Functions When Function...
QMC: Transition Workshop - Approximating Multivariate Functions When Function...
 
Econometrics 2017-graduate-3
Econometrics 2017-graduate-3Econometrics 2017-graduate-3
Econometrics 2017-graduate-3
 
CLIM Program: Remote Sensing Workshop, Statistical Emulation with Dimension R...
CLIM Program: Remote Sensing Workshop, Statistical Emulation with Dimension R...CLIM Program: Remote Sensing Workshop, Statistical Emulation with Dimension R...
CLIM Program: Remote Sensing Workshop, Statistical Emulation with Dimension R...
 
ML unit-1.pptx
ML unit-1.pptxML unit-1.pptx
ML unit-1.pptx
 
Statement of stochastic programming problems
Statement of stochastic programming problemsStatement of stochastic programming problems
Statement of stochastic programming problems
 
pattern recognition
pattern recognition pattern recognition
pattern recognition
 
XGBoostLSS - An extension of XGBoost to probabilistic forecasting, Alexander ...
XGBoostLSS - An extension of XGBoost to probabilistic forecasting, Alexander ...XGBoostLSS - An extension of XGBoost to probabilistic forecasting, Alexander ...
XGBoostLSS - An extension of XGBoost to probabilistic forecasting, Alexander ...
 
Probability cheatsheet
Probability cheatsheetProbability cheatsheet
Probability cheatsheet
 
Probability Cheatsheet.pdf
Probability Cheatsheet.pdfProbability Cheatsheet.pdf
Probability Cheatsheet.pdf
 
An Importance Sampling Approach to Integrate Expert Knowledge When Learning B...
An Importance Sampling Approach to Integrate Expert Knowledge When Learning B...An Importance Sampling Approach to Integrate Expert Knowledge When Learning B...
An Importance Sampling Approach to Integrate Expert Knowledge When Learning B...
 
CLIM: Transition Workshop - Statistical Emulation with Dimension Reduction fo...
CLIM: Transition Workshop - Statistical Emulation with Dimension Reduction fo...CLIM: Transition Workshop - Statistical Emulation with Dimension Reduction fo...
CLIM: Transition Workshop - Statistical Emulation with Dimension Reduction fo...
 
BAYSM'14, Wien, Austria
BAYSM'14, Wien, AustriaBAYSM'14, Wien, Austria
BAYSM'14, Wien, Austria
 
Pert 05 aplikasi clustering
Pert 05 aplikasi clusteringPert 05 aplikasi clustering
Pert 05 aplikasi clustering
 
Probability cheatsheet
Probability cheatsheetProbability cheatsheet
Probability cheatsheet
 

Más de Feynman Liang

Detecting paraphrases using recursive autoencoders
Detecting paraphrases using recursive autoencodersDetecting paraphrases using recursive autoencoders
Detecting paraphrases using recursive autoencodersFeynman Liang
 
transplantation-isospectral-poster
transplantation-isospectral-postertransplantation-isospectral-poster
transplantation-isospectral-posterFeynman Liang
 
A Unifying Review of Gaussian Linear Models (Roweis 1999)
A Unifying Review of Gaussian Linear Models (Roweis 1999)A Unifying Review of Gaussian Linear Models (Roweis 1999)
A Unifying Review of Gaussian Linear Models (Roweis 1999)Feynman Liang
 
Recursive Autoencoders for Paraphrase Detection (Socher et al)
Recursive Autoencoders for Paraphrase Detection (Socher et al)Recursive Autoencoders for Paraphrase Detection (Socher et al)
Recursive Autoencoders for Paraphrase Detection (Socher et al)Feynman Liang
 
Engineered histone acetylation using DNA-binding domains (DBD), chemical ind...
 Engineered histone acetylation using DNA-binding domains (DBD), chemical ind... Engineered histone acetylation using DNA-binding domains (DBD), chemical ind...
Engineered histone acetylation using DNA-binding domains (DBD), chemical ind...Feynman Liang
 
A LOV2 Domain-Based Optogenetic Tool to Control Protein Degradation and Cellu...
A LOV2 Domain-Based Optogenetic Tool to Control Protein Degradation and Cellu...A LOV2 Domain-Based Optogenetic Tool to Control Protein Degradation and Cellu...
A LOV2 Domain-Based Optogenetic Tool to Control Protein Degradation and Cellu...Feynman Liang
 
Integrated Genomic and Proteomic Analyses of a Systematically Perturbed Metab...
Integrated Genomic and Proteomic Analyses of a Systematically Perturbed Metab...Integrated Genomic and Proteomic Analyses of a Systematically Perturbed Metab...
Integrated Genomic and Proteomic Analyses of a Systematically Perturbed Metab...Feynman Liang
 

Más de Feynman Liang (7)

Detecting paraphrases using recursive autoencoders
Detecting paraphrases using recursive autoencodersDetecting paraphrases using recursive autoencoders
Detecting paraphrases using recursive autoencoders
 
transplantation-isospectral-poster
transplantation-isospectral-postertransplantation-isospectral-poster
transplantation-isospectral-poster
 
A Unifying Review of Gaussian Linear Models (Roweis 1999)
A Unifying Review of Gaussian Linear Models (Roweis 1999)A Unifying Review of Gaussian Linear Models (Roweis 1999)
A Unifying Review of Gaussian Linear Models (Roweis 1999)
 
Recursive Autoencoders for Paraphrase Detection (Socher et al)
Recursive Autoencoders for Paraphrase Detection (Socher et al)Recursive Autoencoders for Paraphrase Detection (Socher et al)
Recursive Autoencoders for Paraphrase Detection (Socher et al)
 
Engineered histone acetylation using DNA-binding domains (DBD), chemical ind...
 Engineered histone acetylation using DNA-binding domains (DBD), chemical ind... Engineered histone acetylation using DNA-binding domains (DBD), chemical ind...
Engineered histone acetylation using DNA-binding domains (DBD), chemical ind...
 
A LOV2 Domain-Based Optogenetic Tool to Control Protein Degradation and Cellu...
A LOV2 Domain-Based Optogenetic Tool to Control Protein Degradation and Cellu...A LOV2 Domain-Based Optogenetic Tool to Control Protein Degradation and Cellu...
A LOV2 Domain-Based Optogenetic Tool to Control Protein Degradation and Cellu...
 
Integrated Genomic and Proteomic Analyses of a Systematically Perturbed Metab...
Integrated Genomic and Proteomic Analyses of a Systematically Perturbed Metab...Integrated Genomic and Proteomic Analyses of a Systematically Perturbed Metab...
Integrated Genomic and Proteomic Analyses of a Systematically Perturbed Metab...
 

Último

Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Delhi Call girls
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfadriantubila
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Researchmichael115558
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...amitlee9823
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFxolyaivanovalion
 
ELKO dropshipping via API with DroFx.pptx
ELKO dropshipping via API with DroFx.pptxELKO dropshipping via API with DroFx.pptx
ELKO dropshipping via API with DroFx.pptxolyaivanovalion
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...amitlee9823
 
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...amitlee9823
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% SecurePooja Nehwal
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxolyaivanovalion
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...amitlee9823
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusTimothy Spann
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionfulawalesam
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsJoseMangaJr1
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...amitlee9823
 

Último (20)

Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdfAccredited-Transport-Cooperatives-Jan-2021-Web.pdf
Accredited-Transport-Cooperatives-Jan-2021-Web.pdf
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
ELKO dropshipping via API with DroFx.pptx
ELKO dropshipping via API with DroFx.pptxELKO dropshipping via API with DroFx.pptx
ELKO dropshipping via API with DroFx.pptx
 
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
Call Girls Jalahalli Just Call 👗 7737669865 👗 Top Class Call Girl Service Ban...
 
Sampling (random) method and Non random.ppt
Sampling (random) method and Non random.pptSampling (random) method and Non random.ppt
Sampling (random) method and Non random.ppt
 
Anomaly detection and data imputation within time series
Anomaly detection and data imputation within time seriesAnomaly detection and data imputation within time series
Anomaly detection and data imputation within time series
 
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
 
Predicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science ProjectPredicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science Project
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
Ravak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptxRavak dropshipping via API with DroFx.pptx
Ravak dropshipping via API with DroFx.pptx
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 
Probability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter LessonsProbability Grade 10 Third Quarter Lessons
Probability Grade 10 Third Quarter Lessons
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 

Accelerating Metropolis Hastings with Lightweight Inference Compilation

  • 1. Accelerating Metropolis-Hastings with Lightweight Inference Compilation Feynman Liang, Nim Arora, Nazanin Tehrani, Yucen Li, Michael Tingley, Erik Meijer November 11, 2020 Facebook Probability, Facebook AI Infrastructure, UC Berkeley Mahoney Lab
  • 2. Table of contents Background Probabilistic Programming Bayesian Inference Inference compilation SIS in imperative PPLs Lightweight Inference Compilation for MCMC Results Future directions 1
  • 4. Two competing philosophies [van de Meent et al., 2018] To build machines that can reason, random variables and probabilistic calculations are: 2
  • 5. Two competing philosophies [van de Meent et al., 2018] To build machines that can reason, random variables and probabilistic calculations are: Probabilistic ML An engineering requirement [Tenenbaum et al., 2011, Ghahramani, 2015] 2
  • 6. Two competing philosophies [van de Meent et al., 2018] To build machines that can reason, random variables and probabilistic calculations are: Probabilistic ML An engineering requirement [Tenenbaum et al., 2011, Ghahramani, 2015] Deep Learning Irrelevant [LeCun et al., 2015, Goodfellow et al., 2016] 2
  • 7. Two competing philosophies [van de Meent et al., 2018] To build machines that can reason, random variables and probabilistic calculations are: Probabilistic ML An engineering requirement [Tenenbaum et al., 2011, Ghahramani, 2015] Deep Learning Irrelevant [LeCun et al., 2015, Goodfellow et al., 2016] 2
  • 8. Two competing philosophies [van de Meent et al., 2018] To build machines that can reason, random variables and probabilistic calculations are: Probabilistic ML An engineering requirement [Tenenbaum et al., 2011, Ghahramani, 2015] Deep Learning Irrelevant [LeCun et al., 2015, Goodfellow et al., 2016] 2
  • 10. Probabilistic programming languages (PPLs) Just as programming beyond the simplest algo- rithms requires tools for abstraction and com- position, complex probabilistic modeling requires new progress in model representation—proba- bilistic programming languages. [Goodman, 2013] 3
  • 11. Abstractions over deterministic computations Low Level Assembly 1 mov dx, msg 2 ; ah=9 - "print string" sub-function 3 mov ah, 9 4 int 0x21 5 6 "exit" sub-function 7 mov ah, 0x4c 8 int 0x21 9 10 msg db 'Hello!', 0x0d, 0x0a, '$' 4
  • 12. Abstractions over deterministic computations Low Level Assembly 1 mov dx, msg 2 ; ah=9 - "print string" sub-function 3 mov ah, 9 4 int 0x21 5 6 "exit" sub-function 7 mov ah, 0x4c 8 int 0x21 9 10 msg db 'Hello!', 0x0d, 0x0a, '$' High Level Python 1 print("Hello!") 4
  • 13. Abstractions over probabilistic computations Figure 1: [Koller and Friedman, 2009] 5
  • 14. Abstractions over probabilistic computations Figure 1: [Koller and Friedman, 2009] 1 d ~ Bernoulli 2 i ~ Normal 3 g ~ Categorical(fn(d, i)) 4 s ~ Normal(fn(i)) 5 l ~ Bernoulli(fn(g)) 5
  • 15. Abstractions over probabilistic computations Figure 1: [Koller and Friedman, 2009] 1 d ~ Bernoulli 2 i ~ Normal 3 g ~ Categorical(fn(d, i)) 4 s ~ Normal(fn(i)) 5 l ~ Bernoulli(fn(g)) Generative model: P(D, I, G, S, L) = P(D)P(I)P(G | D, I)P(S | I)P(G | L) 5
  • 16. Abstractions over probabilistic computations Figure 1: [Koller and Friedman, 2009] 1 d ~ Bernoulli 2 i ~ Normal 3 g ~ Categorical(fn(d, i)) 4 s ~ Normal(fn(i)) 5 l ~ Bernoulli(fn(g)) Question : Given a student’s recommendation letter and SAT score, what should I expect their intelligence to be? 5
  • 17. Abstractions over probabilistic computations Figure 1: [Koller and Friedman, 2009] 1 d ~ Bernoulli 2 i ~ Normal 3 g ~ Categorical(fn(d, i)) 4 s ~ Normal(fn(i)) 5 l ~ Bernoulli(fn(g)) Question : Given a student’s recommendation letter and SAT score, what should I expect their intelligence to be? Using a PPL : infer(i, {l=Good, s=800}) 5
  • 19. Bayesian Inference Basics Latent Variables X Observed Variables Y Prior P(X) Likelihood P(Y | X) 6
  • 20. Bayesian Inference Basics Latent Variables X Observed Variables Y Prior P(X) Likelihood P(Y | X) Goal: Approximate the posterior P(X | Y) 6
  • 21. Bayesian Inference Basics Goal: Approximate the posterior P(X | Y) Figure 2: [van de Meent et al., 2018] 6
  • 22. Bayesian Inference Basics Goal: Approximate the posterior P(X | Y) X Y intelligence letter and grade scene description image simulation simulator output program source code program return value policy prior and world simulator rewards cognitive decision making process observed behavior Table 1: [van de Meent et al., 2018] 6
  • 23. Why only approximate? P(X | Y) = P(Y | X)P(X) P(Y) = P(Y | X)P(X) X P(Y | X)P(X)dX 7
  • 24. Why only approximate? P(X | Y) = P(Y | X)P(X) P(Y) = P(Y | X)P(X) X P(Y | X)P(X)dX Marginal likelihood P(Y) (i.e. partition function) high-dimensional integral Tractable only for small family of conjugate prior/likelihood pairs 7
  • 25. How to approximate? Variational Inference Monte Carlo 8
  • 26. How to approximate? Variational Inference Let qφ be a tractable parametric family (e.g. Gaussian mean-field qφ(X) = d i=1 N(Xi | φ1,i, φ2,i)) Monte Carlo 8
  • 27. How to approximate? Variational Inference Let qφ be a tractable parametric family (e.g. Gaussian mean-field qφ(X) = d i=1 N(Xi | φ1,i, φ2,i)) arg min φ KL(qφ(X) | P(X | Y)) = arg max φ Eqφ log qφ(X) log P(X | Y) Monte Carlo 8
  • 28. How to approximate? Variational Inference Let qφ be a tractable parametric family (e.g. Gaussian mean-field qφ(X) = d i=1 N(Xi | φ1,i, φ2,i)) arg min φ KL(qφ(X) | P(X | Y)) = arg max φ Eqφ log qφ(X) log P(X | Y) = arg max φ Eqφ log qφ(X) log P(X, Y) Monte Carlo 8
  • 29. How to approximate? Variational Inference Let qφ be a tractable parametric family (e.g. Gaussian mean-field qφ(X) = d i=1 N(Xi | φ1,i, φ2,i)) arg min φ KL(qφ(X) | P(X | Y)) = arg max φ Eqφ log qφ(X) log P(X | Y) Monte Carlo Sample Xi iid ∼ P(X | Y). Then 8
  • 30. How to approximate? Variational Inference Let qφ be a tractable parametric family (e.g. Gaussian mean-field qφ(X) = d i=1 N(Xi | φ1,i, φ2,i)) arg min φ KL(qφ(X) | P(X | Y)) = arg max φ Eqφ log qφ(X) log P(X | Y) Monte Carlo Sample Xi iid ∼ P(X | Y). Then E[g(X)|Y] = g(X) · P(X | Y)dX ≈ 1 N N n=1 g(Xi) 8
  • 31. How to approximate? Variational Inference Let qφ be a tractable parametric family (e.g. Gaussian mean-field qφ(X) = d i=1 N(Xi | φ1,i, φ2,i)) arg min φ KL(qφ(X) | P(X | Y)) = arg max φ Eqφ log qφ(X) log P(X | Y) Monte Carlo Sample Xi iid ∼ P(X | Y). Then E[g(X)|Y] = g(X) · P(X | Y)dX ≈ 1 N N n=1 g(Xi) 8
  • 32. How to approximate? Variational Inference Let qφ be a tractable parametric family (e.g. Gaussian mean-field qφ(X) = d i=1 N(Xi | φ1,i, φ2,i)) arg min φ KL(qφ(X) | P(X | Y)) = arg max φ Eqφ log qφ(X) log P(X | Y) Monte Carlo Sample Xi iid ∼ P(X | Y). Then E[g(X)|Y] = g(X) · P(X | Y)dX ≈ 1 N N n=1 g(Xi) 8
  • 34. Imperative vs Declarative PPLs Imperative: Evaluation-based, samples (linear) execution traces (Pyro, Church, WebPPL) 9
  • 35. Imperative vs Declarative PPLs Imperative: Evaluation-based, samples (linear) execution traces (Pyro, Church, WebPPL) 1 (begin 2 (define geometric 3 (lambda (p) 4 (if (flip p) 5 1 6 (+ 1 (geometric p))))) 7 (geometric .3)) 9
  • 36. Imperative vs Declarative PPLs Imperative: Evaluation-based, samples (linear) execution traces (Pyro, Church, WebPPL) 1 (begin 2 (define geometric 3 (lambda (p) 4 (if (flip p) 5 1 6 (+ 1 (geometric p))))) 7 (geometric .3)) Figure 2: [Wingate et al., 2011] 9
  • 37. Inference compilation SIS in imperative PPLs
  • 38. Sequential importance sampling (SIS) EP(X|Y)[g(X)] 10
  • 39. Sequential importance sampling (SIS) EP(X|Y)[g(X)] = EP(X|Y) q(X) q(X) g(X) = Eq P(X,Y) q(X) g(X) P(Y) 10
  • 40. Sequential importance sampling (SIS) EP(X|Y)[g(X)] = EP(X|Y) q(X) q(X) g(X) = Eq P(X,Y) q(X) g(X) P(Y) ≈ 1 N N i P(Xi,Y) q(Xi) P(Y) g(Xi) ≈ N i P(Xi,Y) q(Xi) N i P(Xi,Y) q(Xi) g(Xi) where Xi ∼ q from proposal distribution q 10
  • 41. Sequential importance sampling (SIS) EP(X|Y)[g(X)] = EP(X|Y) q(X) q(X) g(X) = Eq P(X,Y) q(X) g(X) P(Y) ≈ 1 N N i P(Xi,Y) q(Xi) P(Y) g(Xi) ≈ N i P(Xi,Y) q(Xi) N i P(Xi,Y) q(Xi) g(Xi) where Xi ∼ q from proposal distribution q Rate Varq P(X,Y) q(X) g(X) −1/2 [Yuan and Druzdzel, 2007] 10
  • 42. SIS of execution traces 1. Execute the probabilistic program forwards 11
  • 43. SIS of execution traces 1. Execute the probabilistic program forwards 2. At each latent variable Xk (i.e. sample statement), sample q(Xk), assign value, multiply node importance weight into trace 11
  • 44. SIS of execution traces 1. Execute the probabilistic program forwards 2. At each latent variable Xk (i.e. sample statement), sample q(Xk), assign value, multiply node importance weight into trace 3. At each observed random variable (i.e. observe statement), multiply likelihood P(Yk | X1:k, Y1:k) into trace 11
  • 45. SIS of execution traces 1. Execute the probabilistic program forwards 2. At each latent variable Xk (i.e. sample statement), sample q(Xk), assign value, multiply node importance weight into trace 3. At each observed random variable (i.e. observe statement), multiply likelihood P(Yk | X1:k, Y1:k) into trace 11
  • 46. SIS of execution traces 1. Execute the probabilistic program forwards 2. At each latent variable Xk (i.e. sample statement), sample q(Xk), assign value, multiply node importance weight into trace 3. At each observed random variable (i.e. observe statement), multiply likelihood P(Yk | X1:k, Y1:k) into trace Problem: Myopic choices from sampling q(X) early in the trace may result in low importance weights (poor explanation of Y) later. 11
  • 47. Constructing a proposal distribution How to choose q? 12
  • 48. Constructing a proposal distribution How to choose q? Likelihood-Weighting [Norvig and Intelligence, 2002]: q(X) = P(X) 12
  • 49. Constructing a proposal distribution How to choose q? Likelihood-Weighting [Norvig and Intelligence, 2002]: q(X) = P(X) Direct sampling: q(X) = P(X | Y), optimal 12
  • 50. Constructing a proposal distribution How to choose q? Likelihood-Weighting [Norvig and Intelligence, 2002]: q(X) = P(X) Direct sampling: q(X) = P(X | Y), optimal Key Idea: Account for Y when constructing q. Exploit access to P(X, Y) to build a proposers q “close” to P(X | Y)? 12
  • 51. Intuition for inference compilation −5.0 −2.5 0.0 2.5 5.0 7.5 10.0 12.5 15.0 x −4 −2 0 2 4 y Generative model p(x,y) −5.0 −2.5 0.0 2.5 5.0 7.5 10.0 12.5 15.0 x 0.00 0.02 0.04 0.06 0.08 0.10 PDF Posterior density given y=0.25 p(x|y = 0.25) x ∼ p(x | y = 0.25) q(x; φ(y = 0.25), K = 1) q(x; φ(y = 0.25), K = 2) 13
  • 52. Trace-based inference compilation (IC) • Construct DNN with parameters φ mapping observations Y (amortized inference, [Goodman, 2013]) and execution prefix to proposal distribution qφ(· | Y) • Train qφ against forward samples from the probabilistic program p(x, y) (inference compilation) Figure 3: [Le et al., 2017] 14
  • 53. Trace-based inference compilation (IC) Figure 4: [Le et al., 2017] 15
  • 54. Sensitivity to nuisance random variables 1 def magnitude(obs, M): 2 x = sample(Normal (0, 10)) 3 [sample(Normal (0 ,10)) for _ in range(M)] # extend trace with nuisance 4 y = sample(Normal (0 ,10)) 5 observe(obs**2, Likelihood=Normal(x**2 + y**2, 0.1)0 6 return x, y 7 Figure 5: [Harvey et al., 2019] 16
  • 56. Imperative vs Declarative PPLs Declarative: Graph-based, samples instantiated graphical models (i.e. worlds) (BUGS, BLOG, Stan, beanmachine) Figure 6: [Blei et al., 2003] Key Idea: Markov blanket available in declarative PPL 17
  • 62. MCMC sampling of graphical models Metropolis-within-Gibbs / Lightweight MH [Wingate et al., 2011]: • Initialize minimal self-supporting world consistent with Y • Repeat: • Pick single random unobserved node Xi • Sample proposal q(Xi) to propose new value • Accept with probability α and revert otherwise Theorem ([Hastings, 1970]) With appropriately chosen α, the above algorithm yields a Markov Chain with the posterior as the invariant distribution. 18
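A compressed sketch of this loop (the world and proposer interfaces below are hypothetical, assuming the world exposes per-node Markov blankets and blanket scores; this is a sketch of the algorithm, not Bean Machine's implementation):

    import math
    import random

    def lightweight_mh(world, proposers, num_samples):
        samples = []
        latents = world.unobserved_nodes()
        for _ in range(num_samples):
            node = random.choice(latents)                    # pick a single unobserved node X_i
            q = proposers[node](world.markov_blanket(node))  # proposal built from MB(X_i)
            old_value = world.value(node)
            new_value = q.sample()
            # Only X_i and its Markov blanket contribute to the ratio; all other factors cancel.
            log_alpha = (world.log_prob_blanket(node, new_value)
                         - world.log_prob_blanket(node, old_value)
                         + q.log_prob(old_value) - q.log_prob(new_value))
            if random.random() < math.exp(min(0.0, log_alpha)):
                world.set_value(node, new_value)             # accept
            # otherwise keep the old value (revert)
            samples.append(world.snapshot())
        return samples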
  • 69. MH proposal distributions Different q(·) =⇒ different MCMC algorithms • Random walk MH q(·) isotropic Gaussian • Newtonian Monte Carlo q(·) Gaussian with empirical Fisher information precision • Hamiltonian Monte Carlo q(·) integrates iso-Hamiltonian system • Lightweight Inference Compilation q(· | MB(Xi)) a neural network function of the Markov blanket Theorem ([Pearl, 1987]) Gibbs proposals P(Xi | X−i) = P(Xi | MB(Xi)) have acceptance probability 1 ∴ MB(Xi) is the minimal sufficient input for constructing a proposal distribution 19
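For reference (standard Metropolis–Hastings algebra, not from the slides), the single-site acceptance probability and its collapse to 1 under the full-conditional proposal:

    \alpha = \min\left(1,\;
      \frac{P(X_i', X_{-i}, Y)\, q(X_i \mid \mathrm{MB}(X_i))}
           {P(X_i,  X_{-i}, Y)\, q(X_i' \mid \mathrm{MB}(X_i))}\right),
    \qquad
    q(\cdot \mid \mathrm{MB}(X_i)) = P(\cdot \mid X_{-i}, Y) \;\Rightarrow\; \alpha = 1,

since P(Xi', X−i, Y) = P(Xi' | X−i, Y) P(X−i, Y) and the conditional factors in the numerator and denominator cancel exactly.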
  • 70. LIC artifacts for student network q(d|g, i) q(i|g, d, s) q(g|i, d, l) q(s|i) q(l|g) 20
  • 71. LIC proposer for grade 21
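As an illustration of what such an artifact might look like (my sketch with hypothetical layer sizes, not the network from the talk): a small feed-forward network that maps the grade node's Markov blanket (d, i, l) to the logits of a categorical proposal q(g | d, i, l).

    import torch
    import torch.nn as nn

    class GradeProposer(nn.Module):
        """q(g | d, i, l): Markov blanket of the grade node -> categorical proposal."""
        def __init__(self, num_grades=3, hidden=16):
            super().__init__()
            self.net = nn.Sequential(
                nn.Linear(3, hidden),  # inputs: difficulty d, intelligence i, letter l
                nn.ReLU(),
                nn.Linear(hidden, num_grades),
            )

        def forward(self, d, i, l):
            logits = self.net(torch.stack([d, i, l], dim=-1))
            return torch.distributions.Categorical(logits=logits)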
  • 72. Compare against SIS IC’s (non-minimal) proposers: q(d|observations) q(i|d, observations) q(g|i, d, observations) q(s|g, i, d, observations) q(l|s, g, i, d, observations) 22
  • 74. Recovering conjugate expressions in normal-normal x ∼ N(0, 2), y | x ∼ N(x, 0.1) Closed form: x | y ∼ N(0.999y, 0.0001) Figure: mean of the LIC proposer vs. the closed-form posterior mean as a function of the observed value y. 23
  • 75. GMM Mode Escape Figure: posterior density p(x | y = 0.25) of the generative model p(x, y) (ground truth) compared against samples from Adaptive HMC (Hoffman 2014), Adaptive RWMH (Garthwaite 2016), Inference Compilation (this paper), NMC (Arora 2020), and NUTS (Stan defaults). 24
  • 76. Robustness to nuisance random variables

    def magnitude(obs):
        x = sample(Normal(0, 10))
        for _ in range(100):
            nuisance = sample(Normal(0, 10))
        y = sample(Normal(0, 10))
        observe(
            obs**2,
            likelihood=Normal(x**2 + y**2, 0.1))
        return x

    class NuisanceModel:
        @random_variable
        def x(self):
            return dist.Normal(0, 10)
        @random_variable
        def nuisance(self, i):
            return dist.Normal(0, 10)
        @random_variable
        def y(self):
            return dist.Normal(0, 10)
        @random_variable
        def noisy_sq_length(self):
            return dist.Normal(
                self.x()**2 + self.y()**2,
                0.1)

                           # params    compile time     ESS
    LIC (this paper)         3,358         44 sec.    49.75
    [Le et al., 2017]       21,952        472 sec.    10.99

  25
  • 77. Bayesian Logistic Regression β ∼ N_{d+1}(0_{d+1}, diag(10, 2.5 · 1_d)), y_i | x_i iid∼ Bernoulli(σ(β⊤x_i)) where σ(t) = (1 + e^{−t})^{−1} Figure: compile time, inference time, PLL, ESS, and Rhat for LIC, NMC, RWMH, and NUTS. 26
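The same model in the declarative @random_variable style used in the NuisanceModel slide (a sketch only: the class name, the torch distributions, and the scale-vs-variance convention of the prior are my assumptions):

    import torch
    import torch.distributions as dist

    class LogisticRegressionModel:
        def __init__(self, X):
            self.X = X               # (n, d) design matrix; intercept column added below
            self.d = X.shape[1]

        @random_variable
        def beta(self):
            # beta ~ N_{d+1}(0, diag(10, 2.5 * 1_d)); scales follow the slide's notation
            scale = torch.cat([torch.tensor([10.0]), 2.5 * torch.ones(self.d)])
            return dist.Independent(dist.Normal(torch.zeros(self.d + 1), scale), 1)

        @random_variable
        def y(self, i):
            xi = torch.cat([torch.ones(1), self.X[i]])
            # Referencing beta() yields its sampled value, as in the NuisanceModel example.
            return dist.Bernoulli(logits=self.beta() @ xi)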
  • 78. n-Schools β0 ∼ StudentT(3, 0, 10), τi ∼ HalfCauchy(σi) for i ∈ {district, state, type}, βi,j ∼ N(0, τi) for i ∈ {district, state, type}, j ∈ [ni], yk ∼ N(β0 + Σi βi,jk, σk) 27
  • 79. n-Schools Figure: compile time (sec), inference time (sec), PLL, ESS, and Rhat for LIC, NMC, RWMH, and NUTS. 27
  • 81. Adaptive LIC Problem: forward samples from P(X, Y) may not be representative of the Y observed at inference time. RWMH [Garthwaite et al., 2016] and HMC [Hoffman and Gelman, 2014] both have adaptive variants. Idea: Perform MH with LIC to draw posterior samples x^(m) ∼ P(x | y = obs), then hill-climb the LIC artifacts on the inclusive KL to the conditional (rather than joint) distribution: arg min_φ D_KL(p(x | y = obs) ‖ q(x | y = obs; φ)) ≈ arg min_φ −Σ_{m=1}^{N} log q(x^(m) | y = obs; φ) 28
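One way this alternation could look, reusing the hypothetical interfaces from the earlier lightweight_mh sketch (snapshots are assumed to expose their latent values and Markov blankets; the optimizer covers the proposer networks' parameters). This is a sketch of the idea, not a tested implementation.

    def adapt_lic(world, proposers, optimizer, num_rounds=100, mh_steps=50):
        for _ in range(num_rounds):
            # 1. Draw approximate posterior samples with the current proposers.
            samples = lightweight_mh(world, proposers, num_samples=mh_steps)
            # 2. Hill-climb the proposer parameters on the inclusive-KL estimate:
            #    minimize -sum_m log q(x^(m) | MB; phi) over the collected samples.
            loss = 0.0
            for snapshot in samples:
                for node, value in snapshot.latents.items():
                    q = proposers[node](snapshot.markov_blanket(node))
                    loss = loss - q.log_prob(value)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()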
  • 82. IAF density estimators Problem: the GMM proposal densities in LIC may provide poor approximations. Idea: Parameterize IAFs [Kingma et al., 2016] with LIC outputs. Figure 7: Neal’s funnel (left) and its density approximation by a 7-component isotropic GMM (middle) and a 7-layer IAF (right) 29
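For reference, a single IAF layer transforms a base sample z into x autoregressively, with a triangular Jacobian whose log-determinant is cheap to evaluate (standard form from [Kingma et al., 2016]):

    x_i = \mu_i(z_{1:i-1}) + \sigma_i(z_{1:i-1})\, z_i,
    \qquad
    \log \left| \det \frac{\partial x}{\partial z} \right| = \sum_i \log \sigma_i(z_{1:i-1}).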
  • 83. Heavy-tailed density estimators Problem: GMMs and standard IAFs (Lipschitz functions of Gaussians) remain sub-Gaussian, while n-Schools is heavy-tailed. Idea: IAFs with a heavy-tailed base distribution. Figure 8: IAF density estimates of a Cauchy(−2, 1) (left) and the corresponding K-S statistics when using a Normal (top right) and a StudentT (bottom right) base distribution 30
  • 85. References i Blei, D. M., Ng, A. Y., and Jordan, M. I. (2003). Latent Dirichlet allocation. Journal of Machine Learning Research, 3(Jan):993–1022. Garthwaite, P. H., Fan, Y., and Sisson, S. A. (2016). Adaptive optimal scaling of Metropolis–Hastings algorithms using the Robbins–Monro process. Communications in Statistics - Theory and Methods, 45(17):5098–5111. 32
  • 86. References ii Ghahramani, Z. (2015). Probabilistic machine learning and artificial intelligence. Nature, 521(7553):452–459. Goodfellow, I., Bengio, Y., and Courville, A. (2016). Deep Learning. MIT Press, Cambridge, MA. 33
  • 87. References iii Goodman, N. D. (2013). The principles and practice of probabilistic programming. ACM SIGPLAN Notices, 48(1):399–402. Harvey, W., Munk, A., Baydin, A. G., Bergholm, A., and Wood, F. (2019). Attention for inference compilation. arXiv preprint arXiv:1910.11961. Hastings, W. K. (1970). Monte Carlo sampling methods using Markov chains and their applications. Biometrika, 57(1):97–109. 34
  • 88. References iv Hoffman, M. D. and Gelman, A. (2014). The No-U-Turn sampler: Adaptively setting path lengths in Hamiltonian Monte Carlo. Journal of Machine Learning Research, 15(1):1593–1623. Kingma, D. P., Salimans, T., Jozefowicz, R., Chen, X., Sutskever, I., and Welling, M. (2016). Improved variational inference with inverse autoregressive flow. In Advances in Neural Information Processing Systems, pages 4743–4751. 35
  • 89. References v Koller, D. and Friedman, N. (2009). Probabilistic Graphical Models: Principles and Techniques. MIT Press. Le, T. A., Baydin, A. G., and Wood, F. (2017). Inference compilation and universal probabilistic programming. In Artificial Intelligence and Statistics, pages 1338–1348. 36
  • 90. References vi LeCun, Y., Bengio, Y., and Hinton, G. (2015). Deep learning. Nature, 521(7553):436–444. Russell, S. J. and Norvig, P. (2002). Artificial Intelligence: A Modern Approach. Prentice Hall. Pearl, J. (1987). Evidential reasoning using stochastic simulation of causal models. Artificial Intelligence, 32(2):245–257. 37
  • 91. References vii Tenenbaum, J. B., Kemp, C., Griffiths, T. L., and Goodman, N. D. (2011). How to grow a mind: Statistics, structure, and abstraction. Science, 331(6022):1279–1285. van de Meent, J.-W., Paige, B., Yang, H., and Wood, F. (2018). An introduction to probabilistic programming. arXiv preprint arXiv:1809.10756. 38
  • 92. References viii Wingate, D., Stuhlmüller, A., and Goodman, N. (2011). Lightweight implementations of probabilistic programming languages via transformational compilation. In Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics, pages 770–778. 39
  • 93. References ix Yuan, C. and Druzdzel, M. J. (2007). Theoretical analysis and practical insights on importance sampling in Bayesian networks. International Journal of Approximate Reasoning, 46(2):320–333. 40