Estimation of the score vector and observed
information matrix in intractable models
Arnaud Doucet (University of Oxford)
Pierre E. Jacob (University of Oxford)
Sylvain Rubenthaler (Université Nice Sophia Antipolis)
November 28th, 2014
Pierre Jacob Derivative estimation 1/ 39
Outline
1 Context
2 General results and connections
3 Posterior concentration when the prior concentrates
4 Hidden Markov models
Pierre Jacob Derivative estimation 1/ 39
Outline
1 Context
2 General results and connections
3 Posterior concentration when the prior concentrates
4 Hidden Markov models
Pierre Jacob Derivative estimation 2/ 39
Motivation
Derivatives of the likelihood help with optimization and sampling.
For many models they are not available.
One can resort to approximation techniques.
Pierre Jacob Derivative estimation 2/ 39
Using derivatives in sampling algorithms
Metropolis-Adjusted Langevin Algorithm (MALA)
At step t, given a point θt, do:
propose θ⋆ ∼ q(dθ | θt) ≡ N( θt + (σ²/2) ∇θ log π(θt), σ² ),
then, with probability
1 ∧ [ π(θ⋆) q(θt | θ⋆) ] / [ π(θt) q(θ⋆ | θt) ],
set θt+1 = θ⋆; otherwise set θt+1 = θt.
Pierre Jacob Derivative estimation 3/ 39
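For illustration (not part of the original slides), here is a minimal Python/numpy sketch of one such MALA transition on a toy one-dimensional target; the standard-normal target, the step size σ² = 0.5 and the seed are arbitrary choices.

import numpy as np

rng = np.random.default_rng(1)

def log_pi(theta):          # toy target: standard normal log-density, up to a constant
    return -0.5 * theta**2

def grad_log_pi(theta):
    return -theta

def mala_step(theta_t, sigma2):
    # Langevin proposal: N(theta_t + (sigma2/2) * grad log pi(theta_t), sigma2)
    mean_fwd = theta_t + 0.5 * sigma2 * grad_log_pi(theta_t)
    theta_star = mean_fwd + np.sqrt(sigma2) * rng.standard_normal()
    mean_bwd = theta_star + 0.5 * sigma2 * grad_log_pi(theta_star)
    # log acceptance ratio: log pi(theta*) + log q(theta_t | theta*) - log pi(theta_t) - log q(theta* | theta_t)
    log_q_fwd = -0.5 * (theta_star - mean_fwd)**2 / sigma2
    log_q_bwd = -0.5 * (theta_t - mean_bwd)**2 / sigma2
    log_alpha = log_pi(theta_star) + log_q_bwd - log_pi(theta_t) - log_q_fwd
    return theta_star if np.log(rng.uniform()) < log_alpha else theta_t

theta, chain = 2.0, []
for _ in range(1000):
    theta = mala_step(theta, sigma2=0.5)
    chain.append(theta)
print(np.mean(chain), np.var(chain))    # roughly 0 and 1 for this toy target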
Using derivatives in sampling algorithms
Figure : Proposal mechanism for random walk Metropolis–Hastings.
Pierre Jacob Derivative estimation 4/ 39
Using derivatives in sampling algorithms
Figure : Proposal mechanism for MALA.
Pierre Jacob Derivative estimation 5/ 39
Using derivatives in sampling algorithms
In what sense is MALA better than MH?
Scaling with the dimension of the state space
For Metropolis–Hastings, optimal scaling leads to σ² = O(d^(−1));
for MALA, optimal scaling leads to σ² = O(d^(−1/3)).
Roberts & Rosenthal, Optimal Scaling for Various Metropolis-Hastings Algorithms, 2001.
Pierre Jacob Derivative estimation 6/ 39
Hidden Markov models
Figure : Graph representation of a general hidden Markov model (hidden chain X0, X1, . . . , XT , observations y1, . . . , yT , parameter θ).
Hidden process: initial distribution µθ, transition fθ.
Observations conditional upon the hidden process, from gθ.
Pierre Jacob Derivative estimation 7/ 39
Assumptions
Input:
Parameter θ : unknown, prior distribution p.
Initial condition µθ(dx0) : can be sampled from.
Transition fθ(dxt|xt−1) : can be sampled from.
Measurement gθ(yt|xt) : can be evaluated point-wise.
Observations y1:T = (y1, . . . , yT ).
Goals:
score: ∇θ log L(θ; y1:T ) for any θ,
observed information matrix: −∇²θ log L(θ; y1:T ) for any θ.
Note: throughout the talk, the observations, and thus the
likelihood, are fixed.
Pierre Jacob Derivative estimation 8/ 39
Why is it an intractable model?
The likelihood function does not admit a closed form expression:
L(θ; y1, . . . , yT ) = ∫_{X^{T+1}} p(y1, . . . , yT | x0, . . . , xT , θ) p(dx0, . . . , dxT | θ)
= ∫_{X^{T+1}} ∏_{t=1}^{T} gθ(yt | xt) µθ(dx0) ∏_{t=1}^{T} fθ(dxt | xt−1).
Hence the likelihood can only be estimated, e.g. by standard
Monte Carlo, or by particle filters.
What about the derivatives of the likelihood?
Pierre Jacob Derivative estimation 9/ 39
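As an illustration of the last remark (not from the slides), a minimal bootstrap particle filter that returns an estimate of log L(θ; y1:T ) for a toy linear-Gaussian hidden Markov model; the parameter values mimic the numerical example later in the talk, and the number of particles and the seed are arbitrary.

import numpy as np

rng = np.random.default_rng(2)

# Toy linear-Gaussian HMM: X0 ~ N(0,1), X_t = rho X_{t-1} + N(0,V), Y_t = eta X_t + N(0,W)
rho, V, eta, W, T = 0.9, 0.7, 0.9, 0.2, 100
x, y = rng.normal(0.0, 1.0), np.empty(T)
for t in range(T):
    x = rho * x + rng.normal(0.0, np.sqrt(V))
    y[t] = eta * x + rng.normal(0.0, np.sqrt(W))

def pf_loglik(theta, y, N=1000):
    """Bootstrap particle filter estimate of log L(theta; y_{1:T})."""
    rho, V, eta, W = theta
    xs = rng.normal(0.0, 1.0, size=N)                 # sample from mu_theta
    ll = 0.0
    for yt in y:
        xs = rho * xs + rng.normal(0.0, np.sqrt(V), size=N)                 # sample from f_theta
        logw = -0.5 * np.log(2 * np.pi * W) - 0.5 * (yt - eta * xs)**2 / W  # evaluate g_theta
        ll += logw.max() + np.log(np.mean(np.exp(logw - logw.max())))       # log of estimated p(y_t | y_{1:t-1})
        w = np.exp(logw - logw.max()); w /= w.sum()
        xs = xs[rng.choice(N, size=N, p=w)]           # multinomial resampling
    return ll

print(pf_loglik((rho, V, eta, W), y))                 # noisy estimate of the log-likelihood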
Fisher and Louis’ identities
Write the score as:
∇ℓ(θ) = ∫ ∇ log p(x0:T , y1:T | θ) p(dx0:T | y1:T , θ),
which is an integral, with respect to the smoothing distribution p(dx0:T | y1:T , θ), of
∇ log p(x0:T , y1:T | θ) = ∇ log µθ(x0) + ∑_{t=1}^{T} ∇ log fθ(xt | xt−1) + ∑_{t=1}^{T} ∇ log gθ(yt | xt).
However pointwise evaluations of ∇ log µθ(x0) and
∇ log fθ(xt | xt−1) are not always available.
Pierre Jacob Derivative estimation 10/ 39
New kid on the block: Iterated Filtering
Perturbed model
Hidden states ˜Xt = (˜θt, Xt), with
˜θ0 ∼ N(θ0, τ²Σ), X0 ∼ µ_{˜θ0}(·), and ˜θt ∼ N(˜θt−1, σ²Σ), Xt ∼ f_{˜θt}(· | Xt−1 = xt−1).
Observations ˜Yt ∼ g_{˜θt}(· | Xt).
Score estimate
Consider VP,t = Cov[˜θt | y1:t−1] and ˜θF,t = E[˜θt | y1:t]. Then
∑_{t=1}^{T} V_{P,t}⁻¹ (˜θF,t − ˜θF,t−1) ≈ ∇ℓ(θ0)
when τ → 0 and σ/τ → 0. Ionides, Breto, King, PNAS, 2006.
Pierre Jacob Derivative estimation 11/ 39
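A rough sketch (not from the slides) of this score estimate: a bootstrap particle filter is run on the perturbed model, with the autoregressive coefficient of a toy linear-Gaussian HMM playing the role of θ; the model, the values of τ, σ and N, and the seed are illustrative assumptions, not the authors' implementation.

import numpy as np

rng = np.random.default_rng(3)

# Toy HMM with scalar parameter theta: X_t = theta X_{t-1} + N(0,V), Y_t = X_t + N(0,W)
V, W, T, theta_true = 0.5, 0.5, 200, 0.7
x, y = 0.0, np.empty(T)
for t in range(T):
    x = theta_true * x + rng.normal(0, np.sqrt(V))
    y[t] = x + rng.normal(0, np.sqrt(W))

def if1_score(theta0, y, tau=0.05, sigma=0.005, N=2000):
    """Sketch of the iterated filtering score estimate above (illustrative implementation)."""
    thetas = theta0 + tau * rng.standard_normal(N)      # theta_tilde_0 ~ N(theta0, tau^2)
    xs = rng.normal(0, 1, size=N)                       # X_0 ~ N(0, 1)
    score, theta_F_prev = 0.0, theta0
    for yt in y:
        thetas = thetas + sigma * rng.standard_normal(N)         # parameter random walk
        xs = thetas * xs + rng.normal(0, np.sqrt(V), size=N)     # propagate the hidden state
        VP = np.var(thetas)                                      # approximates Cov[theta_tilde_t | y_{1:t-1}]
        logw = -0.5 * (yt - xs)**2 / W
        w = np.exp(logw - logw.max()); w /= w.sum()
        theta_F = np.sum(w * thetas)                             # approximates E[theta_tilde_t | y_{1:t}]
        score += (theta_F - theta_F_prev) / VP
        theta_F_prev = theta_F
        idx = rng.choice(N, size=N, p=w)                         # multinomial resampling
        thetas, xs = thetas[idx], xs[idx]
    return score

print(if1_score(0.7, y))     # noisy approximation of the score at theta0 = 0.7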
Iterated Filtering: the mystery
Why is it valid?
Is it related to known techniques?
Can it be extended to estimate the second derivatives (i.e.
the Hessian, i.e. the observed information matrix)?
How does it compare to other methods such as finite
difference?
Pierre Jacob Derivative estimation 12/ 39
Outline
1 Context
2 General results and connections
3 Posterior concentration when the prior concentrates
4 Hidden Markov models
Pierre Jacob Derivative estimation 13/ 39
Iterated Filtering
Given a log-likelihood ℓ and a point θ0, consider a prior
θ ∼ N(θ0, σ²).
Posterior expectation when the prior variance goes to zero
First-order moments give first-order derivatives:
|σ⁻² (E[θ|Y ] − θ0) − ∇ℓ(θ0)| ≤ Cσ².
Phrased simply,
(posterior mean − prior mean) / prior variance ≈ score.
Result from Ionides, Bhadra, Atchadé, King, Iterated filtering, 2011.
Pierre Jacob Derivative estimation 13/ 39
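A quick numerical illustration of this result (not in the slides), using a toy model where both the score and the posterior are available in closed form: a single observation y ∼ N(θ, s²) with prior N(θ0, σ²); all numerical values are arbitrary.

# Toy model: one observation y ~ N(theta, s2), so l(theta) = log N(y; theta, s2)
# and the score at theta0 is (y - theta0) / s2.
y, s2, theta0 = 1.3, 2.0, 0.2
score_exact = (y - theta0) / s2

for sigma2 in [1e-1, 1e-2, 1e-3]:
    # Gaussian prior and Gaussian likelihood: the posterior mean is available in closed form
    post_mean = theta0 + sigma2 / (sigma2 + s2) * (y - theta0)
    approx = (post_mean - theta0) / sigma2     # (posterior mean - prior mean) / prior variance
    print(sigma2, approx, score_exact)         # the error shrinks like O(sigma2)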
Extension of Iterated Filtering
Posterior variance when the prior variance goes to zero
Second-order moments give second-order derivatives:
|σ⁻⁴ (Cov[θ|Y ] − σ²) − ∇²ℓ(θ0)| ≤ Cσ².
Phrased simply,
(posterior variance − prior variance) / prior variance² ≈ hessian.
Result from Doucet, Jacob, Rubenthaler on arXiv, 2013.
Pierre Jacob Derivative estimation 14/ 39
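The same conjugate toy model (again not in the slides) illustrates the second-order statement: the rescaled difference between posterior and prior variances approaches the second derivative of ℓ at θ0.

# Same toy model: y ~ N(theta, s2) with prior N(theta0, sigma2).
y, s2, theta0 = 1.3, 2.0, 0.2
hessian_exact = -1.0 / s2                       # second derivative of l at theta0

for sigma2 in [1e-1, 1e-2, 1e-3]:
    post_var = sigma2 * s2 / (sigma2 + s2)      # posterior variance in closed form
    approx = (post_var - sigma2) / sigma2**2    # (posterior variance - prior variance) / prior variance^2
    print(sigma2, approx, hessian_exact)        # the error shrinks like O(sigma2)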
Proximity mapping
Given a real function f and a point θ0, consider, for any σ² > 0,
θ → f (θ) exp{ −(θ − θ0)² / (2σ²) }.
Figure : Example for f : θ → exp(−|θ|) and three values of σ².
Pierre Jacob Derivative estimation 15/ 39
Proximity mapping
Proximity mapping
The σ²-proximity mapping is defined by
proxf : θ0 → argmax_{θ∈R} f (θ) exp{ −(θ − θ0)² / (2σ²) }.
Moreau approximation
The σ²-Moreau approximation is defined by
fσ² : θ0 → C sup_{θ∈R} f (θ) exp{ −(θ − θ0)² / (2σ²) },
where C is a normalizing constant.
Pierre Jacob Derivative estimation 16/ 39
Proximity mapping
Figure : θ → f (θ) and θ → fσ²(θ) for three values of σ².
Pierre Jacob Derivative estimation 16/ 39
Proximity mapping
Property
Those objects are such that
(proxf (θ0) − θ0) / σ² = ∇ log fσ²(θ0) → ∇ log f (θ0) as σ² → 0.
Moreau (1962), Fonctions convexes duales et points proximaux
dans un espace Hilbertien.
Pereyra (2013), Proximal Markov chain Monte Carlo
algorithms.
Pierre Jacob Derivative estimation 17/ 39
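A small numerical check of this property (not in the slides), using the earlier example f : θ → exp(−|θ|) and a brute-force grid search in place of an exact proximity mapping; the grid, its resolution and θ0 are arbitrary choices.

import numpy as np

# Check that (prox_f(theta0) - theta0) / sigma2 approaches d/dtheta log f(theta0) = -sign(theta0).
f = lambda th: np.exp(-np.abs(th))
theta0 = 0.8
grid = np.linspace(-5, 5, 2_000_001)            # brute-force argmax on a fine grid

for sigma2 in [0.5, 0.1, 0.01]:
    objective = f(grid) * np.exp(-0.5 * (grid - theta0)**2 / sigma2)
    prox = grid[np.argmax(objective)]
    print(sigma2, (prox - theta0) / sigma2)     # approaches -1 = -sign(theta0)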
Proximity mapping
Bayesian interpretation
If f is seen as a likelihood function, then
θ → f (θ) exp{ −(θ − θ0)² / (2σ²) }
is an unnormalized posterior density function based on a Normal prior with mean θ0 and variance σ².
Hence
(proxf (θ0) − θ0) / σ² → ∇ log f (θ0) as σ² → 0
can be read as
(posterior mode − prior mode) / prior variance ≈ score.
Pierre Jacob Derivative estimation 18/ 39
Stein’s lemma
Stein’s lemma states that
θ ∼ N(θ0, σ²)
if and only if, for any function g such that E[|∇g(θ)|] < ∞,
E[(θ − θ0) g(θ)] = σ² E[∇g(θ)].
If we choose the function g : θ → exp ℓ(θ)/Z with Z = E[exp ℓ(θ)] and apply Stein’s lemma, we obtain
(1/Z) E[θ exp ℓ(θ)] − θ0 = (σ²/Z) E[∇ℓ(θ) exp(ℓ(θ))]
⇔ σ⁻² (E[θ | Y ] − θ0) = E[∇ℓ(θ) | Y ].
Notation: E[φ(θ) | Y ] := E[φ(θ) exp ℓ(θ)]/Z.
Pierre Jacob Derivative estimation 19/ 39
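A Monte Carlo check of this identity (not in the slides), with the toy likelihood ℓ(θ) = log N(y; θ, s²) and E[· | Y ] computed by self-normalized importance sampling from the N(θ0, σ²) prior; the sample size and numerical values are arbitrary.

import numpy as np

rng = np.random.default_rng(7)

# Check sigma^{-2} (E[theta | Y] - theta0) = E[grad l(theta) | Y] for l(theta) = log N(y; theta, s2).
y, s2, theta0, sigma2 = 1.3, 2.0, 0.2, 0.05
thetas = theta0 + np.sqrt(sigma2) * rng.standard_normal(10**6)    # theta ~ N(theta0, sigma2)
logl = -0.5 * (y - thetas)**2 / s2                                # l(theta), up to a constant
w = np.exp(logl - logl.max()); w /= w.sum()                       # self-normalized weights

lhs = (np.sum(w * thetas) - theta0) / sigma2
rhs = np.sum(w * (y - thetas) / s2)                               # grad l(theta) = (y - theta) / s2
print(lhs, rhs)                                                   # equal up to Monte Carlo error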
Stein’s lemma
For the second derivative, we consider
h : θ → (θ − θ0) exp ℓ(θ)/Z.
Then
E[(θ − θ0)² | Y ] = σ² + σ⁴ E[∇²ℓ(θ) + ∇ℓ(θ)² | Y ].
Adding and subtracting terms also yields
σ⁻⁴ (V[θ | Y ] − σ²) = E[∇²ℓ(θ) | Y ] + { E[∇ℓ(θ)² | Y ] − (E[∇ℓ(θ) | Y ])² }.
. . . but what we really want is
∇ℓ(θ0) and ∇²ℓ(θ0),
and not
E[∇ℓ(θ) | Y ] and E[∇²ℓ(θ) | Y ].
Pierre Jacob Derivative estimation 20/ 39
Outline
1 Context
2 General results and connections
3 Posterior concentration when the prior concentrates
4 Hidden Markov models
Pierre Jacob Derivative estimation 21/ 39
Core Idea
The prior is essentially a normal distribution N(θ0, σ²), but in general has a density denoted by κ.
Posterior concentration induced by the prior
Under some assumptions, when σ → 0:
the posterior looks more and more like the prior,
the shift in posterior moments is O(σ²).
Our arXived proof suffers from an overdose of Taylor
expansions.
Pierre Jacob Derivative estimation 21/ 39
Details
Introduce a test function h such that |h(u)| < c|u|^α for some c, α.
We start by writing
E{h(θ − θ0) | y} = [ ∫ h(σu) exp{ℓ(θ0 + σu) − ℓ(θ0)} κ(u) du ] / [ ∫ exp{ℓ(θ0 + σu) − ℓ(θ0)} κ(u) du ],
using u = (θ − θ0)/σ, and then focus on the numerator
∫ h(σu) exp{ℓ(θ0 + σu) − ℓ(θ0)} κ(u) du,
since the denominator is a particular instance of this expression with h : u → 1.
Pierre Jacob Derivative estimation 22/ 39
Details
For the numerator:
∫ h(σu) exp{ℓ(θ0 + σu) − ℓ(θ0)} κ(u) du
we use a Taylor expansion of ℓ around θ0 and a Taylor expansion of exp around 0, and then take the integral with respect to κ.
Notation:
ℓ^(k)(θ).u^⊗k = ∑_{1≤i1,...,ik≤d} [ ∂^k ℓ(θ) / (∂θ_{i1} · · · ∂θ_{ik}) ] u_{i1} · · · u_{ik},
which in one dimension becomes
ℓ^(k)(θ).u^⊗k = [ d^k ℓ(θ) / dθ^k ] u^k.
Pierre Jacob Derivative estimation 23/ 39
Details
Main expansion:
∫ h(σu) exp{ℓ(θ0 + σu) − ℓ(θ0)} κ(u) du
= ∫ h(σu) κ(u) du + σ ∫ h(σu) ℓ^(1)(θ0).u κ(u) du
+ σ² ∫ h(σu) { (1/2) ℓ^(2)(θ0).u^⊗2 + (1/2) (ℓ^(1)(θ0).u)² } κ(u) du
+ σ³ ∫ h(σu) { (1/3!) (ℓ^(1)(θ0).u)³ + (1/2) (ℓ^(1)(θ0).u)(ℓ^(2)(θ0).u^⊗2) + (1/3!) ℓ^(3)(θ0).u^⊗3 } κ(u) du
+ O(σ^{4+α}).
In general, assumptions on the tails of the prior and the likelihood are used to control the remainder terms and to ensure they are O(σ^{4+α}).
Pierre Jacob Derivative estimation 24/ 39
Details
We cut the integral into two bits:
∫ h(σu) exp{ℓ(θ0 + σu) − ℓ(θ0)} κ(u) du
= ∫_{σ|u|≤ρ} h(σu) exp{ℓ(θ0 + σu) − ℓ(θ0)} κ(u) du + ∫_{σ|u|>ρ} h(σu) exp{ℓ(θ0 + σu) − ℓ(θ0)} κ(u) du.
The expansion stems from the first term, where σ|u| is small.
The second term ends up in the remainder in O(σ^{4+α}), using the assumptions.
Classic technique in Bayesian asymptotics theory.
Pierre Jacob Derivative estimation 25/ 39
Details
To get the score from the expansion, choose
h : u → u.
To get the observed information matrix from the expansion, choose
h : u → u²,
and surprisingly (?) further assume that κ is mesokurtic, i.e.
∫ u⁴ κ(u) du = 3 ( ∫ u² κ(u) du )²
⇒ choose a Gaussian prior to obtain the hessian.
Pierre Jacob Derivative estimation 26/ 39
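A two-line numerical reminder of the fourth-moment condition (not in the slides): a standard Gaussian κ satisfies it, while a Laplace κ, used here purely as a counterexample, does not.

import numpy as np

rng = np.random.default_rng(8)

# E[u^4] versus 3 (E[u^2])^2: equal for a Gaussian kernel, not for a Laplace kernel.
for u in (rng.standard_normal(10**6), rng.laplace(0.0, 1.0, 10**6)):
    print(np.mean(u**4), 3 * np.mean(u**2)**2)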
Outline
1 Context
2 General results and connections
3 Posterior concentration when the prior concentrates
4 Hidden Markov models
Pierre Jacob Derivative estimation 27/ 39
Hidden Markov models
Figure : Graph representation of a general hidden Markov model (hidden chain X0, X1, . . . , XT , observations y1, . . . , yT , parameter θ).
Pierre Jacob Derivative estimation 27/ 39
Hidden Markov models
Direct application of the previous results
1 Prior distribution N(θ0, σ²) on the parameter θ.
2 The derivative approximations involve E[θ|Y ] and Cov[θ|Y ].
3 Posterior moments for HMMs can be estimated by particle MCMC, SMC², ABC, or your favourite method.
Ionides et al. proposed another approach.
Pierre Jacob Derivative estimation 28/ 39
Iterated Filtering
Modification of the model: θ is time-varying.
The associated log-likelihood is
¯ℓ(θ1:T ) = log p(y1:T ; θ1:T ) = log ∫_{X^{T+1}} ∏_{t=1}^{T} g(yt | xt, θt) µ(dx1 | θ1) ∏_{t=2}^{T} f (dxt | xt−1, θt).
Introducing θ → (θ, θ, . . . , θ) := θ[T] ∈ R^T , we have
¯ℓ(θ[T]) = ℓ(θ)
and the chain rule yields
dℓ(θ)/dθ = ∑_{t=1}^{T} ∂¯ℓ(θ[T])/∂θt.
Pierre Jacob Derivative estimation 29/ 39
Iterated Filtering
Choice of prior on θ1:T :
θ1 = θ0 + V1, V1 ∼ τ⁻¹ κ{τ⁻¹(·)},
θt+1 − θ0 = ρ (θt − θ0) + Vt+1, Vt+1 ∼ σ⁻¹ κ{σ⁻¹(·)}.
Choose σ² such that τ² = σ²/(1 − ρ²). The covariance of the prior on θ1:T is then the Toeplitz matrix
ΣT = τ² [ ρ^{|s−t|} ]_{1≤s,t≤T},
with ones on the diagonal, ρ on the first off-diagonals, and so on down to ρ^{T−1} in the corners.
Pierre Jacob Derivative estimation 30/ 39
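For concreteness (not in the slides), a sketch that builds ΣT as the Toeplitz matrix τ²[ρ^|s−t|] and compares it with the empirical covariance of simulated prior paths, using a Gaussian κ; the values of T, ρ and τ² are arbitrary.

import numpy as np

rng = np.random.default_rng(9)

T, rho, tau2, theta0 = 10, 0.8, 0.3, 0.0
sigma2 = tau2 * (1 - rho**2)                      # so that tau^2 = sigma^2 / (1 - rho^2)
Sigma_T = tau2 * rho ** np.abs(np.subtract.outer(np.arange(T), np.arange(T)))

M = 200_000                                       # number of simulated prior paths
thetas = np.empty((M, T))
thetas[:, 0] = theta0 + np.sqrt(tau2) * rng.standard_normal(M)
for t in range(1, T):
    thetas[:, t] = theta0 + rho * (thetas[:, t - 1] - theta0) + np.sqrt(sigma2) * rng.standard_normal(M)
print(np.max(np.abs(np.cov(thetas.T) - Sigma_T)))  # small, up to Monte Carlo error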
Iterated Filtering
Applying the general results for this prior yields, with |x| = ∑_{t=1}^{T} |xt|:
| ∇¯ℓ(θ0[T]) − ΣT⁻¹ ( E[θ1:T | Y ] − θ0[T] ) | ≤ Cτ², where θ0[T] = (θ0, . . . , θ0).
Moreover we have
| ∑_{t=1}^{T} ∂¯ℓ(θ[T])/∂θt − ∑_{t=1}^{T} { ΣT⁻¹ ( E[θ1:T | Y ] − θ0[T] ) }t | ≤ ∑_{t=1}^{T} | ∂¯ℓ(θ[T])/∂θt − { ΣT⁻¹ ( E[θ1:T | Y ] − θ0[T] ) }t |
and
dℓ(θ)/dθ = ∑_{t=1}^{T} ∂¯ℓ(θ[T])/∂θt.
Pierre Jacob Derivative estimation 31/ 39
Iterated Filtering
The estimator of the score is thus given by
∑_{t=1}^{T} { ΣT⁻¹ ( E[θ1:T | Y ] − θ0[T] ) }t ,
which can be reduced to
Sτ,ρ,T (θ0) = [ τ⁻² / (1 + ρ) ] [ (1 − ρ) ∑_{t=2}^{T−1} E(θt | Y ) − {(1 − ρ) T + 2ρ} θ0 + E(θ1 | Y ) + E(θT | Y ) ],
given the form of ΣT⁻¹. Note that in the quantities E(θt | Y ), Y = Y1:T is the complete dataset, thus those expectations are with respect to the smoothing distribution.
Pierre Jacob Derivative estimation 32/ 39
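A numerical sanity check of this reduction (not in the slides): for an arbitrary vector standing in for the smoothing means E(θt | Y ), the closed form Sτ,ρ,T (θ0) matches the sum of the entries of ΣT⁻¹(E[θ1:T | Y ] − θ0[T]); all numerical values are arbitrary.

import numpy as np

rng = np.random.default_rng(10)

T, rho, tau2, theta0 = 50, 0.6, 0.2, 0.4
Sigma_T = tau2 * rho ** np.abs(np.subtract.outer(np.arange(T), np.arange(T)))
m = theta0 + rng.standard_normal(T)                # placeholder for the smoothing means E(theta_t | Y)

direct = np.sum(np.linalg.solve(Sigma_T, m - theta0))
closed = (1 / tau2) / (1 + rho) * ((1 - rho) * np.sum(m[1:-1])
         - ((1 - rho) * T + 2 * rho) * theta0 + m[0] + m[-1])
print(direct, closed)                              # identical up to numerical round-off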
Iterated Filtering
If ρ = 1, then the parameters follow a random walk:
θ1 = θ0 + N(0, τ²) and θt+1 = θt + N(0, σ²).
In this case Ionides et al. proposed the estimator
Sτ,σ,T = τ⁻² ( E(θT | Y ) − θ0 )
as well as
S(bis)τ,σ,T = ∑_{t=1}^{T} V_{P,t}⁻¹ ( ˜θF,t − ˜θF,t−1 )
with VP,t = Cov[˜θt | y1:t−1] and ˜θF,t = E[˜θt | y1:t].
Those expressions only involve expectations with respect to filtering distributions.
Pierre Jacob Derivative estimation 33/ 39
Iterated Filtering
If ρ = 0, then the parameters are i.i.d.:
θ1 = θ0 + N(0, τ²) and θt+1 = θ0 + N(0, τ²).
In this case the expression of the score estimator reduces to
Sτ,T = τ⁻² ∑_{t=1}^{T} ( E(θt | Y ) − θ0 ),
which involves smoothing distributions.
There’s only one parameter τ² to choose for the prior.
However smoothing for general hidden Markov models is difficult, and typically resorts to “fixed lag approximations”.
Pierre Jacob Derivative estimation 34/ 39
Iterated Smoothing
Only for the case ρ = 0 are we able to obtain simple expressions for the observed information matrix. We propose the following estimator:
Iτ,T (θ0) = −τ⁻⁴ { ∑_{s=1}^{T} ∑_{t=1}^{T} Cov(θs, θt | Y ) − τ² T },
for which we can show that
| Iτ,T − (−∇²ℓ(θ0)) | ≤ Cτ².
Pierre Jacob Derivative estimation 35/ 39
Numerical results
Linear Gaussian state space model where the ground truth is
available through the Kalman filter.
X0 ∼ N(0, 1), Xt = ρXt−1 + N(0, V ), Yt = ηXt + N(0, W ).
Generate T = 100 observations and set ρ = 0.9, V = 0.7, η = 0.9 and W ∈ {0.1, 0.2, 0.4, 0.9}.
240 independent runs, matching the computational costs
between methods in terms of number of calls to the transition
kernel.
Pierre Jacob Derivative estimation 36/ 39
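A sketch of the Kalman filter ground truth mentioned above (not in the slides): the exact log-likelihood of this linear Gaussian model is computed recursively, and a central finite difference then gives a reference score, here with respect to W; the simulated data and the step size h are illustrative choices.

import numpy as np

rng = np.random.default_rng(11)

rho, V, eta, W, T = 0.9, 0.7, 0.9, 0.2, 100
x, y = rng.normal(0, 1), np.empty(T)
for t in range(T):
    x = rho * x + rng.normal(0, np.sqrt(V))
    y[t] = eta * x + rng.normal(0, np.sqrt(W))

def kalman_loglik(rho, V, eta, W, y):
    m, P, ll = 0.0, 1.0, 0.0                       # X_0 ~ N(0, 1)
    for yt in y:
        m, P = rho * m, rho**2 * P + V             # predict
        S = eta**2 * P + W                         # innovation variance
        ll += -0.5 * np.log(2 * np.pi * S) - 0.5 * (yt - eta * m)**2 / S
        K = P * eta / S                            # Kalman gain
        m, P = m + K * (yt - eta * m), (1 - K * eta) * P
    return ll

h = 1e-5
score_W = (kalman_loglik(rho, V, eta, W + h, y) - kalman_loglik(rho, V, eta, W - h, y)) / (2 * h)
print(score_W)                                     # reference score in W, up to finite-difference error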
Numerical results
Figure : 240 runs for Iterated Smoothing and Finite Difference; RMSE (one curve per parameter 1–4) against the step size h for Finite Difference and against tau for Iterated Smoothing.
Pierre Jacob Derivative estimation 37/ 39
Numerical results
Figure : 240 runs for Iterated Smoothing, Iterated Filtering 1 and Iterated Filtering 2; RMSE (one curve per parameter 1–4) against tau.
Pierre Jacob Derivative estimation 38/ 39
Bibliography
Main references:
Inference for nonlinear dynamical systems, Ionides, Breto, King, PNAS, 2006.
Iterated filtering, Ionides, Bhadra, Atchadé, King, Annals of Statistics, 2011.
Efficient iterated filtering, Lindström, Ionides, Frydendall, Madsen, 16th IFAC Symposium on System Identification.
Derivative-Free Estimation of the Score Vector and Observed Information Matrix, Doucet, Jacob, Rubenthaler, 2013 (on arXiv).
Pierre Jacob Derivative estimation 39/ 39

Más contenido relacionado

La actualidad más candente

Numerical Fourier transform based on hyperfunction theory
Numerical Fourier transform based on hyperfunction theoryNumerical Fourier transform based on hyperfunction theory
Numerical Fourier transform based on hyperfunction theoryHidenoriOgata
 
Numerical integration based on the hyperfunction theory
Numerical integration based on the hyperfunction theoryNumerical integration based on the hyperfunction theory
Numerical integration based on the hyperfunction theoryHidenoriOgata
 
Classification and regression based on derivatives: a consistency result for ...
Classification and regression based on derivatives: a consistency result for ...Classification and regression based on derivatives: a consistency result for ...
Classification and regression based on derivatives: a consistency result for ...tuxette
 
Tensor Train data format for uncertainty quantification
Tensor Train data format for uncertainty quantificationTensor Train data format for uncertainty quantification
Tensor Train data format for uncertainty quantificationAlexander Litvinenko
 
Nonconvex Compressed Sensing with the Sum-of-Squares Method
Nonconvex Compressed Sensing with the Sum-of-Squares MethodNonconvex Compressed Sensing with the Sum-of-Squares Method
Nonconvex Compressed Sensing with the Sum-of-Squares MethodTasuku Soma
 
Low rank tensor approximation of probability density and characteristic funct...
Low rank tensor approximation of probability density and characteristic funct...Low rank tensor approximation of probability density and characteristic funct...
Low rank tensor approximation of probability density and characteristic funct...Alexander Litvinenko
 
Signal Processing Introduction using Fourier Transforms
Signal Processing Introduction using Fourier TransformsSignal Processing Introduction using Fourier Transforms
Signal Processing Introduction using Fourier TransformsArvind Devaraj
 
Ada boost brown boost performance with noisy data
Ada boost brown boost performance with noisy dataAda boost brown boost performance with noisy data
Ada boost brown boost performance with noisy dataShadhin Rahman
 
Hyperfunction method for numerical integration and Fredholm integral equation...
Hyperfunction method for numerical integration and Fredholm integral equation...Hyperfunction method for numerical integration and Fredholm integral equation...
Hyperfunction method for numerical integration and Fredholm integral equation...HidenoriOgata
 
Levitan Centenary Conference Talk, June 27 2014
Levitan Centenary Conference Talk, June 27 2014Levitan Centenary Conference Talk, June 27 2014
Levitan Centenary Conference Talk, June 27 2014Nikita V. Artamonov
 
Bregman divergences from comparative convexity
Bregman divergences from comparative convexityBregman divergences from comparative convexity
Bregman divergences from comparative convexityFrank Nielsen
 
Intro to Approximate Bayesian Computation (ABC)
Intro to Approximate Bayesian Computation (ABC)Intro to Approximate Bayesian Computation (ABC)
Intro to Approximate Bayesian Computation (ABC)Umberto Picchini
 
Random Matrix Theory and Machine Learning - Part 1
Random Matrix Theory and Machine Learning - Part 1Random Matrix Theory and Machine Learning - Part 1
Random Matrix Theory and Machine Learning - Part 1Fabian Pedregosa
 
On the Jensen-Shannon symmetrization of distances relying on abstract means
On the Jensen-Shannon symmetrization of distances relying on abstract meansOn the Jensen-Shannon symmetrization of distances relying on abstract means
On the Jensen-Shannon symmetrization of distances relying on abstract meansFrank Nielsen
 

La actualidad más candente (20)

Numerical Fourier transform based on hyperfunction theory
Numerical Fourier transform based on hyperfunction theoryNumerical Fourier transform based on hyperfunction theory
Numerical Fourier transform based on hyperfunction theory
 
Numerical integration based on the hyperfunction theory
Numerical integration based on the hyperfunction theoryNumerical integration based on the hyperfunction theory
Numerical integration based on the hyperfunction theory
 
Classification and regression based on derivatives: a consistency result for ...
Classification and regression based on derivatives: a consistency result for ...Classification and regression based on derivatives: a consistency result for ...
Classification and regression based on derivatives: a consistency result for ...
 
Tensor Train data format for uncertainty quantification
Tensor Train data format for uncertainty quantificationTensor Train data format for uncertainty quantification
Tensor Train data format for uncertainty quantification
 
Nonconvex Compressed Sensing with the Sum-of-Squares Method
Nonconvex Compressed Sensing with the Sum-of-Squares MethodNonconvex Compressed Sensing with the Sum-of-Squares Method
Nonconvex Compressed Sensing with the Sum-of-Squares Method
 
Low rank tensor approximation of probability density and characteristic funct...
Low rank tensor approximation of probability density and characteristic funct...Low rank tensor approximation of probability density and characteristic funct...
Low rank tensor approximation of probability density and characteristic funct...
 
Signal Processing Introduction using Fourier Transforms
Signal Processing Introduction using Fourier TransformsSignal Processing Introduction using Fourier Transforms
Signal Processing Introduction using Fourier Transforms
 
Ada boost brown boost performance with noisy data
Ada boost brown boost performance with noisy dataAda boost brown boost performance with noisy data
Ada boost brown boost performance with noisy data
 
Hyperfunction method for numerical integration and Fredholm integral equation...
Hyperfunction method for numerical integration and Fredholm integral equation...Hyperfunction method for numerical integration and Fredholm integral equation...
Hyperfunction method for numerical integration and Fredholm integral equation...
 
2018 MUMS Fall Course - Statistical and Mathematical Techniques for Sensitivi...
2018 MUMS Fall Course - Statistical and Mathematical Techniques for Sensitivi...2018 MUMS Fall Course - Statistical and Mathematical Techniques for Sensitivi...
2018 MUMS Fall Course - Statistical and Mathematical Techniques for Sensitivi...
 
Congrès SMAI 2019
Congrès SMAI 2019Congrès SMAI 2019
Congrès SMAI 2019
 
Levitan Centenary Conference Talk, June 27 2014
Levitan Centenary Conference Talk, June 27 2014Levitan Centenary Conference Talk, June 27 2014
Levitan Centenary Conference Talk, June 27 2014
 
Bregman divergences from comparative convexity
Bregman divergences from comparative convexityBregman divergences from comparative convexity
Bregman divergences from comparative convexity
 
Rousseau
RousseauRousseau
Rousseau
 
Chapter 4 (maths 3)
Chapter 4 (maths 3)Chapter 4 (maths 3)
Chapter 4 (maths 3)
 
Intro to Approximate Bayesian Computation (ABC)
Intro to Approximate Bayesian Computation (ABC)Intro to Approximate Bayesian Computation (ABC)
Intro to Approximate Bayesian Computation (ABC)
 
Random Matrix Theory and Machine Learning - Part 1
Random Matrix Theory and Machine Learning - Part 1Random Matrix Theory and Machine Learning - Part 1
Random Matrix Theory and Machine Learning - Part 1
 
Ss 2013 midterm
Ss 2013 midtermSs 2013 midterm
Ss 2013 midterm
 
On the Jensen-Shannon symmetrization of distances relying on abstract means
On the Jensen-Shannon symmetrization of distances relying on abstract meansOn the Jensen-Shannon symmetrization of distances relying on abstract means
On the Jensen-Shannon symmetrization of distances relying on abstract means
 
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Appli...
 Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Appli... Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Appli...
Program on Quasi-Monte Carlo and High-Dimensional Sampling Methods for Appli...
 

Similar a Estimation of score vector and observed information matrix in intractable models

Linear response theory and TDDFT
Linear response theory and TDDFT Linear response theory and TDDFT
Linear response theory and TDDFT Claudio Attaccalite
 
On Twisted Paraproducts and some other Multilinear Singular Integrals
On Twisted Paraproducts and some other Multilinear Singular IntegralsOn Twisted Paraproducts and some other Multilinear Singular Integrals
On Twisted Paraproducts and some other Multilinear Singular IntegralsVjekoslavKovac1
 
Computing f-Divergences and Distances of\\ High-Dimensional Probability Densi...
Computing f-Divergences and Distances of\\ High-Dimensional Probability Densi...Computing f-Divergences and Distances of\\ High-Dimensional Probability Densi...
Computing f-Divergences and Distances of\\ High-Dimensional Probability Densi...Alexander Litvinenko
 
Litvinenko_RWTH_UQ_Seminar_talk.pdf
Litvinenko_RWTH_UQ_Seminar_talk.pdfLitvinenko_RWTH_UQ_Seminar_talk.pdf
Litvinenko_RWTH_UQ_Seminar_talk.pdfAlexander Litvinenko
 
Meta-learning and the ELBO
Meta-learning and the ELBOMeta-learning and the ELBO
Meta-learning and the ELBOYoonho Lee
 
Tales on two commuting transformations or flows
Tales on two commuting transformations or flowsTales on two commuting transformations or flows
Tales on two commuting transformations or flowsVjekoslavKovac1
 
Bayesian inference on mixtures
Bayesian inference on mixturesBayesian inference on mixtures
Bayesian inference on mixturesChristian Robert
 
IVR - Chapter 1 - Introduction
IVR - Chapter 1 - IntroductionIVR - Chapter 1 - Introduction
IVR - Chapter 1 - IntroductionCharles Deledalle
 
Unbiased Hamiltonian Monte Carlo
Unbiased Hamiltonian Monte CarloUnbiased Hamiltonian Monte Carlo
Unbiased Hamiltonian Monte CarloJeremyHeng10
 
Diffusion Schrödinger bridges for score-based generative modeling
Diffusion Schrödinger bridges for score-based generative modelingDiffusion Schrödinger bridges for score-based generative modeling
Diffusion Schrödinger bridges for score-based generative modelingJeremyHeng10
 
Hierarchical matrices for approximating large covariance matries and computin...
Hierarchical matrices for approximating large covariance matries and computin...Hierarchical matrices for approximating large covariance matries and computin...
Hierarchical matrices for approximating large covariance matries and computin...Alexander Litvinenko
 
Bayesian Inference and Uncertainty Quantification for Inverse Problems
Bayesian Inference and Uncertainty Quantification for Inverse ProblemsBayesian Inference and Uncertainty Quantification for Inverse Problems
Bayesian Inference and Uncertainty Quantification for Inverse ProblemsMatt Moores
 
A Fibonacci-like universe expansion on time-scale
A Fibonacci-like universe expansion on time-scaleA Fibonacci-like universe expansion on time-scale
A Fibonacci-like universe expansion on time-scaleOctavianPostavaru
 
Murphy: Machine learning A probabilistic perspective: Ch.9
Murphy: Machine learning A probabilistic perspective: Ch.9Murphy: Machine learning A probabilistic perspective: Ch.9
Murphy: Machine learning A probabilistic perspective: Ch.9Daisuke Yoneoka
 
Spectral sum rules for conformal field theories
Spectral sum rules for conformal field theoriesSpectral sum rules for conformal field theories
Spectral sum rules for conformal field theoriesSubham Dutta Chowdhury
 
Approximate Bayesian Computation with Quasi-Likelihoods
Approximate Bayesian Computation with Quasi-LikelihoodsApproximate Bayesian Computation with Quasi-Likelihoods
Approximate Bayesian Computation with Quasi-LikelihoodsStefano Cabras
 
Diffusion Schrödinger bridges for score-based generative modeling
Diffusion Schrödinger bridges for score-based generative modelingDiffusion Schrödinger bridges for score-based generative modeling
Diffusion Schrödinger bridges for score-based generative modelingJeremyHeng10
 

Similar a Estimation of score vector and observed information matrix in intractable models (20)

Linear response theory and TDDFT
Linear response theory and TDDFT Linear response theory and TDDFT
Linear response theory and TDDFT
 
On Twisted Paraproducts and some other Multilinear Singular Integrals
On Twisted Paraproducts and some other Multilinear Singular IntegralsOn Twisted Paraproducts and some other Multilinear Singular Integrals
On Twisted Paraproducts and some other Multilinear Singular Integrals
 
Computing f-Divergences and Distances of\\ High-Dimensional Probability Densi...
Computing f-Divergences and Distances of\\ High-Dimensional Probability Densi...Computing f-Divergences and Distances of\\ High-Dimensional Probability Densi...
Computing f-Divergences and Distances of\\ High-Dimensional Probability Densi...
 
Lecture_9.pdf
Lecture_9.pdfLecture_9.pdf
Lecture_9.pdf
 
Litvinenko_RWTH_UQ_Seminar_talk.pdf
Litvinenko_RWTH_UQ_Seminar_talk.pdfLitvinenko_RWTH_UQ_Seminar_talk.pdf
Litvinenko_RWTH_UQ_Seminar_talk.pdf
 
Meta-learning and the ELBO
Meta-learning and the ELBOMeta-learning and the ELBO
Meta-learning and the ELBO
 
Tales on two commuting transformations or flows
Tales on two commuting transformations or flowsTales on two commuting transformations or flows
Tales on two commuting transformations or flows
 
Bayesian inference on mixtures
Bayesian inference on mixturesBayesian inference on mixtures
Bayesian inference on mixtures
 
Muchtadi
MuchtadiMuchtadi
Muchtadi
 
IVR - Chapter 1 - Introduction
IVR - Chapter 1 - IntroductionIVR - Chapter 1 - Introduction
IVR - Chapter 1 - Introduction
 
Unbiased Hamiltonian Monte Carlo
Unbiased Hamiltonian Monte CarloUnbiased Hamiltonian Monte Carlo
Unbiased Hamiltonian Monte Carlo
 
Diffusion Schrödinger bridges for score-based generative modeling
Diffusion Schrödinger bridges for score-based generative modelingDiffusion Schrödinger bridges for score-based generative modeling
Diffusion Schrödinger bridges for score-based generative modeling
 
Hierarchical matrices for approximating large covariance matries and computin...
Hierarchical matrices for approximating large covariance matries and computin...Hierarchical matrices for approximating large covariance matries and computin...
Hierarchical matrices for approximating large covariance matries and computin...
 
Bayesian Inference and Uncertainty Quantification for Inverse Problems
Bayesian Inference and Uncertainty Quantification for Inverse ProblemsBayesian Inference and Uncertainty Quantification for Inverse Problems
Bayesian Inference and Uncertainty Quantification for Inverse Problems
 
A Fibonacci-like universe expansion on time-scale
A Fibonacci-like universe expansion on time-scaleA Fibonacci-like universe expansion on time-scale
A Fibonacci-like universe expansion on time-scale
 
Murphy: Machine learning A probabilistic perspective: Ch.9
Murphy: Machine learning A probabilistic perspective: Ch.9Murphy: Machine learning A probabilistic perspective: Ch.9
Murphy: Machine learning A probabilistic perspective: Ch.9
 
Spectral sum rules for conformal field theories
Spectral sum rules for conformal field theoriesSpectral sum rules for conformal field theories
Spectral sum rules for conformal field theories
 
2018 MUMS Fall Course - Statistical Representation of Model Input (EDITED) - ...
2018 MUMS Fall Course - Statistical Representation of Model Input (EDITED) - ...2018 MUMS Fall Course - Statistical Representation of Model Input (EDITED) - ...
2018 MUMS Fall Course - Statistical Representation of Model Input (EDITED) - ...
 
Approximate Bayesian Computation with Quasi-Likelihoods
Approximate Bayesian Computation with Quasi-LikelihoodsApproximate Bayesian Computation with Quasi-Likelihoods
Approximate Bayesian Computation with Quasi-Likelihoods
 
Diffusion Schrödinger bridges for score-based generative modeling
Diffusion Schrödinger bridges for score-based generative modelingDiffusion Schrödinger bridges for score-based generative modeling
Diffusion Schrödinger bridges for score-based generative modeling
 

Más de Pierre Jacob

Talk at CIRM on Poisson equation and debiasing techniques
Talk at CIRM on Poisson equation and debiasing techniquesTalk at CIRM on Poisson equation and debiasing techniques
Talk at CIRM on Poisson equation and debiasing techniquesPierre Jacob
 
ISBA 2022 Susie Bayarri lecture
ISBA 2022 Susie Bayarri lectureISBA 2022 Susie Bayarri lecture
ISBA 2022 Susie Bayarri lecturePierre Jacob
 
Couplings of Markov chains and the Poisson equation
Couplings of Markov chains and the Poisson equation Couplings of Markov chains and the Poisson equation
Couplings of Markov chains and the Poisson equation Pierre Jacob
 
Monte Carlo methods for some not-quite-but-almost Bayesian problems
Monte Carlo methods for some not-quite-but-almost Bayesian problemsMonte Carlo methods for some not-quite-but-almost Bayesian problems
Monte Carlo methods for some not-quite-but-almost Bayesian problemsPierre Jacob
 
Monte Carlo methods for some not-quite-but-almost Bayesian problems
Monte Carlo methods for some not-quite-but-almost Bayesian problemsMonte Carlo methods for some not-quite-but-almost Bayesian problems
Monte Carlo methods for some not-quite-but-almost Bayesian problemsPierre Jacob
 
Markov chain Monte Carlo methods and some attempts at parallelizing them
Markov chain Monte Carlo methods and some attempts at parallelizing themMarkov chain Monte Carlo methods and some attempts at parallelizing them
Markov chain Monte Carlo methods and some attempts at parallelizing themPierre Jacob
 
Unbiased MCMC with couplings
Unbiased MCMC with couplingsUnbiased MCMC with couplings
Unbiased MCMC with couplingsPierre Jacob
 
Unbiased Markov chain Monte Carlo methods
Unbiased Markov chain Monte Carlo methods Unbiased Markov chain Monte Carlo methods
Unbiased Markov chain Monte Carlo methods Pierre Jacob
 
Recent developments on unbiased MCMC
Recent developments on unbiased MCMCRecent developments on unbiased MCMC
Recent developments on unbiased MCMCPierre Jacob
 
Current limitations of sequential inference in general hidden Markov models
Current limitations of sequential inference in general hidden Markov modelsCurrent limitations of sequential inference in general hidden Markov models
Current limitations of sequential inference in general hidden Markov modelsPierre Jacob
 
On non-negative unbiased estimators
On non-negative unbiased estimatorsOn non-negative unbiased estimators
On non-negative unbiased estimatorsPierre Jacob
 
Path storage in the particle filter
Path storage in the particle filterPath storage in the particle filter
Path storage in the particle filterPierre Jacob
 
Density exploration methods
Density exploration methodsDensity exploration methods
Density exploration methodsPierre Jacob
 
SMC^2: an algorithm for sequential analysis of state-space models
SMC^2: an algorithm for sequential analysis of state-space modelsSMC^2: an algorithm for sequential analysis of state-space models
SMC^2: an algorithm for sequential analysis of state-space modelsPierre Jacob
 
PAWL - GPU meeting @ Warwick
PAWL - GPU meeting @ WarwickPAWL - GPU meeting @ Warwick
PAWL - GPU meeting @ WarwickPierre Jacob
 
Presentation of SMC^2 at BISP7
Presentation of SMC^2 at BISP7Presentation of SMC^2 at BISP7
Presentation of SMC^2 at BISP7Pierre Jacob
 
Presentation MCB seminar 09032011
Presentation MCB seminar 09032011Presentation MCB seminar 09032011
Presentation MCB seminar 09032011Pierre Jacob
 

Más de Pierre Jacob (17)

Talk at CIRM on Poisson equation and debiasing techniques
Talk at CIRM on Poisson equation and debiasing techniquesTalk at CIRM on Poisson equation and debiasing techniques
Talk at CIRM on Poisson equation and debiasing techniques
 
ISBA 2022 Susie Bayarri lecture
ISBA 2022 Susie Bayarri lectureISBA 2022 Susie Bayarri lecture
ISBA 2022 Susie Bayarri lecture
 
Couplings of Markov chains and the Poisson equation
Couplings of Markov chains and the Poisson equation Couplings of Markov chains and the Poisson equation
Couplings of Markov chains and the Poisson equation
 
Monte Carlo methods for some not-quite-but-almost Bayesian problems
Monte Carlo methods for some not-quite-but-almost Bayesian problemsMonte Carlo methods for some not-quite-but-almost Bayesian problems
Monte Carlo methods for some not-quite-but-almost Bayesian problems
 
Monte Carlo methods for some not-quite-but-almost Bayesian problems
Monte Carlo methods for some not-quite-but-almost Bayesian problemsMonte Carlo methods for some not-quite-but-almost Bayesian problems
Monte Carlo methods for some not-quite-but-almost Bayesian problems
 
Markov chain Monte Carlo methods and some attempts at parallelizing them
Markov chain Monte Carlo methods and some attempts at parallelizing themMarkov chain Monte Carlo methods and some attempts at parallelizing them
Markov chain Monte Carlo methods and some attempts at parallelizing them
 
Unbiased MCMC with couplings
Unbiased MCMC with couplingsUnbiased MCMC with couplings
Unbiased MCMC with couplings
 
Unbiased Markov chain Monte Carlo methods
Unbiased Markov chain Monte Carlo methods Unbiased Markov chain Monte Carlo methods
Unbiased Markov chain Monte Carlo methods
 
Recent developments on unbiased MCMC
Recent developments on unbiased MCMCRecent developments on unbiased MCMC
Recent developments on unbiased MCMC
 
Current limitations of sequential inference in general hidden Markov models
Current limitations of sequential inference in general hidden Markov modelsCurrent limitations of sequential inference in general hidden Markov models
Current limitations of sequential inference in general hidden Markov models
 
On non-negative unbiased estimators
On non-negative unbiased estimatorsOn non-negative unbiased estimators
On non-negative unbiased estimators
 
Path storage in the particle filter
Path storage in the particle filterPath storage in the particle filter
Path storage in the particle filter
 
Density exploration methods
Density exploration methodsDensity exploration methods
Density exploration methods
 
SMC^2: an algorithm for sequential analysis of state-space models
SMC^2: an algorithm for sequential analysis of state-space modelsSMC^2: an algorithm for sequential analysis of state-space models
SMC^2: an algorithm for sequential analysis of state-space models
 
PAWL - GPU meeting @ Warwick
PAWL - GPU meeting @ WarwickPAWL - GPU meeting @ Warwick
PAWL - GPU meeting @ Warwick
 
Presentation of SMC^2 at BISP7
Presentation of SMC^2 at BISP7Presentation of SMC^2 at BISP7
Presentation of SMC^2 at BISP7
 
Presentation MCB seminar 09032011
Presentation MCB seminar 09032011Presentation MCB seminar 09032011
Presentation MCB seminar 09032011
 

Último

CHROMATOGRAPHY PALLAVI RAWAT.pptx
CHROMATOGRAPHY  PALLAVI RAWAT.pptxCHROMATOGRAPHY  PALLAVI RAWAT.pptx
CHROMATOGRAPHY PALLAVI RAWAT.pptxpallavirawat456
 
Pests of Sunflower_Binomics_Identification_Dr.UPR
Pests of Sunflower_Binomics_Identification_Dr.UPRPests of Sunflower_Binomics_Identification_Dr.UPR
Pests of Sunflower_Binomics_Identification_Dr.UPRPirithiRaju
 
Science (Communication) and Wikipedia - Potentials and Pitfalls
Science (Communication) and Wikipedia - Potentials and PitfallsScience (Communication) and Wikipedia - Potentials and Pitfalls
Science (Communication) and Wikipedia - Potentials and PitfallsDobusch Leonhard
 
WEEK 4 PHYSICAL SCIENCE QUARTER 3 FOR G11
WEEK 4 PHYSICAL SCIENCE QUARTER 3 FOR G11WEEK 4 PHYSICAL SCIENCE QUARTER 3 FOR G11
WEEK 4 PHYSICAL SCIENCE QUARTER 3 FOR G11GelineAvendao
 
6.1 Pests of Groundnut_Binomics_Identification_Dr.UPR
6.1 Pests of Groundnut_Binomics_Identification_Dr.UPR6.1 Pests of Groundnut_Binomics_Identification_Dr.UPR
6.1 Pests of Groundnut_Binomics_Identification_Dr.UPRPirithiRaju
 
final waves properties grade 7 - third quarter
final waves properties grade 7 - third quarterfinal waves properties grade 7 - third quarter
final waves properties grade 7 - third quarterHanHyoKim
 
LESSON PLAN IN SCIENCE GRADE 4 WEEK 1 DAY 2
LESSON PLAN IN SCIENCE GRADE 4 WEEK 1 DAY 2LESSON PLAN IN SCIENCE GRADE 4 WEEK 1 DAY 2
LESSON PLAN IN SCIENCE GRADE 4 WEEK 1 DAY 2AuEnriquezLontok
 
Observational constraints on mergers creating magnetism in massive stars
Observational constraints on mergers creating magnetism in massive starsObservational constraints on mergers creating magnetism in massive stars
Observational constraints on mergers creating magnetism in massive starsSérgio Sacani
 
Advances in AI-driven Image Recognition for Early Detection of Cancer
Advances in AI-driven Image Recognition for Early Detection of CancerAdvances in AI-driven Image Recognition for Early Detection of Cancer
Advances in AI-driven Image Recognition for Early Detection of CancerLuis Miguel Chong Chong
 
GENERAL PHYSICS 2 REFRACTION OF LIGHT SENIOR HIGH SCHOOL GENPHYS2.pptx
GENERAL PHYSICS 2 REFRACTION OF LIGHT SENIOR HIGH SCHOOL GENPHYS2.pptxGENERAL PHYSICS 2 REFRACTION OF LIGHT SENIOR HIGH SCHOOL GENPHYS2.pptx
GENERAL PHYSICS 2 REFRACTION OF LIGHT SENIOR HIGH SCHOOL GENPHYS2.pptxRitchAndruAgustin
 
6.2 Pests of Sesame_Identification_Binomics_Dr.UPR
6.2 Pests of Sesame_Identification_Binomics_Dr.UPR6.2 Pests of Sesame_Identification_Binomics_Dr.UPR
6.2 Pests of Sesame_Identification_Binomics_Dr.UPRPirithiRaju
 
BACTERIAL DEFENSE SYSTEM by Dr. Chayanika Das
BACTERIAL DEFENSE SYSTEM by Dr. Chayanika DasBACTERIAL DEFENSE SYSTEM by Dr. Chayanika Das
BACTERIAL DEFENSE SYSTEM by Dr. Chayanika DasChayanika Das
 
DNA isolation molecular biology practical.pptx
DNA isolation molecular biology practical.pptxDNA isolation molecular biology practical.pptx
DNA isolation molecular biology practical.pptxGiDMOh
 
Timeless Cosmology: Towards a Geometric Origin of Cosmological Correlations
Timeless Cosmology: Towards a Geometric Origin of Cosmological CorrelationsTimeless Cosmology: Towards a Geometric Origin of Cosmological Correlations
Timeless Cosmology: Towards a Geometric Origin of Cosmological CorrelationsDanielBaumann11
 
Combining Asynchronous Task Parallelism and Intel SGX for Secure Deep Learning
Combining Asynchronous Task Parallelism and Intel SGX for Secure Deep LearningCombining Asynchronous Task Parallelism and Intel SGX for Secure Deep Learning
Combining Asynchronous Task Parallelism and Intel SGX for Secure Deep Learningvschiavoni
 
Charateristics of the Angara-A5 spacecraft launched from the Vostochny Cosmod...
Charateristics of the Angara-A5 spacecraft launched from the Vostochny Cosmod...Charateristics of the Angara-A5 spacecraft launched from the Vostochny Cosmod...
Charateristics of the Angara-A5 spacecraft launched from the Vostochny Cosmod...Christina Parmionova
 
ESSENTIAL FEATURES REQUIRED FOR ESTABLISHING FOUR TYPES OF BIOSAFETY LABORATO...
ESSENTIAL FEATURES REQUIRED FOR ESTABLISHING FOUR TYPES OF BIOSAFETY LABORATO...ESSENTIAL FEATURES REQUIRED FOR ESTABLISHING FOUR TYPES OF BIOSAFETY LABORATO...
ESSENTIAL FEATURES REQUIRED FOR ESTABLISHING FOUR TYPES OF BIOSAFETY LABORATO...Chayanika Das
 
Environmental acoustics- noise criteria.pptx
Environmental acoustics- noise criteria.pptxEnvironmental acoustics- noise criteria.pptx
Environmental acoustics- noise criteria.pptxpriyankatabhane
 
GenAI talk for Young at Wageningen University & Research (WUR) March 2024
GenAI talk for Young at Wageningen University & Research (WUR) March 2024GenAI talk for Young at Wageningen University & Research (WUR) March 2024
GenAI talk for Young at Wageningen University & Research (WUR) March 2024Jene van der Heide
 

Último (20)

Ultrastructure and functions of Chloroplast.pptx
Ultrastructure and functions of Chloroplast.pptxUltrastructure and functions of Chloroplast.pptx
Ultrastructure and functions of Chloroplast.pptx
 
CHROMATOGRAPHY PALLAVI RAWAT.pptx
CHROMATOGRAPHY  PALLAVI RAWAT.pptxCHROMATOGRAPHY  PALLAVI RAWAT.pptx
CHROMATOGRAPHY PALLAVI RAWAT.pptx
 
Pests of Sunflower_Binomics_Identification_Dr.UPR
Pests of Sunflower_Binomics_Identification_Dr.UPRPests of Sunflower_Binomics_Identification_Dr.UPR
Pests of Sunflower_Binomics_Identification_Dr.UPR
 
Science (Communication) and Wikipedia - Potentials and Pitfalls
Science (Communication) and Wikipedia - Potentials and PitfallsScience (Communication) and Wikipedia - Potentials and Pitfalls
Science (Communication) and Wikipedia - Potentials and Pitfalls
 
WEEK 4 PHYSICAL SCIENCE QUARTER 3 FOR G11
WEEK 4 PHYSICAL SCIENCE QUARTER 3 FOR G11WEEK 4 PHYSICAL SCIENCE QUARTER 3 FOR G11
WEEK 4 PHYSICAL SCIENCE QUARTER 3 FOR G11
 
6.1 Pests of Groundnut_Binomics_Identification_Dr.UPR
6.1 Pests of Groundnut_Binomics_Identification_Dr.UPR6.1 Pests of Groundnut_Binomics_Identification_Dr.UPR
6.1 Pests of Groundnut_Binomics_Identification_Dr.UPR
 
final waves properties grade 7 - third quarter
final waves properties grade 7 - third quarterfinal waves properties grade 7 - third quarter
final waves properties grade 7 - third quarter
 
LESSON PLAN IN SCIENCE GRADE 4 WEEK 1 DAY 2
LESSON PLAN IN SCIENCE GRADE 4 WEEK 1 DAY 2LESSON PLAN IN SCIENCE GRADE 4 WEEK 1 DAY 2
LESSON PLAN IN SCIENCE GRADE 4 WEEK 1 DAY 2
 
Observational constraints on mergers creating magnetism in massive stars
Observational constraints on mergers creating magnetism in massive starsObservational constraints on mergers creating magnetism in massive stars
Observational constraints on mergers creating magnetism in massive stars
 
Advances in AI-driven Image Recognition for Early Detection of Cancer
Advances in AI-driven Image Recognition for Early Detection of CancerAdvances in AI-driven Image Recognition for Early Detection of Cancer
Advances in AI-driven Image Recognition for Early Detection of Cancer
 
GENERAL PHYSICS 2 REFRACTION OF LIGHT SENIOR HIGH SCHOOL GENPHYS2.pptx
GENERAL PHYSICS 2 REFRACTION OF LIGHT SENIOR HIGH SCHOOL GENPHYS2.pptxGENERAL PHYSICS 2 REFRACTION OF LIGHT SENIOR HIGH SCHOOL GENPHYS2.pptx
GENERAL PHYSICS 2 REFRACTION OF LIGHT SENIOR HIGH SCHOOL GENPHYS2.pptx
 
6.2 Pests of Sesame_Identification_Binomics_Dr.UPR
6.2 Pests of Sesame_Identification_Binomics_Dr.UPR6.2 Pests of Sesame_Identification_Binomics_Dr.UPR
6.2 Pests of Sesame_Identification_Binomics_Dr.UPR
 
BACTERIAL DEFENSE SYSTEM by Dr. Chayanika Das
BACTERIAL DEFENSE SYSTEM by Dr. Chayanika DasBACTERIAL DEFENSE SYSTEM by Dr. Chayanika Das
BACTERIAL DEFENSE SYSTEM by Dr. Chayanika Das
 
DNA isolation molecular biology practical.pptx
DNA isolation molecular biology practical.pptxDNA isolation molecular biology practical.pptx
DNA isolation molecular biology practical.pptx
 
Timeless Cosmology: Towards a Geometric Origin of Cosmological Correlations
Timeless Cosmology: Towards a Geometric Origin of Cosmological CorrelationsTimeless Cosmology: Towards a Geometric Origin of Cosmological Correlations
Timeless Cosmology: Towards a Geometric Origin of Cosmological Correlations
 
Combining Asynchronous Task Parallelism and Intel SGX for Secure Deep Learning
Combining Asynchronous Task Parallelism and Intel SGX for Secure Deep LearningCombining Asynchronous Task Parallelism and Intel SGX for Secure Deep Learning
Combining Asynchronous Task Parallelism and Intel SGX for Secure Deep Learning
 
Charateristics of the Angara-A5 spacecraft launched from the Vostochny Cosmod...
Charateristics of the Angara-A5 spacecraft launched from the Vostochny Cosmod...Charateristics of the Angara-A5 spacecraft launched from the Vostochny Cosmod...
Charateristics of the Angara-A5 spacecraft launched from the Vostochny Cosmod...
 
ESSENTIAL FEATURES REQUIRED FOR ESTABLISHING FOUR TYPES OF BIOSAFETY LABORATO...
ESSENTIAL FEATURES REQUIRED FOR ESTABLISHING FOUR TYPES OF BIOSAFETY LABORATO...ESSENTIAL FEATURES REQUIRED FOR ESTABLISHING FOUR TYPES OF BIOSAFETY LABORATO...
ESSENTIAL FEATURES REQUIRED FOR ESTABLISHING FOUR TYPES OF BIOSAFETY LABORATO...
 
Environmental acoustics- noise criteria.pptx
Environmental acoustics- noise criteria.pptxEnvironmental acoustics- noise criteria.pptx
Environmental acoustics- noise criteria.pptx
 
GenAI talk for Young at Wageningen University & Research (WUR) March 2024
GenAI talk for Young at Wageningen University & Research (WUR) March 2024GenAI talk for Young at Wageningen University & Research (WUR) March 2024
GenAI talk for Young at Wageningen University & Research (WUR) March 2024
 

Estimation of score vector and observed information matrix in intractable models

  • 1. Estimation of the score vector and observed information matrix in intractable models Arnaud Doucet (University of Oxford) Pierre E. Jacob (University of Oxford) Sylvain Rubenthaler (Universit´e Nice Sophia Antipolis) November 28th, 2014 Pierre Jacob Derivative estimation 1/ 39
  • 2. Outline 1 Context 2 General results and connections 3 Posterior concentration when the prior concentrates 4 Hidden Markov models Pierre Jacob Derivative estimation 1/ 39
  • 3. Outline 1 Context 2 General results and connections 3 Posterior concentration when the prior concentrates 4 Hidden Markov models Pierre Jacob Derivative estimation 2/ 39
  • 4. Motivation Derivatives of the likelihood help optimizing / sampling. For many models they are not available. One can resort to approximation techniques. Pierre Jacob Derivative estimation 2/ 39
  • 5. Using derivatives in sampling algorithms Modified Adjusted Langevin Algorithm At step t, given a point θt, do: propose θ⋆ ∼ q(dθ | θt) ≡ N(θt + σ2 2 ∇θ log π(θt), σ2 ), with probability 1 ∧ π(θ⋆)q(θt | θ⋆) π(θt)q(θ⋆ | θt) set θt+1 = θ⋆, otherwise set θt+1 = θt. Pierre Jacob Derivative estimation 3/ 39
  • 6. Using derivatives in sampling algorithms Figure : Proposal mechanism for random walk Metropolis–Hastings. Pierre Jacob Derivative estimation 4/ 39
  • 7. Using derivatives in sampling algorithms Figure : Proposal mechanism for MALA. Pierre Jacob Derivative estimation 5/ 39
  • 8. Using derivatives in sampling algorithms In what sense is MALA better than MH? Scaling with the dimension of the state space For Metropolis–Hastings, optimal scaling leads to σ2 = O(d−1 ), For MALA, optimal scaling leads to σ2 = O(d−1/3 ). Roberts & Rosenthal, Optimal Scaling for Various Metropolis-Hastings Algorithms, 2001. Pierre Jacob Derivative estimation 6/ 39
  • 9. Hidden Markov models y2 X2X0 y1 X1 ... ... yT XT θ Figure : Graph representation of a general hidden Markov model. Hidden process: initial distribution µθ, transition fθ. Observations conditional upon the hidden process, from gθ. Pierre Jacob Derivative estimation 7/ 39
  • 10. Assumptions Input: Parameter θ : unknown, prior distribution p. Initial condition µθ(dx0) : can be sampled from. Transition fθ(dxt|xt−1) : can be sampled from. Measurement gθ(yt|xt) : can be evaluated point-wise. Observations y1:T = (y1, . . . , yT ). Goals: score: ∇θ log L(θ; y1:T ) for any θ, observed information matrix: −∇2 θ log L(θ; y1:T ) for any θ. Note: throughout the talk, the observations, and thus the likelihood, are fixed. Pierre Jacob Derivative estimation 8/ 39
  • 11. Why is it an intractable model? The likelihood function does not admit a closed form expression: L(θ; y1, . . . , yT ) = ∫ XT+1 p(y1, . . . , yT | x0, . . . xT , θ)p(dx0, . . . dxT | θ) = ∫ XT+1 T∏ t=1 gθ(yt | xt) µθ(dx0) T∏ t=1 fθ(dxt | xt−1). Hence the likelihood can only be estimated, e.g. by standard Monte Carlo, or by particle filters. What about the derivatives of the likelihood? Pierre Jacob Derivative estimation 9/ 39
  • 12. Fisher and Louis’ identities Write the score as: ∇ℓ(θ) = ∫ ∇ log p(x0:T , y1:T | θ)p(dx0:T | y1:T , θ). which is an integral, with respect to the smoothing distribution p(dx0:T | y1:T , θ), of ∇ log p(x0:T , y1:T | θ) = ∇ log µθ(x0) + T∑ t=1 ∇ log fθ(xt | xt−1) + T∑ t=1 ∇ log gθ(yt | xt). However pointwise evaluations of ∇ log µθ(x0) and ∇ log fθ(xt | xt−1) are not always available. Pierre Jacob Derivative estimation 10/ 39
  • 13. New kid on the block: Iterated Filtering Perturbed model Hidden states ˜Xt = (˜θt, Xt). { ˜θ0 ∼ N(θ0, τ2Σ) X0 ∼ µ˜θ0 (·) and { ˜θt ∼ N(˜θt−1, σ2Σ) Xt ∼ f˜θt (· | Xt−1 = xt−1) Observations ˜Yt ∼ g˜θt (· | Xt). Score estimate Consider VP,t = Cov[˜θt | y1:t−1] and ˜θF,t = E[˜θt | y1:t]. T∑ t=1 VP,t −1 ( ˜θF,t − ˜θF,t−1 ) ≈ ∇ℓ(θ0) when τ → 0 and σ/τ → 0. Ionides, Breto, King, PNAS, 2006. Pierre Jacob Derivative estimation 11/ 39
  • 14. Iterated Filtering: the mystery Why is it valid? Is it related to known techniques? Can it be extended to estimate the second derivatives (i.e. the Hessian, i.e. the observed information matrix)? How does it compare to other methods such as finite difference? Pierre Jacob Derivative estimation 12/ 39
  • 15. Outline 1 Context 2 General results and connections 3 Posterior concentration when the prior concentrates 4 Hidden Markov models Pierre Jacob Derivative estimation 13/ 39
  • 16. Iterated Filtering Given a log likelihood ℓ and a given point, consider a prior θ ∼ N(θ0, σ2 ). Posterior expectation when the prior variance goes to zero First-order moments give first-order derivatives: |σ−2 (E[θ|Y ] − θ0) − ∇ℓ(θ0)| ≤ Cσ2 . Phrased simply, posterior mean − prior mean prior variance ≈ score. Result from Ionides, Bhadra, Atchad´e, King, Iterated filtering, 2011. Pierre Jacob Derivative estimation 13/ 39
  • 17. Extension of Iterated Filtering Posterior variance when the prior variance goes to zero Second-order moments give second-order derivatives: |σ−4 ( Cov[θ|Y ] − σ2 ) − ∇2 ℓ(θ0)| ≤ Cσ2 . Phrased simply, posterior variance − prior variance prior variance2 ≈ hessian. Result from Doucet, Jacob, Rubenthaler on arXiv, 2013. Pierre Jacob Derivative estimation 14/ 39
  • 18. Proximity mapping Given a real function f and a point θ0, consider for any σ2 > 0 θ → f (θ) exp { − 1 2σ2 (θ − θ0)2 } θ θ0 Figure : Example for f : θ → exp(−|θ|) and three values of σ2 . Pierre Jacob Derivative estimation 15/ 39
  • 19. Proximity mapping Proximity mapping The σ2-proximity mapping is defined by proxf : θ0 → argmaxθ∈R f (θ) exp { − 1 2σ2 (θ − θ0)2 } . Moreau approximation The σ2-Moreau approximation is defined by fσ2 : θ0 → C supθ∈R f (θ) exp { − 1 2σ2 (θ − θ0)2 } where C is a normalizing constant. Pierre Jacob Derivative estimation 16/ 39
  • 20. Proximity mapping θ Figure : θ → f (θ) and θ → fσ2 (θ) for three values of σ2 . Pierre Jacob Derivative estimation 16/ 39
  • 21. Proximity mapping Property Those objects are such that proxf (θ0) − θ0 σ2 = ∇ log fσ2 (θ0) −−−→ σ2→0 ∇ log f (θ0) Moreau (1962), Fonctions convexes duales et points proximaux dans un espace Hilbertien. Pereyra (2013), Proximal Markov chain Monte Carlo algorithms. Pierre Jacob Derivative estimation 17/ 39
  • 22. Proximity mapping Bayesian interpretation If f is a seen as a likelihood function then θ → f (θ) exp { − 1 2σ2 (θ − θ0)2 } is an unnormalized posterior density function based on a Normal prior with mean θ0 and variance σ2. Hence proxf (θ0) − θ0 σ2 −−−→ σ2→0 ∇ log f (θ0) can be read posterior mode − prior mode prior variance ≈ score. Pierre Jacob Derivative estimation 18/ 39
  • 23. Stein’s lemma Stein’s lemma states that θ ∼ N(θ0, σ2 ) if and only if for any function g such that E [|∇g(θ)|] < ∞, E [(θ − θ0) g (θ)] = σ2 E [∇g (θ)] . If we choose the function g : θ → exp ℓ (θ) /Z with Z = E [exp ℓ (θ)] and apply Stein’s lemma we obtain 1 Z E [θ exp ℓ(θ)] − θ0 = σ2 Z E [∇ℓ (θ) exp (ℓ (θ))] ⇔ σ−2 (E [θ | Y ] − θ0) = E [∇ℓ (θ) | Y ] . Notation: E[φ(θ) | Y ] := E[φ(θ) exp ℓ(θ)]/Z. Pierre Jacob Derivative estimation 19/ 39
  • 24. Stein’s lemma For the second derivative, we consider h : θ → (θ − θ0) exp ℓ (θ) /Z. Then E [ (θ − θ0)2 | Y ] = σ2 + σ4 E [ ∇2 ℓ(θ) + ∇ℓ(θ)2 | Y ] . Adding and subtracting terms also yields σ−4 ( V [θ | Y ] − σ2 ) = E [ ∇2 ℓ(θ) | Y ] + { E [ ∇ℓ(θ)2 | Y ] − (E [∇ℓ(θ) | Y ])2 } . . . . but what we really want is ∇ℓ(θ0), ∇ℓ2 (θ0) and not E [∇ℓ(θ) | Y ] , E [ ∇ℓ2 (θ) | Y ] . Pierre Jacob Derivative estimation 20/ 39
  • 25. Outline 1 Context 2 General results and connections 3 Posterior concentration when the prior concentrates 4 Hidden Markov models Pierre Jacob Derivative estimation 21/ 39
  • 26. Core Idea The prior is a essentially a normal distribution N(θ0, σ2), but in general has a density denoted by κ. Posterior concentration induced by the prior Under some assumptions, when σ → 0: the posterior looks more and more like the prior, the shift in posterior moments is in O(σ2). Our arXived proof suffers from an overdose of Taylor expansions. Pierre Jacob Derivative estimation 21/ 39
  • 27. Details Introduce a test function h such that |h(u)| < c|u|α for some c, α. We start by writing E {h (θ − θ0)| y} = ∫ h (σu) exp {ℓ (θ0 + σu) − ℓ(θ0)} κ (u) du ∫ exp {ℓ (θ0 + σu) − ℓ(θ0)} κ (u) du using u = (θ − θ0)/σ and then focus on the numerator ∫ h (σu) exp {ℓ (θ0 + σu) − ℓ(θ0)} κ (u) du since the denominator is a particular instance of this expression with h : u → 1. Pierre Jacob Derivative estimation 22/ 39
  • 28. Details For the numerator: ∫ h (σu) exp {ℓ (θ0 + σu) − ℓ(θ0)} κ (u) du we use a Taylor expansion of ℓ around θ0 and a Taylor expansion of exp around 0, and then take the integral with respect to κ. Notation: ℓ(k) (θ).u⊗k = ∑ 1≤i1,...,ik≤d ∂kℓ(θ) ∂θi1 . . . ∂θik ui1 . . . uik which in one dimension becomes ℓ(k) (θ).u⊗k = dkf (θ) dθk uk . Pierre Jacob Derivative estimation 23/ 39
  • 29. Details Main expansion: ∫ h(σu) exp {ℓ (θ0 + σu) − ℓ(θ0)} κ(u)du = ∫ h(σu)κ(u)du + σ ∫ h(σu)ℓ(1) (θ0).u κ(u)du + σ2 ∫ h(σu) { 1 2 ℓ(2) (θ0).u⊗2 + 1 2 (ℓ(1) (θ0).u)2 } κ(u)du + σ3 ∫ h(σu) { 1 3! (ℓ(1) (θ0).u)3 + 1 2 (ℓ(1) (θ0).u)(ℓ(2) (θ0).u⊗2 ) + 1 3! ℓ(3) (θ0).u⊗3 } κ(u)du + O(σ4+α ). In general, assumptions on the tails of the prior and the likelihood are used to control the remainder terms and to ensure there are O(σ4+α). Pierre Jacob Derivative estimation 24/ 39
  • 30. Details We cut the integral into two bits: ∫ h(σu) exp{ℓ(θ0 + σu) − ℓ(θ0)} κ(u) du = ∫_{σ|u|≤ρ} h(σu) exp{ℓ(θ0 + σu) − ℓ(θ0)} κ(u) du + ∫_{σ|u|>ρ} h(σu) exp{ℓ(θ0 + σu) − ℓ(θ0)} κ(u) du. The expansion stems from the first term, where σ|u| is small. The second term ends up in the remainder in O(σ^{4+α}) using the assumptions. Classic technique in Bayesian asymptotics theory. Pierre Jacob Derivative estimation 25/ 39
  • 31. Details To get the score from the expansion, choose h : u → u. To get the observed information matrix from the expansion, choose h : u → u², and surprisingly (?) further assume that κ is mesokurtic, i.e. ∫ u⁴ κ(u) du = 3 (∫ u² κ(u) du)² ⇒ choose a Gaussian prior to obtain the Hessian. Pierre Jacob Derivative estimation 26/ 39
  • 32. Outline 1 Context 2 General results and connections 3 Posterior concentration when the prior concentrates 4 Hidden Markov models Pierre Jacob Derivative estimation 27/ 39
  • 33. Hidden Markov models Figure : Graph representation of a general hidden Markov model (parameter θ, hidden states X0, . . . , XT, observations y1, . . . , yT). Pierre Jacob Derivative estimation 27/ 39
  • 34. Hidden Markov models Direct application of the previous results: 1. Prior distribution N(θ0, σ²) on the parameter θ. 2. The derivative approximations involve E[θ | Y] and Cov[θ | Y]. 3. Posterior moments for HMMs can be estimated by particle MCMC, SMC², ABC or your favourite method (a rough sketch follows below). Ionides et al. proposed another approach. Pierre Jacob Derivative estimation 28/ 39
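As a rough sketch of point 3, cruder than particle MCMC or SMC²: draw parameters from the N(θ0, σ²) prior, estimate each likelihood with a bootstrap particle filter, and form E[θ | Y] by self-normalized importance sampling; the score estimate is then σ⁻²(E[θ | Y] − θ0). The model below is the linear Gaussian one of the numerical section, the parameter is its autoregression coefficient, and all tuning values (M, N, σ²) are illustrative choices, not those of the talk.

```python
# Sketch: score estimate for one parameter of a linear Gaussian HMM, combining
# a N(theta0, sigma^2) prior, bootstrap particle filter likelihood estimates,
# and self-normalized importance sampling over the prior draws.
import numpy as np

rng = np.random.default_rng(2)

def simulate(T, rho, V, eta, W):
    # Generate y_{1:T} from: X0 ~ N(0,1), X_t = rho X_{t-1} + N(0,V), Y_t = eta X_t + N(0,W).
    x = np.empty(T + 1); y = np.empty(T)
    x[0] = rng.normal()
    for t in range(T):
        x[t + 1] = rho * x[t] + np.sqrt(V) * rng.normal()
        y[t] = eta * x[t + 1] + np.sqrt(W) * rng.normal()
    return y

def bootstrap_loglik(y, rho, V, eta, W, N=500):
    # Bootstrap particle filter estimate of log p(y_{1:T}; rho, V, eta, W).
    x = rng.normal(size=N)                              # particles for X0 ~ N(0,1)
    ll = 0.0
    for t in range(len(y)):
        x = rho * x + np.sqrt(V) * rng.normal(size=N)   # propagate
        logw = -0.5 * np.log(2 * np.pi * W) - 0.5 * (y[t] - eta * x)**2 / W
        m = logw.max()
        w = np.exp(logw - m)
        ll += m + np.log(w.mean())                      # log of p_hat(y_t | y_{1:t-1})
        x = x[rng.choice(N, size=N, p=w / w.sum())]     # multinomial resampling
    return ll

rho_true, V, eta, W = 0.9, 0.7, 0.9, 0.2
y = simulate(100, rho_true, V, eta, W)

theta0, sigma2, M = 0.85, 1e-3, 200
thetas = theta0 + np.sqrt(sigma2) * rng.normal(size=M)
loglik_hat = np.array([bootstrap_loglik(y, th, V, eta, W) for th in thetas])
w = np.exp(loglik_hat - loglik_hat.max()); w /= w.sum()
post_mean = np.sum(w * thetas)
score_est = (post_mean - theta0) / sigma2   # noisy estimate of d/drho log L(rho = theta0; y)
print(score_est)
```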
  • 35. Iterated Filtering Modification of the model: θ is time-varying. The associated loglikelihood is ¯ℓ(θ1:T) = log p(y1:T; θ1:T) = log ∫ Π_{t=1}^T g(yt | xt, θt) µ(dx1 | θ1) Π_{t=2}^T f(dxt | xt−1, θt). Introducing θ → (θ, θ, . . . , θ) := θ^[T] ∈ R^T, we have ¯ℓ(θ^[T]) = ℓ(θ) and the chain rule yields dℓ(θ)/dθ = Σ_{t=1}^T ∂¯ℓ(θ^[T])/∂θt. Pierre Jacob Derivative estimation 29/ 39
  • 36. Iterated Filtering Choice of prior on θ1:T: θ1 = θ0 + V1 with V1 ∼ τ⁻¹ κ{τ⁻¹(·)}, and θt+1 − θ0 = ρ(θt − θ0) + Vt+1 with Vt+1 ∼ σ⁻¹ κ{σ⁻¹(·)}. Choose σ² such that τ² = σ²/(1 − ρ²). Covariance of the prior on θ1:T: the stationary AR(1) covariance matrix Σ_T = τ² (ρ^|i−j|)_{1≤i,j≤T}. Pierre Jacob Derivative estimation 30/ 39
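A small numpy sketch of this prior covariance (with illustrative sizes), which also shows that Σ_T⁻¹ is tridiagonal; that is the "form of Σ_T⁻¹" exploited two slides below.

```python
# Sketch: build Sigma_T = tau^2 * rho^{|i-j|} and inspect its (tridiagonal) inverse.
import numpy as np

def ar1_covariance(T, tau2, rho):
    idx = np.arange(T)
    return tau2 * rho ** np.abs(idx[:, None] - idx[None, :])

T, tau2, rho = 6, 0.5, 0.8
Sigma = ar1_covariance(T, tau2, rho)
precision = np.linalg.inv(Sigma)
print(np.round(precision, 3))
# Tridiagonal: diagonal (1, 1+rho^2, ..., 1+rho^2, 1) / (tau2*(1-rho^2)),
# off-diagonal -rho / (tau2*(1-rho^2)), zeros elsewhere (up to rounding).
```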
  • 37. Iterated Filtering Applying the general results for this prior yields, with |x| = Σ_{t=1}^T |xt|: |∇¯ℓ(θ0^[T]) − Σ_T⁻¹ (E[θ1:T | Y] − θ0^[T])| ≤ Cτ². Moreover we have | Σ_{t=1}^T ∂¯ℓ(θ^[T])/∂θt − Σ_{t=1}^T {Σ_T⁻¹ (E[θ1:T | Y] − θ0^[T])}_t | ≤ Σ_{t=1}^T | ∂¯ℓ(θ^[T])/∂θt − {Σ_T⁻¹ (E[θ1:T | Y] − θ0^[T])}_t |, and dℓ(θ)/dθ = Σ_{t=1}^T ∂¯ℓ(θ^[T])/∂θt. Pierre Jacob Derivative estimation 31/ 39
  • 38. Iterated Filtering The estimator of the score is thus given by Σ_{t=1}^T {Σ_T⁻¹ (E[θ1:T | Y] − θ0^[T])}_t, which, given the form of Σ_T⁻¹, reduces to S_{τ,ρ,T}(θ0) = τ⁻²/(1 + ρ) [ (1 − ρ) Σ_{t=2}^{T−1} E(θt | Y) − {(1 − ρ) T + 2ρ} θ0 + E(θ1 | Y) + E(θT | Y) ]. Note that in the quantities E(θt | Y), Y = Y1:T is the complete dataset, thus those expectations are with respect to the smoothing distribution. Pierre Jacob Derivative estimation 32/ 39
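The reduced expression can be evaluated directly from estimated smoothing means, however those were obtained. A minimal sketch, with placeholder smoothing means and a brute-force check against the full Σ_T⁻¹ expression (Σ_T built as on the previous sketch):

```python
# Sketch: S_{tau,rho,T}(theta0) from smoothing means, plus a check that it matches
# sum_t { Sigma_T^{-1} (E[theta_{1:T}|Y] - theta0) }_t computed with the full matrix.
import numpy as np

def score_estimator(smoothing_means, theta0, tau2, rho):
    m = np.asarray(smoothing_means)
    T = len(m)
    return (1.0 / (tau2 * (1.0 + rho))) * (
        (1.0 - rho) * m[1:T - 1].sum()
        - ((1.0 - rho) * T + 2.0 * rho) * theta0
        + m[0] + m[-1]
    )

T, tau2, rho, theta0 = 10, 0.3, 0.6, 0.1
m = 0.1 + 0.05 * np.random.default_rng(0).normal(size=T)   # placeholder smoothing means
Sigma = tau2 * rho ** np.abs(np.subtract.outer(np.arange(T), np.arange(T)))
brute = np.sum(np.linalg.solve(Sigma, m - theta0))
print(score_estimator(m, theta0, tau2, rho), brute)        # the two numbers should agree
```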
  • 39. Iterated Filtering If ρ = 1, then the parameters follow a random walk: θ1 = θ0 + N(0, τ²) and θt+1 = θt + N(0, σ²). In this case Ionides et al. proposed the estimator S_{τ,σ,T} = τ⁻² (E(θT | Y) − θ0), as well as S^(bis)_{τ,σ,T} = Σ_{t=1}^T V_{P,t}⁻¹ (˜θF,t − ˜θF,t−1), with V_{P,t} = Cov[˜θt | y1:t−1] and ˜θF,t = E[θt | y1:t]. Those expressions only involve expectations with respect to filtering distributions. Pierre Jacob Derivative estimation 33/ 39
  • 40. Iterated Filtering If ρ = 0, then the parameters are i.i.d.: θ1 = θ0 + N(0, τ²) and θt+1 = θ0 + N(0, τ²). In this case the expression of the score estimator reduces to S_{τ,T} = τ⁻² Σ_{t=1}^T (E(θt | Y) − θ0), which involves smoothing distributions. There’s only one parameter, τ², to choose for the prior. However smoothing for general hidden Markov models is difficult, and typically resorts to “fixed-lag approximations”. Pierre Jacob Derivative estimation 34/ 39
  • 41. Iterated Smoothing Only for the case ρ = 0 are we able to obtain simple expressions for the observed information matrix. We propose the following estimator: I_{τ,T}(θ0) = −τ⁻⁴ { Σ_{s=1}^T Σ_{t=1}^T Cov(θs, θt | Y) − τ² T }, for which we can show that |I_{τ,T} − (−∇²ℓ(θ0))| ≤ Cτ². Pierre Jacob Derivative estimation 35/ 39
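A minimal sketch of this estimator for the ρ = 0 case, taking as input draws of θ1:T from the smoothing distribution; in practice these would come from a particle or MCMC smoother, and the sample-covariance step below is an illustrative choice.

```python
# Sketch: iterated-smoothing estimator of the observed information from draws of
# theta_{1:T} | Y (array of shape (n_samples, T)) under the i.i.d. N(theta0, tau^2) prior.
import numpy as np

def information_estimator(theta_samples, tau2):
    # Sum of all entries of the posterior covariance matrix of theta_{1:T}.
    n, T = theta_samples.shape
    C = np.cov(theta_samples, rowvar=False)        # (T, T) estimates of Cov(theta_s, theta_t | Y)
    return -(C.sum() - tau2 * T) / tau2**2

# Usage: information_estimator(draws, tau2), where draws are smoothing-distribution
# samples of theta_{1:T} obtained by your favourite smoother.
```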
  • 42. Numerical results Linear Gaussian state space model, where the ground truth is available through the Kalman filter: X0 ∼ N(0, 1), Xt = ρXt−1 + N(0, V), Yt = ηXt + N(0, W). Generate T = 100 observations and set ρ = 0.9, V = 0.7, η = 0.9 and W = 0.1, 0.2, 0.4, 0.9. 240 independent runs, matching the computational costs between methods in terms of number of calls to the transition kernel. Pierre Jacob Derivative estimation 36/ 39
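For this model the exact log-likelihood is available from a scalar Kalman filter; one simple way to obtain a ground-truth score is then to take accurate finite differences of this exact log-likelihood (a sketch, not necessarily the exact procedure behind the figures below).

```python
# Sketch: exact log-likelihood of the linear Gaussian model above via a scalar Kalman filter.
import numpy as np

def kalman_loglik(y, rho, V, eta, W):
    m, P, ll = 0.0, 1.0, 0.0                        # X0 ~ N(0, 1)
    for yt in y:
        m, P = rho * m, rho**2 * P + V              # predict X_t | y_{1:t-1}
        S = eta**2 * P + W                          # innovation variance
        e = yt - eta * m                            # innovation
        ll += -0.5 * (np.log(2 * np.pi * S) + e**2 / S)
        K = P * eta / S                             # Kalman gain
        m, P = m + K * e, (1.0 - K * eta) * P       # update X_t | y_{1:t}
    return ll

# Example (y simulated as in the particle filter sketch above, h a small step):
# h = 1e-5
# score_rho = (kalman_loglik(y, 0.9 + h, 0.7, 0.9, 0.2)
#              - kalman_loglik(y, 0.9 - h, 0.7, 0.9, 0.2)) / (2 * h)
```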
  • 43. Numerical results Figure : 240 runs for Iterated Smoothing and Finite Difference. Panels: Finite Difference and Iterated Smoothing; x-axis: tuning parameter (h, resp. tau, between 10 and 10000); y-axis: RMSE; one curve per parameter 1–4. Pierre Jacob Derivative estimation 37/ 39
  • 44. Numerical results Figure : 240 runs for Iterated Smoothing and Iterated Filtering. Panels: Iterated Smoothing, Iterated Filtering 1 and Iterated Filtering 2; x-axis: tau (between 10 and 10000); y-axis: RMSE; one curve per parameter 1–4. Pierre Jacob Derivative estimation 38/ 39
  • 45. Bibliography Main references: Inference for nonlinear dynamical systems, Ionides, Bretó, King, PNAS, 2006. Iterated filtering, Ionides, Bhadra, Atchadé, King, Annals of Statistics, 2011. Efficient iterated filtering, Lindström, Ionides, Frydendall, Madsen, 16th IFAC Symposium on System Identification. Derivative-Free Estimation of the Score Vector and Observed Information Matrix, Doucet, Jacob, Rubenthaler, 2013 (on arXiv). Pierre Jacob Derivative estimation 39/ 39