Maximum likelihood estimation of
regularisation parameters in inverse
problems: an empirical Bayesian approach.
V. De Bortoli
joint work with: A.F. Vidal, M. Pereyra, A. Durmus
January 28, 2021
Oxford University
Outline
1 Bayesian inference in imaging inverse problems
2 Sampling from high-dimensional models
3 Empirical Bayes estimation
4 Conclusion
Forward imaging problem
True scene → Imaging device → Observed image
Inverse imaging problem
True scene → Imaging device → Observed image → Restored image
General setting
How to recover an unknown image x ∈ R^d?
We measure y, related to x by some mathematical model.
For example, in many imaging problems

  y = A(x) + w,

for some operator A : R^d → R^d (not necessarily linear) that might be poorly conditioned or rank deficient, and an unknown perturbation or “noise” w.
The recovery of x from y is often ill-posed or ill-conditioned, so we regularise the problem to make it well posed.
Bayesian basics
Probabilistic framework to provide estimates of the recovered image x and related quantities (uncertainty, high posterior density intervals, ...).
Adopting a subjective probability approach, we propose a
prior on x, denoted p(x) (more on the choice of the prior later).
To derive inferences about x from y we postulate a joint
statistical model p(x, y); typically specified via the decomposition
p(x, y) = p(y|x)p(x).
Using this decomposition, we then compute quantities related to
p(x|y) using Bayes’ rule.
Likelihood vs. prior information
The decomposition p(x, y) = p(y|x)p(x) has two key ingredients:
The likelihood function: the conditional distribution p(y|x) that
models the data observation process (forward model).
The prior function: the marginal distribution p(x) that models
our knowledge about x “before observing y”.
In our examples, p(y|x) is Gaussian (with a positive semi-definite covariance matrix). This covers many imaging problems provided that the noise is Gaussian (deblurring, denoising, hyperspectral unmixing).
⚠ Many other choices for the noise model are possible (Poisson, binomial, ...), leading to other (and often more complicated) models.
Usually p(x) enforces desirable properties on the solution
(sparsity in a wavelet basis, smoothness) but new
machine-learning based approaches use data-based priors, see
Song and Ermon (2019).
Regularisation parameters and prior
Often the prior will be of the form

  p(x|θ) = exp[−⟨θ, ϕ(x)⟩] / ∫_{R^d} exp[−⟨θ, ϕ(x̃)⟩] dx̃ ,  (1)

where θ ∈ R^p is a regularisation parameter and ϕ : R^d → R^p (a sketch of a concrete ϕ follows after the list below).
θ controls the trade-off between the likelihood information and the prior information.
θ might be hard to select depending on the problem. There exist numerous approaches to tune θ:
generalised cross-validation Golub et al. (1979)
L-curve Lawson and Hanson (1995)
the discrepancy principle Morozov (2012)
residual whiteness measures Almeida and Figueiredo (2013)
Stein’s Unbiased Risk Estimator Deledalle et al. (2014)
hierarchical Bayes Pereyra et al. (2013)
empirical Bayes Carlin and Louis (2000)
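As a concrete illustration of (1), here is a minimal numpy sketch (not taken from the slides): with a scalar θ and ϕ(x) the anisotropic total variation of an image, (1) recovers the TV prior used in the deblurring experiment later.

```python
import numpy as np

def tv_anisotropic(x):
    # phi(x) for a scalar-theta TV prior: the sum of absolute
    # horizontal and vertical forward differences of the image x.
    return np.abs(np.diff(x, axis=0)).sum() + np.abs(np.diff(x, axis=1)).sum()

# p(x | theta) ∝ exp(-theta * tv_anisotropic(x)) is then of the form (1).
```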
Maximum-a-posteriori (MAP) estimation
A first estimator: Maximum A Posteriori estimation
x⋆ = arg max_{x∈R^d} p(x|y, θ) = arg max_{x∈R^d} {p(y|x) p(x|θ)} .  (2)
In the convex case, huge literature on the topic (Nesterov (2005);
Nemirovski (2004); Chambolle and Pock (2011))
Fast algorithms in many cases (non-differentiable priors: ISTA, FISTA Beck and Teboulle (2009), constrained composite problems Chaux et al. (2009), ADMM Boyd et al. (2011), ...); a minimal ISTA sketch is given at the end of this slide.
However, depending on the application, MAP estimation has some limitations, e.g.:
This is a point estimator. Can we trust our estimator? How to
perform model selection?
Is the mode really what we want? (what about the mean or the
median?)
Sensitivity w.r.t. θ.
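To make the ISTA reference above concrete, a minimal sketch (an illustration under assumed interfaces, not the authors' code) for MAP estimation with a Gaussian likelihood and an ℓ1 prior:

```python
import numpy as np

def soft_threshold(v, t):
    # Proximal operator of t * ||.||_1.
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def ista_map(y, A, theta, sigma2, gamma, n_iter=500):
    # MAP for p(x|y) ∝ exp(-||y - A x||^2 / (2 sigma2) - theta ||x||_1):
    # a gradient step on the smooth data-fidelity term, then the prox of
    # the l1 prior. Converges for gamma <= sigma2 / ||A||_2^2.
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y) / sigma2
        x = soft_threshold(x - gamma * grad, gamma * theta)
    return x
```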
Illustrative example: astronomical image reconstruction
Recover x ∈ R^d from the low-dimensional degraded observation

  y = MFx + w,

where F is the Fourier transform, M ∈ C^{m×d} is a measurement operator and w is Gaussian noise. We use the model

  p(x|y) ∝ exp(−‖y − MFx‖² / (2σ²) − θ‖Ψx‖₁) 1_{R^n₊}(x) .  (3)
Figure 1: Radio-interferometric image reconstruction of the W28 supernova (observation y and estimate x⋆). Image from Repetti et al. (2019).
Contribution
Our goal here is to:
Define efficient samplers in high-dimensional space (sampling
from pθ(x|y)).
Estimate the regularisation parameter using the Bayesian
framework.
Our main ingredients:
Markov chain sampling (functional autoregressive models),
Stochastic approximation with Markovian noise,
Empirical Bayes methodology.
Outline
1 Bayesian inference in imaging inverse problems
2 Sampling from high-dimensional models
3 Empirical Bayes estimation
4 Conclusion
Langevin diffusion
Sampling from π(x) ∝ e^{−U(x)}: a continuous-time solution

  dX_t = −∇U(X_t) dt + √2 dB_t ,

with (B_t)_{t≥0} a d-dimensional Brownian motion.
Existence of a unique strong solution for Lipschitz ∇U.
P_t(x, A) = P(X_t^x ∈ A) (semigroup of the diffusion).
Ergodicity under weak assumptions Roberts and Tweedie (1996).
Ergodicity of Langevin diffusion
If there exists R ≥ 0 such that for any x ∈ R^d with ‖x‖ ≥ R,

  ⟨∇U(x), x⟩ ≥ −a‖x‖² ,

then the diffusion is ergodic, i.e.

  lim_{t→+∞} ‖P_t(x, ·) − π‖_TV = 0 ,

where we recall that ‖µ − ν‖_TV = sup_{A∈B(R^d)} {µ(A) − ν(A)}.
Euler-Maruyama discretization
We cannot sample from general continuous-time processes.
The Euler–Maruyama scheme discretizes this continuous-time dynamics, yielding the Unadjusted Langevin Algorithm (ULA):

  X_{k+1} = X_k − γ∇U(X_k) + √(2γ) Z_{k+1} ,

with (Z_k)_{k∈N} i.i.d. Gaussian random variables with zero mean and identity covariance matrix.
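A minimal numpy sketch of ULA (an illustration, not the authors' implementation); grad_U, the step size γ and the chain length are the user's choices:

```python
import numpy as np

def ula(grad_U, x0, gamma, n_samples, rng=None):
    # Unadjusted Langevin Algorithm: Euler-Maruyama discretisation of
    # the Langevin diffusion targeting pi(x) ∝ exp(-U(x)).
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float).copy()
    samples = np.empty((n_samples,) + x.shape)
    for k in range(n_samples):
        x = x - gamma * grad_U(x) + np.sqrt(2.0 * gamma) * rng.standard_normal(x.shape)
        samples[k] = x
    return samples

# Example: standard Gaussian target, U(x) = ||x||^2 / 2, so grad_U(x) = x.
chain = ula(lambda x: x, np.zeros(2), gamma=0.1, n_samples=10_000)
```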
Ergodicity results
Similarly to the continuous-time process, define

  R_γ(x, A) = P( x − γ∇U(x) + √(2γ) Z ∈ A ) .  (4)

We define

  R_γ f(x) = ∫_{R^d} f(y) R_γ(x, dy) = E[f(X₁^x)] ,  (5)

and

  R_γ^n(x, A) = ∫_{R^d} ⋯ ∫_{R^d} R_γ(x, dx₂) R_γ(x₂, dx₃) ⋯ R_γ(x_n, A) .  (6)

(R_γ^n)_{n∈N} admits an invariant measure π_γ under a Lyapunov-type condition R_γ V(x) ≤ V(x) − γ + bγ 1_{x∈K}.
The chain is ergodic: lim_{n→+∞} ‖R_γ^n(x, ·) − π_γ‖_TV = 0, see (Douc et al., 2018, Theorem 10.2.13, Theorem 11.3.1).
We have lim_{γ→0} ‖π − π_γ‖_TV = 0, Durmus and Moulines (2017).
Quantitative convergence bounds
Can we get quantitative convergence rates?
Using Foster-Lyapunov and minorization conditions Douc et al. (2018), we can obtain geometric convergence for a suitable distance even without strong convexity.
We introduce the Wasserstein distance with cost c, given for any µ, ν ∈ P(R^d) by

  W_c(µ, ν) = inf_{π∈Λ(µ,ν)} ∫_{R^d×R^d} c(x, y) dπ(x, y) .  (7)

Λ(µ, ν) = set of couplings between µ and ν.
Example 1: c(x, y) = 1_{x≠y}, i.e. 1_{R^d∖{0}}(x − y) → total variation.
Example 2: c(x, y) = ‖x − y‖ → Wasserstein distance of order 1.
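For intuition on Example 2: in one dimension, W₁ between two empirical measures with equally many, equally weighted atoms reduces to the mean distance between sorted samples. A minimal sketch (an illustration, not from the slides):

```python
import numpy as np

def w1_empirical_1d(xs, ys):
    # Order-1 Wasserstein distance between two 1-D empirical measures
    # with the same number of uniformly weighted atoms: the L1 distance
    # between their quantile functions, i.e. between sorted samples.
    xs, ys = np.sort(np.asarray(xs)), np.sort(np.asarray(ys))
    return float(np.mean(np.abs(xs - ys)))
```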
Convergence of the EM discretization
Geometric ergodicity of ULA
Assume that

  ‖∇U(x) − ∇U(y)‖ ≤ L‖x − y‖ ,
  ⟨∇U(x) − ∇U(y), x − y⟩ ≥ m‖x − y‖²  for ‖x − y‖ ≥ R .

Then there exist γ̄ > 0, D_{γ̄,1}, D_{γ̄,2}, E_{γ̄} ≥ 0 and λ_{γ̄}, ρ_{γ̄} ∈ [0, 1) with λ_{γ̄} ≤ ρ_{γ̄}, such that for any γ ∈ (0, γ̄], x, y ∈ R^d and k ∈ N,

  W_c(δ_x R_γ^k, δ_y R_γ^k) ≤ λ_{γ̄}^{kγ/4} [D_{γ̄,1} c(x, y) + D_{γ̄,2} 1_{x≠y}] + E_{γ̄} ρ_{γ̄}^{kγ/4} 1_{x≠y} ,

where c(x, y) = 1_{x≠y}(1 + ‖x − y‖/R).
The first convergence rate characterizes the forgetting of the initial conditions.
The second convergence rate characterizes the effective convergence rate.
Independence w.r.t. the dimension d.
Geometric ergodicity w.r.t. ‖·‖_TV and W₁.
Non-differentiable case
What if U = f + g with g non-differentiable (but convex)? → Use the Moreau–Yosida envelope.
Different converging schemes:

  X_{k+1} = prox_g^γ( X_k − γ∇f(X_k) + √(2γ) Z_{k+1} ) ,  (8)
  X_{k+1} = prox_g^γ(X_k) − γ∇f(prox_g^γ(X_k)) + √(2γ) Z_{k+1} ,  (9)
  X_{k+1} = X_k − γ( ∇f(X_k) + (X_k − prox_g^γ(X_k))/γ ) + √(2γ) Z_{k+1} .  (10)

Note that (10), the Moreau–Yosida Unadjusted Langevin Algorithm (MYULA), is ULA applied to f + g^γ.
Geometric convergence under similar conditions as in the differentiable case (regularity + strong convexity at infinity).
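A minimal numpy sketch of scheme (10) (an illustration under assumed interfaces, not the authors' code); the Moreau–Yosida gradient of g with smoothing parameter λ is (x − prox_g^λ(x))/λ, and the slides take λ = γ:

```python
import numpy as np

def myula(grad_f, prox_g, x0, gamma, lam, n_samples, rng=None):
    # MYULA: ULA applied to f + g^lam, where g^lam is the Moreau-Yosida
    # envelope of the non-smooth convex term g, whose gradient is
    # (x - prox_g^lam(x)) / lam.
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x0, dtype=float).copy()
    out = np.empty((n_samples,) + x.shape)
    for k in range(n_samples):
        drift = grad_f(x) + (x - prox_g(x, lam)) / lam
        x = x - gamma * drift + np.sqrt(2.0 * gamma) * rng.standard_normal(x.shape)
        out[k] = x
    return out

# Example with g = ||.||_1: prox_g^lam is soft-thresholding.
prox_l1 = lambda x, lam: np.sign(x) * np.maximum(np.abs(x) - lam, 0.0)
chain = myula(lambda x: x, prox_l1, np.zeros(4), gamma=0.05, lam=0.05, n_samples=5_000)
```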
Outline
1 Bayesian inference in imaging inverse problems
2 Sampling from high-dimensional models
3 Empirical Bayes estimation
4 Conclusion
Regularisation parameter MLE
Back to the estimation of θ.
p(x|y, θ) ∝ p(y|x)p(x|θ) . (11)
In this talk we adopt an empirical Bayes approach and consider the Maximum Likelihood Estimator (MLE)
  θ⋆ = arg max_{θ∈Θ} p(y|θ) = arg max_{θ∈Θ} ∫_{R^d} p(y, x|θ) dx ,

where Θ is some convex compact set in R^p.
We solve it by using a stochastic gradient algorithm driven by
two proximal MCMC kernels.
Given θ⋆, we then compute

  x⋆ = arg min_{x∈R^d} {−log p(y|x) − log p(x|θ⋆)} ,  (12)

using efficient algorithms available in the optimization field.
Projected gradient algorithm
First idea to find the minimizers of θ ↦ −log p(y|θ): use a projected gradient descent algorithm

  θ_{n+1} = Π_Θ[θ_n + δ_n ∇_θ log p(y|θ_n)] ,  (13)

with (δ_n)_{n∈N} some sequence of stepsizes and Π_Θ the projection onto Θ.
If θ ↦ −log p(y|θ) is convex then this scheme converges towards θ⋆ (if it is unique).
Problem: ∇_θ log p(y|θ) is intractable.
Stochastic projected gradient algorithm
Remark that we have (Fisher's identity, with p(x|θ) ∝ exp[−⟨θ, ϕ(x)⟩])

  ∇_θ log p(y|θ) = E_{x|y,θ}[∇_θ log p(x, y|θ)] = −E_{x|y,θ}[ϕ(x) + ∇_θ log Z(θ)] ,

where Z(θ) is the normalizing constant Z(θ) = ∫_{R^d} exp[−⟨θ, ϕ(x)⟩] dx.
In addition, since ∇_θ log Z(θ) = −E_{x|θ}[ϕ(x)], we get that

  ∇_θ log p(y|θ) = E_{x|θ}[ϕ(x)] − E_{x|y,θ}[ϕ(x)] .  (14)

But, again, most of the time these expectations are intractable.
Similarities with energy-based models (EBM) for generative modelling (in this setting θ represents the parameters of a neural network).
Our algorithm
In the differentiable case: Stochastic Optimization with Unadjusted Langevin (SOUL) Algorithm.
Initialisation: X₀, U₀ ∈ R^d, θ₀ ∈ Θ, (δ_k)_{k∈N} = (δ₀(k + 1)^{−0.8})_{k∈N}.
for n = 0 to N
  (i) Markov chain update (MYULA) of X_{n+1} with target x ↦ p(x|y, θ_n)
  (ii) Markov chain update (MYULA) of U_{n+1} with target x ↦ p(x|θ_n)
  (iii) Stochastic gradient update

    θ_{n+1} = Π_Θ[θ_n + δ_n(ϕ(U_{n+1}) − ϕ(X_{n+1}))] .  (15)

end for
Output: the iterates (θ_n)_{n∈N}.
Our algorithm (explicit recursion)
Initialisation: X₀, U₀ ∈ R^d, θ₀ ∈ Θ, (δ_k)_{k∈N} = (δ₀(k + 1)^{−0.8})_{k∈N}.
for n = 0 to N
  (i) Markov chain update (MYULA) of X_{n+1}:

    X_{n+1} = (1 − γ_n/λ_n) X_n + γ_n ∇ log p(y|X_n, θ_n) + (γ_n/λ_n) prox^{λ_n}_{−log p(·|θ_n)}(X_n) + √(2γ_n) Z¹_{n+1} .  (16)

  (ii) Markov chain update (MYULA) of U_{n+1}:

    U_{n+1} = (1 − γ_n/λ_n) U_n + (γ_n/λ_n) prox^{λ_n}_{−log p(·|θ_n)}(U_n) + √(2γ_n) Z²_{n+1} .  (17)

  (iii) Stochastic gradient update:

    θ_{n+1} = Π_Θ[θ_n + δ_n(ϕ(U_{n+1}) − ϕ(X_{n+1}))] .  (18)

end for
Output: the iterates (θ_n)_{n∈N}.
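Putting steps (16)-(18) together, a minimal numpy sketch of SOUL (the interfaces grad_loglik, prox_prior, phi and proj_theta are assumed here; this is an illustration, not the authors' code):

```python
import numpy as np

def soul(grad_loglik, prox_prior, phi, proj_theta, x0, u0, theta0,
         gamma, lam, delta0, n_iter, rng=None):
    # Two MYULA chains drive a projected stochastic gradient on theta:
    # X_n targets p(x | y, theta_n), U_n targets the prior p(x | theta_n),
    # and phi(U) - phi(X) estimates the gradient (14).
    rng = np.random.default_rng() if rng is None else rng
    x, u, theta = (np.array(v, dtype=float) for v in (x0, u0, theta0))
    for n in range(n_iter):
        # (i) MYULA step on the posterior, eq. (16)
        x = ((1 - gamma / lam) * x + gamma * grad_loglik(x, theta)
             + (gamma / lam) * prox_prior(x, lam, theta)
             + np.sqrt(2 * gamma) * rng.standard_normal(x.shape))
        # (ii) MYULA step on the prior, eq. (17)
        u = ((1 - gamma / lam) * u + (gamma / lam) * prox_prior(u, lam, theta)
             + np.sqrt(2 * gamma) * rng.standard_normal(u.shape))
        # (iii) projected stochastic gradient step on theta, eq. (18)
        delta = delta0 * (n + 1) ** (-0.8)
        theta = proj_theta(theta + delta * (phi(u) - phi(x)))
    return theta

# proj_theta can be, e.g., a box projection: lambda t: np.clip(t, 1e-4, 1.0).
```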
Convergence results
Convergence of the averaged sequence
Assume that:
  θ ↦ −log p(y|θ) is convex with Lipschitz gradient;
  −log p(x|θ) ≥ η‖x‖ − c for any x ∈ R^d;
  ∑_{n∈N} δ_n = +∞, ∑_{n∈N} δ_n γ_n^{1/2} < +∞, ∑_{n∈N} δ_n² γ_n^{−2} < +∞.
Then, almost surely,

  ( ∑_{k=1}^n δ_k {−log p(y|θ_k)} ) / ( ∑_{k=1}^n δ_k ) − min_{θ∈Θ} {−log p(y|θ)} ≤ C / ( ∑_{k=1}^n δ_k ) .  (19)

Other conditions on log p can be considered (tail conditions).
A similar result holds in expectation with explicit bounds.
Possible extension to the non-convex setting (convergence of the averaged sequence associated with (‖∇f(θ_k)‖²)_{k∈N}).
Deblurring with Total-Variation Prior
             SNR=20dB           SNR=30dB           SNR=40dB
             MSE    Time (min)  MSE    Time (min)  MSE    Time (min)
Best         23.29     n/a      21.39     n/a      19.06     n/a
Emp. Bayes   23.50    0.86      21.46    0.85      19.24    0.85
Hier. Bayes  25.07    0.58      22.84    1.27      19.84    3.27
SUGAR        24.44    3.92      24.24    4.50      24.21    4.81
[Figure: original and degraded images with the restored estimates x⋆_EB, x⋆_HB, x⋆_DP and x⋆_SUG at SNR = 20, 30, 40 dB.]
[Figure: MSE(θ) against θ (from 10⁻⁴ to 1) for the flinstones image, marking the minimum-MSE θ and the values selected by Empirical Bayes, the Discrepancy Principle, Hierarchical Bayes and SUGAR.]
Denoising with Total Generalized Variation
We consider

  TGV²_θ(u) = inf_{r∈R^{2d}} { θ₁‖r‖_{1,2} + θ₂‖J(∆u − r)‖_{1,Frob} } ,

Chambolle and Lions (1997).
Figure 2: Goldhill image (Original-Degraded-Estimated MAP),
SNR=12dB.
Denoising with Total Generalized Variation
Evolution of θ through iterations starting from different initial values: θ_init = 0.1, θ_init = 10, θ_init = 40. [Plots of the θ iterates for each initialisation.]
Outline
1 Bayesian inference in imaging inverse problems
2 Sampling from high-dimensional models
3 Empirical Bayes estimation
4 Conclusion
Conclusion
The Bayesian framework provides a mathematical setting to
compute many statistics on p(x|y, θ).
In this presentation, we focus on the problem of selecting the
regularisation parameter.
Combining tools from Markov chain theory and stochastic approximation, we derive a scheme which provably converges towards the optimal regularisation parameter in an empirical Bayesian sense.
The algorithm works well in practice (even in cases not covered
by the theory (yet!)).
Perspectives
The inverse problems we consider are still quite simple/generic (total variation, ℓ₁ loss, ...). Can we extend our tools to cover more intricate and problem-specific priors (wavelet-based or composite priors)?
Can we use more advanced optimization schemes for
sampling/optimization? (dual averaging, mirror descent) Better
convergence guarantees?
Can we use data-based priors Song and Ermon (2019)?
Thank you for your attention!
Bibliography:
Mariana SC Almeida and Mário AT Figueiredo. Parameter estimation for blind
and non-blind deblurring using residual whiteness measures. IEEE
Transactions on Image Processing, 22(7):2751–2763, 2013.
Amir Beck and Marc Teboulle. A fast iterative shrinkage-thresholding algorithm for linear inverse problems. SIAM Journal on Imaging Sciences, 2(1):183–202, 2009.
Stephen Boyd, Neal Parikh, and Eric Chu. Distributed optimization and statistical
learning via the alternating direction method of multipliers. Now Publishers
Inc, 2011.
Bradley P. Carlin and Thomas A. Louis. Empirical Bayes: past, present and
future. J. Amer. Statist. Assoc., 95(452):1286–1289, 2000. ISSN 0162-1459.
doi: 10.2307/2669771. URL https://doi.org/10.2307/2669771.
Antonin Chambolle and Pierre-Louis Lions. Image recovery via total variation
minimization and related problems. Numerische Mathematik, 76(2):167–188,
1997.
Antonin Chambolle and Thomas Pock. A first-order primal-dual algorithm for convex problems with applications to imaging. Journal of Mathematical Imaging and Vision, 40(1):120–145, 2011.
Caroline Chaux, Jean-Christophe Pesquet, and Nelly Pustelnik. Nested iterative
algorithms for convex constrained image recovery problems. SIAM Journal on
Imaging Sciences, 2(2):730–762, 2009.
Charles-Alban Deledalle, Samuel Vaiter, Jalal Fadili, and Gabriel Peyré. Stein
Unbiased GrAdient estimator of the Risk (SUGAR) for multiple parameter
selection. SIAM Journal on Imaging Sciences, 7(4):2448–2487, 2014.
Randal Douc, Eric Moulines, Pierre Priouret, and Philippe Soulier. Markov
chains. Springer Series in Operations Research and Financial Engineering.
Springer, Cham, 2018. ISBN 978-3-319-97703-4; 978-3-319-97704-1.
A. Durmus and É. Moulines. Nonasymptotic convergence analysis for the
unadjusted Langevin algorithm. Ann. Appl. Probab., 27(3):1551–1587, 2017.
ISSN 1050-5164.
Gene H Golub, Michael Heath, and Grace Wahba. Generalized cross-validation as
a method for choosing a good ridge parameter. Technometrics, 21(2):215–223,
1979.
Charles L Lawson and Richard J Hanson. Solving least squares problems, volume 15. SIAM, 1995.
Vladimir Alekseevich Morozov. Methods for solving incorrectly posed problems. Springer Science & Business Media, 2012.
Arkadi Nemirovski. Prox-method with rate of convergence O(1/t) for variational inequalities with Lipschitz continuous monotone operators and smooth convex-concave saddle point problems. SIAM Journal on Optimization, 15(1):229–251, 2004.
Yu Nesterov. Smooth minimization of non-smooth functions. Mathematical Programming, 103(1):127–152, 2005.
Marcelo Pereyra, Nicolas Dobigeon, Hadj Batatia, and Jean-Yves Tourneret.
Estimating the granularity coefficient of a Potts-Markov random field within a
Markov chain Monte Carlo algorithm. IEEE Transactions on Image
Processing, 22(6):2385–2397, 2013.
Audrey Repetti, Marcelo Pereyra, and Yves Wiaux. Scalable Bayesian uncertainty quantification in imaging inverse problems via convex optimization. SIAM Journal on Imaging Sciences, 12(1):87–118, 2019.
G. O. Roberts and R. L. Tweedie. Exponential convergence of Langevin
distributions and their discrete approximations. Bernoulli, 2(4):341–363, 1996.
ISSN 1350-7265.
Yang Song and Stefano Ermon. Generative modeling by estimating gradients of
the data distribution. arXiv preprint arXiv:1907.05600, 2019.