1. Tucker tensor analysis of Matérn functions in spatial statistics
Alexander Litvinenko
(joint work with Boris Khoromskij, Venera Khoromskaia, Hermann G. Matthies, and David Keyes)
SIAM UQ 2018, Orange County, CA, USA
Bayesian Computational Statistics & Modeling, KAUST
https://bayescomp.kaust.edu.sa/
Stochastic Numerics Group, KAUST
http://sri-uq.kaust.edu.sa/
Extreme Computing Research Center, KAUST
https://ecrc.kaust.edu.sa/
2. The structure of the talk
Low-rank Tucker tensor methods in spatial statistics
1. Motivation: improve statistical models
2. Motivation: disadvantages of matrices
3. Tools: Tucker tensor format
4. Tensor approximation of the Matérn covariance function via FFT
5. Typical statistical operations in Tucker tensor format
6. Numerical experiments
3. Motivation: Matérn fields (Whittle, 1963)
Taken from D. Simpson (see also Finn Lindgren, Haavard Rue, David Bolin,...)
Theorem
The Matérn covariance function
c(x, y) = 1/(Γ(ν + d/2) (4π)^{d/2} κ^{2ν} 2^{ν−1}) · (κ‖x − y‖)^ν K_ν(κ‖x − y‖)   (1)
is the Green's function of the differential operator
L_ν^2 = (κ^2 − ∆)^{ν+d/2}.   (2)
BUT! The d-dimensional Laplacian over a uniform tensor grid is
∆_d = A ⊗ I_N ⊗ ··· ⊗ I_N + I_N ⊗ A ⊗ ··· ⊗ I_N + ··· + I_N ⊗ I_N ⊗ ··· ⊗ A ∈ R^{I^{⊗d} × I^{⊗d}},   (3)
with A = ∆_1 = tridiag{−1, 2, −1} ∈ R^{N×N}, and I_N the N × N identity.
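As a sanity check of the Kronecker-sum structure (3), a small NumPy sketch (my own illustration; the helper names are hypothetical) builds ∆_d and verifies that its eigenvalues are sums of the 1D eigenvalues:

```python
import numpy as np

def laplacian_1d(N):
    """1D discrete Laplacian A = tridiag{-1, 2, -1} of size N x N."""
    return 2.0 * np.eye(N) - np.eye(N, k=1) - np.eye(N, k=-1)

def laplacian_kron_sum(N, d):
    """d-dimensional Laplacian on a uniform tensor grid as a Kronecker sum:
    Delta_d = sum over k of I x ... x A (k-th slot) x ... x I, an N^d x N^d matrix."""
    A, I = laplacian_1d(N), np.eye(N)
    total = np.zeros((N**d, N**d))
    for k in range(d):
        factors = [A if j == k else I for j in range(d)]
        term = factors[0]
        for f in factors[1:]:
            term = np.kron(term, f)
        total += term
    return total

# The eigenvalues of the Kronecker sum are all sums of 1D eigenvalues.
N, d = 4, 2
L2 = laplacian_kron_sum(N, d)
lam1 = np.linalg.eigvalsh(laplacian_1d(N))
lam2 = np.sort([a + b for a in lam1 for b in lam1])
assert np.allclose(np.sort(np.linalg.eigvalsh(L2)), lam2)
```

The full N^d × N^d matrix is built here only for testing; in practice one stores just A and I_N and works factor-wise.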
4. Gaussian field and Green function
x(s) is a continuously indexed Gaussian field (GF) if all finite collections {x(s_i)} are jointly Gaussian distributed. In most cases, the GF is specified by a mean function µ(·) and a covariance function C(·, ·), so the mean vector is µ = (µ(s_i)) and the covariance matrix is Σ = (C(s_i, s_j)).
5. Gaussian field and Green function
A Gaussian field x(u), u ∈ R^d, with the Matérn covariance is a solution to the linear fractional SPDE
(κ^2 − ∆)^{ν+d/2} x(u) = W(u), κ > 0, ν > 0.   (4)
W(u) is spatial Gaussian white noise with unit variance.
For all x, y ∈ Ω the Green function G(x, y) is the solution of LG(·, y) = δ_y with b.c. G(·, y)|_Γ = 0, where δ_y is the Dirac distribution at y ∈ Ω. The Green function is the kernel of the inverse L^{−1}, i.e.,
u(x) = ∫_Ω G(x, y) f(y) dy.   (5)
For L = −∆, G(x, y) is analytic in Ω.
6. Five tasks in statistics to solve
Task 1. Approximate a covariance function in a low-rank tensor format:
‖c(x, y) − Σ_{i=1}^r Π_{µ=1}^d c_{iµ}(x_µ, y_µ)‖ ≤ ε, for some given ε > 0.
Task 2. Compute C^{1/2} to generate random fields.
Task 3. Compute the kriging estimate ŝ = C_{sz} C_{zz}^{−1} z.
7. Five tasks in statistics to solve
Task 4. Geostatistical design:
φ_A = N^{−1} trace{C_{ss|y}} and φ_C = z^T C_{ss|y} z,
where C_{ss|y} := C_{ss} − C_{sy} C_{yy}^{−1} C_{ys}.
Task 5. Compute the joint Gaussian log-likelihood function:
L(θ) = −(N/2) log(2π) − (1/2) log det{C(θ)} − (1/2) z^T C(θ)^{−1} z.
9. Low-rank tensor methods for spatial statistics
[Figure: Tucker decomposition of a third-order tensor A ∈ R^{n_1×n_2×n_3} into a core tensor B ∈ R^{r_1×r_2×r_3} and factor matrices V^{(1)}, V^{(2)}, V^{(3)}.]
See more in
[A. Litvinenko, D. Keyes, V. Khoromskaia, B. N. Khoromskij, H. G. Matthies, Tucker tensor analysis of Matérn functions in spatial statistics, preprint arXiv:1711.06874, 2017]. Accepted for CMAM, March 9, 2018.
10. Two ways to find a low-rank tensor approximation of the Matérn covariance
Task 1. Approximate the Matérn covariance.
11. Two ways to find a low-rank tensor approximation of the Matérn covariance
We assume that U(ξ) = F_d(C) is known analytically and has a low-rank tensor approximation U = Σ_{i=1}^r ⊗_{ν=1}^d u_{νi}. Then
F_d^{−1}(U) = (⊗_{ν=1}^d F_ν^{−1}) (Σ_{i=1}^r ⊗_{ν=1}^d u_{νi}) = Σ_{i=1}^r ⊗_{ν=1}^d F_ν^{−1}(u_{νi}) = Σ_{i=1}^r ⊗_{ν=1}^d ũ_{νi} =: C,
where the Matérn spectral density is
U(ξ) := F_d(C^{(r)}) = β (1 + |ξ|^2/(2ν))^{−ν−d/2}.   (8)
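The key point above, that F_d^{−1} acts factor-wise on a separable (here rank-1) tensor, can be checked directly with NumPy's FFT (my own illustration, not code from the talk):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 16
# A rank-1 (separable) 3D tensor U = u1 x u2 x u3.
u1, u2, u3 = (rng.standard_normal(n) for _ in range(3))
U = np.einsum('i,j,k->ijk', u1, u2, u3)

# Inverse 3D FFT of a separable tensor = outer product of 1D inverse FFTs,
# so a rank-r tensor costs O(r * d * n log n) instead of O(n^d log n^d).
C_full = np.fft.ifftn(U)
C_fact = np.einsum('i,j,k->ijk', np.fft.ifft(u1), np.fft.ifft(u2), np.fft.ifft(u3))
assert np.allclose(C_full, C_fact)
```

For a rank-r sum one simply transforms each factor vector u_{νi} separately and re-sums, exactly as in the displayed chain of equalities.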
12. Trace, diagonal, and determinant of C
Let C ≈ C̃ = Σ_{i=1}^r ⊗_{µ=1}^d C_{iµ}. Then
diag(C̃) = diag(Σ_{i=1}^r ⊗_{µ=1}^d C_{iµ}) = Σ_{i=1}^r ⊗_{µ=1}^d diag(C_{iµ}),   (9)
trace(C̃) = trace(Σ_{i=1}^r ⊗_{µ=1}^d C_{iµ}) = Σ_{i=1}^r Π_{µ=1}^d trace(C_{iµ}),   (10)
and for the determinant it holds only for r = 1:
log det(C_1 ⊗ C_2 ⊗ C_3) = n_2 n_3 log det C_1 + n_1 n_3 log det C_2 + n_1 n_2 log det C_3,   (11)
assuming C_i ∈ R^{n_i × n_i}.
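Identities (9)-(11) are cheap to verify numerically for a single Kronecker term (r = 1); a NumPy sketch (my own illustration, with a hypothetical helper spd for random SPD factors):

```python
import numpy as np

rng = np.random.default_rng(0)
n1, n2, n3 = 3, 4, 5

def spd(n):
    """Random symmetric positive definite factor (so log-determinants exist)."""
    M = rng.standard_normal((n, n))
    return M @ M.T + n * np.eye(n)

C1, C2, C3 = spd(n1), spd(n2), spd(n3)
C = np.kron(np.kron(C1, C2), C3)

# (9)/(10) for one Kronecker term: diagonal and trace factorize.
assert np.allclose(np.diag(C),
                   np.kron(np.kron(np.diag(C1), np.diag(C2)), np.diag(C3)))
assert np.isclose(np.trace(C), np.trace(C1) * np.trace(C2) * np.trace(C3))

# (11): log det(C1 x C2 x C3) = n2*n3*logdet C1 + n1*n3*logdet C2 + n1*n2*logdet C3.
ld = lambda M: np.linalg.slogdet(M)[1]
assert np.isclose(ld(C), n2 * n3 * ld(C1) + n1 * n3 * ld(C2) + n1 * n2 * ld(C3))
```

For r > 1, (9) and (10) follow by linearity, while the determinant does not factorize, which is why (11) is stated only for r = 1.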
13. Discretization and basis
n × n × n Cartesian grid Ω_n with mesh size h = 2b/n (n is even) in Ω = [−b, b]^3.
For the 3-tuple index i = (i_1, i_2, i_3) ∈ I, I = I_1 × I_2 × I_3, with i_ℓ ∈ I_ℓ = {1, ..., n}.
The kernel q(‖x‖) is discretized by its projection onto the basis set {ψ_i}:
Q := [q_i] ∈ R^{n×n×n}, q_i = ∫_{R^3} ψ_i(x) q(‖x‖) dx.   (12)
The low-rank CP decomposition of Q is based on applying exponentially convergent sinc quadratures to the integral representation of the function q(p), p ∈ R, in the form
q(p) = ∫_R a_1(t) e^{−p^2 a_2(t)} dt,
specified by the weights a_1(t), a_2(t) > 0.
14. Existence of the canonical low-rank tensor approximation
Scheme of the proof of the existence of a canonical low-rank tensor approximation¹:
It can be easier to apply the Laplace transform to the Fourier transform of a Matérn covariance matrix than to the Matérn covariance itself. To approximate the resulting Laplace integral we apply the sinc quadrature (the sinc method provides a low-rank canonical representation and is also used for proofs and rank estimates).
¹ A. Litvinenko, et al., Tucker tensor analysis of Matérn functions in spatial statistics, arXiv:1711.06874, 2017
15. Sinc approximation
Let (t_k) be quadrature points with weights (a_k): t_k = k h_M and a_k = a(t_k) h_M, where h_M = C_0 log(M)/M, C_0 > 0. Then
q(‖x‖) = ∫_{R_+} a(t) e^{−t^2 ‖x‖^2} dt ≈ Σ_{k=−M}^M a_k e^{−t_k^2 ‖x‖^2} = Σ_{k=−M}^M a_k Π_{ℓ=1}^3 e^{−t_k^2 x_ℓ^2},
providing an exponential convergence rate in M:
| q(‖x‖) − Σ_{k=−M}^M a_k e^{−t_k^2 ‖x‖^2} | ≤ C e^{−β√M}, with some C, β > 0.
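To make the idea concrete, a small NumPy sketch approximates the Slater function e^{−p} by a sinc (trapezoid) quadrature of a Gaussian integral (my own illustration: the helper name slater_sinc, the substitution t = e^u, and the fixed step h are my choices, not the talk's h_M = C_0 log(M)/M rule):

```python
import numpy as np

def slater_sinc(p, M=80, h=0.25):
    """Sinc/trapezoid quadrature for exp(-p), p > 0, using the identity
    exp(-p) = (2/sqrt(pi)) * int_R exp(-e^{2u} - (p^2/4) e^{-2u}) e^u du.
    The result is a (2M+1)-term sum of Gaussians in p, which becomes a
    separable (CP) sum when p = ||x|| on a tensor grid."""
    u = np.arange(-M, M + 1) * h
    t = np.exp(u)                       # quadrature points t_k > 0
    a = (2.0 / np.sqrt(np.pi)) * h * t  # positive quadrature weights a_k
    return np.sum(a * np.exp(-t**2 - (p**2 / 4.0) / t**2))

# Exponential accuracy with a modest number of terms:
for p in [0.5, 1.0, 2.0]:
    assert abs(slater_sinc(p) - np.exp(-p)) < 1e-6
```

Each quadrature term e^{−t_k^2 p^2} factorizes over the coordinates exactly as in the displayed product over ℓ = 1, 2, 3.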
16. Sinc approximation
q_i ≈ Σ_{k=−M}^M a_k ∫_{R^3} ψ_i(x) e^{−t_k^2 ‖x‖^2} dx = Σ_{k=−M}^M a_k Π_{ℓ=1}^3 ∫_R ψ^{(ℓ)}_{i_ℓ}(x_ℓ) e^{−t_k^2 x_ℓ^2} dx_ℓ.
Recalling that a_k > 0, we define the vectors
q_k^{(ℓ)} = a_k^{1/3} [ b^{(ℓ)}_{i_ℓ}(t_k) ]_{i_ℓ=1}^n ∈ R^n, with b^{(ℓ)}_{i_ℓ}(t_k) = ∫_R ψ^{(ℓ)}_{i_ℓ}(x_ℓ) e^{−t_k^2 x_ℓ^2} dx_ℓ.
Then the third-order tensor Q can be approximated by the R-term (R = 2M + 1) canonical representation
Q ≈ Q_R = Σ_{k=−M}^M a_k ⊗_{ℓ=1}^3 b^{(ℓ)}(t_k) = Σ_{k=−M}^M q_k^{(1)} ⊗ q_k^{(2)} ⊗ q_k^{(3)} ∈ R^{n×n×n},
where q_k^{(ℓ)} ∈ R^n. Given a threshold ε > 0, M can be chosen as the minimal number such that in the max-norm ‖Q − Q_R‖ ≤ ε ‖Q‖.
17. Numerics: relative error vs. Tucker rank
C(x, y) = e^{−‖x−y‖}, x, y ∈ R^3 (denoted after discretization by Q).
[Figure: relative error ‖Q − Q^{(r)}‖/‖Q‖ vs. Tucker rank r for N = 65^3, 129^3, 257^3, 513^3, where Q^{(r)} is the tensor reconstructed from the Tucker rank-r decomposition of Q.]
18. Convergence w.r.t. Tucker rank
Fast exponential convergence of the tensor approximation in the Tucker rank. Let C(x) = e^{−‖x‖^p} be discretized on an n_1 × n_2 × n_3 3D Cartesian grid with n_ℓ = 100, ℓ = 1, 2, 3.
[Figure: convergence in the Frobenius error w.r.t. the Tucker rank for the function C(x) = e^{−‖x‖^p} with p = 0.1, 0.2, 1.9, 2.0 (left); decay of singular values for e^{−r}, (1 + r)e^{−r}, and (1 + r + r^2)e^{−r} (right).]
19. Convergence w.r.t. Tucker rank
f_{α,ν}(ρ) := const/(α^2 + ρ^2)^{ν+d/2}.
The Tucker decomposition rank depends strongly on the parameter α and only weakly on the parameter ν.
[Figure: convergence w.r.t. Tucker rank of the 3D spectral density of the Matérn covariance with α = 0.1 (left) and α = 100 (right), for ν = 0.1, 0.2, 0.4, 0.8.]
20. 3D spectral densities
The shape of the 3D spectral density of the Matérn covariance with α = 0.1 (left) and α = 100 (right).
21. Tensor approximation of the Gaussian covariance
Let cov(x, y) = e^{−|x−y|^2} be the Gaussian covariance function, where x = (x_1, ..., x_d), y = (y_1, ..., y_d) ∈ [a, b]^d. Then
cov(x, y) = e^{−|x_1−y_1|^2} ··· e^{−|x_d−y_d|^2}, or C = C_1 ⊗ ··· ⊗ C_d.
Lemma: If the d Cholesky decompositions exist, i.e., C_i = L_i L_i^T, i = 1, ..., d, then
C_1 ⊗ ··· ⊗ C_d = (L_1 L_1^T) ⊗ ··· ⊗ (L_d L_d^T)   (13)
= (L_1 ⊗ ··· ⊗ L_d) · (L_1^T ⊗ ··· ⊗ L_d^T).   (14)
Lemma: If the inverse matrices C_i^{−1}, i = 1, ..., d, exist, then
(C_1 ⊗ ··· ⊗ C_d)^{−1} = C_1^{−1} ⊗ ··· ⊗ C_d^{−1}.   (15)
The cost drops from O(N log N), N = n^d, to O(d n log n).
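Both lemmas can be verified directly with NumPy for d = 2 (my own illustration; the 1D Gaussian covariance factors carry a small diagonal nugget purely for numerical stability):

```python
import numpy as np

def gaussian_cov_1d(n):
    """1D factor C_i of the separable Gaussian covariance exp(-|x-y|^2),
    with a small nugget on the diagonal for numerical stability."""
    x = np.linspace(0.0, 1.0, n)
    return np.exp(-(x[:, None] - x[None, :])**2) + 0.1 * np.eye(n)

C1, C2 = gaussian_cov_1d(5), gaussian_cov_1d(6)
C = np.kron(C1, C2)

# (13)-(14): the Cholesky factor of a Kronecker product is the
# Kronecker product of the small Cholesky factors.
L1, L2 = np.linalg.cholesky(C1), np.linalg.cholesky(C2)
L = np.kron(L1, L2)
assert np.allclose(L @ L.T, C)

# (15): the inverse factorizes the same way.
assert np.allclose(np.linalg.inv(C),
                   np.kron(np.linalg.inv(C1), np.linalg.inv(C2)))
```

In practice one never forms C or L explicitly; all operations are carried out on the small factors C_i and L_i.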
22. Likelihood in d dimensions
Example: N = 6000^3. Using MATLAB on a MacBook Pro with 16 GB RAM, the time required to set up the matrices C_1, C_2, and C_3 is 11 seconds; it takes 4 seconds to compute L_1, L_2, and L_3. The H-matrix approximation of C_i and its H-Cholesky factor L_i for n = 2·10^6 costs 2 minutes. Hence, assuming C = C_1 ⊗ ··· ⊗ C_d, we approximate C for N = (2·10^6)^d in 2d minutes.
L ≈ L̃ = −(Π_{ν=1}^d n_ν / 2) log(2π) − (1/2) Σ_{j=1}^d log det(C_j) Π_{i=1, i≠j}^d n_i − (1/2) Σ_{i=1}^r Σ_{j=1}^r Π_{ν=1}^d (u_{i,ν}^T u_{j,ν}).
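For the rank-1 case (r = 1, d = 2) the Kronecker evaluation of the Gaussian log-likelihood can be checked against the dense formula; a NumPy sketch (my own illustration, with a hypothetical helper spd for random SPD factors):

```python
import numpy as np

rng = np.random.default_rng(3)

def spd(n):
    """Random symmetric positive definite covariance factor."""
    M = rng.standard_normal((n, n))
    return M @ M.T + n * np.eye(n)

n1, n2 = 4, 5
C1, C2 = spd(n1), spd(n2)
C = np.kron(C1, C2)
N = n1 * n2
z = rng.standard_normal(N)

# Dense reference: L = -N/2 log(2pi) - 1/2 logdet C - 1/2 z^T C^{-1} z.
ld = lambda M: np.linalg.slogdet(M)[1]
L_dense = (-0.5 * N * np.log(2 * np.pi) - 0.5 * ld(C)
           - 0.5 * z @ np.linalg.solve(C, z))

# Kronecker evaluation: only the small factors are ever factorized.
logdetC = n2 * ld(C1) + n1 * ld(C2)
Z = z.reshape(n1, n2)            # z = vec(Z) in NumPy's row-major convention
W = np.linalg.solve(C1, Z)       # apply C1^{-1} on the left
W = np.linalg.solve(C2.T, W.T).T # apply C2^{-1} on the right (C2 symmetric)
quad = np.sum(Z * W)             # z^T C^{-1} z
L_kron = -0.5 * N * np.log(2 * np.pi) - 0.5 * logdetC - 0.5 * quad
assert np.isclose(L_dense, L_kron)
```

The Kronecker path never forms the N × N matrix C, which is what makes N = 6000^3 feasible.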
23. Conclusion
Today we discussed:
Basic functions and operators used in spatial statistics may be represented in rank-structured tensor formats, and the error of this representation decays exponentially with respect to the tensor rank.
The Fourier transform of a Matérn function can be easier to approximate than the Matérn function itself.
A proof that the Matérn covariance has a low-rank CP representation (via the Laplace transform and sinc quadrature).
The dependence of the tensor ranks on the parameters of the Matérn covariance.
Five typical statistical tasks in the low-rank CP tensor format.
24. Tensor software
I. Oseledets et al., TT-Toolbox (MATLAB)
D. Kressner, C. Tobler, Hierarchical Tucker Toolbox (MATLAB)
M. Espig et al., Tensor Calculus library (in C)
N. Vervliet, O. Debals, L. Sorber, M. Van Barel, L. De Lathauwer, Tensorlab
25. Literature
1. A. Litvinenko, D. Keyes, V. Khoromskaia, B.N. Khoromskij, H.G. Matthies, Tucker tensor analysis of Matérn functions in spatial statistics, preprint arXiv:1711.06874, 2017
2. A. Litvinenko, HLIBCov: Parallel hierarchical matrix approximation of large covariance matrices and likelihoods with applications in parameter identification, preprint arXiv:1709.08625, 2017
3. A. Litvinenko, Y. Sun, M.G. Genton, D. Keyes, Likelihood approximation with hierarchical matrices for large spatial datasets, preprint arXiv:1709.04419, 2017
4. B.N. Khoromskij, A. Litvinenko, H.G. Matthies, Application of hierarchical matrices for computing the Karhunen-Loève expansion, Computing 84 (1-2), 49-67, 2009
26. Literature
6. H.G. Matthies, E. Zander, B.V. Rosic, A. Litvinenko, Parameter estimation via conditional expectation: a Bayesian inversion, Advanced Modeling and Simulation in Engineering Sciences 3 (1), 24, 2016
7. H.G. Matthies, A. Litvinenko, B.V. Rosic, E. Zander, Bayesian parameter estimation via filtering and functional approximations, preprint arXiv:1611.09293, 2016
8. H.G. Matthies, E. Zander, O. Pajonk, B.V. Rosic, A. Litvinenko, Inverse problems in a Bayesian setting, Computational Methods for Solids and Fluids: Multiscale Analysis, Probability Aspects and Model Reduction, ed. A. Ibrahimbegovic, ISSN 1871-3033, pp. 245-286, 2016
9. A. Litvinenko, Application of hierarchical matrices for solving multiscale problems, Dissertation, Leipzig University, Germany, http://www.wire.tu-bs.de/mitarbeiter/litvinen/diss.pdf, 2006
10. W. Nowak, A. Litvinenko, Kriging and spatial design accelerated by orders of magnitude: combining low-rank covariance approximations with FFT-techniques, Mathematical Geosciences 45 (4), 411-435, 2013
27. Acknowledgement
1. KAUST Bayesian Computational Statistics and Modeling group (Prof. H. Rue)
2. KAUST Stochastic Numerics group (Prof. R. Tempone)
3. KAUST Extreme Computing Research Center (Prof. D. Keyes)