AACIMP 2010 Summer School lecture by Leonidas Sakalauskas. "Applied Mathematics" stream. "Stochastic Programming and Applications" course. Part 8.
More info at http://summerschool.ssa.org.ua
1. Lecture 8
Stochastic Approximation and
Simulated Annealing
Leonidas Sakalauskas
Institute of Mathematics and Informatics
Vilnius, Lithuania <sakal@ktl.mii.lt>
EURO Working Group on Continuous Optimization
2. Content
Introduction.
Stochastic Approximation:
SPSA with Lipschitz perturbation operator;
SPSA with Uniform perturbation operator;
Standard Finite Difference Approximation
algorithm.
Simulated Annealing
Implementation and Applications
Wrap-Up and Conclusions
3. Introduction
In many practical problems of technical design,
some of the data may be subject to significant
uncertainty, which is captured by probabilistic-
statistical models.
Such problems can be viewed as constrained
stochastic programming tasks.
Stochastic Approximation can be considered as an
alternative to traditional optimization methods,
especially when the objective functions are
nondifferentiable or computed with noise.
4. Stochastic Approximation
The application of Stochastic Approximation to
optimization problems in which the objective
function is nondifferentiable or nonsmooth and
computed with noise is a topical theoretical and
practical problem.
The known Stochastic Approximation methods for
such problems use the idea of a stochastic
gradient together with certain step-length rules
that ensure convergence.
5. Formulation of the optimization problem
The optimization problem (minimization) is as follows:
$$f(x) \to \min_{x \in \mathbb{R}^n},$$
where $f : \mathbb{R}^n \to \mathbb{R}$ is a Lipschitz function bounded from below.
6. Formulation of the optimization problem
Let $\partial f(x)$ be the generalized gradient of this function.
Assume $X^*$ to be the set of stationary points and $F^*$ to be the set of corresponding function values:
$$X^* = \{ x \mid 0 \in \partial f(x) \},$$
$$F^* = \{ z \mid z = f(x),\; x \in X^* \}.$$
7. We consider a function smoothed by a
perturbation operator:
$$f(x, \sigma) = E\, f(x + \sigma \xi), \quad \xi \sim p(\cdot),$$
where $\sigma > 0$ is the value of the perturbation
parameter.
The functions smoothed by this operator are
twice continuously differentiable (Rubinstein &
Shapiro (1993), Bartkute & Sakalauskas
(2004)), which offers certain opportunities for
creating optimization algorithms.
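As an illustration, a minimal Python sketch of this smoothing by Monte-Carlo, assuming $\xi$ uniform in the unit ball (as in the SPSA variant on slide 11); the sample size and the test function are arbitrary:

```python
import numpy as np

def sample_unit_ball(n, rng):
    """Draw a vector uniformly distributed in the n-dimensional unit ball."""
    z = rng.standard_normal(n)
    z /= np.linalg.norm(z)            # uniform direction on the sphere
    r = rng.uniform() ** (1.0 / n)    # radius correction for uniformity in the ball
    return r * z

def smoothed_value(f, x, sigma, n_samples=1000, seed=0):
    """Monte-Carlo estimate of f(x, sigma) = E f(x + sigma * xi)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    vals = [f(x + sigma * sample_unit_ball(x.size, rng)) for _ in range(n_samples)]
    return float(np.mean(vals))

# example: smoothing |x1| + |x2| (nondifferentiable at 0)
print(smoothed_value(lambda x: np.abs(x).sum(), [0.0, 0.0], sigma=0.1))
```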
8. Advantages of SPSA
Recently, interesting research has been focused
on Simultaneous Perturbation Stochastic
Approximation (SPSA).
In SPSA algorithms it suffices to calculate the
values of the function at only one or a few points
to estimate the stochastic gradient, which
promises to reduce the numerical complexity of
optimization.
9. SA algorithms
1. SPSA with Lipschitz perturbation operator.
2. SPSA with Uniform perturbation operator.
3. Standard Finite Difference Approximation
algorithm.
10. General Stochastic Approximation scheme
$$x^{k+1} = x^k - \rho_k\, g^k, \quad k = 1, 2, \ldots,$$
where $g^k = g(x^k, \sigma_k, \xi^k)$ is the stochastic gradient,
with $\bar{g}(x, \sigma) = E\, g(x, \sigma, \xi)$ and $\bar{g}(x, \sigma) \to \partial f(x)$ as $\sigma \to 0$.
This scheme is the same for the different Stochastic
Approximation algorithms, which are distinguished only by
the approach to stochastic gradient estimation.
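A minimal Python sketch of this general scheme; the step rules $\rho_k = a/k$ and $\sigma_k = b/k$ are assumed here for illustration, and `grad_estimator` can be any of the estimators sketched after the following slides:

```python
import numpy as np

def stochastic_approximation(grad_estimator, x0, a=1.0, b=0.1, n_iter=1000, seed=0):
    """General scheme x_{k+1} = x_k - rho_k * g(x_k, sigma_k, xi_k),
    with assumed step rules rho_k = a/k and sigma_k = b/k."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    for k in range(1, n_iter + 1):
        rho_k, sigma_k = a / k, b / k
        x = x - rho_k * grad_estimator(x, sigma_k, rng)
    return x

# usage: stochastic_approximation(lambda x, s, rng: some_gradient(f, x, s, rng), x0)
```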
11. SPSA with Lipschitz perturbation operator
The gradient estimator of SPSA with the Lipschitz perturbation operator is
expressed as:
$$g(x, \sigma, \xi) = \frac{f(x + \sigma \xi) - f(x)}{\sigma}\,\xi,$$
where $\sigma$ is the value of the perturbation parameter and the
vector $\xi$ is uniformly distributed in the unit ball:
$$p(y) = \begin{cases} \dfrac{1}{V_n}, & \text{if } \|y\| \le 1, \\[4pt] 0, & \text{if } \|y\| > 1, \end{cases}$$
where $V_n$ is the volume of the n-dimensional ball (Bartkute & Sakalauskas
(2007)).
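A one-sample sketch of this estimator in Python, assuming the plain difference-quotient form reconstructed above (any dimension-dependent normalizing constant from Bartkute & Sakalauskas (2007) is omitted):

```python
import numpy as np

def spsa_lipschitz_gradient(f, x, sigma, rng):
    """g(x, sigma, xi) = ((f(x + sigma*xi) - f(x)) / sigma) * xi,
    with xi uniform in the unit ball."""
    n = x.size
    xi = rng.standard_normal(n)
    xi /= np.linalg.norm(xi)          # uniform direction
    xi *= rng.uniform() ** (1.0 / n)  # uniform radius in the ball
    return (f(x + sigma * xi) - f(x)) / sigma * xi
```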
12. SPSA with Uniform perturbation operator
The gradient estimator of SPSA with the Uniform perturbation operator is
expressed as:
$$g(x, \sigma, \xi) = \frac{f(x + \sigma \xi) - f(x - \sigma \xi)}{2 \sigma}\,\xi,$$
where $\sigma$ is the value of the perturbation parameter and
$\xi = (\xi_1, \xi_2, \ldots, \xi_n)$ is a vector of variables uniformly
distributed on the interval $[-1, 1]$ (Mikhalevitch et al
(1987)).
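A sketch of this estimator, assuming the symmetric-difference form reconstructed above:

```python
import numpy as np

def spsa_uniform_gradient(f, x, sigma, rng):
    """g(x, sigma, xi) = ((f(x + sigma*xi) - f(x - sigma*xi)) / (2*sigma)) * xi,
    with xi_i ~ U[-1, 1]; two function evaluations per sample."""
    xi = rng.uniform(-1.0, 1.0, size=x.size)
    return (f(x + sigma * xi) - f(x - sigma * xi)) / (2.0 * sigma) * xi
```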
13. Standard Finite Difference Approximation algorithm
The gradient estimator of the Standard Finite Difference Approximation
algorithm is expressed as:
$$g_i(x, \sigma) = \frac{f(x + \sigma \Delta_i) - f(x)}{\sigma}, \quad i = 1, \ldots, n,$$
where $\sigma$ is the value of the perturbation parameter and
$\Delta_i = (0, 0, \ldots, 1, \ldots, 0)$
is the vector with zero components except the i-th one,
which is equal to 1 (Mikhalevitch et al (1987)).
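A sketch of the finite-difference estimator; note that it needs n + 1 function evaluations per iteration, versus one or two for the SPSA estimators above:

```python
import numpy as np

def fd_gradient(f, x, sigma):
    """Standard finite-difference estimate g_i = (f(x + sigma*e_i) - f(x)) / sigma."""
    fx = f(x)
    g = np.empty(x.size)
    for i in range(x.size):
        e_i = np.zeros(x.size)
        e_i[i] = 1.0                      # i-th coordinate unit vector
        g[i] = (f(x + sigma * e_i) - fx) / sigma
    return g
```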
14. Rate of convergence
Let us consider that the function f(x) has a sharp
minimum at the point $x^*$, to which the algorithm
converges, with the step parameters chosen as
$$\rho_k = \frac{a}{k}, \quad a > 0, \qquad \sigma_k = \frac{b}{k^{\beta}}, \quad b > 0, \; 0 < \beta \le 1, \quad k = 1, 2, \ldots$$
Then, when $2aH > 1$,
$$E\,\|x^{k+1} - x^*_{k+1}\|^2 = \frac{A K^2 a^2}{(2aH - 1)\,k} + o\!\left(\frac{1}{k}\right),$$
where A > 0, H > 0, K > 0 are certain constants and $x^*_{k+1}$ is
the minimum point of the smoothed function.
15. Computer simulation
The proposed methods were tested with the following
functions:
$$f(x) = \sum_{k=1}^{n} a_k\,\lvert x_k - M \rvert,$$
where the $a_k$ are real numbers randomly and
uniformly generated in the interval $[\alpha, K]$,
$K > 0$.
Samples of T = 500 test functions were generated,
with $\alpha = 2$, $K = 5$.
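A possible generator for such test functions; the interval endpoints and the shift M are assumptions read off the slide, and n = 10 is arbitrary:

```python
import numpy as np

def make_test_function(n, low=2.0, high=5.0, M=0.0, seed=0):
    """One random piecewise-linear test function f(x) = sum_k a_k * |x_k - M|,
    a_k ~ U[low, high]; it has a sharp (nondifferentiable) minimum at x = (M,...,M)."""
    rng = np.random.default_rng(seed)
    a = rng.uniform(low, high, size=n)
    return lambda x: float(np.sum(a * np.abs(np.asarray(x) - M)))

# a sample of T test functions, as on the slide
T = 500
tests = [make_test_function(n=10, seed=t) for t in range(T)]
```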
19. Volatility estimation by Stochastic Approximation algorithm
Let us consider the application of Stochastic
Approximation to the minimization of the mean
absolute pricing error for parameter calibration in
the Heston Stochastic Volatility model [Heston S. L. (1993)].
We consider the mean absolute pricing error
(MAE) defined as:
$$MAE(\kappa, \theta, \sigma, \rho, v_0, \lambda) = \frac{1}{N} \sum_{i=1}^{N} \left| C_i^H(\kappa, \theta, \sigma, \rho, v_0, \lambda) - C_i \right|,$$
where N is the total number of options, $C_i$ and $C_i^H$ represent
the realized market price and the theoretical model price,
respectively, while $(\kappa, \theta, \sigma, \rho, v_0, \lambda)$ (n = 6) are the parameters of the
Heston model to be estimated.
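A sketch of the MAE objective in Python; `heston_price` stands for a user-supplied Heston pricer and is a hypothetical interface, not part of the lecture material:

```python
import numpy as np

def mae(params, options, market_prices, heston_price):
    """Mean absolute pricing error over N options.
    params: (kappa, theta, sigma, rho, v0, lam); heston_price(params, option)
    is a hypothetical pricer returning the model price C_i^H."""
    errs = [abs(heston_price(params, opt) - c)
            for opt, c in zip(options, market_prices)]
    return float(np.mean(errs))
```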
20. To compute option prices with the Heston model,
one needs input parameters that can hardly be found
from the market data.
We need to estimate the above parameters by an
appropriate calibration procedure. The estimates of
the Heston model parameters are obtained by
minimizing the MAE:
$$MAE(\kappa, \theta, \sigma, \rho, v_0, \lambda) \to \min.$$
Let us consider the Heston model for the Call option
on SPX (29 May 2002).
22. Optimal Design of Cargo Oil Tankers
In cargo oil tanker design, it is necessary to
choose the dimensions of the bulkheads so that
the weight of the bulkheads is minimal.
23. The minimization of the weight of the bulkheads for the cargo oil tank can be
formulated as a nonlinear programming task (Reklaitis et al (1986)):
$$f(x) = \frac{5.885\, x_4 (x_1 + x_3)}{x_1 + \sqrt{x_3^2 - x_2^2}} \to \min$$
subject to
$$g_1(x) = x_2 x_4 \left(0.4\, x_1 + \frac{x_3}{6}\right) - 8.94 \left(x_1 + \sqrt{x_3^2 - x_2^2}\right) \ge 0,$$
$$g_2(x) = x_2^2 x_4 \left(0.2\, x_1 + \frac{x_3}{12}\right) - 2.2 \left(8.94 \left(x_1 + \sqrt{x_3^2 - x_2^2}\right)\right)^{4/3} \ge 0,$$
$$g_3(x) = x_4 - 0.0156\, x_1 - 0.15 \ge 0,$$
$$g_4(x) = x_4 - 0.0156\, x_3 - 0.15 \ge 0,$$
$$g_5(x) = x_4 - 1.05 \ge 0,$$
$$g_6(x) = x_3 - x_2 \ge 0,$$
where $x_1$ is the width, $x_2$ the depth, $x_3$ the length, and $x_4$ the plate thickness.
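The task transcribed into Python, with a simple exterior penalty (the penalty weight `mu` is an assumption) so that the stochastic methods above can treat it as unconstrained:

```python
import numpy as np

def bulkhead_objective(x):
    """Weight of the bulkheads; x = (width, depth, length, thickness).
    Requires x3 > x2, which constraint g6 enforces."""
    x1, x2, x3, x4 = x
    return 5.885 * x4 * (x1 + x3) / (x1 + np.sqrt(x3**2 - x2**2))

def bulkhead_constraints(x):
    """Constraint values g_1..g_6; feasibility requires g_i(x) >= 0."""
    x1, x2, x3, x4 = x
    s = np.sqrt(x3**2 - x2**2)
    return np.array([
        x2 * x4 * (0.4 * x1 + x3 / 6.0) - 8.94 * (x1 + s),
        x2**2 * x4 * (0.2 * x1 + x3 / 12.0) - 2.2 * (8.94 * (x1 + s))**(4.0 / 3.0),
        x4 - 0.0156 * x1 - 0.15,
        x4 - 0.0156 * x3 - 0.15,
        x4 - 1.05,
        x3 - x2,
    ])

def penalized(x, mu=1e3):
    """Exterior penalty: objective plus mu * sum of squared constraint violations."""
    g = bulkhead_constraints(x)
    return bulkhead_objective(x) + mu * np.sum(np.minimum(g, 0.0)**2)
```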
24. SPSA with Lipschitz perturbation for the
cargo oil tank design
[Figure: objective function value versus the number of iterations (100-10000).]
25. Confidence bounds of the minimum
(A=6.84241, T=100, N=1000)
[Figure: upper and lower confidence bounds and the minimum of the objective function versus the number of iterations.]
26. Simulated Annealing
Global optimization methods:
global algorithms (branch and bound algorithms,
dynamic programming, exhaustive search, etc.);
greedy optimization (local search);
heuristic optimization.
28. Simulated Annealing algorithm
The Simulated Annealing algorithm was
developed by modeling the steel
annealing process (Metropolis et al.
(1953)).
It has found many applications in Operational
Research, Data Analysis, etc.
29. Simulated Annealing
Main idea:
simulate a drift of the current solution with
probability distribution $P(x, T_k)$;
improve the solution by updating:
- the temperature function $T_k$;
- the neighborhood function $\kappa_k$.
30. Simulated Annealing algorithm
Step 1. Choose $T_0$ and $x^0$; set $k = 0$.
Step 2. Generate a drift $Z^{k+1}$ with probability
distribution $P(x, T_k)$.
Step 3. If $\|Z^{k+1}\| \le \kappa_k$ and either $f(x^k) \ge f(x^k + Z^{k+1})$
or (Metropolis rule)
$$e^{-\left(f(x^k + Z^{k+1}) - f(x^k)\right)/T_k} \ge U(0,1),$$
then accept $x^{k+1} = x^k + Z^{k+1}$, set $k = k + 1$; otherwise return to Step 2.
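A compact sketch of these steps in Python; the cooling schedule, the neighborhood scale, and the Cauchy drift (a heavy-tailed, Pareto-type choice, as motivated on the following slides) are illustrative assumptions:

```python
import numpy as np

def simulated_annealing(f, x0, T0=1.0, kappa0=1.0, n_iter=5000, seed=0):
    """Metropolis-rule SA with assumed cooling T_k = T0/(k+1) and
    neighborhood scale kappa_k = kappa0/sqrt(k+1)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    fx = f(x)
    best_x, best_f = x.copy(), fx
    for k in range(n_iter):
        T_k = T0 / (k + 1)
        kappa_k = kappa0 / np.sqrt(k + 1)
        z = kappa_k * rng.standard_cauchy(x.size)   # heavy-tailed drift Z^{k+1}
        fz = f(x + z)
        # accept improvements always, worse moves with probability e^{-delta/T_k}
        if fz <= fx or np.exp(-(fz - fx) / T_k) >= rng.uniform():
            x, fx = x + z, fz
            if fx < best_f:
                best_x, best_f = x.copy(), fx
    return best_x, best_f
```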
31. Improvement of SA by Pareto Type
models
The theoretical investigation of SA convergence shows
that in these algorithms Pareto-type models can be
applied to form the search sequence (Yang (2000)).
Class of Pareto models, main feature and parameter:
the distributions of Pareto-type models have "heavy tails";
α is the main parameter of these models, which governs
the heaviness of the tail;
α-stable distributions are of Pareto type (this follows from the generalized C.L.T.).
32. Pareto type (Heavy-tailed)
distributions
Main features:
infinite variance and, possibly, infinite mean.
Introduced by Pareto in the 1920s.
Mandelbrot established the use of heavy-tailed
distributions to model real-world fractal
phenomena.
There are many other applications (financial
markets, traffic in computer and
telecommunication networks, etc.).
35. Comparison of tail probabilities
for standard normal, Cauchy and Levy
distributions
The tail probabilities of the three
distributions were compared. It is clear that the
tail probability of the normal distribution
quickly becomes negligible,
whereas the other two
distributions retain a significant
probability mass in the tail.
[Table: tail probabilities of the standard normal, Cauchy and Levy distributions.]
36. Improvement of SA by Pareto type
models
The convergence conditions (Yang (2000)) indicate that,
under suitable conditions, an appropriate choice of the
temperature and neighborhood size updating functions
ensures the convergence of the SA algorithm to the
global minimum of the objective function over the
domain of interest.
The following corollaries give different forms of
temperature and neighborhood size updating functions
corresponding to different kinds of generation probability
density functions to guarantee the global convergence
of the SA algorithm.
38. Improvement of SA in continuous
optimization
The above corollaries indicate that a different form of temperature
updating function has to be used with respect to a different kind of
generation probability density function in order to ensure the global
convergence of the corresponding SA algorithm.
41. Testing of SA for continuous optimization
When optimization algorithms are applied to global and
combinatorial optimization problems, their reliability and
efficiency need to be tested.
Special test functions, known from the literature, are used for
this purpose.
Some of these functions have one or more global minima;
some have both global and local minima.
With the help of these functions it can be verified that the
methods are efficient enough; thus, it is possible to test
whether algorithms get trapped in a local minimum, and to
monitor the speed and accuracy of convergence and other
parameters.
42. Testing criteria
By running the SA algorithm on several test functions
with two different distributions, and varying some
optional parameters, the following questions were addressed:
which of these distributions guarantees faster
convergence to the global minimum in terms of the
objective function value;
what the probabilities of finding the global minimum are,
and how changing some parameters impacts these
probabilities;
what the proper number of iterations is that
guarantees finding the global minimum with the
desired probability.
43. Testing criteria
Characteristics to be evaluated
by Monte-Carlo simulation:
the value of the minimized objective function;
the probability of finding the global minimum after
a given number of iterations.
These characteristics were computed by the
Monte-Carlo method: N realizations
(N = 100, 500, 1000) with K iterations each
(K = 100, 500, 1000, 3000, 10000, 30000).
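A sketch of how such a probability can be estimated; `run_sa` is a hypothetical wrapper that runs one SA realization and returns the best objective value found:

```python
def success_probability(run_sa, f_star, N=100, K=1000, tol=1e-2):
    """Monte-Carlo estimate of the probability that one SA run of K iterations
    ends within `tol` of the known global minimum value f_star."""
    hits = sum(run_sa(seed=i, n_iter=K) - f_star < tol for i in range(N))
    return hits / N
```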
44. Test functions
An example of a test function:
Branin's RCOS (RC) function (2 variables):
$$RC(x_1, x_2) = \left(x_2 - \frac{5.1}{4\pi^2}\, x_1^2 + \frac{5}{\pi}\, x_1 - 6\right)^2 + 10 \left(1 - \frac{1}{8\pi}\right) \cos(x_1) + 10;$$
Search domain:
$-5 < x_1 < 10$, $0 < x_2 < 15$;
3 minima:
$(x_1, x_2)^* = (-\pi,\ 12.275)$, $(\pi,\ 2.275)$, $(9.42478,\ 2.475)$;
$RC((x_1, x_2)^*) = 0.397887$.
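The function in Python, with a check of the three minimizers listed above:

```python
import numpy as np

def branin_rcos(x1, x2):
    """Branin's RCOS test function; global minimum value ~0.397887."""
    return ((x2 - 5.1 / (4.0 * np.pi**2) * x1**2 + 5.0 / np.pi * x1 - 6.0)**2
            + 10.0 * (1.0 - 1.0 / (8.0 * np.pi)) * np.cos(x1) + 10.0)

# evaluate RC at the three global minimizers
for p in [(-np.pi, 12.275), (np.pi, 2.275), (9.42478, 2.475)]:
    print(p, branin_rcos(*p))   # each prints ~0.397887
```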
48. Simulation results
[Figure 1: probability of finding the global minimum by SA for the Rastrigin function versus the number of iterations (1-3000).]
49. Wrap-Up and Conclusions
1. The Stochastic Approximation methods have been
considered and compared: SPSA with the Lipschitz
perturbation operator, SPSA with the Uniform
perturbation operator, and the SFDA method, as well
as Simulated Annealing.
2. Computer simulation by the Monte-Carlo method has
shown that the empirical estimates of the rate of
convergence of Stochastic Approximation for
nondifferentiable functions corroborate the theoretical
rates $O\!\left(\frac{1}{k^{\beta}}\right)$, $1 \le \beta \le 2$.