Stochastic modelling and quasi-random numbers
1. Stochastic models + quasi-random
(Teytaud, TAO (Inria), LRI (Paris-Sud), UMR CNRS 8623, France;
OASE Lab, NUTN, Taiwan)
First part: randomness.
What is a stochastic / randomized model
Terminology, tools
Second part: quasi-random points
Random points can be very disappointing
Sometimes quasi-random points are better
2. Useful maths
we will need these tools...
Prime number: 2,3,5,7,11,13,17,...
P(A|B): conditioning in probability.
P(dice=1 | dice in {1,2,3} ) ?
P(dice=3 | dice in {1,2} ) ?
Frequency in data x(1),x(2),...,x(n):
1,2,6,3,7: frequency(odd) ?
frequency ( x(i+1) > x(i) ) ?
frequency ( x(i+1) > 3 | x(i) < 4 ) ?
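To make these definitions concrete, here is a minimal C sketch (my illustration, not from the slides) that computes the three frequencies above on the example sequence 1,2,6,3,7.

#include <stdio.h>

int main(void) {
    double x[] = {1, 2, 6, 3, 7};            /* the example data above */
    int n = sizeof(x) / sizeof(x[0]);
    int odd = 0, up = 0, cond_num = 0, cond_den = 0;
    for (int i = 0; i < n; i++)
        if (((int)x[i]) % 2 == 1) odd++;     /* frequency(odd) */
    for (int i = 0; i + 1 < n; i++) {
        if (x[i+1] > x[i]) up++;             /* frequency( x(i+1) > x(i) ) */
        if (x[i] < 4) {                      /* conditional frequency */
            cond_den++;
            if (x[i+1] > 3) cond_num++;
        }
    }
    printf("frequency(odd) = %g\n", (double)odd / n);
    printf("frequency(x(i+1)>x(i)) = %g\n", (double)up / (n - 1));
    printf("frequency(x(i+1)>3 | x(i)<4) = %g\n", (double)cond_num / cond_den);
    return 0;
}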
3. Let's take time for understanding
random simulations
I guess you all know how to simulate a random
variable uniform in [0,1]
e.g. double u=drand48();
But do you know how to simulate one year of
weather in Tainan ?
Not so simple.
Let's see this in more detail.
4. Random sequence
in dimension 1
What is a climate model ?
Define:
w1 = weather at time step 1
w2 = weather at time step 2
w3 = weather at time step 3
w4 = weather at time step 4
…
==> let's keep it simple: let's define the weather
by a single number in [0,1].
5. I want a generative model
Just as I can repeat u=drand48() and
generate a sample u1, u2, u3, I want to be able
to generate
W1=(w11,w12,w13,...,w1T)
W2=(w21,w22,w23,...,w2T)
W3=...
…
==> think of a generator of
curves
6. Random sequence
in dimension 1
What is a climate model ?
Define:
w1 = weather at time step 1
The model tells you what w1 can look like. For example,
it gives the density function g:
P(w1 in I) = integral of g on I
7. Take-home message number 1:
a random variable w on R
is entirely defined by
P(w in I)
for each interval I
8. Random sequence
in dimension 1
P(w1 in I) = integral of g on I
P(w1 <= c) = integral of g on [-infinity,c] = G(c)
9. Generating w1: easy with the
inverse cumulative distribution
P(w1 in I) = integral of g on I
P(w1 <= c) = integral of g on [0,c] = G(c)
Consider invG= inverse of G.
G=cumulative distribution
i.e. G(invG(x))=x
Trick for generating w1:
u=drand48()
w1=invG(u)=invCDF(u);
10. Generating w1: easy with the
inverse cumulative distribution
[Figure, repeated on slides 10-14 as an animation: the uniform draw u on the vertical axis is mapped through the inverse of G to w1 = invG(u) on the horizontal axis]
u=drand48()
w1=invG(u);
15. Take-home message number 2:
a random variable w on R
is more conveniently defined by
P(w <= t)
for each t,
and the best is
invCDF = inverse of (t → P(w<=t))
Because then:
w=invCDF(drand48());
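As an illustration of this take-home message, here is a minimal C sketch (my example, not from the slides), using an exponential random variable because its inverse CDF has the closed form invCDF(u) = -log(1-u)/lambda.

#include <stdio.h>
#include <stdlib.h>
#include <math.h>

/* Inverse CDF of an exponential(lambda) variable: G(w) = 1 - exp(-lambda*w). */
double invCDF_exponential(double u, double lambda) {
    return -log(1.0 - u) / lambda;
}

int main(void) {
    srand48(42);
    double lambda = 2.0, sum = 0.0;
    int n = 100000;
    for (int i = 0; i < n; i++) {
        double u = drand48();                       /* uniform in [0,1) */
        double w = invCDF_exponential(u, lambda);   /* w = invCDF(u) */
        sum += w;
    }
    printf("empirical mean = %g (theory: 1/lambda = %g)\n", sum / n, 1.0 / lambda);
    return 0;
}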
16. Generating w2: also easy with
inv. cumulative distribution ?
w1=invG1(drand48());
w2=invG2(drand48());
w3=invG3(drand48());
…
Can we generate each wi independently ?
==> very easy !
==> but very bad :-(
==> no correlation :-(
==> w4 very high and w5 very low is unrealistic;
but in this model it happens very often!
17. Generating wi: also easy with
inv. cumulative distribution ?
[Figure: a trajectory with large-scale variations is realistic; a trajectory from the independent model is unrealistic, with an almost constant average value]
18. So how can we do ?
The model should not give the (independent) distribution of w2, but the distribution of w2 given w1:
w1=invG1(drand48());
w2=invG2(w1,drand48());
w3=invG3(w2,drand48());
Does this make sense ? This is a Markov chain.
==> the wi should NOT be generated independently!
19. Variant
The model should not give the (independent) distribution of w2, but the distribution of w2 given the previous steps:
w1=invG1(drand48());
w2=invG2(w1, drand48());
w3=invG3(w2, w1, drand48());
w4=invG4(w3, w2, drand48());
w5=invG5(w4, w3, drand48());
==> this is a Markov chain of order 2
==> we keep order 1 for today
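A minimal C sketch of such an order-1 generator (my illustration, not the slides' climate model): the conditional inverse CDF invG below is a made-up toy that just pulls w(i) toward w(i-1), only to show the structure w(i) = invG(w(i-1), drand48()).

#include <stdio.h>
#include <stdlib.h>

/* Hypothetical conditional inverse CDF: given the previous weather wprev
   and a uniform draw u, return a new weather in [0,1] correlated with wprev. */
double invG(double wprev, double u) {
    return 0.8 * wprev + 0.2 * u;    /* toy choice, NOT a real climate model */
}

int main(void) {
    srand48(123);
    int T = 10;
    double w = drand48();            /* w1 = invG1(drand48()) : initial weather */
    printf("w1 = %g\n", w);
    for (int t = 2; t <= T; t++) {
        w = invG(w, drand48());      /* w(t) = invG(w(t-1), drand48()) */
        printf("w%d = %g\n", t, w);
    }
    return 0;
}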
20. Let's see an example
Assume that we have a plant.
This plant is a function:
(Production,State,Benefit) =
f( Demand , State , Weather )
Demand = g(weather,economy,noise)
(where Economy is the part of the economy
which does not depend too much on the weather)
Benefit per year
21. Graphically
Weather:
w1, w2, w3, w4, w5, … ==> random sequence
==> we assume a distribution of w(i) | w(i-1)
==> this is a Markov Model ( forget w(i-2) )
Economy
e1, e2, e3, e4, e5, … ==> random sequence
==> we assume a distribution of e(i) | e(i-1)
Noise = given distribution
==> n1, n2, n3, ....
22. Graphically
[Diagram: three chains e1...e5, d1...d5, w1...w5, with arrows between them; arrows mean dependency]
The “model” should tell you how to generate d2, given d1, e2,w2.
(ei,di,wi) is a Markov chain. (di) is a hidden Markov chain:
a part is hidden.
23. How to build a
stochastic model ?
It's about uncertainties
Even without hidden models, it's complicated
We have not discussed how to design a
stochastic model (typically from historical
archive):
Typically, discretization: w(k) in I1 or I2 or I3
with I1 = ]-infinity, a],
I2 = ]a, b], I3 = ]b, +infinity[
G(w,w')= frequency of w(k+1) <= w'
for w(k) in same interval as w
24. Yet another take-home message
Typically, discretization: w(k) in I1 or I2 or I3
with I1 = ]-infinity, a], I2 = ]a, b], I3 = ]b, +infinity[
G(w,w')= frequency of w(k+1) <= w'
for w(k) in same interval as w
(obviously more intervals in many real cases...)
==> However, this reduces extreme values
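A minimal C sketch of this construction (my illustration: the archive, the thresholds a and b, and the function names are made up): it discretizes w(k) into the three intervals and reads G(w,w') off the archive as an empirical frequency.

#include <stdio.h>

#define N 12
/* a made-up historical archive of weather values in [0,1] */
static const double archive[N] = {0.10,0.15,0.40,0.55,0.70,0.65,0.30,0.20,0.45,0.80,0.75,0.50};
static const double a = 0.33, b = 0.66;   /* hypothetical discretization thresholds */

static int interval(double w) {           /* which of I1, I2, I3 contains w ? */
    if (w <= a) return 0;
    if (w <= b) return 1;
    return 2;
}

/* G(w, wprime) = frequency of w(k+1) <= wprime among the k with w(k) in the same interval as w */
static double G(double w, double wprime) {
    int count = 0, total = 0;
    for (int k = 0; k + 1 < N; k++)
        if (interval(archive[k]) == interval(w)) {
            total++;
            if (archive[k + 1] <= wprime) count++;
        }
    return total ? (double)count / total : 0.0;
}

int main(void) {
    printf("G(0.2, 0.5) = %g\n", G(0.2, 0.5));   /* transition CDF from interval I1 */
    printf("G(0.7, 0.5) = %g\n", G(0.7, 0.5));   /* transition CDF from interval I3 */
    return 0;
}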
25. A completely different
approach ?
Write p1,p2,p3,...,pN all the parameters of the
model
Collect data x1,...,xD
For each i in {1,2,...,D}, xi=(xi1,...,xiT) = a curve
Optimize p1,p2,p3,...,pN so that all moments of
order <= 2 are (nearly) the same as the moments of
the archive.
Moment1(i) = (x1i+x2i+...+xDi)/D ==> where is i ? (i indexes the time steps)
Moment2(i,j) = average over k of xki·xkj = (x1i·x1j+...+xDi·xDj)/D
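A minimal C sketch of the archive moments that the parameters p1,...,pN should (nearly) reproduce (my illustration; the archive values are made up).

#include <stdio.h>

#define D 4   /* number of archived curves (made up) */
#define T 3   /* number of time steps (made up) */

/* a tiny made-up archive: D curves of length T */
static const double x[D][T] = {
    {0.2, 0.4, 0.3},
    {0.5, 0.6, 0.4},
    {0.1, 0.3, 0.2},
    {0.4, 0.5, 0.6},
};

/* Moment1(i) = (x1i + ... + xDi) / D */
double moment1(int i) {
    double s = 0.0;
    for (int k = 0; k < D; k++) s += x[k][i];
    return s / D;
}

/* Moment2(i,j) = (x1i*x1j + ... + xDi*xDj) / D */
double moment2(int i, int j) {
    double s = 0.0;
    for (int k = 0; k < D; k++) s += x[k][i] * x[k][j];
    return s / D;
}

int main(void) {
    for (int i = 0; i < T; i++)
        printf("Moment1(%d) = %g\n", i, moment1(i));
    printf("Moment2(0,1) = %g\n", moment2(0, 1));
    return 0;
}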
26. Example of parametric HMM
Parameters = { parameters of e, parameters of
w, parameters of d } = {15 sets of parameters }
= very big
[Diagram: the chains e1...e5, d1...d5, w1...w5, as on slide 22]
27. Main troubles
OK, we know what a stochastic model is
The case of HMM is much more complicated
(but tools exist)
But gathering data is not always so easy.
For example, climate: do you trust the last 50
years for predicting the next 10 years ?
Even if you trust the past 50 years, do you think
it's enough for building a sophisticated model ?
We need a combination between
28. Validation
Statistical models always lie
Because the structure is wrong
Because there are not enough data
==> typically, extreme values are rarer in
models than in reality
Check the extreme events
Usually, it's good to have more extreme values
than in the data (because all models tend to make
them too rare...).
29. Example: French climate
France has a quiet climate
No big wind
No heavy rains
No heat wave
But:
2003: huge heat wave. 15 000 died in France
(6.2 times more than the 921 earthquake!)
1999: hurricane-like winds (96 died in Europe;
gusts at 169 km/h in Paris)
1987: huge rain falls (96 mm in 24 hours)
30. Example: 2003 heat wave
Paris:
9 days with max temp. > 35°C
1 night with no less than 25.5°C <== disaster
France: 15 000 died
Italy: 20 000 died
==> European countries
were not ready for this
31. Example: 2003 heat wave
==> plenty of take-home messages
Bad model: air conditioning sometimes
automatically stopped, because such
high temperatures were considered as
measurement bugs ==> extreme values
neglected
Heat wave + no wind ==> increased
pollution
==> old people die (babies carefully
32. Example: 2003 heat wave
==> plenty of take-home messages
Be careful with extreme values
neglected
==> extreme values are not always
measurement bugs
==> removing air conditioning
because it's too hot...
(some systems were not ready
33. Example: 2003 heat wave
==> plenty of take-home messages
Be careful with extreme values
neglected
==> extreme values are not always
measurement bugs
Independence is a very strong
assumption
35. Quasi-random points
(Teytaud, TAO (Inria), LRI (Paris-Sud), UMR CNRS 8623;
collabs with S. Gelly, J. Mary, S. Lallich, E. Prudhomme,...)
Quasi-random points ?
Dimension 1
Dimension n
Better in dimension n
Strange spaces
37. Why do we need random /
quasi-random points ?
Numerical integration [thousands of papers; Niederreiter 92]
integral(f) nearly equal to
(1/n) sum f(xi)
Learning [Cervellera et al, IEEETNN 2004, Mary phD 2005]
Optimization [Teytaud et al, EA'2005]
Modelization of random processes [Growe-Kruska et al, IEEEBPTP'03]
Path planning [Tuffin]
38. Where do we need numerical
integration ?
Just everywhere.
Expected pollution (=average pollution...)
= integral of possible
pollutions as a function of many random
variables
(weather, defects on parts, gasoline, use
of the car...)
39. Take-home message
When optimizing
the design of something
which is built in a factory,
take into account the variance in the production
system ==> all cars are different.
==> very important effect
==> real part != specifications
40. Why do we need numerical
integration ?
Expected benefit (=average benefit...)
= integral of possible
benefit as a function of many random
variables
(weather, prices of raw materials...)
==> economical benefit (company)
==> overall welfare (state)
41. Why do we need numerical
integration ?
Risk (=probability of failure...)
= integral of possible
failures as a function of many random
variables
(quakes, flood, heat waves,
electricity breakdowns, human error...)
42. Take-home message
Human error must be taken
into account:
- difficult to model
- e.g. a minimum probability that action X
is not performed (for all actions)
(or that unexpected action Y is performed)
(what about an adversarial human ?)
==> protection by independent validations
43. Why do we need numerical
integration ?
Expected benefit as a function
of many prices/random variables,
Expected efficiency depending on machining
vibrations
Evaluating schedules in industry (with
random events like faults, delays...)
(e.g. processors)
44. How to know if some points
are well distributed ?
I propose N points x=(x1,...,xN)
How to know if these points are well distributed ?
A naive solution:
f(x) = max over y of ( min over i of ||y - xi|| )
(to be minimized: a large value means a large hole with no point)
(naive, but not always so bad)
45. How to know if some points
are well distributed ?
I propose N points x=(x1,...,xN)
How to know if these points are well distributed ?
A naive solution:
g(x) = min over i, j != i of ||xj - xi||²   (to be maximized)
= “dispersion” (naive, but not always so bad)
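A minimal C sketch of both criteria in dimension 2 (my illustration): g is the exact minimum squared pairwise distance, and f, the largest-hole criterion of the previous slide, is only approximated on a regular grid of candidate points y.

#include <stdio.h>
#include <stdlib.h>
#include <math.h>

#define N 100   /* number of points, 2 dimensions */

static double dist2(const double a[2], const double b[2]) {
    double dx = a[0] - b[0], dy = a[1] - b[1];
    return dx * dx + dy * dy;
}

int main(void) {
    double x[N][2];
    srand48(7);
    for (int i = 0; i < N; i++) { x[i][0] = drand48(); x[i][1] = drand48(); }

    /* g(x) = min over i, j != i of ||xj - xi||^2  (large is better) */
    double g = 1e30;
    for (int i = 0; i < N; i++)
        for (int j = i + 1; j < N; j++) {
            double d = dist2(x[i], x[j]);
            if (d < g) g = d;
        }

    /* f(x) = max over y of min over i of ||y - xi||, approximated on a 51x51 grid */
    double f = 0.0;
    for (int gy = 0; gy <= 50; gy++)
        for (int gx = 0; gx <= 50; gx++) {
            double y[2] = {gx / 50.0, gy / 50.0}, best = 1e30;
            for (int i = 0; i < N; i++) {
                double d = dist2(y, x[i]);
                if (d < best) best = d;
            }
            if (best > f) f = best;
        }
    printf("g (min squared pairwise distance)    = %g\n", g);
    printf("f (largest hole radius, grid approx.) = %g\n", sqrt(f));
    return 0;
}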
48. Is there better than random
points for low discrepancy ?
Random --> Discrepancy ~ sqrt ( 1/n )
Quasi-random --> Discrepancy ~ log(n)^d/n
Quasi-random with N known --> Discrepancy ~ log(n)^(d-1)/n
Koksma & Hlawka :
error in Monte-Carlo integration
< Discrepancy x V
V= total variation (Hardy & Krause)
( many generalizations in Hickernell, A Generalized
Discrepancy and Quadrature Error Bound, 1998 )
==> sometimes V or log(n)^d huge
==> don't always trust QR
55. Dimension 1
What would you do ?
--> Van Der Corput
n=1, n=2, n=3...
n=1, n=10, n=11, n=100, n=101, n=110... (p=2)
x=.1, x=.01, x=.11, x=.001, x=.101, … (binary!)
56. Dimension 1
What would you do ?
--> Van Der Corput
n=1, n=2, n=3...
n=1, n=2, n=10, n=11, n=12, n=20... (p=3)
x=.1, x=.2, x=.01, x=.11, x=.21, x=.02... (ternary!)
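A minimal C sketch of the Van der Corput construction (the standard radical inverse; the printed values should reproduce the binary and ternary examples above).

#include <stdio.h>

/* Van der Corput radical inverse: reverse the base-p digits of n behind the decimal point. */
double van_der_corput(unsigned n, unsigned p) {
    double x = 0.0, base = 1.0 / p;
    while (n > 0) {
        x += (n % p) * base;   /* least significant digit becomes most significant */
        n /= p;
        base /= p;
    }
    return x;
}

int main(void) {
    for (unsigned n = 1; n <= 6; n++)
        printf("n=%u: base 2 -> %g, base 3 -> %g\n",
               n, van_der_corput(n, 2), van_der_corput(n, 3));
    return 0;
}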
57. Dimension 1 more general
p=2, but also p=3, 4, ...
but p=13 is not very nice :
58. Dimension 2: maybe just
use two Van Der Corput sequences
with same p ?
x --> (x,x) ?
60. Dimension 2 or n : Halton
x --> (x,x') with different prime numbers is ok
(needs maths...)
(as small numbers are better, use the n smallest primes...)
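A minimal C sketch of a Halton point (my illustration), reusing the Van der Corput construction above with the smallest primes as bases, one prime per coordinate.

#include <stdio.h>

/* Van der Corput radical inverse in base p (as sketched above). */
double van_der_corput(unsigned n, unsigned p) {
    double x = 0.0, base = 1.0 / p;
    while (n > 0) {
        x += (n % p) * base;
        n /= p;
        base /= p;
    }
    return x;
}

/* n-th Halton point in dimension d: one Van der Corput sequence per coordinate,
   with the d smallest primes as bases. */
void halton(unsigned n, int d, double out[]) {
    static const unsigned primes[] = {2, 3, 5, 7, 11, 13, 17, 19, 23, 29};
    for (int k = 0; k < d; k++)
        out[k] = van_der_corput(n, primes[k]);
}

int main(void) {
    double pt[2];
    for (unsigned n = 1; n <= 5; n++) {
        halton(n, 2, pt);
        printf("Halton point %u: (%g, %g)\n", n, pt[0], pt[1]);
    }
    return 0;
}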
61. Dimension n+1 : Hammersley
(n/(N+1), xn, x'n) --> closed sequence
(i.e., the number of points N is known in advance)
62. Dimension n : the trouble
There are not so many small prime numbers
63. Dimension n : scrambling
(here, random comes back)
Pi(p) : [1,p-1] --> [1,p-1]
Pi(p) applied to
coordinate with
prime number p
64. Dimension n : scrambling
Pi(p) : [1,p-1] --> [1,p-1] (randomly chosen)
Pi(p) applied to the coordinate with prime p
(there exist much more complicated scramblings)
65. Beyond low discrepancy ?
Other discrepancies : why rectangles ?
Other solutions : lattices
{x0+nx} modulo 1
(very fast and simple; see the sketch below)
Let's see very different approaches
Low discrepancy for other spaces than [0,1]^n
Stratification
Symmetries
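A minimal C sketch of the lattice construction {x0 + n·x} modulo 1 mentioned above (my illustration; the starting point x0 and the irrational steps based on sqrt(2) and sqrt(3) are assumptions, not from the slides).

#include <stdio.h>
#include <math.h>

#define DIM 2

/* Lattice / Kronecker-type sequence: point(n) = { x0 + n * x } modulo 1, coordinate-wise. */
void lattice_point(unsigned n, const double x0[DIM], const double x[DIM], double out[DIM]) {
    for (int k = 0; k < DIM; k++)
        out[k] = fmod(x0[k] + n * x[k], 1.0);
}

int main(void) {
    /* One possible (assumed) choice: irrational steps from sqrt(2) and sqrt(3). */
    double x0[DIM] = {0.5, 0.5};
    double x[DIM]  = {sqrt(2.0) - 1.0, sqrt(3.0) - 1.0};
    double pt[DIM];
    for (unsigned n = 1; n <= 5; n++) {
        lattice_point(n, x0, x, pt);
        printf("point %u: (%g, %g)\n", n, pt[0], pt[1]);
    }
    return 0;
}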
66. Some animals are quite good
for low-discrepancy
Why in the square ?
Other spaces/distributions: Gaussians, sphere
67. Why in the square ?
Uniformity in the square is ok
But what about Gaussian distributions ?
x in ]0,1[^d
y(i) such that P( N > y(i) ) = x(i)
with N standard gaussian
then y is quasi-random and gaussian
==> so you can have
quasi-random Gaussian numbers
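A minimal C sketch of this conversion (my illustration): it solves P(N > y) = x for y by bisection on the standard normal tail written with erfc from math.h, so a quasi-random x in ]0,1[ becomes a quasi-random Gaussian y.

#include <stdio.h>
#include <math.h>

/* Standard normal tail: P(N > y) = erfc(y / sqrt(2)) / 2. */
static double tail(double y) {
    return 0.5 * erfc(y / sqrt(2.0));
}

/* Solve P(N > y) = x for y by bisection (tail() is decreasing in y). */
double gaussian_from_uniform(double x) {
    double lo = -10.0, hi = 10.0;
    for (int it = 0; it < 60; it++) {
        double mid = 0.5 * (lo + hi);
        if (tail(mid) > x) lo = mid;   /* tail still too big: move right */
        else               hi = mid;
    }
    return 0.5 * (lo + hi);
}

int main(void) {
    /* A few uniform inputs (in practice: quasi-random coordinates in ]0,1[). */
    double xs[] = {0.5, 0.841344746, 0.158655254, 0.975};
    for (int i = 0; i < 4; i++)
        printf("x = %g -> y = %g\n", xs[i], gaussian_from_uniform(xs[i]));
    return 0;
}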
68. Why in the square ?
Other n-dimensional random variables by the
“conditioning” trick
Consider a QR point: (x1,....xn) in [0,1]^n
You want to simulate z with distribution Z
z1=inf { z; P(Z1<z) >x1 } = invG1(x1)
z2=inf { z; P(Z2<z|Z1=z1) > x2 } =
invG2(z1,x2)
z3=inf { z; P(Z3<z|Z1=z1,Z2=z2) > x3 } =
invG3(z1,z2,x3)
69. Why in the square ?
Theorem: If x is random([0,1]^n),
then z is distributed as Z !
==> convert the uniform square into strange spaces or variables
70. Why not for random walks ?
500 steps of random walks ==> huge
dimension
Quasi-random basically does not work in huge
dimension
But the first coordinates of QR are ok; just use
them for the most important coordinates!
==> change the order of variables
and use conditioning !
71. Why not for random walks ?
Quasi-random number x in R^500
(e.g. Gaussian)
Change order: y(250) first (y(250) ---> x(1) )
y(1 | y(250) ) <---> x(2)
y(500 | y(1) and y(250)) <---> x(3)
72. Why not for random walks ?
500 steps of random walks ==> huge
dimension
But strong derandomization possible : start by
y(250), then y(1), then y(500), then y(125), then
y(375)...
73. Why not for random walks ?
500 steps of random walks ==> huge
dimension
But strong derandomization possible :
74. Very different approaches for
derandomization ?
Symmetries : instead of
x1 and x2 in [0,1],
try
x and 1-x
Or more generally, just draw n/2 points,
and use their symmetries
==> in dimension d, n/2^d points and their 2^d symmetric copies
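A minimal C sketch of the x / 1-x trick in dimension 1 (my example; the integrand f(x) = x*x is just a stand-in): each uniform draw is paired with its mirror image, so n/2 draws give n evaluation points.

#include <stdio.h>
#include <stdlib.h>

/* A stand-in integrand on [0,1]; its exact integral is 1/3. */
static double f(double x) { return x * x; }

int main(void) {
    srand48(2024);
    int n = 100000;                 /* total number of evaluation points */
    double sum = 0.0;
    for (int i = 0; i < n / 2; i++) {
        double u = drand48();
        sum += f(u) + f(1.0 - u);   /* the point and its symmetric copy */
    }
    printf("antithetic estimate of the integral = %g (exact: 1/3)\n", sum / n);
    return 0;
}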
77. Very different approaches for
derandomization ?
Control : instead of estimating
E f(x)
Choose g “looking like” f and estimate
E (f-g)(x)
Then E f = E g + E(f-g) is much better
Troubles:
You need a good g
You must be able to evaluate E g
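A minimal C sketch of a control variate (my example: f(x) = exp(x) on [0,1], with g(x) = 1+x as the function “looking like” f, whose expectation E g = 3/2 is known exactly).

#include <stdio.h>
#include <stdlib.h>
#include <math.h>

static double f(double x) { return exp(x); }      /* E f = e - 1 (what we want) */
static double g(double x) { return 1.0 + x; }     /* control variate, E g = 3/2 known exactly */

int main(void) {
    srand48(7);
    int n = 100000;
    double plain = 0.0, corrected = 0.0;
    for (int i = 0; i < n; i++) {
        double u = drand48();
        plain     += f(u);              /* crude Monte-Carlo */
        corrected += f(u) - g(u);       /* estimate E(f - g), which has small variance */
    }
    double Eg = 1.5;
    printf("crude MC        : %g\n", plain / n);
    printf("control variate : %g\n", Eg + corrected / n);   /* E f = E g + E(f - g) */
    printf("exact value     : %g\n", exp(1.0) - 1.0);
    return 0;
}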
78. Very different approaches for
derandomization ?
Pi-estimation (importance sampling) : instead of estimating
E f(x), where x has density d
Look for y with density d' ≃ f·d (up to normalization)
Then E f(x) = E [ f(y) d(y)/d'(y) ]
==> Variance is much better
Troubles:
You have to generate y
You have to know (roughly) f
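A minimal C sketch of this idea (my example: x uniform on [0,1], f(x) = x^4, and y drawn with density 5·y^4, which is proportional to f times the density of x, so the reweighted estimator has almost no variance).

#include <stdio.h>
#include <stdlib.h>
#include <math.h>

static double f(double x) { return pow(x, 4.0); }   /* E f(x) = 1/5 for x uniform on [0,1] */

int main(void) {
    srand48(11);
    int n = 100000;
    double crude = 0.0, is = 0.0;
    for (int i = 0; i < n; i++) {
        /* crude Monte-Carlo: x uniform, density d(x) = 1 */
        double x = drand48();
        crude += f(x);
        /* importance sampling: y with density d'(y) = 5*y^4,
           generated by the inverse CDF y = u^(1/5); keep u > 0 to avoid y = 0 */
        double u = 1.0 - drand48();
        double y = pow(u, 0.2);
        is += f(y) * 1.0 / (5.0 * pow(y, 4.0));   /* f(y) * d(y) / d'(y) */
    }
    printf("crude MC            : %g\n", crude / n);
    printf("importance sampling : %g\n", is / n);
    printf("exact value         : %g\n", 0.2);
    return 0;
}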
79. Very different approaches for
derandomization ?
Stratification (jittering) :
Instead of generating n points i.i.d
Generate
k points in stratum 1
k points in stratum 2
...
k points in stratum m
with m·k = n ==> more stable
==> depends on the choice of strata
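A minimal C sketch of stratification in dimension 1 (my example, again with f(x) = x*x as a stand-in integrand): [0,1] is cut into m equal strata and k points are drawn uniformly inside each stratum.

#include <stdio.h>
#include <stdlib.h>

static double f(double x) { return x * x; }    /* stand-in integrand, exact integral 1/3 */

int main(void) {
    srand48(5);
    int m = 100, k = 10;                       /* m strata, k points per stratum, n = m*k */
    double sum = 0.0;
    for (int s = 0; s < m; s++)                /* stratum s = [s/m, (s+1)/m] */
        for (int j = 0; j < k; j++) {
            double x = (s + drand48()) / m;    /* uniform inside stratum s */
            sum += f(x);
        }
    printf("stratified estimate = %g (exact: 1/3)\n", sum / (m * k));
    return 0;
}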
82. Summary on MC
improvements ?
In many books you will read that quasi-random
points are great.
Remember that people who spend their life
studying quasi-random numbers will rarely
conclude that all this was a bit useless.
Sometimes it's really good.
Sometimes it's similar to random.
Modern Quasi-Monte-Carlo methods
(randomized) are usually at least as good as
random methods ==> no risk.
83. Summary on MC
improvements ?
Carefully designing the model (from data) is
often more important than the randomization.
Typically, neglecting dependencies is often a
disaster.
Yet, there are cases in which improved MC are
the key.
Remarks on random search: dispersion much
better than discrepancy...
84. Biblio (almost all on google)
“Pi-estimation” books for stratification, symmetries, ...
Owen, A.B. "Quasi-Monte Carlo Sampling", A Chapter on
QMC for a SIGGRAPH 2003 course.
Fred J. Hickernell, A generalized discrepancy and
quadrature error bound, 1998
B. Tuffin, On the use of low-discrepancy sequences
in Monte-Carlo methods, 1996
Matousek, Geometric Discrepancy (book, 1999)
these slides : http://www.lri.fr/~teytaud/btr2.pdf
or http://www.lri.fr/~teytaud/btr2.ppt