The cross-entropy method is a parametric technique for adaptive importance sampling in Monte-Carlo methods. Information about the distribution being integrated is collected as samples are generated from a parametric sampling distribution. The parameters of this distribution are iteratively updated to minimize the cross-entropy between the sample of the distribution of interest and the sampling distribution. This versatile adaptive technique has found many applications in rare event simulation, combinatorial optimization and optimization of functions with multiple extrema. Through a series of use cases, I'll present a quick practitioner's guide to using the cross-entropy method and will discuss common tricks and pitfalls.
Simulation of rare events and optimisation with the cross-entropy method
1. What is cross-entropy?
From Riemann to Monte-Carlo
Cross-Entropy techniques
Cross-Entropy tricks
Questions
Using cross-entropy techniques for rare event
simulation and optimization
Arthur Breitman
NYC Machine learning meetup
August 18, 2011
Arthur Breitman crossentropy for rare event simulation and optimization
Outline
What is cross-entropy?
Entropy
Kullback-Leibler divergence
From Riemann to Monte-Carlo
Riemann integration
Monte-Carlo integration
Importance sampling
Cross-Entropy techniques
Analytical expressions
Simulation of rare events
Optimization
Fitting parameters
Cross-Entropy tricks
Multiple maxima
Slow convergence
Questions
Information entropy
definition of information entropy
◮ Entropy measures disorder of a physical system
◮ Entropy measures information (Shannon)
◮ Entropy measures ignorance (E.T. Jaynes)
◮ Formally:
$$H = -\sum_{x \in \Omega} p(x) \ln p(x)$$
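As a small illustration of the formal definition, here is a minimal sketch that evaluates the entropy of a discrete distribution in nats. The function name `entropy` and the two coin distributions are assumptions made for the example, not part of the talk.

```python
import math

def entropy(p):
    """Shannon entropy in nats of a discrete distribution given as a list of probabilities."""
    # Terms with p(x) = 0 contribute nothing, by the convention 0 ln 0 = 0.
    return -sum(px * math.log(px) for px in p if px > 0)

fair_coin = [0.5, 0.5]      # hypothetical example distributions
biased_coin = [0.9, 0.1]
print(entropy(fair_coin))    # ln 2, the maximum for two outcomes
print(entropy(biased_coin))  # smaller: the outcome is less uncertain
```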
The continuous case
In the continuous case, for a random variable X with p.d.f. p(x),
entropy is defined as
$$H(X) = -\int_{\Omega} p(x) \ln p(x) \, dx$$
Simple, right?
The entropy of a probability distribution is meaningless
Wrong!
◮ Not invariant under a change of variable
◮ Can even be negative!
◮ Not an extension of Shannon’s entropy.
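A quick way to see the first two points is the uniform density on [0, a], whose differential entropy works out to ln(a). The sketch below uses that closed form; the function name and the choice of intervals are assumptions made for illustration.

```python
import math

def uniform_differential_entropy(a):
    """Differential entropy (minus the integral of p ln p) of the uniform density on [0, a]."""
    # p(x) = 1/a on [0, a], so the integral is -a * (1/a) * ln(1/a) = ln(a).
    return math.log(a)

h_x = uniform_differential_entropy(0.5)  # X uniform on [0, 0.5]
h_y = uniform_differential_entropy(1.0)  # Y = 2X is uniform on [0, 1]
print(h_x)  # negative
print(h_y)  # zero: merely rescaling the variable changed the value
```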
E.T. Jaynes to the rescue
E.T. Jaynes adjusted the definition. Consider a sequence of
discrete points that becomes dense in Ω; in the limit, its density must approach a distribution
m. Set
$$H(X) = -\int_{\Omega} p(x) \ln \frac{p(x)}{m(x)} \, dx$$
N.B. m is not necessarily a probability distribution, just a density,
so improper priors are O.K.
Definition of KL divergence
Kullback-Leibler divergence: entropy of a probability distribution p
relative to a probability distribution q
$$D_{KL}(P \| Q) = \int_{\Omega} p(x) \ln \frac{p(x)}{q(x)} \, dx$$
◮ Similar to but distinct from entropy.
◮ Expected number of extra nats (or bits) needed to encode data drawn from
P with a code optimized for Q.
◮ Not symmetric!
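The asymmetry is easy to check numerically. The sketch below uses the usual non-negative convention, D_KL(P || Q) = sum of p ln(p/q), on two made-up discrete distributions; the name `kl_divergence` and the numbers are assumptions for the example.

```python
import math

def kl_divergence(p, q):
    """D_KL(P || Q) in nats for discrete distributions on the same support."""
    # Requires q(x) > 0 wherever p(x) > 0; terms with p(x) = 0 contribute nothing.
    return sum(px * math.log(px / qx) for px, qx in zip(p, q) if px > 0)

p = [0.5, 0.4, 0.1]   # hypothetical distributions
q = [0.8, 0.1, 0.1]
print(kl_divergence(p, q), kl_divergence(q, p))  # two different values
```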
Why code length matters
◮ All ML problems ⇔ fitting a probability distribution
◮ KL divergence measures how concise your description is
◮ Relates to MDL and Solomonoff induction
◮ PAC-learning patches against a lack of epistemology
Likelihood of parameters and Cross-Entropy
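Given a sample {q_i}, i = 1..N, of Q and a family {P_θ}, θ ∈ Θ, write Q̂ = (1/N) Σ_i δ_{q_i} for the Dirac comb of the sample. Then
$$\frac{1}{N} LL(\theta \mid \{q_i\}) = \frac{1}{N} \sum_{i=1}^{N} \ln p_\theta(q_i) = -H(\hat{Q}, P_\theta)$$
where H(Q̂, P_θ) is the cross-entropy of P_θ relative to Q̂.
Maximizing the likelihood of θ therefore minimizes the cross-entropy, and hence the KL-divergence, between the Dirac comb and P_θ.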
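To make the link between likelihood and cross-entropy concrete, here is a minimal sketch for a categorical model: the mean log-likelihood of a sample equals minus the cross-entropy between the empirical distribution and the model. The data, the candidate model and the function names are all hypothetical.

```python
import math
from collections import Counter

def mean_log_likelihood(sample, p_theta):
    """(1/N) * sum_i ln p_theta(q_i) for a categorical model p_theta (a dict)."""
    return sum(math.log(p_theta[q]) for q in sample) / len(sample)

def cross_entropy(q_hat, p_theta):
    """H(Q_hat, P_theta) = -sum_x q_hat(x) ln p_theta(x)."""
    return -sum(qx * math.log(p_theta[x]) for x, qx in q_hat.items())

sample = ["a", "b", "a", "c", "a", "b"]           # hypothetical data
q_hat = {x: n / len(sample) for x, n in Counter(sample).items()}
p_theta = {"a": 0.6, "b": 0.3, "c": 0.1}          # one candidate model

# The two printed numbers agree up to rounding.
print(mean_log_likelihood(sample, p_theta), -cross_entropy(q_hat, p_theta))
```

Among categorical models the empirical frequencies themselves give the highest likelihood, which is the same statement as "the cross-entropy is smallest when the model matches the sample".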
Riemann integration
How does one compute the integral of a function? Rectangle
method:
$$\int_a^b f(x) \, dx \approx \frac{b - a}{N} \sum_{i=0}^{N-1} f\left(a + \frac{i}{N}(b - a)\right)$$
Linear convergence: the error shrinks as O(1/N).
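A minimal sketch of the rectangle method, applied to a test integrand chosen for the example (x squared on [0, 1], whose integral is 1/3). The function name is an assumption.

```python
def rectangle_integral(f, a, b, n):
    """Left-rectangle approximation of the integral of f over [a, b] with n cells."""
    width = (b - a) / n
    return width * sum(f(a + i * width) for i in range(n))

for n in (10, 100, 1000):
    estimate = rectangle_integral(lambda x: x * x, 0.0, 1.0, n)
    print(n, abs(estimate - 1.0 / 3.0))  # error shrinks roughly like 1/n
```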
The curse of dimensionality
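Multiple dimensions?
$$\int_{a_1}^{b_1} \cdots \int_{a_m}^{b_m} f(x) \, dx \approx \frac{\prod_{j=1}^{m} (b_j - a_j)}{N^m} \sum_{i_1=0}^{N-1} \cdots \sum_{i_m=0}^{N-1} f\left(a + \frac{1}{N} \, i \circ (b - a)\right)$$
Computation is exponential in m: the sum has N^m terms.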
Monte-Carlo integration
If P is a probability distribution over Ω with density p, draw {x_i} from P:
$$\int_{\Omega} f(x) \, dx \approx \frac{1}{N} \sum_{i=1}^{N} \frac{f(x_i)}{p(x_i)}$$
Very simple to implement; often p is simply uniform (p ∝ 1).
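The estimator can be sketched in a few lines for the uniform case, where p(x) = 1/(b - a). The integrand (sin on [0, π], exact integral 2), the seed and the function name are assumptions chosen for the example.

```python
import math
import random

def monte_carlo_integral(f, a, b, n, rng):
    """Estimate the integral of f over [a, b] by sampling x_i uniformly, p(x) = 1/(b - a)."""
    p = 1.0 / (b - a)
    return sum(f(rng.uniform(a, b)) / p for _ in range(n)) / n

rng = random.Random(0)  # fixed seed so the run is repeatable
estimate = monte_carlo_integral(math.sin, 0.0, math.pi, 100_000, rng)
print(estimate)  # close to the exact value 2
```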
Monte-Carlo convergence
◮ Let random variable X_p = f(x)/p(x)
◮ If var(X_p) < ∞, the error is O(N^{-1/2}) by the central-limit
theorem, whatever the dimension m!
◮ If m > 2, Monte-Carlo becomes attractive: a regular grid of N points only converges as O(N^{-1/m}).
Problems with MC
◮ If the mass of f is concentrated in a small region, convergence
can be very slow.
◮ also a problem with Riemann integration...
Importance sampling
◮ Preferentially sample the regions of interest by picking p to
minimize the variance of f/p
◮ In Riemann world, equivalent to an irregular grid
◮ Ideal sampling distribution (if f > 0) is $f / \int f$, but we don’t
know $\int f$!
◮ Best convergence when the χ² of f w.r.t. p is minimized
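As an illustration of sampling the region of interest, the sketch below estimates the tail probability P(X > 4) of a standard normal by drawing from a normal centred on the threshold and reweighting each draw by p(x)/q(x). The choice of proposal, the threshold and the function name are assumptions for the example, not a prescription from the talk.

```python
import math
import random

def tail_probability_is(threshold, n, rng):
    """Estimate P(X > threshold) for X ~ N(0, 1) by sampling from N(threshold, 1)."""
    total = 0.0
    for _ in range(n):
        x = rng.gauss(threshold, 1.0)  # draw from the shifted proposal q
        if x > threshold:
            # Weight p(x) / q(x) for two unit-variance normal densities.
            total += math.exp(-threshold * x + 0.5 * threshold ** 2)
    return total / n

rng = random.Random(0)
estimate = tail_probability_is(4.0, 100_000, rng)
exact = 0.5 * math.erfc(4.0 / math.sqrt(2.0))  # closed form for comparison
print(estimate, exact)
```

Plain Monte-Carlo with the same number of draws would see only a handful of points beyond the threshold, so its estimate would be far noisier.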
Adaptive importance sampling
◮ What if we don’t know the shape of f ?
◮ Learn it adaptively from the sampling.
◮ Iteratively improve the importance sampling function.
Vegas algorithm and cross-entropy
◮ Vegas algorithm: use histograms and treat the variables as separable
◮ Cross-entropy algorithm: pick p from a family of distributions
to minimize cross-entropy to the sample
Why cross-entropy?
In many cases, the parameters that minimize the cross-entropy have an analytical expression that is
computationally cheap to evaluate, e.g. for
◮ the uniform distribution
◮ the categorical distribution (finite, discrete)
◮ the whole natural exponential family
The natural exponential family?
$$f_X(x \mid \theta) = h(x) \exp\left(\theta^{*} x - A(\theta)\right)$$
◮ θ is the natural parameter and x the sufficient statistic
◮ maximum cross-entropy distribution given θ w.r.t. dH
◮ Examples: normal, multivariate normal, gamma, binomial,
multinomial, negative binomial
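For the normal distribution the cross-entropy fit has a closed form: the weighted sample mean and the weighted sample standard deviation. A minimal sketch, with a hypothetical function name and toy data:

```python
import math

def fit_normal_ce(xs, weights):
    """Normal (mu, sigma) minimizing the cross-entropy to a weighted sample."""
    total = sum(weights)
    mu = sum(w * x for w, x in zip(weights, xs)) / total
    var = sum(w * (x - mu) ** 2 for w, x in zip(weights, xs)) / total
    return mu, math.sqrt(var)

xs = [1.0, 2.0, 3.0, 4.0]                        # toy sample
print(fit_normal_ce(xs, [1.0, 1.0, 1.0, 1.0]))  # plain sample mean and std
print(fit_normal_ce(xs, [0.0, 0.0, 1.0, 1.0]))  # only the last two points count
```

The weights can be importance-sampling weights or simply 0/1 indicators of membership in a selected subsample.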
Beta distribution
Not analytical! To fit, start with approximate values from the
method of moments
$$\alpha = \bar{X} \left( \frac{\bar{X}(1 - \bar{X})}{S^2} - 1 \right), \quad \beta = (1 - \bar{X}) \left( \frac{\bar{X}(1 - \bar{X})}{S^2} - 1 \right)$$
The log-likelihood is given by
$$n \left( \ln \Gamma(\alpha + \beta) - \ln \Gamma(\alpha) - \ln \Gamma(\beta) \right) + (\alpha - 1) \sum_{i=1}^{n} \ln X_i + (\beta - 1) \sum_{i=1}^{n} \ln(1 - X_i)$$
Its first and second derivatives involve the digamma and trigamma
functions, available in the GSL. Newton’s method using the Jacobian
converges in a couple of iterations. Very useful to model bounded
variables.
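The starting point of the fit can be sketched with the standard library alone. The code below computes the method-of-moments values and the log-likelihood they would be refined against; the Newton refinement itself is left out because it needs digamma and trigamma, which Python's `math` module does not provide. Function names, the seed and the true parameters (2, 5) are assumptions for the example.

```python
import math
import random

def beta_method_of_moments(xs):
    """Starting values (alpha, beta) from the sample mean and variance."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / (n - 1)
    common = mean * (1.0 - mean) / var - 1.0
    return mean * common, (1.0 - mean) * common

def beta_log_likelihood(xs, a, b):
    """Log-likelihood of Beta(a, b) for a sample with values strictly inside (0, 1)."""
    n = len(xs)
    return (n * (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b))
            + (a - 1.0) * sum(math.log(x) for x in xs)
            + (b - 1.0) * sum(math.log(1.0 - x) for x in xs))

rng = random.Random(1)
sample = [rng.betavariate(2.0, 5.0) for _ in range(20_000)]  # synthetic data
a_hat, b_hat = beta_method_of_moments(sample)
print(a_hat, b_hat)  # roughly (2, 5)
print(beta_log_likelihood(sample, a_hat, b_hat))
```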
Surviving the zombie hordes
Figure: Electric fences, the horde and you
Simulating zombie breakouts
◮ Each fence (Ui , λi ) delivers u ∼ max(Ui − Exp(λi ), 0) volts.
◮ Crossing a fence deals u damage to a zombie
◮ Zombies come from everywhere and can take 5 damage hits
each.
◮ Zombie outbreaks are very rare!
Mere integration fails!
◮ We can estimate this probability by sampling the random
voltages and finding a shortest path.
◮ Each sample has variance $p_{outbreak}(1 - p_{outbreak})$, so the relative error after N samples is about $1/\sqrt{N \, p_{outbreak}}$:
too slow!
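To put numbers on "too slow": the indicator of the event has variance p(1 - p), so the relative standard error of the plain Monte-Carlo estimate after n samples is sqrt((1 - p) / (n p)). The sketch below inverts that relation; the function name and the probabilities are illustrative assumptions.

```python
import math

def samples_needed(p, relative_error):
    """Naive Monte-Carlo sample count for a target relative standard error."""
    # Var of the indicator is p (1 - p); relative std error is sqrt((1 - p) / (n p)).
    return math.ceil((1.0 - p) / (p * relative_error ** 2))

print(samples_needed(1e-2, 0.1))  # about ten thousand
print(samples_needed(1e-9, 0.1))  # about a hundred billion
```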
Cross-Entropy to the rescue
◮ We want to approximate the multivariate power distribution
conditional on an outbreak occurring!
◮ Approximate the shape by changing the parameters Ui and λi
for each fence
◮ Generate samples, fit Ui and λi on the samples inducing an
outbreak
The elite sample
What if the probability is so low that we don’t observe any
outbreak in our sample?
◮ Generate n samples using the sampling distribution
◮ If more than e samples are outbreaks, fit to those samples,
break
◮ Otherwise, fit on the e best samples, the elite sample.
◮ Iterate
◮ Generate a sample, weight each point by its importance
sampling weight, estimate the probability
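The loop above can be sketched on a much simpler stand-in than the fence model: estimating P(X > γ) for X exponential with mean 1, sampling from an exponential with an adjustable mean v. This toy problem, the likelihood-ratio weighting of the fit, and all names and default values are assumptions made for the example; the fence simulation itself is not reproduced here.

```python
import math
import random

def ce_rare_event(gamma, n=10_000, elite=1_000, seed=0, max_iter=50):
    """Estimate P(X > gamma) for X ~ Exp(mean 1), sampling from Exp(mean v)."""
    rng = random.Random(seed)

    def weight(x, v):
        # Likelihood ratio p(x) / q_v(x) between Exp(mean 1) and Exp(mean v).
        return v * math.exp(-x * (1.0 - 1.0 / v))

    v = 1.0
    for _ in range(max_iter):
        xs = sorted((rng.expovariate(1.0 / v) for _ in range(n)), reverse=True)
        hits = [x for x in xs if x > gamma]
        done = len(hits) >= elite
        # Fit on the rare events if there are enough, otherwise on the elite sample.
        chosen = hits if done else xs[:elite]
        ws = [weight(x, v) for x in chosen]
        v = sum(w * x for w, x in zip(ws, chosen)) / sum(ws)  # weighted mean
        if done:
            break

    # Final importance-sampling estimate with the fitted sampling distribution.
    xs = [rng.expovariate(1.0 / v) for _ in range(n)]
    return sum(weight(x, v) for x in xs if x > gamma) / n, v

estimate, v = ce_rare_event(20.0)
print(estimate, math.exp(-20.0))  # estimate next to the exact probability
print(v)                          # fitted mean of the sampling distribution
```

For the exponential family parametrized by its mean, the cross-entropy fit reduces to a weighted average, which is why the update is a single line.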
Other examples
◮ Modeling rare events for any complex probability distribution,
e.g. Bayesian networks.
◮ Estimating tails for the sum of fat-tailed distributions
From integration to optimization
Using an elite sample to help convergence is a trick that does a
form of hill climbing of a smooth function approximating the
indicator function of the rare event.
◮ Useful even if we are not interested in integrating f.
◮ Keep iterating based on an elite sample to converge towards
one global maximum.
◮ The variance of the sampling distribution follows the curvature of
f.
◮ e.g. using a multivariate normal allows the covariance to
reflect the differential
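A minimal sketch of the optimization variant in one dimension: sample from a normal, keep the elite sample, refit the mean and standard deviation, repeat. The objective function, the starting distribution and all parameter values are hypothetical choices for the example.

```python
import math
import random

def ce_maximize(f, mu, sigma, n=500, elite=50, iterations=20, seed=0):
    """Maximize f by repeatedly refitting a normal N(mu, sigma) to the elite sample."""
    rng = random.Random(seed)
    for _ in range(iterations):
        xs = sorted((rng.gauss(mu, sigma) for _ in range(n)), key=f, reverse=True)
        best = xs[:elite]  # the elite sample
        mu = sum(best) / elite
        sigma = math.sqrt(sum((x - mu) ** 2 for x in best) / elite)
    return mu, sigma

def objective(x):
    # Hypothetical test function: global maximum near x = 2, local one near x = -2.
    return math.exp(-(x - 2.0) ** 2) + 0.5 * math.exp(-(x + 2.0) ** 2)

mu, sigma = ce_maximize(objective, mu=0.0, sigma=5.0)
print(mu, sigma)  # mu ends near 2, sigma shrinks towards 0
```

The shrinking standard deviation is what the curvature remark refers to: the sharper the peak, the faster the elite sample concentrates.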
Combinatorial optimization
One classical example is combinatorial optimization. To solve a
TSP with Cross-Entropy:
◮ Assume the tour is a Markov chain on the graph nodes.
◮ Generate tours by coercing them to be permutations.
◮ Update transition probabilities from the elite sample.
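The three steps above can be sketched as follows for a tiny instance. The six cities on a circle, the smoothing factor `alpha`, the sample sizes and the function names are all assumptions for the example; this is one simple way to realise the idea, not necessarily the speaker's implementation.

```python
import math
import random

def tour_length(tour, dist):
    """Length of the closed tour visiting the nodes in the given order."""
    return sum(dist[a][b] for a, b in zip(tour, tour[1:] + tour[:1]))

def sample_tour(P, rng):
    """Random walk from node 0 under P, restricted to unvisited nodes (a permutation)."""
    tour, unvisited = [0], list(range(1, len(P)))
    while unvisited:
        weights = [P[tour[-1]][j] for j in unvisited]
        nxt = rng.choices(unvisited, weights=weights)[0]
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour

def ce_tsp(dist, n_samples=300, elite=30, iterations=30, alpha=0.7, seed=0):
    rng = random.Random(seed)
    n = len(dist)
    # Start from uniform transition probabilities (no self-loops).
    P = [[0.0 if i == j else 1.0 / (n - 1) for j in range(n)] for i in range(n)]
    best = None
    for _ in range(iterations):
        tours = sorted((sample_tour(P, rng) for _ in range(n_samples)),
                       key=lambda t: tour_length(t, dist))
        if best is None or tour_length(tours[0], dist) < tour_length(best, dist):
            best = tours[0]
        # Count the transitions used by the elite tours.
        counts = [[0.0] * n for _ in range(n)]
        for t in tours[:elite]:
            for a, b in zip(t, t[1:] + t[:1]):
                counts[a][b] += 1.0
        # Smoothed update; each node is left once per tour, so rows still sum to 1.
        for i in range(n):
            for j in range(n):
                P[i][j] = alpha * counts[i][j] / elite + (1.0 - alpha) * P[i][j]
    return best, tour_length(best, dist)

# Hypothetical instance: six cities on the unit circle (optimal tour length 6).
cities = [(math.cos(2 * math.pi * k / 6), math.sin(2 * math.pi * k / 6)) for k in range(6)]
dist = [[math.dist(a, b) for b in cities] for a in cities]
best_tour, best_length = ce_tsp(dist)
print(best_tour, best_length)
```

The smoothing keeps every transition probability strictly positive, so no edge is ruled out for good after a single unlucky iteration.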
Clustering
CE does clustering too!
◮ Assign probabilities of membership to classes for each point
(the sampling distribution).
◮ Sample random membership assignments.
◮ Use average distance to centroids to find an elite sample.
◮ Slower than K-means but much less sensitive to initial choice
of centroids.
A form of global optimization
Is it global optimization?
◮ If the sampling distribution is bounded below by a distribution
that covers the global maximum, yes, with probability 1!
◮ In practice we may never see one maximum and converge to
another local maximum.
Fitting model parameters with CE
Cross-Entropy techniques generally work very well for finding ML
parameters of a model. Why?
◮ Models often have different sensitivities to different
parameters; CE reflects that.
◮ With a covariance structure, it does a form of gradient ascent.
◮ But it can deal with discrete parameters at the same time!
◮ It does not tend to get trapped in local maxima.
◮ Well suited for high-dimensional parameter spaces.
Forgetting maxima
Some maxima can be "forgotten". Possible remedies:
◮ Smooth changes in the sampling function.
◮ Expand the sampling function (equivalent to applying a prior
or "shrinkage").
◮ Keep the entire sample
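One common way to smooth changes in the sampling function is to move each parameter only part of the way towards its newly fitted value. A minimal sketch, where the function name, the factor `alpha` and the example parameters are assumptions:

```python
def smoothed_update(old, fitted, alpha=0.7):
    """Move each sampling parameter only part of the way towards its new fit."""
    return [alpha * f + (1.0 - alpha) * o for o, f in zip(old, fitted)]

old_params = [0.0, 5.0]     # e.g. (mu, sigma) of the current sampling distribution
fitted_params = [2.0, 0.5]  # values fitted on the latest elite sample
new_params = smoothed_update(old_params, fitted_params)
print(new_params)  # roughly [1.4, 1.85]
```

With alpha below 1 the spread shrinks more slowly, which leaves more iterations in which a second maximum can still be sampled.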
Not converging to a maximum
Multiple maxima may prevent the variance of the sampling distribution from
decreasing.
◮ Mixtures of multivariate normals can deal with this.
◮ They can be introduced dynamically.
◮ Fit with EM.
Independent variables
If the sampling distribution is separable, convergence can be sped
up by sampling over one dimension at a time.
Questions
Questions?