1. Using Problem-Specific Knowledge and Learning from Experience in Estimation of Distribution Algorithms
Martin Pelikan and Mark W. Hauschild
Missouri Estimation of Distribution Algorithms Laboratory (MEDAL)
University of Missouri, St. Louis, MO
pelikan@cs.umsl.edu, mwh308@umsl.edu
http://medal.cs.umsl.edu/
Martin Pelikan, Mark W. Hauschild Prior Knowledge and Learning from Experience in EDAs
2. Motivation
Two key questions
Can we use past EDA runs to solve future problems faster?
EDAs do more than solve a problem.
EDAs provide us with a lot of information about the landscape.
Why throw out this information?
Can we use problem-specific knowledge to speed up EDAs?
EDAs are able to adapt exploration operators to the problem.
We do not have to know much about the problem to solve it.
But why throw out prior problem-specific information if available?
This presentation
Reviews some of the approaches that attempt to do this.
Focus is on two areas:
Using prior problem-specific knowledge.
Learning from experience (past EDA runs).
3. Outline
1. EDA bottlenecks.
2. Prior problem-specific knowledge.
3. Learning from experience.
4. Summary and conclusions.
4. Estimation of Distribution Algorithms
Estimation of distribution algorithms (EDAs)
Work with a population of candidate solutions.
Learn probabilistic model of promising solutions.
Sample the model to generate new solutions.
Probabilistic Model-Building GAs
[Diagram: current population → selected population → probabilistic model → new population]
…replace crossover+mutation with learning in EDAs
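The select–learn–sample loop above can be sketched with the simplest possible model, per-bit marginal probabilities (UMDA-style). The onemax fitness, function name, and parameter values here are illustrative choices, not part of the talk:

```python
import random

def umda_onemax(n=20, pop_size=100, n_select=50, generations=30, seed=1):
    """Minimal univariate EDA sketch: select, learn marginals, sample."""
    rng = random.Random(seed)
    # initial population of random bit strings; onemax fitness = number of ones
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        # truncation selection of promising solutions
        selected = sorted(pop, key=sum, reverse=True)[:n_select]
        # learn the probabilistic model: per-bit marginal probabilities
        probs = [sum(ind[i] for ind in selected) / n_select for i in range(n)]
        # sample the model to generate the new population
        pop = [[1 if rng.random() < p else 0 for p in probs]
               for _ in range(pop_size)]
    return max(pop, key=sum)

best = umda_onemax()
```

A univariate model ignores variable interactions; the EDAs discussed in this talk (e.g. hBOA) learn multivariate models such as Bayesian networks, but the outer loop is the same.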
5. Efficiency Enhancement of EDAs
Main EDA bottlenecks
Evaluation.
Model building.
Model sampling.
Memory complexity (models, candidate solutions).
Efficiency enhancement techniques
Address one or more bottlenecks.
Can adopt much from standard evolutionary algorithms.
But EDAs provide opportunities to do more than that!
Many approaches exist; we focus on a few.
6. What Comes Next?
1. Using problem-specific knowledge.
2. Learning from experience.
7. Problem-Specific Knowledge in EDAs
Basic idea
We don’t have to know much about the problem to use EDAs.
But what if we do know something about it?
Can we use prior problem-specific knowledge in EDAs?
Bias populations
Inject high quality solutions into population.
Modify solutions using a problem-specific procedure.
Bias model building
How to bias
Bias model structure (e.g. Bayesian network structure).
Bias model parameters (e.g. conditional probabilities).
Types of bias
Hard bias: Restrict admissible models/parameters.
Soft bias: Some models/parameters given preference over others.
8. Example: Biasing Model Structure in Graph Bipartitioning
Graph bipartitioning
Input
Graph G = (V, E).
V are nodes.
E are edges.
Task
Split V into equally sized subsets so that the number of edges
between these subsets is minimized.
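The objective above takes only a few lines of code; the `cut_size` helper and the edge-list representation are hypothetical conveniences, not from the talk:

```python
def cut_size(edges, assignment):
    """Number of edges crossing the two subsets (the quantity to minimize)."""
    return sum(1 for u, v in edges if assignment[u] != assignment[v])

# A 4-node cycle split into {0, 1} and {2, 3}: two edges cross the cut.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
assignment = {0: 0, 1: 0, 2: 1, 3: 1}
```

The balance constraint (equally sized subsets) is enforced separately, typically by repairing or rejecting unbalanced candidate solutions.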
9. Example: Biasing Model Structure in Graph Bipartitioning
Biasing models in graph bipartitioning
Soft bias (Schwarz & Ocenasek, 2000)
Increase prior probability of models with dependencies included in E.
Decrease prior probability of models with dependencies not included in E.
Hard bias (Mühlenbein and Mahnig, 2002)
Strictly disallow model dependencies that disagree with edges in E.
In both cases, the performance of EDAs was substantially improved.
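Both kinds of bias can be sketched in a few lines, assuming dependencies and graph edges are given as vertex pairs; the function names and the bonus/penalty values are hypothetical:

```python
def admissible(candidate_edge, graph_edges):
    """Hard bias: a model dependency (i, j) is admissible only if the
    problem graph contains the corresponding edge (in either direction)."""
    i, j = candidate_edge
    allowed = {frozenset(e) for e in graph_edges}
    return frozenset((i, j)) in allowed

def structure_log_prior(model_edges, graph_edges, bonus=2.0, penalty=-2.0):
    """Soft bias: add a log-prior term to the model score that rewards
    dependencies matching E and penalizes the rest."""
    allowed = {frozenset(e) for e in graph_edges}
    return sum(bonus if frozenset(e) in allowed else penalty
               for e in model_edges)

graph_edges = [(0, 1), (1, 2)]
```

Hard bias prunes the model search space outright; soft bias only shifts the scoring metric, so a strongly supported dependency outside E can still be learned.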
10. Important Challenges
Challenges in the use of prior knowledge in EDAs
Parameter bias using prior probabilities has not been explored much.
Structural bias has been introduced only rarely.
Model bias is often studied only superficially.
Theory is missing.
11. Learning from Experience
Basic idea
Consider solving many instances of the same problem class.
Can we learn from past EDA runs to solve future instances of
this problem type faster?
Similar to the use of prior knowledge, but in this case we
automate the discovery of problem properties (instead of
relying on expert knowledge).
What features to learn?
Model structure.
Promising candidate solutions or partial solutions.
Algorithm parameters.
How to use the learned features?
Modify/restrict algorithm parameters.
Bias populations.
Bias models.
12. Example: Probability Coincidence Matrix
Probability coincidence matrix (PCM)
Hauschild, Pelikan, Sastry, Goldberg (2008).
Each model may contain a dependency between Xi and Xj.
The PCM stores observed probabilities of dependencies.
PCM = {pij} where i, j ∈ {1, 2, . . . , n}.
pij = proportion of models with a dependency between Xi and Xj.
[Figure: example PCM]
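Building a PCM from past runs can be sketched as follows; representing each learned model by its set of dependency pairs is an assumption made for illustration:

```python
from collections import Counter

def probability_coincidence_matrix(models):
    """pij = proportion of past models containing a dependency between
    variables i and j (stored as unordered pairs)."""
    counts = Counter()
    for deps in models:  # each model: a set of dependency pairs
        for i, j in deps:
            counts[frozenset((i, j))] += 1
    m = len(models)
    return {pair: c / m for pair, c in counts.items()}

# Three models observed in past runs over 4 variables.
models = [{(0, 1), (2, 3)}, {(0, 1)}, {(0, 1), (1, 2)}]
pcm = probability_coincidence_matrix(models)
```

Pairs never seen in any model are simply absent from the dictionary, i.e. their observed proportion is 0.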
13. Example: Probability Coincidence Matrix
Using PCM for hard bias
Hauschild et al. (2008).
Set a threshold for the minimum proportion of a dependency.
Only accept dependencies occurring at least that often.
Strictly disallow other dependencies.
Using PCM for soft bias
Hauschild and Pelikan (2009).
Introduce prior probability of a model structure.
Dependencies that were more likely in the past are given
preference.
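The hard-bias rule above reduces to a threshold filter over a PCM; the dictionary representation and the helper name are illustrative assumptions, not the paper's data structures:

```python
def allowed_dependencies(pcm, threshold):
    """Hard bias from a PCM: keep only dependencies observed at least
    `threshold` of the time in past runs; strictly disallow the rest."""
    return {pair for pair, p in pcm.items() if p >= threshold}

# An illustrative PCM over 4 variables (proportions from past runs).
pcm = {frozenset((0, 1)): 0.9, frozenset((1, 2)): 0.2, frozenset((2, 3)): 0.05}
```

For the soft-bias variant, the same proportions would instead feed a prior probability over model structures, so rare dependencies are discouraged rather than forbidden.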
14. Results: PCM for 32 × 32 2D Spin Glass
[Figure: execution-time speedup with increased restrictions on model building, plotted against the minimum edge percentage allowed; panels (b) 24 × 24 and (d) 32 × 32.]
(Hauschild, Pelikan, Sastry, Goldberg; 2008)
15. Results: PCM for 2D Spin Glass
Size             Execution-time speedup   pmin    % Total Dep.
256 (16 × 16)    3.89                     0.020   6.4%
324 (18 × 18)    4.37                     0.011   8.7%
400 (20 × 20)    4.34                     0.020   7.0%
484 (22 × 22)    4.61                     0.010   6.3%
576 (24 × 24)    4.63                     0.013   4.6%
676 (26 × 26)    4.62                     0.011   4.7%
784 (28 × 28)    4.45                     0.009   5.4%
900 (30 × 30)    4.93                     0.005   8.1%
1024 (32 × 32)   4.14                     0.007   5.5%
Table 2: Optimal speedup and the corresponding PCM threshold pmin as well as the
percentage of total possible dependencies that were considered for the 2D Ising spin
glass.
(Hauschild, Pelikan, Sastry, Goldberg; 2008)
Choosing the maximum distance of dependencies remains a challenge. If the distances are restricted too severely, the bias on model building may be too strong to allow sufficiently complex models; this was also supported by results in Hauschild, Pelikan, Lima, and Sastry (2007). On the other hand, if the distances are not restricted sufficiently, the benefits of this approach may be negligible.
16. Example: Distance Restrictions
PCM limitations
Can only be applied when variables have a fixed “function”.
Dependencies between specific variables are either more likely
or less likely across many problem instances.
Concept is difficult to scale with the number of variables.
Distance restrictions
Hauschild, Pelikan, Sastry, Goldberg (2008).
Introduce a distance metric over problem variables such that
variables at shorter distances are more likely to interact.
Gather statistics of dependencies at particular distances.
Decide on distance threshold to disallow some dependencies.
Use distances to provide soft bias via prior distributions.
Distance metrics are often straightforward, especially for
additively decomposable problems.
17. Example: Distance Restrictions for Graph Bipartitioning
Example for graph bipartitioning
Given graph G = (V, E).
Assign weight 1 to all edges in E.
Distance given as shortest path between vertices.
Unconnected vertices given distance |V |.
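The metric above can be computed with breadth-first search; `distance_matrix` is a hypothetical helper, with unconnected pairs assigned distance |V| as the slide specifies:

```python
from collections import deque

def distance_matrix(n, edges):
    """Shortest-path distance between all vertex pairs, with unit edge
    weights; unconnected pairs get distance |V| = n."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    dist = [[n] * n for _ in range(n)]  # default: |V| for unreachable pairs
    for s in range(n):                  # BFS from each source vertex
        dist[s][s] = 0
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if dist[s][v] == n:
                    dist[s][v] = dist[s][u] + 1
                    queue.append(v)
    return dist

# Path 0-1-2 plus an isolated vertex 3.
d = distance_matrix(4, [(0, 1), (1, 2)])
```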
18. Example: Distance Restrictions for ADFs
Distance metric for an additively decomposable function
Additively decomposable function (ADF):
f(X1, . . . , Xn) = Σ_{i=1}^{m} fi(Si)
fi is the ith subfunction.
Si is a subset of variables from {X1, . . . , Xn}.
Connect variables in the same subset Si for some i.
Distance is the shortest path between variables (if connected).
Distance is n if no path exists.
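The construction above (connect every pair of variables sharing a subset Si) can be sketched as follows; `adf_edges` is a hypothetical helper and the subsets are illustrative:

```python
from itertools import combinations

def adf_edges(subsets):
    """Derive the interaction graph of an ADF: connect every pair of
    variables that appear together in some subset S_i."""
    edges = set()
    for s in subsets:
        edges.update(frozenset(p) for p in combinations(sorted(s), 2))
    return edges

# f = f1(X0, X1, X2) + f2(X2, X3): the subsets overlap in X2.
edges = adf_edges([{0, 1, 2}, {2, 3}])
```

Shortest-path distances over this graph (with distance n for unconnected pairs) then give the metric from the slide.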
19. Results: Distance Restrictions on 28 × 28 2D Spin Glass
[Figure: execution-time speedup against the ratio of total dependencies allowed, with points labeled by the maximum distance; panels (b) 20 × 20 and (d) 28 × 28.]
(Hauschild, Pelikan; 2009)
20. Results: Distance Restrictions on 2D Spin Glass
Biasing models in hBOA using prior knowledge
Size            Execution-time speedup   Max Dist Allowed   qmin   % Total Dep.
256 (16 × 16)   4.2901                   2                  0.62   4.7%
400 (20 × 20)   4.9288                   3                  0.64   6.0%
576 (24 × 24)   5.2156                   3                  0.60   4.1%
784 (28 × 28)   4.9007                   5                  0.63   7.6%
Table 3: Distance cutoff runs with their best speedups by distance, as well as the percentage of total possible dependencies that were considered for the 2D Ising spin glass (Hauschild, Pelikan; 2009).
For each instance we ran experiments with dependencies restricted by the maximum distance, which was varied from 1 to the maximum distance found between any two propositions (for example, for p = 2^-4 we ran experiments using a maximum distance from 1 to 9). For some instances with p = 1 the maximum distance was 500, indicating that there was no path between some pairs of propositions. On the tested problems, small distance restrictions (restricting to only distance 1 or 2) were sometimes too restrictive and some instances would not be solved even with extremely large population sizes (N = 512000); in these cases the results were omitted (such restrictions were not used).
21. Important Challenges
Challenges in learning from experience
The process of selecting the threshold is manual and difficult.
The ideas must be applied and tested on more problem types.
Theory is missing.
22. Another Related Idea: Model-Directed Hybridization
Model-directed hybridization
EDA models reveal a lot about the problem landscape.
Use this information to design advanced neighborhood structures (operators).
Use this information to design problem-specific operators.
Many successes so far, and a lot of work remains to be done.
23. Conclusions and Future Work
Conclusions
EDAs do a lot more than just solve the problem.
EDAs give us a lot of information about the problem.
EDAs allow use of prior knowledge of various forms.
Yet most EDA researchers focus on the design of new EDAs, and only a few look at the use of EDAs beyond solving an isolated problem instance.
Future work
Some of the key challenges were mentioned throughout the talk.
If you are interested in collaboration, talk to us.
24. Acknowledgments
NSF; NSF CAREER grant ECS-0547013.
University of Missouri; High Performance Computing
Collaboratory sponsored by Information Technology Services;
Research Award; Research Board.