Explicit Modelling in
Metaheuristic Optimization
                Dr Marcus Gallagher
  School of Information Technology and Electrical
                    Engineering
        University of Queensland Q. 4072
             marcusg@itee.uq.edu.au
Talk outline:
     Optimization, heuristics and metaheuristics.
     “Estimation of Distribution” (optimization)
      algorithms (EDAs): a brief overview.
     A framework for describing EDAs.
     Other modelling approaches in
      metaheuristics.
     Summary




             Marcus Gallagher - MASCOS Symposium, 26/11/04   2
“Hard” Optimization Problems
Goal: Find
   x* ∈ S such that f(x*) ≤ f(x) for all x ∈ S,
   where S is often multi-dimensional, real-valued or binary:
     S ⊆ R^n  or  S = {0, 1}^n
   Many classes of optimization problems (and
    algorithms) exist.
   When might it be worthwhile to consider metaheuristic
    or machine learning approaches?

Finding an “exact” solution is intractable.
Limited knowledge of f()
   No derivative information.
   May be discontinuous, noisy,…
Evaluating f() is expensive in terms of time
or cost.
f() is known or suspected to contain nasty
features
   Many local minima, plateaus, ravines.
The search space is high-dimensional.

What is the “practical” goal of (global)
optimization?
   “There exists a goal (e.g. to find as small a
    value of f() as possible), there exist resources
    (e.g. some number of trials), and the problem
    is how to use these resources in an optimal
    way.”
           A. Torn and A. Zilinskas, Global Optimisation. Springer-
            Verlag, 1989. Lecture Notes in Computer Science, Vol.
            350.




Heuristics
Heuristic (or approximate) algorithms aim
to find a good solution to a problem in a
reasonable amount of computation time –
but with no guarantee of “goodness” or
“efficiency” (cf. exact or complete
algorithms).
Broad classes of heuristics:
   Constructive methods
   Local search methods

Metaheuristics
Metaheuristics are (roughly) high-level strategies
that combine lower-level techniques for
exploration and exploitation of the search space.
   An overarching term to refer to algorithms including
    Evolutionary Algorithms, Simulated Annealing, Tabu
    Search, Ant Colony, Particle Swarm, Cross-
    Entropy,…
      C. Blum and A. Roli. Metaheuristics in Combinatorial
      Optimization: Overview and Conceptual Comparison. ACM
      Computing Surveys, 35(3), 2003, pp. 268-308.



Learning/Modelling for Optimization
 Most optimization algorithms make some (explicit or
 implicit) assumptions about the nature of f().
 Many algorithms vary their behaviour during execution
 (e.g. simulated annealing).
 In some optimization algorithms the search is adaptive
   Future search points depend on previously searched
    points (and/or their f() values, derivatives of f(), etc.).
 Learning/modelling can be implicit (e.g., adapting the
 step-size in gradient descent, or the population in an EA)
 …or explicit; examples from optimization literature:
    Nelder-Mead simplex algorithm.
    Response surfaces (metamodelling, surrogate function).


EDAs: Probabilistic Modelling for
             Optimization
Based on the use of (unsupervised) density
estimators/generative statistical models.
The idea is to convert the optimization problem into a
search over probability distributions.
   P. Larranaga and J. A. Lozano (eds.). Estimation of Distribution
    Algorithms: a new tool for evolutionary computation. Kluwer
    Academic Publishers, 2002.
The probabilistic model is in some sense an
explicit model of (currently) promising regions of
the search space.

EDAs: toy example

[Figure: toy example of an EDA iteration (two slides; images not preserved)]
GAs and EDAs compared
GA pseudocode
1.   Initialize the population, X(t);
2.   Evaluate the objective function for each
     point;
3.   Selection();
4.   Crossover();
5.   Mutation();
6.    Form new population X(t+1);
7.   While !(terminate()) Goto 2;
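This loop can be sketched in Python; the objective (minimizing the number of ones), truncation selection, one-point crossover, and the parameter values below are illustrative assumptions, not from the slides:

```python
import random

def ga_minimize(f, n_bits, pop_size=20, generations=50, p_mut=0.02, seed=0):
    """Toy genetic algorithm for binary minimization of f."""
    rng = random.Random(seed)
    # 1. Initialize the population X(t)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = min(pop, key=f)
    for _ in range(generations):
        # 2. Evaluate the objective for each point; 3. select the best half
        scored = sorted(pop, key=f)
        best = min(best, scored[0], key=f)
        parents = scored[: pop_size // 2]
        children = []
        while len(children) < pop_size:
            # 4. Crossover (one-point)
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_bits)
            child = a[:cut] + b[cut:]
            # 5. Mutation (independent bit flips)
            child = [bit ^ (rng.random() < p_mut) for bit in child]
            children.append(child)
        pop = children  # 6. Form the new population X(t+1)
    return best

# Example: minimize the number of ones (optimum is the all-zero string).
best = ga_minimize(sum, n_bits=16)
```

With strong truncation selection on this easy objective, the population converges to (or near) the all-zero string within a few dozen generations.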


GAs and EDAs compared
 EDA pseudocode
1.   Initialize a probability model, Q(x);
2.   Create a population of points by
     sampling from Q(x);
3.   Evaluate the objective function for
     each point;
4.   Update Q(x) using selected population
     and f() values;
5.   While !(terminate()) Goto 2;
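The steps above can be sketched with a univariate product-of-Bernoullis model, a UMDA-style choice; the parameter values and bit-counting objective are illustrative:

```python
import random

def umda_minimize(f, n_bits, pop_size=50, n_select=10, generations=40, seed=0):
    """Toy EDA: fit a product-of-Bernoullis model Q(x) to selected points."""
    rng = random.Random(seed)
    q = [0.5] * n_bits                     # 1. initialize the probability model Q(x)
    best = None
    for _ in range(generations):
        # 2. sample a population from Q(x)
        pop = [[int(rng.random() < qi) for qi in q] for _ in range(pop_size)]
        # 3. evaluate the objective for each point
        pop.sort(key=f)
        if best is None or f(pop[0]) < f(best):
            best = pop[0]
        # 4. update Q(x) using the selected (best) points
        sel = pop[:n_select]
        q = [sum(x[i] for x in sel) / n_select for i in range(n_bits)]
        q = [min(max(qi, 0.05), 0.95) for qi in q]  # keep probabilities off 0/1
    return best

best = umda_minimize(sum, n_bits=16)
```

Clamping the probabilities away from 0 and 1 is a common safeguard against premature convergence of the model.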



EDA Example 1
Population-based Incremental Learning
(PBIL)
   S. Baluja, R. Caruana. Removing the Genetics from the
    Standard Genetic Algorithm. ICML’95.


Model: a probability vector p = (p1, ..., pn), where pi = Pr(xi = 1).

Update rule, shifting p towards the best sampled point x^b:

     pi ← (1 − α) pi + α xi^b

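A minimal PBIL sketch in Python (the population size, α = 0.1, and the bit-counting test objective are illustrative choices):

```python
import random

def pbil_minimize(f, n_bits, alpha=0.1, pop_size=30, generations=60, seed=0):
    """PBIL: shift each pi = Pr(xi = 1) towards the best sample's bits."""
    rng = random.Random(seed)
    p = [0.5] * n_bits
    best = None
    for _ in range(generations):
        # Sample a population from the probability vector p
        pop = [[int(rng.random() < pi) for pi in p] for _ in range(pop_size)]
        x_b = min(pop, key=f)                 # best point this generation
        if best is None or f(x_b) < f(best):
            best = x_b
        # Update rule: pi <- (1 - alpha) pi + alpha xi^b
        p = [(1 - alpha) * pi + alpha * xi for pi, xi in zip(p, x_b)]
    return best, p

best, p = pbil_minimize(sum, n_bits=12)
```

Note that each pi stays in [0, 1] automatically, since the update is a convex combination of pi and a bit value.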
EDA Example 2
Mutual Information Maximization for Input
Clustering (MIMIC)
   J. De Bonet, C. Isbell and P. Viola. MIMIC: Finding optima by
    estimating probability densities. Advances in Neural Information
    Processing Systems, vol.9, 1997.




    p(x) = p(xi1 | xi2) p(xi2 | xi3) … p(xi(n−1) | xin) p(xin)


EDA Example 3
Combining Optimizers with Mutual Information
Trees (COMIT)
   S. Baluja and S. Davies. Using optimal dependency-trees for combinatorial
    optimization: learning the structure of the search space. Proc. ICML’97.

Uses a tree-structured graphical model
   Model can be constructed in O(n²) time using a
    variant of the minimum spanning tree algorithm.
   Model is optimal, given the restrictions, in the sense
    that the Kullback-Leibler divergence between the
    model and the full joint distribution is minimized.
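The model-building step can be sketched as follows: estimate pairwise mutual information from a sample and grow a maximum-weight spanning tree with a Prim-style algorithm. The helper names and the tiny data set are illustrative, not taken from COMIT itself:

```python
import math
from itertools import combinations

def mutual_information(samples, i, j):
    """Empirical mutual information between binary variables i and j."""
    n = len(samples)
    mi = 0.0
    for a in (0, 1):
        for b in (0, 1):
            p_ab = sum(1 for x in samples if x[i] == a and x[j] == b) / n
            p_a = sum(1 for x in samples if x[i] == a) / n
            p_b = sum(1 for x in samples if x[j] == b) / n
            if p_ab > 0:
                mi += p_ab * math.log(p_ab / (p_a * p_b))
    return mi

def dependency_tree(samples, n_vars):
    """Maximum-weight spanning tree over pairwise MI (Prim-style growth).

    Returns a list of (parent, child) edges; needs O(n^2) MI evaluations.
    """
    mi = {frozenset(e): mutual_information(samples, *e)
          for e in combinations(range(n_vars), 2)}
    in_tree = {0}
    edges = []
    while len(in_tree) < n_vars:
        u, v = max(((u, v) for u in in_tree for v in range(n_vars)
                    if v not in in_tree), key=lambda e: mi[frozenset(e)])
        edges.append((u, v))
        in_tree.add(v)
    return edges

# Example: x1 copies x0, x2 is independent, so the tree should link 0 and 1.
samples = [[0, 0, 0], [0, 0, 1], [1, 1, 0], [1, 1, 1]]
edges = dependency_tree(samples, 3)
```

In an EDA, the fitted tree then defines the conditional distributions used to sample the next population.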


EDA Example 4
Bayesian Optimization Algorithm (BOA)
   M. Pelikan, D. Goldberg and E. Cantu-Paz. BOA: The Bayesian
    optimization algorithm. In Proc. GECCO’99.

Bayesian network model where nodes can
have at most k parents.
   Greedy search over the Bayesian Dirichlet
    equivalence metric to find the network
    structure.



Further work on EDAs
EDAs have also been developed
   For problems with continuous and mixed
    variables.
   That use mixture models and kernel
    estimators - allowing for the modelling of
    multi-modal distributions.
   …and more!



A framework to describe building and adapting a
      probabilistic model for optimization

See:
       M. Gallagher and M. Frean. Population-Based Continuous
       Optimization, Probabilistic Modelling and Mean Shift. To
       appear, Evolutionary Computation, 2005.
Consider a continuous EDA with model

     Q(x) = ∏_{i=1}^{n} Qi(xi)

Consider a Boltzmann distribution over f(x)

     P(x) = (1/Z) exp( −f(x) / T )
As T→0, P(x) tends towards a set of impulse
spikes over the global optima.
Now we have a distribution whose form we know, Q(x),
and we would like to modify it to be close to P(x).
KL divergence:

     K = ∫ Q(x) log( Q(x) / P(x) ) dx
Let Q(x) be a Gaussian; try and minimize K via
gradient descent with respect to the mean
parameter of Q(x).



The gradient of Q with respect to its mean μ is

     ∂Q/∂μ = Q(x) (x − μ) / v

which gives

     ∂K/∂μ = (1/(vT)) ∫ Q(x) (x − μ) f(x) dx

The integral can be approximated using a sample S of n
points drawn from Q(x):

     ∂K/∂μ ≈ (1/(nvT)) Σ_{xi ∈ S} (xi − μ) f(xi)


The algorithm update rule is then

     μ ← μ + (α/n) Σ_{xi ∈ S} (xi − μ) f̂(xi)

where f̂ denotes suitably scaled (and, for minimization,
negated) objective values, so that the step descends K.
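A one-dimensional sketch of this update in Python, with T folded into the step size η and the minus sign of gradient descent written explicitly rather than absorbed into f̂; all parameter values are illustrative:

```python
import random

def kl_mean_update(f, mu, v=1.0, n=100, eta=0.05, seed=0):
    """One gradient step on the mean of a Gaussian Q(x) = N(mu, v).

    Approximates dK/dmu by a sample average and descends it, so the
    mean moves towards low-f regions of the search space.
    """
    rng = random.Random(seed)
    xs = [rng.gauss(mu, v ** 0.5) for _ in range(n)]   # sample x_i from Q
    grad = sum((x - mu) * f(x) for x in xs) / (n * v)  # dK/dmu (T folded into eta)
    return mu - eta * grad                              # gradient descent step

# Minimize f(x) = (x - 3)^2: repeated updates should move mu towards 3.
mu = 0.0
for step in range(200):
    mu = kl_mean_update(lambda x: (x - 3.0) ** 2, mu, seed=step)
```

Note there is no selection step here: every sampled point contributes, weighted by its objective value, exactly as in the update rule above.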


Similar ideas can be found in:
     A. Berny. Statistical Machine Learning and Combinatorial
     Optimization. In L. Kallel et al. eds, Theoretical Aspects of
     Evolutionary Computation, pp. 287-306. Springer. 2001.
     M. Toussaint. On the evolution of phenotypic exploration
     distributions. In C. Cotta et al. eds, Foundations of Genetic
     Algorithms (FOGA VII), pp. 169-182. Morgan Kaufmann. 2003.




Some insights
The derived update rule is closely related
to those found in Evolution Strategies and
a version of PBIL for continuous spaces.
It is possible to view these existing
algorithms as approximately doing KL
minimization.
The objective function appears explicitly in
this update rule (no selection).

Other Research in Learning/Modelling
          for Optimization
J. A. Boyan and A. W. Moore. Learning Evaluation Functions to
Improve Optimization by Local Search. Journal of Machine Learning
Research 1:2, 2000.
B. Anderson, A. Moore and D. Cohn. A Nonparametric Approach to
Noisy and Costly Optimization. International Conference on
Machine Learning, 2000.
D. R. Jones. A Taxonomy of Global Optimization Methods Based
on Response Surfaces. Journal of Global Optimization 21(4):345-
383, 2001.
Reinforcement learning
    R. J. Williams (1992). Simple statistical gradient-following algorithms for
     connectionist reinforcement learning. Machine Learning, 8:229-256.
    V. V. Miagkikh and W. F. Punch III, An Approach to Solving Combinatorial
     Optimization Problems Using a Population of Reinforcement Learning Agents,
     Genetic and Evolutionary Computation Conf.(GECCO-99), p.1358-1365, 1999.




Summary
The field of metaheuristics (including
Evolutionary Computation) has produced:
   A large variety of optimization algorithms.
   Demonstrated good performance on a range of real-
    world problems.
Metaheuristics are also considerably more general:
   They can be applied even when there isn’t a “true”
    objective function (coevolution).
   They can evolve non-numerical objects.


Summary
EDAs take an explicit modelling approach to
optimization.
   Existing statistical models and model-fitting algorithms can be
    employed.
   Potential for solving challenging problems.
   Model can be more easily visualized/interpreted than a dynamic
    population in a conventional EA.
Although the field is highly active, it is still relatively
immature:
   Improve quality of experimental results.
   Make sure research goals are well-defined.
   Lots of preliminary ideas, but a lack of comparative/follow-up
    research.
   Difficult to keep up with the literature and see connections
    with other fields.

The End!
Questions?






  • 1. Explicit Modelling in Metaheuristic Optimization Dr Marcus Gallagher School of Information Technology and Electrical Engineering University of Queensland Q. 4072 marcusg@itee.uq.edu.au
  • 2. Talk outline: optimization, heuristics and metaheuristics; “Estimation of Distribution” (optimization) algorithms (EDAs): a brief overview; a framework for describing EDAs; other modelling approaches in metaheuristics; summary. Marcus Gallagher - MASCOS Symposium, 26/11/04
  • 3. “Hard” Optimization Problems Goal: find x* ∈ S such that f(x*) ≤ f(x) for all x ∈ S, where S is often multi-dimensional; real-valued or binary (S ⊆ R^n or S = {0,1}^n). Many classes of optimization problems (and algorithms) exist. When might it be worthwhile to consider metaheuristic or machine learning approaches?
  • 4. Finding an “exact” solution is intractable. Limited knowledge of f(): no derivative information; may be discontinuous, noisy, … Evaluating f() is expensive in terms of time or cost. f() is known or suspected to contain nasty features: many local minima, plateaus, ravines. The search space is high-dimensional.
  • 5. What is the “practical” goal of (global) optimization? “There exists a goal (e.g. to find as small a value of f() as possible), there exist resources (e.g. some number of trials), and the problem is how to use these resources in an optimal way.” A. Torn and A. Zilinskas, Global Optimisation. Springer-Verlag, 1989. Lecture Notes in Computer Science, Vol. 350.
  • 6. Heuristics Heuristic (or approximate) algorithms aim to find a good solution to a problem in a reasonable amount of computation time, but with no guarantee of “goodness” or “efficiency” (cf. exact or complete algorithms). Broad classes of heuristics: constructive methods and local search methods.
  • 7. Metaheuristics Metaheuristics are (roughly) high-level strategies that combine lower-level techniques for exploration and exploitation of the search space. An overarching term referring to algorithms including Evolutionary Algorithms, Simulated Annealing, Tabu Search, Ant Colony, Particle Swarm, Cross-Entropy, … C. Blum and A. Roli. Metaheuristics in Combinatorial Optimization: Overview and Conceptual Comparison. ACM Computing Surveys, 35(3), 2003, pp. 268-308.
  • 8. Learning/Modelling for Optimization Most optimization algorithms make some (explicit or implicit) assumptions about the nature of f(). Many algorithms vary their behaviour during execution (e.g. simulated annealing). In some optimization algorithms the search is adaptive: future search points evaluated depend on previous points searched (and/or their f() values, derivatives of f(), etc). Learning/modelling can be implicit (e.g., adapting the step-size in gradient descent, the population in an EA), or explicit; examples from the optimization literature: the Nelder-Mead simplex algorithm; response surfaces (metamodelling, surrogate functions).
  • 9. EDAs: Probabilistic Modelling for Optimization Based on the use of (unsupervised) density estimators/generative statistical models. The idea is to convert the optimization problem into a search over probability distributions. P. Larranaga and J. A. Lozano (eds.). Estimation of Distribution Algorithms: a new tool for evolutionary computation. Kluwer Academic Publishers, 2002. The probabilistic model is in some sense an explicit model of (currently) promising regions of the search space.
  • 10. EDAs: toy example
  • 11. EDAs: toy example
  • 12. GAs and EDAs compared GA pseudocode 1. Initialize the population, X(t); 2. Evaluate the objective function for each point; 3. Selection(); 4. Crossover(); 5. Mutation(); 6. Form new population X(t+1); 7. While !(terminate()) Goto 2;
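The GA pseudocode above can be sketched as a short Python program. This is a minimal illustrative sketch, not the algorithm used in the talk: the binary encoding, tournament selection, one-point crossover, and all parameter values are assumptions chosen for the example.

```python
import random

def ga_minimize(f, n, pop_size=50, generations=100,
                crossover_rate=0.9, mutation_rate=None):
    """Minimal generational GA for binary strings (illustrative parameters)."""
    mutation_rate = mutation_rate if mutation_rate is not None else 1.0 / n
    # 1. Initialize the population X(t)
    pop = [[random.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    best = min(pop, key=f)
    for _ in range(generations):
        # 2. Evaluate the objective function for each point
        scored = [(f(x), x) for x in pop]
        best = min([best] + [x for _, x in scored], key=f)

        def tournament():
            # 3. Selection: binary tournament (lower f wins, since we minimize)
            a, b = random.sample(scored, 2)
            return (a if a[0] <= b[0] else b)[1]

        new_pop = []
        while len(new_pop) < pop_size:
            p1, p2 = tournament(), tournament()
            # 4. Crossover: one-point
            if random.random() < crossover_rate:
                cut = random.randrange(1, n)
                c1, c2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
            else:
                c1, c2 = p1[:], p2[:]
            # 5. Mutation: independent bit flips
            for c in (c1, c2):
                for i in range(n):
                    if random.random() < mutation_rate:
                        c[i] = 1 - c[i]
            new_pop.extend([c1, c2])
        # 6. Form new population X(t+1); 7. loop until termination
        pop = new_pop[:pop_size]
    return best

# Example: minimize the number of ones (optimum is the all-zero string).
random.seed(0)
sol = ga_minimize(sum, n=20)
```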
  • 13. GAs and EDAs compared EDA pseudocode 1. Initialize a probability model, Q(x); 2. Create a population of points by sampling from Q(x); 3. Evaluate the objective function for each point; 4. Update Q(x) using selected population and f() values; 5. While !(terminate()) Goto 2;
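The EDA loop above can likewise be sketched in Python. The sketch assumes the simplest possible model, a univariate Bernoulli product distribution (UMDA-style) with truncation selection; the model choice, selection scheme, and all parameter values are illustrative assumptions, not taken from the slides.

```python
import random

def eda_minimize(f, n, pop_size=100, select=30, generations=50):
    """UMDA-style EDA sketch: univariate model Q(x) = prod_i q_i^x_i (1-q_i)^(1-x_i)."""
    # 1. Initialize the probability model Q(x)
    q = [0.5] * n
    best = None
    for _ in range(generations):
        # 2. Create a population of points by sampling from Q(x)
        pop = [[int(random.random() < qi) for qi in q] for _ in range(pop_size)]
        # 3. Evaluate the objective function for each point
        pop.sort(key=f)
        if best is None or f(pop[0]) < f(best):
            best = pop[0]
        # 4. Update Q(x): refit the marginals to the selected (best) points
        elite = pop[:select]
        q = [sum(x[i] for x in elite) / select for i in range(n)]
        q = [min(max(qi, 0.02), 0.98) for qi in q]  # keep probabilities off 0/1
    return best

# Example: minimize the number of ones (optimum is the all-zero string).
random.seed(1)
sol = eda_minimize(sum, n=20)
```

Note the contrast with the GA: there is no crossover or mutation; variation comes entirely from sampling the model, and "inheritance" happens through the fitted marginals.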
  • 14. EDA Example 1 Population-based Incremental Learning (PBIL) S. Baluja, R. Caruana. Removing the Genetics from the Standard Genetic Algorithm. ICML’95. Model: a probability vector p = (p1, …, pn), with pi = Pr(xi = 1). Update: pi ← (1 − α) pi + α xi^b, where x^b is the best point in the current sample.
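A minimal sketch of the PBIL update, assuming the standard form pi ← (1 − α) pi + α xi^b with a learning rate α; the α value, sample size, and iteration count here are illustrative, not from the slides.

```python
import random

def pbil_minimize(f, n, alpha=0.1, samples=50, iterations=200):
    """PBIL sketch: nudge each p_i towards the best sample's bits (alpha illustrative)."""
    p = [0.5] * n                      # p_i = Pr(x_i = 1)
    best = None
    for _ in range(iterations):
        pop = [[int(random.random() < pi) for pi in p] for _ in range(samples)]
        x_best = min(pop, key=f)       # best point in the current sample
        if best is None or f(x_best) < f(best):
            best = x_best
        # p_i <- (1 - alpha) * p_i + alpha * x_i^b
        p = [(1 - alpha) * pi + alpha * xb for pi, xb in zip(p, x_best)]
    return best

# Example: minimize the number of ones.
random.seed(2)
sol = pbil_minimize(sum, n=20)
```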
  • 15. EDA Example 2 Mutual Information Maximization for Input Clustering (MIMIC) J. De Bonet, C. Isbell and P. Viola. MIMIC: Finding optima by estimating probability densities. Advances in Neural Information Processing Systems, vol.9, 1997. Model: a chain of pairwise conditionals, p(x) = p(xi1 | xi2) p(xi2 | xi3) ··· p(xin−1 | xin) p(xin).
  • 16. EDA Example 3 Combining Optimizers with Mutual Information Trees (COMIT) S. Baluja and S. Davies. Using optimal dependency-trees for combinatorial optimization: learning the structure of the search space. Proc. ICML’97. Uses a tree-structured graphical model. The model can be constructed in O(n^2) time using a variant of the minimum spanning tree algorithm. The model is optimal, given the restrictions, in the sense that the Kullback-Leibler divergence between the model and the full joint distribution is minimized.
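The tree-building step can be sketched as a Chow-Liu-style maximum spanning tree over pairwise mutual information (here via Prim's algorithm). This is only the model-structure step, not the full COMIT algorithm, and the function names and smoothing constant are assumptions for the example.

```python
import math
import random

def mutual_information(data, i, j):
    """Empirical mutual information between binary variables i and j."""
    n = len(data)
    eps = 1e-9  # smoothing to avoid log(0) / division by zero
    mi = 0.0
    for a in (0, 1):
        for b in (0, 1):
            pij = sum(1 for x in data if x[i] == a and x[j] == b) / n
            pi = sum(1 for x in data if x[i] == a) / n
            pj = sum(1 for x in data if x[j] == b) / n
            if pij > 0:
                mi += pij * math.log((pij + eps) / (pi * pj + eps))
    return mi

def dependency_tree(data, n_vars):
    """Maximum spanning tree over pairwise MI (Prim's algorithm).
    Returns (child, parent) edges rooted arbitrarily at variable 0."""
    in_tree = {0}
    edges = []
    while len(in_tree) < n_vars:
        best = max(((i, j) for i in in_tree
                    for j in range(n_vars) if j not in in_tree),
                   key=lambda e: mutual_information(data, *e))
        edges.append((best[1], best[0]))  # (child, parent)
        in_tree.add(best[1])
    return edges

# Example: x1 always copies x0, x2 is independent noise,
# so the tree should link variables 0 and 1.
random.seed(4)
data = []
for _ in range(200):
    b = random.randint(0, 1)
    data.append([b, b, random.randint(0, 1)])
edges = dependency_tree(data, 3)
```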
  • 17. EDA Example 4 Bayesian Optimization Algorithm (BOA) M. Pelikan, D. Goldberg and E. Cantu-Paz. BOA: The Bayesian optimization algorithm. In Proc. GECCO’99. Bayesian network model where nodes can have at most k parents. Greedy search over the Bayesian Dirichlet equivalence metric to find the network structure.
  • 18. Further work on EDAs EDAs have also been developed for problems with continuous and mixed variables; that use mixture models and kernel estimators, allowing for the modelling of multi-modal distributions; and more!
  • 19. A framework to describe building and adapting a probabilistic model for optimization See: M. Gallagher and M. Frean. Population-Based Continuous Optimization, Probabilistic Modelling and Mean Shift. To appear, Evolutionary Computation, 2005. Consider a continuous EDA with model Q(x) = ∏_{i=1}^{n} Qi(xi). Consider a Boltzmann distribution over f(x): P(x) = (1/Z) exp(−f(x)/T).
  • 20. As T→0, P(x) tends towards a set of impulse spikes over the global optima. Now we have a probability distribution whose form we know, Q(x), and we would like to modify it to be close to P(x). KL divergence: K = ∫ Q(x) log(Q(x)/P(x)) dx. Let Q(x) be a Gaussian; try to minimize K via gradient descent with respect to the mean parameter μ of Q(x).
  • 21. The gradient becomes: since ∂Q/∂μ = Q(x)(x − μ)/v, we get ∂K/∂μ = (1/(vT)) ∫ Q(x)(x − μ) f(x) dx. An approximation to the integral is to use a sample S of n points drawn from Q(x): ∂K/∂μ ≈ (1/(nvT)) Σ_{xi∈S} (xi − μ) f(xi).
  • 22. The algorithm update rule is then Δμ = −(η/n) Σ_{xi∈S} (xi − μ) f̂(xi), with learning rate η absorbing the 1/(vT) factor. Similar ideas can be found in: A. Berny. Statistical Machine Learning and Combinatorial Optimization. In L. Kallel et al. eds, Theoretical Aspects of Evolutionary Computation, pp. 287-306. Springer. 2001. M. Toussaint. On the evolution of phenotypic exploration distributions. In C. Cotta et al. eds, Foundations of Genetic Algorithms (FOGA VII), pp. 169-182. Morgan Kaufmann. 2003.
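The derived update can be sketched in one dimension as follows. This is an illustrative sketch under stated assumptions (fixed variance, a learning rate η absorbing the 1/(vT) factor, and arbitrary constants), not the exact algorithm from the paper.

```python
import random

def kl_mean_update(f, mu, v=0.25, eta=0.05, samples=50, iterations=200):
    """1-D sketch of the derived rule: mu <- mu - (eta/n) * sum (x_i - mu) f(x_i).
    eta stands in for the 1/(vT) factor; all constants here are illustrative."""
    for _ in range(iterations):
        # Sample n points from Q(x) = N(mu, v)
        xs = [random.gauss(mu, v ** 0.5) for _ in range(samples)]
        # Monte Carlo estimate of the KL gradient w.r.t. the mean
        grad = sum((x - mu) * f(x) for x in xs) / samples
        mu -= eta * grad
    return mu

# Example: f(x) = x^2, starting from mu = 3; the mean should drift
# towards the minimizer at x = 0 (no selection step is involved).
random.seed(3)
mu = kl_mean_update(lambda x: x * x, mu=3.0)
```

Note how f() appears directly in the update, weighting each sampled point, rather than entering indirectly through a selection operator.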
  • 23. Some insights The derived update rule is closely related to those found in Evolution Strategies and a version of PBIL for continuous spaces. It is possible to view these existing algorithms as approximately doing KL minimization. The objective function appears explicitly in this update rule (no selection).
  • 24. Other Research in Learning/Modelling for Optimization J. A. Boyan and A. W. Moore. Learning Evaluation Functions to Improve Optimization by Local Search. Journal of Machine Learning Research 1:2, 2000. B. Anderson, A. Moore and D. Cohn. A Nonparametric Approach to Noisy and Costly Optimization. International Conference on Machine Learning, 2000. D. R. Jones. A Taxonomy of Global Optimization Methods Based on Response Surfaces. Journal of Global Optimization 21(4):345-383, 2001. Reinforcement learning: R. J. Williams (1992). Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8:229-256. V. V. Miagkikh and W. F. Punch III. An Approach to Solving Combinatorial Optimization Problems Using a Population of Reinforcement Learning Agents. Genetic and Evolutionary Computation Conf. (GECCO-99), p.1358-1365, 1999.
  • 25. Summary The field of metaheuristics (including Evolutionary Computation) has produced a large variety of optimization algorithms and demonstrated good performance on a range of real-world problems. Metaheuristics are considerably more general: they can even be applied when there isn’t a “true” objective function (coevolution), and they can evolve non-numerical objects.
  • 26. Summary EDAs take an explicit modelling approach to optimization. Existing statistical models and model-fitting algorithms can be employed. There is potential for solving challenging problems. The model can be more easily visualized/interpreted than a dynamic population in a conventional EA. Although the field is highly active, it is still relatively immature: improve the quality of experimental results; make sure research goals are well-defined; lots of preliminary ideas, but a lack of comparative/follow-up research; difficult to keep up with the literature and see connections with other fields.
  • 27. The End! Questions?