Automatically Defined Functions for Learning Classifier Systems
1. Evolutionary Computation Research Group
Code Fragments for Learning Classifier Systems
Scaling with LCS
Muhammad Iqbal
Victoria University of Wellington
Iqbal@ecs.vuw.ac.nz
Will N. Browne
Victoria University of Wellington
Will.Browne@vuw.ac.nz
Mengjie Zhang
Victoria University of Wellington
Mengjie.Zhang@ecs.vuw.ac.nz
2. Evolutionary Computation Research Group
Automatically Defined Functions (ADFs) for Learning Classifier Systems
Scaling with LCS
Muhammad Iqbal, Will N. Browne, Mengjie Zhang
Victoria University of Wellington
3. Outline
• Initial investigations into the scaling of LCS
• Three-year research question: can LCS scale to complex problems by learning from simpler related problems?
• Immediate question: can Automatically Defined Functions (ADFs) be useful to LCS?
4. Outline
• Genetic Programming
• Automatically Defined Functions
• Learning Classifier Systems (LCS)
• Code Fragmented LCS
• Automatically Defined Functions for LCS
• Results and Discussion
• Conclusions
• Future Work
5. Genetic Programming (GP)
• Evolutionary algorithm-based methodology
• Aims to discover a computer program that maps some input to some output
• Tree-based representation
• Example:
X    Output
0    1
1    3
2    7
3    13
4    21
...  ...
Output = F(X) = ?
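Tree-based GP searches over expression trees and scores each one against fitness cases like those in the table above. A minimal sketch, assuming nested tuples as the tree encoding, the arithmetic function set {+, -, *}, and sum of absolute errors as fitness (all choices made for illustration). The tree X*X + X + 1 is one function consistent with every row; the slides do not say which answer they intended.

```python
import operator

# Function set assumed for this sketch (the slides do not list one for this example).
FUNCTIONS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def evaluate(tree, x):
    """Evaluate an expression tree for input x.

    A tree is the terminal "X", an integer constant, or a tuple
    (op, left, right) where op is a key of FUNCTIONS.
    """
    if tree == "X":
        return x
    if isinstance(tree, int):
        return tree
    op, left, right = tree
    return FUNCTIONS[op](evaluate(left, x), evaluate(right, x))


def fitness(tree, cases):
    """Sum of absolute errors over the fitness cases; 0 is a perfect fit."""
    return sum(abs(evaluate(tree, x) - y) for x, y in cases)


# Fitness cases from the table.
CASES = [(0, 1), (1, 3), (2, 7), (3, 13), (4, 21)]

# (X * X) + (X + 1): one tree consistent with every row of the table.
candidate = ("+", ("*", "X", "X"), ("+", "X", 1))
# (X + X) + 1: matches the first two rows only.
wrong = ("+", ("+", "X", "X"), 1)

print(fitness(candidate, CASES))  # 0
print(fitness(wrong, CASES))      # 20
```

In a full GP run, crossover and mutation would recombine such tuples, and selection would favour trees with lower error.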
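7. Boolean Multiplexer
• $a$ address bits select one of $d = 2^a$ data bits
• Total number of inputs: $n = a + d$
• Number of test cases $= 2^n$
• 20-mux: $2^{20}$, about 1 million test cases
• 37-mux: $2^{37}$, about 137 billion test cases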
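A short sketch of the multiplexer and of the test-case counts. It assumes the address bits come first and are read most significant bit first; that ordering is a convention chosen here, not one stated on the slide.

```python
def multiplexer(bits, address_bits):
    """Boolean multiplexer: the first `address_bits` bits are read as a binary
    address (most significant first) selecting one of the 2**address_bits
    data bits that follow."""
    data_bits = 2 ** address_bits
    if len(bits) != address_bits + data_bits:
        raise ValueError("expected n = a + 2**a input bits")
    address = int("".join(str(b) for b in bits[:address_bits]), 2)
    return bits[address_bits + address]


def num_test_cases(address_bits):
    """Number of distinct inputs, 2**n with n = a + 2**a."""
    n = address_bits + 2 ** address_bits
    return 2 ** n


print(multiplexer([1, 0, 0, 0, 1, 0], 2))  # 1: address 10 (binary) = 2 selects D2
for a in (2, 3, 4, 5):
    n = a + 2 ** a
    print(f"{n}-mux: {num_test_cases(a)} test cases")
```

The loop covers the 6-, 11-, 20- and 37-bit problems used in the experiments; the count doubles with every added input, which is why exhaustive evaluation of the 37-mux is impractical.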
9. AND, OR, NAND, NOR
Symbols: AND: &, OR: |, NAND: d, NOR: r
X  Y  X&Y  X|Y  XdY  XrY
0  0   0    0    1    1
0  1   0    1    1    0
1  0   0    1    1    0
1  1   1    1    0    0
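These four functions are enough to build a multiplexer, because NAND(x, x) acts as NOT. A small sketch (not from the slides) expresses the 6-bit multiplexer using only AND, OR, NAND and NOR and checks it against a direct definition on all 64 inputs; the function names and the address-bit order (a1 high, a0 low) are assumptions.

```python
from itertools import product


# The four primitives from the table, on 0/1 integers.
def AND(x, y):
    return x & y


def OR(x, y):
    return x | y


def NAND(x, y):
    return 1 - (x & y)


def NOR(x, y):
    return 1 - (x | y)


def mux2(sel, if0, if1):
    """Return if1 when sel is 1, else if0, built only from the function set.
    NAND(sel, sel) acts as NOT, since NOT is not in the function set."""
    return OR(AND(NAND(sel, sel), if0), AND(sel, if1))


def mux6_from_primitives(a1, a0, d0, d1, d2, d3):
    # The low address bit a0 picks within each data pair; a1 picks the pair.
    return mux2(a1, mux2(a0, d0, d1), mux2(a0, d2, d3))


def mux6_reference(a1, a0, d0, d1, d2, d3):
    """Direct definition: address a1a0 (binary) indexes the data bits."""
    return (d0, d1, d2, d3)[2 * a1 + a0]


# Compare on all 2**6 = 64 fitness cases.
agree = all(
    mux6_from_primitives(*bits) == mux6_reference(*bits)
    for bits in product((0, 1), repeat=6)
)
print(agree)  # True
```

Note that mux2 is reused three times, which is exactly the kind of repeated subtree that ADFs aim to capture.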
10. Automatically Defined Functions (ADFs)
• Genetic programming trees often contain repeated patterns.
• Repeated subtrees can be treated as subroutines.
• ADFs are a methodology for automatically selecting and implementing modularity in GP.
• This modularity can:
• Reduce the size of the GP tree
• Reduce training time
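A minimal sketch of why subroutines shrink trees, assuming tuple-encoded trees and a single hypothetical ADF, ADF0, that computes XOR from the function set. The main tree calls ADF0 three times to compute 4-input odd parity; inlining the calls gives an equivalent ADF-free tree that is more than twice as large.

```python
from itertools import product

PRIMITIVES = {
    "AND": lambda x, y: x & y,
    "OR": lambda x, y: x | y,
    "NAND": lambda x, y: 1 - (x & y),
    "NOR": lambda x, y: 1 - (x | y),
}

# Hypothetical ADF: exclusive-or of its two parameters ARG0 and ARG1.
ADFS = {"ADF0": ("AND", ("OR", "ARG0", "ARG1"), ("NAND", "ARG0", "ARG1"))}


def evaluate(tree, env):
    """Evaluate a tree; env maps terminal names (D0, ARG0, ...) to 0/1."""
    if isinstance(tree, str):
        return env[tree]
    name, *args = tree
    values = [evaluate(arg, env) for arg in args]
    if name in ADFS:
        # Call the subroutine: bind argument values to ARG0, ARG1, ...
        return evaluate(ADFS[name], {f"ARG{i}": v for i, v in enumerate(values)})
    return PRIMITIVES[name](*values)


def substitute(tree, bindings):
    """Replace parameter terminals in tree with the bound subtrees."""
    if isinstance(tree, str):
        return bindings.get(tree, tree)
    name, *args = tree
    return (name, *(substitute(arg, bindings) for arg in args))


def expand(tree):
    """Inline every ADF call, giving the equivalent ADF-free tree."""
    if isinstance(tree, str):
        return tree
    name, *args = tree
    args = [expand(arg) for arg in args]
    if name not in ADFS:
        return (name, *args)
    bindings = {f"ARG{i}": arg for i, arg in enumerate(args)}
    return expand(substitute(ADFS[name], bindings))


def size(tree):
    """Number of nodes (functions and terminals) in a tree."""
    if isinstance(tree, str):
        return 1
    return 1 + sum(size(arg) for arg in tree[1:])


# 4-input odd parity: the main tree calls ADF0 three times.
main = ("ADF0", ("ADF0", "D0", "D1"), ("ADF0", "D2", "D3"))
inlined = expand(main)

for bits in product((0, 1), repeat=4):
    env = {f"D{i}": b for i, b in enumerate(bits)}
    assert evaluate(main, env) == evaluate(inlined, env) == sum(bits) % 2

print(size(main) + size(ADFS["ADF0"]))  # 14 nodes with the ADF
print(size(inlined))                    # 31 nodes without it
```

The gap grows with each extra level of reuse, since every inlined call copies its argument subtrees into the body.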
11. Comparison of GP Methods
Population without ADFs = 262144
Population with ADFs = 48640
24. Compare Code Fragmented Actions with Environment Action
1. Using the environmental condition.
2. Using the associated condition from the classifier rule itself (each # set to either 0 or 1).
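One possible reading of the two options, sketched under assumptions: a classifier's action is a code fragment (here a plain Python function) evaluated on a bit string to produce the action. Option 1 feeds it the environment message; option 2 feeds it the rule's own condition with each # replaced by a random 0 or 1, as the title of slide 27 suggests. The function names and the example fragment are hypothetical, not the authors' implementation.

```python
import random


def sample_condition(condition, rng):
    """Replace each '#' in a ternary condition with a random 0 or 1."""
    return [rng.randint(0, 1) if c == "#" else int(c) for c in condition]


def matches(condition, message):
    """Standard ternary match: '#' matches anything, 0/1 must be equal."""
    return all(c == "#" or int(c) == m for c, m in zip(condition, message))


def action_from_environment(fragment, message):
    """Option 1: evaluate the code fragment on the environment message."""
    return fragment(message)


def action_from_rule(fragment, condition, rng):
    """Option 2: evaluate the code fragment on the rule's own condition,
    with every '#' set to either 0 or 1."""
    return fragment(sample_condition(condition, rng))


def frag(b):
    """Hypothetical code-fragment action: AND(NOT b0, b3)."""
    return (1 - b[0]) & b[3]


rng = random.Random(0)
condition = "01#1##"
message = [0, 1, 0, 1, 1, 0]

if matches(condition, message):
    print(action_from_environment(frag, message))      # 1
    print(action_from_rule(frag, condition, rng))      # 1
```

When the fragment reads only positions the condition specifies, as here, both options agree; they can differ when it reads a position the condition leaves as #.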
25. Code Fragmented Actions - Message
[Figure: Multiplexer performance vs. instances (0 to 120,000) for the 6-, 11- and 20-bit problems, using binary actions vs. code fragmented actions.]
26. Code Fragmented Actions - 2
[Figure: Multiplexer performance vs. instances (0 to 120,000) for the 6-, 11- and 20-bit problems, using binary actions vs. code fragmented actions.]
27. Code Fragmented Actions - Rule Sample (Random 0 or 1 for #)
[Figure: Multiplexer performance vs. instances (0 to 120,000) for the 6-, 11- and 20-bit problems, using binary actions vs. code fragmented actions.]
28. Code Fragmented Actions - 4
[Figure: Multiplexer performance vs. instances (0 to 70,000) for the 6-, 11- and 20-bit problems, using binary actions vs. code fragmented actions.]
40. Comparison of XCS using ADFs
[Figure: Multiplexer performance vs. instances (0 to 50,000) for the 6-, 11- and 20-bit problems, standard XCS vs. XCS with ADFs.]
Number of ADFs used = 10
41. Comparison of XCS using ADFs
[Figure: 37-bit multiplexer performance vs. instances (0 to 500,000), XCS using 20 ADFs vs. standard XCS.]
Results from a single run.
42. Comparison using Multilevel ADFs
[Figure: 37-bit multiplexer performance vs. instances (0 to 500,000), XCS using multilevel ADFs vs. standard XCS.]
Number of classifiers used = 8000; average of 20 runs.
43. Conclusions
• Code fragments capture important information.
• Automatically Defined Functions reduce training time in GP.
• Automatically Defined Functions reduce the number of iterations needed during training in LCS.
• Automatically Defined Functions produce compact GP trees.
• The many-genotypes-to-one-phenotype issue in feature-rich encodings (code fragments and ADFs) disrupts the subsumption deletion function.
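The many-genotypes-to-one-phenotype issue can be seen in a few lines. Below, two syntactically different code fragments compute the same Boolean function. A check that compares encodings symbol by symbol (as the ternary XCS subsumption test does) treats them as different, while comparing truth tables shows they are equivalent. The tuple encoding and names are assumptions for this sketch, not the authors' implementation.

```python
from itertools import product

OPS = {
    "AND": lambda x, y: x & y,
    "OR": lambda x, y: x | y,
    "NAND": lambda x, y: 1 - (x & y),
    "NOR": lambda x, y: 1 - (x | y),
}


def evaluate(tree, bits):
    """Evaluate a tuple-encoded tree; terminals 'D0', 'D1', ... index `bits`."""
    if isinstance(tree, str):
        return bits[int(tree[1:])]
    op, left, right = tree
    return OPS[op](evaluate(left, bits), evaluate(right, bits))


def truth_table(tree, n_inputs):
    """The tree's phenotype: its output on every one of the 2**n inputs."""
    return tuple(evaluate(tree, bits) for bits in product((0, 1), repeat=n_inputs))


a = ("AND", "D0", "D1")
# De Morgan form of the same function: NOR(NOT D0, NOT D1), with NOT x = NAND(x, x).
b = ("NOR", ("NAND", "D0", "D0"), ("NAND", "D1", "D1"))

print(a == b)                                  # False: different genotypes
print(truth_table(a, 2) == truth_table(b, 2))  # True: same phenotype
```

Exhaustive comparison costs $2^n$ evaluations per fragment, so it is only practical when a fragment reads a small number of inputs.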
44. Future Work
• Simplification into ADFs in LCS
• Remove non-responsive classifiers.
• MAM technique for ADFs’ fitness.
• Seed fit ADFs identified in a simple problem into a more complex problem in the same domain.
• Multiple populations of ADFs from different problem domains for a general problem-solving system.
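MAM (moyenne adaptive modifiée) is the update rule XCS uses for classifier parameters: until a parameter has been updated about $1/\beta$ times it is set to the plain average of the targets seen so far, after which the Widrow-Hoff delta rule with learning rate $\beta$ takes over. A minimal sketch of how it might be applied to an ADF fitness estimate; the class name, $\beta = 0.2$ and the per-ADF reward stream are assumptions, since the slide only names the technique as future work.

```python
class MAMEstimate:
    """Running estimate updated with the MAM technique used in XCS."""

    def __init__(self, beta=0.2, initial=0.0):
        self.beta = beta          # learning rate used once experience is high
        self.value = initial
        self.experience = 0       # number of updates so far

    def update(self, target):
        self.experience += 1
        if self.experience < 1.0 / self.beta:
            # Early phase: plain running average of all targets seen so far.
            self.value += (target - self.value) / self.experience
        else:
            # Later phase: Widrow-Hoff delta rule with fixed rate beta.
            self.value += self.beta * (target - self.value)
        return self.value


# Hypothetical per-ADF reward stream (e.g. 1 when a classifier using the ADF is correct).
adf_fitness = MAMEstimate(beta=0.2)
for reward in [1, 0, 1, 0, 1, 0]:
    print(round(adf_fitness.update(reward), 3))
```

Averaging early keeps a young estimate from depending on its arbitrary initial value, while the fixed rate later lets it track changes.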
47. GP Resource Demands
• GP is notoriously resource-consuming:
• CPU cycles
• Memory
• Standard GP system, 1 µs per node
• Binary trees, depth 17: 131 ms per tree
• Fitness cases: 1,000; population size: 1,000
• Generations: 1,000; number of runs: 100
» Runtime: 10 Gs ≈ 317 years
• Standard GP system, 1 ns per node
» Runtime: 116 days
• Limits to what we can approach with GP
[Banzhaf and Harding, GECCO 2009]
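The estimate can be reproduced as below, assuming a full binary tree of $2^{17} - 1 = 131{,}071$ nodes (which gives the quoted 131 ms at 1 µs per node) evaluated once per fitness case. Multiplying the quoted figures exactly gives about $1.3 \times 10^{10}$ s (roughly 415 years) at 1 µs per node and about 152 days at 1 ns per node, so the slide's 10 Gs, 317 years and 116 days appear to be order-of-magnitude roundings to $10^{16}$ node evaluations.

```python
SECONDS_PER_DAY = 24 * 3600
SECONDS_PER_YEAR = 365.25 * SECONDS_PER_DAY

nodes_per_tree = 2 ** 17 - 1  # full binary tree of depth 17 (root counted as depth 1)
# fitness cases x population size x generations x runs
tree_evaluations = 1_000 * 1_000 * 1_000 * 100
node_evaluations = nodes_per_tree * tree_evaluations

print(f"time per tree at 1 us/node: {nodes_per_tree * 1e-6 * 1000:.0f} ms")
for label, seconds_per_node in [("1 us/node", 1e-6), ("1 ns/node", 1e-9)]:
    total = node_evaluations * seconds_per_node
    print(f"{label}: {total:.3g} s, {total / SECONDS_PER_YEAR:.3g} years, "
          f"{total / SECONDS_PER_DAY:.3g} days")
```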
48. Sources of Speed-up
• Fast machines
• Vector Processors
• Parallel Machines (MIMD/SIMD)
• Clusters
• Loose Networks
• Multi-core
• Graphics Processing Units (GPU)
49. Why is a GPU faster than a CPU?
The GPU devotes more transistors to data processing.
[CUDA C Programming Guide Version 3.2]
50. GPU Programming APIs (Application Programming Interfaces)
• There are a number of toolkits available for programming GPUs:
• CUDA
• MS Accelerator
• RapidMind
• Shader programming
• So far, researchers in GP have not converged on one platform.
52. CUDA Memory Model
CUDA exposes all the different types of memory on the GPU.
[Figure: CUDA memory model]
[CUDA C Programming Guide Version 3.2]
54. A Many Threaded CUDA Interpreter for Genetic Programming
• Solved the 20-bit Multiplexer
• $2^{20}$ = 1048576 fitness cases
• Had never been solved by tree GP before
• Previously estimated time: more than 4 years
• GPU has consistently done it in less than an hour
• Solved the 37-bit Multiplexer
• $2^{37}$ = 137438953472 fitness cases
• Had never been attempted before using GP
• GPU solves it in under a day
[W.B.Langdon, EuroGP-2010]
55. Genetic Programming Parameters for Solving 20 and 37 Multiplexers
Terminals: 20 or 37 Boolean inputs, D0 to D19 or D0 to D36 respectively
Functions: AND, OR, NAND, NOR
Fitness: pseudo-random sample of 2048 of 1048576 or 8192 of 137438953472 fitness cases
Tournament: 4 members run on the same random sample; new samples for each tournament and each generation
Population: 262144
Initial population: ramped half-and-half, 4:5 (20-Mux) or 5:7 (37-Mux)
Parameters: 50% subtree crossover, 5% subtree mutation, 45% point mutation; max depth 15, max size 511 (20-Mux) or 1023 (37-Mux)
Termination: 5000 generations
Solutions are found in generations 423 (20-Mux) and 2866 (37-Mux).
[W.B.Langdon, EuroGP-2010]
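The tournament scheme in this table (4 contestants scored on one shared pseudo-random sample, with a fresh sample for every tournament) can be sketched in plain Python. This is an illustration, not Langdon's CUDA interpreter: individuals are assumed to be callables on a list of 20 bits, the first four bits are assumed to be the address bits, and the toy population is hypothetical.

```python
import random


def mux20(bits):
    """20-bit multiplexer: the first 4 bits address one of 16 data bits."""
    address = bits[0] * 8 + bits[1] * 4 + bits[2] * 2 + bits[3]
    return bits[4 + address]


def to_bits(case, n=20):
    """Decode a fitness-case index into n input bits, most significant first."""
    return [(case >> (n - 1 - i)) & 1 for i in range(n)]


def tournament(population, rng, size=4, sample_size=2048, n=20):
    """Choose `size` contestants and score them all on the SAME freshly drawn
    random sample of fitness cases; return the contestant with most hits."""
    contestants = rng.sample(population, size)
    cases = [to_bits(c, n) for c in rng.sample(range(2 ** n), sample_size)]

    def hits(individual):
        return sum(individual(bits) == mux20(bits) for bits in cases)

    return max(contestants, key=hits)


def always_zero(bits):
    return 0


def always_one(bits):
    return 1


def first_data_bit(bits):
    return bits[4]


rng = random.Random(42)
population = [always_zero, always_one, first_data_bit, mux20]
winner = tournament(population, rng)
print(winner.__name__)  # mux20
```

Scoring all contestants on one sample keeps the comparison within a tournament fair, and drawing a new sample each time presumably stops selection from favouring programs that only fit one fixed subset of the $2^{20}$ cases.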
60. Comparison of XCS using ADFs
[Figure: 37-bit multiplexer performance vs. instances (0 to 500,000), XCS using standalone ADFs vs. standard XCS.]
Number of classifiers used = 8000. Results from a single run.
65. Comparison of XCS using ADFs
[Figure: 37-bit multiplexer performance vs. instances (0 to 500,000), XCS using 20 ADFs vs. standard XCS.]
Number of classifiers used = 8000. Results from a single run.
66. Comparison of XCS using ADFs
[Figure: 37-bit multiplexer performance vs. instances (0 to 500,000), XCS using 100 ADFs vs. standard XCS.]
Number of classifiers used = 8000. Results from a single run.
67. XCS using Multilevel ADFs
• Code fragments do not explore the search space as efficiently as ADFs do.
• Evaluating an ADF tree takes a lot of time because ADFs make nested calls to other ADFs.
• ADFs that cannot call other ADFs lie between these two techniques, both in how well they explore the search space and in how long that takes.
• So we tried one more option: multilevel ADFs.
68. XCS using Multilevel ADFs
• Three-level ADFs
• 20 ADFs at each level
• Each ADF takes two arguments
• ADFs at level 1 can call any ADF from level 2 or level 3, but cannot call any ADF from level 1
• ADFs at level 2 can call any ADF from level 3, but cannot call any ADF from level 1 or level 2
• Level 3 ADFs are not allowed to call any other ADF
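A minimal sketch of the three-level calling rule. Only the three levels, 20 ADFs per level, two arguments per ADF and the rule that an ADF may call only ADFs at deeper levels come from the slide; the tuple encoding, the AND/OR/NAND/NOR function set, random depth-2 bodies, names such as L1_ADF0 and a main program that calls a level-1 ADF are all hypothetical.

```python
import random

LEVELS = 3
ADFS_PER_LEVEL = 20
OPS = {
    "AND": lambda x, y: x & y,
    "OR": lambda x, y: x | y,
    "NAND": lambda x, y: 1 - (x & y),
    "NOR": lambda x, y: 1 - (x | y),
}


def adf_name(level, index):
    return f"L{level}_ADF{index}"


def level_of(name):
    """Level encoded in an ADF name such as 'L2_ADF7'."""
    return int(name[1:name.index("_")])


def callable_adfs(level):
    """An ADF at `level` may call only ADFs at strictly deeper levels."""
    return [adf_name(lv, i) for lv in range(level + 1, LEVELS + 1)
            for i in range(ADFS_PER_LEVEL)]


def random_body(level, rng, depth=2):
    """Random two-argument ADF body over ARG0/ARG1 obeying the calling rule."""
    if depth == 0:
        return rng.choice(["ARG0", "ARG1"])
    name = rng.choice(list(OPS) + callable_adfs(level))
    return (name, random_body(level, rng, depth - 1),
            random_body(level, rng, depth - 1))


def build_library(rng):
    return {adf_name(lv, i): random_body(lv, rng)
            for lv in range(1, LEVELS + 1) for i in range(ADFS_PER_LEVEL)}


def evaluate(tree, env, library):
    """Evaluate a tree; ADF calls bind their two argument values to ARG0/ARG1."""
    if isinstance(tree, str):
        return env[tree]
    name, left, right = tree
    x, y = evaluate(left, env, library), evaluate(right, env, library)
    if name in OPS:
        return OPS[name](x, y)
    return evaluate(library[name], {"ARG0": x, "ARG1": y}, library)


def called_adfs(tree):
    """Set of ADF names called directly in a tree."""
    if isinstance(tree, str):
        return set()
    name, left, right = tree
    own = set() if name in OPS else {name}
    return own | called_adfs(left) | called_adfs(right)


library = build_library(random.Random(7))
# Every call goes to a deeper level, so the call graph has no cycles.
assert all(level_of(callee) > level_of(caller)
           for caller, body in library.items() for callee in called_adfs(body))

# Assumed: a main program calls a level-1 ADF on the problem inputs.
main = (adf_name(1, 0), "D0", "D1")
print(evaluate(main, {"D0": 1, "D1": 0}, library))
```

Because calls always go to a deeper level, evaluation always terminates and ADF calls nest at most three levels deep, which limits the nested-call cost noted on the previous slide.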
70. Comparison using Multilevel ADFs
[Figure: Multiplexer performance vs. instances (0 to 80,000) for the 6-, 11- and 20-bit problems, standard XCS vs. XCS with multilevel ADFs.]