2013-1 Machine Learning Lecture 07 - Michael Negnevitsky - Hybrid Intellig…

Negnevitsky, Pearson Education, 2011
Lecture 12
Hybrid intelligent systems: Evolutionary neural networks and fuzzy evolutionary systems
• Introduction
• Evolutionary neural networks
• Fuzzy evolutionary systems
• Summary

Evolutionary neural networks
• Although neural networks are used for solving a variety of problems, they still have some limitations.
• One of the most common is associated with neural network training. The back-propagation learning algorithm cannot guarantee an optimal solution. In real-world applications, the back-propagation algorithm might converge to a set of sub-optimal weights from which it cannot escape. As a result, the neural network is often unable to find a desirable solution to a problem at hand.

• Another difficulty is related to selecting an optimal topology for the neural network. The “right” network architecture for a particular problem is often chosen by means of heuristics, and designing a neural network topology is still more art than engineering.
• Genetic algorithms are an effective optimisation technique that can guide both weight optimisation and topology selection.

Encoding a set of weights in a chromosome
[Figure: a network with input neurons 1, 2 and 3 (x1, x2, x3), hidden neurons 4 to 7 and output neuron 8 (y), with the weights below shown on its connections.]

Weight matrix (rows: to neuron; columns: from neuron):

To \ From     1      2      3      4      5      6      7      8
    1         0      0      0      0      0      0      0      0
    2         0      0      0      0      0      0      0      0
    3         0      0      0      0      0      0      0      0
    4         0.9   -0.3   -0.7    0      0      0      0      0
    5        -0.8    0.6    0.3    0      0      0      0      0
    6         0.1   -0.2    0.2    0      0      0      0      0
    7         0.4    0.5    0.8    0      0      0      0      0
    8         0      0      0     -0.6    0.1   -0.2    0.9    0

Chromosome: 0.9 -0.3 -0.7 -0.8 0.6 0.3 0.1 -0.2 0.2 0.4 0.5 0.8 -0.6 0.1 -0.2 0.9
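The encoding shown on this slide can be sketched in a few lines of Python. This is only an illustration: the function name `encode_weights` is hypothetical, and the sketch assumes that a zero entry in the matrix means "no connection", so a real connection whose weight happened to be exactly 0 would be dropped.

```python
# Weight matrix from the slide: W[to][from], neurons 1..8 stored at indices 0..7.
W = [
    [0,    0,    0,    0,    0,   0,    0,   0],
    [0,    0,    0,    0,    0,   0,    0,   0],
    [0,    0,    0,    0,    0,   0,    0,   0],
    [0.9, -0.3, -0.7,  0,    0,   0,    0,   0],
    [-0.8, 0.6,  0.3,  0,    0,   0,    0,   0],
    [0.1, -0.2,  0.2,  0,    0,   0,    0,   0],
    [0.4,  0.5,  0.8,  0,    0,   0,    0,   0],
    [0,    0,    0,   -0.6,  0.1, -0.2, 0.9, 0],
]


def encode_weights(matrix):
    """Return (genes, chromosome).

    Each gene holds the incoming weights of one neuron; the chromosome is
    the genes strung together row by row. Assumption: 0 means no connection.
    """
    genes = [[w for w in row if w != 0] for row in matrix]
    genes = [g for g in genes if g]  # input neurons have no incoming weights
    chromosome = [w for gene in genes for w in gene]
    return genes, chromosome


genes, chromosome = encode_weights(W)
print(chromosome)
```

Keeping the per-neuron grouping (`genes`) alongside the flat chromosome is a design choice made here so that genetic operators can later work on whole groups of weights.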

• The second step is to define a fitness function for evaluating the chromosome’s performance. This function must estimate the performance of a given neural network. We can apply here a simple function defined by the sum of squared errors.
• The training set of examples is presented to the network, and the sum of squared errors is calculated. The smaller the sum, the fitter the chromosome. The genetic algorithm attempts to find a set of weights that minimises the sum of squared errors.
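A sum-of-squared-errors fitness evaluation might look like the sketch below. Several details are assumptions, not taken from the slides: the network is fully connected with one hidden layer, uses sigmoid activations and no thresholds, and the chromosome lists the incoming weights of each hidden neuron followed by those of the single output neuron. The training set is made up for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(chromosome, x, n_hidden):
    """Output of an n_in-n_hidden-1 network whose weights come from the chromosome."""
    n_in = len(x)
    hidden = []
    for h in range(n_hidden):
        w = chromosome[h * n_in:(h + 1) * n_in]  # incoming weights of hidden neuron h
        hidden.append(sigmoid(sum(wi * xi for wi, xi in zip(w, x))))
    w_out = chromosome[n_hidden * n_in:]         # incoming weights of the output neuron
    return sigmoid(sum(wi * hi for wi, hi in zip(w_out, hidden)))


def sum_squared_errors(chromosome, training_set, n_hidden):
    """Smaller is fitter: the GA tries to minimise this value."""
    return sum((target - forward(chromosome, x, n_hidden)) ** 2
               for x, target in training_set)


# 16-weight chromosome for a 3-4-1 network; the training set is hypothetical.
chromosome = [0.9, -0.3, -0.7, -0.8, 0.6, 0.3, 0.1, -0.2, 0.2,
              0.4, 0.5, 0.8, -0.6, 0.1, -0.2, 0.9]
training_set = [([0, 0, 1], 0.0), ([1, 0, 1], 1.0), ([1, 1, 0], 1.0)]
print(sum_squared_errors(chromosome, training_set, n_hidden=4))
```

Because a GA usually maximises fitness, an implementation would typically rank chromosomes by the negated sum or by its reciprocal; that choice is left open here.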

• The third step is to choose the genetic operators: crossover and mutation. A crossover operator takes two parent chromosomes and creates a single child with genetic material from both parents. Each gene in the child’s chromosome is represented by the corresponding gene of the randomly selected parent.
• A mutation operator selects a gene in a chromosome and adds a small random value between −1 and 1 to each weight in this gene.
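The two operators described above can be sketched as follows. The sketch assumes that a gene is the list of incoming weights of one neuron, that the parent supplying each gene is chosen with equal probability, and that a separate random value is drawn for every weight of the mutated gene; the slides do not settle these details. The parent values are hypothetical.

```python
import random


def crossover(parent1, parent2, rng=random):
    """Create one child; each gene is copied from a randomly selected parent."""
    return [list(rng.choice((g1, g2))) for g1, g2 in zip(parent1, parent2)]


def mutate(chromosome, rng=random):
    """Select one gene and add a random value in [-1, 1] to each of its weights."""
    mutated = [list(gene) for gene in chromosome]
    k = rng.randrange(len(mutated))
    mutated[k] = [w + rng.uniform(-1.0, 1.0) for w in mutated[k]]
    return mutated


# Hypothetical parents for a 2-3-1 network: three hidden-neuron genes
# with two weights each, then one output-neuron gene with three weights.
parent1 = [[0.1, -0.7], [-0.6, 0.5], [-0.8, -0.2], [0.9, 0.4, -0.3]]
parent2 = [[0.3, 0.2], [0.3, -0.9], [0.6, 0.9], [-0.5, -0.8, -0.1]]

rng = random.Random(42)  # seeded only to make the run repeatable
child = crossover(parent1, parent2, rng)
mutated = mutate(child, rng)
print(child)
print(mutated)
```

Swapping whole genes instead of single weights keeps the weights feeding one neuron together, so a useful group of weights found by one parent can pass to the child intact.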

Crossover in weight optimisation
[Figure: two parent networks, Parent 1 and Parent 2, each with input neurons 1 and 2 (x1, x2), hidden neurons 3, 4 and 5 and output neuron 6 (y), shown together with their nine-weight chromosomes; the Child network below them has a chromosome assembled from genes of both parents.]

Mutation in weight optimisation
[Figure: the Original network and the Mutated network, each with input neurons 1 and 2 (x1, x2), hidden neurons 3, 4 and 5 and output neuron 6 (y). Mutating one gene changes the weights 0.4 and -0.3 to -0.1 and 0.2.]
Original chromosome: 0.1 -0.7 -0.6 0.5 -0.8 -0.2 0.9 0.4 -0.3
Mutated chromosome: 0.1 -0.7 -0.6 0.5 -0.8 -0.2 0.9 -0.1 0.2

Can genetic algorithms help us in selecting the network architecture?
The architecture of the network (i.e. the number of neurons and their interconnections) often determines the success or failure of the application. Usually the network architecture is decided by trial and error; there is a great need for a method of automatically designing the architecture for a particular application. Genetic algorithms may well be suited for this task.

• The basic idea behind evolving a suitable network architecture is to conduct a genetic search in a population of possible architectures.
• We must first choose a method of encoding a network’s architecture into a chromosome.

Encoding the network architecture
• The connection topology of a neural network can be represented by a square connectivity matrix.
• Each entry in the matrix defines the type of connection from one neuron (column) to another (row), where 0 means no connection and 1 denotes a connection for which the weight can be changed through learning.
• To transform the connectivity matrix into a chromosome, we need only to string the rows of the matrix together.

Encoding of the network topology
[Figure: the encoded network, with input neurons 1 and 2 (x1, x2), hidden neurons 3, 4 and 5 and output neuron 6 (y).]

Connectivity matrix (rows: to neuron; columns: from neuron):

To \ From   1  2  3  4  5  6
    1       0  0  0  0  0  0
    2       0  0  0  0  0  0
    3       1  1  0  0  0  0
    4       1  0  0  0  0  0
    5       0  1  0  0  0  0
    6       0  1  1  1  1  0

Chromosome:
0 0 0 0 0 0 0 0 0 0 0 0 1 1 0 0 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 1 1 1 1 0
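Stringing the rows of the connectivity matrix together, and recovering the connections again, can be sketched as below. The function names are hypothetical; the matrix is the six-neuron example from this slide.

```python
# Connectivity matrix from the slide: C[to][from], neurons 1..6 at indices 0..5.
C = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 0],
]


def encode_topology(matrix):
    """String the rows of the connectivity matrix together."""
    return [bit for row in matrix for bit in row]


def decode_topology(chromosome, n):
    """Rebuild the n x n connectivity matrix from a chromosome."""
    return [chromosome[r * n:(r + 1) * n] for r in range(n)]


def connections(matrix):
    """List the (from, to) neuron pairs, using the slide's 1-based numbering."""
    return [(col + 1, row + 1)
            for row, bits in enumerate(matrix)
            for col, bit in enumerate(bits) if bit == 1]


topology = encode_topology(C)
print(" ".join(str(bit) for bit in topology))
print(connections(decode_topology(topology, 6)))
```

A chromosome produced by crossover or mutation may decode to a matrix with feedback or missing paths; how such networks are handled is not covered by the slides and is left out of this sketch.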

The cycle of evolving a neural network topology
[Figure: in Generation i, each chromosome is decoded into a network (Neural Network j), which is evaluated on the Training Data Set and given a fitness value (for example, Fitness = 117). Parent 1 and Parent 2 are selected, Crossover produces Child 1 and Child 2, Mutation is applied, and the offspring form Generation (i + 1).]

Fuzzy evolutionary systems
• Evolutionary computation is also used in the design of fuzzy systems, particularly for generating fuzzy rules and adjusting membership functions of fuzzy sets.
• In this section, we introduce an application of genetic algorithms to select an appropriate set of fuzzy IF-THEN rules for a classification problem.
• For a classification problem, a set of fuzzy IF-THEN rules is generated from numerical data.
• First, we use a grid-type fuzzy partition of an input space.

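Fuzzy partition by a 3×3 fuzzy grid
[Figure: the input space X1 × X2, with both axes running from 0 to 1, partitioned by fuzzy sets A1, A2, A3 on x1 and B1, B2, B3 on x2. The membership functions µ(x1) and µ(x2) are drawn along the axes, and 16 numbered training patterns of Class 1 and Class 2 are plotted in the grid.]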

Fuzzy partition
• Black and white dots denote the training patterns of Class 1 and Class 2, respectively.
• The grid-type fuzzy partition can be seen as a rule table.
• The linguistic values of input x1 (A1, A2 and A3) form the horizontal axis, and the linguistic values of input x2 (B1, B2 and B3) form the vertical axis.
• At the intersection of a row and a column lies the rule consequent.

In the rule table, each fuzzy subspace can have only one fuzzy IF-THEN rule, and thus the total number of rules that can be generated in a K×K grid is equal to K×K.

Fuzzy rules that correspond to the K×K fuzzy partition can be represented in a general form as:

\[
\text{Rule } R_{ij}:\quad \text{IF } x_{1p} \text{ is } A_i \text{ AND } x_{2p} \text{ is } B_j \text{ THEN } x_p \in C_n \left\{ CF^{C_n}_{A_i B_j} \right\}
\]
\[
i = 1, 2, \ldots, K; \quad j = 1, 2, \ldots, K; \quad x_p = (x_{1p}, x_{2p}), \quad p = 1, 2, \ldots, P
\]

where x_p is a training pattern on input space X1×X2, P is the total number of training patterns, C_n is the rule consequent (either Class 1 or Class 2), and CF^{C_n}_{A_i B_j} is the certainty factor that a pattern in fuzzy subspace A_i B_j belongs to class C_n.

To determine the rule consequent and the certainty factor, we use the following procedure:
Step 1: Partition an input space into K×K fuzzy subspaces, and calculate the strength of each class of training patterns in every fuzzy subspace.
Each class in a given fuzzy subspace is represented by its training patterns. The more training patterns, the stronger the class. In a given fuzzy subspace, the rule consequent becomes more certain when patterns of one particular class appear more often than patterns of any other class.
Step 2: Determine the rule consequent and the certainty factor in each fuzzy subspace.
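The slides do not give the formulas behind Step 1 and Step 2, so the following sketch fills them in with a commonly used formulation, which should be read as an assumption: evenly spaced triangular membership functions on [0, 1], class strength equal to the sum over that class's patterns of µ_Ai(x1)·µ_Bj(x2), the strongest class as consequent, and a certainty factor equal to the winning strength minus the average strength of the other classes, divided by the total strength. Function names are hypothetical, and classes are numbered from 0 (so 0 stands for Class 1).

```python
def triangular(x, k, K):
    """Membership of x in the k-th (0-based) of K >= 2 evenly spaced
    triangular fuzzy sets on [0, 1] (an assumed shape)."""
    centre = k / (K - 1)
    width = 1.0 / (K - 1)
    return max(0.0, 1.0 - abs(x - centre) / width)


def generate_rule(patterns, i, j, K, n_classes=2):
    """Return (consequent, certainty_factor) for subspace A_i B_j,
    or None when the consequent cannot be determined."""
    # Step 1: strength of each class in this fuzzy subspace.
    beta = [0.0] * n_classes
    for (x1, x2), cls in patterns:
        beta[cls] += triangular(x1, i, K) * triangular(x2, j, K)
    total = sum(beta)
    strongest = max(beta)
    if total == 0.0 or beta.count(strongest) > 1:
        return None  # no training patterns here, or equally strong classes
    # Step 2: consequent and certainty factor.
    consequent = beta.index(strongest)
    others = (total - strongest) / (n_classes - 1)
    return consequent, (strongest - others) / total


# Hypothetical training patterns: ((x1, x2), class index).
patterns = [((0.0, 0.0), 0), ((0.1, 0.1), 0), ((0.9, 0.8), 1)]
print(generate_rule(patterns, 0, 0, K=3))
print(generate_rule(patterns, 2, 2, K=3))
```

Under these assumptions the certainty factor is 1 when only one class is present in the subspace and approaches 0 as the class strengths become similar.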

The certainty factor can be interpreted as follows:
• If all the training patterns in fuzzy subspace A_i B_j belong to the same class, then the certainty factor is maximum and it is certain that any new pattern in this subspace will belong to this class.
• If, however, training patterns belong to different classes and these classes have similar strengths, then the certainty factor is minimum and it is uncertain that a new pattern will belong to any particular class.

• This means that patterns in a fuzzy subspace can be misclassified. Moreover, if a fuzzy subspace does not have any training patterns, we cannot determine the rule consequent at all.
• If a fuzzy partition is too coarse, many patterns may be misclassified. On the other hand, if a fuzzy partition is too fine, many fuzzy rules cannot be obtained, because of the lack of training patterns in the corresponding fuzzy subspaces.

Training patterns are not necessarily distributed evenly in the input space. As a result, it is often difficult to choose an appropriate density for the fuzzy grid. To overcome this difficulty, we use multiple fuzzy rule tables.

Multiple fuzzy rule tables
[Figure: five fuzzy rule tables, for K = 2, K = 3, K = 4, K = 5 and K = 6.]
Fuzzy IF-THEN rules are generated for each fuzzy subspace of multiple fuzzy rule tables, and thus a complete set of rules for our case can be specified as:

\[
2^2 + 3^2 + 4^2 + 5^2 + 6^2 = 4 + 9 + 16 + 25 + 36 = 90 \text{ rules.}
\]

Once the set of rules S_ALL is generated, a new pattern, x = (x1, x2), can be classified by the following procedure:
Step 1: In every fuzzy subspace of the multiple fuzzy rule tables, calculate the degree of compatibility of a new pattern with each class.
Step 2: Determine the maximum degree of compatibility of the new pattern with each class.
Step 3: Determine the class with which the new pattern has the highest degree of compatibility, and assign the pattern to this class.
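The three classification steps can be sketched as follows. The slides do not define the degree of compatibility, so the sketch assumes it is µ_Ai(x1)·µ_Bj(x2) multiplied by the rule's certainty factor, with evenly spaced triangular membership functions on [0, 1]. The rule tuples `(K, i, j, class, CF)` and their values are hypothetical, and classes are numbered from 0.

```python
def triangular(x, k, K):
    """Membership of x in the k-th (0-based) of K >= 2 evenly spaced
    triangular fuzzy sets on [0, 1] (an assumed shape)."""
    centre = k / (K - 1)
    width = 1.0 / (K - 1)
    return max(0.0, 1.0 - abs(x - centre) / width)


def classify(x, rules, n_classes=2):
    """Return the class index for pattern x, or None if it cannot be classified."""
    x1, x2 = x
    best = [0.0] * n_classes
    for K, i, j, cls, cf in rules:
        # Step 1: compatibility of x with this rule's class in subspace A_i B_j.
        alpha = triangular(x1, i, K) * triangular(x2, j, K) * cf
        # Step 2: keep the maximum compatibility seen for each class.
        best[cls] = max(best[cls], alpha)
    # Step 3: assign the class with the highest compatibility.
    top = max(best)
    if top == 0.0 or best.count(top) > 1:
        return None
    return best.index(top)


# Hypothetical rules drawn from two rule tables (K = 2 and K = 3).
rules = [(2, 0, 0, 0, 0.9), (2, 1, 1, 1, 0.8), (3, 1, 1, 1, 0.6)]
print(classify((0.1, 0.2), rules))
print(classify((0.9, 0.9), rules))
```

Returning `None` for ties or for patterns that no rule covers is one possible convention; a pattern left unclassified this way would count as not correctly classified.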

The number of multiple fuzzy rule tables required for an accurate pattern classification may be large. Consequently, a complete set of rules can be enormous. Meanwhile, these rules have different classification abilities, and thus by selecting only rules with high potential for accurate classification, we reduce the number of rules.

Can we use genetic algorithms for selecting fuzzy IF-THEN rules?
• The problem of selecting fuzzy IF-THEN rules can be seen as a combinatorial optimisation problem with two objectives.
• The first, more important, objective is to maximise the number of correctly classified patterns.
• The second objective is to minimise the number of rules.
• Genetic algorithms can be applied to this problem.

A basic genetic algorithm for selecting fuzzy IF-THEN rules includes the following steps:
Step 1: Randomly generate an initial population of chromosomes. The population size may be relatively small, say 10 or 20 chromosomes. Each gene in a chromosome corresponds to a particular fuzzy IF-THEN rule in the rule set defined by S_ALL.
Step 2: Calculate the performance, or fitness, of each individual chromosome in the current population.

The problem of selecting fuzzy rules has two objectives: to maximise the accuracy of the pattern classification and to minimise the size of a rule set. The fitness function has to accommodate both these objectives. This can be achieved by introducing two respective weights, w_P and w_N, in the fitness function:

\[
f(S) = w_P \frac{P_s}{P_{ALL}} - w_N \frac{N_S}{N_{ALL}}
\]

where P_s is the number of patterns classified successfully, P_ALL is the total number of patterns presented to the classification system, and N_S and N_ALL are the numbers of fuzzy IF-THEN rules in set S and set S_ALL, respectively.

The classification accuracy is more important than the size of a rule set. That is, with w_P = 10 and w_N = 1:

\[
f(S) = 10 \frac{P_s}{P_{ALL}} - \frac{N_S}{N_{ALL}}
\]
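As a quick numerical illustration of this fitness function, the sketch below evaluates f(S) for two made-up rule sets. The function name and the pattern and rule counts are hypothetical; the default weights are the 10 and 1 used in the formula above.

```python
def rule_set_fitness(p_s, p_all, n_s, n_all, w_p=10.0, w_n=1.0):
    """f(S) = w_P * P_s / P_ALL - w_N * N_S / N_ALL."""
    return w_p * p_s / p_all - w_n * n_s / n_all


# Two hypothetical rule sets that both classify 95 of 100 patterns correctly,
# one using 9 of 90 rules and the other using 45 of 90 rules.
small = rule_set_fitness(p_s=95, p_all=100, n_s=9, n_all=90)
large = rule_set_fitness(p_s=95, p_all=100, n_s=45, n_all=90)
print(small, large)
```

With equal accuracy the smaller rule set scores higher, while one extra correctly classified pattern out of 100 is worth as much as removing 9 of the 90 rules, which is how the weights express the priority of accuracy over size.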

Step 3: Select a pair of chromosomes for mating. Parent chromosomes are selected with a probability associated with their fitness; a better fit chromosome has a higher probability of being selected.
Step 4: Create a pair of offspring chromosomes by applying a standard crossover operator. Parent chromosomes are crossed at the randomly selected crossover point.
Step 5: Perform mutation on each gene of the created offspring. The mutation probability is normally kept quite low, say 0.01. The mutation is done by multiplying the gene value by −1.

Step 6: Place the created offspring chromosomes in the new population.
Step 7: Repeat from Step 3 until the size of the new population becomes equal to the size of the initial population, and then replace the initial (parent) population with the new (offspring) population.
Step 8: Go to Step 2, and repeat the process until a specified number of generations (typically several hundred) has been completed.
The number of rules can be cut down to less than 2% of the initially generated set of rules.
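The whole procedure can be sketched as a short Python program. Several parts are assumptions made only so the example runs on its own: genes take the values 1 (rule selected) and −1 (rule not selected), `count_correct` is a toy stand-in for the fuzzy classifier, fitness values are shifted before roulette-wheel selection because they can be negative, and the best chromosome seen so far is remembered, which the listed steps do not require.

```python
import random

N_ALL = 20  # hypothetical number of rules in S_ALL
P_ALL = 10  # hypothetical number of training patterns


def count_correct(chrom):
    """Toy stand-in for the classifier: pattern p counts as correctly
    classified only when rule p is selected (gene value 1)."""
    return sum(1 for p in range(P_ALL) if chrom[p] == 1)


def fitness(chrom, w_p=10.0, w_n=1.0):
    n_s = sum(1 for g in chrom if g == 1)
    return w_p * count_correct(chrom) / P_ALL - w_n * n_s / N_ALL


def select_pair(population, scores, rng):
    """Step 3: fitness-proportionate selection on shifted scores."""
    low = min(scores)
    weights = [s - low + 1e-6 for s in scores]
    return rng.choices(population, weights=weights, k=2)


def evolve(pop_size=20, generations=200, p_mut=0.01, seed=0):
    rng = random.Random(seed)
    # Step 1: random initial population.
    population = [[rng.choice((-1, 1)) for _ in range(N_ALL)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):                      # Step 8
        scores = [fitness(c) for c in population]     # Step 2
        new_population = []
        while len(new_population) < pop_size:         # Step 7
            p1, p2 = select_pair(population, scores, rng)
            cut = rng.randrange(1, N_ALL)             # Step 4: one-point crossover
            for child in (p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]):
                # Step 5: flip each gene with a low probability.
                child = [-g if rng.random() < p_mut else g for g in child]
                new_population.append(child)          # Step 6
        population = new_population[:pop_size]
        best = max(population + [best], key=fitness)
    return best


best = evolve()
print(best, fitness(best))
```

In this toy problem the ideal rule set selects exactly the first ten rules, giving a fitness of 9.5; whether a particular run reaches it depends on the random seed and the number of generations.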
