Supervised versus Unsupervised Learning




Supervised

   Labelled Data.
   Difficult to justify biologically.
   Doesn’t fit all situations.


Unsupervised

   Input Environment only.
Self Organising Neural Networks




  The basic design of an Unsupervised Network
  Unsupervised Learning
  Geometric Interpretation
  What they Learn
  Problems with Self Organising Neural Networks
  Statistical Views.


  Origins: Rosenblatt’s “spontaneous learning” in perceptrons
  Important work by Fukushima, Grossberg, Kohonen,
  von der Malsburg, Willshaw
No Teachers




  Learn about regularities in the environment
  Recognition — familiarity with previous inputs
  Classification — clustering
  Feature Mapping — topographic mappings
  Encoding — dimensionality reduction — data compression
  What determines what is learnt?
Example



von der Malsburg, C. (1973). Self-organisation of orientation sensitive
cells in the striate cortex. Kybernetik, 14: 85–100.

     Environment.




     Initially random.
     Orientation tuned units.
Basic Requirements for Unsupervised Networks



Rumelhart, D. and Zipser, D. (1985). Feature discovery by competitive
learning. Cognitive Science, 9: 75–112.

 1. Input units or Input lines.
 2. Response units.
         Number of units.
         Units not all the same.
 3. Limit the strength of units.




    [Figure: example inputs of very different total strength (e.g. 5, 2, 1 versus 0, 1, 2000; input pattern 1, 0.5, 0.01), motivating weight normalisation.]
 4. Allow the units to compete. “winner take all”.
 5. Learning.
Learning in the Rumelhart and Zipser Network




   Winning unit learns.
   Weights become more like input patterns
   (classification).
   Normalisation by weight redistribution:

      \Delta w_{ij} =
        \begin{cases}
          0 & \text{if unit } i \text{ loses on stimulus } k \\
          \alpha \, c_{jk} / n_k \; - \; \alpha \, w_{ij} & \text{if unit } i \text{ wins on stimulus } k
        \end{cases}

      c_{jk} = 1 (0) if input line j is (in)active on pattern k.
      n_k = number of input lines active for pattern k (n_k = \sum_j c_{jk}).
      \alpha is the learning constant.
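
A minimal sketch of this rule in NumPy (assumptions: binary input patterns, the winner chosen by the largest weighted input, rows of W normalised to sum to 1; the function name and array layout are illustrative, not from the slides):

```python
import numpy as np

def rz_update(W, c, alpha=0.1):
    """One competitive-learning step in the style of Rumelhart & Zipser.

    W : (n_units, n_lines) weight matrix; each row sums to 1.
    c : (n_lines,) binary input pattern (1 = line active).
    """
    winner = np.argmax(W @ c)            # winner-take-all on the weighted input
    n_k = c.sum()                        # number of active input lines
    # Only the winner learns: move its weights toward the normalised pattern.
    W[winner] += alpha * c / n_k - alpha * W[winner]
    return winner

# Tiny usage example: 2 units, 4 input lines, random normalised weights.
rng = np.random.default_rng(0)
W = rng.random((2, 4))
W /= W.sum(axis=1, keepdims=True)
rz_update(W, np.array([1, 1, 0, 0]))
print(W.sum(axis=1))                     # each row still sums to 1
```

Because the amount added on the active lines (\alpha / n_k each) equals the amount removed (\alpha w_{ij} summed over all lines, i.e. \alpha), each row of W keeps summing to 1; this is exactly the bookkeeping worked through on the next slide.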
Example of Weight Redistribution




   16 inputs; for each stimulus k, assume 8 inputs are active
   (n_k = 8).

   Assume that for each output unit i the weights are initially
   normalised: \sum_{j=1}^{16} w_{ij} = 1.
   Then

      \Delta w_{ij} = \tfrac{1}{8}\alpha - \alpha w_{ij}   if i wins and line j is ON
      \Delta w_{ij} = -\alpha w_{ij}                       if i wins and line j is OFF

   All weights of the winning unit are decremented by \alpha w_{ij}.
   Total weight deducted from all lines: \sum_j \alpha w_{ij}.
   Since \sum_j w_{ij} = 1, loss = total deducted from the winning
   unit's weights = \alpha.
   Each weight on an active line is also incremented by \tfrac{1}{8}\alpha,
   so gain = total weight added = 8 \cdot \tfrac{1}{8}\alpha = \alpha.

   loss = gain, so there is no net change in the total weight.
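
The same bookkeeping as a quick numerical check (a sketch; the learning rate and random seed are arbitrary):

```python
import numpy as np

alpha = 0.1
rng = np.random.default_rng(1)
w = rng.random(16)
w /= w.sum()                             # the winning unit's weights sum to 1
c = np.zeros(16)
c[:8] = 1                                # a stimulus with 8 active lines (n_k = 8)

dw = alpha * c / c.sum() - alpha * w     # update applied to the winning unit
print(dw.sum())                          # ~0: the loss (alpha) equals the gain (alpha)
print((w + dw).sum())                    # the weights still sum to ~1
```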
Network Summary




   [Figure: network summary diagram]
Example




  2 classification units — binary classification
  16 input lines
  Dipole input (2/16 neighbouring inputs active)




  Weights learned:

  [Figure: learned weight patterns for unit 1 and unit 2]

  Also discovers horizontal and diagonal divisions; similar
  results in 3-d.
  The system discovers spatial structure that is not built into
  the architecture (a sketch of this experiment follows).
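
A sketch of this experiment, with assumptions the slides leave open: the 16 input lines are taken to lie on a 4x4 grid, a dipole activates two adjacent lines, and learning uses the Rumelhart and Zipser rule from above. All names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
side = 4                                      # assume the 16 lines form a 4x4 grid
n_lines, n_units, alpha = side * side, 2, 0.05

def random_dipole():
    """Pattern with two adjacent lines (a horizontal or vertical pair) active."""
    if rng.random() < 0.5:                    # horizontal pair
        r, c = rng.integers(side), rng.integers(side - 1)
        pair = (r * side + c, r * side + c + 1)
    else:                                     # vertical pair
        r, c = rng.integers(side - 1), rng.integers(side)
        pair = (r * side + c, (r + 1) * side + c)
    x = np.zeros(n_lines)
    x[list(pair)] = 1.0
    return x

W = rng.random((n_units, n_lines))
W /= W.sum(axis=1, keepdims=True)             # normalised initial weights

for _ in range(5000):
    x = random_dipole()
    winner = np.argmax(W @ x)                 # winner-take-all
    W[winner] += alpha * x / x.sum() - alpha * W[winner]

# Typically each unit ends up "owning" roughly one half of the grid, i.e.
# the network discovers a spatial division that is not wired into it.
print(np.round(W[0].reshape(side, side), 2))
print(np.round(W[1].reshape(side, side), 2))
```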
Geometric representation of Learning
Problems with “Competitive Learning”




  How many units?
  Normalisation: biologically plausible?
  The problem of dead units; remedies:
   1. Leaky learning
   2. Conscience mechanism
  Not a magic technique; cf. the horizontal/vertical line
  task (Rumelhart & Zipser, 1985).
Competitive Learning







  Input space is divided up – units learn about a subset
  of the input patterns.
  Input space broken into groups of maximum similarity.
  Cluster analysis.
  Two sources of competition:
   1. Winner-take-all mechanism
   2. Resource limitation (normalisation)
Statistical Views


   [Figure: a single linear unit; inputs x1 ... xN feed the output y through weights w1 ... wN]

   Simple Hebbian learning:

      \frac{dw_i}{dt} = \alpha \, x_i \, y

   Linear activation function:

      y = \sum_j w_j x_j = \mathbf{w} \cdot \mathbf{x}

   Then

      \frac{dw_i}{dt} = \alpha \, x_i \sum_j w_j x_j
                      = \alpha \sum_j w_j \, x_j x_i
Correlation matrix




   Ensemble average, with weights changing slowly relative to the inputs:

      \Big\langle \frac{dw_i}{dt} \Big\rangle
          = \alpha \Big\langle \sum_j w_j \, x_j x_i \Big\rangle
          = \alpha \sum_j w_j \, \langle x_j x_i \rangle
          = \alpha \sum_j C_{ij} \, w_j

      \Big\langle \frac{d\mathbf{w}}{dt} \Big\rangle = \alpha \, C \, \mathbf{w}

   where C is the correlation matrix:

      C_{ij} = \langle x_i x_j \rangle
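
A sketch of this averaging in NumPy (assumptions: zero-mean Gaussian inputs with a chosen correlation matrix; all names are illustrative). It estimates C from samples and checks that the average Hebbian update approaches \alpha C \mathbf{w}:

```python
import numpy as np

rng = np.random.default_rng(0)
alpha, N = 0.01, 3
true_C = np.array([[2.0, 0.8, 0.0],
                   [0.8, 1.0, 0.2],
                   [0.0, 0.2, 0.5]])                   # chosen input correlation matrix
X = rng.multivariate_normal(np.zeros(N), true_C, size=20000)   # samples of x
w = rng.standard_normal(N)

C = X.T @ X / len(X)                                   # estimate C_ij = <x_i x_j>
dw_avg = alpha * np.mean((X @ w)[:, None] * X, axis=0) # < alpha * y * x >
print(dw_avg)                                          # approximately equal to ...
print(alpha * C @ w)                                   # ... alpha * C @ w
```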
Eigenvectors & Eigenvalues




  Vector x viewed as a point in N-dimensional space
  (e.g. x = (1, 1, 1.5)).



  [Figure: the point x = (1, 1, 1.5) plotted in 3-d, with x = 1, y = 1, z = 1.5]




  A matrix as a linear transformation.

      A \mathbf{v} = \lambda \mathbf{v}

  Eigenvectors / eigenvalues.
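
A short NumPy illustration (the matrix here is arbitrary): applying a matrix to one of its eigenvectors only rescales it by the corresponding eigenvalue.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 2.0]])                 # an arbitrary symmetric matrix
vals, vecs = np.linalg.eigh(A)             # eigenvalues ascending, columns = eigenvectors
v, lam = vecs[:, -1], vals[-1]             # largest eigenvalue and its eigenvector

print(A @ v)                               # same direction ...
print(lam * v)                             # ... just scaled by lambda
```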
Unconstrained Hebbian Learning




      \Big\langle \frac{d\mathbf{w}}{dt} \Big\rangle = \alpha \, C \, \mathbf{w}

  Over a large number of patterns, the eigenvector with the
  largest eigenvalue is the dominant influence on weight change.
  Weights change fastest in the direction of the eigenvector
  with the largest eigenvalue.
  So the weights tend to the principal component of the data.
  Solutions to unbounded weights:
     Explicit Normalisation.
     Oja type rule – new terms.
     Simple weight decay.
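
A sketch of the instability described above (assumptions: 2-d zero-mean Gaussian inputs, a small learning rate; all names illustrative). The weight direction lines up with the top eigenvector of C while the weight norm keeps growing:

```python
import numpy as np

rng = np.random.default_rng(0)
alpha = 0.01
C = np.array([[3.0, 1.0],
              [1.0, 1.0]])                        # assumed input correlation matrix
top = np.linalg.eigh(C)[1][:, -1]                 # its maximal eigenvector

w = rng.standard_normal(2) * 0.1
for _ in range(2000):                             # plain Hebbian rule: dw = alpha * y * x
    x = rng.multivariate_normal([0.0, 0.0], C)
    w = w + alpha * (w @ x) * x

print(np.linalg.norm(w))                          # the norm has grown enormously
print(w / np.linalg.norm(w), top)                 # but the direction matches +/- top
```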
Principal Components




              Find principal components:

              [Figure: data scatter and its principal component direction]
Principal component of data = maximal eigenvector of
the covariance matrix of the data.
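
A sketch of that statement on synthetic 2-d data (the covariance values are arbitrary): the maximal eigenvector of the sample covariance matrix gives the principal component.

```python
import numpy as np

rng = np.random.default_rng(0)
# Correlated 2-d data, spread mainly along the (1, 1) direction.
X = rng.multivariate_normal([0.0, 0.0], [[3.0, 2.0], [2.0, 3.0]], size=5000)

cov = np.cov(X, rowvar=False)              # sample covariance matrix
vals, vecs = np.linalg.eigh(cov)           # eigenvalues in ascending order
pc = vecs[:, -1]                           # maximal eigenvector = principal component
print(pc)                                  # close to +/- (1, 1) / sqrt(2)
```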
Oja rule




   Simple Hebbian learning is unstable, weights grow
   without limits:


                     \frac{dw_i}{dt} = \alpha \, x_i \, y

   The Oja rule adds a weight decay term:

                     \frac{dw_i}{dt} = \alpha \, y \, (x_i - y \, w_i)

   Several properties (p. 202, Hertz et al., 1991):
    1. |\mathbf{w}| tends to 1.
    2. \mathbf{w} is the maximal eigenvector of C.
    3. The variance of the output, \langle y^2 \rangle, is maximised by \mathbf{w}.

   Decorrelate the output units (via lateral inhibitory connections)
   to obtain the other components (Sanger).
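
A sketch of the Oja rule on the same kind of data as the previous sketch (2-d zero-mean Gaussian inputs; all names illustrative): the weight norm settles near 1 and w lines up with the maximal eigenvector of C.

```python
import numpy as np

rng = np.random.default_rng(0)
alpha = 0.01
C = np.array([[3.0, 1.0],
              [1.0, 1.0]])                        # assumed input correlation matrix
top = np.linalg.eigh(C)[1][:, -1]                 # maximal eigenvector of C

w = rng.standard_normal(2) * 0.1
for _ in range(20000):                            # Oja rule: dw = alpha * y * (x - y * w)
    x = rng.multivariate_normal([0.0, 0.0], C)
    y = w @ x
    w = w + alpha * y * (x - y * w)

print(np.linalg.norm(w))                          # close to 1
print(w, top)                                     # w is close to +/- the top eigenvector
```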
Correlation matrices and eigenvectors



Given the simple rule:

      \Delta \mathbf{w} = C \, \mathbf{w}          (ignoring \alpha)

\mathbf{w} can be rewritten in terms of the eigenvectors \mathbf{e}_a of C,
with eigenvalues \lambda_a:

      \mathbf{w} = c_1 \mathbf{e}_1 + c_2 \mathbf{e}_2 + \cdots + c_n \mathbf{e}_n,
      \qquad \text{where } c_a = \mathbf{w} \cdot \mathbf{e}_a

      \Delta \mathbf{w} = C \, ( c_1 \mathbf{e}_1 + c_2 \mathbf{e}_2 + \cdots + c_n \mathbf{e}_n )

But since C \mathbf{e}_a = \lambda_a \mathbf{e}_a:

      \Delta \mathbf{w} = c_1 \lambda_1 \mathbf{e}_1 + c_2 \lambda_2 \mathbf{e}_2 + \cdots + c_n \lambda_n \mathbf{e}_n

So the weight derivative grows mostly in the direction of the
eigenvector \mathbf{e}_m with the largest eigenvalue \lambda_m.
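
A numerical check of this decomposition (a small arbitrary example; names illustrative): expand w in the eigenbasis of C, apply C, and confirm that each coefficient is scaled by its eigenvalue.

```python
import numpy as np

rng = np.random.default_rng(0)
C = np.array([[3.0, 1.0, 0.0],
              [1.0, 2.0, 0.5],
              [0.0, 0.5, 1.0]])            # a symmetric correlation-like matrix
lam, E = np.linalg.eigh(C)                 # eigenvalues lam_a, eigenvectors E[:, a]

w = rng.standard_normal(3)
c = E.T @ w                                # coefficients c_a = w . e_a
dw = C @ w                                 # Delta w = C w

print(E.T @ dw)                            # coefficients of Delta w ...
print(lam * c)                             # ... equal lam_a * c_a, component by component
```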
Summary




  No external teacher needed.
  Competition arises from “winner take all” and weight
  normalisation.
  Discovers principal features of input environment.
  Output units have maximal variance.
