This document provides an introduction to neural networks. It discusses how the first wave of interest emerged after McCulloch and Pitts introduced simplified neuron models in 1943. However, perceptron models were shown to have deficiencies in 1969, leading to reduced funding and many researchers leaving the field. Interest re-emerged in the early 1980s after important theoretical results such as error back-propagation, together with new hardware that increased processing capacity. The document then describes the key components of artificial neural networks, including processing units that receive inputs and propagate outputs, the connections between units, and activation and output rules. It also covers network topologies such as feed-forward and recurrent networks.
Fundamentals: An Introduction to Neural Networks
1. Neural Networks,
Key Notes
An Introduction to Neural Networks, eighth edition, 1996
Authors: Ben Kröse, Faculty of Mathematics & Computer Science,
University of Amsterdam; Patrick van der Smagt, Institute of Robotics
and System Dynamics, German Aerospace Research Establishment
Keynote: Nelson Piedra, Computer Sciences School - Advanced Tech,
Technical University of Loja UTPL, Ecuador.
4. First wave of interest
• The first wave of interest emerged after the
introduction of simplified neurons by
McCulloch and Pitts in 1943.
• These neurons were introduced as models
of biological neurons and as
conceptual components for circuits
that could perform computational tasks.
7. ANN, “black age”
• Perceptrons book (Minsky &
Papert, 1969): showed deficiencies of
perceptron models; most neural network
funding was redirected and researchers left
the field.
• Only a few researchers continued their
efforts, most notably Teuvo Kohonen, Stephen
Grossberg, James Anderson, and Kunihiko
Fukushima.
10. ANN re-emerged
• Early eighties: ANNs re-emerged only after
some important theoretical results, most
notably the discovery of error back-
propagation, and after new hardware
developments increased processing
capacities.
• Nowadays most universities have a neural
networks group (e.g. Advanced Tech - UTPL).
13. How can ANNs be adequately
characterised?
• Artificial neural networks can be most
adequately characterised as “computational
models” with particular properties, such as the
ability
• to adapt or learn,
• to generalise, or
• to cluster or organise data, and
• whose operation is based on parallel
processing.
• Parallels with biological systems also exist.
23. [Diagram: properties of neural networks — to adapt, to learn, to cluster, to organise data, to process in parallel]
The slide above shows properties that can be
attributed to neural network models
and to existing (non-neural) models.
26. The question is to what extent the
neural approach proves to be better
suited for certain applications than
existing models.
29. A framework for
distributed representation
• To understand ANNs, think in terms of the parallel
distributed processing (PDP) idea.
• An artificial network consists of a pool of
simple processing units which
communicate by sending signals to each
other over a large number of
weighted connections.
33. • 1/2 Rumelhart and McClelland, 1986:
• a set of processing units (‘neurons’, ‘cells’);
• a state of activation yk for every unit, which is
equivalent to the output of the unit;
• connections between the units. Generally
each connection is defined by a weight wjk
which determines the effect which the signal
of unit j has on unit k;
• a propagation rule, which determines the
effective input sk of a unit from its external
inputs.
39. • 2/2 Rumelhart and McClelland, 1986:
• an activation function Fk, which determines
the new level of activation based on the
effective input sk(t) and the current
activation yk(t);
• an external input (aka bias, offset) θk for
each unit;
• a method for information gathering (the
learning rule);
• an environment within which the system
must operate, providing input signals and
-if necessary- error signals.
45. Processing Units
• Each unit performs a relatively simple job:
• a) receive input from neighbours or
external sources and use this to compute
an output which is propagated to other
units;
• b) adjust the weights.
• The system is inherently parallel in the sense
that many units can carry out their
computations at the same time.
48. [Diagram: a unit k receiving the outputs yj of units j over weights w1k … wjk … wnk, with bias θk and activation function fk, producing output yk]
sk = Σj wjk yj + θk
The basic components of an artificial neural network. The
propagation rule used here is the standard weighted summation.
49. Three types of units
input units, i: which receive data
from outside the neural network
output units, o: which send data out
of the neural network
hidden units, h: whose input and
output signals remain within the
neural network
50. Update of units
Synchronously: all units update their
activation simultaneously
Asynchronously: each unit has a
(usually fixed) probability of updating its
activation at a time t, and usually only one
unit will be able to do this at a time; in some
cases the latter model has some
advantages
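The two update schemes can be sketched in code. This is a minimal illustration, not from the source: the weights, the hard-limiting (sign) activation, and the two-unit network are invented for the example.

```python
import numpy as np

def activation(s):
    # hard-limiting threshold (sign) activation
    return np.where(s >= 0, 1.0, -1.0)

def update_sync(y, W, theta):
    # Synchronous: every unit computes its new activation
    # from the *old* activations, then all switch at once.
    return activation(W.T @ y + theta)

def update_async(y, W, theta, k):
    # Asynchronous: only unit k updates, using the current
    # (possibly already partially updated) activations.
    y = y.copy()
    y[k] = activation(W[:, k] @ y + theta[k])
    return y

W = np.array([[0.0, 1.0], [1.0, 0.0]])  # symmetric weights (example)
theta = np.zeros(2)
y = np.array([1.0, -1.0])
print(update_sync(y, W, theta))        # both units flip together
print(update_async(y, W, theta, k=0))  # only unit 0 changes
```

In the asynchronous scheme, units chosen later in a sweep see the already-updated activations of earlier units, which is why the two schemes can reach different states.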
52. Connections between units
• Assume that each unit provides an additive contribution
to the input of the unit to which it is connected.
• The total input to unit k is simply the weighted
sum of the separate outputs from each of the
connected units plus a bias or offset term θk.
• A positive wjk is considered excitation and a
negative wjk inhibition.
• Units with this propagation rule are called sigma units.
sk(t) = Σj wjk(t) yj(t) + θk
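The sigma-unit propagation rule can be written directly in code. A minimal sketch; the outputs, weights, and bias below are invented for illustration.

```python
import numpy as np

def effective_input(w_k, y, theta_k):
    # s_k = sum_j w_jk * y_j + theta_k  (standard weighted summation)
    return np.dot(w_k, y) + theta_k

y = np.array([0.5, -1.0, 2.0])     # outputs y_j of the connected units
w_k = np.array([1.0, 0.5, -0.25])  # weights w_jk into unit k
theta_k = 0.1                      # bias/offset of unit k

s_k = effective_input(w_k, y, theta_k)
print(s_k)  # 1.0*0.5 + 0.5*(-1.0) + (-0.25)*2.0 + 0.1 = -0.4
```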
57. A different propagation rule
• Propagation rule for the sigma-pi unit, Feldman and
Ballard, 1982.
• Often, the yjm are weighted before multiplication.
Although these units are not frequently used, they
have their value for the gating of input, as well as the
implementation of lookup tables (Mel, 1990).
sk(t) = Σj wjk(t) ∏m yjm(t) + θk(t)
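A sigma-pi unit multiplies each group of inputs before the weighted sum, which is what makes gating possible. A minimal sketch; the grouping and values are hypothetical.

```python
import numpy as np

def sigma_pi_input(weights, input_groups, theta_k):
    # s_k = sum_j w_jk * prod_m y_jm + theta_k
    # each entry of input_groups is the tuple (y_j1, ..., y_jm)
    return sum(w * np.prod(g) for w, g in zip(weights, input_groups)) + theta_k

# two product groups; the second input of each pair acts as a gate
weights = [1.0, 0.5]
groups = [(0.8, 1.0),   # gate open: the 0.8 passes through
          (0.5, 0.0)]   # gate closed: the product is 0, input shut off
print(sigma_pi_input(weights, groups, theta_k=0.0))  # 1.0*0.8 + 0.5*0.0 = 0.8
```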
59. Activation and output
rules
• New value of activation: we need a function
fk which takes the total input sk(t) and the
current activation yk(t) and produces a
new value of the activation of the unit k.
yk(t+1) = fk(yk(t), sk(t))
60. • Often, the activation function is a
nondecreasing function of the total input of
the unit
yk(t+1) = fk(sk(t)) = fk( Σj wjk(t) yj(t) + θk(t) )
[Figure: three common activation functions — sgn (hard limiting threshold), semi-linear function, and sigmoid (smoothly limiting threshold)]
61. • For this smoothly limiting function, often a
sigmoid (S-shaped) function is used, such as:
yk = fk(sk) = 1 / (1 + e^(-sk))
• In some cases, the output of a unit can be a
stochastic function of the total input of the
unit. In that case the activation is not
deterministically determined by the neuron
input; instead, the neuron input determines the
probability p that a neuron gets a high
activation:
p(yk ← 1) = 1 / (1 + e^(-sk/T))
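Both the deterministic sigmoid and its stochastic variant can be sketched as follows. The temperature T and the test inputs are chosen only for illustration.

```python
import math
import random

def sigmoid(s):
    # y_k = 1 / (1 + e^{-s_k}): smooth, nondecreasing, bounded in (0, 1)
    return 1.0 / (1.0 + math.exp(-s))

def stochastic_activation(s, T=1.0, rng=random.random):
    # the input determines only the *probability* of a high activation:
    # p(y_k <- 1) = 1 / (1 + e^{-s_k / T})
    p = 1.0 / (1.0 + math.exp(-s / T))
    return 1 if rng() < p else 0

print(sigmoid(0.0))   # 0.5: zero net input gives a mid-range activation
print(sigmoid(4.0))   # close to 1
print(stochastic_activation(1.0, T=0.5))  # 1 more often than 0, but random
```

Lowering T makes the stochastic unit behave more like the deterministic threshold; raising it makes the output more random.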
62. Network topologies
• This section focuses on the pattern of
connections between the units and
the propagation of data:
• Feed-forward networks
• Recurrent networks that do contain
feedback connections
63. Feed-forward networks
• The data processing can extend over
multiple (layers of) units, but no
feedback connections are present,
that is, no connections extending from outputs
of units to inputs of units in the same layer or
previous layers.
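A forward pass through such a layered network can be sketched as below. This is a minimal illustration under assumed choices: the layer sizes, random weights, and sigmoid activation are not from the source.

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def forward(x, layers):
    # each layer is a pair (W, theta); data flows strictly forward,
    # never back to the same or a previous layer
    y = x
    for W, theta in layers:
        y = sigmoid(W @ y + theta)
    return y

rng = np.random.default_rng(0)
layers = [(rng.standard_normal((3, 2)), np.zeros(3)),  # input(2) -> hidden(3)
          (rng.standard_normal((1, 3)), np.zeros(1))]  # hidden(3) -> output(1)
out = forward(np.array([1.0, -1.0]), layers)
print(out)  # a single sigmoid output in (0, 1)
```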
65. Recurrent networks that do
contain feedback connections
• Contrary to feed-forward networks, the dynamical properties
of the network are important.
• In some cases, the activation values of the units undergo a
relaxation process such that the network will evolve to a
stable state in which these activations do not change anymore.
• In other applications, the changes of the activation values of
the output neurons are significant, such that the dynamical
behaviour constitutes the output of the network
(Pearlmutter, 1990).
• Classical examples of feed-forward networks are the
Perceptron and the Adaline.
70. Training of artificial
neural networks
• A neural network has to be configured such
that the application of a set of inputs
produces (either ‘directly’ or via a relaxation
process) the desired set of outputs.
• One way is to set the weights explicitly,
using a priori knowledge.
• Another way is to ‘train’ the neural network
by feeding it teaching patterns and letting it
change its weights according to some
learning rule.
74. Paradigms of learning
• Supervised learning or Associative
learning, in which the network is trained
by providing it with input and matching
output patterns. These input-output pairs
can be provided by an external teacher, or by
the system which contains the network
(self-supervised).
76. Paradigms of learning
• Unsupervised learning or
Self-organisation, in which an (output) unit is
trained to respond to clusters of patterns
within the input. In this paradigm the system
is supposed to discover statistically salient
features of the input population. Unlike the
supervised learning paradigm, there is no a
priori set of categories into which the
patterns are to be classified; rather, the
system must develop its own representation
of the input stimuli.
80. Hebbian learning rule
• Suggested by Hebb in his classic book
Organization of Behaviour (Hebb, 1949).
• The basic idea is that if two units j and k are
active simultaneously, their interconnection
must be strengthened. If j receives input
from k, the simplest version of Hebbian
learning prescribes modifying the weight wjk
with:
∆wjk = γ yj yk, where γ is a positive constant of
proportionality representing the learning rate.
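The Hebbian update ∆wjk = γ yj yk can be sketched as follows; the activations and learning rate are invented for illustration.

```python
def hebbian_update(w_jk, y_j, y_k, gamma=0.1):
    # strengthen the connection when units j and k are active together
    return w_jk + gamma * y_j * y_k

w = 0.0
# both units strongly active at the same time: the weight grows
for _ in range(3):
    w = hebbian_update(w, y_j=1.0, y_k=1.0, gamma=0.1)
print(w)  # three updates of 0.1 each, so about 0.3
```

Note that the plain Hebbian rule only ever strengthens co-active connections; nothing here bounds the weight, which is why practical variants add decay or normalisation.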
83. Widrow-Hoff rule or
the delta rule
• Another common rule uses not the actual
activation of unit k but the difference
between the actual and desired activation
for adjusting the weights.
• dk is the desired activation provided by a
teacher.
∆wjk = γ yj (dk - yk)
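The delta rule ∆wjk = γ yj (dk - yk) can be sketched on a single linear unit with one weight; the target and learning rate below are illustrative only.

```python
def delta_update(w_jk, y_j, y_k, d_k, gamma=0.5):
    # move the weight in proportion to the error (d_k - y_k)
    return w_jk + gamma * y_j * (d_k - y_k)

# linear unit with one input: y_k = w * y_j; teacher target d_k = 1.0
w, y_j, d_k = 0.0, 1.0, 1.0
for _ in range(10):
    y_k = w * y_j
    w = delta_update(w, y_j, y_k, d_k, gamma=0.5)
print(round(w, 4))  # the weight converges towards 1.0
```

Unlike the Hebbian rule, the correction vanishes as yk approaches dk, so the weight settles instead of growing without bound.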
85. Terminology
Output vs activation of a unit: taken to be one and the same
thing; that is, the output of each neuron equals its activation value.
Bias, offset, threshold: These terms all refer to a constant
term which is input to a unit. This external input is usually
implemented (and can be written) as a weight from a unit with
activation value 1.
Number of layers: In a feed-forward network, the inputs
perform no computation and their layer is therefore not
counted. Thus a network with one input layer, one hidden layer,
and one output layer is referred to as a network with two layers.
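The remark that a bias can be implemented as a weight from a unit with constant activation 1 is easy to check numerically. A minimal sketch with invented values:

```python
import numpy as np

y = np.array([0.5, -1.0])   # outputs of the real units j
w_k = np.array([2.0, 0.5])  # weights into unit k
theta_k = 0.3               # bias of unit k

# explicit bias term: s_k = sum_j w_jk y_j + theta_k
s_explicit = np.dot(w_k, y) + theta_k

# equivalent form: append a unit with constant activation 1
# whose weight into k is theta_k
y_aug = np.append(y, 1.0)
w_aug = np.append(w_k, theta_k)
s_aug = np.dot(w_aug, y_aug)

print(s_explicit, s_aug)  # both forms give the same effective input
```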