This document provides an introduction to neural networks. It discusses how the first wave of interest emerged after McCulloch and Pitts introduced simplified neuron models in 1943. However, perceptron models were shown to have deficiencies in 1969, leading to reduced funding and many researchers leaving the field. Interest re-emerged in the early 1980s after important theoretical results such as error back-propagation, together with new hardware that increased processing capacity. The document then describes the key components of artificial neural networks, including processing units that receive inputs and propagate outputs, the connections between units, and activation and output rules. It also covers network topologies such as feed-forward and recurrent networks.
Fundamentals: An Introduction to Neural Networks
1. Neural Networks,
Key Notes
An Introduction to Neural Networks, eighth edition, 1996
Authors: Ben Kröse, Faculty of Mathematics & Computer Science,
University of Amsterdam; Patrick van der Smagt, Institute of Robotics
and System Dynamics, German Aerospace Research Establishment
Keynote: Nelson Piedra, Computer Sciences School - Advanced Tech,
Technical University of Loja UTPL, Ecuador.
4. First wave of interest
• The first wave of interest emerged after the
introduction of simplified neurons by
McCulloch and Pitts in 1943.
• These neurons were introduced as models
of biological neurons and as
conceptual components for circuits
that could perform computational tasks.
7. ANN, “black age”
• Perceptrons book (Minsky &
Papert, 1969): showed deficiencies of
perceptron models; most neural network
funding was redirected and researchers left
the field.
• Only a few researchers continued their
efforts, most notably Teuvo Kohonen, Stephen
Grossberg, James Anderson, and Kunihiko
Fukushima.
10. ANN re-emerged
• Early eighties: ANNs re-emerged only after
some important theoretical results, most
notably the discovery of error back-
propagation, and after new hardware
developments increased processing
capacities.
• Nowadays most universities have a neural
networks group (e.g. Advanced Tech - UTPL).
13. How can ANNs be adequately
characterised?
• Artificial neural networks can be most
adequately characterised as “computational
models” with particular properties, such as the
ability
• to adapt or learn,
• to generalise, or
• to cluster or organise data, and
• whose operation is based on parallel
processing.
• Parallels with biological systems also exist.
23. [Diagram: properties of neural networks — to adapt, to learn, to cluster, to organise data, to process in parallel]
The slide above shows properties that can be
attributed to neural network models
and to existing (non-neural) models.
26. The question is to what extent the
neural approach proves to be better
suited for certain applications than
existing models.
29. A framework for
distributed representation
• To understand ANNs, think in terms of the parallel
distributed processing (PDP) idea.
• An artificial network consists of a pool of
simple processing units which
communicate by sending signals to each
other over a large number of
weighted connections.
33. • 1/2 Rumelhart and McClelland, 1986:
• a set of processing units (‘neurons’, ‘cells’);
• a state of activation yk for every unit, which is
equivalent to the output of the unit;
• connections between the units. Generally
each connection is defined by a weight wjk
which determines the effect which the signal
of unit j has on unit k;
• a propagation rule, which determines the
effective input sk of a unit from its external
inputs.
39. • 2/2 Rumelhart and McClelland, 1986:
• an activation function Fk, which determines
the new level of activation based on the
effective input sk(t) and the current
activation yk(t);
• an external input (aka bias, offset) θk for
each unit;
• a method for information gathering (the
learning rule);
• an environment within which the system
must operate, providing input signals and
-if necessary- error signals.
45. Processing Units
• Each unit performs a relatively simple job:
• a) receive input from neighbours or
external sources and use this to compute
an output which is propagated to other
units;
• b) adjust the weights.
• The system is inherently parallel in the sense
that many units can carry out their
computations at the same time.
48. [Diagram: a unit k receiving the outputs yj of units j over weights w1k … wjk … wnk, with bias θk and activation function fk, producing output yk]
sk = Σj wjk yj + θk
The basic components of an artificial neural network. The
propagation rule used here is the standard weighted summation.
49. Three types of units
input units, i: which receive data
from outside the neural network
output units, o: which send data out
of the neural network
hidden units, h: whose input and
output signals remain within the
neural network
50. Update of units
Synchronously: all units update their
activation simultaneously
Asynchronously: each unit has a
(usually fixed) probability of updating its
activation at a time t, and usually only one
unit will be able to do this at a time; in some
cases the latter model has some
advantages
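The two update schemes can be sketched in code. This is a minimal illustration, not from the source: the weights, the hard-limiting (sign) activation, and the two-unit network are invented for the example.

```python
import numpy as np

def activation(s):
    # hard-limiting threshold (sign) activation
    return np.where(s >= 0, 1.0, -1.0)

def update_sync(y, W, theta):
    # Synchronous: every unit computes its new activation
    # from the *old* activations, then all switch at once.
    return activation(W.T @ y + theta)

def update_async(y, W, theta, k):
    # Asynchronous: only unit k updates, using the current
    # (possibly already partially updated) activations.
    y = y.copy()
    y[k] = activation(W[:, k] @ y + theta[k])
    return y

W = np.array([[0.0, 1.0], [1.0, 0.0]])  # symmetric weights (example)
theta = np.zeros(2)
y = np.array([1.0, -1.0])
print(update_sync(y, W, theta))        # both units flip together
print(update_async(y, W, theta, k=0))  # only unit 0 changes
```

In the asynchronous scheme, units chosen later in a sweep see the already-updated activations of earlier units, which is why the two schemes can reach different states.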
52. Connections between units
• Assume that each unit provides an additive contribution
to the input of the unit to which it is connected.
• The total input to unit k is simply the weighted
sum of the separate outputs from each of the
connected units plus a bias or offset term θk.
• A positive wjk is considered excitation and a
negative wjk inhibition.
• Units with this propagation rule are called sigma units.
sk(t) = Σj wjk(t) yj(t) + θk
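The sigma-unit propagation rule can be written directly in code. A minimal sketch; the outputs, weights, and bias below are invented for illustration.

```python
import numpy as np

def effective_input(w_k, y, theta_k):
    # s_k = sum_j w_jk * y_j + theta_k  (standard weighted summation)
    return np.dot(w_k, y) + theta_k

y = np.array([0.5, -1.0, 2.0])     # outputs y_j of the connected units
w_k = np.array([1.0, 0.5, -0.25])  # weights w_jk into unit k
theta_k = 0.1                      # bias/offset of unit k

s_k = effective_input(w_k, y, theta_k)
print(s_k)  # 1.0*0.5 + 0.5*(-1.0) + (-0.25)*2.0 + 0.1 = -0.4
```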
57. A different propagation rule
• Propagation rule for the sigma-pi unit, Feldman and
Ballard, 1982.
• Often, the yjm are weighted before multiplication.
Although these units are not frequently used, they
have their value for the gating of input, as well as the
implementation of lookup tables (Mel, 1990).
sk(t) = Σj wjk(t) ∏m yjm(t) + θk(t)
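A sigma-pi unit multiplies each group of inputs before the weighted sum, which is what makes gating possible. A minimal sketch; the grouping and values are hypothetical.

```python
import numpy as np

def sigma_pi_input(weights, input_groups, theta_k):
    # s_k = sum_j w_jk * prod_m y_jm + theta_k
    # each entry of input_groups is the tuple (y_j1, ..., y_jm)
    return sum(w * np.prod(g) for w, g in zip(weights, input_groups)) + theta_k

# two product groups; the second input of each pair acts as a gate
weights = [1.0, 0.5]
groups = [(0.8, 1.0),   # gate open: the 0.8 passes through
          (0.5, 0.0)]   # gate closed: the product is 0, input shut off
print(sigma_pi_input(weights, groups, theta_k=0.0))  # 1.0*0.8 + 0.5*0.0 = 0.8
```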
59. Activation and output
rules
• New value of activation: we need a function
fk which takes the total input sk(t) and the
current activation yk(t) and produces a
new value of the activation of the unit k.
yk(t+1) = fk(yk(t), sk(t))
60. • Often, the activation function is a
nondecreasing function of the total input of
the unit
yk(t+1) = fk(sk(t)) = fk( Σj wjk(t) yj(t) + θk(t) )
[Figure: three common activation functions — sgn (hard limiting threshold), semi-linear function, and sigmoid (smoothly limiting threshold)]
61. • For this smoothly limiting function, often a
sigmoid (S-shaped) function is used, such as:
yk = fk(sk) = 1 / (1 + e^(-sk))
• In some cases, the output of a unit can be a
stochastic function of the total input of the
unit. In that case the activation is not
deterministically determined by the neuron
input; instead, the neuron input determines the
probability p that a neuron gets a high
activation:
p(yk ← 1) = 1 / (1 + e^(-sk/T))
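Both the deterministic sigmoid and its stochastic variant can be sketched as follows. The temperature T and the test inputs are chosen only for illustration.

```python
import math
import random

def sigmoid(s):
    # y_k = 1 / (1 + e^{-s_k}): smooth, nondecreasing, bounded in (0, 1)
    return 1.0 / (1.0 + math.exp(-s))

def stochastic_activation(s, T=1.0, rng=random.random):
    # the input determines only the *probability* of a high activation:
    # p(y_k <- 1) = 1 / (1 + e^{-s_k / T})
    p = 1.0 / (1.0 + math.exp(-s / T))
    return 1 if rng() < p else 0

print(sigmoid(0.0))   # 0.5: zero net input gives a mid-range activation
print(sigmoid(4.0))   # close to 1
print(stochastic_activation(1.0, T=0.5))  # 1 more often than 0, but random
```

Lowering T makes the stochastic unit behave more like the deterministic threshold; raising it makes the output more random.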
62. Network topologies
• This section focuses on the pattern of
connections between the units and
the propagation of data:
• Feed-forward networks
• Recurrent networks that do contain
feedback connections
63. Feed-forward networks
• The data processing can extend over
multiple (layers of) units, but no
feedback connections are present,
that is, no connections extending from outputs
of units to inputs of units in the same layer or
previous layers.
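A forward pass through such a layered network can be sketched as below. This is a minimal illustration under assumed choices: the layer sizes, random weights, and sigmoid activation are not from the source.

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def forward(x, layers):
    # each layer is a pair (W, theta); data flows strictly forward,
    # never back to the same or a previous layer
    y = x
    for W, theta in layers:
        y = sigmoid(W @ y + theta)
    return y

rng = np.random.default_rng(0)
layers = [(rng.standard_normal((3, 2)), np.zeros(3)),  # input(2) -> hidden(3)
          (rng.standard_normal((1, 3)), np.zeros(1))]  # hidden(3) -> output(1)
out = forward(np.array([1.0, -1.0]), layers)
print(out)  # a single sigmoid output in (0, 1)
```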
65. Recurrent networks that do
contain feedback connections
• Contrary to feed-forward networks, the dynamical properties
of the network are important.
• In some cases, the activation values of the units undergo a
relaxation process such that the network will evolve to a
stable state in which these activations do not change anymore.
• In other applications, the changes of the activation values of
the output neurons are significant, such that the dynamical
behaviour constitutes the output of the network
(Pearlmutter, 1990).
• Classical examples of feed-forward networks are the
Perceptron and the Adaline.
70. Training of artificial
neural networks
• A neural network has to be configured such
that the application of a set of inputs
produces (either ‘directly’ or via a relaxation
process) the desired set of outputs.
• One way is to set the weights explicitly,
using a priori knowledge.
• Another way is to ‘train’ the neural network
by feeding it teaching patterns and letting it
change its weights according to some
learning rule.
74. Paradigms of learning
• Supervised learning or Associative
learning, in which the network is trained
by providing it with input and matching
output patterns. These input-output pairs
can be provided by an external teacher, or by
the system which contains the network
(self-supervised).
76. Paradigms of learning
• Unsupervised learning or
Self-organisation, in which an (output) unit is
trained to respond to clusters of patterns
within the input. In this paradigm the system
is supposed to discover statistically salient
features of the input population. Unlike the
supervised learning paradigm, there is no a
priori set of categories into which the
patterns are to be classified; rather, the
system must develop its own representation
of the input stimuli.
80. Hebbian learning rule
• Suggested by Hebb in his classic book
Organization of Behaviour (Hebb, 1949).
• The basic idea is that if two units j and k are
active simultaneously, their interconnection
must be strengthened. If j receives input
from k, the simplest version of Hebbian
learning prescribes modifying the weight wjk
with:
∆wjk = γ yj yk, where γ is a positive constant of
proportionality representing the learning rate.
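The Hebbian update ∆wjk = γ yj yk can be sketched as follows; the activations and learning rate are invented for illustration.

```python
def hebbian_update(w_jk, y_j, y_k, gamma=0.1):
    # strengthen the connection when units j and k are active together
    return w_jk + gamma * y_j * y_k

w = 0.0
# both units strongly active at the same time: the weight grows
for _ in range(3):
    w = hebbian_update(w, y_j=1.0, y_k=1.0, gamma=0.1)
print(w)  # three updates of 0.1 each, so about 0.3
```

Note that the plain Hebbian rule only ever strengthens co-active connections; nothing here bounds the weight, which is why practical variants add decay or normalisation.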
83. Widrow-Hoff rule or
the delta rule
• Another common rule uses not the actual
activation of unit k but the difference
between the actual and desired activation
for adjusting the weights.
• dk is the desired activation provided by a
teacher.
∆wjk = γ yj (dk - yk)
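The delta rule ∆wjk = γ yj (dk - yk) can be sketched on a single linear unit with one weight; the target and learning rate below are illustrative only.

```python
def delta_update(w_jk, y_j, y_k, d_k, gamma=0.5):
    # move the weight in proportion to the error (d_k - y_k)
    return w_jk + gamma * y_j * (d_k - y_k)

# linear unit with one input: y_k = w * y_j; teacher target d_k = 1.0
w, y_j, d_k = 0.0, 1.0, 1.0
for _ in range(10):
    y_k = w * y_j
    w = delta_update(w, y_j, y_k, d_k, gamma=0.5)
print(round(w, 4))  # the weight converges towards 1.0
```

Unlike the Hebbian rule, the correction vanishes as yk approaches dk, so the weight settles instead of growing without bound.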
85. Terminology
Output vs activation of a unit: taken to be one and the same
thing; that is, the output of each neuron equals its activation value.
Bias, offset, threshold: These terms all refer to a constant
term which is input to a unit. This external input is usually
implemented (and can be written) as a weight from a unit with
activation value 1.
Number of layers: In a feed-forward network, the inputs
perform no computation and their layer is therefore not
counted. Thus a network with one input layer, one hidden layer,
and one output layer is referred to as a network with two layers.
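The remark that a bias can be implemented as a weight from a unit with constant activation 1 is easy to check numerically. A minimal sketch with invented values:

```python
import numpy as np

y = np.array([0.5, -1.0])   # outputs of the real units j
w_k = np.array([2.0, 0.5])  # weights into unit k
theta_k = 0.3               # bias of unit k

# explicit bias term: s_k = sum_j w_jk y_j + theta_k
s_explicit = np.dot(w_k, y) + theta_k

# equivalent form: append a unit with constant activation 1
# whose weight into k is theta_k
y_aug = np.append(y, 1.0)
w_aug = np.append(w_k, theta_k)
s_aug = np.dot(w_aug, y_aug)

print(s_explicit, s_aug)  # both forms give the same effective input
```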