An artificial neural network (neural network) is a computational model that mimics the way nerve cells work in the human brain. Artificial neural networks (ANNs) use learning algorithms that can independently make adjustments - or learn, in a sense - as they receive new input.
2. Main approaches in modern Artificial Intelligence:
• traditional knowledge-based systems (e.g. expert systems)
• Computational Intelligence (or Soft Computing):
  neural networks, fuzzy systems, evolutionary algorithms
• statistical and Bayesian ML
• multi-agent systems
• distributed artificial intelligence
• Hybrid Intelligent Systems
3. Lecture Outline
• Biological foundations of neural networks. Short introduction to neurons, synapses, and the concept of learning
• Perceptron networks
• Linear separability
• Perceptron learning rule
• Types of neural networks. History of artificial neural networks
4. What is connectionism / neural networks?
A simplified model of how natural neural systems work. Neural Networks (NNs) simulate natural information-processing tasks of the human brain.
A NN model consists of neurons and connections between neurons (synapses).
Characteristics of the human brain:
◦ Contains about 10^11 neurons and 10^15 connections
◦ Each neuron may connect to 10,000 other neurons.
◦ A human can perform a picture-naming task in about 500 milliseconds.
5. Biological Neural Networks
[Diagram: two biological neurons - soma (cell body), dendrites, and axon, connected by synapses]
(Picture from M. Negnevitsky, Pearson Education, 2002)
Dendrites bring information to the cell body and axons take information away
from the cell body.
Soma or cell body is the bulbous, non-process portion of a neuron or other
brain cell type, containing the cell nucleus.
6. What is an Artificial Neural Network?
[Diagram: a network of artificial neurons; each neuron computes its output y from inputs x_i, weights w_i, and bias b through an activation function f]

y = f( Σ_{i=1..n} w_i x_i + b )

The network as a whole implements a mapping f: R^m -> R^n from its inputs to its outputs.
7. Artificial Neural Networks (ANNs):
An Artificial Neural Network is an interconnected
assembly of simple processing elements, units or
nodes (neurons), whose functionality is inspired by
the functioning of natural neurons in the brain.
The processing ability of the neural network lies in
the inter-unit connection strengths, or weights,
obtained by a process of learning from a set of
training patterns.
8. Artificial Neural Nets (ANNs):
The units (individual neurons) operate only locally on
the inputs they receive via connections.
ANNs undergo some sort of "training" whereby the
connection weights are adjusted on the basis of
presented data. In other words, ANNs "learn" from
examples (as children learn to recognize dogs from
examples of dogs) and exhibit some generalization
capability beyond the training data (for other data
than those included in the training set).
9. Formal neuron model (McCulloch and Walter Pitts, 1943)
A single neuron has 6 components:
1. Input x
2. Weights w
3. Bias b (Threshold = -b)
4. Activation function f
5. Input function σ
6. Output y

y = f( Σ_{i=1..n} x_i w_i + b )

McCulloch & Pitts (1943) are recognised as the designers of the first neuron (and neural network) model.
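The output rule above can be sketched directly; a minimal illustration with a pluggable activation function f (the example values are assumptions, not from the slide):

```python
# Formal neuron: y = f( sum_i x_i * w_i + b )

def neuron(x, w, b, f):
    """Compute y = f(sum_i x_i * w_i + b)."""
    return f(sum(xi * wi for xi, wi in zip(x, w)) + b)

# Step activation with threshold 0, i.e. the threshold is -b.
step = lambda s: 1 if s >= 0 else 0

y = neuron(x=[1, 0], w=[0.4, 0.6], b=-0.5, f=step)
print(y)  # 0, since 0.4 - 0.5 < 0
```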
10. Formal neuron - Activation Functions
Step_t(x) = 1 if x >= t, else 0
Sign(x) = +1 if x >= 0, else -1
Sigmoid(x) = 1 / (1 + e^(-k·x))
Identity function: Id(x) = x
Linear function: Lin(x) = k·x
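The activation functions listed above translate directly into code; a small sketch (k = 1 is an assumed default for the sigmoid):

```python
import math

def step(x, t=0.0):
    """Step_t(x): 1 if x >= t, else 0."""
    return 1 if x >= t else 0

def sign(x):
    """Sign(x): +1 if x >= 0, else -1."""
    return 1 if x >= 0 else -1

def sigmoid(x, k=1.0):
    """Sigmoid(x) = 1 / (1 + e^(-k*x)), a smooth step between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-k * x))

print(step(0.3), sign(-2), sigmoid(0.0))  # 1 -1 0.5
```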
11. A neuron which implements the OR function:

OR truth table:
X1 X2 | Y
 1  1 | 1
 1  0 | 1
 0  1 | 1
 0  0 | 0

Threshold = 1, Bias = -1 (Threshold = -Bias)
[Diagram: neuron Y with inputs X1, X2 and weights w1 = w2 = 1.5]

Using the McCulloch-Pitts neurons it is possible to implement the basic logic gates (e.g. AND, OR, NOT). It is also well known that any logical function can be constructed from these three basic gates. The McCulloch-Pitts neuron only accepts binary inputs.
This can be viewed as classification of the binary inputs into the binary outputs Y using a linear decision boundary.
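The OR neuron above, with weights 1.5 and threshold 1, can be verified in a few lines:

```python
# McCulloch-Pitts neuron: fire (output 1) iff the weighted input sum
# reaches the threshold.

def mp_neuron(inputs, weights, threshold):
    s = sum(w * x for w, x in zip(weights, inputs))
    return 1 if s >= threshold else 0

def OR(x1, x2):
    # Weights and threshold taken from the slide: w1 = w2 = 1.5, threshold = 1.
    return mp_neuron((x1, x2), weights=(1.5, 1.5), threshold=1)

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", OR(x1, x2))
```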
12. Perceptron Model (Frank Rosenblatt in 1957)
• Perceptron is a Single-Layer, Feed-
Forward Network
• The inputs are real values
• The input connections are associated
with weights.
• The weighted sum of the input
neurons and a threshold can be used
to calculate the output for decision
making.
• Only a linear decision surface can be learned.
( from G. Kendall, Lect. Notes Univ. of Nottingham)
13. What can Perceptrons learn?
[Plots: the four binary input points (0,0), (0,1), (1,0), (1,1), plotted for AND and for XOR; a single straight line separates the two output classes for AND but not for XOR]
• Functions that can be separated in this way are called
Linearly Separable (XOR is not Linearly Separable)
• A perceptron can learn (represent) only Linearly Separable
functions. (e.g. OR, AND, NOT)
14. XOR problem – not linearly separable
One neuron layer is not enough; we should introduce an intermediate (hidden) layer.
Y = X1 XOR X2 = (X2 AND NOT X1) OR (X1 AND NOT X2)
Threshold for all nodes = 1.5

XOR truth table:
X1 X2 | Y
 1  1 | 0
 1  0 | 1
 0  1 | 1
 0  0 | 0

[Diagram: two-layer network from X1, X2 to Y, with hidden-unit weights (2, -1) and (-1, 2) and output weights (2, 2)]
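The two-layer XOR network above (all thresholds 1.5, weights as in the diagram) can be checked directly:

```python
# Each unit fires iff its weighted input sum reaches the shared threshold 1.5.

def fire(weighted_sum, threshold=1.5):
    return 1 if weighted_sum >= threshold else 0

def xor_net(x1, x2):
    h1 = fire(2 * x1 - 1 * x2)    # hidden unit: X1 AND NOT X2
    h2 = fire(-1 * x1 + 2 * x2)   # hidden unit: X2 AND NOT X1
    return fire(2 * h1 + 2 * h2)  # output unit: h1 OR h2

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "->", xor_net(x1, x2))
```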
15. What can Perceptrons represent?
Take a 3-input neuron:
Linear Separability is also possible in more than 3 dimensions – but it is harder to visualize.
( from G. Kendall, Lect. Notes Univ. of Nottingham)
16. Training a Perceptron:
Training Dataset: { (x(i), d(i)), i = 1, …, p }, where p is the number of training examples.

AND truth table:
X1 X2 | D
 1  1 | 1
 1  0 | 0
 0  1 | 0
 0  0 | 0

p = 4
Training set = { ((1,1),1), ((1,0),0), ((0,1),0), ((0,0),0) }
The training technique is called the Perceptron Learning Rule.
18. Training a perceptron (main idea):
Vectors from the training set are presented to the Perceptron network one after another (cyclically or randomly):
(x(1), d(1)), (x(2), d(2)), …, (x(p), d(p)), (x(p+1), d(p+1)), …
• If the network's output is correct, no change is made. Otherwise, the weights and biases are updated using the Perceptron Learning Rule.
• An entire pass through all of the input training vectors is called an Epoch.
• When such an entire pass of the training set has occurred without error, training is complete.
19. The Perceptron learning algorithm:
1. Initialize the weights and threshold to small random numbers.
2. At time step t present a vector to the neuron inputs and calculate
the perceptron output y(t).
3. Update the weights and bias as follows:
wj(t+1) = wj(t) + η (d(t) - y(t)) xj(t)
b(t+1) = b(t) + η (d(t) - y(t))
where d(t) is the desired output, y(t) is the computed output,
t is the step/iteration number, and
η is the gain or step size (Learning Rate), with 0.0 < η <= 1.0.
4. Repeat steps 2 and 3 until:
the iteration error is less than a user-specified error threshold
or a predetermined number of iterations have been completed.
(The perceptron learning algorithm developed originally by F. Rosenblatt in the late 1950s)
20–26. Training a Perceptron – Example AND function
Initial weights: 0.4; 0.6
Threshold = 0.5 (bias = -0.5)
Learning rate: η = 0.1

Presenting the four training examples in turn (first epoch):

Input 1  Input 2  Target  Output | Weight 1  Weight 2  Bias
                                 |   0.4       0.6     -0.5   (initial)
   1        1       1       1    |   0.4       0.6     -0.5   (correct, no change)
   1        0       0       0    |   0.4       0.6     -0.5   (correct, no change)
   0        1       0       1    |   0.4       0.5     -0.6   (error, weights updated)

The first two examples are classified correctly, so the weights stay unchanged. The third example (0,1) produces output 1 instead of the target 0, so the Perceptron Learning Rule gives w2 = 0.6 + 0.1·(0 - 1)·1 = 0.5 and bias = -0.5 + 0.1·(0 - 1) = -0.6.
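One epoch of this example can be reproduced with a short sketch of the Perceptron Learning Rule (the two-input, step-activation form used on these slides):

```python
# Perceptron for the AND example: initial weights (0.4, 0.6),
# bias -0.5 (threshold 0.5), learning rate eta = 0.1.

def output(x, w, b):
    """Step activation: fire iff the weighted sum plus bias is >= 0."""
    return 1 if w[0] * x[0] + w[1] * x[1] + b >= 0 else 0

def train_epoch(data, w, b, eta=0.1):
    """One pass over the training set, updating w and b on each error."""
    for x, d in data:
        err = d - output(x, w, b)
        if err != 0:
            w = [w[0] + eta * err * x[0], w[1] + eta * err * x[1]]
            b = b + eta * err
    return w, b

and_data = [((1, 1), 1), ((1, 0), 0), ((0, 1), 0), ((0, 0), 0)]
w, b = train_epoch(and_data, [0.4, 0.6], -0.5)
print(w, b)  # approximately [0.4, 0.5] and -0.6, matching the table above
```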
28. Training a Perceptron:
• Learning only occurs when an error is made; otherwise the weights are left unchanged!
• During training, it is useful to measure the performance of the network as it attempts to find the optimal weight set.
• A common error measure is the sum-squared error (computed over all of the input vector / output vector pairs in the training set):

E = (1/2) Σ_{i=1..p} ( d_i(t) - y_i(t) )²

where p is the number of input/output vector pairs in the training set.
• Learning rate η - dictates how quickly the network converges. It is usually set by experimentation (usually small, e.g. 0.1).
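The error measure above is a one-liner; here it is applied to the outputs from the first epoch of the AND example (targets 1,0,0,0 versus outputs 1,0,1,0):

```python
# Sum-squared error: E = 1/2 * sum_i (d_i - y_i)^2

def sum_squared_error(targets, outputs):
    return 0.5 * sum((d - y) ** 2 for d, y in zip(targets, outputs))

print(sum_squared_error([1, 0, 0, 0], [1, 0, 1, 0]))  # 0.5 (one misclassified example)
```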
29. Major classes of Neural Networks:
• Backpropagation Neural Networks (supervised learning)
• Kohonen Self-Organizing Maps (unsupervised learning)
• Hopfield Neural Networks (recurrent neural network)
• Radial Basis Function Neural Networks (RBF)
• Neuro-Fuzzy Networks (NF)
• Others: various architectures of Recurrent Neural Networks, networks with dynamic neurons, networks with competitive learning, etc.
• More recent improvements: convolutional neural networks, spiking neural networks, deep learning architectures
30. History of Neural Networks
McCulloch & Pitts (1943) ----- neural networks and artificial intelligence were born,
first well-known model for a biological neuron
Hebb (1949) ------ Hebb learning rule
Minsky (1954) ------ Neural Networks (PhD Thesis)
Rosenblatt (1957) ------ Perceptron networks (Perceptron learning rule)
Widrow and Hoff (1959) ------ Delta rule for ADALINE networks
Minsky & Papert (1969) ------ Criticism on Perceptron networks
(problem of linear separability)
Kohonen(1982) ------- Self-Organizing Maps
Hopfield(1982) ------- Hopfield Networks
Rumelhart, Hinton ------- Back-Propagation algorithm
& Williams (1986)
Broomhead & Lowe (1988) ------ Radial Basis Functions networks (RBF)
Vapnik (1990) ------ Support Vector Machine approach
In the ’90s there was massive interest in neural networks and many NN applications were developed;
later came Neuro-Fuzzy networks, spiking NNs, Deep NN architectures, etc.
31. Neural Networks:
• Can learn directly from data (good learning ability - better than other AI approaches)
• Can learn from noisy or corrupted data
• Parallel information processing
• Computationally fast once trained
• Robustness to partial failure of the network
• Useful where data are available but symbolic knowledge is difficult to acquire
Drawbacks of NNs:
• Knowledge captured by a NN through learning (in its weights - real numbers) is not in a form familiar to human beings, e.g. if-then rules (NNs are black-box structures); the model is difficult to interpret.
• Long training time