Introduction to Neural Networks and Deep Learning
Introduction to Neural Networks
Andres Mendez-Vazquez
May 5, 2019
1 / 63
Outline
1 What are Neural Networks?
Structure of a Neural Cell
Pigeon Experiment
Formal Definition of Artificial Neural Network
Basic Elements of an Artificial Neuron
Types of Activation Functions
McCulloch-Pitts model
More Advanced Models
The Problem of the Vanishing Gradient
Fixing the Problem, ReLu function
2 Neural Network As a Graph
Introduction
Feedback
Neural Architectures
Knowledge Representation
Design of a Neural Network
Representing Knowledge in a Neural Network
2 / 63
What are Neural Networks?
Basic Intuition
The human brain is a highly complex, nonlinear and parallel computer
It is organized as a network with (Ramón y Cajal, 1911):
1 Basic Processing Units ≈ Neurons
2 Connections ≈ Axons and Dendrites
3 / 63
Example
[Figure: structure of a neural cell, showing the apical dendrites, basal dendrites, cell body, dendritic spines, synaptic inputs, and axon.]
5 / 63
Silicon Chip Vs Neurons
Speed Differential
1 Speeds in silicon chips are in the nanosecond range (10^{-9} s).
2 Speeds in human neural networks are in the millisecond range (10^{-3} s).
However
We have massive parallelism in the human brain
1 10 billion neurons in the human cortex.
2 60 trillion synapses or connections.
High Energy Efficiency
1 The human brain uses 10^{-16} joules per operation.
2 The best computers use 10^{-6} joules per operation.
6 / 63
Pigeon Experiment
Watanabe et al. 1995
Pigeons as art experts
Experiment
A pigeon is placed in a Skinner box.
Then, paintings by two different artists (e.g. Chagall / Van Gogh) are presented to it.
A reward is given for pecking when a painting by a particular artist (e.g. Van Gogh) is presented.
8 / 63
The Pigeon in the Skinner Box
Something like this
9 / 63
Results
Something Notable
Pigeons were able to discriminate between Van Gogh and Chagall with 95% accuracy (when presented with pictures they had been trained on).
Discrimination was still 85% successful for previously unseen paintings by the artists.
Thus
1 Pigeons do not simply memorize the pictures.
2 They can extract and recognize patterns (the ‘style’).
3 They generalize from what they have already seen to make predictions.
4 This is what neural networks (biological and artificial) are good at (unlike conventional computers).
10 / 63
Formal Definition
Definition
An artificial neural network is a massively parallel distributed processor made up of
simple processing units. It resembles the brain in two respects:
1 Knowledge is acquired by the network from its environment through a learning
process.
2 Inter-neuron connection strengths, known as synaptic weights, are used to store
the acquired knowledge.
12 / 63
Inter-neuron connection strengths?
How does the neuron collect this information?
Some way to aggregate information needs to be devised...
A Classic
Use a summation of the products of weights and inputs!!!
Something like

\sum_{i=1}^{m} w_i × x_i

Where: w_i is the strength given to signal x_i
However: We still need a way to regulate this “aggregation” (Activation function)
13 / 63
The Model of an Artificial Neuron
Graphical Representation for neuron k
[Figure: block diagram of neuron k, showing the input signals, synaptic weights, bias, summing junction, activation function, and output.]
14 / 63
Basic Elements of an Artificial Neuron (AN) Model
Set of Connecting links
A signal xj, at the input of synapse j connected to neuron k is multiplied by the
synaptic weight wkj.
The weight of an AN may lie in a negative or positive range.
What about the real neuron? In the classic literature, you only have positive values.
Adder
An adder for summing the input signals, weighted by the respective synapses of the
neuron.
Activation function (Squashing function)
It limits the amplitude of the output of a neuron.
It maps the permissible range of the output signal to an interval.
16 / 63
Mathematically
Adder

u_k = \sum_{j=1}^{m} w_{kj} x_j   (1)

1 x_1, x_2, ..., x_m are the input signals.
2 w_{k1}, w_{k2}, ..., w_{km} are the synaptic weights.
3 Together with the bias, it is also known as an “Affine Transformation.”
Activation function

y_k = ϕ(u_k + b_k)   (2)

1 y_k is the output of the neuron.
2 ϕ is the activation function.
17 / 63
Integrating the Bias
The Affine Transformation
[Figure: the induced local field v_k = u_k + b_k as a function of the linear combiner's output u_k; the bias b_k is the offset at u_k = 0.]
18 / 63
Thus
Final Equation
v_k = \sum_{j=0}^{m} w_{kj} x_j

y_k = ϕ(v_k)

With
x_0 = +1 and w_{k0} = b_k
19 / 63
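Below is a minimal sketch in Python (assuming NumPy is available) of the neuron just defined: the bias is absorbed as w_k0 = b_k with a fixed input x_0 = +1, and a simple threshold stands in as an example activation ϕ. The input values and weights are purely illustrative.

```python
import numpy as np

def neuron_output(x, w, b, phi):
    """Compute y_k = phi(v_k) with v_k = sum_j w_kj * x_j + b_k.

    The bias is absorbed as w_k0 = b_k with a fixed input x_0 = +1.
    """
    x = np.concatenate(([1.0], x))      # x_0 = +1
    w = np.concatenate(([b], w))        # w_k0 = b_k
    v = np.dot(w, x)                    # induced local field v_k
    return phi(v)

# A threshold activation used as an example phi
heaviside = lambda v: 1.0 if v >= 0 else 0.0

# Example: two inputs, weights (0.5, -0.3), bias 0.1 (illustrative values)
print(neuron_output(np.array([1.0, 2.0]), np.array([0.5, -0.3]), 0.1, heaviside))
```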
Types of Activation Functions I
Threshold Function
ϕ(v) = \begin{cases} 1 & \text{if } v ≥ 0 \\ 0 & \text{if } v < 0 \end{cases}   (Heaviside Function)   (3)

In the following picture
[Figure: the threshold (Heaviside) function, which jumps from 0 to 1 at v = 0.]
21 / 63
Thus
We can use this activation function
To generate the first Neural Network Model
Clearly
The model uses the summation as the aggregation operator and a threshold activation function.
22 / 63
McCulloch-Pitts model
McCulloch-Pitts model (Pioneers of Neural Networks in the 1940’s)
Output

y_k = \begin{cases} 1 & \text{if } v_k ≥ θ \\ 0 & \text{if } v_k < θ \end{cases}   (4)

with induced local field

v_k = \sum_{j=1}^{m} w_{kj} x_j + b_k   (5)

For example, with weight vector w_k = (1, 1)^T and a suitable threshold θ, the neuron realizes basic logic gates (see the gates below).
24 / 63
It is possible to do classic operations in Boolean Algebra
AND Gate
25 / 63
In the other hand
OR Gate
26 / 63
Finally
NOT Gate
27 / 63
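As a sketch of how these gates arise, the following plain-Python snippet implements McCulloch-Pitts neurons for AND, OR and NOT; the particular weights and thresholds are illustrative choices, not values taken from the slides.

```python
def mcculloch_pitts(inputs, weights, theta):
    """Fire (return 1) when the weighted sum of the inputs reaches the threshold theta."""
    v = sum(w * x for w, x in zip(weights, inputs))
    return 1 if v >= theta else 0

def AND(x1, x2):
    # Both inputs must be on: weighted sum reaches 2 only for (1, 1)
    return mcculloch_pitts((x1, x2), weights=(1, 1), theta=2)

def OR(x1, x2):
    # Any single input on is enough to reach the threshold
    return mcculloch_pitts((x1, x2), weights=(1, 1), theta=1)

def NOT(x):
    # Inhibitory weight: fires only when the input is off
    return mcculloch_pitts((x,), weights=(-1,), theta=0)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, "AND:", AND(a, b), "OR:", OR(a, b))
    print(a, "NOT:", NOT(a))
```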
And the impact is further understood if you look at this
paper
Claude Shannon
“A Symbolic Analysis of Relay and Switching Circuits”
Shannon proved that his switching circuits could be used to simplify
the arrangement of the electromechanical relays
These circuits could solve all problems that Boolean algebra could
solve.
Basically, he proved that computer circuits
They can solve computationally complex problems...
28 / 63
More advanced activation function
Piecewise-Linear Function
ϕ(v) = \begin{cases} 1 & \text{if } v ≥ \frac{1}{2} \\ v & \text{if } -\frac{1}{2} < v < \frac{1}{2} \\ 0 & \text{if } v ≤ -\frac{1}{2} \end{cases}   (6)

Example
[Figure: the piecewise-linear function, saturating at 0 and 1 outside the interval (-1/2, 1/2).]
30 / 63
Remarks
Notes about Piecewise-Linear function
The amplification factor inside the linear region of operation is assumed to
be unity.
Special Cases
A linear combiner arises if the linear region of operation is maintained
without running into saturation.
The piecewise-linear function reduces to a threshold function if the
amplification factor of the linear region is made infinitely large.
31 / 63
A better choice!!!
Sigmoid function
ϕ(v) = \frac{1}{1 + \exp\{-av\}}   (7)

Where a is the slope parameter.
As a → ∞ then ϕ → Threshold function.
Examples
[Figure: sigmoid curves for increasing values of the slope parameter a.]
32 / 63
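A small NumPy sketch of the three activation functions seen so far, implementing Eqs. (3), (6) and (7) as written; the test values are arbitrary.

```python
import numpy as np

def threshold(v):
    """Heaviside function, Eq. (3)."""
    return np.where(v >= 0, 1.0, 0.0)

def piecewise_linear(v):
    """Eq. (6) as written: 1 above 1/2, the identity in between, 0 below -1/2."""
    return np.where(v >= 0.5, 1.0, np.where(v > -0.5, v, 0.0))

def sigmoid(v, a=1.0):
    """Eq. (7): logistic sigmoid with slope parameter a."""
    return 1.0 / (1.0 + np.exp(-a * v))

v = np.linspace(-3, 3, 7)
print(threshold(v))
print(piecewise_linear(v))
print(sigmoid(v, a=2.0))   # a larger a brings the sigmoid closer to the threshold function
```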
The Problem of the Vanishing Gradient
When using a non-linearity
However, there is a drawback when using Back-Propagation (as we saw in Machine Learning) under a sigmoid function

s(x) = \frac{1}{1 + e^{-x}}

Because if we imagine a Deep Neural Network as a series of layer functions f_i

y(A) = f_t ∘ f_{t-1} ∘ · · · ∘ f_2 ∘ f_1(A)

with f_t the last layer.
Therefore, we finish with a sequence of derivatives

\frac{∂y(A)}{∂w_{1i}} = \frac{∂f_t(f_{t-1})}{∂f_{t-1}} · \frac{∂f_{t-1}(f_{t-2})}{∂f_{t-2}} · \cdots · \frac{∂f_2(f_1)}{∂f_1} · \frac{∂f_1(A)}{∂w_{1i}}
34 / 63
Therefore
Given the commutativity of the product
You can group together the derivatives of the sigmoid

f(x) = \frac{ds(x)}{dx} = \frac{e^{-x}}{(1 + e^{-x})^2}

Therefore, differentiating again

\frac{df(x)}{dx} = -\frac{e^{-x}}{(1 + e^{-x})^2} + \frac{2(e^{-x})^2}{(1 + e^{-x})^3}

After setting \frac{df(x)}{dx} = 0
We find that the maximum is at x = 0
35 / 63
Therefore
The maximum of the derivative of the sigmoid
f(0) = 0.25
Therefore, given a Deep Convolutional Network
We could finish with

\lim_{k→∞} \left( \frac{ds(x)}{dx} \right)^k ≤ \lim_{k→∞} (0.25)^k = 0

A Vanishing Derivative or Vanishing Gradient
Making it quite difficult to train a deeper network using this activation function, in Deep Learning and even in Shallow Learning
36 / 63
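A quick numerical check of this argument (assuming NumPy): the derivative of the sigmoid peaks at 0.25, so a chain of k such factors is bounded by 0.25^k.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_derivative(x):
    s = sigmoid(x)
    return s * (1.0 - s)          # equals e^{-x} / (1 + e^{-x})^2, maximum 0.25 at x = 0

print(sigmoid_derivative(0.0))    # 0.25, the largest this factor can ever be

# Even in the best case, a chain of k sigmoid layers scales the gradient by at most 0.25^k
for k in (1, 5, 10, 20):
    print(k, "layers -> upper bound on the product of derivatives:", 0.25 ** k)
```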
Thus
The need to introduce a new function
f(x) = x^+ = \max(0, x)

It is called ReLu or Rectifier
With a smooth approximation (Softplus function)

f(x) = \frac{\ln(1 + e^{kx})}{k}
38 / 63
We have
When k = 1
[Figure: Softplus vs ReLu for k = 1; the Softplus curve is a smooth approximation of the ReLu.]
39 / 63
Increase k
When k = 10^4
[Figure: Softplus vs ReLu for k = 10^4; the Softplus curve is nearly indistinguishable from the ReLu.]
40 / 63
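A NumPy sketch of the ReLu and its Softplus approximation; it uses the numerically stable identity ln(1 + e^z) = max(z, 0) + ln(1 + e^{-|z|}) (an implementation detail, not from the slides), and shows the gap to the ReLu shrinking as k grows.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def softplus(x, k=1.0):
    """ln(1 + e^{kx}) / k, written in a numerically stable form."""
    z = k * x
    return (np.maximum(z, 0.0) + np.log1p(np.exp(-np.abs(z)))) / k

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
for k in (1.0, 10.0, 100.0):
    # Largest gap between Softplus and ReLu on these points shrinks as k increases
    print(k, np.max(np.abs(softplus(x, k) - relu(x))))
```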
Final Remarks
Although, ReLu functions
They can handle the vanishing gradient problem
However, as we will see, saturation starts to appear as a problem
As in Hebbian Learning!!!
41 / 63
Neural Network As a Graph
Definition
A neural network is a directed graph consisting of nodes with
interconnecting synaptic and activation links.
Properties
1 Each neuron is represented by a set of linear synaptic links, an
externally applied bias, and a possibly nonlinear activation link.
2 The synaptic links of a neuron weight their respective input signals.
3 The weighted sum of the input signals defines the induced local field
of the neuron.
4 The activation link squashes the induced local field of the neuron to
produce an output.
43 / 63
Example
Example
44 / 63
Some Observations
Observation
A partially complete graph describing a neural architecture has the following
characteristics:
Source nodes supply input signals to the graph.
Each neuron is represented by a single node called a computation node.
The communication links provide directions of signal flow in the graph.
However
Other Representations exist!!!
Three main representations
Block diagram, providing a functional description of the network.
Signal-flow graph, providing a complete description of signal flow in the
network.
The one we plan to use:
Architectural graph, describing the network layout.
45 / 63
Feedback Model
Feedback
Feedback is said to exist in a dynamic system whenever the output of an element influences, in part, the input applied to that element.
Indeed, feedback occurs in almost every part of the nervous system of every animal (Freeman, 1975)
Feedback Architecture
[Figure: single-loop feedback system with forward operator A and feedback operator B.]
Thus
Given x'_j(n), the internal signal, and A (the weight product) operator:
Output: y_k(n) = A[x'_j(n)]   (8)
47 / 63
Thus
The internal signal incorporates feedback
Given a B operator:

x'_j(n) = x_j(n) + B[y_k(n)]   (9)

Then, eliminating x'_j(n):

y_k(n) = \frac{A}{1 - AB}[x_j(n)]   (10)

Where \frac{A}{1 - AB} is known as the closed-loop operator of the system.
48 / 63
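A scalar illustration of the closed-loop operator, assuming numeric A and B with |AB| < 1 so that the loop settles; the chosen values are arbitrary.

```python
# Scalar illustration of the closed-loop operator A / (1 - AB), assuming |AB| < 1.
A, B = 0.8, 0.5          # forward and feedback operators (illustrative values)
x = 1.0                   # external input x_j(n)

y = 0.0
for _ in range(50):       # iterate y <- A * (x + B * y) until the loop settles
    y = A * (x + B * y)

print(y)                          # settles near 1.3333...
print(A / (1 - A * B) * x)        # closed-loop prediction: 0.8 / 0.6 = 1.3333...
```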
Neural Architectures I
Single-Layer Feedforward Networks
[Figure: single-layer feedforward network, with an input layer of source nodes projecting onto an output layer of neurons.]
50 / 63
Observations
Observations
This network is known as a strictly feed-forward or acyclic type.
51 / 63
Neural Architectures II
Multilayer Feedforward Networks
[Figure: multilayer feedforward network, with an input layer of source nodes, a hidden layer of neurons, and an output layer of neurons.]
52 / 63
Observations
Observations
1 This network contains a series of hidden layers.
2 Each hidden layer allows for classification in the output space produced by the previous hidden layer.
53 / 63
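A minimal NumPy sketch of a forward pass through such a network; the layer sizes and random weights are purely illustrative.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

def forward(x, layers):
    """Propagate an input through a multilayer feedforward network.

    layers is a list of (W, b) pairs; each layer computes sigmoid(W x + b).
    """
    for W, b in layers:
        x = sigmoid(W @ x + b)
    return x

rng = np.random.default_rng(0)
# 4 source nodes -> 3 hidden neurons -> 2 output neurons (random weights for illustration)
layers = [(rng.normal(size=(3, 4)), rng.normal(size=3)),
          (rng.normal(size=(2, 3)), rng.normal(size=2))]

print(forward(rng.normal(size=4), layers))
```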
Neural Architectures III
Recurrent Networks
[Figure: recurrent network in which the neuron outputs are fed back to the inputs through unit-delay operators.]
54 / 63
Observations
Observations
1 This network has no self-feedback loops.
2 It uses something known as the unit-delay operator, B = z^{-1}.
55 / 63
Knowledge Representation
Definition
By Fischler and Firschein, 1987
“Knowledge refers to stored information or models used by a person or
machine to interpret, predict, and appropriately respond to the outside
world. ”
It consists of two kinds of information:
1 The known world state, represented by facts about what is and what
has been known.
2 Observations (measurements) of the world, obtained by means of
sensors.
Observations can be
1 Labeled
2 Unlabeled
57 / 63
Set of Training Data
Training Data
It consists of input-output pairs (x, y)
x = input signal
y = desired output
Thus, we have the following phases in designing a Neural Network
1 Choose an appropriate architecture
2 Train the network - learning.
1 Use the Training Data!!!
3 Test the network with data not seen before
1 Use a set of pairs that were not shown to the network, so the y component must be predicted.
4 Then, you can see how well the network behaves - Generalization Phase.
59 / 63
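A toy end-to-end sketch of these phases (assuming NumPy): a single threshold neuron is trained on a training split using the classic perceptron rule (an assumed learning rule, not specified on this slide) and then evaluated on pairs it has never seen, the Generalization Phase.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy input-output pairs (x, y): y = 1 when x1 + x2 > 1, else 0 (a linearly separable rule)
X = rng.uniform(-1, 2, size=(200, 2))
y = (X[:, 0] + X[:, 1] > 1).astype(float)

# Phase 2/3: split into training data and data the network has never seen
X_train, y_train = X[:150], y[:150]
X_test,  y_test  = X[150:], y[150:]

# Phase 1: architecture = a single threshold neuron (weights + bias)
w, b = np.zeros(2), 0.0
predict = lambda X: (X @ w + b >= 0).astype(float)

# Phase 2: training with the classic perceptron rule (assumed learning rule)
for _ in range(20):
    for xi, yi in zip(X_train, y_train):
        error = yi - (1.0 if xi @ w + b >= 0 else 0.0)
        w += 0.1 * error * xi
        b += 0.1 * error

# Phase 3/4: generalization, i.e. accuracy on pairs never shown during training
print("train accuracy:", (predict(X_train) == y_train).mean())
print("test accuracy: ", (predict(X_test) == y_test).mean())
```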
Representing Knowledge in a Neural Network
Notice the Following
The subject of knowledge representation inside an artificial network is very
complicated.
However: Pattern Classifiers Vs Neural Networks
1 Pattern Classifiers are first designed and then validated by the
environment.
2 Neural Networks learn the environment by using data from it!!!
Kurt Hornik et al. proved (1989)
“Standard multilayer feedforward networks with as few as one hidden layer
using arbitrary squashing functions are capable of approximating any Borel
measurable function” (Basically many of the known ones!!!)
61 / 63
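A rough illustration of Hornik's statement, under assumptions of my own choosing (one hidden layer of sigmoid units, plain gradient descent, an arbitrary smooth 1-D target); it is a sketch of the idea, not the construction used in the theorem.

```python
import numpy as np

rng = np.random.default_rng(0)

# Target: a smooth 1-D function to approximate
X = np.linspace(-3, 3, 200).reshape(-1, 1)
Y = np.sin(X)

H = 20                                        # hidden squashing units (arbitrary size)
W1, b1 = rng.normal(0, 1, (1, H)), np.zeros(H)
W2, b2 = rng.normal(0, 1, (H, 1)), np.zeros(1)
sigmoid = lambda v: 1.0 / (1.0 + np.exp(-v))

lr = 0.05
for _ in range(20000):
    A = sigmoid(X @ W1 + b1)                  # hidden layer of squashing functions
    P = A @ W2 + b2                           # linear output layer
    E = P - Y                                 # error on the training points
    # Gradient descent on the mean-squared error (backpropagated through the layers)
    gW2 = A.T @ E / len(X); gb2 = E.mean(0)
    dA = (E @ W2.T) * A * (1 - A)
    gW1 = X.T @ dA / len(X); gb1 = dA.mean(0)
    W1 -= lr * gW1; b1 -= lr * gb1
    W2 -= lr * gW2; b2 -= lr * gb2

print("mean squared error:", np.mean((sigmoid(X @ W1 + b1) @ W2 + b2 - Y) ** 2))
```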
Rules Knowledge Representation
Rule 1
Similar inputs from similar classes should usually produce similar
representation.
We can use a Metric to measure that similarity!!!
Examples
1 d(x_i, x_j) = ‖x_i − x_j‖ (Classic Euclidean Metric).
2 d_{ij}^2 = (x_i − μ_i)^T Σ^{-1} (x_j − μ_j) (Mahalanobis distance), where
1 μ_i = E[x_i].
2 Σ = E[(x_i − μ_i)(x_i − μ_i)^T] = E[(x_j − μ_j)(x_j − μ_j)^T].
62 / 63
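A NumPy sketch of both metrics; for simplicity it uses a single shared mean and covariance estimated from sample data, a simplification of the slide's μ_i, μ_j notation.

```python
import numpy as np

def euclidean(xi, xj):
    return np.linalg.norm(xi - xj)

def mahalanobis_sq(xi, xj, mu, cov):
    """d_ij^2 = (x_i - mu)^T Sigma^{-1} (x_j - mu), with one shared mean and covariance."""
    d_i, d_j = xi - mu, xj - mu
    return d_i @ np.linalg.solve(cov, d_j)

rng = np.random.default_rng(2)
X = rng.normal(size=(500, 3))             # samples used to estimate mu and Sigma
mu, cov = X.mean(axis=0), np.cov(X, rowvar=False)

xi, xj = X[0], X[1]
print("Euclidean:      ", euclidean(xi, xj))
print("Mahalanobis^2:  ", mahalanobis_sq(xi, xj, mu, cov))
```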
More
Rule 2
Items to be categorized as separate classes should be given widely different
representations in the network.
Rule 3
If a particular feature is important, then there should be a large number of
neurons involved in the representation.
Rule 4
Prior information and invariance should be built into the design:
Thus, simplify the network by not having to learn that from the data.
63 / 63

 

Más de Andres Mendez-Vazquez (20)

2.03 bayesian estimation
2.03 bayesian estimation2.03 bayesian estimation
2.03 bayesian estimation
 
05 linear transformations
05 linear transformations05 linear transformations
05 linear transformations
 
01.04 orthonormal basis_eigen_vectors
01.04 orthonormal basis_eigen_vectors01.04 orthonormal basis_eigen_vectors
01.04 orthonormal basis_eigen_vectors
 
01.03 squared matrices_and_other_issues
01.03 squared matrices_and_other_issues01.03 squared matrices_and_other_issues
01.03 squared matrices_and_other_issues
 
01.02 linear equations
01.02 linear equations01.02 linear equations
01.02 linear equations
 
01.01 vector spaces
01.01 vector spaces01.01 vector spaces
01.01 vector spaces
 
06 recurrent neural_networks
06 recurrent neural_networks06 recurrent neural_networks
06 recurrent neural_networks
 
05 backpropagation automatic_differentiation
05 backpropagation automatic_differentiation05 backpropagation automatic_differentiation
05 backpropagation automatic_differentiation
 
Zetta global
Zetta globalZetta global
Zetta global
 
25 introduction reinforcement_learning
25 introduction reinforcement_learning25 introduction reinforcement_learning
25 introduction reinforcement_learning
 
Neural Networks and Deep Learning Syllabus
Neural Networks and Deep Learning SyllabusNeural Networks and Deep Learning Syllabus
Neural Networks and Deep Learning Syllabus
 
Introduction to artificial_intelligence_syllabus
Introduction to artificial_intelligence_syllabusIntroduction to artificial_intelligence_syllabus
Introduction to artificial_intelligence_syllabus
 
Ideas 09 22_2018
Ideas 09 22_2018Ideas 09 22_2018
Ideas 09 22_2018
 
Ideas about a Bachelor in Machine Learning/Data Sciences
Ideas about a Bachelor in Machine Learning/Data SciencesIdeas about a Bachelor in Machine Learning/Data Sciences
Ideas about a Bachelor in Machine Learning/Data Sciences
 
Analysis of Algorithms Syllabus
Analysis of Algorithms  SyllabusAnalysis of Algorithms  Syllabus
Analysis of Algorithms Syllabus
 
20 k-means, k-center, k-meoids and variations
20 k-means, k-center, k-meoids and variations20 k-means, k-center, k-meoids and variations
20 k-means, k-center, k-meoids and variations
 
18.1 combining models
18.1 combining models18.1 combining models
18.1 combining models
 
17 vapnik chervonenkis dimension
17 vapnik chervonenkis dimension17 vapnik chervonenkis dimension
17 vapnik chervonenkis dimension
 
A basic introduction to learning
A basic introduction to learningA basic introduction to learning
A basic introduction to learning
 
Introduction Mathematics Intelligent Systems Syllabus
Introduction Mathematics Intelligent Systems SyllabusIntroduction Mathematics Intelligent Systems Syllabus
Introduction Mathematics Intelligent Systems Syllabus
 

Último

Computer Networks Basics of Network Devices
Computer Networks  Basics of Network DevicesComputer Networks  Basics of Network Devices
Computer Networks Basics of Network DevicesChandrakantDivate1
 
Verification of thevenin's theorem for BEEE Lab (1).pptx
Verification of thevenin's theorem for BEEE Lab (1).pptxVerification of thevenin's theorem for BEEE Lab (1).pptx
Verification of thevenin's theorem for BEEE Lab (1).pptxchumtiyababu
 
Online food ordering system project report.pdf
Online food ordering system project report.pdfOnline food ordering system project report.pdf
Online food ordering system project report.pdfKamal Acharya
 
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills KuwaitKuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwaitjaanualu31
 
Online electricity billing project report..pdf
Online electricity billing project report..pdfOnline electricity billing project report..pdf
Online electricity billing project report..pdfKamal Acharya
 
Wadi Rum luxhotel lodge Analysis case study.pptx
Wadi Rum luxhotel lodge Analysis case study.pptxWadi Rum luxhotel lodge Analysis case study.pptx
Wadi Rum luxhotel lodge Analysis case study.pptxNadaHaitham1
 
HAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKAR
HAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKARHAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKAR
HAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKARKOUSTAV SARKAR
 
Block diagram reduction techniques in control systems.ppt
Block diagram reduction techniques in control systems.pptBlock diagram reduction techniques in control systems.ppt
Block diagram reduction techniques in control systems.pptNANDHAKUMARA10
 
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXssuser89054b
 
Work-Permit-Receiver-in-Saudi-Aramco.pptx
Work-Permit-Receiver-in-Saudi-Aramco.pptxWork-Permit-Receiver-in-Saudi-Aramco.pptx
Work-Permit-Receiver-in-Saudi-Aramco.pptxJuliansyahHarahap1
 
COST-EFFETIVE and Energy Efficient BUILDINGS ptx
COST-EFFETIVE  and Energy Efficient BUILDINGS ptxCOST-EFFETIVE  and Energy Efficient BUILDINGS ptx
COST-EFFETIVE and Energy Efficient BUILDINGS ptxJIT KUMAR GUPTA
 
Double Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torqueDouble Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torqueBhangaleSonal
 
kiln thermal load.pptx kiln tgermal load
kiln thermal load.pptx kiln tgermal loadkiln thermal load.pptx kiln tgermal load
kiln thermal load.pptx kiln tgermal loadhamedmustafa094
 
Unleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leapUnleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leapRishantSharmaFr
 
Thermal Engineering Unit - I & II . ppt
Thermal Engineering  Unit - I & II . pptThermal Engineering  Unit - I & II . ppt
Thermal Engineering Unit - I & II . pptDineshKumar4165
 
DC MACHINE-Motoring and generation, Armature circuit equation
DC MACHINE-Motoring and generation, Armature circuit equationDC MACHINE-Motoring and generation, Armature circuit equation
DC MACHINE-Motoring and generation, Armature circuit equationBhangaleSonal
 
Engineering Drawing focus on projection of planes
Engineering Drawing focus on projection of planesEngineering Drawing focus on projection of planes
Engineering Drawing focus on projection of planesRAJNEESHKUMAR341697
 
Computer Lecture 01.pptxIntroduction to Computers
Computer Lecture 01.pptxIntroduction to ComputersComputer Lecture 01.pptxIntroduction to Computers
Computer Lecture 01.pptxIntroduction to ComputersMairaAshraf6
 

Último (20)

Computer Networks Basics of Network Devices
Computer Networks  Basics of Network DevicesComputer Networks  Basics of Network Devices
Computer Networks Basics of Network Devices
 
Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7
Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7
Call Girls in South Ex (delhi) call me [🔝9953056974🔝] escort service 24X7
 
Verification of thevenin's theorem for BEEE Lab (1).pptx
Verification of thevenin's theorem for BEEE Lab (1).pptxVerification of thevenin's theorem for BEEE Lab (1).pptx
Verification of thevenin's theorem for BEEE Lab (1).pptx
 
Online food ordering system project report.pdf
Online food ordering system project report.pdfOnline food ordering system project report.pdf
Online food ordering system project report.pdf
 
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills KuwaitKuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
Kuwait City MTP kit ((+919101817206)) Buy Abortion Pills Kuwait
 
Online electricity billing project report..pdf
Online electricity billing project report..pdfOnline electricity billing project report..pdf
Online electricity billing project report..pdf
 
Wadi Rum luxhotel lodge Analysis case study.pptx
Wadi Rum luxhotel lodge Analysis case study.pptxWadi Rum luxhotel lodge Analysis case study.pptx
Wadi Rum luxhotel lodge Analysis case study.pptx
 
HAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKAR
HAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKARHAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKAR
HAND TOOLS USED AT ELECTRONICS WORK PRESENTED BY KOUSTAV SARKAR
 
Block diagram reduction techniques in control systems.ppt
Block diagram reduction techniques in control systems.pptBlock diagram reduction techniques in control systems.ppt
Block diagram reduction techniques in control systems.ppt
 
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
 
Work-Permit-Receiver-in-Saudi-Aramco.pptx
Work-Permit-Receiver-in-Saudi-Aramco.pptxWork-Permit-Receiver-in-Saudi-Aramco.pptx
Work-Permit-Receiver-in-Saudi-Aramco.pptx
 
COST-EFFETIVE and Energy Efficient BUILDINGS ptx
COST-EFFETIVE  and Energy Efficient BUILDINGS ptxCOST-EFFETIVE  and Energy Efficient BUILDINGS ptx
COST-EFFETIVE and Energy Efficient BUILDINGS ptx
 
Double Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torqueDouble Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torque
 
kiln thermal load.pptx kiln tgermal load
kiln thermal load.pptx kiln tgermal loadkiln thermal load.pptx kiln tgermal load
kiln thermal load.pptx kiln tgermal load
 
Unleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leapUnleashing the Power of the SORA AI lastest leap
Unleashing the Power of the SORA AI lastest leap
 
Thermal Engineering Unit - I & II . ppt
Thermal Engineering  Unit - I & II . pptThermal Engineering  Unit - I & II . ppt
Thermal Engineering Unit - I & II . ppt
 
DC MACHINE-Motoring and generation, Armature circuit equation
DC MACHINE-Motoring and generation, Armature circuit equationDC MACHINE-Motoring and generation, Armature circuit equation
DC MACHINE-Motoring and generation, Armature circuit equation
 
Engineering Drawing focus on projection of planes
Engineering Drawing focus on projection of planesEngineering Drawing focus on projection of planes
Engineering Drawing focus on projection of planes
 
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced LoadsFEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
 
Computer Lecture 01.pptxIntroduction to Computers
Computer Lecture 01.pptxIntroduction to ComputersComputer Lecture 01.pptxIntroduction to Computers
Computer Lecture 01.pptxIntroduction to Computers
 

01 Introduction to Neural Networks and Deep Learning

  • 1. Introduction to Neural Networks and Deep Learning Introduction to Neural Networks Andres Mendez-Vazquez May 5, 2019 1 / 63
  • 2. Outline 1 What are Neural Networks? Structure of a Neural Cell Pigeon Experiment Formal Definition of Artificial Neural Network Basic Elements of an Artificial Neuron Types of Activation Functions McCulloch-Pitts model More Advanced Models The Problem of the Vanishing Gradient Fixing the Problem, ReLu function 2 Neural Network As a Graph Introduction Feedback Neural Architectures Knowledge Representation Design of a Neural Network Representing Knowledge in a Neural Networks 2 / 63
  • 3. What are Neural Networks? Basic Intuition The human brain is a highly complex, nonlinear and parallel computer It is organized as a Network with (Ramon y Cajal 1911) 1 Basic Processing Units ≈ Neurons 2 Connections ≈ Axons and Dendrites 3 / 63
  • 6. Outline 1 What are Neural Networks? Structure of a Neural Cell Pigeon Experiment Formal Definition of Artificial Neural Network Basic Elements of an Artificial Neuron Types of Activation Functions McCulloch-Pitts model More Advanced Models The Problem of the Vanishing Gradient Fixing the Problem, ReLu function 2 Neural Network As a Graph Introduction Feedback Neural Architectures Knowledge Representation Design of a Neural Network Representing Knowledge in a Neural Networks 4 / 63
  • 7. Example (figure of a neural cell: apical dendrites, basal dendrites, cell body, dendritic spines, synaptic inputs, axon). 5 / 63
  • 8. Silicon Chip Vs Neurons Speed Differential 1 Speeds in silicon chips are in the nanosecond range (10−9 s). 2 Speeds in human neural networks are in the millisecond range (10−3 s). However We have massive parallelism on the human brain 1 10 billion neurons in the human cortex. 2 60 trillion synapses or connections High Energy Efficiency 1 Human Brain uses 10−16 joules per operation. 2 Best computers use 10−6 joules per operation. 6 / 63
  • 15. Outline 1 What are Neural Networks? Structure of a Neural Cell Pigeon Experiment Formal Definition of Artificial Neural Network Basic Elements of an Artificial Neuron Types of Activation Functions McCulloch-Pitts model More Advanced Models The Problem of the Vanishing Gradient Fixing the Problem, ReLu function 2 Neural Network As a Graph Introduction Feedback Neural Architectures Knowledge Representation Design of a Neural Network Representing Knowledge in a Neural Networks 7 / 63
  • 16. Pigeon Experiment Watanabe et al. 1995: pigeons as art experts. Experiment: a pigeon is placed in a Skinner box; paintings by two different artists (e.g. Chagall / Van Gogh) are presented to it, and a reward is given for pecking when a particular artist (e.g. Van Gogh) is presented. 8 / 63
  • 19. The Pigeon in the Skinner Box Something like this 9 / 63
  • 20. Results Something Notable: pigeons were able to discriminate between Van Gogh and Chagall with 95% accuracy when presented with pictures they had been trained on, and discrimination was still 85% successful for previously unseen paintings by the same artists. Thus 1 Pigeons do not simply memorize the pictures. 2 They can extract and recognize patterns (the 'style'). 3 They generalize from what they have already seen to make predictions. 4 This is what neural networks (biological and artificial) are good at, unlike a conventional computer. 10 / 63
  • 26. Outline 1 What are Neural Networks? Structure of a Neural Cell Pigeon Experiment Formal Definition of Artificial Neural Network Basic Elements of an Artificial Neuron Types of Activation Functions McCulloch-Pitts model More Advanced Models The Problem of the Vanishing Gradient Fixing the Problem, ReLu function 2 Neural Network As a Graph Introduction Feedback Neural Architectures Knowledge Representation Design of a Neural Network Representing Knowledge in a Neural Networks 11 / 63
  • 27. Formal Definition Definition An artificial neural network is a massively parallel distributed processor made up of simple processing units. It resembles the brain in two respects: 1 Knowledge is acquired by the network from its environment through a learning process. 2 Inter-neuron connection strengths, known as synaptic weights, are used to store the acquired knowledge. 12 / 63
  • 30. Inter-neuron connection strengths? How does the neuron collect this information? Some way to aggregate information needs to be devised... A classic: use a summation of the products of weights and inputs, something like $\sum_{i=1}^{m} w_i \times x_i$, where $w_i$ is the strength given to signal $x_i$. However, we still need a way to regulate this "aggregation" (the activation function). 13 / 63
  • 33. The Model of an Artificial Neuron Graphical representation for neuron k (block diagram: input signals, synaptic weights, bias, summing junction, activation function, output). 14 / 63
  • 34. Outline 1 What are Neural Networks? Structure of a Neural Cell Pigeon Experiment Formal Definition of Artificial Neural Network Basic Elements of an Artificial Neuron Types of Activation Functions McCulloch-Pitts model More Advanced Models The Problem of the Vanishing Gradient Fixing the Problem, ReLu function 2 Neural Network As a Graph Introduction Feedback Neural Architectures Knowledge Representation Design of a Neural Network Representing Knowledge in a Neural Networks 15 / 63
  • 35. Basic Elements of an Artificial Neuron (AN) Model Set of connecting links: a signal $x_j$ at the input of synapse $j$ connected to neuron $k$ is multiplied by the synaptic weight $w_{kj}$. The weight of an AN may lie in a negative or positive range. What about the real neuron? In the classic literature you only have positive values. Adder: an adder for summing the input signals, weighted by the respective synapses of the neuron. Activation function (squashing function): it limits the amplitude of the output of a neuron, mapping the permissible range of the output signal to an interval. 16 / 63
  • 39. Mathematically Adder: $u_k = \sum_{j=1}^{m} w_{kj} x_j$ (1), where 1 $x_1, x_2, ..., x_m$ are the input signals, 2 $w_{k1}, w_{k2}, ..., w_{km}$ are the synaptic weights, 3 it is also known as an "Affine Transformation." Activation function: $y_k = \varphi(u_k + b_k)$ (2), where 1 $y_k$ is the output of the neuron and 2 $\varphi$ is the activation function. 17 / 63
  • 46. Integrating the Bias The Affine Transformation (plot: linear combiner's output versus the induced local field, shifted by the bias). 18 / 63
  • 47. Thus Final Equation $v_k = \sum_{j=0}^{m} w_{kj} x_j$ and $y_k = \varphi(v_k)$, with $x_0 = 1$ and $w_{k0} = b_k$. 19 / 63
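A minimal sketch of this neuron model in Python (not taken from the slides; the inputs, weights, bias, and the choice of a sigmoid activation below are illustrative assumptions). It also checks that absorbing the bias as $w_{k0} = b_k$ with $x_0 = 1$ gives the same output:

```python
import numpy as np

def neuron_output(x, w, b, phi):
    """Compute y_k = phi(v_k) with v_k = sum_j w_kj * x_j + b_k."""
    v = np.dot(w, x) + b              # induced local field
    return phi(v)

def neuron_output_bias_absorbed(x, w_with_bias, phi):
    """Same neuron with x_0 = 1 prepended and w_k0 = b_k."""
    x_ext = np.concatenate(([1.0], x))
    return phi(np.dot(w_with_bias, x_ext))

sigmoid = lambda v: 1.0 / (1.0 + np.exp(-v))

x = np.array([0.5, -1.0, 2.0])        # illustrative input signals
w = np.array([0.3, 0.8, -0.2])        # illustrative synaptic weights
b = 0.1                               # illustrative bias

print(neuron_output(x, w, b, sigmoid))
print(neuron_output_bias_absorbed(x, np.concatenate(([b], w)), sigmoid))
```

Both calls print the same value, which is the point of writing $v_k = \sum_{j=0}^{m} w_{kj} x_j$.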
  • 49. Outline 1 What are Neural Networks? Structure of a Neural Cell Pigeon Experiment Formal Definition of Artificial Neural Network Basic Elements of an Artificial Neuron Types of Activation Functions McCulloch-Pitts model More Advanced Models The Problem of the Vanishing Gradient Fixing the Problem, ReLu function 2 Neural Network As a Graph Introduction Feedback Neural Architectures Knowledge Representation Design of a Neural Network Representing Knowledge in a Neural Networks 20 / 63
  • 50. Types of Activation Functions I Threshold Function: $\varphi(v) = 1$ if $v \geq 0$, $\varphi(v) = 0$ if $v < 0$ (Heaviside function) (3), shown in the following picture. 21 / 63
  • 52. Thus We can use this activation function To generate the first Neural Network Model Clearly The model uses the summation as aggregation operator and a threshold function. 22 / 63
  • 54. Outline 1 What are Neural Networks? Structure of a Neural Cell Pigeon Experiment Formal Definition of Artificial Neural Network Basic Elements of an Artificial Neuron Types of Activation Functions McCulloch-Pitts model More Advanced Models The Problem of the Vanishing Gradient Fixing the Problem, ReLu function 2 Neural Network As a Graph Introduction Feedback Neural Architectures Knowledge Representation Design of a Neural Network Representing Knowledge in a Neural Networks 23 / 63
  • 55. McCulloch-Pitts model (pioneers of neural networks in the 1940's) Output: $y_k = 1$ if $v_k \geq \theta$, $y_k = 0$ if $v_k < \theta$ (4), with induced local field $v_k = \sum_{j=1}^{m} w_{kj} x_j + b_k$ (5) and, in the gate examples that follow, weights $w_k = (1, 1)^T$. 24 / 63
  • 57. It is possible to do classic operations in Boolean Algebra AND Gate 25 / 63
  • 58. On the other hand OR Gate 26 / 63
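A small sketch of how a McCulloch-Pitts unit with weights $w_k = (1, 1)^T$ can realize these gates; the specific thresholds below (2 for AND, 1 for OR) are the usual textbook choices and are stated here as assumptions rather than values from the slides:

```python
def mcculloch_pitts(x, w, theta):
    # Fire (output 1) when the weighted sum reaches the threshold theta
    v = sum(wi * xi for wi, xi in zip(w, x))
    return 1 if v >= theta else 0

AND = lambda x1, x2: mcculloch_pitts((x1, x2), (1, 1), theta=2)
OR  = lambda x1, x2: mcculloch_pitts((x1, x2), (1, 1), theta=1)

for x1 in (0, 1):
    for x2 in (0, 1):
        print(x1, x2, "AND:", AND(x1, x2), "OR:", OR(x1, x2))
```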
  • 60. And the impact is further understood if you look at this paper by Claude Shannon, "A Symbolic Analysis of Relay and Switching Circuits": Shannon proved that his switching circuits could be used to simplify the arrangement of electromechanical relays, and that these circuits could solve all problems that Boolean algebra could solve. Basically, he proved that computer circuits can solve computationally complex problems... 28 / 63
  • 64. Outline 1 What are Neural Networks? Structure of a Neural Cell Pigeon Experiment Formal Definition of Artificial Neural Network Basic Elements of an Artificial Neuron Types of Activation Functions McCulloch-Pitts model More Advanced Models The Problem of the Vanishing Gradient Fixing the Problem, ReLu function 2 Neural Network As a Graph Introduction Feedback Neural Architectures Knowledge Representation Design of a Neural Network Representing Knowledge in a Neural Networks 29 / 63
  • 65. More advanced activation function Piecewise-Linear Function: $\varphi(v) = 1$ if $v \geq \tfrac{1}{2}$, $\varphi(v) = v$ if $-\tfrac{1}{2} < v < \tfrac{1}{2}$, $\varphi(v) = 0$ if $v \leq -\tfrac{1}{2}$ (6). Example in the following picture. 30 / 63
  • 67. Remarks Notes about Piecewise-Linear function The amplification factor inside the linear region of operation is assumed to be unity. Special Cases A linear combiner arises if the linear region of operation is maintained without running into saturation. The piecewise-linear function reduces to a threshold function if the amplification factor of the linear region is made infinitely large. 31 / 63
  • 70. A better choice!!! Sigmoid function: $\varphi(v) = \frac{1}{1 + \exp\{-av\}}$ (7), where $a$ is a slope parameter. As $a \to \infty$, $\varphi \to$ the threshold function (the picture shows the curve steepening as $a$ increases). 32 / 63
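The three activation functions can be compared numerically. The sketch below is illustrative (the sample points and the slope values $a$ are arbitrary choices); it shows the sigmoid approaching the threshold function as $a$ grows:

```python
import numpy as np

def threshold(v):
    # Heaviside: 1 if v >= 0, else 0 (Eq. 3)
    return np.where(v >= 0, 1.0, 0.0)

def piecewise_linear(v):
    # Eq. 6: 1 for v >= 1/2, v in the linear region, 0 for v <= -1/2
    return np.where(v >= 0.5, 1.0, np.where(v > -0.5, v, 0.0))

def sigmoid(v, a=1.0):
    # Eq. 7 with slope parameter a
    return 1.0 / (1.0 + np.exp(-a * v))

v = np.linspace(-2.0, 2.0, 9)
print(threshold(v))
print(piecewise_linear(v))
for a in (1.0, 5.0, 50.0):
    print(a, np.round(sigmoid(v, a), 3))
```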
  • 72. Outline 1 What are Neural Networks? Structure of a Neural Cell Pigeon Experiment Formal Definition of Artificial Neural Network Basic Elements of an Artificial Neuron Types of Activation Functions McCulloch-Pitts model More Advanced Models The Problem of the Vanishing Gradient Fixing the Problem, ReLu function 2 Neural Network As a Graph Introduction Feedback Neural Architectures Knowledge Representation Design of a Neural Network Representing Knowledge in a Neural Networks 33 / 63
  • 73. The Problem of the Vanishing Gradient When using a non-linearity, there is a drawback when using Back-Propagation (as we saw in Machine Learning) under a sigmoid function $s(x) = \frac{1}{1 + e^{-x}}$. If we imagine a Deep Neural Network as a composition of layer functions $f_i$, $y(A) = f_t \circ f_{t-1} \circ \cdots \circ f_2 \circ f_1(A)$, where $f_t$ is the last layer, then we finish with a chain of derivatives $\frac{\partial y(A)}{\partial w_{1i}} = \frac{\partial f_t(f_{t-1})}{\partial f_{t-1}} \cdot \frac{\partial f_{t-1}(f_{t-2})}{\partial f_{t-2}} \cdots \frac{\partial f_2(f_1)}{\partial f_1} \cdot \frac{\partial f_1(A)}{\partial w_{1i}}$. 34 / 63
  • 76. Therefore Given the commutativity of the product, you can isolate the derivative of the sigmoid: $f(x) = \frac{ds(x)}{dx} = \frac{e^{-x}}{(1 + e^{-x})^2}$. Deriving again, $\frac{df(x)}{dx} = -\frac{e^{-x}}{(1 + e^{-x})^2} + \frac{2(e^{-x})^2}{(1 + e^{-x})^3}$. Setting $\frac{df(x)}{dx} = 0$, we find the maximum is at $x = 0$. 35 / 63
  • 79. Therefore The maximum of the derivative of the sigmoid is $f(0) = 0.25$. Therefore, given a Deep Convolutional Network, we could finish with $\lim_{k \to \infty} \left(\frac{ds(x)}{dx}\right)^k \leq \lim_{k \to \infty} (0.25)^k \to 0$: a vanishing derivative, or vanishing gradient, making it quite difficult to train a deeper network using this activation function in Deep Learning and even in Shallow Learning. 36 / 63
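This collapse is easy to check numerically. The sketch below is illustrative (the depths $k$ are arbitrary) and uses the identity $s'(x) = s(x)(1 - s(x))$, which is just another way of writing $e^{-x}/(1 + e^{-x})^2$:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_derivative(x):
    s = sigmoid(x)
    return s * (1.0 - s)              # equals e^{-x} / (1 + e^{-x})^2

print(sigmoid_derivative(0.0))        # 0.25, the maximum value

# Upper bound on a product of k such factors across a chain of layers
for k in (5, 10, 20, 50):
    print(k, 0.25 ** k)
```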
  • 82. Outline 1 What are Neural Networks? Structure of a Neural Cell Pigeon Experiment Formal Definition of Artificial Neural Network Basic Elements of an Artificial Neuron Types of Activation Functions McCulloch-Pitts model More Advanced Models The Problem of the Vanishing Gradient Fixing the Problem, ReLu function 2 Neural Network As a Graph Introduction Feedback Neural Architectures Knowledge Representation Design of a Neural Network Representing Knowledge in a Neural Networks 37 / 63
  • 83. Thus The need to introduce a new function: $f(x) = x^{+} = \max(0, x)$, called ReLu or Rectifier, with a smooth approximation (the Softplus function) $f(x) = \frac{\ln\left(1 + e^{kx}\right)}{k}$. 38 / 63
  • 85. We have, when $k = 1$ (plot comparing Softplus and ReLu). 39 / 63
  • 86. Increase $k$: when $k = 10^4$ (plot comparing Softplus and ReLu). 40 / 63
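A short sketch of the rectifier and its softplus smoothing (the sample points are arbitrary, and the softplus is written in an overflow-safe form, which is an implementation choice rather than anything stated on the slides); it shows the approximation tightening as $k$ increases:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def softplus(x, k=1.0):
    # ln(1 + e^{kx}) / k, rewritten as max(x, 0) + ln(1 + e^{-k|x|}) / k to avoid overflow
    return np.maximum(x, 0.0) + np.log1p(np.exp(-k * np.abs(x))) / k

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(relu(x))
for k in (1.0, 10.0, 1.0e4):
    print(k, np.round(softplus(x, k), 4))
```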
  • 87. Final Remarks Although ReLu functions can handle the vanishing-gradient problem, as we will see, saturation starts to appear as a problem, as in Hebbian Learning!!! 41 / 63
  • 89. Outline 1 What are Neural Networks? Structure of a Neural Cell Pigeon Experiment Formal Definition of Artificial Neural Network Basic Elements of an Artificial Neuron Types of Activation Functions McCulloch-Pitts model More Advanced Models The Problem of the Vanishing Gradient Fixing the Problem, ReLu function 2 Neural Network As a Graph Introduction Feedback Neural Architectures Knowledge Representation Design of a Neural Network Representing Knowledge in a Neural Networks 42 / 63
  • 90. Neural Network As a Graph Definition: a neural network is a directed graph consisting of nodes with interconnecting synaptic and activation links. Properties 1 Each neuron is represented by a set of linear synaptic links, an externally applied bias, and a possibly nonlinear activation link. 2 The synaptic links of a neuron weight their respective input signals. 3 The weighted sum of the input signals defines the induced local field of the neuron. 4 The activation link squashes the induced local field of the neuron to produce an output. 43 / 63
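This graph view can be made concrete with a small sketch (the node names, weights, bias, and signals are made up for illustration): synaptic links carry weights, the computation node sums its weighted inputs plus the bias into the induced local field, and the activation link squashes the result:

```python
import math

# Directed graph: edges (source node -> computation node) labelled with synaptic weights
edges = {("x1", "k"): 0.4, ("x2", "k"): -0.7, ("x3", "k"): 0.1}
bias = {"k": 0.2}
signals = {"x1": 1.0, "x2": 0.5, "x3": -2.0}      # source-node signals

def induced_local_field(node):
    return sum(w * signals[src] for (src, dst), w in edges.items() if dst == node) + bias[node]

def squash(v):
    # activation link (sigmoid)
    return 1.0 / (1.0 + math.exp(-v))

v_k = induced_local_field("k")
print(v_k, squash(v_k))
```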
  • 96. Some Observations Observation: a partially complete graph describing a neural architecture has the following characteristics: source nodes supply input signals to the graph; each neuron is represented by a single node called a computation node; the communication links provide directions of signal flow in the graph. However, other representations exist!!! Three main representations: Block diagram, providing a functional description of the network. Signal-flow graph, providing a complete description of signal flow in the network. The one we plan to use. Architectural graph, describing the network layout. 45 / 63
  • 105. Outline 1 What are Neural Networks? Structure of a Neural Cell Pigeon Experiment Formal Definition of Artificial Neural Network Basic Elements of an Artificial Neuron Types of Activation Functions McCulloch-Pitts model More Advanced Models The Problem of the Vanishing Gradient Fixing the Problem, ReLu function 2 Neural Network As a Graph Introduction Feedback Neural Architectures Knowledge Representation Design of a Neural Network Representing Knowledge in a Neural Networks 46 / 63
  • 106. Feedback Model Feedback is said to exist in a dynamic system. Indeed, feedback occurs in almost every part of the nervous system of every animal (Freeman, 1975). Feedback architecture: given $x_j'$ the internal signal and $A$ (the weight-product) operator, the output is $y_k(n) = A\left(x_j'(n)\right)$ (8). 47 / 63
  • 109. Thus The internal input has a feedback term. Given a $B$ operator: $x_j'(n) = x_j(n) + B\left(y_k(n)\right)$ (9). Then, eliminating $x_j'(n)$: $y_k(n) = \frac{A}{1 - AB}\left(x_j(n)\right)$ (10), where $\frac{A}{1 - AB}$ is known as the closed-loop operator of the system. 48 / 63
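For scalar operators the closed-loop relation can be checked by iterating the loop. This is an illustrative sketch assuming scalar gains $A$ and $B$ with $|AB| < 1$ so that the iteration converges:

```python
# Scalar feedback loop: x'(n) = x(n) + B * y(n), y(n) = A * x'(n)
A, B = 0.8, 0.5            # illustrative operator gains, |A * B| < 1
x = 1.0                    # external input signal

y = 0.0
for _ in range(200):       # iterate the loop until the output settles
    x_internal = x + B * y
    y = A * x_internal

print(y)                        # iterated output
print(A / (1.0 - A * B) * x)    # closed-loop operator A / (1 - AB) applied to x
```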
  • 111. Outline 1 What are Neural Networks? Structure of a Neural Cell Pigeon Experiment Formal Definition of Artificial Neural Network Basic Elements of an Artificial Neuron Types of Activation Functions McCulloch-Pitts model More Advanced Models The Problem of the Vanishing Gradient Fixing the Problem, ReLu function 2 Neural Network As a Graph Introduction Feedback Neural Architectures Knowledge Representation Design of a Neural Network Representing Knowledge in a Neural Networks 49 / 63
  • 112. Neural Architectures I Single-Layer Feedforward Networks (figure: input layer of source nodes connected to an output layer of neurons). 50 / 63
  • 113. Observations This network is known as a strictly feedforward or acyclic type. 51 / 63
  • 114. Neural Architectures II Multilayer Feedforward Networks (figure: input layer of source nodes, hidden layer of neurons, output layer of neurons). 52 / 63
  • 115. Observations 1 This network contains a series of hidden layers. 2 Each hidden layer allows for classification of the new output space of the previous hidden layer. 53 / 63
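A minimal forward pass through such a network (the layer sizes, random weights, and the sigmoid nonlinearity are illustrative assumptions, not values from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda v: 1.0 / (1.0 + np.exp(-v))

def layer(x, W, b):
    # one layer: affine transformation followed by the activation
    return sigmoid(W @ x + b)

# 4 source nodes -> 3 hidden neurons -> 2 output neurons
x = rng.normal(size=4)
W1, b1 = rng.normal(size=(3, 4)), rng.normal(size=3)
W2, b2 = rng.normal(size=(2, 3)), rng.normal(size=2)

h = layer(x, W1, b1)       # hidden-layer output (a new representation of x)
y = layer(h, W2, b2)       # output-layer response
print(h, y)
```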
  • 117. Neural Architectures III Recurrent Networks (figure: neurons connected in a loop through unit-delay operators). 54 / 63
  • 118. Observations Observations 1 This network has not self-feedback loops. 2 It has something known as unit delay operator B = z−1. 55 / 63
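A minimal sketch of a recurrent layer built around the unit delay (illustrative assumptions: three sigmoid neurons, random weights): the previous outputs return through $B = z^{-1}$ as extra inputs, and zeroing the diagonal of the recurrent weight matrix mirrors the absence of self-feedback loops.

```python
import numpy as np

def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))

rng = np.random.default_rng(1)
W_in = rng.normal(size=(3, 2))     # external inputs -> neurons
W_rec = rng.normal(size=(3, 3))    # delayed outputs -> neurons
np.fill_diagonal(W_rec, 0.0)       # no self-feedback loops

y_prev = np.zeros(3)               # B = z^{-1}: output of the previous time step
for n in range(5):
    x_n = rng.normal(size=2)                       # external input at time n
    y_n = sigmoid(W_in @ x_n + W_rec @ y_prev)     # current output
    y_prev = y_n                                   # stored by the unit delay
print(y_prev)
```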
  • 120. Outline 1 What are Neural Networks? Structure of a Neural Cell Pigeon Experiment Formal Definition of Artificial Neural Network Basic Elements of an Artificial Neuron Types of Activation Functions McCulloch-Pitts model More Advanced Models The Problem of the Vanishing Gradient Fixing the Problem, ReLu function 2 Neural Network As a Graph Introduction Feedback Neural Architectures Knowledge Representation Design of a Neural Network Representing Knowledge in a Neural Networks 56 / 63
  • 121. Knowledge Representation Definition By Fischler and Firschein, 1987 “Knowledge refers to stored information or models used by a person or machine to interpret, predict, and appropriately respond to the outside world. ” It consists of two kinds of information: 1 The known world state, represented by facts about what is and what has been known. 2 Observations (measurements) of the world, obtained by means of sensors. Observations can be 1 Labeled 2 Unlabeled 57 / 63
  • 127. Outline 1 What are Neural Networks? Structure of a Neural Cell Pigeon Experiment Formal Definition of Artificial Neural Network Basic Elements of an Artificial Neuron Types of Activation Functions McCulloch-Pitts model More Advanced Models The Problem of the Vanishing Gradient Fixing the Problem, ReLu function 2 Neural Network As a Graph Introduction Feedback Neural Architectures Knowledge Representation Design of a Neural Network Representing Knowledge in a Neural Networks 58 / 63
  • 128. Set of Training Data Training Data It consists of input-output pairs (x, y) x = input signal y = desired output Thus, we have the following phases of designing a Neural Network 1 Choose an appropriate architecture 2 Train the network - learning. 1 Use the Training Data!!! 3 Test the network with data not seen before 1 Use a set of pairs that were not shown to the network, so the y component must be predicted. 4 Then, you can see how well the network behaves - Generalization Phase (see the sketch below). 59 / 63
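A compact sketch of these phases, assuming scikit-learn is available and using a synthetic two-class dataset as a stand-in for real data (the architecture and all parameter values are illustrative choices, not prescriptions from the slides).

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Input-output pairs (x, y): x = input signal, y = desired output
X, y = make_moons(n_samples=500, noise=0.2, random_state=0)

# Keep part of the data unseen so it can later probe generalization
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Phase 1: choose an architecture (here, one hidden layer of 10 neurons)
net = MLPClassifier(hidden_layer_sizes=(10,), max_iter=2000, random_state=0)

# Phase 2: train the network - learning from the training data
net.fit(X_train, y_train)

# Phases 3-4: test on pairs never shown to the network and measure generalization
print("Generalization accuracy:", net.score(X_test, y_test))
```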
  • 135. Outline 1 What are Neural Networks? Structure of a Neural Cell Pigeon Experiment Formal Definition of Artificial Neural Network Basic Elements of an Artificial Neuron Types of Activation Functions McCulloch-Pitts model More Advanced Models The Problem of the Vanishing Gradient Fixing the Problem, ReLu function 2 Neural Network As a Graph Introduction Feedback Neural Architectures Knowledge Representation Design of a Neural Network Representing Knowledge in a Neural Networks 60 / 63
  • 136. Representing Knowledge in a Neural Networks Notice the Following The subject of knowledge representation inside an artificial network is very complicated. However: Pattern Classifiers Vs Neural Networks 1 Pattern classifiers are first designed and then validated by the environment. 2 Neural networks learn the environment by using the data coming from it!!! Kurt Hornik et al. proved (1989) “Standard multilayer feedforward networks with as few as one hidden layer using arbitrary squashing functions are capable of approximating any Borel measurable function” (Basically, many of the known ones!!!) 61 / 63
  • 139. Rules Knowledge Representation Rule 1 Similar inputs from similar classes should usually produce similar representations. We can use a metric to measure that similarity!!! Examples 1 $d(x_i, x_j) = \|x_i - x_j\|$ (Classic Euclidean metric). 2 $d_{ij}^2 = (x_i - \mu_i)^T \Sigma^{-1} (x_j - \mu_j)$ (Mahalanobis distance) where 1 $\mu_i = E[x_i]$. 2 $\Sigma = E\left[(x_i - \mu_i)(x_i - \mu_i)^T\right] = E\left[(x_j - \mu_j)(x_j - \mu_j)^T\right]$. (A sketch of both metrics follows below.) 62 / 63
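A small sketch of both metrics from Rule 1, assuming NumPy and two synthetic point clouds standing in for the "similar inputs" (the sample sizes, means, and covariance are made-up values for illustration).

```python
import numpy as np

rng = np.random.default_rng(2)
Xi = rng.normal(loc=[0.0, 0.0], scale=[1.0, 3.0], size=(200, 2))   # class i samples
Xj = rng.normal(loc=[1.0, 1.0], scale=[1.0, 3.0], size=(200, 2))   # class j samples

mu_i, mu_j = Xi.mean(axis=0), Xj.mean(axis=0)       # mu_i = E[x_i], mu_j = E[x_j]
Sigma = np.cov(Xi, rowvar=False)                    # shared covariance estimate
Sigma_inv = np.linalg.inv(Sigma)

xi, xj = Xi[0], Xj[0]                               # one sample from each class

d_euclid = np.linalg.norm(xi - xj)                  # d(x_i, x_j) = ||x_i - x_j||
d2_mahal = (xi - mu_i) @ Sigma_inv @ (xj - mu_j)    # d^2_ij = (x_i - mu_i)^T Sigma^{-1} (x_j - mu_j)

print(d_euclid, d2_mahal)
```

Unlike the Euclidean distance, the Mahalanobis form weights each direction by the inverse covariance, so directions with large natural spread contribute less to the distance.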
  • 142. More Rule 2 Items to be categorized as separate classes should be given widely different representations in the network. Rule 3 If a particular feature is important, then there should be a large number of neurons involved in its representation. Rule 4 Prior information and invariances should be built into the design: this simplifies the network by freeing it from having to learn them from the data. 63 / 63