Christof Monz
Informatics Institute
University of Amsterdam
Data Mining
Part 3: Neural Networks
Overview
Perceptrons
Gradient descent search
Multi-layer neural networks
The backpropagation algorithm
Neural Networks
Analogy to biological neural systems, the most
robust learning systems we know
Attempt to understand natural biological
systems through computational modeling
Massive parallelism allows for computational
efficiency
Help understand ‘distributed’ nature of neural
representations
Intelligent behavior as an ‘emergent’ property of a large number of simple units rather than of explicitly encoded symbolic rules and algorithms
Neural Network Learning
Learning approach based on modeling
adaptation in biological neural systems
Perceptron: Initial algorithm for learning
simple neural networks (single layer) developed
in the 1950s
Backpropagation: More complex algorithm for
learning multi-layer neural networks developed
in the 1980s.
Real Neurons
Human Neural Network
Modeling Neural Networks
Perceptrons
A perceptron is a single-layer neural network with one output unit
The output of a perceptron is computed as follows:
o(x1, ..., xn) = 1 if w0 + w1x1 + ... + wnxn > 0, and −1 otherwise
Assuming a ‘dummy’ input x0 = 1, we can write:
o(x1, ..., xn) = 1 if ∑_{i=0}^{n} wixi > 0, and −1 otherwise
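As an illustration (not part of the original handout), a minimal Python sketch of this computation; the function name and the hand-picked AND weights are our own:

```python
# Minimal sketch of a perceptron's output (illustrative, not from the handout).
# `weights` includes the bias w0; a dummy input x0 = 1 is prepended to match it.
def perceptron_output(weights, inputs):
    """Return 1 if w0 + w1*x1 + ... + wn*xn > 0, else -1."""
    x = [1.0] + list(inputs)                  # dummy input x0 = 1
    activation = sum(w * xi for w, xi in zip(weights, x))
    return 1 if activation > 0 else -1

# Example: weights chosen by hand so the unit computes boolean AND
print(perceptron_output([-1.5, 1.0, 1.0], [1, 1]))  # 1
print(perceptron_output([-1.5, 1.0, 1.0], [0, 1]))  # -1
```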
Learning a perceptron involves choosing the ‘right’ values for the weights w0, ..., wn
The set of candidate hypotheses is H = {w | w ∈ ℝ^{n+1}}
Representational Power
A single perceptron can represent many Boolean functions, e.g. AND, OR, NAND (¬AND), ..., but not all (e.g., XOR)
Perceptron Training Rule
The perceptron training rule can be defined for each weight as:
wi ← wi + ∆wi, where ∆wi = η(t − o)xi
where t is the target output, o is the output of the perceptron, and η is the learning rate
This scenario assumes that we know what the target outputs are supposed to be
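A minimal sketch of one such update in Python (our own, with illustrative names); the learning rate η appears as `eta`:

```python
# One application of the perceptron training rule (illustrative sketch).
def training_rule_update(weights, inputs, target, eta=0.1):
    x = [1.0] + list(inputs)                  # dummy input x0 = 1
    output = 1 if sum(w * xi for w, xi in zip(weights, x)) > 0 else -1
    # wi <- wi + eta * (t - o) * xi for every weight
    return [w + eta * (target - output) * xi for w, xi in zip(weights, x)]
```

With η = 0.1 and an input value of 0.8, a misclassified example changes the corresponding weight by ±0.16, matching the example below.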
Perceptron Training Rule Example
If t = o then η(t − o)xi = 0 and ∆wi = 0, i.e. the weight wi remains unchanged, regardless of the learning rate and the input value xi
Let’s assume a learning rate of η = 0.1 and an input value of xi = 0.8
• If t = +1 and o = −1, then ∆wi = 0.1 · (1 − (−1)) · 0.8 = 0.16
• If t = −1 and o = +1, then ∆wi = 0.1 · (−1 − 1) · 0.8 = −0.16
Perceptron Training Rule
The perceptron training rule converges after a finite number of iterations, but only if the training examples are linearly separable
The stopping criterion holds if the amount of change falls below a pre-defined threshold θ, e.g., if |∆w|_{L1} < θ
The Delta Rule
The delta rule overcomes the shortcoming of the perceptron training rule of not being guaranteed to converge if the examples are not linearly separable
The delta rule is based on gradient descent search
Let’s assume we have an unthresholded perceptron: o(x) = w · x
We can define the training error as:
E(w) = ½ ∑_{d∈D} (td − od)²
where D is the set of training examples
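A sketch of this error computation (our own; each x is assumed to already contain the dummy input x0 = 1):

```python
# Squared training error E(w) = 1/2 * sum_d (td - od)^2 for an unthresholded
# perceptron o(x) = w . x over a dataset of (x, t) pairs (illustrative sketch).
def training_error(weights, dataset):
    total = 0.0
    for x, t in dataset:
        o = sum(w * xi for w, xi in zip(weights, x))  # unthresholded output
        total += (t - o) ** 2
    return 0.5 * total
```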
Error Surface
Gradient Descent
The gradient of E is the vector pointing in the direction of the steepest increase for any point on the error surface:
∇E(w) = (∂E/∂w0, ∂E/∂w1, ..., ∂E/∂wn)
Since we are interested in minimizing the error, we consider negative gradients: −∇E(w)
The training rule for gradient descent is:
w ← w + ∆w, where ∆w = −η ∇E(w)
The training rule for individual weights is defined as wi ← wi + ∆wi, where ∆wi = −η ∂E/∂wi
Instantiating E with the error function we use gives:
∂E/∂wi = ∂/∂wi ½ ∑_{d∈D} (td − od)²
How do we use partial derivatives to actually compute updates to weights at each step?
∂E/∂wi = ∂/∂wi ½ ∑_{d∈D} (td − od)²
= ½ ∑_{d∈D} ∂/∂wi (td − od)²
= ½ ∑_{d∈D} 2 (td − od) · ∂/∂wi (td − od)
= ∑_{d∈D} (td − od) · ∂/∂wi (td − od)
= ∑_{d∈D} (td − od) · (−xid)
where the last step uses od = w · xd, so that ∂(td − od)/∂wi = −xid
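As a sanity check of this result, a small finite-difference comparison (our own sketch; the toy data and names are made up):

```python
# Compare the derived gradient dE/dwi = sum_d (td - od) * (-xid)
# with a numerical finite-difference approximation.
def error(w, data):
    return 0.5 * sum((t - sum(wi * xi for wi, xi in zip(w, x))) ** 2
                     for x, t in data)

def analytic_grad(w, data, i):
    return sum((t - sum(wi * xi for wi, xi in zip(w, x))) * (-x[i])
               for x, t in data)

data = [([1.0, 0.5], 1.0), ([1.0, -0.3], -1.0)]  # each x includes x0 = 1
w, eps = [0.2, -0.1], 1e-6
numeric = (error([w[0] + eps, w[1]], data)
           - error([w[0] - eps, w[1]], data)) / (2 * eps)
print(analytic_grad(w, data, 0), numeric)  # the two values should agree closely
```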
The delta rule for individual weights can now be written as wi ← wi + ∆wi, where
∆wi = η ∑_{d∈D} (td − od) xid
The gradient descent algorithm
• picks initial random weights
• computes the outputs
• updates each weight by adding ∆wi
• repeats until convergence
The Gradient Descent Algorithm
Each training example is a pair ⟨x, t⟩
1. Initialize each wi to some small random value
2. Until the termination condition is met, do:
   2.1 Initialize each ∆wi to 0
   2.2 For each ⟨x, t⟩ ∈ D, do:
       2.2.1 Compute o(x)
       2.2.2 For each weight wi, do: ∆wi ← ∆wi + η(t − o)xi
   2.3 For each weight wi, do: wi ← wi + ∆wi
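A runnable Python sketch of this algorithm (our own; it terminates after a fixed number of epochs, a choice the handout leaves open):

```python
import random

# Batch gradient descent for a linear unit; step numbers refer to the
# algorithm above. Each x in `dataset` is assumed to include x0 = 1.
def gradient_descent(dataset, n_weights, eta=0.05, epochs=100):
    w = [random.uniform(-0.05, 0.05) for _ in range(n_weights)]   # step 1
    for _ in range(epochs):                                       # step 2
        delta = [0.0] * n_weights                                 # step 2.1
        for x, t in dataset:                                      # step 2.2
            o = sum(wi * xi for wi, xi in zip(w, x))              # step 2.2.1
            for i in range(n_weights):                            # step 2.2.2
                delta[i] += eta * (t - o) * x[i]
        for i in range(n_weights):                                # step 2.3
            w[i] += delta[i]
    return w
```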
The gradient descent algorithm will find the
global minimum, provided that the learning rate
is small enough
If the learning rate is too large, this algorithm runs the risk of overstepping the global minimum
It is a common strategy to gradually decrease the learning rate
This algorithm also works when the training examples are not linearly separable
Shortcomings of Gradient Descent
Converging to a minimum can be quite slow (it can take thousands of steps). Increasing the learning rate, on the other hand, can lead to overstepping minima
If there are multiple local minima in the error
surface, gradient descent can get stuck in one
of them and not find the global minimum
Stochastic gradient descent alleviates these
difficulties
Stochastic Gradient Descent
Gradient descent updates the weights after
summing over all training examples
Stochastic (or incremental) gradient descent
updates weights incrementally after calculating
the error for each individual training example
To this end, step 2.3 is deleted and step 2.2.2 is modified, as shown below
Each training example is a pair ⟨x, t⟩
1. Initialize each wi to some small random value
2. Until the termination condition is met, do:
   2.1 Initialize each ∆wi to 0
   2.2 For each ⟨x, t⟩ ∈ D, do:
       2.2.1 Compute o(x)
       2.2.2 For each weight wi, do: wi ← wi + η(t − o)xi
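The corresponding sketch (same assumptions as the batch version above); note that the weight update now happens inside the loop over examples:

```python
import random

# Stochastic (incremental) gradient descent: w is updated per example.
def stochastic_gradient_descent(dataset, n_weights, eta=0.05, epochs=100):
    w = [random.uniform(-0.05, 0.05) for _ in range(n_weights)]
    for _ in range(epochs):
        for x, t in dataset:
            o = sum(wi * xi for wi, xi in zip(w, x))
            for i in range(n_weights):
                w[i] += eta * (t - o) * x[i]   # immediate update, no accumulation
    return w
```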
Comparison
In standard gradient descent, summing over all training examples requires more computation per weight-update step
As a consequence, standard gradient descent often uses larger learning rates than stochastic gradient descent
Stochastic gradient descent can avoid falling into local minima because it uses the varying per-example gradients ∇Ed(w) rather than the overall ∇E(w) to guide its search
Multi-Layer Neural Networks
Perceptrons only have two layers: the input
layer and the output layer
Perceptrons only have one output unit
Perceptrons are limited in their expressiveness
Multi-layer neural networks consist of an input
layer, a hidden layer, and an output layer
Multi-layer neural networks can have several
output units
The units of the hidden layer function as input
units to the next layer
However, multiple layers of linear units still
produce only linear functions
The step function in perceptrons is another
choice, but it is not differentiable, and therefore
not suitable for gradient descent search
Solution: the sigmoid function, a non-linear,
differentiable threshold function
Sigmoid Unit
The Sigmoid Function
The output is computed as o = σ(w · x), where σ(y) = 1 / (1 + e^(−y)), i.e. o = σ(w · x) = 1 / (1 + e^(−w·x))
Another nice property of the sigmoid function is that its derivative is easily expressed:
dσ(y)/dy = σ(y) · (1 − σ(y))
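A minimal sketch of the sigmoid and its derivative (our own):

```python
import math

def sigmoid(y):
    """sigma(y) = 1 / (1 + e^(-y))"""
    return 1.0 / (1.0 + math.exp(-y))

def sigmoid_deriv(y):
    """dsigma(y)/dy = sigma(y) * (1 - sigma(y))"""
    s = sigmoid(y)
    return s * (1.0 - s)

print(sigmoid(0.0))        # 0.5
print(sigmoid_deriv(0.0))  # 0.25
```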
Learning with Multiple Layers
The gradient descent search can be used to
train multi-layer neural networks, but the
algorithm has to be adapted
Firstly, there can be multiple output units, and therefore the error function has to be generalized:
E(w) = ½ ∑_{d∈D} ∑_{k∈outputs} (tkd − okd)²
Secondly, the error ‘feedback’ has to be fed
through multiple layers
Backpropagation Algorithm
For each training example ⟨x, t⟩ do:
1. Input x to the network and compute the output ou for every unit u in the network
2. For each output unit k, calculate its error δk: δk ← ok(1 − ok)(tk − ok)
3. For each hidden unit h, calculate its error δh: δh ← oh(1 − oh) ∑_{k∈outputs} wkh δk
4. Update each network weight wji: wji ← wji + ∆wji, where ∆wji = η δj xji
Note: xji is the input value from unit i to unit j, and wji is the weight on the connection from unit i to unit j
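A sketch of one backpropagation update for a network with a single hidden layer of sigmoid units (our own illustrative structure: weight matrices as nested lists, biases folded in via dummy inputs of 1):

```python
import math

def sigmoid(y):
    return 1.0 / (1.0 + math.exp(-y))

def backprop_step(w_hidden, w_out, x, t, eta=0.1):
    x = [1.0] + list(x)                                   # dummy input x0 = 1
    # Step 1: forward pass
    h = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w_hidden]
    h_in = [1.0] + h                                      # dummy input for output layer
    o = [sigmoid(sum(w * hi for w, hi in zip(row, h_in))) for row in w_out]
    # Step 2: output errors, delta_k = ok (1 - ok)(tk - ok)
    d_out = [ok * (1 - ok) * (tk - ok) for ok, tk in zip(o, t)]
    # Step 3: hidden errors, delta_h = oh (1 - oh) * sum_k w_kh delta_k
    d_hid = [h[j] * (1 - h[j]) *
             sum(w_out[k][j + 1] * d_out[k] for k in range(len(o)))
             for j in range(len(h))]                      # j + 1 skips the bias weight
    # Step 4: weight updates, w_ji <- w_ji + eta * delta_j * x_ji
    for k, row in enumerate(w_out):
        for i in range(len(row)):
            row[i] += eta * d_out[k] * h_in[i]
    for j, row in enumerate(w_hidden):
        for i in range(len(row)):
            row[i] += eta * d_hid[j] * x[i]
    return w_hidden, w_out
```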
Step 1 propagates the input forward through
the network
Steps 2–4 propagate the errors backward
through the network
Step 2 is similar to the delta rule in gradient
descent (step 2.3)
Step 3 sums over the errors of all output units influenced by a given hidden unit (this is because the training data only provides direct feedback for the output units)
Applications of Neural Networks
Text to speech
Fraud detection
Automated vehicles
Game playing
Handwriting recognition
Summary
Perceptrons, simple single-layer neural networks
Perceptron training rule
Gradient descent search
Multi-layer neural networks
Backpropagation algorithm
