[course site]
Day 2 Lecture 2
Backpropagation
Elisa Sayrol
Acknowledgements
Kevin McGuinness
kevin.mcguinness@dcu.ie
Research Fellow
Insight Centre for Data Analytics
Dublin City University
Xavier Giro-i-Nieto
xavier.giro@upc.edu
…in our previous lecture
Multilayer perceptrons
When each node in each layer is a linear
combination of all inputs from the previous
layer then the network is called a multilayer
perceptron (MLP)
Weights can be organized into matrices.
Forward pass computes
!(#)
Training MLPs
With multilayer perceptrons we need to find the gradient of the loss function with respect to all the
parameters of the model (W(t), b(t)).
These can be found using the chain rule of differentiation.
The calculations reveal that the gradient with respect to the parameters in layer k depends only on the error from the
layer above and the output from the layer below.
This means that the gradients for each layer can be computed iteratively, starting at the last layer and
propagating the error back through the network. This is known as the backpropagation algorithm.
Backpropagation algorithm
• Computational graphs
• Examples applying the chain rule in simple graphs
• Backpropagation applied to a multilayer perceptron
• Another perspective: modular backprop
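Computational graphs
[Figure: four example computational graphs, from the Deep Learning Book]
(a) $z = xy$, a single multiplication node.
(b) Logistic regression, $\hat{y} = \sigma(x^\top w + b)$, with intermediate nodes $u^{(1)}$ (dot product) and $u^{(2)}$ (addition).
(c) $H = \max\{0, XW + b\}$, with intermediate nodes $U^{(1)}$ (matmul) and $U^{(2)}$ (addition) followed by a ReLU.
(d) $\hat{y} = x^\top w$ together with the weight decay penalty $\lambda \sum_i w_i^2$; the weights $w$ feed both branches.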
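Computational graphs
Applying the Chain Rule to Computational Graphs
For a node $g$ followed by a node $f$:
$$y = g(x), \qquad z = f(y) = f(g(x))$$
$$\frac{dz}{dx} = \frac{dz}{dy}\,\frac{dy}{dx}$$
When node $g$ has two inputs $x_1$ and $x_2$, the gradient $\partial z / \partial y$ arriving from $f$ is multiplied by each local gradient of $g$:
$$\frac{\partial z}{\partial x_1} = \frac{\partial z}{\partial y}\,\frac{\partial y}{\partial x_1}, \qquad \frac{\partial z}{\partial x_2} = \frac{\partial z}{\partial y}\,\frac{\partial y}{\partial x_2}$$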
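From computational graphs to MLP
Applying the Chain Rule to Computational Graphs
Numerical Examples
From Stanford Course: Convolutional Neural Networks for Visual Recognition 2017
$$f(x, y, z) = (x + y)\,z$$
Example: $x = -2$, $y = 5$, $z = -4$
Forward pass:
$$q = x + y = 3, \qquad f = qz = -12$$
Local gradients:
$$\frac{\partial q}{\partial x} = 1, \qquad \frac{\partial q}{\partial y} = 1, \qquad \frac{\partial f}{\partial q} = z, \qquad \frac{\partial f}{\partial z} = q$$
We want to compute: $\frac{\partial f}{\partial x}$, $\frac{\partial f}{\partial y}$, $\frac{\partial f}{\partial z}$
Backward pass:
$$\frac{\partial f}{\partial f} = 1$$
$$\frac{\partial f}{\partial q} = z = -4, \qquad \frac{\partial f}{\partial z} = q = 3$$
$$\frac{\partial f}{\partial x} = \frac{\partial f}{\partial q}\,\frac{\partial q}{\partial x} = -4 \cdot 1 = -4$$
$$\frac{\partial f}{\partial y} = \frac{\partial f}{\partial q}\,\frac{\partial q}{\partial y} = -4 \cdot 1 = -4$$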
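The example f(x, y, z) = (x + y)z with x = −2, y = 5, z = −4 can be traced in a few lines of Python. This is only a sketch of the forward and backward pass; the variable names (`q`, `df_dq`, and so on) are chosen here for illustration.

```python
# Forward and backward pass through f(x, y, z) = (x + y) * z.
x, y, z = -2.0, 5.0, -4.0

# Forward pass: evaluate the graph node by node.
q = x + y            # sum node
f = q * z            # product node

# Backward pass: start at the output and apply the chain rule.
df_df = 1.0
df_dq = z * df_df    # local gradient of the product w.r.t. q is z
df_dz = q * df_df    # local gradient of the product w.r.t. z is q
df_dx = 1.0 * df_dq  # local gradient of the sum w.r.t. x is 1
df_dy = 1.0 * df_dq  # local gradient of the sum w.r.t. y is 1

print(f, df_dx, df_dy, df_dz)
```

Each backward line multiplies the gradient arriving from the output side by the local gradient of the node, which is exactly the chain rule applied edge by edge.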
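From computational graphs to MLP
Applying the Chain Rule to Computational Graphs
Numerical Examples
From Stanford Course: Convolutional Neural Networks for Visual Recognition
$$f(x, w, b) = \sigma(w_0 x_0 + w_1 x_1 + b)$$
Sigmoid and its derivative:
$$\sigma(x) = \frac{1}{1 + e^{-x}}$$
$$\frac{d\sigma(x)}{dx} = \frac{e^{-x}}{(1 + e^{-x})^2} = \left(\frac{1 + e^{-x} - 1}{1 + e^{-x}}\right)\left(\frac{1}{1 + e^{-x}}\right) = (1 - \sigma(x))\,\sigma(x)$$
Forward pass: $w_0 = 2$, $x_0 = -1$, $w_1 = -3$, $x_1 = -2$, $b = -3$
$$w_0 x_0 = -2, \qquad w_1 x_1 = 6, \qquad w_0 x_0 + w_1 x_1 = 4, \qquad w_0 x_0 + w_1 x_1 + b = 1, \qquad \sigma(1) = 0.73$$
Backward pass (values rounded as in the graph):
$$\frac{\partial f}{\partial f} = 1$$
Gradient at the input of the sigmoid: $(1 - 0.73)(0.73) \approx 0.2$. The sum nodes pass this value on unchanged, so both products and $b$ receive 0.2.
$$\frac{\partial f}{\partial w_0} = -0.2, \qquad \frac{\partial f}{\partial x_0} = 0.4, \qquad \frac{\partial f}{\partial w_1} = -0.4, \qquad \frac{\partial f}{\partial x_1} = -0.6, \qquad \frac{\partial f}{\partial b} = 0.2$$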
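The sigmoid neuron example can be checked numerically. The sketch below assumes the values w0 = 2, x0 = −1, w1 = −3, x1 = −2, b = −3 read from the graph; the variable names are illustrative. The graph rounds the sigmoid gradient to 0.2 before multiplying, so the unrounded results differ slightly (for instance about −0.59 instead of −0.6).

```python
import math

# Values read from the example graph.
w0, x0, w1, x1, b = 2.0, -1.0, -3.0, -2.0, -3.0

# Forward pass.
s = w0 * x0 + w1 * x1 + b          # input to the sigmoid
f = 1.0 / (1.0 + math.exp(-s))     # sigmoid output

# Backward pass.
df_ds = (1.0 - f) * f              # derivative of the sigmoid at s
grad_w0 = x0 * df_ds               # product node: swap in the other input
grad_x0 = w0 * df_ds
grad_w1 = x1 * df_ds
grad_x1 = w1 * df_ds
grad_b = df_ds                     # sum node passes the gradient through

print(round(f, 2), round(df_ds, 2))
print(round(grad_w0, 2), round(grad_x0, 2), round(grad_w1, 2),
      round(grad_x1, 2), round(grad_b, 2))
```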
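Computational graphs
Gates. Backward Pass
In general: the local gradient of a gate is the derivative of its function. For the sigmoid gate:
$$\sigma(x) = \frac{1}{1 + e^{-x}}, \qquad \frac{d\sigma(x)}{dx} = (1 - \sigma(x))\,\sigma(x)$$
Sum: distributes the gradient to both branches.
$$q = x + y, \qquad \frac{\partial q}{\partial x} = 1, \qquad \frac{\partial q}{\partial y} = 1$$
Product: switches the gradient weight values; each input receives the incoming gradient scaled by the other input.
$$f = qz, \qquad \frac{\partial f}{\partial q} = z, \qquad \frac{\partial f}{\partial z} = q$$
Max: routes the gradient only to the higher input branch (it is not sensitive to the lower branch).
[Figure: max gate example with numerical values]
Add branches: branches that split in the forward pass and merge in the backward pass add their gradients.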
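The gate rules can be written as small backward functions. This is a minimal sketch; the function names are hypothetical, and sending the gradient to the first input when the two inputs of the max gate are equal is an assumption made here, not something the slides specify.

```python
def add_backward(upstream):
    """Sum gate: both inputs receive the upstream gradient unchanged."""
    return upstream, upstream


def mul_backward(a, b, upstream):
    """Product gate: each input receives upstream times the other input."""
    return b * upstream, a * upstream


def max_backward(a, b, upstream):
    """Max gate: only the larger input receives the gradient."""
    # Assumption: on a tie the gradient goes to the first input.
    return (upstream, 0.0) if a >= b else (0.0, upstream)


# Branches that split in the forward pass add their gradients in the
# backward pass: f(x) = x * x uses x on both inputs of a product gate.
x = 3.0
g_left, g_right = mul_backward(x, x, 1.0)
df_dx = g_left + g_right   # equals 2 * x

print(add_backward(0.2), mul_backward(3.0, -4.0, 1.0),
      max_backward(2.0, 1.0, 0.2), df_dx)
```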
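Computational graphs
Numerical Examples
From Stanford Course: Convolutional Neural Networks for Visual Recognition 2017
$$f(x, W) = \lVert W \cdot x \rVert^2 = \sum_{i=1}^{n} (W \cdot x)_i^2 = \sum_{i=1}^{n} q_i^2, \qquad q = W \cdot x$$
Forward pass:
$$W = \begin{bmatrix} 0.1 & 0.5 \\ -0.3 & 0.8 \end{bmatrix}, \qquad x = \begin{bmatrix} 0.2 \\ 0.4 \end{bmatrix}, \qquad q = \begin{bmatrix} 0.22 \\ 0.26 \end{bmatrix}, \qquad f = 0.116$$
Backward pass through the L2 node:
$$\frac{\partial f}{\partial q_i} = 2 q_i, \qquad \nabla_q f = 2q = \begin{bmatrix} 0.44 \\ 0.52 \end{bmatrix}$$
Backward pass through the product node:
$$\nabla_W f = 2q \cdot x^\top = \begin{bmatrix} 0.088 & 0.176 \\ 0.104 & 0.208 \end{bmatrix}$$
$$\nabla_x f = 2 W^\top \cdot q = \begin{bmatrix} -0.112 \\ 0.636 \end{bmatrix}$$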
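The matrix example f(x, W) = ‖W·x‖² can be reproduced with NumPy. The sketch below uses the values from the example (W = [[0.1, 0.5], [−0.3, 0.8]], x = [0.2, 0.4]); the variable names are illustrative.

```python
import numpy as np

W = np.array([[0.1, 0.5],
              [-0.3, 0.8]])
x = np.array([0.2, 0.4])

# Forward pass.
q = W @ x                      # product node
f = np.sum(q ** 2)             # L2 node

# Backward pass.
grad_q = 2.0 * q               # df/dq_i = 2 q_i
grad_W = np.outer(grad_q, x)   # 2 q x^T, same shape as W
grad_x = W.T @ grad_q          # 2 W^T q, same shape as x

print(q, f)
print(grad_W)
print(grad_x)
```

A useful sanity check is that every gradient has the same shape as the quantity it refers to.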
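Backpropagation applied to an element of the MLP
For a single neuron with its linear and non-linear part
[Figure: the outputs $h^{k}$ of layer $k$ are combined linearly into $a^{k+1}$, which passes through the non-linearity $g(\cdot)$ to give $h^{k+1}$]
$$h^{k+1} = g(W^{k} h^{k} + b^{k}) = g(a^{k+1})$$
$$\frac{\partial a^{k+1}}{\partial h^{k}} = W^{k}, \qquad \frac{\partial h^{k+1}}{\partial a^{k+1}} = g'(a^{k+1})$$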
…Backpropagation applied to Multilayer Perceptron
[Figure: MLP with input x, weights W1, W2, W3, hidden layers (a2, h2) and (a3, h3), output layer (a4, h4) and loss L. The forward pass runs from the input to the loss; the backward pass runs from the loss back towards the input.]
Backpropagation is applied during the backward pass.
Forward Pass
[Figure: MLP with input x, weights W1, W2, W3, hidden layers (a2, h2) and (a3, h3), output layer (a4, h4) and loss. Figure credit: Kevin McGuinness]
Probability of each class given an input (softmax)
Loss function; e.g., negative log-likelihood (good for classification)
Regularization term (L2 norm), also known as weight decay
Minimize the loss (plus some regularization term) w.r.t. the parameters over the whole training set.
Backward Pass
[Figure: MLP with input x, weights W1, W2, W3, hidden layers (a2, h2) and (a3, h3), output layer (a4, h4) and loss L. Figure credit: Kevin McGuinness]
1. Find the error in the top layer
2. Compute weight updates
3. Backpropagate error to layer below
To simplify, we don't consider the biases.
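The three backward-pass steps can be sketched in NumPy for the network in the figure (x, W1, a2, h2, W2, a3, h3, W3, a4, h4, loss). Several things here are assumptions rather than the lecture's formulas: sigmoid hidden units, a softmax output with negative log-likelihood loss, no biases, no regularization term, and all function and variable names.

```python
import numpy as np


def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))


def softmax(a):
    e = np.exp(a - np.max(a))   # shift for numerical stability
    return e / e.sum()


def loss_and_grads(x, y, W1, W2, W3):
    # Forward pass.
    a2 = W1 @ x
    h2 = sigmoid(a2)
    a3 = W2 @ h2
    h3 = sigmoid(a3)
    a4 = W3 @ h3
    h4 = softmax(a4)
    loss = -np.log(h4[y])       # negative log-likelihood of class y

    # 1. Error in the top layer: dL/da4 = h4 - onehot(y).
    delta4 = h4.copy()
    delta4[y] -= 1.0
    # 2. Weight gradient: error from above times output from below.
    grad_W3 = np.outer(delta4, h3)
    # 3. Backpropagate the error to the layer below, then repeat.
    delta3 = (W3.T @ delta4) * h3 * (1.0 - h3)
    grad_W2 = np.outer(delta3, h2)
    delta2 = (W2.T @ delta3) * h2 * (1.0 - h2)
    grad_W1 = np.outer(delta2, x)
    return loss, grad_W1, grad_W2, grad_W3


rng = np.random.default_rng(0)
x = rng.normal(size=4)
y = 1                           # index of the correct class
W1 = rng.normal(size=(5, 4))
W2 = rng.normal(size=(3, 5))
W3 = rng.normal(size=(3, 3))
loss, grad_W1, grad_W2, grad_W3 = loss_and_grads(x, y, W1, W2, W3)
print(loss, grad_W1.shape, grad_W2.shape, grad_W3.shape)
```

Each weight gradient is the outer product of the error arriving from the layer above and the output of the layer below, which is the dependency stated at the start of the lecture.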
Another perspective: Modular backprop
You could use the chain rule on all the individual neurons to compute the
gradients with respect to the parameters and backpropagate the error signal.
It is more useful to work with the layer abstraction,
and then define the backpropagation algorithm in terms of three operations that each layer
needs to be able to perform.
This is called modular backpropagation.
The layer abstraction
* see change of notation
Linear layer
ReLU layer
Modular backprop
Using this idea, it is possible to create
many types of layers
● Linear (fully connected layers)
● Activation functions (sigmoid, ReLU)
● Convolutions
● Pooling
● Dropout
Once layers support the backward and
forward operations, they can be plugged
together to create more complex functions
[Figure: Convolution, ReLU and Linear layers plugged together, annotated with Input Error (L), Gradients and Output Error (L+1)]
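The layer abstraction can be sketched as small classes with a `forward` and a `backward` method. This is an illustrative design, not the API of any particular library: the class and attribute names are hypothetical, and `backward` here both stores the parameter gradients and returns the gradient with respect to the layer input.

```python
import numpy as np


class Linear:
    def __init__(self, n_in, n_out, rng):
        self.W = rng.normal(scale=0.1, size=(n_out, n_in))
        self.b = np.zeros(n_out)

    def forward(self, x):
        self.x = x                        # cache the input for backward
        return self.W @ x + self.b

    def backward(self, grad_out):
        self.grad_W = np.outer(grad_out, self.x)   # parameter gradients
        self.grad_b = grad_out
        return self.W.T @ grad_out        # gradient w.r.t. the input


class ReLU:
    def forward(self, x):
        self.mask = x > 0
        return x * self.mask

    def backward(self, grad_out):
        return grad_out * self.mask       # gradient flows where x > 0


rng = np.random.default_rng(0)
layers = [Linear(4, 5, rng), ReLU(), Linear(5, 3, rng)]

x = rng.normal(size=4)
out = x
for layer in layers:                      # forward pass, layer by layer
    out = layer.forward(out)

grad = np.ones_like(out)                  # stand-in for dLoss/dout
for layer in reversed(layers):            # backward pass, in reverse order
    grad = layer.backward(grad)

print(out.shape, grad.shape, layers[0].grad_W.shape)
```

Because every layer exposes the same two methods, adding a new layer type only requires implementing `forward` and `backward` for it.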
Implementation notes
Caffe and Torch
Libraries like Caffe and Torch implement
backpropagation this way.
To define a new layer, you need to create
a class and define the forward and
backward operations.
Theano and TensorFlow
Libraries like Theano and TensorFlow
operate on a computational graph.
To define a new layer, you only need to
specify the forward operation. Autodiff is
used to automatically infer the backward operation.
You also don't need to implement
backprop manually in Theano or
TensorFlow. They use computational graph
optimizations to automatically factor out
common computations.
Issues on Backpropagation and Training
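Gradient Descent: Move each parameter $\theta_j$ in small steps in the direction opposite to the sign of the
derivative of the loss with respect to $\theta_j$.
$$\theta^{(t)} = \theta^{(t-1)} - \alpha^{(t-1)} \cdot \nabla_\theta \mathcal{L}(y, f(x)) - \lambda\,\theta^{(t-1)}$$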
Stochastic gradient descent (SGD): estimate the gradient with one sample, or better, with a
minibatch of examples.
Weight Decay: a regularization term that penalizes large weights and spreads the values among all the
parameters.
Momentum: the direction in which the parameters move averages the current gradient estimate with
previous ones.
Several strategies, known as optimizers, have been proposed to update the weights.
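A minimal sketch of a minibatch SGD update with weight decay and momentum is shown below. The weight decay term follows the update rule above; the momentum form (a running sum `velocity = momentum * velocity + grad`) is one common variant assumed here, and the function name, the toy regression data and the hyperparameter values are all illustrative.

```python
import numpy as np


def sgd_step(theta, grad, velocity, lr=0.1, weight_decay=0.0, momentum=0.0):
    """One update: gradient step with momentum, plus weight decay."""
    velocity = momentum * velocity + grad          # blend with past gradients
    theta = theta - lr * velocity - weight_decay * theta
    return theta, velocity


# Toy problem (assumed for illustration): fit y = 3 * x with one weight.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1))
y = 3.0 * X[:, 0]

theta = np.zeros(1)
velocity = np.zeros(1)
for step in range(300):
    idx = rng.choice(200, size=16, replace=False)  # draw a minibatch
    pred = X[idx] @ theta
    # Gradient of the mean squared error estimated on the minibatch.
    grad = 2.0 * X[idx].T @ (pred - y[idx]) / len(idx)
    theta, velocity = sgd_step(theta, grad, velocity, lr=0.05, momentum=0.9)

print(theta)
```

With `weight_decay` greater than zero the solution is pulled towards zero, which is the penalty on large weights described above.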
Note on hyperparameters
So far we have lots of hyperparameters to choose:
1. Learning rate (α)
2. Weight decay (λ)
3. Number of epochs
4. Number of hidden layers
5. Nodes in each hidden layer
6. Weight initialization strategy
7. Loss function
8. Activation functions
9. ...
… more in the next class
Summary
• Backpropagation is applied during the backward pass while training
• Computational graphs help to understand the chain rule of differentiation
• The gradient with respect to the parameters in layer k depends only on the error from the layer above and the output from
the layer below. This means that the gradients for each layer can be computed iteratively,
starting at the last layer and propagating the error back through the network.
• Hyperparameters have to be chosen, and the right choice is not obvious
• For a “deeper” study: http://www.deeplearningbook.org/

Backpropagation - Elisa Sayrol - UPC Barcelona 2018

  • 1. [course site] Day 2 Lecture 2 Backpropagation Elisa Sayrol
  • 2. Acknowledgements Kevin McGuinness kevin.mcguinness@dcu.ie Research Fellow Insight Centre for Data Analytics Dublin City University Xavier Giro-i-Nieto xavier.giro@upc.edu
  • 4. Multilayer perceptrons When each node in each layer is a linear combination of all inputs from the previous layer then the network is called a multilayer perceptron (MLP) Weights can be organized into matrices. Forward pass computes !(#)
  • 5. Training MLPs With Multiple Layer Perceptrons we need to find the gradient of the loss function with respect to all the parameters of the model (W(t), b(t)) These can be found using the chain rule of differentiation. The calculations reveal that the gradient wrt the parameters in layer k only depends on the error from the above layer and the output from the layer below. This means that the gradients for each layer can be computed iteratively, starting at the last layer and propagating the error back through the network. This is known as the backpropagation algorithm.
  • 6. • Computational Graphs • Examples applying chain of rule in simple graphs • Backpropagation applied to Multilayer Perceptron • Another perspective: modular backprop Backpropagation algorithm
  • 7. Computational graphs z x y x u(1) u(2) · + y^ x w b s U(1) U(2) matmul + H X W b relu u(1) u(2) · y^ x w l x u(3) sum sqrt ! = #$ %$=&(x)w + b) .=max 0, 12 + 3 %$=x)w 4 5 6 76 8 From Deep Learning Book
  • 8. Computational graphs Applying the Chain Rule to Computational Graphs ! = #(%) ' = ( # % )' )% = )' )! )! )% )' )! )! )%* )! )%+ )' )%* )' )%+ )' )%* = )' )! )! )%* )' )%+ = )' )! )! )%+ %* %+ ! ' fg )' )! ' = ((!)
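  • 9. From computational graphs to MLP: Applying the Chain Rule to Computational Graphs. Numerical Examples (from Stanford Course: Convolutional Neural Networks for Visual Recognition 2017). $f(x, y, z) = (x + y)z$, with the intermediate node $q = x + y$ so that $f = qz$. Local derivatives: $\frac{\partial q}{\partial x} = 1$, $\frac{\partial q}{\partial y} = 1$, $\frac{\partial f}{\partial q} = z$, $\frac{\partial f}{\partial z} = q$. We want to compute: $\frac{\partial f}{\partial x}$, $\frac{\partial f}{\partial y}$, $\frac{\partial f}{\partial z}$. Example: $x = -2$, $y = 5$, $z = -4$, which gives $q = 3$ and $f = -12$. Backward pass: $\frac{\partial f}{\partial f} = 1$; $\frac{\partial f}{\partial q} = z = -4$; $\frac{\partial f}{\partial z} = q = 3$; $\frac{\partial f}{\partial x} = \frac{\partial f}{\partial q}\frac{\partial q}{\partial x} = -4 \cdot 1 = -4$; $\frac{\partial f}{\partial y} = \frac{\partial f}{\partial q}\frac{\partial q}{\partial y} = -4 \cdot 1 = -4$.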
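The numerical example on this slide can be reproduced in a few lines. This is only a sketch of the forward and backward pass for $f = (x + y)z$; the variable names (`df_dq`, `df_dx`, and so on) are chosen here for illustration and do not come from the slides.

```python
# Forward pass through the graph: q = x + y, f = q * z
x, y, z = -2.0, 5.0, -4.0
q = x + y          # 3.0
f = q * z          # -12.0

# Backward pass: start from df/df = 1 and apply the chain rule node by node
df_df = 1.0
df_dq = z * df_df  # product gate: local derivative of f w.r.t. q is z
df_dz = q * df_df  # product gate: local derivative of f w.r.t. z is q
df_dx = df_dq * 1.0  # sum gate: dq/dx = 1
df_dy = df_dq * 1.0  # sum gate: dq/dy = 1

print(f, df_dx, df_dy, df_dz)  # -12.0 -4.0 -4.0 3.0
```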
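  • 10. From computational graphs to MLP: Applying the Chain Rule to Computational Graphs. Numerical Examples (from Stanford Course: Convolutional Neural Networks for Visual Recognition). $f = \sigma(w_0 x_0 + w_1 x_1 + b)$ with $\sigma(x) = \frac{1}{1 + e^{-x}}$ and $\frac{d\sigma(x)}{dx} = \frac{e^{-x}}{(1 + e^{-x})^2} = \left(\frac{1 + e^{-x} - 1}{1 + e^{-x}}\right)\left(\frac{1}{1 + e^{-x}}\right) = (1 - \sigma(x))\,\sigma(x)$. Forward pass with $w_0 = 2$, $x_0 = -1$, $w_1 = -3$, $x_1 = -2$, $b = -3$: $w_0 x_0 = -2$, $w_1 x_1 = 6$, their sum is $4$, adding $b$ gives $1$, and $\sigma(1) = 0.73$. Backward pass: $\frac{\partial f}{\partial f} = 1$; the gradient at the sigmoid input is $(1 - 0.73)(0.73) \approx 0.2$, and the sum gates pass it on unchanged, so $\frac{\partial f}{\partial b} = 0.2$; $\frac{\partial f}{\partial w_0} = -0.2$, $\frac{\partial f}{\partial x_0} = 0.4$, $\frac{\partial f}{\partial w_1} = -0.4$, $\frac{\partial f}{\partial x_1} = -0.6$.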
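The sigmoid-neuron example can be checked numerically. The sketch below assumes the values read off the slide ($w_0 = 2$, $x_0 = -1$, $w_1 = -3$, $x_1 = -2$, $b = -3$); the dictionary `grads` and the other names are illustrative only. The slide rounds the results to one decimal place.

```python
import math

w0, x0, w1, x1, b = 2.0, -1.0, -3.0, -2.0, -3.0

# Forward pass
s = w0 * x0 + w1 * x1 + b          # 1.0
f = 1.0 / (1.0 + math.exp(-s))     # sigma(1), about 0.73

# Backward pass: d sigma / ds = (1 - sigma) * sigma, about 0.2
ds = (1.0 - f) * f

# Sum gates pass ds through unchanged; product gates swap the input values
grads = {
    "w0": x0 * ds,   # about -0.2
    "x0": w0 * ds,   # about  0.4
    "w1": x1 * ds,   # about -0.4
    "x1": w1 * ds,   # about -0.6
    "b": ds,         # about  0.2
}

for name, value in grads.items():
    print(name, round(value, 2))
```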
  • 11. Computational graphs: Gates. Backward Pass. In general: derivative of a function, e.g. $\sigma(x) = \frac{1}{1 + e^{-x}}$, $\frac{d\sigma(x)}{dx} = (1 - \sigma(x))\,\sigma(x)$. Sum: $q = x + y$, $\frac{\partial q}{\partial x} = 1$, $\frac{\partial q}{\partial y} = 1$; distributes the gradient to both branches. Product: $f = qz$, $\frac{\partial f}{\partial q} = z$, $\frac{\partial f}{\partial z} = q$; switches the gradient weight values. Max: routes the gradient only to the higher input branch (not sensitive to the lower branch). Add branches: branches that split in the forward pass and merge in the backward pass add their gradients. [Figure: max gate and add-branch diagrams]
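The gate rules above can be written as small backward functions. This is a minimal sketch; the function names (`add_backward`, `mul_backward`, `max_backward`) and the example values are assumptions made for illustration, not part of the slides.

```python
def add_backward(upstream):
    # q = x + y: both local derivatives are 1, so each branch gets the upstream gradient
    return upstream, upstream

def mul_backward(x, y, upstream):
    # f = x * y: each input receives the other input's value times the upstream gradient
    return y * upstream, x * upstream

def max_backward(x, y, upstream):
    # f = max(x, y): only the larger input receives the gradient
    if x >= y:
        return upstream, 0.0
    return 0.0, upstream

upstream = 0.2
grad_add = add_backward(upstream)              # both branches get 0.2
grad_mul = mul_backward(3.0, -4.0, upstream)   # roughly (-0.8, 0.6)
grad_max = max_backward(2.0, 1.0, upstream)    # only the first input gets 0.2

# Add branches: x feeds both inputs of a product (f = x * x), so the two
# gradients arriving at x are summed, giving df/dx = 2x.
x = 3.0
g_left, g_right = mul_backward(x, x, 1.0)
grad_branch = g_left + g_right
print(grad_add, grad_mul, grad_max, grad_branch)
```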
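  • 12. Computational graphs: Numerical Examples (from Stanford Course: Convolutional Neural Networks for Visual Recognition 2017). $f(x, W) = \lVert W \cdot x \rVert^2 = \sum_{i=1}^{n} (W \cdot x)_i^2$. With $q = W \cdot x$: $f(q) = \lVert q \rVert^2 = \sum_{i=1}^{n} q_i^2$ (L2 node). Example: $W = \begin{bmatrix} 0.1 & 0.5 \\ -0.3 & 0.8 \end{bmatrix}$, $x = \begin{bmatrix} 0.2 \\ 0.4 \end{bmatrix}$, $q = \begin{bmatrix} 0.22 \\ 0.26 \end{bmatrix}$, $f = 0.116$. Backward pass: $\frac{\partial f}{\partial q_i} = 2 q_i$, so $\nabla_q f = 2q = \begin{bmatrix} 0.44 \\ 0.52 \end{bmatrix}$; $\nabla_W f = 2q \cdot x^\top = \begin{bmatrix} 0.088 & 0.176 \\ 0.104 & 0.208 \end{bmatrix}$; $\nabla_x f = 2 W^\top \cdot q = \begin{bmatrix} -0.112 \\ 0.636 \end{bmatrix}$.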
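The vectorized example can be verified with plain Python lists. This sketch uses the values of $W$ and $x$ from the slide; the names `grad_q`, `grad_W` and `grad_x` are illustrative.

```python
W = [[0.1, 0.5],
     [-0.3, 0.8]]
x = [0.2, 0.4]

# Forward pass: q = W x, f = sum_i q_i^2
q = [sum(w * xi for w, xi in zip(row, x)) for row in W]   # about [0.22, 0.26]
f = sum(qi * qi for qi in q)                               # about 0.116

# Backward pass
grad_q = [2.0 * qi for qi in q]                            # 2q
grad_W = [[gq * xi for xi in x] for gq in grad_q]          # outer product 2q x^T
grad_x = [sum(W[i][j] * grad_q[i] for i in range(len(W)))  # 2 W^T q
          for j in range(len(x))]

print([round(v, 3) for v in q], round(f, 3))
print([[round(v, 3) for v in row] for row in grad_W])
print([round(v, 3) for v in grad_x])
```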
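  • 13. Backpropagation applied to an element of the MLP. For a single neuron with its linear and non-linear part: $a_{k+1} = W_k h_k + b_k$, $h_{k+1} = g(W_k h_k + b_k) = g(a_{k+1})$; $\frac{\partial a_{k+1}}{\partial h_k} = W_k$; $\frac{\partial h_{k+1}}{\partial a_{k+1}} = g'(a_{k+1})$. [Figure: neuron taking $h_k$ through the linear part to $a_{k+1}$ and through $g(\cdot)$ to $h_{k+1}$]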
  • 14. …Backpropagation applied to Multilayer Perceptron
  • 15. Forward Pass and Backward Pass. [Figure: MLP drawn as a chain with input x, weights W1, W2, W3, nodes a2, h2, a3, h3, a4, h4 (two hidden layers and an output layer) and the loss L] Backpropagation is applied to the Backward Pass.
  • 16. Forward Pass. Probability of the class given an input (softmax). Figure Credit: Kevin McGuinness
  • 17. Forward Pass. Probability of the class given an input (softmax). Loss function, e.g., negative log-likelihood (good for classification). Regularization term (L2 norm), a.k.a. weight decay. Figure Credit: Kevin McGuinness
  • 18. Forward Pass. Probability of the class given an input (softmax). Loss function, e.g., negative log-likelihood (good for classification). Regularization term (L2 norm), a.k.a. weight decay. Minimize the loss (plus some regularization term) w.r.t. the parameters over the whole training set. Figure Credit: Kevin McGuinness
  • 19. Backward Pass. 1. Find the error in the top layer. Figure Credit: Kevin McGuinness
  • 20. Backward Pass. 1. Find the error in the top layer. 2. Compute weight updates. To simplify, we don't consider the biases. Figure Credit: Kevin McGuinness
  • 21. Backward Pass. 1. Find the error in the top layer. 2. Compute weight updates. 3. Backpropagate the error to the layer below. To simplify, we don't consider the biases. Figure Credit: Kevin McGuinness
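The three steps of the backward pass can be sketched on a tiny network. The equations on these slides did not survive in the transcript, so everything specific below is an assumption made for illustration: one sigmoid hidden layer instead of two, a softmax output with negative log-likelihood loss (for which the top-layer error is the predicted probabilities minus the one-hot target), no biases, no regularization, and arbitrary example weights.

```python
import math

def matvec(W, v):
    return [sum(w * vi for w, vi in zip(row, v)) for row in W]

def sigmoid(v):
    return [1.0 / (1.0 + math.exp(-a)) for a in v]

def softmax(v):
    m = max(v)
    e = [math.exp(a - m) for a in v]
    total = sum(e)
    return [ei / total for ei in e]

def forward(x, W1, W2):
    a2 = matvec(W1, x)
    h2 = sigmoid(a2)
    a3 = matvec(W2, h2)
    return h2, softmax(a3)

def loss(x, y, W1, W2):
    # Negative log-likelihood of the correct class y
    return -math.log(forward(x, W1, W2)[1][y])

def backward(x, y, W1, W2):
    h2, p = forward(x, W1, W2)
    # 1. Error in the top layer (softmax + NLL): p - onehot(y)
    delta3 = [pi - (1.0 if i == y else 0.0) for i, pi in enumerate(p)]
    # 2. Weight gradient of the top layer: outer product of error and layer input
    dW2 = [[d * h for h in h2] for d in delta3]
    # 3. Backpropagate the error to the layer below: (W2^T delta3) * g'(a2)
    back = [sum(W2[i][j] * delta3[i] for i in range(len(delta3)))
            for j in range(len(h2))]
    delta2 = [b * h * (1.0 - h) for b, h in zip(back, h2)]
    dW1 = [[d * xi for xi in x] for d in delta2]
    return dW1, dW2

x, y = [0.5, -1.0], 1
W1 = [[0.1, -0.2], [0.4, 0.3], [-0.5, 0.2]]
W2 = [[0.2, -0.1, 0.3], [-0.3, 0.5, 0.1]]
dW1, dW2 = backward(x, y, W1, W2)
print(loss(x, y, W1, W2), dW1, dW2)
```

A finite-difference check on a single weight is a simple way to confirm that such a hand-written backward pass agrees with the loss it is supposed to differentiate.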
  • 22. Another perspective: Modular backprop. You could use the chain rule on all the individual neurons to compute the gradients with respect to the parameters and backpropagate the error signal. It is more useful to use the layer abstraction, and then define the backpropagation algorithm in terms of three operations that layers need to be able to do. This is called modular backpropagation.
  • 23. The layer abstraction * see change of notation *
  • 26. Modular backprop. Using this idea, it is possible to create many types of layers: Linear (fully connected layers); Activation functions (sigmoid, ReLU); Convolutions; Pooling; Dropout. Once layers support the backward and forward operations, they can be plugged together to create more complex functions. [Figure: Convolution, ReLU and Linear layers chained together, labelled Input, Output, Error (L), Error (L+1) and Gradients]
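The layer abstraction can be sketched as classes with a common interface. The slides defining the three operations are missing from this transcript, so the interface below (`forward`, `backward` for the error passed to the layer below, and `gradients` for the parameter gradients) is an assumption, as are the class names, the example weights and the squared-error loss used to start the backward pass.

```python
class Linear:
    def __init__(self, W, b):
        self.W, self.b = W, b

    def forward(self, x):
        self.x = x  # cache the input for the backward pass
        return [sum(w * xi for w, xi in zip(row, x)) + bi
                for row, bi in zip(self.W, self.b)]

    def backward(self, err):
        # Error w.r.t. the layer input: W^T err
        return [sum(self.W[i][j] * err[i] for i in range(len(err)))
                for j in range(len(self.x))]

    def gradients(self, err):
        # Gradients w.r.t. the parameters: dW = err x^T, db = err
        return [[e * xi for xi in self.x] for e in err], list(err)


class ReLU:
    def forward(self, x):
        self.x = x
        return [max(0.0, v) for v in x]

    def backward(self, err):
        # Pass the error only where the input was positive
        return [e if v > 0 else 0.0 for e, v in zip(err, self.x)]

    def gradients(self, err):
        return None  # no parameters


layers = [Linear([[1.0, -2.0], [0.5, 1.0]], [0.0, 0.1]),
          ReLU(),
          Linear([[2.0, -1.0]], [0.0])]

h = [1.0, 2.0]
for layer in layers:              # forward pass, layer by layer
    h = layer.forward(h)
out = h

err = list(out)                   # assumed loss 0.5 * out^2, so dL/dout = out
grads = []
for layer in reversed(layers):    # backward pass, layer by layer
    grads.append(layer.gradients(err))
    err = layer.backward(err)
grads.reverse()
print(out, grads, err)
```

Because every layer exposes the same operations, the two loops never need to know which kind of layer they are visiting, which is the point of the modular view.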
  • 27. Implementation notes. Caffe and Torch: Libraries like Caffe and Torch implement backpropagation this way. To define a new layer, you need to create a class and define the forward and backward operations. Theano and TensorFlow: Libraries like Theano and TensorFlow operate on a computational graph. To define a new layer, you only need to specify the forward operation; autodiff is used to automatically infer the backward operation. You also don't need to implement backprop manually in Theano or TensorFlow. They use computational graph optimizations to automatically factor out common computations.
  • 28. Issues on Backpropagation and Training. Gradient Descent: move the parameter $\theta_j$ in small steps in the direction opposite to the sign of the derivative of the loss with respect to $\theta_j$: $\theta^{(t)} = \theta^{(t-1)} - \alpha^{(t-1)} \cdot \nabla_\theta \mathcal{L}(y, f(x)) - \lambda\,\theta^{(t-1)}$. Stochastic gradient descent (SGD): estimate the gradient with one sample or, better, with a minibatch of examples. Weight Decay: regularization term that penalizes large weights and distributes values among all the parameters. Momentum: the movement direction of the parameters averages the gradient estimate with previous ones. Several strategies have been proposed to update the weights: optimizers.
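The update rule with weight decay and momentum can be sketched for a single scalar parameter. The momentum form used here (a velocity that accumulates past updates, scaled by a coefficient `mu`) is one common formulation and is an assumption, as are the toy loss $0.5(\theta - 3)^2$ and the hyperparameter values. With `mu = 0` the step reduces to gradient descent with a learning rate `alpha` and a weight decay term `lam * theta`.

```python
def sgd_step(theta, velocity, grad, alpha, lam, mu):
    # velocity mixes the previous direction with the new (decayed) gradient step
    velocity = mu * velocity - alpha * grad - lam * theta
    return theta + velocity, velocity

def grad_loss(theta):
    # Gradient of the toy loss 0.5 * (theta - 3)^2
    return theta - 3.0

theta, velocity = 0.0, 0.0
alpha, lam, mu = 0.1, 0.001, 0.9   # learning rate, weight decay, momentum
for _ in range(1000):
    theta, velocity = sgd_step(theta, velocity, grad_loss(theta), alpha, lam, mu)

# Weight decay pulls the solution slightly below the unregularized minimum at 3
print(theta)
```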
  • 29. Note on hyperparameters. So far we have lots of hyperparameters to choose: 1. Learning rate (α) 2. Weight decay (λ) 3. Number of epochs 4. Number of hidden layers 5. Nodes in each hidden layer 6. Weight initialization strategy 7. Loss function 8. Activation functions 9. ... More in the next class.
  • 30. Summary • Backpropagation is applied during the backward pass while training • Computational graphs help to understand the chain rule of differentiation • The gradient with respect to the parameters in layer k depends only on the error from the layer above and the output from the layer below. This means that the gradients for each layer can be computed iteratively, starting at the last layer and propagating the error back through the network. • Hyperparameters have to be chosen, and the right choice is not obvious • For a “deeper” study: http://www.deeplearningbook.org/