Neural Networks and Fuzzy Systems
Multi-layer Feedforward Networks
Dr. Tamer Ahmed Farrag
Course No.: 803522-3
Course Outline
Part I : Neural Networks (11 weeks)
• Introduction to Machine Learning
• Fundamental Concepts of Artificial Neural Networks
(ANN)
• Single-layer Perceptron Classifier
• Multi-layer Feedforward Networks
• Single-layer Feedback Networks
• Unsupervised learning
Part II : Fuzzy Systems (4 weeks)
• Fuzzy set theory
• Fuzzy Systems
Outline
• Why do we need Multi-layer Feedforward Networks (MLFF)?
• Error Function (or Cost Function, or Loss Function)
• Gradient Descent
• Backpropagation
Why do we need Multi-layer Feedforward Networks (MLFF)?
• To overcome the failure of the single-layer perceptron in solving nonlinear problems.
• First suggestion:
• Divide the problem space into smaller, linearly separable regions.
• Use a perceptron for each linearly separable region.
• Combine the outputs of the multiple hidden neurons in a final decision neuron.
[Figure: a nonlinear problem space divided into Region 1 and Region 2]
Why do we need Multi-layer Feedforward Networks (MLFF)?
• Second suggestion:
• In some cases we need a curved decision boundary, or we are trying to solve more complicated classification and regression problems.
• So, we need to:
• Add more layers.
• Increase the number of neurons in each layer.
• Use a nonlinear activation function in the hidden layers.
• So, we need Multi-layer Feedforward Networks (MLFF).
Notation for Multi-Layer Networks
• Dealing with multi-layer networks is easy if a sensible notation is adopted.
• We simply need another label $(n)$ to tell us which layer of the network we are dealing with.
• Each unit $j$ in layer $n$ receives activations $out_i^{(n-1)} w_{ij}^{(n)}$ from the previous layer of processing units and sends activations $out_j^{(n)}$ to the next layer of units.
[Figure: units 1–3 of layer (0) feeding units 1–2 of layer (1) through weights $w_{ij}^{(1)}$; in general, layer (n−1) feeds layer (n) through weights $w_{ij}^{(n)}$]
ANN Representation
(1 input layer + 1 hidden layer + 1 output layer)
For example:
$$z_1^{(1)} = w_{11}^{(1)} x_1 + w_{21}^{(1)} x_2 + w_{31}^{(1)} x_3 + b_1^{(1)}$$
$$a_1^{(1)} = f\big(z_1^{(1)}\big) = \sigma\big(z_1^{(1)}\big)$$
$$z_2^{(2)} = w_{12}^{(2)} a_1^{(1)} + w_{22}^{(2)} a_2^{(1)} + w_{32}^{(2)} a_3^{(1)} + b_2^{(2)}$$
$$y_2 = a_2^{(2)} = f\big(z_2^{(2)}\big) = \sigma\big(z_2^{(2)}\big)$$
In general, for any layer $l$:
$$z_j^{(l)} = \sum_i w_{ij}^{(l)} a_i^{(l-1)} + b_j^{(l)}, \qquad a_j^{(l)} = f\big(z_j^{(l)}\big) = \sigma\big(z_j^{(l)}\big)$$
[Figure: layer (0) holds the inputs $x_i = a_i^{(0)}$; layer (1) holds three hidden units computing $(z_j^{(1)}, a_j^{(1)})$ through weights $w_{ij}^{(1)}$; layer (2) holds two output units computing $(z_j^{(2)}, a_j^{(2)})$ through weights $w_{ij}^{(2)}$, producing the outputs $y_1, y_2$]
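To make the notation concrete, here is a minimal NumPy sketch of a forward pass through a network of this shape (3 inputs, 3 hidden units, 2 outputs). The weight and input values are arbitrary placeholders, and the use of NumPy is an illustrative assumption; the slides themselves give no code.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Sizes follow the slide's indexing: z_j^(l) = sum_i w_ij^(l) * a_i^(l-1) + b_j^(l)
rng = np.random.default_rng(0)
W1 = rng.normal(size=(3, 3))   # W1[i, j] = w_ij^(1)  (placeholder values)
b1 = np.zeros(3)
W2 = rng.normal(size=(3, 2))   # W2[i, j] = w_ij^(2)
b2 = np.zeros(2)

x = np.array([0.5, -1.0, 2.0])   # inputs: a^(0)

z1 = W1.T @ x + b1    # z_j^(1) = sum_i w_ij^(1) x_i + b_j^(1)
a1 = sigmoid(z1)      # a_j^(1) = sigma(z_j^(1))
z2 = W2.T @ a1 + b2   # z_j^(2)
y  = sigmoid(z2)      # network outputs y_1, y_2
print(y)
```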
Gradient Descent
and Backpropagation
Error Function
● How can we evaluate the performance of a neuron?
● We can use an error function (or cost function, or loss function) to measure how far off we are from the expected value.
● Choosing an appropriate error function helps the learning algorithm reach the best values for the weights and biases.
● We’ll use the following variables:
○ D to represent the true (desired) value
○ y to represent the neuron’s prediction
Error Functions
(Cost Function or Loss Function)
• There are many formulas for error functions.
• In this course, we will deal with two error-function formulas.
Sum Squared Error (SSE):
$$e_{pj} = (y_j - D_j)^2 \quad \text{for a single perceptron}$$
$$E_{SSE} = \sum_{j=1}^{n} (y_j - D_j)^2 \qquad (1)$$
Cross Entropy (CE):
$$E_{CE} = -\frac{1}{n} \sum_{j=1}^{n} \big[\, D_j \ln(y_j) + (1 - D_j) \ln(1 - y_j) \,\big] \qquad (2)$$
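As a sketch, the two error formulas translate directly into Python helpers. The `eps` clamp in the cross-entropy is a small numerical safeguard of mine (it avoids `log(0)`), not part of the slides:

```python
import numpy as np

def sse(y, d):
    """Sum squared error, equation (1): sum_j (y_j - D_j)^2."""
    return np.sum((y - d) ** 2)

def cross_entropy(y, d, eps=1e-12):
    """Cross entropy, equation (2), averaged over the n outputs."""
    y = np.clip(y, eps, 1 - eps)   # guard against log(0)
    return -np.mean(d * np.log(y) + (1 - d) * np.log(1 - y))

y = np.array([0.9, 0.2])   # predictions
d = np.array([1.0, 0.0])   # desired values D
print(sse(y, d), cross_entropy(y, d))
```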
Why does the error in an ANN occur?
• Every weight and bias in the network contributes to the error.
• To solve this we need:
• A cost (error) function to compute the error (the SSE or CE error function).
• An optimization algorithm to minimize the error function (gradient descent).
• A learning algorithm that modifies the weights and biases to new values to drive the error down (backpropagation).
• Repeat this process until the best solution is found.
Gradient Descent (in 1 dimension)
• Assume we have an error function E and we need to use it to update one weight w.
• The figure shows the error function in terms of w.
• Our target is to learn the value of w that produces the minimum value of E.
How?
[Figure: the curve of E versus w, with its minimum marked]
Gradient Descent (in 1 dimension)
• In the gradient descent algorithm, we use the following equation (the delta rule) to get a better value of w:
$$w = w - \alpha \Delta w$$
Where:
α is the learning rate.
Δw can be computed from the derivative of E with respect to w, $\frac{dE}{dw}$.
So:
$$w = w - \alpha \frac{dE}{dw} \qquad (3)$$
[Figure: stepping down the E–w curve toward the minimum]
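A minimal sketch of equation (3) in Python. The toy error function E(w) = (w − 3)², with derivative 2(w − 3), is my own illustrative choice:

```python
def gradient_descent_1d(dE_dw, w0, alpha=0.1, steps=100):
    """Repeatedly apply the delta rule of equation (3): w = w - alpha * dE/dw."""
    w = w0
    for _ in range(steps):
        w -= alpha * dE_dw(w)
    return w

# E(w) = (w - 3)^2, so dE/dw = 2(w - 3); the minimum is at w = 3.
w_min = gradient_descent_1d(lambda w: 2 * (w - 3), w0=0.0)
print(w_min)   # converges toward 3
```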
Local Minima Problem
[Figure: an error curve with local minima in addition to the global minimum]
Choosing the Learning Rate
[Figure: effect of the learning rate on convergence]
Gradient Descent (multiple dimensions)
• In an ANN with many layers and many neurons per layer, the error function is a multi-variable function.
• So the derivative in equation (3) becomes a partial derivative:
$$w_{ij} = w_{ij} - \alpha \frac{\partial E_j}{\partial w_{ij}} \qquad (4)$$
• We abbreviate equation (4) as:
$$w_{ij} = w_{ij} - \alpha \, \partial w_{ij}$$
• The same process is used to get the new bias value:
$$b_j = b_j - \alpha \, \partial b_j$$
Derivative of Activation Functions
Sigmoid: $\sigma(z) = \dfrac{1}{1 + e^{-z}}$, with derivative $\sigma'(z) = \sigma(z)\,(1 - \sigma(z))$.
[Table of activation functions and their derivatives; only the sigmoid row is referenced later]
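A quick numerical check of the sigmoid derivative identity, sketched with NumPy (the sample point and step size are arbitrary):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)   # sigma'(z) = sigma(z) * (1 - sigma(z))

# Compare against a central-difference numerical derivative.
z, h = 0.7, 1e-6
numeric = (sigmoid(z + h) - sigmoid(z - h)) / (2 * h)
print(sigmoid_prime(z), numeric)   # the two values should agree closely
```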
Learning Rule in the Output Layer
Using SSE as the error function and sigmoid as the activation function:
$$\frac{\partial E_j}{\partial w_{ij}} = \frac{\partial E_j}{\partial a_j^{(l)}} \cdot \frac{\partial a_j^{(l)}}{\partial z_j^{(l)}} \cdot \frac{\partial z_j^{(l)}}{\partial w_{ij}^{(l)}}$$
Where:
$$E_j = (y_j - D_j)^2$$
$$y_j = a_j^{(l)} = f\big(z_j^{(l)}\big) = \sigma\big(z_j^{(l)}\big)$$
$$z_j^{(l)} = \sum_i w_{ij}^{(l)} a_i^{(l-1)} + b_j^{(l)}$$
From the previous table:
$$\sigma'\big(z_j^{(l)}\big) = \sigma\big(z_j^{(l)}\big) \big(1 - \sigma\big(z_j^{(l)}\big)\big) = y_j (1 - y_j)$$
Learning Rule in the Output Layer (cont.)
So (how?):
$$\frac{\partial y_j}{\partial z_j} = y_j (1 - y_j)$$
$$\frac{\partial z_j}{\partial w_{ij}} = a_i^{(l-1)}$$
$$\frac{\partial E_j}{\partial y_j} = 2 (y_j - D_j)$$
• Then:
$$\frac{\partial E_j}{\partial w_{ij}} = 2\, a_i^{(l-1)} (y_j - D_j)\, y_j (1 - y_j)$$
$$w_{ij} = w_{ij} - 2 \alpha\, a_i^{(l-1)} (y_j - D_j)\, y_j (1 - y_j)$$
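A sketch of this output-layer update as one vectorized Python step; the function name and array shapes are my own illustrative choices:

```python
import numpy as np

def output_layer_step(W, b, a_prev, y, d, alpha):
    """One delta-rule update for an output layer with SSE loss and sigmoid
    activation: dE/dw_ij = 2 * a_i^(l-1) * (y_j - D_j) * y_j * (1 - y_j)."""
    delta = 2.0 * (y - d) * y * (1.0 - y)   # local gradient per output unit j
    W -= alpha * np.outer(a_prev, delta)    # outer(a_prev, delta)[i, j] = a_i * delta_j
    b -= alpha * delta                      # the bias update uses the same delta
    return W, b
```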
Learning Rule in the Hidden Layer
• Now we have to determine the appropriate weight change for an input-to-hidden weight.
• This is more complicated, because it depends on the error at all of the nodes this weighted connection can lead to.
• The mathematical proof is outside the scope of this course.
Gradient Descent (Notes)
Note 1:
• The neuron activation function (f) must be a defined and differentiable function.
Note 2:
• The previous calculation is repeated for every weight and every bias in the ANN.
• So we need substantial computational power (what about deeper networks?).
Note 3:
• Calculating $\partial w_{ij}$ for the hidden layers is more difficult (why?).
Gradient Descent (Notes)
• $\partial w_{ij}$ represents the change in the value of $w_{ij}$ needed to get a better output.
• The equation for $\partial w_{ij}$ depends on the choice of error (cost) function and activation function.
• The gradient descent algorithm helps calculate the new values of the weights and biases.
• Question: is one iteration (one trial) enough to get the best values for the weights and biases?
• Answer: No, we need an extended procedure: backpropagation.
How Does Backpropagation Work?
[Figure: a 3–2–1 network. Forward propagation runs left to right from layer 0 through layer 1 to layer 2, producing the output y; backpropagation then runs right to left, updating each weight by the delta rule, e.g. $w_{11}^{(2)} = w_{11}^{(2)} - \alpha\, \partial w_{11}^{(2)}$ and $w_{11}^{(1)} = w_{11}^{(1)} - \alpha\, \partial w_{11}^{(1)}$]
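Putting forward propagation and backpropagation together for the 3–2–1 network in the figure, a minimal sketch: the output-layer step follows the rule derived above, while the hidden-layer step uses the standard backpropagation formula that the slides state without proof. Shapes and the learning rate are illustrative.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_step(x, d, W1, b1, W2, b2, alpha=0.5):
    """One forward + backward pass for a 3-2-1 sigmoid network with SSE loss."""
    # Forward propagation: layer 0 -> layer 1 -> layer 2
    a1 = sigmoid(W1.T @ x + b1)    # hidden activations a^(1)
    y  = sigmoid(W2.T @ a1 + b2)   # network output

    # Backpropagation: local gradient at the output, then at the hidden layer
    delta2 = 2.0 * (y - d) * y * (1.0 - y)     # output-layer rule derived above
    delta1 = (W2 @ delta2) * a1 * (1.0 - a1)   # standard hidden-layer backprop step
                                               # (stated without proof in the slides)
    # Delta-rule updates: w_ij = w_ij - alpha * dE/dw_ij
    W2 -= alpha * np.outer(a1, delta2); b2 -= alpha * delta2
    W1 -= alpha * np.outer(x, delta1);  b1 -= alpha * delta1
    return W1, b1, W2, b2
```

Calling `train_step` repeatedly over the training patterns is one way to realize the "repeat until the best solution is found" loop described earlier.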
Online Learning vs. Offline Learning
• Online: pattern-by-pattern learning
• The error is calculated for each pattern.
• Weights are updated after each individual pattern:
$$\Delta w_{ij} = -\alpha \frac{\partial E_p}{\partial w_{ij}}$$
• Offline: batch learning
• The error is calculated for all patterns.
• Weights are updated once at the end of each epoch:
$$\Delta w_{ij} = -\alpha \sum_p \frac{\partial E_p}{\partial w_{ij}}$$
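A toy sketch contrasting the two update schedules on a one-parameter model; the `grad` helper, the data, and the model E_p = (w·x − d)² are hypothetical, chosen only to show where the update happens:

```python
def grad(x, d, w):
    """dE_p/dw for one pattern p, with the toy error E_p = (w*x - d)^2."""
    return 2.0 * (w * x - d) * x

patterns = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
alpha, w = 0.01, 0.0

# Online (pattern-by-pattern): update after every pattern.
for x, d in patterns:
    w -= alpha * grad(x, d, w)

# Offline (batch): accumulate gradients over all patterns, update once per epoch.
total = sum(grad(x, d, w) for x, d in patterns)
w -= alpha * total
print(w)
```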
Choosing Appropriate Activation and Cost Functions
• From our study of single-layer networks, we already know which output activations and cost functions should be used for particular problem types.
• We have also seen that nonlinear hidden-unit activations, such as sigmoids, are needed.
• So we can summarize the required network properties:
• Regression / function approximation problems:
• SSE cost function, linear output activations, sigmoid hidden activations
• Classification problems (2 classes, 1 output):
• CE cost function, sigmoid output and hidden activations
• Classification problems (multiple classes, 1 output per class):
• CE cost function, softmax outputs, sigmoid hidden activations (see the softmax sketch below)
• In each case, applying the gradient descent learning algorithm (by computing the partial derivatives) leads to the appropriate backpropagation weight-update equations.
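As a small sketch of the multi-class case, a softmax output layer turns raw scores into class probabilities; the max-subtraction is a standard numerical-stability trick, not from the slides:

```python
import numpy as np

def softmax(z):
    """Softmax output activation: one probability per class, summing to 1."""
    e = np.exp(z - np.max(z))   # subtract the max for numerical stability
    return e / e.sum()

z = np.array([2.0, 0.5, -1.0])   # raw output-layer scores
p = softmax(z)
print(p, p.sum())   # class probabilities summing to 1
```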
Overall Picture: the Learning Process in an ANN
[Figure: the full training cycle: forward propagation, error computation, backpropagation, and weight/bias updates, repeated over many iterations]
Neural Network Simulators
• Search the internet for a neural network simulator and report on it.
For example:
• https://www.mladdict.com/neural-network-simulator
• http://playground.tensorflow.org/