Deep Learning
Pratap Dangeti
Table of Contents
1. Deep Learning Requirement
2. Introduction
3. Fundamentals
4. Deep Architecture of ANN
5. Convolutional Neural Networks
6. Recurrent Neural Networks
7. Deep Autoencoders
8. Conclusions and Further Readings
Deep Learning Requirement
1. Over time, there has been an increasing drive towards Automation and Artificial Intelligence (e.g. Autonomous Cars, AlphaGo from Google DeepMind)
2. Some problems cannot be programmed explicitly with mathematical rules; instead, machines must learn them by themselves, e.g. face recognition
3. Over time, the share of unstructured data has grown to about 90% of total data, e.g. pictures, Twitter chats, YouTube videos, WhatsApp logs etc. Deep Learning is well suited for picture, audio and language processing
4. Highly non-linear models can be fitted on Big Data without much issue of overfitting
5. Cheap, high-capacity computational power makes the tedious calculations feasible to implement
Fitting a Highly Non-linear Model on Small Data vs Big Data
Diversified Utility of Deep Learning
Introduction
• Deep learning is a form of machine learning that uses a model of computing inspired by the structure of the brain; hence we call this model a neural network. The basic foundational unit of a neural network is the neuron
• Each neuron has a set of inputs, each of which is given a specific weight. The neuron computes some function on these weighted inputs: it takes a linear combination of the weighted inputs and applies an activation function (sigmoid, tanh etc.)
• The network feeds the weighted sum of the inputs into the logistic function (in the case of a sigmoid activation function). The logistic function returns a value between 0 and 1: when the weighted sum is very negative, the return value is very close to 0; when the weighted sum is very large and positive, the return value is very close to 1
Biological Neuron Artificial Neuron
Number of Neurons in Species
Introduction
• Software used in Deep Learning
• Theano: Python based Deep Learning library
• TensorFlow: Google’s Deep Learning library, runs on top of Python/C++
• Keras / Lasagne: Lightweight wrappers which sit on top of Theano/TensorFlow and enable faster model prototyping
• Torch: Lua based Deep Learning library with wide support for machine learning algorithms
• Caffe: Deep Learning library primarily used for processing pictures
• Useful online courses
• CS231n: Convolutional Neural Networks for Visual Recognition from Stanford University by Andrej Karpathy, Justin Johnson (http://cs231n.stanford.edu/syllabus.html)
• Machine Learning from Oxford University by Nando de Freitas (https://www.cs.ox.ac.uk/people/nando.defreitas/machinelearning/)
• Neural Networks for Machine Learning from University of Toronto by Geoffrey Hinton (https://www.coursera.org/course/neuralnets)
• CS224d: Deep Learning for Natural Language Processing from Stanford University by Richard Socher (http://cs224d.stanford.edu/)
Fundamentals
• Activation Functions: Every activation function takes a single number and performs a certain fixed mathematical operation on it. Below are the activation functions most commonly used in Deep Learning
• Sigmoid
• Tanh
• ReLU
• Linear
• Sigmoid: The sigmoid has the mathematical form σ(x) = 1 / (1 + e^(−x)). It takes a real valued number and squashes it into the range between 0 and 1. Sigmoid is a popular choice because its derivative is easy to calculate and its output is easy to interpret
• Tanh: Tanh squashes a real valued number into the range [-1, 1]. Its output is zero centered. In practice the tanh non-linearity is always preferred to the sigmoid non-linearity. It can also be shown that tanh is a scaled sigmoid: tanh(x) = 2σ(2x) − 1
Sigmoid Activation Function
Tanh Activation Function
Fundamentals
• ReLU (Rectified Linear Unit): ReLU has become very popular in the last few years. It computes the function f(x) = max(0, x), i.e. the activation is simply thresholded at zero
• Linear: The linear activation function is used in linear regression problems; its derivative is always 1 because the function used is f(x) = x
ReLU is now popularly used in place of Sigmoid or Tanh due to its better convergence properties
ReLU Activation Function
Linear Activation Function
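As a quick illustration of the four activation functions above, here is a minimal NumPy sketch (the function names and sample values are ours, not from the slides):

```python
import numpy as np

def sigmoid(x):
    # Squashes any real number into (0, 1): sigma(x) = 1 / (1 + e^(-x))
    return 1.0 / (1.0 + np.exp(-x))

def tanh(x):
    # Zero-centered squashing into (-1, 1); tanh(x) = 2*sigmoid(2x) - 1
    return np.tanh(x)

def relu(x):
    # Thresholds the activation at zero: f(x) = max(0, x)
    return np.maximum(0.0, x)

def linear(x):
    # Identity activation, derivative is always 1
    return x

x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(sigmoid(x), tanh(x), relu(x), linear(x))
```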
Fundamentals
• Forward propagation & Backpropagation: During the forward propagation stage, features are input to the network and fed through the subsequent layers to produce the output activations
• However, we can calculate the error of the network only at the output units, not in the middle/hidden layers. In order to update the weights to their optimal values, we must propagate the network’s errors backwards through its layers
[Network diagram: Input Layer (Input 1, Input 2) → Hidden Layer 1 (Hidden 1–3) → Hidden Layer 2 (Hidden 4–6) → Output Layer (Output 1, Output 2)]
Forward Propagation of Layer 1 Neurons:
Activation(Hidden1) = g(BiasWeight1 + Weight1 * Input1 + Weight2 * Input2)
Forward Propagation of Layer 2 Neurons:
Activation(Hidden4) = g(BiasWeight4 + Weight7 * Hidden1 + Weight8 * Hidden2 + Weight9 * Hidden3)
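To make the forward pass concrete, here is a minimal NumPy sketch of forward propagation for the 2-3-3-2 network in the diagrams, assuming sigmoid as the activation g (the weight values are random placeholders, not values from the slides):

```python
import numpy as np

def g(x):
    # Sigmoid activation, assumed as the activation function g
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
# Layer sizes: 2 inputs -> 3 hidden -> 3 hidden -> 2 outputs
W1, b1 = rng.normal(size=(3, 2)), np.zeros(3)   # Input -> Hidden Layer 1
W2, b2 = rng.normal(size=(3, 3)), np.zeros(3)   # Hidden Layer 1 -> Hidden Layer 2
W3, b3 = rng.normal(size=(2, 3)), np.zeros(2)   # Hidden Layer 2 -> Output

def forward(x):
    # e.g. Hidden1 = g(BiasWeight1 + Weight1*Input1 + Weight2*Input2)
    h1 = g(W1 @ x + b1)
    h2 = g(W2 @ h1 + b2)
    out = g(W3 @ h2 + b3)
    return h1, h2, out

h1, h2, out = forward(np.array([0.5, -1.0]))
print(out)
```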
Fundamentals
Forward Propagation of Output Layer Neurons:
Activation(Output1) = g(BiasWeight7 + Weight16 * Hidden4 + Weight17 * Hidden5 + Weight18 * Hidden6)
Backpropagation of Output Layer Neurons:
Error(Output1) = g’(Output1) * (True1 − Output1)
Fundamentals
Backpropagation of Hidden Layer 2 Neurons:
Error(Hidden4) = g’(Hidden4) * (Weight16 * Error(Output1) + Weight19 * Error(Output2))
Backpropagation of Hidden Layer 1 Neurons:
Error(Hidden1) = g’(Hidden1) * (Weight7 * Error(Hidden4) + Weight10 * Error(Hidden5) + Weight13 * Error(Hidden6))
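A rough sketch of these error formulas, assuming sigmoid activations (so g’(z) = z * (1 − z) when z is the unit’s output) and the 2-3-3-2 network from the earlier diagrams; all activation and weight values below are illustrative placeholders:

```python
import numpy as np

def g_prime(out):
    # Derivative of the sigmoid expressed in terms of its own output
    return out * (1.0 - out)

# Placeholder forward-pass outputs and targets (illustrative values only)
h1 = np.array([0.6, 0.4, 0.7])      # Hidden Layer 1 outputs (Hidden1..3)
h2 = np.array([0.5, 0.3, 0.8])      # Hidden Layer 2 outputs (Hidden4..6)
out = np.array([0.7, 0.2])          # Output layer outputs (Output1, Output2)
true = np.array([1.0, 0.0])         # Targets (True1, True2)
W3 = np.random.default_rng(0).normal(size=(2, 3))   # Hidden Layer 2 -> Output weights
W2 = np.random.default_rng(1).normal(size=(3, 3))   # Hidden Layer 1 -> Hidden Layer 2 weights

# Error(Output) = g'(Output) * (True - Output)
err_out = g_prime(out) * (true - out)
# Error(Hidden4..6): outgoing weights times downstream output errors
err_h2 = g_prime(h2) * (W3.T @ err_out)
# Error(Hidden1..3): outgoing weights times downstream hidden errors
err_h1 = g_prime(h1) * (W2.T @ err_h2)
print(err_out, err_h2, err_h1)
```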
Fundamentals
Updating Weights of Hidden Layer 1 – Input Layer during Backpropagation:
BiasWeight1 = BiasWeight1 + a * Error(Hidden1) * 1
Weight1 = Weight1 + a * Error(Hidden1) * Input1
Weight2 = Weight2 + a * Error(Hidden1) * Input2
Updating Weights of Hidden Layer 1 – Hidden Layer 2 during Backpropagation:
BiasWeight4 = BiasWeight4 + a * Error(Hidden4) * 1
Weight7 = Weight7 + a * Error(Hidden4) * Hidden1
Weight8 = Weight8 + a * Error(Hidden4) * Hidden2
Weight9 = Weight9 + a * Error(Hidden4) * Hidden3
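The corresponding weight updates as a small self-contained sketch with learning rate a; the activations and errors are hypothetical values standing in for the results of the forward and backward passes above:

```python
import numpy as np

a = 0.1  # learning rate

# Hypothetical activations and backpropagated errors (illustrative values only)
x = np.array([0.5, -1.0])               # Input1, Input2
h1 = np.array([0.6, 0.4, 0.7])          # Hidden1..Hidden3 outputs
err_h1 = np.array([0.02, -0.01, 0.03])  # Error(Hidden1..3)
err_h2 = np.array([0.05, -0.02, 0.01])  # Error(Hidden4..6)

W1 = np.zeros((3, 2)); b1 = np.zeros(3)  # Input -> Hidden Layer 1 weights and biases
W2 = np.zeros((3, 3)); b2 = np.zeros(3)  # Hidden Layer 1 -> Hidden Layer 2 weights and biases

# Weight = Weight + a * Error(downstream unit) * (upstream activation); bias uses an input of 1
W1 += a * np.outer(err_h1, x);  b1 += a * err_h1 * 1.0
W2 += a * np.outer(err_h2, h1); b2 += a * err_h2 * 1.0
print(W1, W2)
```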
Fundamentals
• Dropout: Dropout is a regularization technique for neural networks used to avoid overfitting the data. Typically the keep probability is 0.8 (80% of neurons randomly retained) in the initial layers and 0.5 in the middle layers
• Optimization: Various techniques are used to optimize the weights, including
• SGD (Stochastic Gradient Descent)
• Momentum
• NAG (Nesterov Accelerated Gradient)
• Adagrad (Adaptive Gradient)
• Adadelta
• RMSprop
• Adam (Adaptive Moment Estimation)
In practice Adam is a good default choice; if you can afford full batch updates, then try out L-BFGS
Application of Dropout in Neural network
Optimization of Error Surface
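As an illustration of where dropout sits in a model definition, here is a minimal Keras-style sketch; the layer sizes and input dimension are arbitrary assumptions, and note that Keras’s Dropout rate is the fraction dropped, not kept:

```python
from keras.models import Sequential
from keras.layers import Dense, Dropout

model = Sequential()
model.add(Dense(512, activation='relu', input_dim=100))
model.add(Dropout(0.2))   # drop 20% of activations, i.e. keep ~0.8 in the initial layer
model.add(Dense(512, activation='relu'))
model.add(Dropout(0.5))   # heavier dropout of 0.5 in the middle layer
model.add(Dense(1, activation='sigmoid'))
model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
```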
Fundamentals
• Stochastic Gradient Descent (SGD): Gradient descent is a way to minimize an objective function J(θ), parameterized by a model’s parameters θ ∈ R^d, by updating the parameters in the direction opposite to the gradient of the objective function w.r.t. the parameters. The learning rate determines the size of the steps taken to reach the minimum
• Batch Gradient Descent (all training observations per iteration)
• SGD (1 observation per iteration)
• Mini Batch Gradient Descent (batches of about 50 training observations per iteration)
• Momentum: SGD has trouble navigating surfaces that curve much more steeply in one dimension than in another; in these scenarios SGD oscillates across the slopes of the ravine while only making hesitant progress along the bottom towards the local optimum
(When using momentum we push a ball down a hill. The ball accumulates momentum as it rolls downhill, becoming faster and faster until it stops (due to air resistance etc.). Similarly, the momentum term increases for dimensions whose gradients point in the same direction and reduces updates for dimensions whose gradients change direction. As a result, we gain faster convergence and reduced oscillations)
Gradient Descent
Comparison of SGD without & with Momentum
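A minimal sketch of the plain SGD and momentum update rules on an arbitrary quadratic objective; the objective, learning rate and momentum coefficient are illustrative assumptions:

```python
import numpy as np

def grad_J(theta):
    # Gradient of an illustrative quadratic objective, much steeper in one dimension
    A = np.diag([10.0, 1.0])
    return A @ theta

lr, gamma = 0.01, 0.9
theta_sgd = np.array([1.0, 1.0])
theta_mom = np.array([1.0, 1.0])
v = np.zeros(2)                # velocity accumulated by the momentum update

for _ in range(100):
    # Plain SGD: theta = theta - lr * grad
    theta_sgd -= lr * grad_J(theta_sgd)
    # Momentum: v = gamma * v + lr * grad; theta = theta - v
    v = gamma * v + lr * grad_J(theta_mom)
    theta_mom -= v

print(theta_sgd, theta_mom)   # momentum typically converges faster on ravine-like surfaces
```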
Fundamentals
• Nesterov Accelerated Gradient (NAG): A ball that rolls down a hill blindly following the slope is highly unsatisfactory; it should have a notion of where it is going so that it knows to slow down before the hill slopes up again. NAG is a way to give the momentum term this kind of prescience
(While momentum first computes the current gradient (small blue vector) and then takes a big jump in the direction of the updated accumulated gradient (big blue vector), NAG first makes a big jump in the direction of the previous accumulated gradient (brown vector), measures the gradient and then makes a correction (green vector). This anticipatory update prevents the ball from going too fast and results in increased responsiveness and performance)
• Adagrad: Adagrad is an algorithm for gradient-based optimization that adapts the learning rate to the parameters, performing larger updates for infrequent and smaller updates for frequent parameters
(Adagrad greatly improves the robustness of SGD and has been used to train large-scale neural nets. One of Adagrad’s main benefits is that it eliminates the need to manually tune the learning rate. Most implementations use a default value of 0.01 and leave it at that. Adagrad’s main weakness is its accumulation of the squared gradients in the denominator: since every added term is positive, the accumulated sum keeps growing during training. This in turn causes the learning rate to shrink and eventually become infinitesimally small, at which point the algorithm is no longer able to acquire additional knowledge. The following algorithms aim to resolve this flaw.)
Nesterov Momentum Update
Fundamentals
• Adadelta: Adadelta is an extension of Adagrad that seeks to reduce its aggressive, monotonically decreasing learning rate. Instead of accumulating all past squared gradients, Adadelta restricts the window of accumulated past gradients to some fixed size w
(Instead of inefficiently storing the w previous squared gradients, the sum of gradients is recursively defined as a decaying average of all past squared gradients)
• RMSprop: RMSprop and Adadelta were both developed independently around the same time to resolve Adagrad’s radically diminishing learning rates
(RMSprop likewise divides the learning rate by an exponentially decaying average of squared gradients)
• Adam (Adaptive Moment Estimation): Adam is another method that computes adaptive learning rates for each parameter. In addition to storing an exponentially decaying average of past squared gradients like Adadelta and RMSprop, Adam also keeps an exponentially decaying average of past gradients, similar to momentum
In practice Adam gives the best results. For complete details on all methods refer to:
http://sebastianruder.com/optimizing-gradient-descent/index.html#batchgradientdescent
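A minimal sketch of the Adam update rule described above, using the commonly cited default hyperparameters (β1 = 0.9, β2 = 0.999, ε = 1e-8); the objective function is an illustrative assumption:

```python
import numpy as np

def grad_J(theta):
    # Gradient of an illustrative quadratic objective
    return np.diag([10.0, 1.0]) @ theta

lr, beta1, beta2, eps = 0.01, 0.9, 0.999, 1e-8
theta = np.array([1.0, 1.0])
m = np.zeros(2)   # exponentially decaying average of past gradients (momentum-like)
v = np.zeros(2)   # exponentially decaying average of past squared gradients (RMSprop-like)

for t in range(1, 1001):
    g = grad_J(theta)
    m = beta1 * m + (1 - beta1) * g       # first moment estimate
    v = beta2 * v + (1 - beta2) * g**2    # second moment estimate
    m_hat = m / (1 - beta1**t)            # bias correction
    v_hat = v / (1 - beta2**t)
    theta -= lr * m_hat / (np.sqrt(v_hat) + eps)

print(theta)
```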
Deep Architecture of ANN (Artificial Neural Network)
• In a Multi Layer / Deep Architecture, each layer is fully connected with the subsequent layer. The output of each artificial neuron in a layer is an input to every artificial neuron in the next layer towards the output
• Solving Methodology: Backpropagation is used to train deep layers by calculating the error of the network at the output units and propagating it back through the layers
• Rules of thumb:
• All hidden layers should have the same number of neurons per layer
• Typically 2 hidden layers are good enough to solve the majority of problems
• Using scaling/batch normalization (mean 0, variance 1) for all input variables after each layer improves convergence effectiveness
• Reducing the step size after each iteration improves convergence, in addition to the use of momentum & Dropout
Deep Architecture
Decision Boundary of Deep Architecture
Deep Architecture of ANN (Artificial Neural Network)
• Case Study: Predict survival (0 or 1) on the Titanic based on a few characteristics like Class, Age, Gender, Fare etc.
Probability of Survival in Titanic Disaster
Method | Test score | Settings
ANN | 0.7799 | 512-512-512-1, nb_epoch = 100, batchsize = 32
Adaboost | 0.77033 | ntree = 100, lrate = 0.04, algo = SAMME.R
Randomforest | 0.77033 | ntree = 100, maxdepth = 4, criteria = gini, max_features = auto
Gradientboost | 0.76555 | ntree = 100, lrate = 0.04, maxdep = 5, maxfeatures = auto
XGBoost | 0.76077 | ntree = 100, lrate = 0.04, maxdep = 5
Logistic Regression | 0.7512 | NA
[Network diagram: Hidden Layer 1 → Hidden Layer 2 → Hidden Layer 3 → Output Layer]
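A rough Keras sketch of the ANN configuration in the table (512-512-512-1, 100 epochs, batch size 32); the input dimension is an assumption depending on feature engineering, and X_train / y_train are placeholders for the prepared Titanic features and survival labels:

```python
from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(512, activation='relu', input_dim=10))  # input_dim depends on the engineered features
model.add(Dense(512, activation='relu'))
model.add(Dense(512, activation='relu'))
model.add(Dense(1, activation='sigmoid'))               # probability of survival (0 or 1)

model.compile(optimizer='adam', loss='binary_crossentropy', metrics=['accuracy'])
# model.fit(X_train, y_train, nb_epoch=100, batch_size=32)  # 'nb_epoch' in older Keras, 'epochs' in newer versions
```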
Convolutional Neural Networks
• Convolutional Neural Networks are used in picture analysis, including image captioning, digit recognition and various visual processing systems, e.g. vision detection in self-driving cars, handwritten digit recognition, Google DeepMind’s AlphaGo
Object recognition and classification using
Convolutional Networks
CNN application in Self Driving Cars CNN application in Handwritten Digit Recognizer
Convolutional Neural Networks
• In 1959, Hubel & Wiesel inserted microscopic electrodes into the visual cortex of an anesthetized cat to read the activity of single cells in the visual cortex while presenting various stimuli to its eyes. For this work they received the Nobel Prize in Medicine in 1981
• Hubel & Wiesel discovered that vision is hierarchical, consisting of simple cells, complex cells & hyper-complex cells
Hubel & Wiesel experiments on Cat’s Vision
Vision is Hierarchical phenomenon
Formation of features over layers using Neural Networks
Object detection using Edges
Convolutional Neural Networks
• The input layer/picture consists of 32 x 32 pixels with 3 colors (Red, Green & Blue), i.e. 32 x 32 x 3
• A convolution layer is formed by running a filter (5 x 5 x 3) over the input layer, which results in a 28 x 28 x 1 activation map
Input Layer & Filter
Running filter over Input Layer to form Convolution layer Complete Convolution Layer from filter
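The spatial output size follows the standard convolution arithmetic (W − F + 2P) / S + 1; a small sketch (the helper function name is ours) to check the dimensions quoted above:

```python
def conv_output_size(w, f, stride=1, padding=0):
    # (W - F + 2P) / S + 1 for one spatial dimension
    return (w - f + 2 * padding) // stride + 1

# 32 x 32 x 3 input convolved with a 5 x 5 x 3 filter, stride 1, no padding
print(conv_output_size(32, 5))   # 28 -> a 28 x 28 x 1 activation map per filter
```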
Convolutional Neural Networks
• A 2nd convolution activation map is created in a similar way with another filter
• After striding/convolving with 6 filters, a new layer of dimension 28 x 28 x 6 is created
Complete Convolution layer from Filter 2
Convolution layers created with 6 Filters Formation of complete 2nd layer
Convolutional Neural Networks
• Pooling Layer: The pooling layer makes the representation smaller and more manageable. It operates over each activation map independently. Pooling is applied on the width and breadth of the layer, while the depth remains the same during the pooling stage
• Padding: The size of the image (width & breadth) shrinks with each convolution, which is undesirable in deep networks; padding keeps the size of the picture constant or controllable throughout the network
Max pooling working methodology
Max pool layer after performing pooling
Zero padding on 6 x 6 picture
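A small NumPy sketch of 2 x 2 max pooling with stride 2 on a single activation map (the reshaping trick assumes the input height and width are even):

```python
import numpy as np

def max_pool_2x2(a):
    # Split the map into non-overlapping 2x2 blocks and take the max of each block
    h, w = a.shape
    return a.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

a = np.arange(16).reshape(4, 4)
print(max_pool_2x2(a))   # the 4x4 map shrinks to 2x2; depth would be unchanged across maps
```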
Convolutional Neural Networks
• AlexNet Architecture: AlexNet won the ImageNet challenge competition in 2012
• Layer 0: Input image (227 * 227 * 3 ~= 150k)
• Layer 1: Convolution with 96 filters, size 11×11, stride 4, padding 0
• Layer 2: Max-Pooling with 3×3 filter, stride 2
• Layer 3: Convolution with 256 filters, size 5×5, stride 1, padding 2
• Layer 4: Max-Pooling with 3×3 filter, stride 2
• Layer 5: Convolution with 384 filters, size 3×3, stride 1, padding 1
• Layer 6: Convolution with 384 filters, size 3×3, stride 1, padding 1
• Layer 7: Convolution with 256 filters, size 3×3, stride 1, padding 1
• Layer 8: Max-Pooling with 3×3 filter, stride 2
• Layer 9: Fully Connected with 4096 neuron
• Layer 10: Fully Connected with 4096 neuron
• Layer 11: Fully Connected with 1000 neurons (classes to predict)
Total memory required: ~24M activations * 4 bytes ~= 93 MB/image (forward pass only; roughly ×2 including the backward pass)
AlexNet for the ImageNet Challenge 2012
Convolutional Neural Networks
• Case Study: Kaggle Digit Recognizer, to recognize handwritten digits
• The following implementation achieved a score of 0.99314 (ideal score 1), ranking 46th (top 3.5%) out of 1314 teams on the Public Leaderboard
Digit Recognizer to classify Hand Written digits
• Layer 1 consists of 2 convolutional layers followed by a max pooling layer
• Layer 2 consists of 2 convolutional layers followed by a max pooling layer
• Layer 3 consists of a dense network with Dropout 0.5
• Layer 4 is a softmax layer for multiclass (10) outputs
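A hedged Keras sketch of the architecture described above; the filter counts, kernel sizes and dense width are illustrative assumptions, and only the overall structure, the Dropout of 0.5 and the 10-way softmax come from the slide:

```python
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout

model = Sequential()
# Layer 1: two convolutional layers followed by max pooling
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)))
model.add(Conv2D(32, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
# Layer 2: two convolutional layers followed by max pooling
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D(pool_size=(2, 2)))
# Layer 3: dense network with Dropout 0.5
model.add(Flatten())
model.add(Dense(256, activation='relu'))
model.add(Dropout(0.5))
# Layer 4: softmax layer for the 10 digit classes
model.add(Dense(10, activation='softmax'))
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
```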
Recurrent Neural Networks
• Recurrent neural networks are very useful for remembering sequences, time series forecasting, image captioning, machine translation etc.
• RNNs are useful for building A.I. chatbots, in which sequences of words with all their syntax & semantics are remembered and subsequently used to provide answers to given questions
Recurrent Neural Networks
Image Captioning using Convolutional and
Recurrent Neural Network
Application of RNN in A.I. Chatbot
Recurrent Neural Networks
• A recurrent neural network processes a sequence of vectors x by applying a recurrence formula at every time step
Recurrent Neural Network configurations:
• Vanilla Network
• Image Captioning (image -> seq. of words)
• Sentiment Classification (seq. of words -> sentiment)
• Machine Translation (seq. of words -> seq. of words)
• Video Classification on frame level
[Diagram: the RNN cell is unrolled over time steps, consuming x0, x1, x2, …, xt and emitting y0, y1, y2, …, yt]
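A minimal NumPy sketch of the vanilla RNN recurrence, using the common formulation h_t = tanh(W_hh h_{t-1} + W_xh x_t) and y_t = W_hy h_t; the weight shapes and dimensions are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
hidden, in_dim, out_dim = 4, 3, 2
W_xh = rng.normal(scale=0.1, size=(hidden, in_dim))   # input -> hidden
W_hh = rng.normal(scale=0.1, size=(hidden, hidden))   # hidden -> hidden (the recurrence)
W_hy = rng.normal(scale=0.1, size=(out_dim, hidden))  # hidden -> output

def rnn_step(h_prev, x_t):
    # The same recurrence formula is applied at every time step
    h_t = np.tanh(W_hh @ h_prev + W_xh @ x_t)
    y_t = W_hy @ h_t
    return h_t, y_t

h = np.zeros(hidden)
for x_t in rng.normal(size=(5, in_dim)):   # a sequence of 5 input vectors
    h, y = rnn_step(h, x_t)
print(y)
```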
Recurrent Neural Networks
• Vanishing gradient problem with RNNs: Gradients vanish quickly as the number of layers grows, and this issue is severe with RNNs. Vanishing gradients lead to slow training. LSTM & GRU units are used to avoid this issue
• LSTM (Long Short Term Memory): An LSTM is an artificial neural network that contains LSTM blocks in addition to the regular network units. An LSTM block contains gates that determine when the input is significant enough to remember, when it should continue to remember or forget the value, and when it should output the value
LSTM Working Principle (Backpropagation
through a memory cell)
LSTM Cell
RNN & LSTM formula
Recurrent Neural Networks
• Case Study: NIFTY prediction using NIFTY 1 Year EOD (end of day) data
• Layer 1 consists of 1000 recurrent LSTM neurons
• Layer 2 consists of 1000 recurrent LSTM neurons
• Layer 3 consists of 1000 recurrent LSTM neurons
• Layer 4 consists of 1000 recurrent LSTM neurons with return sequences set to False
• The output layer consists of 1 neuron with a linear activation function
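A hedged Keras sketch of the stacked LSTM described above; the input window length and feature count are assumptions, while the four 1000-unit LSTM layers, the final return_sequences = False, and the single linear output neuron come from the slide:

```python
from keras.models import Sequential
from keras.layers import LSTM, Dense

timesteps, features = 30, 1   # e.g. 30 past end-of-day prices per sample (assumption)

model = Sequential()
model.add(LSTM(1000, return_sequences=True, input_shape=(timesteps, features)))
model.add(LSTM(1000, return_sequences=True))
model.add(LSTM(1000, return_sequences=True))
model.add(LSTM(1000, return_sequences=False))   # last recurrent layer returns only the final state
model.add(Dense(1, activation='linear'))        # single neuron with linear activation for regression
model.compile(optimizer='adam', loss='mean_squared_error')
```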
Deep Autoencoders
• Deep Autoencoder: An autoencoder neural network is an unsupervised learning algorithm that applies backpropagation. Stacking layers of autoencoders produces a deeper architecture known as a Stacked or Deep Autoencoder
• Autoencoders are applied in face recognition, speech recognition, signal denoising etc.
PCA vs Deep Autoencoder for MNIST Data
Face Recognition using Deep Autoencoders
Deep Autoencoders
• Deep Autoencoder: An autoencoder neural network is an unsupervised learning algorithm that applies backpropagation, setting the target values to be equal to the inputs, i.e. it uses y(i) = x(i)
• Typically a deep autoencoder is composed of two segments: an encoding network and a decoding network
Deep Autoencoder Examples Training Deep Autoencoder
Autoencoder with Classifier
Reconstruction of features with weight transpose
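A minimal Keras sketch of a small stacked autoencoder trained to reconstruct its input; the layer sizes and the MNIST-like 784-dimensional input are illustrative assumptions, and X_train is a placeholder for the input data, used as both input and target:

```python
from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
# Encoding network: progressively compress the 784-dimensional input
model.add(Dense(128, activation='relu', input_dim=784))
model.add(Dense(32, activation='relu'))    # bottleneck code
# Decoding network: mirror the encoder to reconstruct the input
model.add(Dense(128, activation='relu'))
model.add(Dense(784, activation='sigmoid'))

# Unsupervised training: the target y(i) equals the input x(i)
model.compile(optimizer='adam', loss='mean_squared_error')
# model.fit(X_train, X_train, epochs=20, batch_size=256)
```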