
Introduction to deep learning

  1. Introduction to Deep Learning. July 12th 2017. Prepared for: Charlotte Bots & AI Meetup. Presenter: Abhishek Bhandwaldar, Data scientist at
  2. They are all talking about AI, especially advancements in Deep Learning! Photo credit:
  3. AI Evolution. Credit: at
  4. Artificial Intelligence Techniques
  5. Cognitive Use Cases (Why DL?) • Computer Vision: Self-Driving Cars, Faces, Gaming, Medical / Sensors • Speech Processing: Voice Recognition, Music Generation, Language Translation • Natural Language Processing: CRM, Chatbots, Ads. Credit:
  6. Deep Learning through the Ages • 1958 Perceptron algorithm: Rosenblatt created the perceptron algorithm. • 1965 Multilayer perceptron: the first algorithm for multilayer perceptrons was published by Ivakhnenko. • 1969 Neural network setbacks: Minsky and Papert showed in their book 'Perceptrons' that the perceptron has many limitations. • 1998 Image recognition: Yann LeCun et al. successfully applied deep neural networks to image recognition using convolution.
  7. Deep Learning Process • Data collection and pre-processing • Division of data into train, dev and test sets • Selection of model architecture • Training and performance benchmarking • Tuning hyper-parameters and repeating training • Testing with the test set
  8. Getting Started with Deep Learning 1. Python programming (or an alternative) 2. GPU-based hardware 3. Deep learning frameworks 4. Basic knowledge of neural networks 5. Data sets (many available online)
  9. Python Programming: basic-level skill. Learn from or
  10. GPU Hardware and Training in Cloud • CPU will work for simple workloads
      Cloud Provider | Pros | Cons
      … | Similar to Heroku; easy to get started and use; free credits | No GUI; difficult to get GPU working for non-TensorFlow solutions
      … | Full-blown desktop in the cloud with good GPU support | UI lags and is buggy, but usable
      Google ML Engine | Best for TensorFlow-only solutions; Datalab is a good notebook environment | No support beyond TensorFlow
      Amazon AWS | GPU-supported systems | Self-hosting and maintenance
  11. Deep Learning Frameworks • Keras is a wrapper that makes it easy to work with DL frameworks!
      Framework | Sponsor | Best for
      TensorFlow | Google | Popularity and ease of use
      CNTK | Microsoft | Fast, accurate and growing
      PyTorch | Facebook | Early adopters
      MXNet | Amazon | Group of companies
      Caffe, Theano | Schools | Researchers
  12. Neural Networks Overview • Linear and non-linear models • Deep neural networks (the name Deep Learning comes from here): - Training - Architecture - Convolutional neural networks - Recurrent neural networks - Generative adversarial networks
  13. Linear Models • The output is a linear function of the input. • The model is limited in what it can learn, and adding layers has no effect. • Computation on GPU is very efficient. Input → Linear Function → Output: $z = b + \sum_i x_i w_i$ Image source:
  14. Non-Linear Model and ReLU • By introducing non-linearity, the model is able to learn much better. • The most widely used non-linearity is ReLU: f(x) = max(0, x). • Other activation functions used: • Sigmoid • Tanh • ReLU is less computationally expensive. Input → Linear Function → ReLU → Output: $z = b + \sum_i x_i w_i$, output $= \max(0, z)$ Image source: 1/
  15. Deep Neural Network & Architecture • When we connect multiple neurons together, we get a fully connected deep neural network. • Make the network deeper rather than wider. • This helps in learning hierarchical representations (from low-level details to high-level concepts). • It also decreases the number of learnable parameters. Image source:
  16. Back Propagation and Neural Network Training • Backpropagation is the algorithm we use for neural network learning. • The widely used cost function for calculating the loss is the cross-entropy cost function. • Two steps: • Forward pass: the data is passed through the network and the loss is calculated. • Backward pass: the gradient of the loss is propagated backwards through the network and used to change the weights, i.e. optimization. • Various methods for NN optimization: • Stochastic Gradient Descent, Momentum, Nesterov accelerated gradient, RMSprop, Adagrad, Adam
  17. Demo: Feed-Forward Neural Network at TensorFlow Playground
  18. Convolutional Neural Network: has 3 types of layers. • Convolution layer: applies convolution to its input. • Pooling layer/subsampling: combines the outputs in each small window into a single value. • Fully-connected layer: a simple fully connected network. Image source: LeCun, Yann; Léon Bottou; Yoshua Bengio; Patrick Haffner (1998). "Gradient-based learning applied to document recognition". Proceedings of the IEEE.
  19. CNN Example: LeNet-5, convolutional neural networks. Source:
  20. Code Walkthrough: Handwriting Recognition on the MNIST data set
  21. Recurrent Neural Network • This type of neural network is used for sequence data; ideal for text. • The output of the hidden layer is fed back into itself (feedback). • RNNs are Turing-complete, but in practice they are very difficult to train because of the exploding/vanishing gradient problem. • To tackle this issue we have the LSTM network. Image source: effectiveness/
  22. Generative Adversarial Networks • According to Yann LeCun, GANs were the next big thing. • The architecture is simple: the discriminative model's task is to determine whether an image looks natural or not. • The generator's task is to generate images that fool the discriminator.
  23. Tips for Training Deep Neural Networks • Batch learning: • A pass is made over all training examples, and then the weights are updated. • In mini-batch learning, a pass is made over a small batch and the weights are updated after every batch. • Fast, parallel training can be implemented on GPU. Widely used. • Online learning: • The weights are updated after every single example. Easy to train on new examples. • Very slow. • Convolutions can be computed in parallel for a speedup on GPU. • In RNNs, multiple examples can be processed in parallel when using batch learning. • Use techniques like dropout and regularization to prevent overfitting. • Gather more examples to prevent overfitting and generalize better. • Increase the number of layers to prevent underfitting.
  24. Resources Neural Network: • Neural Networks for Machine Learning | Coursera • Neural Networks by Hugo Larochelle • Neural Networks, Manifolds, and Topology -- colah's blog • Distill — Latest articles about machine learning • Deep Learning Book • An overview of gradient descent optimization algorithms • Deep Learning By Google • SIRAJ RAVAL'S DEEP LEARNING (also available on Siraj Raval's YouTube channel) • Neural Networks and Deep Learning • Understanding Activation Functions in Neural Networks RNN: • The Unreasonable Effectiveness of Recurrent Neural Networks • Recurrent Neural Networks Tutorial • How to build a Recurrent Neural Network in TensorFlow CNN: • Convolutional Neural Network - Deep Learning • Convolutional Neural Networks (LeNet)
  25. Questions & Feedback
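The "division of data into train, dev and test sets" step on the Deep Learning Process slide can be sketched as below. This is a minimal NumPy sketch, not code from the talk; the helper name `train_dev_test_split` and the 80/10/10 proportions are illustrative assumptions.

```python
import numpy as np

def train_dev_test_split(X, y, dev_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle (X, y) together and split them into train, dev and test sets.

    Hypothetical helper: the fractions and the seed are illustrative defaults.
    """
    n = len(X)
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)              # shuffle once so every set is a random sample
    n_test = int(n * test_frac)
    n_dev = int(n * dev_frac)
    test_idx = idx[:n_test]
    dev_idx = idx[n_test:n_test + n_dev]
    train_idx = idx[n_test + n_dev:]      # whatever is left over is training data
    return ((X[train_idx], y[train_idx]),
            (X[dev_idx], y[dev_idx]),
            (X[test_idx], y[test_idx]))

X = np.arange(100).reshape(100, 1)
y = np.arange(100) % 2
train, dev, test = train_dev_test_split(X, y)
print(len(train[0]), len(dev[0]), len(test[0]))  # 80 10 10
```

The dev set is used for hyper-parameter tuning and repeated training; the test set is touched only once, at the end.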
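The optimizers listed on the Back Propagation slide differ mainly in how they turn a gradient into a weight update. Below is a minimal sketch of plain SGD, momentum and Adam (with Adam's usual defaults β₁ = 0.9, β₂ = 0.999, ε = 1e-8), each minimizing the toy loss f(w) = (w - 3)², whose gradient is 2(w - 3). The toy loss, learning rates and step counts are assumptions chosen for illustration.

```python
import math

def grad(w):
    # Gradient of the toy loss f(w) = (w - 3)**2
    return 2.0 * (w - 3.0)

def sgd(w, lr=0.1, steps=500):
    for _ in range(steps):
        w -= lr * grad(w)                     # step straight down the gradient
    return w

def momentum(w, lr=0.1, mu=0.9, steps=500):
    v = 0.0
    for _ in range(steps):
        v = mu * v - lr * grad(w)             # velocity accumulates past gradients
        w += v
    return w

def adam(w, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    m, v = 0.0, 0.0
    for t in range(1, steps + 1):
        g = grad(w)
        m = beta1 * m + (1 - beta1) * g       # running mean of gradients
        v = beta2 * v + (1 - beta2) * g * g   # running mean of squared gradients
        m_hat = m / (1 - beta1 ** t)          # bias correction for the zero initialization
        v_hat = v / (1 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w

for name, opt in [("SGD", sgd), ("Momentum", momentum), ("Adam", adam)]:
    print(name, round(opt(0.0), 4))
```

All three end up near the minimum at w = 3; on real networks the differences in speed and stability matter far more than on this one-dimensional example.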
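The generator/discriminator game on the Generative Adversarial Networks slide is usually written as the minimax objective of Goodfellow et al. (2014), where D(x) is the discriminator's estimated probability that x is a real image, G(z) maps a noise vector z to an image, p_data is the data distribution and p_z is the noise distribution:

```latex
\min_G \max_D \; V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big]
```

The discriminator maximizes V by scoring real images close to 1 and generated images close to 0; the generator minimizes V by producing images that the discriminator scores as real.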
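The batch, mini-batch and online variants on the Tips slide differ only in how many examples contribute to each weight update. A minimal sketch, assuming a one-variable linear model y ≈ w·x + b trained with squared error on noise-free data (y = 2x + 1); the helper names `iterate_minibatches` and `fit_linear`, the learning rate and the epoch count are hypothetical choices for illustration. `batch_size=len(X)` gives batch learning and `batch_size=1` gives online learning.

```python
import numpy as np

def iterate_minibatches(X, y, batch_size, rng):
    """Yield shuffled (X_batch, y_batch) pairs that together cover one epoch."""
    idx = rng.permutation(len(X))
    for start in range(0, len(X), batch_size):
        batch = idx[start:start + batch_size]
        yield X[batch], y[batch]

def fit_linear(X, y, batch_size, lr=0.1, epochs=200, seed=0):
    """Fit y ~ w*x + b with mean squared error, updating after every batch."""
    rng = np.random.default_rng(seed)
    w, b, updates = 0.0, 0.0, 0
    for _ in range(epochs):
        for xb, yb in iterate_minibatches(X, y, batch_size, rng):
            err = (w * xb + b) - yb           # prediction error on this batch only
            w -= lr * 2 * np.mean(err * xb)   # gradient of the batch MSE w.r.t. w
            b -= lr * 2 * np.mean(err)        # gradient of the batch MSE w.r.t. b
            updates += 1
    return w, b, updates

X = np.linspace(-1, 1, 20)
y = 2 * X + 1                                 # noise-free target: w = 2, b = 1
for bs, label in [(len(X), "batch"), (5, "mini-batch"), (1, "online")]:
    w, b, n = fit_linear(X, y, bs)
    print(f"{label:10s} w={w:.3f} b={b:.3f} weight updates={n}")
```

With the same number of epochs, online learning performs 20 times more weight updates than batch learning here, but each one uses a single example, which is why it cannot exploit GPU parallelism the way (mini-)batches can.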

Editor's Notes

  • It is very hard for us to build a program that can do 3-D object recognition from a novel viewpoint, with new lighting, in a changing setting.
    This process happens in our brain, but we don't know how our brain does it, so it is hard for us to write a program that does the same.
    Even if we had a good idea of how the brain does it, the program we built would be very complicated.
    It is also hard to write a program that computes the probability that a credit card transaction is fraudulent.
    There might not be any simple rules; the final program might be a collection of many weak rules.
    Fraud is a moving target, so the program needs to keep updating itself.
    The machine learning approach takes a large number of examples that specify the correct output for a particular task.
    It produces a program that does that job for us. If we train it properly, it will work on new cases as well, i.e. it generalizes.
    The program looks nothing like the ones we usually write; it contains a lot of numbers.
    If the data changes, the program can change too by retraining.
    Massive amounts of computation are now cheap, so it is easier to train a program than to pay someone to write one.
    Some examples of problems best solved by machine learning: pattern recognition, anomaly detection, prediction.
  • In 1958 Rosenblatt created the perceptron algorithm. He made many bold claims about it.
    The first algorithm for multilayer perceptrons was published by Ivakhnenko in 1965.
    In 1969, Minsky and Papert showed in their book 'Perceptrons' that the perceptron has many limitations. This brought a depression in neural network research. They proved that a single-layer perceptron is unable to learn the XOR (exclusive-or) function, and the computers of the time were not capable of handling multilayer networks.
    In 1998 Yann LeCun et al. successfully applied deep neural networks to image recognition using convolution.

  • Linear functions are limited. We want to be able to learn any possible function.
    So we need a way to compute non-linear functions.
    If we had only linear units, stacking multiple layers would still behave like a single-layer network, because composing linear functions gives another linear function.
    This is not powerful enough to model complex data.
    We also want the non-linearity to be differentiable, so that we can calculate the derivatives needed for training.
  • We introduce non-linearity while doing the minimal amount of extra work.
    Other functions used are the sigmoid function and the tanh function, a scaled and shifted form of the sigmoid.
    The sigmoid function was popular and was used in most machine learning models.
    But sigmoid and tanh suffer from the vanishing gradient problem: when the input is very negative or very large, the gradient is close to 0, so weight updates are negligible. Learning can get very slow or even stop, and the network takes a long time to converge.
    To counter this, we use the ReLU unit, which is also non-linear.
    We insert a ReLU unit after the linear unit: the input is first multiplied by the weight matrix and the bias is added.
    The output then goes through the ReLU unit. The ReLU function outputs 0 for all inputs less than 0 and x for an input x > 0.
    In practice, ReLU proves to be much better than sigmoid and tanh.
    But ReLU has a problem when the input is negative or 0: the gradient is 0, so the unit stops learning. This is called the dying ReLU problem.
    To tackle this we have leaky ReLU, where for negative inputs the output is a small multiple of the input rather than 0.
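The point that stacked linear layers still behave like a single layer can be checked numerically: two linear layers W₂(W₁x + b₁) + b₂ equal one linear layer with weight W₂W₁ and bias W₂b₁ + b₂. A minimal NumPy sketch; the layer sizes and random values are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=4)

# Two stacked linear layers with no activation in between
W1, b1 = rng.normal(size=(5, 4)), rng.normal(size=5)
W2, b2 = rng.normal(size=(3, 5)), rng.normal(size=3)
two_layers = W2 @ (W1 @ x + b1) + b2

# The same mapping written as one linear layer
W, b = W2 @ W1, W2 @ b1 + b2
one_layer = W @ x + b
print(np.allclose(two_layers, one_layer))   # True

# A ReLU between the layers breaks this equivalence in general
relu_net = W2 @ np.maximum(0.0, W1 @ x + b1) + b2
print(np.allclose(relu_net, one_layer))
```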
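The activation functions discussed above, together with their derivatives, can be sketched in NumPy. The derivatives make the vanishing-gradient and dying-ReLU points concrete: sigmoid and tanh gradients shrink toward 0 for large |z|, the ReLU gradient is exactly 0 for negative inputs, and leaky ReLU keeps a small slope there. The leaky slope of 0.01 is a common choice, assumed here.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def d_sigmoid(z):
    s = sigmoid(z)
    return s * (1.0 - s)                 # at most 0.25, tiny for large |z|

def d_tanh(z):
    return 1.0 - np.tanh(z) ** 2         # at most 1, tiny for large |z|

def relu(z):
    return np.maximum(0.0, z)

def d_relu(z):
    return (z > 0).astype(float)         # 0 for z <= 0: the "dying ReLU" region

def leaky_relu(z, alpha=0.01):
    return np.where(z > 0, z, alpha * z)

def d_leaky_relu(z, alpha=0.01):
    return np.where(z > 0, 1.0, alpha)   # small but non-zero slope for z <= 0

z = np.array([-10.0, -1.0, 0.5, 10.0])
for name, d in [("sigmoid", d_sigmoid), ("tanh", d_tanh),
                ("relu", d_relu), ("leaky relu", d_leaky_relu)]:
    print(f"{name:10s}", np.round(d(z), 5))

# tanh is a scaled and shifted sigmoid: tanh(z) = 2 * sigmoid(2z) - 1
print(np.allclose(np.tanh(z), 2 * sigmoid(2 * z) - 1))   # True
```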

  • A typical neural network looks like layers of neurons stacked on each other.
    The input to the network is often vectorized.
    The hidden layers are made up of the activation functions from the earlier slide.
    While building a network, we often make it deeper by introducing new layers rather than wider by increasing the number of neurons in a layer.
    Adding neurons to a layer mainly increases the number of trainable parameters,
    while making the network deeper, i.e. adding new layers, helps it learn hierarchical structure:
    from low-level details like lines and edges, to mid-level details like shapes, to high-level concepts like head and body.
    The output layer depends on the type of problem. For a classification problem, the number of output neurons equals the number of classes.
    For a regression problem, the output layer is a linear unit that computes a weighted sum of the previous layer's neurons.
    Increasing the number of hidden layers increases model complexity, i.e. the model can learn more complex data, but it also increases the risk of overfitting.
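The forward pass through a fully connected network, and the parameter counts of a wider versus a deeper design, can be sketched as follows. The layer sizes (784 inputs as for 28×28 MNIST images, 10 classes, and the hidden widths) are illustrative assumptions, as are the helper names `count_params`, `init` and `forward`; the deep design has fewer parameters for these particular sizes, not for every possible choice.

```python
import numpy as np

def count_params(sizes):
    """Weights plus biases of a fully connected net with the given layer sizes."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes[:-1], sizes[1:]))

def init(sizes, rng):
    # He-style random initialization (an assumption; any small random init works here)
    return [(rng.normal(scale=np.sqrt(2.0 / n_in), size=(n_in, n_out)), np.zeros(n_out))
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]

def forward(params, x):
    """ReLU hidden layers, then a softmax output with one probability per class."""
    for W, b in params[:-1]:
        x = np.maximum(0.0, x @ W + b)        # linear unit followed by ReLU
    W, b = params[-1]
    logits = x @ W + b
    logits -= logits.max(axis=-1, keepdims=True)   # for numerical stability
    e = np.exp(logits)
    return e / e.sum(axis=-1, keepdims=True)

wide = [784, 512, 10]              # one wide hidden layer
deep = [784, 128, 128, 128, 10]    # three narrower hidden layers
print(count_params(wide), count_params(deep))   # 407050 134794

rng = np.random.default_rng(0)
probs = forward(init(deep, rng), rng.normal(size=(2, 784)))
print(probs.shape, probs.sum(axis=1))
```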

  • We first convert the input data to vector form and feed it to the network. The forward pass is basically a series of matrix multiplications.
    The input is multiplied by the weights and a bias is added. Then we apply a non-linearity such as ReLU.
    This operation is repeated for every hidden layer.
    Finally, the result is passed through the output layer. This is where we compare the output of the neural network with the expected output, or label, and compute the error.
    We then compute the partial derivative of the error with respect to the weights of each layer, going backwards recursively, and use these error derivatives to change the weights of each layer.
    We repeat these steps until the error is as small as possible.
    This is how neural network learning is performed.
  • The number of trainable parameters in a fully connected neural network is huge, and with image input it grows even bigger.
    To address this issue we have CNNs. The early layers of the network are convolution layers with pooling layers in between. Thanks to the convolution layers, only a few fully connected layers are needed for learning.
    What is a convolution layer? We have small n x n matrices, which we call filters.
    These filters are convolved with the image to produce a feature map.
    Convolution is similar to taking a dot product, except that the filter is flipped (done in reverse); deep learning libraries usually skip the flip and compute cross-correlation. The filter is slid over all parts of the image, and at each position it is multiplied element-wise with the image patch under it and summed, producing a feature map. Every conv layer has multiple trainable filters, so its output is a stack of feature maps.
    Because the same filter is applied to every part of the image, the number of trainable parameters decreases.
    We then pass this stack of feature maps through a pooling layer, which reduces each feature map to a more manageable size.
    One type is max pooling, where we take the max value from a small window of values.
    Another is average pooling, where we average the values.
    This process is repeated for a couple of layers.
    The output is then fed to a fully connected layer, which takes these feature maps and produces the classification output.
    Various architectures have been proposed, such as LeNet, AlexNet, VGG-16 with 16 layers, and Inception with 25 million parameters.
    Microsoft's ResNet has 152 layers and uses residual connections, i.e. connections that skip some layers, which further improves performance.
    Microsoft Fast R-CNN
  • How do we train a model on sequence data, i.e. data with a temporal property such as speech, weather forecasts or stock market prices?
    For modelling sequences we have a variation of neural network known as the RNN.
    In this type of model, the output of the hidden layer is fed back into itself.
    Depending on the requirement, we have various architectures.
    First is the vanilla mode without recurrence, where we have a fixed-size input and a fixed-size output. Useful for image classification.
    Second: fixed-size input, sequence output. Useful for image caption generation.
    Third: sequence input, fixed-size output, for tasks like sentiment analysis.
    Fourth: sequence input, sequence output, for machine translation.
    Last: synced sequence input and output, for tasks like video labeling.
    RNNs are also used for semantic similarity, where two RNNs process the two inputs and their outputs are passed through a single layer that outputs a relatedness score.
  • These kinds of networks have been used for image upsampling, image completion, and image generation from text.
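The forward pass / backward pass loop described in the notes can be sketched end to end on the XOR function, which the notes mention a single-layer perceptron cannot learn. This is a minimal NumPy sketch, not the code from the talk's walkthrough: one hidden layer of 16 tanh units, a sigmoid output with cross-entropy loss, and full-batch gradient descent; the sizes, learning rate and epoch count are assumptions.

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)   # XOR labels

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(2, 16)), np.zeros(16)
W2, b2 = rng.normal(size=(16, 1)), np.zeros(1)
lr, losses = 0.5, []

for epoch in range(5000):
    # Forward pass: linear -> tanh -> linear -> sigmoid
    h = np.tanh(X @ W1 + b1)
    p = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))
    loss = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))   # cross-entropy
    losses.append(loss)

    # Backward pass: chain rule from the loss back to every weight
    dz2 = (p - y) / len(X)              # dLoss / d(output pre-activation)
    dW2, db2 = h.T @ dz2, dz2.sum(axis=0)
    dz1 = (dz2 @ W2.T) * (1 - h ** 2)   # tanh'(a) = 1 - tanh(a)**2
    dW1, db1 = X.T @ dz1, dz1.sum(axis=0)

    # Gradient descent update
    W1 -= lr * dW1
    b1 -= lr * db1
    W2 -= lr * dW2
    b2 -= lr * db2

pred = (p > 0.5).astype(int).ravel()
print(pred, round(losses[0], 3), round(losses[-1], 4))
```

The hidden layer with its non-linearity is what lets the network separate the XOR classes; removing it leaves a model that is linear before the sigmoid and cannot fit XOR.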
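The convolution and pooling steps from the CNN notes can be sketched directly. Like most deep learning libraries, this sketch computes cross-correlation (the filter is not flipped), with no padding and stride 1, followed by non-overlapping 2×2 max pooling; these choices, the helper names and the example filter are assumptions for illustration.

```python
import numpy as np

def conv2d(image, kernel):
    """'Valid' 2-D cross-correlation: slide the filter over every position."""
    kh, kw = kernel.shape
    out_h, out_w = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            # Multiply the filter element-wise with the patch under it, then sum
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(fmap, size=2):
    """Non-overlapping max pooling; rows/cols that don't fill a window are dropped."""
    h, w = fmap.shape[0] // size, fmap.shape[1] // size
    return fmap[:h * size, :w * size].reshape(h, size, w, size).max(axis=(1, 3))

image = np.arange(16, dtype=float).reshape(4, 4)
edge = np.array([[1.0, -1.0]])     # hypothetical horizontal-difference filter
print(conv2d(image, edge))         # a 4x3 map of -1s: horizontal neighbours differ by 1
print(max_pool(image))             # the 2x2 array of window maxima 5, 7, 13, 15
```

The filter here has only two weights no matter how large the image is, which is the parameter saving the notes describe.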
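The feedback loop in the RNN notes, where the hidden state is fed back into the next time step, can be sketched as a vanilla (Elman-style) recurrent cell, h_t = tanh(W_xh x_t + W_hh h_{t-1} + b). The sizes and random weights are assumptions; the sketch runs only the forward pass over one sequence and does not cover LSTMs or training.

```python
import numpy as np

def rnn_forward(xs, W_xh, W_hh, b, h0=None):
    """Run a vanilla RNN over a sequence xs of shape (T, input_dim)."""
    h = np.zeros(W_hh.shape[0]) if h0 is None else h0
    states = []
    for x_t in xs:
        # The previous hidden state h is fed back in at every step
        h = np.tanh(W_xh @ x_t + W_hh @ h + b)
        states.append(h)
    return np.stack(states)            # (T, hidden_dim): one state per time step

rng = np.random.default_rng(0)
input_dim, hidden_dim, T = 3, 5, 6
W_xh = rng.normal(scale=0.5, size=(hidden_dim, input_dim))
W_hh = rng.normal(scale=0.5, size=(hidden_dim, hidden_dim))
b = np.zeros(hidden_dim)

xs = rng.normal(size=(T, input_dim))
states = rnn_forward(xs, W_xh, W_hh, b)
print(states.shape)                    # (6, 5)

# Because of the feedback, changing only the first input changes the last state
xs2 = xs.copy()
xs2[0] += 1.0
print(np.allclose(states[-1], rnn_forward(xs2, W_xh, W_hh, b)[-1]))
```

For sentiment analysis (sequence in, fixed-size out) only `states[-1]` would feed the output layer; for synced sequence labeling every row of `states` would.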