NEURAL NETS FOR VISION
  CVPR 2012 Tutorial on Deep Learning
                    Part III


Marc'Aurelio Ranzato
ranzato@google.com
www.cs.toronto.edu/~ranzato
Building an Object Recognition System

[Figure: input image -> FEATURE EXTRACTOR -> CLASSIFIER -> "CAR"]
IDEA: Use data to optimize features for the given task.




Building an Object Recognition System

[Figure: input image -> f(X; θ) -> CLASSIFIER -> "CAR"]

What we want: use a parameterized function f(X; θ) such that
                a) features are computed efficiently
                b) features can be trained efficiently

Building an Object Recognition System

[Figure: input image -> END-TO-END RECOGNITION SYSTEM -> "CAR"]



– Everything becomes adaptive.
– No distinction between feature extractor and classifier.
– Big non-linear system trained from raw pixels to labels.


Building an Object Recognition System

[Figure: input image -> END-TO-END RECOGNITION SYSTEM -> "CAR"]



Q: How can we build such a highly non-linear system?
A: By combining simple building blocks we can make more and more
   complex systems.

Building A Complicated Function

Simple Functions: sin(x), cos(x), log(x), x^3, exp(x)

One Example of a Complicated Function:
log(cos(exp(sin^3(x))))
Building A Complicated Function

Simple Functions: sin(x), cos(x), log(x), x^3, exp(x)

One Example of a Complicated Function:
log(cos(exp(sin^3(x))))

– Function composition is at the core of deep learning methods.
– Each "simple function" will have parameters subject to training.
Implementing A Complicated Function
Complicated Function: log(cos(exp(sin^3(x))))

[Figure: implementation as a chain of modules: sin(x) -> (·)^3 -> exp(x) -> cos(x) -> log(x)]
Intuition Behind Deep Neural Nets


[Figure: cascade of black boxes mapping an input image to the label "CAR"]
Intuition Behind Deep Neural Nets


[Figure: cascade of black boxes mapping an input image to the label "CAR"]
NOTE: Each black box can have trainable parameters.
       Their composition makes a highly non-linear system.

Intuition Behind Deep Neural Nets
[Figure: intermediate representations/features produced between the input image and "CAR"]
NOTE: System produces a hierarchy of features.

Intuition Behind Deep Neural Nets


[Figure: learned feature hierarchy, from low-level to high-level features]

Lee et al. "Convolutional DBN's for scalable unsup. learning..." ICML 2009
KEY IDEAS OF NEURAL NETS
                    IDEA # 1
              Learn features from data

                    IDEA # 2
      Use differentiable functions that produce
                 features efficiently

                    IDEA # 3
                  End-to-end learning:
no distinction between feature extractor and classifier

                    IDEA # 4
                “Deep” architectures:
         cascade of simpler non-linear modules
KEY QUESTIONS



- What is the input-output mapping?

- How are parameters trained?

- How computationally expensive is it?

- How well does it work?




Outline
- Neural Networks for Supervised Training
  - Architecture
  - Loss function
- Neural Networks for Vision: Convolutional & Tiled
- Unsupervised Training of Neural Networks
- Extensions:
  - semi-supervised / multi-task / multi-modal
- Comparison to Other Methods
  - boosting & cascade methods
  - probabilistic models
- Large-Scale Learning with Deep Neural Nets
Outline
- Neural Networks for Supervised Training
  - Architecture
  - Loss function
- Neural Networks for Vision: Convolutional & Tiled
- Unsupervised Training of Neural Networks
- Extensions:
  - semi-supervised / multi-task / multi-modal
- Comparison to Other Methods
  - boosting & cascade methods
  - probabilistic models
- Large-Scale Learning with Deep Neural Nets
Linear Classifier: SVM
Input: X ∈ R^D
Binary label: y
Parameters: W ∈ R^D
Output prediction: W^T X
Loss: L = 1/2 ||W||^2 + max[0, 1 - W^T X y]

[Figure: hinge loss L as a function of W^T X y, with the hinge at 1]
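A minimal MATLAB sketch of this loss and one SGD step (variable names and sizes are illustrative, not from the tutorial):

% Hinge loss and subgradient for a linear SVM on a single example.
D = 10; lr = 0.1;
X = randn(D, 1);                       % input X in R^D
y = 1;                                 % binary label in {-1, +1}
W = zeros(D, 1);                       % parameters W in R^D

margin = 1 - (W' * X) * y;
L = 0.5 * (W' * W) + max(0, margin);   % the loss above
if margin > 0
  grad = W - y * X;                    % subgradient when the margin is violated
else
  grad = W;                            % only the regularizer contributes
end
W = W - lr * grad;                     % one SGD step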
Linear Classifier: Logistic Regression
Input: X ∈ R^D
Binary label: y
Parameters: W ∈ R^D
Output prediction: W^T X
Loss: L = 1/2 ||W||^2 + log(1 + exp(-W^T X y))

[Figure: log loss L as a function of W^T X y]
Logistic Regression: Probabilistic Interpretation
Input: X ∈ R^D
Binary label: y
Parameters: W ∈ R^D
Output prediction: p(y=1 | X) = 1 / (1 + e^(-W^T X))

[Figure: logistic function of W^T X]

Loss: L = -log p(y | X)

Q: What is the gradient of L w.r.t. W?
Logistic Regression: Probabilistic Interpretation
Input: X ∈ R^D
Binary label: y
Parameters: W ∈ R^D
Output prediction: p(y=1 | X) = 1 / (1 + e^(-W^T X))

Loss: L = log(1 + exp(-W^T X y))

Q: What is the gradient of L w.r.t. W?
Simple Functions


sin(x), cos(x), log(x), x^3, exp(x)

Complicated Function:
-log( 1 / (1 + e^(-W^T X)) )
Logistic Regression: Computing Loss
Complicated Function: -log( 1 / (1 + e^(-W^T X)) )

[Figure: computation graph]
W^T X -> u -> 1/(1+e^(-u)) -> p -> -log(p) -> L
Chain Rule

[Figure: a module with input x and output y; dL/dx flows backward given dL/dy]

Given y(x) and dL/dy, what is dL/dx?

dL/dx = dL/dy · dy/dx

All needed information is local!
Logistic Regression: Computing Gradients

[Figure: computation graph]
X -> W^T X -> u -> 1/(1+e^(-u)) -> p -> -log(p) -> L

Local gradients: dL/dp = -1/p,   dp/du = p(1-p),   du/dW = X

dL/dW = dL/dp · dp/du · du/dW = (p - 1) X
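The same computation in a short MATLAB sketch, with a finite-difference check of the gradient (sizes are made up):

% Logistic loss and its gradient for one positive example (y = 1).
D = 5;
X = randn(D, 1); W = randn(D, 1);

p = 1 / (1 + exp(-W' * X));              % p(y=1 | X)
L = -log(p);                             % loss
grad = (p - 1) * X;                      % dL/dW via the chain rule above

% Check the first coordinate numerically:
eps = 1e-6; Wp = W; Wp(1) = Wp(1) + eps;
Lp = -log(1 / (1 + exp(-Wp' * X)));
assert(abs((Lp - L) / eps - grad(1)) < 1e-4);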
What Did We Learn?

- Logistic Regression
- How to compute gradients of complicated functions




[Figure: logistic regression as a single trainable module]
Neural Network




[Figure: two logistic regression modules stacked in sequence]
Neural Network


– A neural net can be thought of as a stack of logistic regression
  classifiers. Each input is the output of the previous layer.




[Figure: a chain of logistic regression modules]

NOTE: intermediate units can be thought of as linear
      classifiers trained with implicit target values.
Key Computations: F-Prop / B-Prop


F-PROP: the module, with parameters θ, maps input X to output Z.

[Figure: X -> module(θ) -> Z]
Key Computations: F-Prop / B-Prop


B-PROP: given ∂L/∂Z and the local Jacobians {∂Z/∂X, ∂Z/∂θ},
compute ∂L/∂X and ∂L/∂θ.
Neural Net: Training
A) Compute loss on small mini-batch

[Figure: F-PROP through Layer 1 -> Layer 2 -> Layer 3]
Neural Net: Training
A) Compute loss on small mini-batch
B) Compute gradient w.r.t. parameters

[Figure: B-PROP through Layer 3 -> Layer 2 -> Layer 1]
Neural Net: Training
A) Compute loss on small mini-batch
B) Compute gradient w.r.t. parameters
C) Use gradient to update parameters:  θ ← θ - η dL/dθ
NEURAL NET: ARCHITECTURE



[Figure: layer mapping h_j to h_{j+1}]

h_{j+1} = σ(W_{j+1}^T h_j + b_{j+1})

W_j ∈ R^(M×N),  b_j ∈ R^N
h_j ∈ R^M,  h_{j+1} ∈ R^N
NEURAL NET: ARCHITECTURE



h_{j+1} = σ(W_{j+1}^T h_j + b_{j+1})

σ(x) = 1 / (1 + e^(-x))

[Figure: logistic sigmoid σ(x)]
NEURAL NET: ARCHITECTURE



h_{j+1} = σ(W_{j+1}^T h_j + b_{j+1})

σ(x) = tanh(x)

[Figure: hyperbolic tangent σ(x)]
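One layer's forward pass, as a MATLAB sketch (dimensions are arbitrary):

M = 8; N = 4;                       % sizes of h_j and h_{j+1}
W = randn(M, N); b = zeros(N, 1);   % parameters of one layer
h = rand(M, 1);                     % input activations h_j

a = W' * h + b;                     % linear part W^T h_j + b
h_next_sig = 1 ./ (1 + exp(-a));    % sigma(x) = 1/(1+e^(-x))
h_next_tanh = tanh(a);              % sigma(x) = tanh(x)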
Graphical Notations

                      f  X ;W 


                 is equivalent to




           X                           h
                          W
h k is called feature, hidden unit, neuron or code unit        47

                                                          Ranzato
MOST COMMON ARCHITECTURE

[Figure: X -> layer -> layer -> layer -> ŷ; Error computed between prediction ŷ and target y]

NOTE: Multi-layer neural nets with more than two layers are
      nowadays called deep nets!!

NOTE: User must specify the number of layers, the number of hidden
      units, the type of layers, and the loss function.
MOST COMMON LOSSES

[Figure: X -> network -> ŷ; Error computed between ŷ and target y]

Squared Euclidean Distance (regression):
y, ŷ ∈ R^N
L = 1/2 Σ_{i=1..N} (ŷ_i - y_i)^2
MOST COMMON LOSSES

[Figure: X -> network -> ŷ; Error computed between ŷ and target y]

Cross Entropy (classification):
y, ŷ ∈ [0,1]^N,   Σ_{i=1..N} y_i = 1,   Σ_{i=1..N} ŷ_i = 1
L = - Σ_{i=1..N} y_i log(ŷ_i)
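Both losses on a toy prediction/target pair, as a MATLAB sketch:

y_hat = [0.7; 0.2; 0.1];            % prediction (sums to 1)
y     = [1.0; 0.0; 0.0];            % target

L_sq = 0.5 * sum((y_hat - y).^2);   % squared Euclidean distance = 0.07
L_ce = -sum(y .* log(y_hat));       % cross entropy = -log(0.7)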
NEURAL NETS FACTS


1:   User specifies loss based on the task.

2:   Any optimization algorithm can be chosen for training.

3:   Cost of F-Prop and B-Prop is similar and proportional to the
     number of layers and their size.




Toy Code: Neural Net Trainer
% F-PROP (h{1} holds the mini-batch input; layer i maps h{i} to h{i+1})
for i = 1 : nr_layers - 2
 [h{i+1}, jac{i+1}] = logistic(W{i} * h{i} + b{i});
end
h{nr_layers} = W{nr_layers-1} * h{nr_layers-1} + b{nr_layers-1};
prediction = softmax(h{nr_layers});

% CROSS ENTROPY LOSS
loss = - sum(sum(log(prediction) .* target));

% B-PROP
dh{nr_layers} = prediction - target;
for i = nr_layers - 1 : -1 : 1
 Wgrad{i} = dh{i+1} * h{i}';
 bgrad{i} = sum(dh{i+1}, 2);
 if i > 1
   dh{i} = (W{i}' * dh{i+1}) .* jac{i};  % propagate through the non-linearity
 end
end

% UPDATE
for i = 1 : nr_layers - 1
 W{i} = W{i} - (lr / batch_size) * Wgrad{i};
 b{i} = b{i} - (lr / batch_size) * bgrad{i};
end
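The snippet leaves the helpers logistic and softmax unspecified; plausible definitions (a sketch, not the original helper code; each would live in its own MATLAB function file):

function [h, jac] = logistic(a)
% Element-wise logistic non-linearity and its derivative (local Jacobian).
h = 1 ./ (1 + exp(-a));
jac = h .* (1 - h);
end

function p = softmax(a)
% Column-wise softmax, shifted by the column max for numerical stability.
e = exp(bsxfun(@minus, a, max(a, [], 1)));
p = bsxfun(@rdivide, e, sum(e, 1));
end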
TOY EXAMPLE: SYNTHETIC DATA
  1 input & 1 output
  100 hidden units in each layer




TOY EXAMPLE: SYNTHETIC DATA
                   1 input & 1 output
                   3 hidden layers




TOY EXAMPLE: SYNTHETIC DATA
1 input & 1 output
3 hidden layers, 1000 hiddens
Regression of cosine




Outline
- Neural Networks for Supervised Training
  - Architecture
  - Loss function
- Neural Networks for Vision: Convolutional & Tiled
- Unsupervised Training of Neural Networks
- Extensions:
  - semi-supervised / multi-task / multi-modal
- Comparison to Other Methods
  - boosting & cascade methods
  - probabilistic models
- Large-Scale Learning with Deep Neural Nets
FULLY CONNECTED NEURAL NET
Example: 1000x1000 image
         1M hidden units
         → 10^12 parameters!!!

- Spatial correlation is local
- Better to put resources elsewhere!
LOCALLY CONNECTED NEURAL NET




Example: 1000x1000 image
         1M hidden units
         Filter size: 10x10
         → 100M parameters
LOCALLY CONNECTED NEURAL NET
STATIONARITY? Statistics are similar at different locations.

Example: 1000x1000 image
         1M hidden units
         Filter size: 10x10
         → 100M parameters
CONVOLUTIONAL NET




      Share the same parameters across
      different locations:
       Convolutions with learned kernels




CONVOLUTIONAL NET

Learn multiple filters.

E.g.: 1000x1000 image
      100 filters
      Filter size: 10x10
      → 10K parameters
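The parameter count can be checked with a MATLAB sketch (random kernels stand in for learned ones):

img = rand(1000, 1000);             % 1000x1000 input image
nf = 100; fs = 10;                  % 100 filters of size 10x10
K = randn(fs, fs, nf);              % kernels (learned in practice)

maps = zeros(1000-fs+1, 1000-fs+1, nf);
for f = 1 : nf
  maps(:,:,f) = conv2(img, K(:,:,f), 'valid');  % one feature map per filter
end
% nf * fs * fs = 10,000 parameters, independent of the image size.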
NEURAL NETS FOR VISION

A standard neural net applied to images:
- scales quadratically with the size of the input
- does not leverage stationarity


Solution:
- connect each hidden unit to a small patch of the input
- share the weights across hidden units
This is called: convolutional network.



LeCun et al. “Gradient-based learning applied to document recognition” IEEE 1998
CONVOLUTIONAL NET

Let us assume the filter is an “eye” detector.

    Q.: how can we make the detection robust
    to the exact location of the eye?




CONVOLUTIONAL NET

    By “pooling” (e.g., max or average) filter
    responses at different locations we gain
    robustness to the exact spatial location
    of features.




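A sketch of non-overlapping 2x2 max pooling in MATLAB (sizes are illustrative):

map = rand(8, 8);                   % one feature map
pooled = zeros(4, 4);
for i = 1 : 4
  for j = 1 : 4
    block = map(2*i-1 : 2*i, 2*j-1 : 2*j);  % 2x2 window
    pooled(i, j) = max(block(:));   % use mean(block(:)) for average pooling
  end
end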
CONV NETS: EXTENSIONS
 Over the years, some new modules have proven to be very
 effective when plugged into conv-nets:

- L2 Pooling:
      h_{i+1,x,y} = sqrt( Σ_{(j,k) ∈ N(x,y)} h_{i,j,k}^2 )

- Local Contrast Normalization:
      h_{i+1,x,y} = ( h_{i,x,y} - m_{i,N(x,y)} ) / σ_{i,N(x,y)}

Jarrett et al. “What is the best multi-stage architecture for object recognition?” ICCV 2009
CONV NETS: L2 POOLING


[Figure: L2 pooling maps the five inputs (-1, 0, 0, 0, 0) through
sqrt(Σ_{i=1..5} (·)_i^2) to the output +1]

Kavukcuoglu et al. “Learning invariant features ...” CVPR 2009
CONV NETS: L2 POOLING


[Figure: L2 pooling maps the five inputs (0, 0, 0, 0, +1) through
sqrt(Σ_{i=1..5} (·)_i^2) to the same output +1: the response does not
depend on which unit is active]

Kavukcuoglu et al. “Learning invariant features ...” CVPR 2009
LOCAL CONTRAST NORMALIZATION
h_{i+1,x,y} = ( h_{i,x,y} - m_{i,N(x,y)} ) / σ_{i,N(x,y)}

[Figure: LCN maps the patch (0, 1, 0, -1, 0) to (0, 0.5, 0, -0.5, 0)]
LOCAL CONTRAST NORMALIZATION
h_{i+1,x,y} = ( h_{i,x,y} - m_{i,N(x,y)} ) / σ_{i,N(x,y)}

[Figure: LCN maps the patch (1, 11, 1, -9, 1) to the same output
(0, 0.5, 0, -0.5, 0): the local mean and scale are removed]
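Both operations over 5x5 neighborhoods can be sketched in MATLAB with box filters (the local variance below is an approximation, since the mean varies inside the window):

h = rand(50, 50);                   % one feature map
box = ones(5, 5);                   % the neighborhood N(x,y)

l2pool = sqrt(conv2(h.^2, box, 'same'));      % sqrt of local sum of squares

m = conv2(h, box/25, 'same');                 % local mean m_{i,N(x,y)}
v = conv2((h - m).^2, box/25, 'same');        % approximate local variance
lcn = (h - m) ./ max(sqrt(v), 1e-4);          % guard against divide-by-zero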
CONV NETS: EXTENSIONS




L2 Pooling & Local Contrast Normalization
help in learning more invariant representations!




CONV NETS: TYPICAL ARCHITECTURE
One stage (zoom): Filtering -> Pooling -> LCN

Whole system:
Input Image -> 1st stage -> 2nd stage -> 3rd stage -> Linear Layer -> Class Labels
CONV NETS: TRAINING

Since convolutions and sub-sampling are differentiable, we can use
standard back-propagation.

Algorithm:
  Given a small mini-batch
  - FPROP
  - BPROP
  - PARAMETER UPDATE




CONV NETS: EXAMPLES
- Object category recognition
  Boureau et al. “Ask the locals: multi-way local pooling for image
  recognition” ICCV 2011
- Segmentation
  Turaga et al. “Maximin learning of image segmentation” NIPS 2009
- OCR
  Ciresan et al. “MCDNN for Image Classification” CVPR 2012
- Pedestrian detection
  Kavukcuoglu et al. “Learning convolutional feature hierarchies for
  visual recognition” NIPS 2010
- Robotics
  Sermanet et al. “Mapping and planning..with long range perception”
   IROS 2008
LIMITATIONS & SOLUTIONS

- requires lots of labeled data to train
+ unsupervised learning



- difficult optimization
+ layer-wise training



- scalability
+ distributed training
Outline
- Neural Networks for Supervised Training
  - Architecture
  - Loss function
- Neural Networks for Vision: Convolutional & Tiled
- Unsupervised Training of Neural Networks
- Extensions:
  - semi-supervised / multi-task / multi-modal
- Comparison to Other Methods
  - boosting & cascade methods
  - probabilistic models
- Large-Scale Learning with Deep Neural Nets
BACK TO LOGISTIC REGRESSION



[Figure: input -> model -> prediction; Error computed against the target]
Unsupervised Learning



[Figure: input -> model -> prediction; Error computed against an unknown target "?"]
Unsupervised Learning

Q: How should we train the input-output mapping
    if we do not have target values?

[Figure: input -> encoder -> code]

A: Code has to retain information from the input
   but only if this is similar to training samples.

  By better representing only those inputs that are similar to
  training samples we hope to extract interesting structure
  (e.g., structure of manifold where data live).

Unsupervised Learning

Q: How to constrain the model to represent training
   samples better than other data points?




Unsupervised Learning
– reconstruct the input from the code & make the code compact
  (auto-encoder with bottleneck)

– reconstruct the input from the code & make the code sparse
  (sparse auto-encoders)
  see work in LeCun, Ng, Fergus, Lee, Yu's labs

– add noise to the input or code (denoising auto-encoders)
  see work in Y. Bengio, Lee's labs

– make sure that the model defines a distribution that normalizes
  to 1 (RBM)
  see work in Y. Bengio, Hinton, Lee, Salakhutdinov's labs
AUTO-ENCODERS NEURAL NETS



[Figure: input -> encoder -> code -> decoder -> reconstruction;
Error computed between reconstruction and input]

– input higher dimensional than code
– error: ||reconstruction - input||^2
– training: back-propagation
SPARSE AUTO-ENCODERS

[Figure: input -> encoder -> code -> decoder -> reconstruction;
sparsity penalty on the code, Error on the reconstruction]

– sparsity penalty: ||code||_1
– error: ||reconstruction - input||^2
– loss: sum of squared reconstruction error and sparsity penalty
– training: back-propagation
SPARSE AUTO-ENCODERS

[Figure: input -> encoder -> code -> decoder -> reconstruction;
sparsity penalty on the code, Error on the reconstruction]

– input: X,  code: h = W^T X
– loss: L(X; W) = ||W h - X||^2 + λ Σ_j |h_j|

Le et al. “ICA with reconstruction cost..” NIPS 2011
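The loss is easy to evaluate; a MATLAB sketch with tied weights (sizes and the value of lambda are made up):

D = 100; K = 200; lambda = 0.1;
X = randn(D, 1);                    % one input
W = 0.01 * randn(D, K);             % weights shared by encoder and decoder

h = W' * X;                         % code
Xr = W * h;                         % reconstruction
L = sum((Xr - X).^2) + lambda * sum(abs(h));  % ||Wh - X||^2 + lambda sum|h_j|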
How To Use Unsupervised Learning
1) Given unlabeled data, learn features




How To Use Unsupervised Learning
1) Given unlabeled data, learn features
2) Use the encoder to produce features and train another layer on top




How To Use Unsupervised Learning
1) Given unlabeled data, learn features
2) Use the encoder to produce features and train another layer on top




     Layer-wise training of a feature hierarchy
How To Use Unsupervised Learning
1) Given unlabeled data, learn features
2) Use the encoder to produce features and train another layer on top
3) feed features to classifier & train just the classifier



[Figure: input -> learned features -> classifier -> prediction, trained against the label]

Reduced overfitting since features are learned in an unsupervised way!
How To Use Unsupervised Learning
1) Given unlabeled data, learn features
2) Use the encoder to produce features and train another layer on top
3) feed features to classifier & jointly train the whole system



[Figure: input -> features -> classifier -> prediction, all trained jointly against the label]

Given enough data, this usually yields the best results: end-to-end learning!
Outline
- Neural Networks for Supervised Training
  - Architecture
  - Loss function
- Neural Networks for Vision: Convolutional & Tiled
- Unsupervised Training of Neural Networks
- Extensions:
  - semi-supervised / multi-task / multi-modal
- Comparison to Other Methods
  - boosting & cascade methods
  - probabilistic models
- Large-Scale Learning with Deep Neural Nets
Semi-Supervised Learning
[Figure: a handful of labeled images: truck, airplane, deer, frog, bird]
Semi-Supervised Learning
[Figure: the same few labeled images: truck, airplane, deer, frog, bird]

LOTS & LOTS OF UNLABELED DATA!!!
Semi-Supervised Learning


   Loss = supervised_error + unsupervised_error




[Figure: one network trained with a supervised error on labeled data
and an unsupervised error on unlabeled data]

Weston et al. “Deep learning via semi-supervised embedding” ICML 2008
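A toy MATLAB sketch of such a combined loss (the reconstruction term below is a stand-in for the embedding criterion used by Weston et al.; weights, data, and alpha are made up):

W = randn(10, 3); alpha = 0.5;         % shared weights, mixing coefficient

X_lab = randn(10, 1); y = [1; 0; 0];   % one labeled example
X_unl = randn(10, 1);                  % one unlabeled example

p = exp(W' * X_lab); p = p / sum(p);   % softmax prediction
L_sup = -sum(y .* log(p));             % supervised_error (cross entropy)

h = W' * X_unl;                        % code for the unlabeled input
L_unsup = sum((W * h - X_unl).^2);     % unsupervised_error (reconstruction)
L = L_sup + alpha * L_unsup;           % the combined loss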
Multi-Task Learning


                        Face detection is hard because
                        of lighting, pose, but also
                        occluding goggles.



Face detection could be made easier
by face identification.



The identification task may help the detection task.
Multi-Task Learning
 - Easy to add many error terms to loss function.
 - Joint learning of related tasks yields better representations.




 Example of architecture:




[Figure: multi-task architecture with shared lower layers]

Collobert et al. “NLP (almost) from scratch” JMLR 2011
Multi-Modal Learning




Audio and video streams are
often complementary to each
other.

        E.g., audio can provide important
        clues to improve visual recognition,
        and vice versa.



Multi-Modal Learning
  - Weak assumptions on input distribution
  - Fully adaptive to data



Example of architecture:

[Figure: modality #1, #2, #3 each feed their own lower layers, which merge into shared upper layers]

Ngiam et al. “Multi-modal deep learning” ICML 2011
Outline
- Neural Networks for Supervised Training
  - Architecture
  - Loss function
- Neural Networks for Vision: Convolutional & Tiled
- Unsupervised Training of Neural Networks
- Extensions:
  - semi-supervised / multi-task / multi-modal
- Comparison to Other Methods
  - boosting & cascade methods
  - probabilistic models
- Large-Scale Learning with Deep Neural Nets
Boosting & Forests

Deep Nets:
- single highly non-linear system
- “deep” stack of simpler modules
- all parameters are subject to learning



Boosting & Forests:
- sequence of “weak” (simple) classifiers that are linearly combined
to produce a powerful classifier
- subsequent classifiers do not exploit representations of earlier
classifiers, it's a “shallow” linear mixture
- typically features are not learned
Properties              Deep Nets   Boosting

Adaptive features

Hierarchical features

End-to-end learning

Leverage unlab. data

Easy to parallelize

Fast training

Fast at test time
Deep Neural-Nets VS Probabilistic Models
Deep Neural Nets:
- mean-field approximations of intractable probabilistic models
- usually more efficient
- typically more unconstrained (partition function has to be
  replaced by other constraints, e.g. sparsity).




Hierarchical Probabilistic Models (DBN, DBM, etc.):
- in the most interesting cases, they are intractable
- they better deal with uncertainty
- they can be easily combined
Example: Auto-Encoder
Neural Net:
  code:            Z = σ(W_e^T X + b_e)
  reconstruction:  X̂ = W_d Z + b_d

Probabilistic Model (Gaussian RBM):
  E[Z | X] = σ(W^T X + b_e)
  E[X | Z] = W Z + b_d
Properties              Deep Nets   Probab. Models

Adaptive features

Hierarchical features

End-to-end learning

Leverage unlab. data

Models uncertainty

Fast training

Fast at test time
Outline
- Neural Networks for Supervised Training
  - Architecture
  - Loss function
- Neural Networks for Vision: Convolutional & Tiled
- Unsupervised Training of Neural Networks
- Extensions:
  - semi-supervised / multi-task / multi-modal
- Comparison to Other Methods
  - boosting & cascade methods
  - probabilistic models
- Large-Scale Learning with Deep Neural Nets
Tera-Scale Deep Learning @ Google

Observation #1: more features always improve
                   performance unless data is scarce.

Observation #2: deep learning methods have higher capacity
                    and have the potential to model data better.


Q #1: Given lots of data and lots of machines, can we scale up
        deep learning methods?

Q #2: Will deep learning methods perform much better?


The Challenge
              A Large Scale problem has:
              – lots of training samples (>10M)
              – lots of classes (>10K) and
              – lots of input dimensions (>10K).


– best optimizer in practice is on-line SGD which is naturally
 sequential, hard to parallelize.
– layers cannot be trained independently and in parallel, hard to
 distribute
– model can have lots of parameters that may clog the network,
 hard to distribute across machines

Our Solution



[Figure: deep network with local receptive fields: input, 1st layer, 2nd layer]

Le et al. “Building high-level features using large-scale unsupervised learning” ICML 2012
Our Solution

[Figure: the same network partitioned across a 1st, 2nd, and 3rd machine: input, 1st layer, 2nd layer]
Our Solution

[Figure: network partitioned across a 1st, 2nd, and 3rd machine]

MODEL PARALLELISM
Distributed Deep Nets



[Figure: one deep net replica processing input #1]

MODEL PARALLELISM

Le et al. “Building high-level features using large-scale unsupervised learning” ICML 2012
Distributed Deep Nets



[Figure: several replicas processing inputs #1, #2, #3 in parallel]

MODEL PARALLELISM + DATA PARALLELISM

Le et al. “Building high-level features using large-scale unsupervised learning” ICML 2012
Asynchronous SGD

[Figure: a PARAMETER SERVER coordinating a 1st, 2nd, and 3rd model replica]

Each replica repeatedly: fetches the current parameters θ from the
server, computes a gradient ∂L/∂θ on its own mini-batch, and sends it
back; the server updates the parameters as gradients arrive, without
waiting for the other replicas.
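The server-side update can be sketched serially in MATLAB (the gradient is a stand-in; real replicas would compute dL/dtheta on their own mini-batch):

theta = zeros(10, 1); lr = 0.01;    % shared parameters on the server
batches = {randn(10,32), randn(10,32), randn(10,32)};  % one per replica

for r = randperm(3)                 % replicas report in arbitrary order
  grad = mean(batches{r}, 2);       % stand-in for dL/dtheta on batch r
  theta = theta - lr * grad;        % update without a synchronization barrier
end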
Unsupervised Learning With 1B Parameters

DATA: 10M YouTube (unlabeled) frames of size 200x200.




Le et al. “Building high-level features using large-scale unsupervised learning” ICML 2012
Unsupervised Learning With 1B Parameters

     Deep Net:
     – 3 stages
     – each stage consists of local filtering, L2 pooling, LCN
        - 18x18 filters
        - 8 filters at each location
        - L2 pooling and LCN over 5x5 neighborhoods
     – training jointly the three layers by:
        - reconstructing the input of each layer
        - sparsity on the code
                                    1B parameters!!!
Le et al. “Building high-level features using large-scale unsupervised learning” ICML 2012
Validating Unsupervised Learning


     The network has seen lots of objects during training, but
     without any label.

     Q.: how can we validate unsupervised learning?
     Q.: Did the network form any high-level representation?
            E.g., does it have any neuron responding for faces?


     – build validation set with 50% faces, 50% random images
     - study properties of neurons



Le et al. “Building high-level features using large-scale unsupervised learning” ICML 2012
Validating Unsupervised Learning




[Figure: neuron responses at the 1st, 2nd, and 3rd stages]

Le et al. “Building high-level features using large-scale unsupervised learning” ICML 2012
Top Images For Best Face Neuron




[Figure: top validation images for the best face neuron]
Best Input For Face Neuron




[Figure: the optimal input stimulus for the face neuron]

Le et al. “Building high-level features using large-scale unsupervised learning” ICML 2012
Unsupervised + Supervised (ImageNet)



[Figure: Input Image -> 1st stage -> 2nd stage -> 3rd stage -> ŷ; COST compares ŷ with the label y]

Le et al. “Building high-level features using large-scale unsupervised learning” ICML 2012
Object Recognition on ImageNet


IMAGENET v.2011 (16M images, 20K categories)

METHOD                                    ACCURACY %
Weston & Bengio 2011                       9.3
Linear Classifier on deep features        13.1
Deep Net (from random)                    13.6
Deep Net (from unsup.)                    15.8

Le et al. “Building high-level features using large-scale unsupervised learning” ICML 2012
Top Inputs After Supervision




[Figure: top-scoring inputs for several class neurons after supervision]
Experiments: and many more...


- automatic speech recognition

- natural language processing

- biomed applications

- finance


Generic learning algorithm!!

References
Tutorials & Background Material
– Yoshua Bengio, Learning Deep Architectures for AI, Foundations and Trends in
Machine Learning, 2(1), pp.1-127, 2009.
– LeCun, Chopra, Hadsell, Ranzato, Huang: A Tutorial on Energy-Based Learning, in
Bakir, G. and Hofman, T. and Schölkopf, B. and Smola, A. and Taskar, B. (Eds),
Predicting Structured Data, MIT Press, 2006

Convolutional Nets
– LeCun, Bottou, Bengio and Haffner: Gradient-Based Learning Applied to
Document Recognition, Proceedings of the IEEE, 86(11):2278-2324, November
1998
– Jarrett, Kavukcuoglu, Ranzato, LeCun: What is the Best Multi-Stage
Architecture for Object Recognition?, Proc. International Conference on
Computer Vision (ICCV'09), IEEE, 2009
- Kavukcuoglu, Sermanet, Boureau, Gregor, Mathieu, LeCun: Learning Convolutional
Feature Hierarchies for Visual Recognition, Advances in Neural Information
Processing Systems (NIPS 2010), 23, 2010
Unsupervised Learning
– ICA with Reconstruction Cost for Efficient Overcomplete Feature Learning. Le,
Karpenko, Ngiam, Ng. In NIPS*2011
– Rifai, Vincent, Muller, Glorot, Bengio, Contracting Auto-Encoders: Explicit
invariance during feature extraction, in: Proceedings of the Twenty-eight
International Conference on Machine Learning (ICML'11), 2011
- Vincent, Larochelle, Lajoie, Bengio, Manzagol, Stacked Denoising Autoencoders:
Learning Useful Representations in a Deep Network with a Local Denoising
Criterion, Journal of Machine Learning Research, 11:3371--3408, 2010.
- Gregor, Szlam, LeCun: Structured Sparse Coding via Lateral Inhibition,
Advances in Neural Information Processing Systems (NIPS 2011), 24, 2011
- Kavukcuoglu, Ranzato, LeCun. "Fast Inference in Sparse Coding Algorithms with
Applications to Object Recognition". ArXiv 1010.3467 2008
- Hinton, Krizhevsky, Wang, Transforming Auto-encoders, ICANN, 2011

Multi-modal Learning
– Multimodal deep learning, Ngiam, Khosla, Kim, Nam, Lee, Ng. In Proceedings of
the Twenty-Eighth International Conference on Machine Learning, 2011.
Locally Connected Nets
– Gregor, LeCun “Emergence of complex-like cells in a temporal product network
with local receptive fields” Arxiv. 2009
– Ranzato, Mnih, Hinton “Generating more realistic images using gated MRF's”
NIPS 2010
– Le, Ngiam, Chen, Chia, Koh, Ng “Tiled convolutional neural networks” NIPS 2010

Distributed Learning
– Le, Ranzato, Monga, Devin, Corrado, Chen, Dean, Ng. "Building High-Level
Features Using Large Scale Unsupervised Learning". International Conference of
Machine Learning (ICML 2012), Edinburgh, 2012.
Papers on Scene Parsing
– Farabet, Couprie, Najman, LeCun, “Scene Parsing with Multiscale Feature
Learning, Purity Trees, and Optimal Covers”, in Proc. of the International
Conference on Machine Learning (ICML'12), Edinburgh, Scotland, 2012.
- Socher, Lin, Ng, Manning, “Parsing Natural Scenes and Natural Language with
Recursive Neural Networks”. International Conference of Machine Learning
(ICML 2011) 2011.
Papers on Object Recognition
- Boureau, Le Roux, Bach, Ponce, LeCun: Ask the locals: multi-way local pooling for
image recognition, Proc. International Conference on Computer Vision 2011
- Sermanet, LeCun: Traffic Sign Recognition with Multi-Scale Convolutional
Networks, Proceedings of International Joint Conference on Neural Networks
(IJCNN'11)
- Ciresan, Meier, Gambardella, Schmidhuber. Convolutional Neural Network
Committees For Handwritten Character Classification. 11th International
Conference on Document Analysis and Recognition (ICDAR 2011), Beijing, China.
- Ciresan, Meier, Masci, Gambardella, Schmidhuber. Flexible, High Performance
Convolutional Neural Networks for Image Classification. International Joint
Conference on Artificial Intelligence IJCAI-2011.
Papers on Action Recognition
– Learning hierarchical spatio-temporal features for action recognition with
independent subspace analysis, Le, Zou, Yeung, Ng. In Computer Vision and
Pattern Recognition (CVPR), 2011
Papers on Segmentation
– Turaga, Briggman, Helmstaedter, Denk, Seung: Maximin learning of image
segmentation. NIPS, 2009.
Papers on Vision for Robotics
– Hadsell, Sermanet, Scoffier, Erkan, Kavukcuoglu, Muller, LeCun: Learning Long-
Range Vision for Autonomous Off-Road Driving, Journal of Field Robotics,
26(2):120-144, February 2009.
Deep Convex Nets & Deconv-Nets
– Deng, Yu. “Deep Convex Network: A Scalable Architecture for Speech Pattern
Classification.” Interspeech, 2011.
- Zeiler, Taylor, Fergus "Adaptive Deconvolutional Networks for Mid and High
Level Feature Learning." ICCV. 2011
Papers on Biological Inspired Vision
– Serre, Wolf, Bileschi, Riesenhuber, Poggio. Robust Object Recognition with
Cortex-like Mechanisms, IEEE Transactions on Pattern Analysis and Machine
Intelligence, 29, 3, 411-426, 2007.
- Pinto, Doukhan, DiCarlo, Cox "A high-throughput screening approach to
discovering good forms of biologically inspired visual representation." {PLoS}
Computational Biology. 2009
Papers on Embedded ConvNets for Real-Time Vision Applications
– Farabet, Martini, Corda, Akselrod, Culurciello, LeCun: NeuFlow: A Runtime
Reconfigurable Dataflow Processor for Vision, Workshop on Embedded Computer
Vision, CVPR 2011,
Papers on Image Denoising Using Neural Nets
– Burger, Schuler, Harmeling: Image Denoising: Can Plain Neural Networks
Compete with BM3D?, Computer Vision and Pattern Recognition, CVPR 2012.




Software & Links
Deep Learning website
– http://deeplearning.net/

C++ code for ConvNets
– http://eblearn.sourceforge.net/

Matlab code for the R-ICA unsupervised algorithm
– http://ai.stanford.edu/~quocle/rica_release.zip

Python-based learning library
– http://deeplearning.net/software/theano/

Lush learning library, which includes ConvNets
– http://lush.sourceforge.net/

Torch7: learning library that supports neural net training
– http://www.torch.ch
                                                                            139

                                                                        Ranzato
Software & Links

Code used to generate the demo for this tutorial
– http://cs.nyu.edu/~fergus/tutorials/deep_learning_cvpr12/
                                                                            140

                                                                        Ranzato
Acknowledgements



Quoc Le, Andrew Ng

Jeff Dean, Kai Chen, Greg Corrado, Matthieu Devin,
Mark Mao, Rajat Monga, Paul Tucker, Samy Bengio

Yann LeCun, Pierre Sermanet, Clement Farabet




                                                141

                                            Ranzato
Visualizing Learned Features
Q: Can we interpret the learned features?




                                                                  142
Ranzato et al. “Sparse feature learning for DBNs” NIPS 2007   Ranzato
Visualizing Learned Features

reconstruction: W h = W_1 h_1 + W_2 h_2 + ...

[Figure: an input image ≈ its reconstruction = 0.9 × (basis 1) + 0.7 × (basis 2)
+ 0.5 × (basis 3) + 1.0 × (basis 4) + ...]

Columns of W show what each code unit represents.
                                                                            143
Ranzato et al. “Sparse feature learning for DBNs” NIPS 2007             Ranzato
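A minimal NumPy sketch of this computation (illustrative sizes and random
stand-in weights, not the paper's released code): the reconstruction is the
h-weighted sum of the columns of the decoder matrix W, so reshaping a single
column shows, in image space, what that code unit represents.

import numpy as np

# Hypothetical sizes: 28x28 image patches, 100 first-layer code units.
D, K = 28 * 28, 100
W = np.random.randn(D, K)   # stand-in for a learned decoder matrix
h = np.zeros(K)             # a sparse code: only a few units are active
h[[3, 17, 42, 80]] = [0.9, 0.7, 0.5, 1.0]

# Reconstruction as a weighted sum of the columns of W:
#   x_hat = W h = h_3 * W[:, 3] + h_17 * W[:, 17] + ...
x_hat = W @ h

# What code unit k represents: reshape column k of W into an image patch.
def basis_image(W, k, shape=(28, 28)):
    return W[:, k].reshape(shape)

# The slide's figure sums exactly these weighted pieces over the active units.
parts = [(h[k], basis_image(W, k)) for k in np.flatnonzero(h)]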
Visualizing Learned Features
 1st layer features




                                                                  144
Ranzato et al. “Sparse feature learning for DBNs” NIPS 2007   Ranzato
Visualizing Learned Features
Q: How about the second-layer features?

A: Similarly, each second-layer code unit can be visualized by
   taking its bases and projecting them into image space
   through the first-layer decoder.




                                                                  145
Ranzato et al. “Sparse feature learning for DBNs” NIPS 2007   Ranzato
Visualizing Learned Features
Q: How about the second-layer features?

A: Similarly, each second-layer code unit can be visualized by
   taking its bases and projecting them into image space
   through the first-layer decoder.



[Figure: network diagram. Missing edges have 0 weight; light gray nodes have
zero value.]




                                                                      146
Ranzato et al. “Sparse feature learning for DBNs” NIPS 2007       Ranzato
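A sketch of that projection under the same illustrative setup (W1, W2, and
all sizes below are hypothetical stand-ins, not the trained weights): a
second-layer basis vector lives in first-layer code space, and multiplying by
the first-layer decoder maps it back to pixels.

import numpy as np

# Hypothetical sizes: pixels, first- and second-layer code dimensions.
D, K1, K2 = 28 * 28, 100, 50
W1 = np.random.randn(D, K1)    # first-layer decoder: first-layer code -> pixels
W2 = np.random.randn(K1, K2)   # second-layer bases: code -> first-layer code

def second_layer_feature_image(W1, W2, j, shape=(28, 28)):
    h1 = W2[:, j]                    # basis of second-layer unit j
    return (W1 @ h1).reshape(shape)  # projected into image space via the decoder

img = second_layer_feature_image(W1, W2, j=0)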
Visualizing Learned Features
Q: How are these images computed?
 1st layer features




2nd layer features

                                                                  147
Ranzato et al. “Sparse feature learning for DBNs” NIPS 2007   Ranzato
Example of Feature Learning
1st layer features




2nd layer features


                                                                  148
Ranzato et al. “Sparse feature learning for DBNs” NIPS 2007   Ranzato

P03 neural networks cvpr2012 deep learning methods for vision

  • 1. NEURAL NETS FOR VISION CVPR 2012 Tutorial on Deep Learning Part III Marc'Aurelio Ranzato - ranzato@google.com www.cs.toronto.edu/~ranzato
  • 2. Building an Object Recognition System “CAR” CLASSIFIER FEATURE EXTRACTOR IDEA: Use data to optimize features for the given task. 2 Ranzato
  • 3. Building an Object Recognition System “CAR” CLASSIFIER f  X ;  What we want: Use parameterized function such that a) features are computed efficiently b) features can be trained efficiently 3 Ranzato
  • 4. Building an Object Recognition System “CAR” END-TO-END RECOGNITION SYSTEM – Everything becomes adaptive. – No distiction between feature extractor and classifier. – Big non-linear system trained from raw pixels to labels. 4 Ranzato
  • 5. Building an Object Recognition System “CAR” END-TO-END RECOGNITION SYSTEM Q: How can we build such a highly non-linear system? A: By combining simple building blocks we can make more and more complex systems. 5 Ranzato
  • 6. Building A Complicated Function Simple Functions One Example of sin  x  Complicated Function log  x  3 cos  x  log cos exp sin  x  3 x exp  x  6 Ranzato
  • 7. Building A Complicated Function Simple Functions One Example of sin  x  Complicated Function log  x  3 cos  x  log cos exp sin  x  3 x exp  x  – Function composition is at the core of deep learning methods. – Each “simple function” will have parameters subject to training. 7 Ranzato
  • 8. Implementing A Complicated Function Complicated Function 3 log cos exp sin  x  3 sin  x  x exp  x  cos  x  log  x  8 Ranzato
  • 9. Intuition Behind Deep Neural Nets “CAR” 9 Ranzato
  • 10. Intuition Behind Deep Neural Nets “CAR” NOTE: Each black box can have trainable parameters. Their composition makes a highly non-linear system. 10 Ranzato
  • 11. Intuition Behind Deep Neural Nets Intermediate representations/features “CAR” NOTE: System produces a hierarchy of features. 11 Ranzato
  • 12. Intuition Behind Deep Neural Nets “CAR” 12 Lee et al. “Convolutional DBN's for scalable unsup. learning...” ICML 2009 Ranzato
  • 13. Intuition Behind Deep Neural Nets “CAR” 13 Lee et al. ICML 2009 Ranzato
  • 14. Intuition Behind Deep Neural Nets “CAR” 14 Lee et al. ICML 2009 Ranzato
  • 15. KEY IDEAS OF NEURAL NETS IDEA # 1 Learn features from data IDEA # 2 Use differentiable functions that produce features efficiently IDEA # 3 End-to-end learning: no distinction between feature extractor and classifier IDEA # 4 “Deep” architectures: cascade of simpler non-linear modules 15 Ranzato
  • 16. KEY QUESTIONS - What is the input-output mapping? - How are parameters trained? - How computational expensive is it? - How well does it work? 16 Ranzato
  • 17. Outline - Neural Networks for Supervised Training - Architecture - Loss function - Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions: - semi-supervised / multi-task / multi-modal - Comparison to Other Methods - boosting & cascade methods - probabilistic models - Large-Scale Learning with Deep Neural Nets 17 Ranzato
  • 18. Outline - Neural Networks for Supervised Training - Architecture - Loss function - Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions: - semi-supervised / multi-task / multi-modal - Comparison to Other Methods - boosting & cascade methods - probabilistic models - Large-Scale Learning with Deep Neural Nets 18 Ranzato
  • 19. Linear Classifier: SVM D Input: X ∈R Binary label: y Parameters: W ∈R D T Output prediction: W X 1 2 T Loss: L= ∥W ∥  max [0,1−W X y ] 2 L Hinge Loss 19 1 WT X y Ranzato
  • 20. Linear Classifier: Logistic Regression D Input: X ∈R Binary label: y Parameters: W ∈R D T Output prediction: W X 1 2 T Loss: L= ∥W ∥  log1exp −W X y 2 L Log Loss 20 WT X y Ranzato
  • 21. Logistic Regression: Probabilistic Interpretation D Input: X ∈R Binary label: y Parameters: W ∈R D 1 1 Output prediction: p y=1∣X = T −W X 1e WT X Loss: L=−log p y∣X  Q: What is the gradient of L w.r.t. W ? 21 Ranzato
  • 22. Logistic Regression: Probabilistic Interpretation D Input: X ∈R Binary label: y Parameters: W ∈R D 1 1 Output prediction: p y=1∣X = T −W X 1e WT X Loss: L=log1exp −W T X y  Q: What is the gradient of L w.r.t. W ? 22 Ranzato
  • 23. Simple Functions sin  x  Complicated Function log  x  1 cos  x  −log −W X T  x 3 1e exp  x  23 Ranzato
  • 24. Logistic Regression: Computing Loss Complicated Function 1 −log −W X T  1e T u 1 p L W X −u −log p 1e 24 Ranzato
  • 25. Chain Rule x y dL dL dx dy Given y  x and dL /dy, What is dL /dx ? 25 Ranzato
  • 26. Chain Rule x y dL dL dx dy Given y  x and dL /dy, dL dL dy = ⋅ What is dL /dx ? dx dy dx 26 Ranzato
  • 27. Chain Rule x y dL dL dx dy Given y  x and dL /dy, dL dL dy = ⋅ What is dL /dx ? dx dy dx All needed information is local! 27 Ranzato
  • 28. Logistic Regression: Computing Gradients X T u 1 p L W X −u −log p 1e −1 p dL dp 28 Ranzato
  • 29. Logistic Regression: Computing Gradients X T u 1 p L W X −u −log p 1e p1− p −1 p dp dL du dp 29 Ranzato
  • 30. Logistic Regression: Computing Gradients X T u 1 p L W X −u −log p 1e X p1− p −1 p du dp dL dW du dp dL dL dp du = ⋅ ⋅ =  p−1 X dW dp du dW 30 Ranzato
  • 31. What Did We Learn? - Logistic Regression - How to compute gradients of complicated functions Logistic Regression 31 Ranzato
  • 32. Neural Network Logistic Logistic Regression Regression 32 Ranzato
  • 33. Neural Network – A neural net can be thought of as a stack of logistic regression classifiers. Each input is the output of the previous layer. Logistic Logistic Logistic Regression Regression Regression NOTE: intermediate units can be thought of as linear classifiers trained with implicit target values. 33 Ranzato
  • 34. Key Computations: F-Prop / B-Prop F-PROP X Z  34 Ranzato
  • 35. Key Computations: F-Prop / B-Prop B-PROP ∂L ∂L ∂X ∂Z ∂Z ∂Z { , } ∂ X ∂ ∂L ∂ 35 Ranzato
  • 36. Neural Net: Training A) Compute loss on small mini-batch Layer 1 Layer 2 Layer 3 36 Ranzato
  • 37. Neural Net: Training A) Compute loss on small mini-batch F-PROP Layer 1 Layer 2 Layer 3 37 Ranzato
  • 38. Neural Net: Training A) Compute loss on small mini-batch F-PROP Layer 1 Layer 2 Layer 3 38 Ranzato
  • 39. Neural Net: Training A) Compute loss on small mini-batch F-PROP Layer 1 Layer 2 Layer 3 39 Ranzato
  • 40. Neural Net: Training A) Compute loss on small mini-batch B) Compute gradient w.r.t. parameters B-PROP Layer 1 Layer 2 Layer 3 40 Ranzato
  • 41. Neural Net: Training A) Compute loss on small mini-batch B) Compute gradient w.r.t. parameters B-PROP Layer 1 Layer 2 Layer 3 41 Ranzato
  • 42. Neural Net: Training A) Compute loss on small mini-batch B) Compute gradient w.r.t. parameters B-PROP Layer 1 Layer 2 Layer 3 42 Ranzato
  • 43. Neural Net: Training A) Compute loss on small mini-batch B) Compute gradient w.r.t. parameters dL C) Use gradient to update parameters   − d Layer 1 Layer 2 Layer 3 43 Ranzato
  • 44. NEURAL NET: ARCHITECTURE hj h j1 T h j1 =W j1 h j b j 1 M ×N N W j∈R , b j∈R M N h j ∈ R , h j1 ∈ R 44 Ranzato
  • 45. NEURAL NET: ARCHITECTURE hj h j1 T h j1 =W j1 h j b j 1 1  x   x = −x 1e x 45 Ranzato
  • 46. NEURAL NET: ARCHITECTURE hj h j1 T h j1 =W j1 h j b j 1  x   x =tanh x  x 46 Ranzato
  • 47. Graphical Notations f  X ;W  is equivalent to X h W h k is called feature, hidden unit, neuron or code unit 47 Ranzato
  • 48. MOST COMMON ARCHITECTURE X y  Error y NOTE: Multi-layer neural nets with more than two layers are nowadays called deep nets!! 48 Ranzato
  • 49. MOST COMMON ARCHITECTURE X y  Error y NOTE: Multi-layer neural nets with more than two layers are nowadays called deep nets!! NOTE: User must specify number of layers, number of hidden units, type of layers and loss function. 49 Ranzato
  • 50. MOST COMMON LOSSES X y  Error y Square Euclidean Distance (regression): N y , y ∈R  1 N 2 L= ∑ i=1  y i − y i   2 50 Ranzato
  • 51. MOST COMMON LOSSES X y  Error y Cross Entropy (classification): N N N y , y ∈[0,1] ,  ∑i =1 y i =1, ∑i =1  i =1 y N L=−∑i =1 y i log y i  51 Ranzato
  • 52. NEURAL NETS FACTS 1: User specifies loss based on the task. 2: Any optimization algorithm can be chosen for training. 3: Cost of F-Prop and B-Prop is similar and proportional to the number of layers and their size. 52 Ranzato
  • 53. Toy Code: Neural Net Trainer % F-PROP for i = 1 : nr_layers - 1 [h{i} jac{i}] = logistic(W{i} * h{i-1} + b{i}); end h{nr_layers-1} = W{nr_layers-1} * h{nr_layers-2} + b{nr_layers-1}; prediction = softmax(h{l-1}); % CROSS ENTROPY LOSS loss = - sum(sum(log(prediction) .* target)); % B-PROP dh{l-1} = prediction - target; for i = nr_layers – 1 : -1 : 1 Wgrad{i} = dh{i} * h{i-1}'; bgrad{i} = sum(dh{i}, 2); dh{i-1} = (W{i}' * dh{i}) .* jac{i-1}; end % UPDATE for i = 1 : nr_layers - 1 W{i} = W{i} – (lr / batch_size) * Wgrad{i}; b{i} = b{i} – (lr / batch_size) * bgrad{i}; 53 end Ranzato
  • 54. TOY EXAMPLE: SYNTHETIC DATA 1 input & 1 output 100 hidden units in each layer 54 Ranzato
  • 55. TOY EXAMPLE: SYNTHETIC DATA 1 input & 1 output 3 hidden layers 55 Ranzato
  • 56. TOY EXAMPLE: SYNTHETIC DATA 1 input & 1 output 3 hidden layers, 1000 hiddens Regression of cosine 56 Ranzato
  • 57. TOY EXAMPLE: SYNTHETIC DATA 1 input & 1 output 3 hidden layers, 1000 hiddens Regression of cosine 57 Ranzato
  • 58. Outline - Neural Networks for Supervised Training - Architecture - Loss function - Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions: - semi-supervised / multi-task / multi-modal - Comparison to Other Methods - boosting & cascade methods - probabilistic models - Large-Scale Learning with Deep Neural Nets 58 Ranzato
  • 59. FULLY CONNECTED NEURAL NET Example: 1000x1000 image 1M hidden units 10^12 parameters!!! - Spatial correlation is local - Better to put resources elsewhere! 59 Ranzato
  • 60. LOCALLY CONNECTED NEURAL NET Example: 1000x1000 image 1M hidden units Filter size: 10x10 100M parameters 60 Ranzato
  • 61. LOCALLY CONNECTED NEURAL NET Example: 1000x1000 image 1M hidden units Filter size: 10x10 100M parameters 61 Ranzato
  • 62. LOCALLY CONNECTED NEURAL NET STATIONARITY? Statistics is similar at different locations Example: 1000x1000 image 1M hidden units Filter size: 10x10 100M parameters 62 Ranzato
  • 63. CONVOLUTIONAL NET Share the same parameters across different locations: Convolutions with learned kernels 63 Ranzato
  • 64. CONVOLUTIONAL NET Learn multiple filters. E.g.: 1000x1000 image 100 Filters Filter size: 10x10 10K parameters 64 Ranzato
  • 65. NEURAL NETS FOR VISION A standard neural net applied to images: - scales quadratically with the size of the input - does not leverage stationarity Solution: - connect each hidden unit to a small patch of the input - share the weight across hidden units This is called: convolutional network. LeCun et al. “Gradient-based learning applied to document recognition” IEEE 1998 65 Ranzato
  • 66. CONVOLUTIONAL NET Let us assume filter is an “eye” detector. Q.: how can we make the detection robust to the exact location of the eye? 66 Ranzato
  • 67. CONVOLUTIONAL NET By “pooling” (e.g., max or average) filter responses at different locations we gain robustness to the exact spatial location of features. 67 Ranzato
  • 68. CONV NETS: EXTENSIONS Over the years, some new modules have proven to be very effective when plugged into conv-nets: - L2 Pooling layer i layer i 1 N x , y  x , y h i1, x , y = ∑  j , k ∈ N  x , y  h 2 i, j ,k - Local Contrast Normalization layer i layer i 1 hi , x , y −m i , N  x , y  x , y h i1, x , y = N x , y  i , N  x , y Jarrett et al. “What is the best multi-stage architecture for object 68 recognition?” ICCV 2009 Ranzato
  • 69. CONV NETS: L2 POOLING -1 0 L2 Pooling 0 0 0 ∑ 5 +1 2 ⋅ hi i =1 i h i1 69 Kavukguoglu et al. “Learning invariant features ...” CVPR 2009 Ranzato
  • 70. CONV NETS: L2 POOLING 0 0 L2 Pooling 0 0 +1 ∑ 5 +1 2 ⋅ hi i =1 i h i1 70 Kavukguoglu et al. “Learning invariant features ...” CVPR 2009 Ranzato
  • 71. LOCAL CONTRAST NORMALIZATION hi , x , y −m i , N  x , y h i1, x , y =  i , N  x , y 0 0 1 0.5 0 LCN 0 -1 -0.5 0 0 71 Ranzato
  • 72. LOCAL CONTRAST NORMALIZATION hi , x , y −m i , N  x , y h i1, x , y =  i , N  x , y 1 0 11 0.5 1 LCN 0 -9 -0.5 1 0 72 Ranzato
  • 73. CONV NETS: EXTENSIONS L2 Pooling & Local Contrast Normalization help learning more invariant representations! 73 Ranzato
  • 74. CONV NETS: TYPICAL ARCHITECTURE One stage (zoom) Filtering Pooling LCN Whole system Input Class Image Labels Linear Layer 1st stage 2nd stage 3rd stage 74 Ranzato
  • 75. CONV NETS: TRAINING Since convolutions and sub-sampling are differentiable, we can use standard back-propagation. Algorithm: Given a small mini-batch - FPROP - BPROP - PARAMETER UPDATE 75 Ranzato
  • 76. CONV NETS: EXAMPLES - Object category recognition Boureau et al. “Ask the locals: multi-way local pooling for image recognition” ICCV 2011 - Segmentation Turaga et al. “Maximin learning of image segmentation” NIPS 2009 - OCR Ciresan et al. “MCDNN for Image Classification” CVPR 2012 - Pedestrian detection Kavukcuoglu et al. “Learning convolutional feature hierarchies for visual recognition” NIPS 2010 - Robotics Sermanet et al. “Mapping and planning..with long range perception” IROS 2008 76 Ranzato
  • 77. LIMITATIONS & SOLUTIONS - requires lots of labeled data to train + unsupervised learning - difficult optimization + layer-wise training - scalability + distributed training 77 Ranzato
  • 78. Outline - Neural Networks for Supervised Training - Architecture - Loss function - Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions: - semi-supervised / multi-task / multi-modal - Comparison to Other Methods - boosting & cascade methods - probabilistic models - Large-Scale Learning with Deep Neural Nets 78 Ranzato
  • 79. BACK TO LOGISTIC REGRESSION input prediction Error target 79 Ranzato
  • 80. Unsupervised Learning input prediction Error ? 80 Ranzato
  • 81. Unsupervised Learning Q: How should we train the input-output mapping if we do not have target values? input code A: Code has to retain information from the input but only if this is similar to training samples. By better representing only those inputs that are similar to training samples we hope to extract interesting structure (e.g., structure of manifold where data live). 81 Ranzato
  • 82. Unsupervised Learning Q: How to constrain the model to represent training samples better than other data points? 82 Ranzato
  • 83. Unsupervised Learning – reconstruct the input from the code & make code compact (auto-encder with bottle-neck). – reconstruct the input from the code & make code sparse (sparse auto-encoders) see work in LeCun, Ng, Fergus, Lee, Yu's labs – add noise to the input or code (denoising auto-encoders) see work in Y. Bengio, Lee's lab – make sure that the model defines a distribution that normalizes to 1 (RBM). see work in Y. Bengio, Hinton, Lee, Salakthudinov's lab 83 Ranzato
  • 84. AUTO-ENCODERS NEURAL NETS input code reconstruction encoder decoder Error – input higher dimensional than code - error: ||reconstruction - input||2 - training: back-propagation 84 Ranzato
  • 85. SPARSE AUTO-ENCODERS Sparsity Penalty input reconstruction encoder decoder code Error – sparsity penalty: ||code||1 - error: ||reconstruction - input||2 - loss: sum of squared reconstruction error and sparsity 85 - training: back-propagation Ranzato
  • 86. SPARSE AUTO-ENCODERS Sparsity Penalty input reconstruction encoder decoder code Error T – input: X code: h=W X 2 - loss: L  X ; W =∥W h− X ∥  ∑ j ∣h j∣ 86 Le et al. “ICA with reconstruction cost..” NIPS 2011 Ranzato
  • 87. How To Use Unsupervised Learning 1) Given unlabeled data, learn features 87 Ranzato
  • 88. How To Use Unsupervised Learning 1) Given unlabeled data, learn features 2) Use encoder to produce features and train another layer on the top 88 Ranzato
  • 89. How To Use Unsupervised Learning 1) Given unlabeled data, learn features 2) Use encoder to produce features and train another layer on the top Layer-wise training of a feature hierarchy 89 Ranzato
  • 90. How To Use Unsupervised Learning 1) Given unlabeled data, learn features 2) Use encoder to produce features and train another layer on the top 3) feed features to classifier & train just the classifier input prediction label Reduced overfitting since features are learned in unsupervised way! 90 Ranzato
  • 91. How To Use Unsupervised Learning 1) Given unlabeled data, learn features 2) Use encoder to produce features and train another layer on the top 3) feed features to classifier & jointly train the whole system input prediction label Given enough data, this usually yields the best results: end-to-end learning! 91 Ranzato
  • 92. Outline - Neural Networks for Supervised Training - Architecture - Loss function - Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions: - semi-supervised / multi-task / multi-modal - Comparison to Other Methods - boosting & cascade methods - probabilistic models - Large-Scale Learning with Deep Neural Nets 92 Ranzato
  • 93. Semi-Supervised Learning truck airplane deer frog bird 93 Ranzato
  • 94. Semi-Supervised Learning truck airplane deer frog bird LOTS & LOTS OF UNLABELED DATA!!! 94 Ranzato
  • 95. Semi-Supervised Learning Loss = supervised_error + unsupervised_error prediction input label 95 Weston et al. “Deep learning via semi-supervised embedding” ICML 2008 Ranzato
  • 96. Multi-Task Learning Face detection is hard because of lighting, pose, but also occluding goggles. Face detection could made be easier by face identification. The identification task may help the detection task. 96 Ranzato
  • 97. Multi-Task Learning - Easy to add many error terms to loss function. - Joint learning of related tasks yields better representations. Example of architecture: 97 Collobert et al. “NLP (almost) from scratch” JMLR 2011 Ranzato
  • 98. Multi-Modal Learning Audio and Video streams are often complimentary to each other. E.g., audio can provide important clues to improve visual recognition, and vice versa. 98 Ranzato
  • 99. Multi-Modal Learning - Weak assumptions on input distribution - Fully adaptive to data Example of architecture: modality #1 modality #2 modality #3 99 Ngiam et al. “Multi-modal deep learning” ICML 2011 Ranzato
  • 100. Outline - Neural Networks for Supervised Training - Architecture - Loss function - Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions: - semi-supervised / multi-task / multi-modal - Comparison to Other Methods - boosting & cascade methods - probabilistic models - Large-Scale Learning with Deep Neural Nets 100 Ranzato
  • 101. Boosting & Forests Deep Nets: - single highly non-linear system - “deep” stack of simpler modules - all parameters are subject to learning Boosting & Forests: - sequence of “weak” (simple) classifiers that are linearly combined to produce a powerful classifier - subsequent classifiers do not exploit representations of earlier classifiers, it's a “shallow” linear mixture - typically features are not learned 101 Ranzato
  • 102. Properties Deep Nets Boosting Adaptive features Hierarchical features End-to-end learning Leverage unlab. data Easy to parallelize Fast training Fast at test time 102 Ranzato
  • 103. Deep Neural-Nets VS Probabilistic Models Deep Neural Nets: - mean-field approximations of intractable probabilistic models - usually more efficient - typically more unconstrained (partition function has to be replaced by other constraints, e.g. sparsity). Hierarchical Probabilistic Models (DBN, DBM, etc.): - in the most interesting cases, they are intractable - they better deal with uncertainty - they can be easily combined 103 Ranzato
  • 104. Example: Auto-Encoder Neural Net: T code Z = W X b e  e reconstruction  X =W d Z bd Probabilistic Model (Gaussian RBM): T E [ Z∣ X ]= W X b e  E [ X ∣Z ]=W Z bd 104 Ranzato
  • 105. Properties Deep Nets Probab. Models Adaptive features Hierarchical features End-to-end learning Leverage unlab. data Models uncertainty Fast training Fast at test time 105 Ranzato
  • 106. Outline - Neural Networks for Supervised Training - Architecture - Loss function - Neural Networks for Vision: Convolutional & Tiled - Unsupervised Training of Neural Networks - Extensions: - semi-supervised / multi-task / multi-modal - Comparison to Other Methods - boosting & cascade methods - probabilistic models - Large-Scale Learning with Deep Neural Nets 106 Ranzato
  • 107. Tera-Scale Deep Learning @ Google Observation #1: more features always improve performance unless data is scarce. Observation #2: deep learning methods have higher capacity and have the potential to model data better. Q #1: Given lots of data and lots of machines, can we scale up deep learning methods? Q #2: Will deep learning methods perform much better? 107 Ranzato
  • 108. The Challenge A Large Scale problem has: – lots of training samples (>10M) – lots of classes (>10K) and – lots of input dimensions (>10K). – best optimizer in practice is on-line SGD which is naturally sequential, hard to parallelize. – layers cannot be trained independently and in parallel, hard to distribute – model can have lots of parameters that may clog the network, hard to distribute across machines 108 Ranzato
  • 109. Our Solution 2nd layer 1st layer input 109 Le et al. “Building high-level features using large-scale unsupervised learning” ICML 2012 Ranzato
  • 110. Our Solution 1st 2nd 3rd machine machine machine 2nd layer 1st layer input 110 Ranzato
  • 111. Our Solution 1st 2nd 3rd machine machine machine MODEL PARALLELISM 111 Ranzato
  • 112. Distributed Deep Nets Deep Net MODEL PARALLELISM input #1 112 Le et al. “Building high-level features using large-scale unsupervised learning” ICML 2012 Ranzato
  • 113. Distributed Deep Nets MODEL PARALLELISM + DATA PARALLELISM input #3 input #2 input #1 113 Le et al. “Building high-level features using large-scale unsupervised learning” ICML 2012 Ranzato
  • 114. Asynchronous SGD PARAMETER SERVER 1st replica 2nd replica 3rd replica 114 Ranzato
  • 115. Asynchronous SGD PARAMETER SERVER ∂L ∂1 1st replica 2nd replica 3rd replica 115 Ranzato
  • 116. Asynchronous SGD PARAMETER SERVER 1 1st replica 2nd replica 3rd replica 116 Ranzato
  • 117. Asynchronous SGD PARAMETER SERVER (update parameters) 1st replica 2nd replica 3rd replica 117 Ranzato
  • 118. Asynchronous SGD PARAMETER SERVER ∂L ∂2 1st replica 2nd replica 3rd replica 118 Ranzato
  • 119. Asynchronous SGD PARAMETER SERVER 2 1st replica 2nd replica 3rd replica 119 Ranzato
  • 120. Asynchronous SGD PARAMETER SERVER (update parameters) 1st replica 2nd replica 3rd replica 120 Ranzato
  • 121. Unsupervised Learning With 1B Parameters DATA: 10M youtube (unlabeled) frames of size 200x200. 121 Le et al. “Building high-level features using large-scale unsupervised learning” ICML 2012 Ranzato
  • 122. Unsupervised Learning With 1B Parameters Deep Net: – 3 stages – each stage consists of local filtering, L2 pooling, LCN - 18x18 filters - 8 filters at each location - L2 pooling and LCN over 5x5 neighborhoods – training jointly the three layers by: - reconstructing the input of each layer - sparsity on the code 122 Le et al. “Building high-level features using large-scale unsupervised learning” ICML 2012 Ranzato
  • 123. Unsupervised Learning With 1B Parameters Deep Net: – 3 stages – each stage consists of local filtering, L2 pooling, LCN - 18x18 filters - 8 filters at each location - L2 pooling and LCN over 5x5 neighborhoods – training jointly the three layers by: - reconstructing the input of each layer - sparsity on the code 1B parameters!!! 123 Le et al. “Building high-level features using large-scale unsupervised learning” ICML 2012 Ranzato
  • 124. Validating Unsupervised Learning The network has seen lots of objects during training, but without any label. Q.: how can we validate unsupervised learning? Q.: Did the network form any high-level representation? E.g., does it have any neuron responding for faces? – build validation set with 50% faces, 50% random images - study properties of neurons 124 Le et al. “Building high-level features using large-scale unsupervised learning” ICML 2012 Ranzato
  • 125. Validating Unsupervised Learning neuron responses 1st stage 2nd stage 3rd stage 125 Le et al. “Building high-level features using large-scale unsupervised learning” ICML 2012 Ranzato
  • 126. Top Images For Best Face Neuron 126 Ranzato
  • 127. Best Input For Face Neuron 127 Le et al. “Building high-level features using large-scale unsupervised learning” ICML 2012 Ranzato
  • 128. Unsupervised + Supervised (ImageNet) Input Image y  1st stage 2nd stage 3rd stage COST y 128 Le et al. “Building high-level features using large-scale unsupervised learning” ICML 2012 Ranzato
  • 129. Object Recognition on ImageNet IMAGENET v.2011 (16M images, 20K categories) METHOD ACCURACY % Weston & Bengio 2011 9.3 Linear Classifier on deep features 13.1 Deep Net (from random) 13.6 Deep Net (from unsup.) 15.8 129 Le et al. “Building high-level features using large-scale unsupervised learning” ICML 2012 Ranzato
  • 130. Top Inputs After Supervision 130 Ranzato
  • 131. Top Inputs After Supervision 131 Ranzato
  • 132. Experiments: and many more... - automatic speech recognition - natural language processing - biomed applications - finance Generic learning algorithm!! 132 Ranzato
  • 133. References Tutorials & Background Material – Yoshua Bengio, Learning Deep Architectures for AI, Foundations and Trends in Machine Learning, 2(1), pp.1-127, 2009. – LeCun, Chopra, Hadsell, Ranzato, Huang: A Tutorial on Energy-Based Learning, in Bakir, G. and Hofman, T. and Schölkopf, B. and Smola, A. and Taskar, B. (Eds), Predicting Structured Data, MIT Press, 2006 Convolutional Nets – LeCun, Bottou, Bengio and Haffner: Gradient-Based Learning Applied to Document Recognition, Proceedings of the IEEE, 86(11):2278-2324, November 1998 – Jarrett, Kavukcuoglu, Ranzato, LeCun: What is the Best Multi-Stage Architecture for Object Recognition?, Proc. International Conference on Computer Vision (ICCV'09), IEEE, 2009 - Kavukcuoglu, Sermanet, Boureau, Gregor, Mathieu, LeCun: Learning Convolutional Feature Hierachies for Visual Recognition, Advances in Neural Information 133 Processing Systems (NIPS 2010), 23, 2010 Ranzato
• 134. Unsupervised Learning
– Le, Karpenko, Ngiam, Ng: ICA with Reconstruction Cost for Efficient Overcomplete Feature Learning, NIPS 2011.
– Rifai, Vincent, Muller, Glorot, Bengio: Contractive Auto-Encoders: Explicit Invariance During Feature Extraction, Proceedings of the Twenty-eighth International Conference on Machine Learning (ICML'11), 2011.
– Vincent, Larochelle, Lajoie, Bengio, Manzagol: Stacked Denoising Autoencoders: Learning Useful Representations in a Deep Network with a Local Denoising Criterion, Journal of Machine Learning Research, 11:3371-3408, 2010.
– Gregor, Szlam, LeCun: Structured Sparse Coding via Lateral Inhibition, Advances in Neural Information Processing Systems (NIPS 2011), 24, 2011.
– Kavukcuoglu, Ranzato, LeCun: Fast Inference in Sparse Coding Algorithms with Applications to Object Recognition, arXiv:1010.3467, 2008.
– Hinton, Krizhevsky, Wang: Transforming Auto-encoders, ICANN, 2011.
Multi-modal Learning
– Ngiam, Khosla, Kim, Nam, Lee, Ng: Multimodal Deep Learning, Proceedings of the Twenty-Eighth International Conference on Machine Learning (ICML 2011), 2011.
134 Ranzato
• 135. Locally Connected Nets
– Gregor, LeCun: Emergence of Complex-Like Cells in a Temporal Product Network with Local Receptive Fields, arXiv, 2009.
– Ranzato, Mnih, Hinton: Generating More Realistic Images Using Gated MRF's, NIPS 2010.
– Le, Ngiam, Chen, Chia, Koh, Ng: Tiled Convolutional Neural Networks, NIPS 2010.
Distributed Learning
– Le, Ranzato, Monga, Devin, Corrado, Chen, Dean, Ng: Building High-Level Features Using Large Scale Unsupervised Learning, International Conference on Machine Learning (ICML 2012), Edinburgh, 2012.
Papers on Scene Parsing
– Farabet, Couprie, Najman, LeCun: Scene Parsing with Multiscale Feature Learning, Purity Trees, and Optimal Covers, Proc. of the International Conference on Machine Learning (ICML'12), Edinburgh, Scotland, 2012.
– Socher, Lin, Ng, Manning: Parsing Natural Scenes and Natural Language with Recursive Neural Networks, International Conference on Machine Learning (ICML 2011), 2011.
135 Ranzato
• 136. Papers on Object Recognition
– Boureau, Le Roux, Bach, Ponce, LeCun: Ask the Locals: Multi-Way Local Pooling for Image Recognition, Proc. International Conference on Computer Vision (ICCV 2011), 2011.
– Sermanet, LeCun: Traffic Sign Recognition with Multi-Scale Convolutional Networks, Proceedings of the International Joint Conference on Neural Networks (IJCNN'11), 2011.
– Ciresan, Meier, Gambardella, Schmidhuber: Convolutional Neural Network Committees for Handwritten Character Classification, 11th International Conference on Document Analysis and Recognition (ICDAR 2011), Beijing, China.
– Ciresan, Meier, Masci, Gambardella, Schmidhuber: Flexible, High Performance Convolutional Neural Networks for Image Classification, International Joint Conference on Artificial Intelligence (IJCAI 2011).
Papers on Action Recognition
– Le, Zou, Yeung, Ng: Learning Hierarchical Spatio-Temporal Features for Action Recognition with Independent Subspace Analysis, Computer Vision and Pattern Recognition (CVPR), 2011.
Papers on Segmentation
– Turaga, Briggman, Helmstaedter, Denk, Seung: Maximin Learning of Image Segmentation, NIPS, 2009.
136 Ranzato
• 137. Papers on Vision for Robotics
– Hadsell, Sermanet, Scoffier, Erkan, Kavukcuoglu, Muller, LeCun: Learning Long-Range Vision for Autonomous Off-Road Driving, Journal of Field Robotics, 26(2):120-144, February 2009.
Deep Convex Nets & Deconv-Nets
– Deng, Yu: Deep Convex Network: A Scalable Architecture for Speech Pattern Classification, Interspeech, 2011.
– Zeiler, Taylor, Fergus: Adaptive Deconvolutional Networks for Mid and High Level Feature Learning, ICCV, 2011.
Papers on Biologically Inspired Vision
– Serre, Wolf, Bileschi, Riesenhuber, Poggio: Robust Object Recognition with Cortex-like Mechanisms, IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(3):411-426, 2007.
– Pinto, Doukhan, DiCarlo, Cox: A High-Throughput Screening Approach to Discovering Good Forms of Biologically Inspired Visual Representation, PLoS Computational Biology, 2009.
137 Ranzato
• 138. Papers on Embedded ConvNets for Real-Time Vision Applications
– Farabet, Martini, Corda, Akselrod, Culurciello, LeCun: NeuFlow: A Runtime Reconfigurable Dataflow Processor for Vision, Workshop on Embedded Computer Vision, CVPR 2011.
Papers on Image Denoising Using Neural Nets
– Burger, Schuler, Harmeling: Image Denoising: Can Plain Neural Networks Compete with BM3D?, Computer Vision and Pattern Recognition (CVPR 2012), 2012.
138 Ranzato
• 139. Software & Links
Deep Learning website
– http://deeplearning.net/
C++ code for ConvNets
– http://eblearn.sourceforge.net/
Matlab code for the R-ICA unsupervised algorithm
– http://ai.stanford.edu/~quocle/rica_release.zip
Python-based learning library
– http://deeplearning.net/software/theano/
Lush learning library which includes ConvNets
– http://lush.sourceforge.net/
Torch7: learning library that supports neural net training
– http://www.torch.ch
139 Ranzato
• 140. Software & Links
Code used to generate the demo for this tutorial
– http://cs.nyu.edu/~fergus/tutorials/deep_learning_cvpr12/
140 Ranzato
  • 141. Acknowledgements Quoc Le, Andrew Ng Jeff Dean, Kai Chen, Greg Corrado, Matthieu Devin, Mark Mao, Rajat Monga, Paul Tucker, Samy Bengio Yann LeCun, Pierre Sermanet, Clement Farabet 141 Ranzato
• 142. Visualizing Learned Features Q: Can we interpret the learned features? 142 Ranzato et al. “Sparse feature learning for DBNs” NIPS 2007 Ranzato
• 143. Visualizing Learned Features
reconstruction: x ≈ W h = W_1 h_1 + W_2 h_2 + ...
[Figure: an input patch shown as ≈ 0.9 · (basis 1) + 0.7 · (basis 2) + 0.5 · (basis 3) + 1.0 · (basis 4) + ...]
Columns of W show what each code unit represents.
143 Ranzato et al. “Sparse feature learning for DBNs” NIPS 2007 Ranzato
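A small numpy sketch of this visualization (shapes here are illustrative assumptions, not the paper's): each column of the decoder matrix W, reshaped into a patch, is the image of one code unit, and the reconstruction is the code-weighted sum of those columns.

```python
# Columns of the decoder W visualized as patches; reconstruction = W @ h.
import numpy as np

patch = 12                                # assumed patch size
W = np.random.randn(patch * patch, 64)    # decoder with 64 code units
h = np.random.rand(64)                    # a (sparse) code

recon = W @ h                             # x_hat = sum_i h_i * W[:, i]
unit_images = [W[:, i].reshape(patch, patch) for i in range(W.shape[1])]
# Displaying `unit_images` as a grid gives the "1st layer features"
# panels shown on the following slides.
```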
• 144. Visualizing Learned Features [Figure: grid of 1st layer features.] 144 Ranzato et al. “Sparse feature learning for DBNs” NIPS 2007 Ranzato
  • 145. Visualizing Learned Features Q: How about the second layer features? A: Similarly, each second layer code unit can be visualized by taking its bases and then projecting those bases in image space through the first layer decoder. 145 Ranzato et al. “Sparse feature learning for DBNs” NIPS 2007 Ranzato
• 146. Visualizing Learned Features In the connectivity diagram for this projection, missing edges have 0 weight and light gray nodes have zero value. 146 Ranzato et al. “Sparse feature learning for DBNs” NIPS 2007 Ranzato
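Continuing the earlier sketch (same illustrative shapes), the projection described above amounts to pushing each second-layer basis through the first-layer decoder. The sketch below keeps only the linear part of the first layer, which is a simplification of the actual procedure.

```python
# Second-layer bases projected to pixel space through the 1st-layer decoder.
import numpy as np

patch = 12
W1 = np.random.randn(patch * patch, 64)   # 1st-layer decoder
W2 = np.random.randn(64, 32)              # 2nd-layer decoder (32 units)

# 2nd-layer unit j in image space: W1 @ W2[:, j]
second_layer_images = [(W1 @ W2[:, j]).reshape(patch, patch)
                       for j in range(W2.shape[1])]
```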
• 147. Visualizing Learned Features Q: How are these images computed? [Figure: grids of 1st layer features and 2nd layer features.] 147 Ranzato et al. “Sparse feature learning for DBNs” NIPS 2007 Ranzato
• 148. Example of Feature Learning [Figure: grids of 1st layer features and 2nd layer features.] 148 Ranzato et al. “Sparse feature learning for DBNs” NIPS 2007 Ranzato