Deep Learning

Russ Salakhutdinov
Department of Statistics and Computer Science
University of Toronto
  
Mining for Structure

Massive increase in both computational power and the amount of data
available from the web, video cameras, and laboratory measurements:
Images & Video, Text & Language, Speech & Audio, Gene Expression,
Relational Data / Social Networks, Climate Change, Geological Data,
Product Recommendation. Mostly unlabeled.

• Develop statistical models that can discover underlying structure,
cause, or statistical correlation from data in an unsupervised or
semi-supervised way.
• Multiple application domains.
Talk Roadmap

• Unsupervised Feature Learning
   – Restricted Boltzmann Machines (RBM)
   – Deep Belief Networks
   – Deep Boltzmann Machines (DBM)
• Transfer Learning with Deep Models
• Multimodal Learning
Restricted Boltzmann Machines

Bipartite structure: stochastic binary visible variables v (e.g. the
pixels of an image) are connected to stochastic binary hidden variables h.

The energy of the joint configuration:

    E(v, h; θ) = −v⊤Wh − b⊤v − a⊤h,

where θ = {W, a, b} are the model parameters.

The probability of the joint configuration is given by the Boltzmann
distribution:

    P(v, h; θ) = exp(−E(v, h; θ)) / Z(θ),

where Z(θ) = Σ_{v,h} exp(−E(v, h; θ)) is the partition function and the
exponentiated energy terms play the role of potential functions.

Related models: Markov random fields, Boltzmann machines, log-linear models.
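To make the two formulas concrete, here is a minimal sketch (mine, not from
the talk) that evaluates the energy and the Boltzmann probability of a toy
RBM, computing the partition function Z by brute-force enumeration; real
RBMs are far too large for this:

```python
import itertools
import numpy as np

rng = np.random.default_rng(0)
D, F = 4, 3                       # visible / hidden dimensions (toy sizes)
W = 0.1 * rng.standard_normal((D, F))
b = np.zeros(D)                   # visible biases
a = np.zeros(F)                   # hidden biases

def energy(v, h):
    """E(v, h) = -v^T W h - b^T v - a^T h"""
    return -(v @ W @ h + b @ v + a @ h)

# Partition function by enumerating all 2^(D+F) joint configurations.
states = [np.array(s) for s in itertools.product([0, 1], repeat=D + F)]
Z = sum(np.exp(-energy(s[:D], s[D:])) for s in states)

v = np.array([1, 0, 1, 1])
h = np.array([0, 1, 0])
print("P(v, h) =", np.exp(-energy(v, h)) / Z)
```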
  
Restricted Boltzmann Machines

Bipartite structure. Restricted: no interactions between hidden variables
(and none between visible variables).

Inferring the distribution over the hidden variables given an image (the
visible variables) is easy, because the conditional factorizes:

    P(h | v) = Π_j P(h_j | v),   P(h_j = 1 | v) = σ(Σ_i W_ij v_i + a_j),

where σ(x) = 1/(1 + exp(−x)). Similarly:

    P(v | h) = Π_i P(v_i | h),   P(v_i = 1 | h) = σ(Σ_j W_ij h_j + b_i).

Factorizes: easy to compute.

Related models: Markov random fields, Boltzmann machines, log-linear models.
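A small sketch of these factorized conditionals, with illustrative helper
names (sigmoid, sample_bernoulli) that are not from the slides:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def p_h_given_v(v, W, a):
    # Each hidden unit is conditionally independent given v.
    return sigmoid(v @ W + a)          # shape (F,)

def p_v_given_h(h, W, b):
    # Each visible unit is conditionally independent given h.
    return sigmoid(W @ h + b)          # shape (D,)

def sample_bernoulli(p, rng):
    return (rng.random(p.shape) < p).astype(float)
```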
  
Model Learning

Given a set of i.i.d. training examples D = {v(1), ..., v(N)}, we want to
learn the model parameters θ = {W, a, b}.

Maximize the (penalized) log-likelihood objective:

    L(θ) = Σ_n log P(v(n); θ) − λ‖θ‖²  (regularization).

Derivative of the log-likelihood:

    ∂L/∂W = E_{P_data}[v h⊤] − E_{P_model}[v h⊤].

The second (model) expectation is difficult to compute: it runs over
exponentially many configurations.
Model Learning

The model expectation in the log-likelihood gradient is intractable, so we
turn to approximate maximum likelihood learning:

• Contrastive Divergence (Hinton 2000)
• Pseudo-Likelihood (Besag 1977)
• MCMC-MLE estimator (Geyer 1991)
• Composite Likelihoods (Lindsay 1988; Varin 2008)
• Tempered MCMC (Salakhutdinov, NIPS 2009)
• Adaptive MCMC (Salakhutdinov, ICML 2010)
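As one concrete instance of these approximations, here is a minimal sketch
of a single Contrastive Divergence (CD-1) update, my illustration of
Hinton's idea rather than the talk's exact code: the intractable model
expectation is replaced by statistics from one Gibbs step started at the data:

```python
import numpy as np

def cd1_step(v0, W, a, b, lr, rng):
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
    # Positive phase: hidden probabilities given the data.
    ph0 = sigmoid(v0 @ W + a)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    # Negative phase: one step of Gibbs sampling away from the data.
    pv1 = sigmoid(h0 @ W.T + b)
    v1 = (rng.random(pv1.shape) < pv1).astype(float)
    ph1 = sigmoid(v1 @ W + a)
    # Gradient estimate: data statistics minus one-step model statistics.
    W += lr * (np.outer(v0, ph0) - np.outer(v1, ph1))
    a += lr * (ph0 - ph1)
    b += lr * (v0 - v1)
```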
  
RBMs for Images

Gaussian-Bernoulli RBM: define energy functions for various data
modalities. With real-valued (Gaussian) visible variables v and binary
(Bernoulli) hidden variables h:

    E(v, h; θ) = Σ_i (v_i − b_i)² / (2σ_i²) − Σ_ij (v_i/σ_i) W_ij h_j − Σ_j a_j h_j.
  
RBMs for Images

Gaussian-Bernoulli RBM. Interpretation: a mixture of an exponential
number of Gaussians,

    P(v) = Σ_h P(h) P(v | h),

where P(h) is an implicit prior over the hidden variables, and each
P(v | h) is a Gaussian.
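A short sketch of the conditionals this energy implies, assuming unit
variances for brevity (the helper names are mine): hidden units remain
Bernoulli, while each visible unit given h is a Gaussian whose mean is set
by the hidden configuration:

```python
import numpy as np

def sample_v_given_h_gaussian(h, W, b, rng):
    mean = W @ h + b                                  # Gaussian mean set by h
    return mean + rng.standard_normal(mean.shape)     # sigma = 1 assumed

def p_h_given_v_gaussian(v, W, a):
    return 1.0 / (1.0 + np.exp(-(v @ W + a)))         # hidden units stay Bernoulli
```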
  
RBMs for Images and Text

Images: Gaussian-Bernoulli RBM trained on 4 million unlabelled images.
Learned features (out of 10,000).

Text: Multinomial-Bernoulli RBM trained on the Reuters dataset of 804,414
unlabeled newswire stories (bag-of-words representation). Learned features
look like "topics":

  russian    clinton     computer    trade      stock
  russia     house       system      country    wall
  moscow     president   product     import     street
  yeltsin    bill        software    world      point
  soviet     congress    develop     economy    dow
Collaborative Filtering

Bernoulli hidden variables h (weights W1): user preferences.
Multinomial visible variables v: user ratings.

Netflix dataset: 480,189 users, 17,770 movies, over 100 million ratings.

Learned features look like movie "genres":

  Fahrenheit 9/11                 Independence Day
  Bowling for Columbine           The Day After Tomorrow
  The People vs. Larry Flynt      Con Air
  Canadian Bacon                  Men in Black II
  La Dolce Vita                   Men in Black

  Friday the 13th                 Scary Movie
  The Texas Chainsaw Massacre     Naked Gun
  Children of the Corn            Hot Shots!
  Child's Play                    American Pie
  The Return of Michael Myers     Police Academy

State-of-the-art performance on the Netflix dataset. Relates to
Probabilistic Matrix Factorization (Salakhutdinov & Mnih, ICML 2007).
Multiple Application Domains

• Natural Images
• Text / Documents
• Collaborative Filtering / Matrix Factorization
• Video (Langford et al., ICML 2009; Lee et al.)
• Motion Capture (Taylor et al., NIPS 2007)
• Speech Perception (Dahl et al., NIPS 2010; Lee et al., NIPS 2010)

Same learning algorithm -- multiple input domains.

But there are limitations on the types of structure that can be
represented by a single layer of low-level features!
Talk Roadmap

• Unsupervised Feature Learning
   – Restricted Boltzmann Machines (RBM)
   – Deep Belief Networks
   – Deep Boltzmann Machines (DBM)
• Transfer Learning with Deep Models
• Multimodal Learning
  
Deep Belief Network

Input: pixels of an image.
Low-level features: edges.

Built from unlabeled inputs.
Deep Belief Network

Unsupervised feature learning: internal representations capture
higher-order statistical structure.

Input: pixels of an image.
Low-level features: edges.
Higher-level features: combinations of edges.

Built from unlabeled inputs.

(Hinton et al., Neural Computation 2006)
Deep Belief Network

The joint probability distribution of a Deep Belief Network factorizes:

    P(v, h¹, h², h³) = P(v | h¹) P(h¹ | h²) P(h², h³),

where the top two layers (weights W³) form an RBM and the lower layers
(weights W¹, W²) form a directed sigmoid belief network.
DBNs for Classification

[Figure: a stack of RBMs (layers of 500, 500, and 2000 units, weights W1,
W2, W3) is pretrained layer by layer, unrolled with an added 10-way
softmax output layer and tied transposed weights, and then fine-tuned
with backpropagation: Pretraining → Unrolling → Fine-tuning.]
• After layer-by-layer unsupervised pretraining, discriminative
fine-tuning by backpropagation achieves an error rate of 1.2% on MNIST.
SVMs get 1.4%, and randomly initialized backprop gets 1.6%.

• Clearly, unsupervised learning helps generalization. It ensures that
most of the information in the weights comes from modeling the input data.
Deep Autoencoders

[Figure: an encoder with layers of 2000, 1000, and 500 units and a
30-dimensional code layer is pretrained as a stack of RBMs, unrolled into
a symmetric decoder with transposed weights, and then fine-tuned:
Pretraining → Unrolling → Fine-tuning.]
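A minimal sketch of the unrolling step, assuming layer sizes loosely
following the figure and using random stand-ins for the pretrained
weights: the decoder reuses the encoder weights transposed, and
fine-tuning would then minimize reconstruction error through the whole
unrolled network:

```python
import numpy as np

sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
rng = np.random.default_rng(0)

sizes = [2000, 1000, 500, 30]            # layer sizes loosely from the figure
weights = [0.01 * rng.standard_normal((m, n))
           for m, n in zip(sizes[:-1], sizes[1:])]   # stand-ins for pretrained RBM weights

def encode(v):
    for W in weights:                    # feed-forward encoder
        v = sigmoid(v @ W)
    return v                             # 30-D code

def decode(code):
    for W in reversed(weights):          # decoder with tied (transposed) weights
        code = sigmoid(code @ W.T)
    return code

x = rng.random(2000)
recon = decode(encode(x))                # fine-tuning minimizes ||x - recon||
```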
Deep Generative Model

Model P(document). Reuters dataset: 804,414 newswire stories, trained
unsupervised on bag-of-words input.

[Figure: 2-D codes produced by the model cluster stories by topic:
Interbank Markets, European Community Monetary/Economic, Energy Markets,
Disasters and Accidents, Leading Economic Indicators, Legal/Judicial,
Government Borrowings, Accounts/Earnings.]

(Hinton & Salakhutdinov, Science 2006)
Information Retrieval

[Figure: the same newswire stories projected into a 2-D LSA space, with
topic clusters: Interbank Markets, European Community Monetary/Economic,
Energy Markets, Disasters and Accidents, Leading Economic Indicators,
Legal/Judicial, Government Borrowings, Accounts/Earnings.]

• The Reuters Corpus Volume II contains 804,414 newswire stories
(randomly split into 402,207 training and 402,207 test).

• "Bag-of-words": each article is represented as a vector containing the
counts of the 2,000 most frequently used words in the training set.
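A minimal sketch of this bag-of-words construction (the toy corpus and
tokenization here are placeholders):

```python
from collections import Counter

train_docs = [["oil", "prices", "rose"], ["court", "ruling", "on", "oil"]]

# Keep the 2,000 most frequent words in the training set as the vocabulary.
vocab_counts = Counter(w for doc in train_docs for w in doc)
vocab = [w for w, _ in vocab_counts.most_common(2000)]
index = {w: i for i, w in enumerate(vocab)}

def bag_of_words(doc):
    vec = [0] * len(vocab)
    for w in doc:
        if w in index:                # words outside the top 2,000 are dropped
            vec[index[w]] += 1
    return vec

print(bag_of_words(["oil", "oil", "court"]))
```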
  
Semantic Hashing

[Figure: a semantic hashing function maps each document to an address in
a binary address space, so that semantically similar documents (e.g.
European Community Monetary/Economic, Disasters and Accidents, Government
Borrowing, Energy Markets, Accounts/Earnings) land at nearby addresses.]

• Learn to map documents into semantic 20-D binary codes.

• Retrieve similar documents stored at the nearby addresses with no
search at all.
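A minimal sketch of why such codes permit retrieval "with no search at
all": treating each 20-bit code as a memory address, neighbors are found
by flipping a few bits rather than scanning the collection. The hashing
function itself (the learned deep model) is assumed and not shown:

```python
import itertools

BITS = 20
table = {}                                    # address -> list of doc ids

def store(doc_id, code):
    table.setdefault(code, []).append(doc_id)

def neighbors(code, radius=1):
    """All addresses within Hamming distance `radius` of `code`."""
    yield code
    for r in range(1, radius + 1):
        for bits in itertools.combinations(range(BITS), r):
            flipped = code
            for b in bits:
                flipped ^= (1 << b)           # flip bit b
            yield flipped

def retrieve(code, radius=1):
    hits = []
    for addr in neighbors(code, radius):
        hits.extend(table.get(addr, []))      # direct memory lookup, no scan
    return hits

store("doc-42", 0b1010100)
print(retrieve(0b1010101))                    # found at Hamming distance 1
```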
  
Searching Large Image Databases Using Binary Codes

• Map images into binary codes for fast retrieval.

• Small Codes: Torralba, Fergus, Weiss, CVPR 2008
• Spectral Hashing: Y. Weiss, A. Torralba, R. Fergus, NIPS 2008
• Kulis and Darrell, NIPS 2009; Gong and Lazebnik, CVPR 2011
• Norouzi and Fleet, ICML 2011
Talk Roadmap

• Unsupervised Feature Learning
   – Restricted Boltzmann Machines (RBM)
   – Deep Belief Networks
   – Deep Boltzmann Machines (DBM)
• Transfer Learning with Deep Models
• Multimodal Learning
DBNs vs. DBMs

[Figure: a Deep Belief Network (directed lower layers, undirected top
RBM) next to a Deep Boltzmann Machine (all connections undirected), each
with layers v, h1, h2, h3 and weights W1, W2, W3.]

DBNs are hybrid models:
• Inference in DBNs is problematic due to explaining away.
• Only greedy pretraining, no joint optimization over all layers.
• Approximate inference is feed-forward: no bottom-up and top-down.

We introduce a new class of models called Deep Boltzmann Machines.
  
Mathematical Formulation

Deep Boltzmann Machine with model parameters θ = {W¹, W², W³}:

    P(v; θ) ∝ Σ_{h¹,h²,h³} exp( v⊤W¹h¹ + h¹⊤W²h² + h²⊤W³h³ ).

• Dependencies between hidden variables.
• All connections are undirected.
• Bottom-up and top-down: the conditional for a middle layer combines
both, e.g. P(h¹_j = 1 | v, h²) = σ( Σ_i W¹_ij v_i + Σ_k W²_jk h²_k ).

Unlike many existing feed-forward models: ConvNet (LeCun), HMAX
(Poggio et al.), Deep Belief Nets (Hinton et al.).
Mathematical Formulation

[Figure: side-by-side comparison of a Neural Network, a Deep Boltzmann
Machine, and a Deep Belief Network, each with layers v, h1, h2, h3 and
weights W1, W2, W3; in the neural network and the DBN, inference is a
single feed-forward pass from input to output.]

Unlike many existing feed-forward models: ConvNet (LeCun), HMAX (Poggio),
Deep Belief Nets (Hinton).
Mathematical Formulation

Deep Boltzmann Machine with model parameters θ = {W¹, W², W³}.
• Dependencies between hidden variables.

Maximum likelihood learning:

    ∂ log P(v; θ)/∂θ = E_{P_data}[·] − E_{P_model}[·].

Problem: both expectations are intractable!

This is the general learning rule for undirected graphical models:
MRFs, CRFs, factor graphs.
Previous Work

Many approaches for learning Boltzmann machines have been proposed over
the last 20 years:

• Hinton and Sejnowski (1983)
• Peterson and Anderson (1987)
• Galland (1991)
• Kappen and Rodriguez (1998)
• Lawrence, Bishop, and Jordan (1998)
• Tanaka (1998)
• Welling and Hinton (2002)
• Zhu and Liu (2002)
• Welling and Teh (2003)
• Yasuda and Tanaka (2009)

Real-world applications require thousands of hidden and observed
variables with millions of parameters.

Many of the previous approaches were not successful for learning general
Boltzmann machines with hidden variables.

Algorithms based on Contrastive Divergence, Score Matching,
Pseudo-Likelihood, Composite Likelihood, MCMC-MLE, and Piecewise Learning
cannot handle multiple layers of hidden variables.
New Learning Algorithm

Posterior inference (conditional): approximate the conditional
distribution over the hidden variables. This is the data-dependent term.

Simulate from the model (unconditional): approximate the joint
distribution. This is the data-independent term. The two density
estimates are matched during learning.

Key idea of our approach:
• Data-dependent: variational inference, mean-field theory.
• Data-independent: stochastic approximation, MCMC based (Markov chain
Monte Carlo).

(Salakhutdinov, 2008; NIPS 2009)
Sampling from DBMs

Sampling from a two-hidden-layer DBM by running a Markov chain: randomly
initialize the state, repeatedly resample each layer given its neighbors,
and once the chain has mixed, take a sample.
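A minimal sketch of this chain for a two-hidden-layer DBM, assuming zero
biases for brevity: the middle layer is sampled from both its bottom-up
and top-down inputs, then the outer layers are refreshed:

```python
import numpy as np

sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def gibbs_chain(W1, W2, steps, rng):
    D, F1 = W1.shape
    F2 = W2.shape[1]
    v = (rng.random(D) < 0.5).astype(float)        # random initialization
    h2 = (rng.random(F2) < 0.5).astype(float)
    for _ in range(steps):
        # h1 depends on both v (bottom-up) and h2 (top-down).
        p_h1 = sigmoid(v @ W1 + W2 @ h2)
        h1 = (rng.random(F1) < p_h1).astype(float)
        p_v = sigmoid(W1 @ h1)                     # refresh visible layer
        v = (rng.random(D) < p_v).astype(float)
        p_h2 = sigmoid(h1 @ W2)                    # refresh top layer
        h2 = (rng.random(F2) < p_h2).astype(float)
    return v, h1, h2
```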
  
Stochastic Approximation

[Figure: at times t = 1, 2, 3, ..., the chain state x_t = (v, h1, h2) and
the parameters θ_t are updated in alternation.]

Update θ_t and x_t sequentially, where:

• Generate x_{t+1} by simulating from a Markov chain that leaves
P(·; θ_t) invariant (e.g. a Gibbs or Metropolis-Hastings sampler).

• Update θ_{t+1} by replacing the intractable model expectation with a
point estimate at x_{t+1}.

In practice we simulate several Markov chains in parallel.

(Robbins and Monro, Ann. Math. Stats, 1957; L. Younes, Probability Theory
1989; Tieleman, ICML 2008)
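A minimal sketch of the procedure for a plain RBM (this is essentially
persistent CD, my simplification of the slide's general algorithm): the
negative-phase statistics come from a persistent chain that carries over
between parameter updates, with a Robbins-Monro style decaying learning rate:

```python
import numpy as np

sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def gibbs_step(v, W, rng):
    # One transition that leaves P(.; theta_t) invariant.
    h = (rng.random(W.shape[1]) < sigmoid(v @ W)).astype(float)
    return (rng.random(W.shape[0]) < sigmoid(W @ h)).astype(float)

def train(data, W, n_steps, rng):
    chain_v = data[0].copy()                     # persistent chain state x_t
    for t in range(1, n_steps + 1):
        lr = 0.1 / t                             # decaying (Robbins-Monro) rate
        v = data[t % len(data)]
        chain_v = gibbs_step(chain_v, W, rng)    # generate x_{t+1}
        # Point estimates of the two expectations in the gradient.
        pos = np.outer(v, sigmoid(v @ W))
        neg = np.outer(chain_v, sigmoid(chain_v @ W))
        W += lr * (pos - neg)
    return W
```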
  	
  
Stochastic Approximation

The update rule decomposes into a true-gradient term plus a noise term:

    θ_{t+1} = θ_t + η_t ∇L(θ_t) + η_t ε_t.

Almost sure convergence guarantees as the learning rate η_t → 0 with
Σ_t η_t = ∞ and Σ_t η_t² < ∞.

Problem: for high-dimensional data, the energy landscape is highly
multimodal, so the Markov chain Monte Carlo sampler can get stuck in one
mode.

Key insight: the transition operator can be any valid transition
operator – Tempered Transitions, Parallel/Simulated Tempering.

Connections to the theory of stochastic approximation and adaptive MCMC.
Variational Inference

Posterior inference: approximate the intractable distribution P(h | v; θ)
with a simpler, tractable distribution q(h; μ).

Variational lower bound:

    log P(v; θ) ≥ Σ_h q(h; μ) log P(v, h; θ) + H(q).

Mean-field: minimize the KL divergence between the approximating and true
distributions with respect to the variational parameters μ.

(Salakhutdinov & Larochelle, AI & Statistics 2010)
Variational Inference

Approximate the intractable distribution P(h | v; θ) with a simpler,
tractable distribution q(h; μ), maximizing the variational lower bound.

Mean-Field: choose a fully factorized distribution

    q(h; μ) = Π_j q(h_j),  with  q(h_j = 1) = μ_j.

Variational inference: maximize the lower bound w.r.t. the variational
parameters μ. This yields nonlinear fixed-point equations, e.g. for a
two-hidden-layer DBM:

    μ¹_j = σ( Σ_i W¹_ij v_i + Σ_k W²_jk μ²_k ),
    μ²_k = σ( Σ_j W²_jk μ¹_j ).
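A minimal sketch of iterating these fixed-point equations for a
two-hidden-layer DBM (damping is omitted; the iteration count and
tolerance are my choices):

```python
import numpy as np

sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def mean_field(v, W1, W2, n_iters=50, tol=1e-6):
    mu1 = np.full(W1.shape[1], 0.5)            # q(h1_j = 1)
    mu2 = np.full(W2.shape[1], 0.5)            # q(h2_k = 1)
    for _ in range(n_iters):
        new_mu1 = sigmoid(v @ W1 + W2 @ mu2)   # bottom-up + top-down
        new_mu2 = sigmoid(new_mu1 @ W2)
        done = max(np.abs(new_mu1 - mu1).max(),
                   np.abs(new_mu2 - mu2).max()) < tol
        mu1, mu2 = new_mu1, new_mu2
        if done:
            break
    return mu1, mu2
```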
  
Variational Inference

Putting the two pieces together for fast inference and learning:

1. Variational inference: maximize the lower bound w.r.t. the variational
parameters μ (mean-field, data-dependent).

2. MCMC: apply stochastic approximation to update the model parameters θ
(Markov chain Monte Carlo, data-independent, unconditional simulation).

Learning can scale to millions of examples, with almost sure convergence
guarantees to an asymptotically stable point.
Good Generative Model?

Handwritten Characters

[Figure: side-by-side panels of handwritten characters, one simulated
from the model and one of real data.]
Good Generative Model?

MNIST Handwritten Digit Dataset
Deep Boltzmann Machine

Model P(image): trained on 25,000 characters (e.g. Sanskrit) from 50
alphabets around the world.

• 3,000 hidden variables
• 784 observed variables (28 by 28 images)
• Over 2 million parameters

A Bernoulli Markov Random Field.
Deep Boltzmann Machine

Conditional simulation: sample from P(image | partial image).

A Bernoulli Markov Random Field.
Handwriting Recognition

MNIST Dataset: 60,000 examples of 10 digits

  Learning Algorithm                        Error
  Logistic regression                       12.0%
  K-NN                                      3.09%
  Neural Net (Platt 2005)                   1.53%
  SVM (Decoste et al. 2002)                 1.40%
  Deep Autoencoder (Bengio et al. 2007)     1.40%
  Deep Belief Net (Hinton et al. 2006)      1.20%
  DBM                                       0.95%

Optical Character Recognition: 42,152 examples of 26 English letters

  Learning Algorithm                        Error
  Logistic regression                       22.14%
  K-NN                                      18.92%
  Neural Net                                14.62%
  SVM (Larochelle et al. 2009)              9.70%
  Deep Autoencoder (Bengio et al. 2007)     10.05%
  Deep Belief Net (Larochelle et al. 2009)  9.68%
  DBM                                       8.40%

Permutation-invariant version.
Deep Boltzmann Machine

A Gaussian-Bernoulli Markov Random Field with 12,000 latent variables,
modeling P(image) for 96 by 96 stereo-pair images (e.g. planes).

24,000 training images.
Generative Model of 3-D Objects

24,000 examples: 5 object categories, 5 different objects within each
category, 6 lighting conditions, 9 elevations, 18 azimuths.
3-D Object Recognition

  Learning Algorithm                     Error
  Logistic regression                    22.5%
  K-NN (LeCun 2004)                      18.92%
  SVM (Bengio & LeCun 2007)              11.6%
  Deep Belief Net (Nair & Hinton 2009)   9.0%
  DBM                                    7.2%

Permutation-invariant version.

[Figure: pattern completion examples.]
Learning Part-based Hierarchy

Combinations of edges at the lower layer compose into object parts at the
higher layer.

Trained from multiple classes (cars, faces, motorbikes, airplanes).

(Lee et al., ICML 2009)
Robust Boltzmann Machines

• Build more complex models that can deal with occlusions or structured
noise: a Gaussian RBM models clean faces, a binary RBM models occlusions
via an inferred binary pixel-wise mask, and Gaussian noise corrupts the
observed image.

Relates to Le Roux, Heess, Shotton, and Winn, Neural Computation 2011;
Eslami, Heess, Winn, CVPR 2012.

(Tang et al., CVPR 2012)
Robust Boltzmann Machines

[Figure: internal states of the RoBM during learning; inference on the
test subjects; comparison to other denoising algorithms.]
Spoken Query Detection

• 630-speaker TIMIT corpus: 3,696 training and 944 test utterances.
• 10 query keywords were randomly selected, and 10 examples of each
keyword were extracted from the training set.
• Goal: for each keyword, rank all 944 utterances based on the
utterance's probability of containing that keyword.
• Performance measure: the average equal error rate (EER).

  Learning Algorithm    AVG EER
  GMM Unsupervised      16.4%
  DBM Unsupervised      14.7%
  DBM (1% labels)       13.3%
  DBM (30% labels)      10.5%
  DBM (100% labels)     9.7%

[Figure: average EER vs. training ratio.]

(Yaodong Zhang et al., ICASSP 2012)
Learning Hierarchical Representations

Deep Boltzmann Machines: learning hierarchical structure in features:
edges, combinations of edges.

• Performs well in many application domains
• Combines bottom-up and top-down inference
• Fast inference: a fraction of a second
• Learning scales to millions of examples

So far: many examples, few categories.
Next: few examples, many categories – Transfer Learning.
Talk Roadmap

• Unsupervised Feature Learning
   – Restricted Boltzmann Machines (RBM)
   – Deep Belief Networks
   – Deep Boltzmann Machines (DBM)
• Transfer Learning with Deep Models
• Multimodal Learning
[Figure: a category tree: "animal" covers horse and cow; "vehicle" covers
car, van, and truck.]
One-shot Learning

"zarc"                "segway"

How can we learn a novel concept – a high-dimensional statistical object
– from few examples?
Learning from Few Examples

SUN database: classes sorted by frequency follow a long tail (car is
frequent; van, truck, and bus are progressively rarer).

Rare objects are similar to frequent objects.
Traditional Supervised Learning

Training: labeled examples of Segway and Motorcycle.

Test: What is this?
Learning to Transfer

Background knowledge: millions of unlabeled images – learn to transfer
knowledge.

Some labeled images: Bicycle, Dolphin, Elephant, Tractor.

Learn a novel concept from one example. Test: What is this?

This is a key problem in computer vision, speech perception, natural
language processing, and many other domains.
One-Shot Learning

Hierarchical Bayesian models place a hierarchical prior over categories:

  Level 3: {τ⁰, α⁰}
  Level 2: {μᵏ, τᵏ, αᵏ}   (e.g. Animal, Vehicle, ...)
  Level 1: {μᶜ, τᶜ}       (e.g. Cow, Horse, Sheep, Truck, Car)

The posterior probability of the parameters given the training data D is
proportional to the probability of the observed data given the parameters
times the prior probability of the weight vector W.

• Fei-Fei, Fergus, and Perona, TPAMI 2006
• E. Bart, I. Porteous, P. Perona, and M. Welling, CVPR 2007
• Miller, Matsakis, and Viola, CVPR 2000
• Sivic, Russell, Zisserman, Freeman, and Efros, CVPR 2008
Hierarchical-Deep Models

HD models compose hierarchical Bayesian models with deep networks, two
influential approaches from unsupervised learning.

Deep Networks (part-based hierarchy; Marr and Nishihara, 1978):
• learn multiple layers of nonlinearities;
• are trained in an unsupervised fashion -- unsupervised feature learning
-- with no need to rely on human-crafted input representations;
• labeled data is used only to slightly adjust the model for a specific
task.

Hierarchical Bayes (category-based hierarchy; Collins & Quillian, 1969):
• explicitly represents category hierarchies for sharing abstract
knowledge;
• explicitly identifies only a small number of parameters that are
relevant to the new concept being learned.
Motivation

Learning to transfer knowledge (hierarchical + deep), e.g. for "segway":

• Super-category: "A segway looks like a funny kind of vehicle."

• Higher-level features, or parts, shared with other classes:
   – wheel, handle, post

• Lower-level features:
   – edges, compositions of edges
Hierarchical Generative Model

A Hierarchical Latent Dirichlet Allocation model sits on top of a DBM:
the category tree ("animal": horse, cow; "vehicle": car, van, truck)
generates K topics over the DBM's top-level features, while the DBM
provides lower-level generic features (edges, combinations of edges)
over images.

(Salakhutdinov, Tenenbaum, Torralba, 2011)
Hierarchical Generative Model

Hierarchical organization of categories:
• expresses priors on the features that are typical of different kinds of
concepts;
• gives modular data-parameter relations.

Higher-level class-sensitive features:
• capture the distinctive perceptual structure of a specific concept.

Lower-level generic features:
• edges, combinations of edges.

(Salakhutdinov, Tenenbaum, Torralba, 2011)
Intuition

Think of the stack as LDA on top of a DBM:

• Words ⇔ activations of the DBM's top-level units.
• Topics ⇔ distributions over top-level units, or higher-level parts.
• Documents ⇔ images.

Each topic is made up of words; each document is made up of topics, with
Pr(topic | doc) and Pr(word | topic) as in standard LDA (variables π, z,
θ, w and hyperparameter α).
Hierarchical Deep Model

[Figure: the category tree ("animal", "vehicle" over horse, cow, car,
van, truck) generates K topics over the DBM's top-level units.]
Hierarchical Deep Model

The tree hierarchy of classes is learned, using a Nested Chinese
Restaurant Process (nCRP) prior: a nonparametric prior over tree
structures.
Hierarchical Deep Model

The tree hierarchy of classes is learned, using a Nested Chinese
Restaurant Process (nCRP) prior: a nonparametric prior over tree
structures.

A Hierarchical Dirichlet Process (HDP) prior: a nonparametric prior
allowing categories to share higher-level features, or parts.
Hierarchical	
  Deep	
  Model	
  
                                              Tree	
  hierarchy	
  of	
  	
  
                                              classes	
  is	
  learned	
  
“animal”	
                                        “vehicle”	
  
                                                                               Unlike	
  standard	
  sta:s:cal	
  models,	
  	
  
                                                                                                 	
        	
  (Nested	
  Chinese	
  Restaurant	
  Process)	
  
                                                                               in	
  addi:on	
  ntonparametric	
  prior	
  arameters,	
  
                                                                                        prior:	
  a	
  
                                                                                        structures	
  
                                                                                                           o	
  inferring	
  p over	
  tree	
  
                                                                               we	
  also	
  infer	
  the	
  hierarchy	
  for	
  
                                                                                                      	
        	
  (Hierarchical	
  Dirichlet	
  Process)	
  prior:	
  
                                                                               sharing	
  nonparametric	
  prior	
  allowing	
  categories	
  to	
  
                                                                                K         a	
   those	
  parameters.	
  
                                                                           Topics	
           share	
  higher-­‐level	
  features,	
  or	
  parts.	
  

                                                                                                                  Condi;onal	
  Deep	
  Boltzmann	
  
   	
  horse	
   	
  cow	
     	
  car	
     	
  van	
     	
  truck	
  
                                                                                                                  Machine.	
  

                                                                                  Enforce	
  (approximate)	
  global	
  consistency	
  	
  
                                                                                  through	
  many	
  local	
  constraints.	
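As a toy illustration of what a nested Chinese Restaurant Process prior over tree structures looks like, the sketch below (function name, tree depth, and concentration parameter gamma are illustrative choices, not from the talk) samples a fixed-depth path for each category; popular branches attract more categories, while gamma controls how often new branches open:

    import numpy as np

    def ncrp_paths(n_items, depth, gamma, rng):
        """Sample one tree path per item from a nested CRP prior.
        At each level an item joins an existing branch with probability
        proportional to how many items already chose it, or opens a new
        branch with probability proportional to gamma."""
        counts = {}                      # path prefix -> {child id: count}
        paths = []
        for _ in range(n_items):
            path = ()
            for _ in range(depth):
                children = counts.setdefault(path, {})
                total = sum(children.values()) + gamma
                probs = [c / total for c in children.values()] + [gamma / total]
                k = rng.choice(len(probs), p=probs)
                keys = list(children.keys())
                child = keys[k] if k < len(keys) else len(keys)  # new branch
                children[child] = children.get(child, 0) + 1
                path += (child,)
            paths.append(path)
        return paths

    rng = np.random.default_rng(0)
    print(ncrp_paths(n_items=10, depth=3, gamma=1.0, rng=rng))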
  	
  
CIFAR Object Recognition

50,000 images of 100 classes, plus 4 million unlabeled images; 32 x 32 pixels x 3 RGB.

Tree hierarchy of classes is learned.

[Figure: super-classes "animal" and "vehicle" over horse, cow, car, van, truck; higher-level class-sensitive features sit on top of lower-level generic features.]

Inference: Markov chain Monte Carlo (later!).
  
Learning to Learn

The model learns how to share the knowledge across many visual categories.

[Figure: learned super-class hierarchy. A "global" root spans super-classes "aquatic animal", "fruit", and "human", which group basic-level classes (dolphin, turtle, shark, ray; apple, orange, sunflower; girl, baby, man, woman). These sit on learned higher-level class-sensitive features, which in turn sit on learned low-level generic features.]
  
Learning to Learn

The model learns how to share the knowledge across many visual categories.

[Figure: the full learned "global" super-class hierarchy over the 100 classes, including crocodile, spider, snake, lizard, squirrel, kangaroo, leopard, fox, tiger, lion, wolf, otter, skunk, shrew, porcupine, bear; dolphin, whale, ray, shark, turtle; apple, pear, pepper, orange, sunflower; girl, baby, boy, man, woman; castle, road, bridge, skyscraper, bus, house, truck, train, tank, tractor, streetcar; pine, oak, maple tree, willow tree; bottle, can, lamp, bowl, cup; camel, elephant, cattle, chimpanzee, beaver, mouse, raccoon, hamster, rabbit, possum. As before, the classes sit on learned higher-level class-sensitive features over learned low-level generic features.]
  
Sharing Features

[Figure: for Dolphin, Sunflower, Orange, and Apple, real images shown next to model reconstructions and the learned shape and color components, comparing plain learning with Learning to Learn.]

[Figure: Sunflower ROC curve (detection rate vs. false alarm rate) given 1, 3, or 5 examples, compared against pixel-space distance.]

Learning to Learn: learning a hierarchy for sharing parameters enables rapid learning of a novel concept.
  
Object Recognition

[Figure: area under ROC curve for same/different (1 new class vs. 99 distractor classes), averaged over 40 test classes, as a function of the number of examples (1, 3, 5, 10, 50). HDP-DBM outperforms HDP-DBM (no super-classes), DBM, LDA (class conditional), and GIST, with AUROC between roughly 0.65 and 1.]

Our model outperforms standard computer vision features (e.g. GIST).
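For reference, the same/different AUROC used here can be computed as below; the scores and class sizes are synthetic placeholders, not the talk's data:

    import numpy as np
    from sklearn.metrics import roc_auc_score

    rng = np.random.default_rng(0)
    # 1 = image belongs to the single novel class, 0 = one of 99 distractors.
    labels = np.concatenate([np.ones(50), np.zeros(50 * 99)])
    # Hypothetical model scores: higher should mean "same as the novel class".
    scores = np.concatenate([rng.normal(1.0, 1.0, 50),
                             rng.normal(0.0, 1.0, 50 * 99)])
    print("AUROC:", roc_auc_score(labels, scores))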
  
Handwritten Character Recognition

[Figure: hierarchy learned over 25,000 characters. Super-classes "alphabet 1" and "alphabet 2" group char 1 through char 5; the learned higher-level features resemble strokes, and the learned lower-level features resemble edges.]
  
Handwritten Character Recognition

[Figure: area under ROC curve for same/different (1 new class vs. 1000 distractor classes), averaged over 40 test classes, as a function of the number of examples (1, 3, 5, 10). HDP-DBM outperforms HDP-DBM (no super-classes), DBM, LDA (class conditional), and raw pixels.]
  
Simulating New Characters

[Figure: hierarchy with a Global root over Super class 1 and Super class 2, which span Class 1, Class 2, and a New class; real data within the super class is shown alongside simulated new characters.]
  
Learning from very few examples

[Figure: 3 examples of a new class, conditional samples in the same class, and the inferred super-class.]
  
Hierarchical-Deep

So far we have considered directed + undirected models.

[Figure: the hierarchical-deep model as a directed tree ("animal" and "vehicle" over horse, cow, car, van, truck, sharing K Topics) stacked on a deep undirected component.]

Deep Lambertian Networks

Combines the elegant properties of the Lambertian model with Gaussian RBMs (and Deep Belief Nets, Deep Boltzmann Machines).

Low-level features: replace GIST, SIFT.

Tang et al., ICML 2012
  
Deep Lambertian Networks

Model Specifics

[Figure: the Deep Lambertian Net decomposes an observed image into inferred surface normals, albedo, and light source.]

Inference: Gibbs sampler.
Learning: Stochastic Approximation.
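The Lambertian reflectance model at the heart of this network is simple to state: each pixel's intensity is its albedo times the clamped dot product of its surface normal with the light direction. A minimal sketch (the toy albedo, normals, and light below are illustrative, not the model's learned variables):

    import numpy as np

    def lambertian_render(albedo, normals, light):
        """Render a Lambertian image: intensity = albedo * max(0, n . l).
        albedo: (H, W); normals: (H, W, 3) unit vectors; light: (3,)."""
        shading = np.clip(normals @ light, 0.0, None)   # per-pixel n . l
        return albedo * shading

    # Toy example: a flat patch facing the camera, lit obliquely.
    H, W = 4, 4
    albedo = np.full((H, W), 0.8)
    normals = np.zeros((H, W, 3)); normals[..., 2] = 1.0
    light = np.array([0.3, 0.3, 0.9]); light /= np.linalg.norm(light)
    print(lambertian_render(albedo, normals, light))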
  
Deep Lambertian Networks

Yale B Extended Database

[Figure: results given one test image and two test images; face relighting examples.]
  
Deep Lambertian Networks

Recognition as a function of the number of training images for 10 test subjects.
  	
  




One-Shot Recognition

Recursive Neural Networks

Recursive structure learning
  	
  




Local recursive networks are making predictions whether to merge the two inputs, as well as predicting the label.

Use Max-Margin Estimation.

Socher et al., ICML 2011
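A hedged sketch of the basic merge step in this style of recursive network, following the general form in Socher et al. (the weights below are random placeholders; in the real model they are trained with the max-margin objective):

    import numpy as np

    def merge(c1, c2, W, b, w_score):
        """One merge step: parent = tanh(W [c1; c2] + b), plus a scalar
        score for how plausible the merge is (used to decide which
        adjacent pair of inputs to merge next)."""
        parent = np.tanh(W @ np.concatenate([c1, c2]) + b)
        score = float(w_score @ parent)
        return parent, score

    d = 8
    rng = np.random.default_rng(0)
    W = rng.normal(scale=0.1, size=(d, 2 * d))
    b = np.zeros(d)
    w_score = rng.normal(scale=0.1, size=d)
    c1, c2 = rng.normal(size=d), rng.normal(size=d)  # child representations
    parent, score = merge(c1, c2, W, b, w_score)
    print(score)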
  
  
Learning from Few Examples

SUN database

[Figure: object classes sorted by frequency; car is frequent, while van, truck, and bus lie in the long tail.]

Rare objects are similar to frequent objects.
  
Learning from Few Examples

[Figure: classes sorted by frequency; chair is frequent, while armchair, swivel chair, and deck chair are rare.]
  
Generative Model of Classifier Parameters

Many state-of-the-art object detection systems use sophisticated models, based on multiple parts with separate appearance and shape components.

Detect objects by testing sub-windows and scoring the corresponding test patches with a linear function.

Define a hierarchical prior over the parameters of the discriminative model and learn the hierarchy.

Image specific: concatenation of the HOG feature pyramid at multiple scales.

Felzenszwalb, McAllester & Ramanan, 2008
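The "score every sub-window with a linear function" step can be sketched as a dense cross-correlation between a feature map and a learned template; the toy feature map and 31-dimensional cells below are illustrative placeholders that loosely echo HOG, not the actual detection pipeline:

    import numpy as np

    def score_windows(feat, w):
        """Score every sub-window of a feature map with a linear template.
        feat: (H, W, D) feature map (e.g. one level of a feature pyramid);
        w: (h, w_, D) template. Returns an (H-h+1, W-w_+1) score map."""
        h, w_, _ = w.shape
        H, W, _ = feat.shape
        scores = np.empty((H - h + 1, W - w_ + 1))
        for i in range(scores.shape[0]):
            for j in range(scores.shape[1]):
                scores[i, j] = np.sum(feat[i:i + h, j:j + w_] * w)
        return scores

    rng = np.random.default_rng(0)
    feat = rng.normal(size=(20, 20, 31))   # toy "HOG-like" features
    w = rng.normal(size=(6, 6, 31))        # hypothetical learned template
    print(score_windows(feat, w).shape)    # (15, 15) detection score map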
  
Generative Model of Classifier Parameters

Hierarchical Bayes

By learning hierarchical structure, we can improve the current state-of-the-art.

SUN Dataset: 32,855 examples of 200 categories (some classes with 185 examples, others with only 27 or 12).
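One way to read the hierarchical-Bayes sharing, sketched here under simple Gaussian assumptions of my own choosing (not the paper's exact model): each class's weight vector is shrunk toward its super-class mean, and classes with few examples (e.g. 12 vs. 185) are shrunk hardest:

    import numpy as np

    def shrink_to_superclass(w_ml, n_examples, sigma2_class, sigma2_super):
        """Posterior-mean estimate under a normal-normal hierarchy:
        rare classes borrow more strength from the shared super-class mean."""
        w_ml = np.asarray(w_ml, dtype=float)           # (C, D) per-class ML weights
        n = np.asarray(n_examples, dtype=float)[:, None]
        w_super = w_ml.mean(axis=0)                    # super-class mean
        lam = (n / sigma2_class) / (n / sigma2_class + 1.0 / sigma2_super)
        return lam * w_ml + (1.0 - lam) * w_super

    w_ml = np.array([[2.0, 0.0], [1.8, 0.2], [3.0, -1.0]])
    print(shrink_to_superclass(w_ml, n_examples=[185, 27, 12],
                               sigma2_class=1.0, sigma2_super=0.5))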
  
[Figure: detections from the Hierarchical Model compared with a Single Class model.]
  
Talk Roadmap

•  Unsupervised Feature Learning
   –  Restricted Boltzmann Machines
   –  Deep Belief Networks
   –  Deep Boltzmann Machines
•  Transfer Learning with Deep Models
•  Multimodal Learning
  
Multi-Modal Input

Learning systems that combine multiple input domains: Images, Text & Language, Video, Laser scans, Speech & Audio, Time series data.

Develop learning systems that come closer to displaying human-like intelligence.

One of the Key Challenges: Inference.
  
Multi-Modal Input

Learning systems that combine multiple input domains (Image + Text) give more robust perception.

Ngiam et al., ICML 2011 used deep autoencoders (video + speech).

• Guillaumin, Verbeek, and Schmid, CVPR 2011
• Huiskes, Thomee, and Lew, Multimedia Information Retrieval, 2010
• Xing, Yan, and Hauptmann, UAI 2005
  	
  
Training Data

[Figure: sample images paired with their user tags.]

pentax, k10d, kangarooisland, southaustralia, sa, australia, australiansealion, 300mm

camera, jahdakine, lightpainting, reflection, doublepaneglass, wowiekazowie

sandbanks, lake, lakeontario, sunset, walking, beach, purple, sky, water, clouds, overtheexcellence

top20butterflies

mickikrimmel, mickipedia, headshot

<no text>

Samples from the MIR Flickr Dataset - Creative Commons License
  
Multi-Modal Input

•  Improve Classification:
   pentax, k10d, kangarooisland, southaustralia, sa, australia, australiansealion, 300mm → SEA / NOT SEA

•  Fill in Missing Modalities:
   image → beach, sea, surf, strand, shore, wave, seascape, sand, ocean, waves

•  Retrieve data from one modality when queried using data from another modality:
   beach, sea, surf, strand, shore, wave, seascape, sand, ocean, waves ↔ images
  
Multi-Modal Deep Belief Net

[Figure: a Gaussian RBM models the dense image input; a Replicated Softmax models the sparse text counts.]
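A minimal sketch of the two input pathways' hidden-unit inference, assuming the standard Gaussian RBM and Replicated Softmax energy functions (the shapes and random weights below are placeholders; in the model the pathways are learned and joined by higher layers):

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def text_hidden(v_counts, W_txt, b_hid):
        """Replicated Softmax pathway: hidden activations given word counts.
        The hidden bias is scaled by the document length N."""
        N = v_counts.sum()
        return sigmoid(v_counts @ W_txt + N * b_hid)

    def image_hidden(v_pixels, W_img, b_hid, sigma=1.0):
        """Gaussian RBM pathway: hidden activations given real-valued input."""
        return sigmoid((v_pixels / sigma) @ W_img + b_hid)

    rng = np.random.default_rng(0)
    K, D, H = 2000, 3072, 512                  # vocab, pixels, hidden units
    counts = rng.poisson(0.01, size=K)         # sparse tag counts
    pixels = rng.normal(size=D)                # dense image input
    h_txt = text_hidden(counts, rng.normal(scale=0.01, size=(K, H)), np.zeros(H))
    h_img = image_hidden(pixels, rng.normal(scale=0.01, size=(D, H)), np.zeros(H))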
  
Multi-Modal Deep Belief Net

•  Flickr Data: 1 million images along with text tags; 25K annotated.
Recognition Results

•  Multimodal Inputs (images + text), 38 classes:

   Learning Algorithm    Mean Average Precision
   Image-text SVM        0.475
   Image-text LDA        0.492
   Multimodal DBN        0.566

•  Unimodal Inputs (images only):

   Learning Algorithm    Mean Average Precision
   Image-SVM             0.375
   Image-LDA             0.315
   Image-DBN             0.413
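Mean average precision, as reported in these tables, averages the per-class average precision over the 38 labels; a small synthetic sketch (labels and scores are random placeholders, not the Flickr data):

    import numpy as np
    from sklearn.metrics import average_precision_score

    rng = np.random.default_rng(0)
    n_images, n_classes = 1000, 38
    y_true = rng.integers(0, 2, size=(n_images, n_classes))   # binary labels
    y_score = y_true + rng.normal(scale=1.5, size=(n_images, n_classes))
    mean_ap = np.mean([average_precision_score(y_true[:, c], y_score[:, c])
                       for c in range(n_classes)])
    print("Mean Average Precision:", round(mean_ap, 3))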
  
Pattern Completion

Given a test image, we generate the associated text, achieving far better classification results.
  	
  	
  
Thank you

Code for learning RBMs, DBNs, and DBMs is available at:
http://www.mit.edu/~rsalakhu/
  	
  

 
Activity 2-unit 2-update 2024. English translation
Activity 2-unit 2-update 2024. English translationActivity 2-unit 2-update 2024. English translation
Activity 2-unit 2-update 2024. English translationRosabel UA
 
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...JojoEDelaCruz
 
How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17Celine George
 
Concurrency Control in Database Management system
Concurrency Control in Database Management systemConcurrency Control in Database Management system
Concurrency Control in Database Management systemChristalin Nelson
 
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Celine George
 

Último (20)

Barangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptxBarangay Council for the Protection of Children (BCPC) Orientation.pptx
Barangay Council for the Protection of Children (BCPC) Orientation.pptx
 
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdfInclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
Inclusivity Essentials_ Creating Accessible Websites for Nonprofits .pdf
 
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
Visit to a blind student's school🧑‍🦯🧑‍🦯(community medicine)
 
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITYISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
ISYU TUNGKOL SA SEKSWLADIDA (ISSUE ABOUT SEXUALITY
 
4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx4.18.24 Movement Legacies, Reflection, and Review.pptx
4.18.24 Movement Legacies, Reflection, and Review.pptx
 
Karra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptxKarra SKD Conference Presentation Revised.pptx
Karra SKD Conference Presentation Revised.pptx
 
ICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdfICS2208 Lecture6 Notes for SL spaces.pdf
ICS2208 Lecture6 Notes for SL spaces.pdf
 
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxINTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
 
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice great
 
Food processing presentation for bsc agriculture hons
Food processing presentation for bsc agriculture honsFood processing presentation for bsc agriculture hons
Food processing presentation for bsc agriculture hons
 
Music 9 - 4th quarter - Vocal Music of the Romantic Period.pptx
Music 9 - 4th quarter - Vocal Music of the Romantic Period.pptxMusic 9 - 4th quarter - Vocal Music of the Romantic Period.pptx
Music 9 - 4th quarter - Vocal Music of the Romantic Period.pptx
 
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptxLEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
 
ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4ANG SEKTOR NG agrikultura.pptx QUARTER 4
ANG SEKTOR NG agrikultura.pptx QUARTER 4
 
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptxYOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
YOUVE GOT EMAIL_FINALS_EL_DORADO_2024.pptx
 
Activity 2-unit 2-update 2024. English translation
Activity 2-unit 2-update 2024. English translationActivity 2-unit 2-update 2024. English translation
Activity 2-unit 2-update 2024. English translation
 
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...
 
How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17
 
Concurrency Control in Database Management system
Concurrency Control in Database Management systemConcurrency Control in Database Management system
Concurrency Control in Database Management system
 
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
Incoming and Outgoing Shipments in 3 STEPS Using Odoo 17
 

Deep Learning Structure Discovery

  • 6. Model Learning. Given a set of i.i.d. training examples, we want to learn the model parameters (W, a, b). Maximize the (penalized) log-likelihood objective. The derivative of the log-likelihood contrasts a data-dependent expectation with a model expectation, d log P(v)/dW = E_data[v h^T] - E_model[v h^T], plus a regularization term. The model expectation is difficult to compute: it runs over exponentially many configurations.
  • 7. Model Learning. Approximate maximum likelihood learning sidesteps the intractable model expectation: Contrastive Divergence (Hinton 2000), Pseudo-Likelihood (Besag 1977), MCMC-MLE estimator (Geyer 1991), Composite Likelihoods (Lindsay 1988; Varin 2008), Tempered MCMC (Salakhutdinov, NIPS 2009), Adaptive MCMC (Salakhutdinov, ICML 2010). A contrastive-divergence sketch follows this slide.
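
To make the contrastive-divergence option concrete, here is a minimal numpy sketch of one CD-1 update for a binary RBM; the function and variable names are illustrative, not taken from any released code.

import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(V, W, b, c, lr=0.05):
    # Positive phase: P(h|v) factorizes, so all hidden units are sampled at once.
    ph = sigmoid(V @ W + c)
    h = (rng.random(ph.shape) < ph).astype(float)
    # Negative phase: one block-Gibbs step down to the visibles and back up.
    pv = sigmoid(h @ W.T + b)
    v_neg = (rng.random(pv.shape) < pv).astype(float)
    ph_neg = sigmoid(v_neg @ W + c)
    # CD-1 gradient: E_data[v h^T] minus a one-step approximation of E_model[v h^T].
    n = V.shape[0]
    W += lr * (V.T @ ph - v_neg.T @ ph_neg) / n
    b += lr * (V - v_neg).mean(axis=0)
    c += lr * (ph - ph_neg).mean(axis=0)
    return W, b, c

# Toy run: 32 binary training vectors, 6 visible and 4 hidden units.
W = 0.01 * rng.standard_normal((6, 4))
b, c = np.zeros(6), np.zeros(4)
V = (rng.random((32, 6)) < 0.5).astype(float)
W, b, c = cd1_update(V, W, b, c)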
  • 8. RBMs for Images. Gaussian-Bernoulli RBM: define energy functions for various data modalities; here the visible image variables are Gaussian and the hidden variables are Bernoulli.
  • 9. RBMs for Images. Gaussian-Bernoulli RBM, interpretation: a mixture of an exponential number of Gaussians, where P(h) acts as an implicit prior over mixture components and each component is Gaussian (conditionals sketched below).
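
The two conditionals that make this model practical are standard; a minimal sketch, assuming per-unit standard deviations sigma (names illustrative):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gb_h_given_v(v, W, c, sigma):
    # P(h_j = 1 | v): real-valued visibles are scaled by their standard
    # deviations before driving the binary hidden units.
    return sigmoid(c + (v / sigma) @ W)

def gb_sample_v_given_h(h, W, b, sigma, rng):
    # v | h is Gaussian with mean b + sigma * (h W^T) and independent
    # per-unit noise of standard deviation sigma.
    mean = b + sigma * (h @ W.T)
    return mean + sigma * rng.standard_normal(mean.shape)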
  • 10. RBMs for Images and Text. Images: Gaussian-Bernoulli RBM trained on 4 million unlabelled images; learned features (out of 10,000). Text: Multinomial-Bernoulli RBM over bag-of-words input, trained on the Reuters dataset of 804,414 unlabeled newswire stories; learned features act as "topics", e.g. (russian, russia, moscow, yeltsin, soviet), (clinton, house, president, bill, congress), (computer, system, product, software, develop), (trade, country, import, world, economy), (stock, wall, street, point, dow).
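
The multinomial text model used here is of the Replicated Softmax family (named explicitly on slide 110); a sketch of its two directions, assuming counts is a documents-by-vocabulary matrix of word counts:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def rsm_h_given_counts(counts, W, c):
    # The hidden bias is scaled by document length, so documents with
    # different numbers of words drive the hidden units comparably.
    doc_len = counts.sum(axis=1, keepdims=True)
    return sigmoid(counts @ W + doc_len * c)

def rsm_word_dist_given_h(h, W, b):
    # P(word = k | h): a single softmax over the vocabulary, shared
    # ("replicated") across all word positions in the document.
    logits = b + h @ W.T
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)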
  • 11. Collaborative Filtering. Bernoulli hidden units encode user preferences; multinomial visible units encode user ratings. Learned features correspond to "genre": one feature groups Fahrenheit 9/11, Bowling for Columbine, The People vs. Larry Flynt, Canadian Bacon, La Dolce Vita; another Independence Day, The Day After Tomorrow, Con Air, Men in Black II, Men in Black; another Friday the 13th, The Texas Chainsaw Massacre, Children of the Corn, Child's Play, The Return of Michael Myers; another Scary Movie, Naked Gun, Hot Shots!, American Pie, Police Academy. Netflix dataset: 480,189 users, 17,770 movies, over 100 million ratings; state-of-the-art performance on the Netflix dataset. Relates to Probabilistic Matrix Factorization (Salakhutdinov & Mnih, ICML 2007).
  • 12. Multiple Application Domains. Natural images; text/documents; collaborative filtering / matrix factorization; video (Langford et al. ICML 2009, Lee et al.); motion capture (Taylor et al. NIPS 2007); speech perception (Dahl et al. NIPS 2010, Lee et al. NIPS 2010). The same learning algorithm works across multiple input domains. But there are limitations on the types of structure that can be represented by a single layer of low-level features!
  • 13. Talk Roadmap (repeated). Next: Deep Belief Networks.
  • 14. Deep Belief Network. Low-level features (edges) built from unlabeled inputs; input: the pixels of the image.
  • 15. Deep Belief Network. Unsupervised feature learning: internal representations capture higher-order statistical structure. Higher-level features are combinations of edges; low-level features (edges) are built from unlabeled inputs; input: pixels. (Hinton et al., Neural Computation 2006)
  • 16. Deep Belief Network. The joint probability distribution factorizes into a directed part and an undirected top: P(v, h1, h2, h3) = P(v | h1) P(h1 | h2) P(h2, h3), where P(h2, h3) is an RBM and the conditionals below it form a sigmoid belief network.
  • 17. DBNs for Classification. [Figure: stacked RBMs (500-500-2000 hidden units) pretrained layer by layer, unrolled, and fine-tuned with a 10-way softmax output.] After layer-by-layer unsupervised pretraining, discriminative fine-tuning by backpropagation achieves an error rate of 1.2% on MNIST; SVMs get 1.4% and randomly initialized backprop gets 1.6%. Clearly unsupervised learning helps generalization: it ensures that most of the information in the weights comes from modeling the input data. A pretraining-loop sketch follows this slide.
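
A sketch of the greedy layer-by-layer pretraining loop: each RBM is trained on the hidden activities of the one below it. Here train_rbm is an assumed stand-in for any single-RBM trainer, e.g. repeated calls to the CD-1 updater sketched earlier.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def pretrain_stack(data, layer_sizes, train_rbm):
    # train_rbm(inputs, n_hidden) -> (W, b, c) is assumed, not defined here.
    params, inputs = [], data
    for n_hidden in layer_sizes:
        W, b, c = train_rbm(inputs, n_hidden)   # fit one RBM greedily
        params.append((W, b, c))
        inputs = sigmoid(inputs @ W + c)        # up-pass becomes the next "data"
    return params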
  • 18. Deep Autoencoders. [Figure: stacked RBMs are pretrained layer by layer, unrolled into a symmetric encoder-decoder with a central low-dimensional code layer, and fine-tuned: pretraining, unrolling, fine-tuning.]
  • 19. Deep Generative Model. Model P(document) over bag-of-words input: Reuters dataset, 804,414 newswire stories, unsupervised. [Figure: 2-D code space with story clusters labeled European Community Monetary/Economic, Interbank Markets, Energy Markets, Disasters and Accidents, Leading Economic Indicators, Legal/Judicial, Government Borrowings, Accounts/Earnings.] (Hinton & Salakhutdinov, Science 2006)
  • 20. Information Retrieval. [Figure: the same clusters in a 2-D LSA space.] The Reuters Corpus Volume II contains 804,414 newswire stories (randomly split into 402,207 training and 402,207 test). "Bag-of-words": each article is represented as a vector containing the counts of the most frequently used 2000 words in the training set.
  • 21. Semantic Hashing. Learn to map documents into semantic 20-D binary codes, so that semantically similar documents land at nearby addresses; retrieve similar documents stored at the nearby addresses with no search at all. A retrieval sketch follows this slide.
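
A sketch of retrieval with 20-bit semantic codes. The codes below are random stand-ins for the thresholded code-layer activities of a trained model, and the Hamming-ball lookup is written as a scan for clarity; the point of semantic hashing is that the codes can instead be used directly as memory addresses.

import numpy as np

rng = np.random.default_rng(1)

# Stand-in codes: in the real system, threshold the 20 code units at 0.5.
codes = (rng.random((100_000, 20)) < 0.5).astype(np.uint8)

def hamming_ball(codes, query, radius=2):
    # Documents whose codes differ from the query in at most `radius` bits.
    dists = (codes != query).sum(axis=1)
    return np.flatnonzero(dists <= radius)

similar_docs = hamming_ball(codes, codes[0])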
  • 22. Searching Large Image Databases using Binary Codes. Map images into binary codes for fast retrieval: Small Codes (Torralba, Fergus, Weiss, CVPR 2008); Spectral Hashing (Weiss, Torralba, Fergus, NIPS 2008); Kulis and Darrell (NIPS 2009); Gong and Lazebnik (CVPR 2011); Norouzi and Fleet (ICML 2011).
  • 23. Talk Roadmap (repeated). Next: Deep Boltzmann Machines.
  • 24. DBNs vs. DBMs. DBNs are hybrid models: inference in DBNs is problematic due to explaining away; there is only greedy pretraining, no joint optimization over all layers; and approximate inference is feed-forward, with no bottom-up and top-down interaction. We introduce a new class of models called Deep Boltzmann Machines.
  • 25. Mathematical Formulation. Deep Boltzmann Machine: dependencies between hidden variables; all connections are undirected; inference combines bottom-up and top-down signals from the input. Unlike many existing feed-forward models: ConvNet (LeCun), HMAX (Poggio et al.), Deep Belief Nets (Hinton et al.).
  • 26. Mathematical Formulation. [Figure: side-by-side comparison of a feed-forward neural network, a Deep Boltzmann Machine, and a Deep Belief Network, each with three hidden layers.] Unlike many existing feed-forward models: ConvNet (LeCun), HMAX (Poggio), Deep Belief Nets (Hinton).
  • 27. Mathematical Formulation. [Same comparison, highlighting the direction of inference in each model.]
  • 28. Mathematical Formulation. Maximum likelihood learning uses the standard rule for undirected graphical models (MRFs, CRFs, factor graphs): the gradient is a data-dependent expectation minus a model expectation. Problem: in a DBM both expectations are intractable!
  • 29. Previous Work. Many approaches for learning Boltzmann machines have been proposed over the last 20 years: Hinton and Sejnowski (1983); Peterson and Anderson (1987); Galland (1991); Kappen and Rodriguez (1998); Lawrence, Bishop, and Jordan (1998); Tanaka (1998); Welling and Hinton (2002); Zhu and Liu (2002); Welling and Teh (2003); Yasuda and Tanaka (2009). Real-world applications involve thousands of hidden and observed variables with millions of parameters. Many of the previous approaches were not successful for learning general Boltzmann machines with hidden variables, and algorithms based on Contrastive Divergence, Score Matching, Pseudo-Likelihood, Composite Likelihood, MCMC-MLE, and Piecewise Learning cannot handle multiple layers of hidden variables.
  • 30. New Learning Algorithm. Two pieces: posterior inference (conditional), where we approximate the conditional; and simulation from the model (unconditional), where we approximate the joint distribution. (Salakhutdinov, 2008; NIPS 2009)
  • 31. New Learning Algorithm. Match the data-dependent density (approximate conditional) against the data-independent density (approximate joint distribution).
  • 32. New Learning Algorithm. Key idea of our approach: for the data-dependent term, variational inference (mean-field theory); for the data-independent term, stochastic approximation (Markov chain Monte Carlo based).
  • 33. Sampling from DBMs. Sampling from a two-hidden-layer DBM by running a Markov chain: randomly initialize, run Gibbs sweeps, then sample. A sketch of one sweep follows this slide.
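
A sketch of that Markov chain for a two-hidden-layer DBM with weights W1 and W2 (biases omitted for brevity): given h1, the visibles v and the top layer h2 are conditionally independent, so one sweep alternates between the two blocks.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def dbm_gibbs_step(v, h1, h2, W1, W2, rng):
    # h1 receives bottom-up input from v AND top-down input from h2.
    p1 = sigmoid(v @ W1 + h2 @ W2.T)
    h1 = (rng.random(p1.shape) < p1).astype(float)
    # Given h1, v and h2 are conditionally independent and sampled in parallel.
    pv = sigmoid(h1 @ W1.T)
    v = (rng.random(pv.shape) < pv).astype(float)
    p2 = sigmoid(h1 @ W2)
    h2 = (rng.random(p2.shape) < p2).astype(float)
    return v, h1, h2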
  • 34. Stochastic Approximation. Update the parameters and the sampler's state sequentially over time t = 1, 2, 3, ...: generate the next state by simulating from a Markov chain that leaves the current model's distribution invariant (e.g. a Gibbs or M-H sampler), then update the parameters by replacing the intractable model expectation with a point estimate at the sampled state. In practice we simulate several Markov chains in parallel. (Robbins and Monro, Ann. Math. Stats. 1957; L. Younes, Probability Theory 1989; Tieleman, ICML 2008.) A combined sketch follows this slide.
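
Continuing the sketch above, one stochastic-approximation step advances a set of persistent "fantasy" chains by one Gibbs sweep and uses them as the point estimate of the model expectation. For brevity, the data-dependent statistics below come from a crude one-step up-pass rather than the mean-field inference introduced on the following slides; that substitution is a simplification.

def sap_step(W1, W2, v_data, chains, rng, lr=0.01):
    # Data-dependent term: rough posterior guess (ignores top-down input).
    mu1 = sigmoid(v_data @ W1)
    mu2 = sigmoid(mu1 @ W2)
    # Data-independent term: advance the persistent chains one sweep,
    # using dbm_gibbs_step from the previous sketch.
    v, h1, h2 = dbm_gibbs_step(*chains, W1, W2, rng)
    n, m = v_data.shape[0], v.shape[0]
    W1 += lr * (v_data.T @ mu1 / n - v.T @ h1 / m)
    W2 += lr * (mu1.T @ mu2 / n - h1.T @ h2 / m)
    return W1, W2, (v, h1, h2)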
  • 35. Stochastic Approximation. The update rule decomposes into the true gradient plus a noise term, with almost sure convergence guarantees as the learning rate decays. Problem: for high-dimensional data the energy landscape is highly multimodal. Key insight: the transition operator can be any valid transition operator, e.g. tempered transitions or parallel/simulated tempering. Connections to the theory of stochastic approximation and adaptive MCMC.
  • 36. Variational Inference. Approximate the intractable posterior with a simpler, tractable distribution Q, giving a variational lower bound; minimize the KL divergence between the approximating and true distributions with respect to the variational parameters. (Salakhutdinov & Larochelle, AI & Statistics 2010)
  • 37. Variational Inference. Mean-field: choose a fully factorized distribution Q(h) with one variational parameter per hidden unit, and maximize the lower bound with respect to those parameters; this yields nonlinear fixed-point equations, iterated to convergence (sketch after this slide).
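
A sketch of those fixed-point iterations for the two-hidden-layer case (biases again omitted, sigmoid as defined in the sketches above); mu1 and mu2 hold the Bernoulli means q(h = 1):

def mean_field(v, W1, W2, n_iters=10):
    mu1 = sigmoid(v @ W1)                     # initialize with a bottom-up pass
    mu2 = sigmoid(mu1 @ W2)
    for _ in range(n_iters):
        mu1 = sigmoid(v @ W1 + mu2 @ W2.T)    # bottom-up plus top-down input
        mu2 = sigmoid(mu1 @ W2)
    return mu1, mu2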
  • 38. Variational Inference. The full procedure alternates: (1) variational inference, maximizing the lower bound with respect to the variational parameters; (2) MCMC, applying stochastic approximation to update the model parameters. Fast inference, learning that scales to millions of examples, and almost sure convergence guarantees to an asymptotically stable point.
  • 39-43. Good Generative Model? Handwritten characters. [Figures: panels of simulated samples from the model shown alongside real data, with the Real/Simulated labels swapped across slides.]
  • 44. Good Generative Model? MNIST handwritten digit dataset. [Figure: model samples.]
  • 45. Deep Boltzmann Machine. Model P(image) of handwritten characters, including Sanskrit: 25,000 characters from 50 alphabets around the world; 3,000 hidden variables; 784 observed variables (28 by 28 images); over 2 million parameters. A Bernoulli Markov random field.
  • 46. Deep Boltzmann Machine. Conditional simulation: sample from P(image | partial image) to complete missing parts. A Bernoulli Markov random field.
  • 47. Handwriting Recognition. MNIST dataset, 60,000 examples of 10 digits (error): logistic regression 12.0%; K-NN 3.09%; neural net (Platt 2005) 1.53%; SVM (Decoste et al. 2002) 1.40%; deep autoencoder (Bengio et al. 2007) 1.40%; Deep Belief Net (Hinton et al. 2006) 1.20%; DBM 0.95%. Optical character recognition, 42,152 examples of 26 English letters (error): logistic regression 22.14%; K-NN 18.92%; neural net 14.62%; SVM (Larochelle et al. 2009) 9.70%; deep autoencoder (Bengio et al. 2007) 10.05%; Deep Belief Net (Larochelle et al. 2009) 9.68%; DBM 8.40%. Permutation-invariant version.
  • 48. Deep Boltzmann Machine. A Gaussian-Bernoulli Markov random field modeling P(image) for 3-D objects: 96 by 96 stereo-pair images of planes and other objects, 12,000 latent variables, 24,000 training images.
  • 49. Generative Model of 3-D Objects. 24,000 examples: 5 object categories, 5 different objects within each category, 6 lighting conditions, 9 elevations, 18 azimuths.
  • 50. 3-D Object Recognition. Pattern completion. Error rates (permutation-invariant version): logistic regression 22.5%; K-NN (LeCun 2004) 18.92%; SVM (Bengio & LeCun 2007) 11.6%; Deep Belief Net (Nair & Hinton 2009) 9.0%; DBM 7.2%.
  • 51. Learning Part-Based Hierarchy. Object parts as combinations of edges, trained from multiple classes (cars, faces, motorbikes, airplanes). (Lee et al., ICML 2009)
  • 52. Robust Boltzmann Machines. Build more complex models that can deal with occlusions or structured noise: a Gaussian RBM models clean faces and a binary RBM models occlusions, tied to the observed image through an inferred binary pixel-wise mask and Gaussian noise. Relates to Le Roux, Heess, Shotton, and Winn (Neural Computation 2011); Eslami, Heess, Winn (CVPR 2012); Tang et al. (CVPR 2012).
  • 53. Robust Boltzmann Machines. [Figures: internal states of the RoBM during learning; inference on the test subjects; comparison to other denoising algorithms.]
  • 54. Spoken Query Detection. 630-speaker TIMIT corpus: 3,696 training and 944 test utterances. 10 query keywords were randomly selected and 10 examples of each keyword were extracted from the training set. Goal: for each keyword, rank all 944 utterances by the utterance's probability of containing that keyword. Performance measure: the average equal error rate (EER), computed as sketched below. Results: GMM unsupervised 16.4%; DBM unsupervised 14.7%; DBM (1% labels) 13.3%; DBM (30% labels) 10.5%; DBM (100% labels) 9.7%. (Yaodong Zhang et al., ICASSP 2012)
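
For reference, a simple way to compute an equal error rate from detection scores; this reports the operating point where false-alarm and miss rates are closest, a common approximation:

import numpy as np

def equal_error_rate(pos_scores, neg_scores):
    ts = np.sort(np.concatenate([pos_scores, neg_scores]))
    far = np.array([(neg_scores >= t).mean() for t in ts])  # false alarms
    frr = np.array([(pos_scores < t).mean() for t in ts])   # misses
    i = np.argmin(np.abs(far - frr))
    return (far[i] + frr[i]) / 2.0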
  • 55. Learning Hierarchical Representations. Deep Boltzmann Machines learn hierarchical structure in features: edges, then combinations of edges. They perform well in many application domains, combine bottom-up and top-down inference, run fast inference (a fraction of a second), and scale learning to millions of examples. So far: many examples, few categories. Next: few examples, many categories, i.e. transfer learning.
  • 56. Talk Roadmap (repeated). Next: Transfer Learning with Deep Models, e.g. grouping "horse" and "cow" under "animal", and "car", "van", "truck" under "vehicle".
  • 57. One-Shot Learning. "zarc", "segway": how can we learn a novel concept, a high-dimensional statistical object, from few examples?
  • 58. Learning from Few Examples. SUN database, classes sorted by frequency (car, van, truck, bus): rare objects are similar to frequent objects.
  • 59. Traditional Supervised Learning. Train separately on Segway and Motorcycle examples; test: what is this?
  • 60. Learning to Transfer. Background knowledge: millions of unlabeled images plus some labeled images (bicycle, dolphin, elephant, tractor). Learn to transfer knowledge, then learn a novel concept from one example. Test: what is this?
  • 61. Learning to Transfer. This is a key problem in computer vision, speech perception, natural language processing, and many other domains.
  • 62. One-Shot Learning. Hierarchical Bayesian models place a hierarchical prior over class parameters (level 1: cow, horse, sheep, truck, car; level 2: animal, vehicle; level 3: global), combining the prior probability of the weight vector W with the probability of the observed data given the parameters to obtain the posterior over parameters given the training data D. See Fei-Fei, Fergus, and Perona (TPAMI 2006); Bart, Porteous, Perona, and Welling (CVPR 2007); Miller, Matsakis, and Viola (CVPR 2000); Sivic, Russell, Zisserman, Freeman, and Efros (CVPR 2008).
  • 63. Hierarchical-Deep Models. HD models compose hierarchical Bayesian models with deep networks, two influential approaches from unsupervised learning. Deep networks learn multiple layers of nonlinearities and are trained in an unsupervised fashion (unsupervised feature learning, with no need to rely on human-crafted input representations); labeled data is used only to slightly adjust the model for a specific task. They give a part-based hierarchy (cf. Marr and Nishihara 1978). Hierarchical Bayes explicitly represents category hierarchies for sharing abstract knowledge and explicitly identifies only a small number of parameters that are relevant to the new concept being learned: a category-based hierarchy (cf. Collins & Quillian 1969).
  • 64. Motivation. Learning to transfer knowledge: a super-category ("a segway looks like a funny kind of vehicle"); higher-level features, or parts, shared with other classes (wheel, handle, post); lower-level features (edges, compositions of edges).
  • 65. Hierarchical Generative Model. A hierarchical latent Dirichlet allocation model over classes ("animal": horse, cow; "vehicle": car, van, truck) sits on top of a DBM model whose lower-level generic features (edges, combinations of edges) are extracted from images. (Salakhutdinov, Tenenbaum, Torralba, 2011)
  • 66. Hierarchical Generative Model. Hierarchical organization of categories: expresses priors on the features that are typical of different kinds of concepts, with modular data-parameter relations. Higher-level class-sensitive features capture the distinctive perceptual structure of a specific concept; lower-level generic features are edges and combinations of edges. (Salakhutdinov, Tenenbaum, Torralba, 2011)
  • 67. Intuition. In LDA terms (alpha, pi, z, theta, w): words correspond to activations of the DBM's top-level units; topics correspond to distributions over top-level units, i.e. higher-level parts; Pr(topic | doc) and Pr(word | topic) play their usual roles. The DBM supplies generic features of images; LDA supplies high-level features: each topic is made up of words, and each document is made up of topics. A toy sketch follows this slide.
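
To make the analogy concrete, a toy generative sketch in which one "document" (image) draws a topic mixture and each draw turns on a top-level DBM unit (a "word"); Phi and alpha are illustrative parameters, not values from the paper.

import numpy as np

def sample_active_units(Phi, alpha, n_draws, rng):
    # Phi[k] is topic k's distribution over top-level units; theta is the
    # per-image mixture over topics, drawn from a Dirichlet prior.
    theta = rng.dirichlet(alpha)
    active = set()
    for _ in range(n_draws):
        k = rng.choice(len(Phi), p=theta)                    # pick a topic
        active.add(int(rng.choice(Phi.shape[1], p=Phi[k])))  # pick a unit
    return sorted(active)

rng = np.random.default_rng(2)
Phi = rng.dirichlet(np.ones(50), size=4)   # 4 topics over 50 top-level units
print(sample_active_units(Phi, alpha=np.ones(4), n_draws=10, rng=rng))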
  • 68. Hierarchical Deep Model. [Figure: K topics over the DBM's top-level units, grouped under classes horse, cow, car, van, truck and super-classes "animal" and "vehicle".]
  • 69. Hierarchical Deep Model. The tree hierarchy of classes is learned, using a Nested Chinese Restaurant Process prior: a nonparametric prior over tree structures.
  • 70. Hierarchical Deep Model. The tree hierarchy of classes is learned, using a Nested Chinese Restaurant Process prior (a nonparametric prior over tree structures) together with a Hierarchical Dirichlet Process prior: a nonparametric prior allowing categories to share higher-level features, or parts.
  • 71. Hierarchical Deep Model. Unlike standard statistical models, in addition to inferring parameters we also infer the hierarchy for sharing those parameters, under the Nested Chinese Restaurant Process prior over tree structures and the Hierarchical Dirichlet Process prior that lets categories share higher-level features, or parts. The image layer is a conditional Deep Boltzmann Machine, which enforces (approximate) global consistency through many local constraints.
  • 72. CIFAR Object Recognition. 50,000 images of 100 classes (32 x 32 pixels x 3 RGB), plus 4 million unlabeled images. The tree hierarchy of classes is learned; higher-level class-sensitive features sit on lower-level generic features. Inference: Markov chain Monte Carlo.
  • 73. Learning to Learn. The model learns how to share the knowledge across many visual categories: a learned super-class hierarchy ("global" at the root, with "aquatic animal": dolphin, turtle, shark, ray; "fruit": apple, orange, sunflower; "human": girl, baby, man, woman), learned higher-level class-sensitive features, and learned low-level generic features.
  • 74. Learning to Learn. The model learns how to share the knowledge across many visual categories. [Figure: the learned super-class hierarchy over CIFAR-100 basic-level classes, including crocodile, spider, snake, lizard, squirrel, kangaroo, castle, road, bridge, skyscraper, bus, house, leopard, fox, tiger, lion, wolf, dolphin, turtle, shark, ray, whale, apple, orange, pear, pepper, sunflower, truck, train, tank, tractor, streetcar, otter, skunk, shrew, porcupine, pine, oak, maple tree, willow tree, bear, camel, bottle, elephant, cattle, can, lamp, bowl, cup, chimpanzee, beaver, mouse, raccoon, hamster, rabbit, possum, man, boy, girl, woman, with learned higher-level class-sensitive features and low-level generic features.]
  • 75. Sharing Features. [Figure: real images and reconstructions for dolphin, sunflower, orange, apple, with shape and color components shared within the "fruit" super-class; ROC curves for detecting "sunflower" from 1, 3, or 5 examples, compared against pixel-space distance.] Learning to learn: learning a hierarchy for sharing parameters gives rapid learning of a novel concept.
  • 76. Object Recognition. Area under the ROC curve for same/different (1 new class vs. 99 distractor classes) as a function of the number of examples (1, 3, 5, 10, 50), averaged over 40 test classes: HDP-DBM beats DBM (no super-classes), LDA (class-conditional), and GIST. Our model outperforms standard computer vision features (e.g. GIST).
  • 77. Handwritten Character Recognition. [Figure: learned hierarchy over 25,000 characters: higher-level features (strokes) shared across the characters of "alphabet 1" and "alphabet 2"; learned lower-level features (edges).]
  • 78. Handwritten Character Recognition. Area under the ROC curve for same/different (1 new class vs. 1000 distractor classes) as a function of the number of examples (1, 3, 5, 10), averaged over 40 test classes: HDP-DBM beats DBM (no super-classes), LDA (class-conditional), and raw pixels.
  • 79-85. Simulating New Characters. [Figures, repeated across several alphabets: real data within a super-class (global root, super-classes 1 and 2, classes 1 and 2, and a new class) shown alongside simulated new characters sampled from the model.]
  • 86. Learning from Very Few Examples. Given 3 examples of a new class, the model infers the super-class and generates conditional samples in the same class.
  • 87-94. Learning from Very Few Examples. [Figures: further conditional samples generated from 3 examples of a new class.]
  • 95. Hierarchical-Deep. So far we have considered directed + undirected models: a directed hierarchy ("animal"/"vehicle" topics over horse, cow, car, van, truck) on top of an undirected deep model whose low-level features replace GIST and SIFT. Next, Deep Lambertian Networks combine the elegant properties of the Lambertian model with Gaussian RBMs (and Deep Belief Nets, Deep Boltzmann Machines). (Tang et al., ICML 2012)
  • 96. Deep Lambertian Networks. Model specifics: the observed image is generated from inferred surface normals, albedo, and a light source under the Lambertian reflectance model (sketched below), with a Deep Lambertian Net prior over the unobserved quantities. Inference: Gibbs sampler. Learning: stochastic approximation.
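
The image-formation piece is the classical Lambertian reflectance equation: pixel intensity equals albedo times max(0, n . l). A minimal sketch:

import numpy as np

def lambertian_render(albedo, normals, light):
    # albedo: (n_pixels,), normals: (n_pixels, 3) unit vectors, light: (3,).
    shading = np.clip(normals @ light, 0.0, None)  # cosine of incidence angle
    return albedo * shading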
  • 97. Deep Lambertian Networks. Yale B Extended Database: face relighting from one or two test images.
  • 98. Deep Lambertian Networks. Recognition as a function of the number of training images for 10 test subjects; one-shot recognition.
  • 99. Recursive Neural Networks. Recursive structure learning: local recursive networks predict whether to merge the two inputs as well as predicting the label, trained with max-margin estimation. (Socher et al., ICML 2011)
  • 100. Recursive Neural Networks. [Figure: examples of recursive structure learning.] (Socher et al., ICML 2011)
  • 101. Learning from Few Examples. SUN database, classes sorted by frequency (car, van, truck, bus): rare objects are similar to frequent objects.
  • 102. Learning from Few Examples. Classes sorted by frequency: chair, armchair, swivel chair, deck chair.
  • 103. Generative Model of Classifier Parameters. Many state-of-the-art object detection systems use sophisticated models based on multiple parts with separate appearance and shape components, detecting objects by testing sub-windows and scoring the corresponding test patches with a linear function. We define a hierarchical prior over the parameters of the discriminative model and learn the hierarchy. Image-specific representation: concatenation of the HOG feature pyramid at multiple scales. (Felzenszwalb, McAllester & Ramanan, 2008)
  • 104. Generative Model of Classifier Parameters. Hierarchical Bayes: by learning hierarchical structure, we can improve the current state-of-the-art. SUN dataset: 32,855 examples of 200 categories. [Figure: hierarchical model vs. single-class detectors for classes with 185, 27, and 12 examples.]
  • 105. Talk Roadmap (repeated). Next: Multimodal Learning.
  • 106. Multi-Modal Input. Learning systems that combine multiple input domains: images, text & language, video, laser scans, speech & audio, time-series data. Develop learning systems that come closer to displaying human-like intelligence. One of the key challenges: inference.
  • 107. Multi-Modal Input. Combining image and text gives more robust perception. Ngiam et al. (ICML 2011) used deep autoencoders (video + speech); see also Guillaumin, Verbeek, and Schmid (CVPR 2011); Huiskes, Thomee, and Lew (Multimedia Information Retrieval, 2010); Xing, Yan, and Hauptmann (UAI 2005).
  • 108. Training Data. Samples from the MIR Flickr dataset (Creative Commons license): images paired with user tags such as pentax, k10d, kangarooisland, southaustralia, sa, australia, australiansealion, 300mm; camera, jahdakine, lightpainting, reflection, doublepaneglass, wowiekazowie; sandbanks, lake, lakeontario, sunset, walking, beach, purple, sky, water, clouds, overtheexcellence; top20butterflies; mickikrimmel, mickipedia, headshot; and some images with no text at all.
  • 109. Multi-Modal Input. Goals: improve classification (e.g. SEA / NOT SEA from an image plus tags like pentax, k10d, kangarooisland, southaustralia, sa, australia, australiansealion, 300mm); fill in missing modalities (e.g. generate tags beach, sea, surf, strand, shore, wave, seascape, sand, ocean, waves for an untagged image); and retrieve data from one modality when queried using data from another modality.
  • 110. Multi-Modal Deep Belief Net. A Gaussian RBM models the dense image features; a Replicated Softmax models the sparse text counts; higher layers join the two pathways. A fill-in sketch follows this slide.
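
A very rough sketch of filling in a missing text modality: clamp the image pathway, run alternating Gibbs between the text-side state and the joint layer, then read out a softmax over tags. The single joint layer with two weight matrices and all names here are simplifications for illustration, not the exact architecture.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fill_in_tags(h_img, Wj_img, Wj_txt, W_out, n_steps, rng):
    # h_img: clamped top-level image representation from the Gaussian-RBM
    # pathway; the text-side state starts at random and is resampled.
    h_txt = (rng.random((h_img.shape[0], Wj_txt.shape[0])) < 0.5).astype(float)
    for _ in range(n_steps):
        pj = sigmoid(h_img @ Wj_img + h_txt @ Wj_txt)   # joint layer
        hj = (rng.random(pj.shape) < pj).astype(float)
        pt = sigmoid(hj @ Wj_txt.T)                     # back to the text side
        h_txt = (rng.random(pt.shape) < pt).astype(float)
    logits = h_txt @ W_out                              # scores per tag
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)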
  • 111. Multi-Modal Deep Belief Net. Flickr data: 1 million images along with text tags, 25K annotated.
  • 112. Recognition Results. Multimodal inputs (images + text), 38 classes, mean average precision: image-text SVM 0.475; image-text LDA 0.492; multimodal DBN 0.566. Unimodal inputs (images only): image-SVM 0.375; image-LDA 0.315; image-DBN 0.413.
  • 113. Pattern Completion. Given a test image, we generate the associated text, which achieves far better classification results.
  • 114. Thank you. Code for learning RBMs, DBNs, and DBMs is available at: http://www.mit.edu/~rsalakhu/