2007 NIPS Tutorial on: Deep Belief Nets. Geoffrey Hinton, Canadian Institute for Advanced Research & Department of Computer Science, University of Toronto
Some things you will learn in this tutorial
A spectrum of machine learning tasks: Typical Statistics ------ Artificial Intelligence
Historical background: First generation neural networks. Sketch of a typical perceptron from the 1960’s: input units (e.g. pixels) feed non-adaptive hand-coded features, which feed output units (e.g. class labels such as “bomb” or “toy”).
Second generation neural networks (~1985): input vector, hidden layers, outputs. Compare the outputs with the correct answer to get an error signal; back-propagate the error signal to get derivatives for learning.
A temporary digression
What is wrong with back-propagation?
Overcoming the limitations of back-propagation
Belief Nets (stochastic hidden causes, visible effects). We will use nets composed of layers of stochastic binary variables with weighted connections. Later, we will generalize to other types of variable.
Stochastic binary units (Bernoulli variables)
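The equation for the unit’s turn-on probability did not survive the extraction; the standard form for a stochastic binary unit with bias b_i receiving states s_j through weights w_{ji} is

\[ p(s_i = 1) \;=\; \frac{1}{1 + \exp\!\big(-b_i - \sum_j s_j w_{ji}\big)} \]

i.e. the logistic sigmoid of the unit’s total input.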
Learning Deep Belief Nets (stochastic hidden causes, visible effects)
The learning rule for sigmoid belief nets (the diagram shows a parent unit j, a child unit i, and the learning rate)
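For reference, the standard maximum-likelihood delta rule for a sigmoid belief net, which is presumably what this slide showed, is

\[ \Delta w_{ji} \;=\; \varepsilon \, s_j \,(s_i - p_i), \qquad p_i \;=\; \frac{1}{1 + \exp\!\big(-\sum_j s_j w_{ji}\big)} \]

where ε is the learning rate, s_j is the sampled state of a parent, s_i the sampled state of the child, and p_i the probability that the child would turn on given its sampled parents.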
Explaining away (Judea Pearl). Two independent hidden causes, “earthquake” and “truck hits house” (each with bias -10), both connect with weight +20 to the observed effect “house jumps” (bias -20). Given that the house jumps, the posterior over the two causes is p(1,1)=.0001, p(1,0)=.4999, p(0,1)=.4999, p(0,0)=.0001: the causes become strongly anti-correlated even though they are independent in the prior.
Why it is usually very hard to learn sigmoid belief nets one layer at a time. (The diagram shows data under a stack of hidden-variable layers; the bottom weights W determine the likelihood and the layers above determine the prior.)
Two types of generative neural network
Restricted Boltzmann Machines (Smolensky, 1986, called them “harmoniums”): a layer of hidden units j and a layer of visible units i, with connections only between the two layers.
The Energy of a joint configuration (ignoring terms to do with biases): the energy of configuration v on the visible units and h on the hidden units sums, over all pairs, the binary state of visible unit i times the binary state of hidden unit j times the weight between units i and j.
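Written out, with biases omitted as the slide says, the energy is

\[ E(v, h) \;=\; -\sum_{i \in \text{visible}} \; \sum_{j \in \text{hidden}} v_i \, h_j \, w_{ij} . \]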
Weights → Energies → Probabilities
Using energies to define probabilities (via the partition function)
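For reference, the standard definitions this slide relies on: the joint probability is the Boltzmann distribution over energies, with the partition function Z summing over every visible-hidden configuration,

\[ p(v, h) \;=\; \frac{e^{-E(v,h)}}{Z}, \qquad Z \;=\; \sum_{u,g} e^{-E(u,g)}, \qquad p(v) \;=\; \frac{1}{Z}\sum_h e^{-E(v,h)} . \]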
A picture of the maximum likelihood learning algorithm for an RBM (alternating Gibbs chains at t = 0, 1, 2, ..., infinity). Start with a training vector on the visible units. Then alternate between updating all the hidden units in parallel and updating all the visible units in parallel. The state at t = infinity is a “fantasy”.
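The learning rule this picture illustrates changes each weight in proportion to the difference between the pairwise statistics measured with the data clamped (t = 0) and at equilibrium (t = infinity):

\[ \Delta w_{ij} \;=\; \varepsilon \left( \langle v_i h_j \rangle^{0} \;-\; \langle v_i h_j \rangle^{\infty} \right) . \]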
A quick way to learn an RBM (data at t = 0, reconstruction at t = 1). Start with a training vector on the visible units. Update all the hidden units in parallel. Update all the visible units in parallel to get a “reconstruction”. Update the hidden units again. This is not following the gradient of the log likelihood. But it works well. It is approximately following the gradient of another objective function (Carreira-Perpinan & Hinton, 2005).
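A minimal numpy sketch of the CD-1 step just described. The reconstruction here uses the visible probabilities rather than sampled binary states, and the layer sizes, learning rate and toy batch are illustrative assumptions, not values from the tutorial.

import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def cd1_update(v0, W, b_vis, b_hid, lr=0.1):
    # Up-pass: hidden probabilities and sampled binary states given the data.
    p_h0 = sigmoid(v0 @ W + b_hid)
    h0 = (rng.random(p_h0.shape) < p_h0).astype(float)
    # Down-pass: the "reconstruction" of the visible units.
    v1 = sigmoid(h0 @ W.T + b_vis)
    # Second up-pass: hidden probabilities for the reconstruction.
    p_h1 = sigmoid(v1 @ W + b_hid)
    # CD-1: <v h> measured on data minus <v h> measured on reconstructions.
    W = W + lr * (v0.T @ p_h0 - v1.T @ p_h1) / len(v0)
    b_vis = b_vis + lr * (v0 - v1).mean(axis=0)
    b_hid = b_hid + lr * (p_h0 - p_h1).mean(axis=0)
    return W, b_vis, b_hid

# Toy usage: a batch of 16x16 binary "images" and 50 hidden feature detectors.
W = 0.01 * rng.standard_normal((256, 50))
b_vis, b_hid = np.zeros(256), np.zeros(50)
batch = (rng.random((20, 256)) < 0.5).astype(float)  # stand-in for real digit images
W, b_vis, b_hid = cd1_update(batch, W, b_vis, b_hid)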
How to learn a set of features that are good for reconstructing images of the digit 2   50 binary feature neurons   16 x 16 pixel  image   50 binary feature neurons   16 x 16 pixel  image   Increment  weights between an active pixel and an active feature Decrement  weights between an active pixel and an active feature data  (reality) reconstruction  (better than reality)
The final 50  x  256 weights Each neuron grabs a different feature.
How well can we reconstruct the digit images from the binary feature activations? Reconstruction from activated binary features Data Reconstruction from activated binary features Data New test images from the digit class that the model was trained on Images from an unfamiliar digit class  (the network tries to see every image as a 2)
Three ways to combine probability density models (an underlying theme of the tutorial)
Training a deep network (the main reason RBM’s are interesting)
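A rough numpy sketch of the greedy procedure: train an RBM on the data, treat its hidden activities as data for the next RBM, and repeat up the stack. The CD-1 trainer, layer sizes, epochs and toy data are illustrative assumptions.

import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_rbm(data, n_hidden, epochs=5, lr=0.05):
    # Minimal CD-1 trainer; returns the weight matrix and hidden biases.
    n_vis = data.shape[1]
    W = 0.01 * rng.standard_normal((n_vis, n_hidden))
    b_v, b_h = np.zeros(n_vis), np.zeros(n_hidden)
    for _ in range(epochs):
        p_h0 = sigmoid(data @ W + b_h)
        h0 = (rng.random(p_h0.shape) < p_h0).astype(float)
        v1 = sigmoid(h0 @ W.T + b_v)          # mean-field reconstruction
        p_h1 = sigmoid(v1 @ W + b_h)
        W += lr * (data.T @ p_h0 - v1.T @ p_h1) / len(data)
        b_v += lr * (data - v1).mean(axis=0)
        b_h += lr * (p_h0 - p_h1).mean(axis=0)
    return W, b_h

def greedy_pretrain(data, layer_sizes):
    # Each trained RBM's hidden activities become the "data" for the next RBM.
    layers, activities = [], data
    for n_hidden in layer_sizes:
        W, b_h = train_rbm(activities, n_hidden)
        layers.append((W, b_h))
        activities = sigmoid(activities @ W + b_h)
    return layers

# Toy usage: random binary vectors standing in for 28x28 images,
# stacked into the 500-500-2000 hidden layers used later in the tutorial.
data = (rng.random((100, 784)) < 0.5).astype(float)
stack = greedy_pretrain(data, [500, 500, 2000])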
The generative model after learning 3 layers (data, h1, h2, h3)
Why does greedy learning work? An aside: Averaging factorial distributions
Why does greedy learning work? (The diagram splits the problem into two tasks: Task 1, model the data distribution on the visible units; Task 2, model the aggregated posterior distribution on the hidden units.)
Why does greedy learning work? The weights, W,  in the bottom level RBM define p(v|h) and they also, indirectly, define p(h). So we can express the RBM model as If we leave p(v|h) alone and improve p(h), we will improve p(v).  To improve p(h), we need it to be a better model of the  aggregated posterior  distribution over hidden vectors produced by applying W to the data.
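In symbols, the decomposition this argument relies on is

\[ p(v) \;=\; \sum_h p(h)\, p(v \mid h) . \]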
Which distributions are factorial in a directed belief net?
Why does greedy learning fail in a directed module? (Again the diagram shows Task 1, the data distribution on the visible units, and Task 2, the aggregated posterior distribution on the hidden units.)
A model of digit recognition 2000 top-level neurons 500 neurons 500 neurons   28 x 28 pixel  image   10 label neurons   The model learns to generate combinations of labels and images.  To perform recognition we start with a neutral state of the label units and do an up-pass from the image followed by a few iterations of the top-level associative memory. The top two layers form an associative memory  whose  energy landscape models the low dimensional manifolds of the digits. The energy valleys have names
Fine-tuning with a contrastive version of the “wake-sleep” algorithm
Show the movie of the network generating digits (available at www.cs.toronto.edu/~hinton)
Samples generated by letting the associative memory run with one label clamped. There are 1000 iterations of alternating Gibbs sampling between samples.
Examples of correctly recognized handwritten digits that the neural network had never seen before. It’s very good.
How well does it discriminate on the MNIST test set with no extra information about geometric distortions?
Unsupervised “pre-training” also helps for models that have more data and better priors. Back-propagation alone: 0.49%. Unsupervised layer-by-layer pre-training followed by backprop: 0.39% (record).
Another view of why layer-by-layer learning works
An infinite sigmoid belief net that is equivalent to an RBM (an endless stack of alternating layers v0, h0, v1, h1, v2, h2, etc.)
Inference in a directed net with replicated weights (the same infinite stack of layers v0, h0, v1, h1, v2, h2, etc.)
Learning a deep directed network (the infinite stack v0, h0, v1, h1, v2, h2, etc.)
How many layers should we use and how wide should they be? (I am indebted to Karl Rove for this slide)
What happens when the weights in higher layers become different from the weights in the first layer?
A stack of RBM’s (Yee-Whye Teh’s idea): layers v, h1, h2, ..., hL
The variational bound. Now we cancel out all of the partition functions except the top one and replace log probabilities by goodnesses, using the fact that a log probability is a goodness minus the log of the partition function. This has simple derivatives that give a more justifiable fine-tuning algorithm than contrastive wake-sleep. Each time we replace the prior over the hidden units by a better prior, we win by the difference in the probability assigned.
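The bound itself did not survive extraction; the standard variational bound used for deep belief nets, with Q(h|v) the approximate posterior produced by the up-pass, is (a reconstruction, not a verbatim copy of the slide)

\[ \log p(v) \;\ge\; \sum_h Q(h \mid v)\big[\log p(h) + \log p(v \mid h)\big] \;-\; \sum_h Q(h \mid v) \log Q(h \mid v) . \]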
Summary so far
Overview of the rest of the tutorial
BREAK
Fine-tuning for discrimination
Why backpropagation works better after greedy pre-training
First, model the distribution of digit images 2000 units 500 units  500 units  28 x 28 pixel  image   The network learns a density model for unlabeled digit images. When we generate from the model we get things that look like real digits of all classes.  But do the hidden features really help with digit discrimination?  Add 10 softmaxed units to the top and do backpropagation. The top two layers form a restricted Boltzmann machine whose free energy landscape should model the low dimensional manifolds of the digits.
Results on the permutation-invariant MNIST task
Combining deep belief nets with Gaussian processes
Learning to extract the orientation of a face patch  (Salakhutdinov & Hinton, NIPS 2007)
The training and test sets: 11,000 unlabeled cases; 100, 500, or 1000 labeled cases; the test face patches come from new people.
The root mean squared error in the orientation when combining GP’s with deep belief nets:

                                               100 labels   500 labels   1000 labels
GP on the pixels                                  22.2         17.9         15.2
GP on top-level features                          17.2         12.7          7.2
GP on top-level features with fine-tuning         16.3         11.2          6.4

Conclusion: The deep features are much better than the pixels. Fine-tuning helps a lot.
Modeling real-valued data
The free-energy of a mean-field logistic unit (the plot shows F = energy - entropy as a function of the unit’s output between 0 and 1)
An RBM with real-valued visible units: each visible unit has a parabolic containment function in its energy, shifted by an energy-gradient produced by the total input from the hidden units. Welling et al. (2005) show how to extend RBM’s to the exponential family; see also Bengio et al. (2007).
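A common form of the energy for an RBM with Gaussian visible units and binary hidden units, matching the slide’s description (a parabolic containment term per visible unit plus a linear shift produced by the hidden units’ input), is

\[ E(v, h) \;=\; \sum_i \frac{(v_i - b_i)^2}{2\sigma_i^2} \;-\; \sum_j b_j h_j \;-\; \sum_{i,j} \frac{v_i}{\sigma_i}\, h_j\, w_{ij} . \]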
Deep Autoencoders (Hinton & Salakhutdinov, 2006). Encoder: 28x28 image -> 1000 neurons -> 500 neurons -> 250 neurons -> 30 linear units (the code). Decoder: 30 -> 250 neurons -> 500 neurons -> 1000 neurons -> 28x28 reconstruction.
A comparison of methods for compressing digit images to 30 real numbers. real  data 30-D  deep auto 30-D logistic PCA 30-D  PCA
Do the 30-D codes found by the deep autoencoder preserve the class structure of the data?
entirely unsupervised except for the colors
Retrieving documents that are similar to a query document
How to compress the count vector: input vector of 2000 word counts -> 500 neurons -> 250 neurons -> 10 code units -> 250 neurons -> 500 neurons -> output vector of 2000 reconstructed counts.
Performance of the autoencoder at document retrieval
Proportion of retrieved documents in the same class as the query, plotted against the number of documents retrieved
First compress all documents to 2 numbers using a type of PCA  Then use different colors for different document categories
First compress all documents to 2 numbers.  Then use different colors for different document categories
Finding binary codes for documents: the same autoencoder, but with a 30-unit code layer and noise added to the inputs of the code units (2000 word counts -> 500 neurons -> 250 neurons -> 30 code units + noise -> 250 neurons -> 500 neurons -> 2000 reconstructed counts).
Semantic hashing: Using a deep autoencoder as a hash-function for finding approximate matches (Salakhutdinov & Hinton, 2007), the “supermarket search”.
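A rough numpy sketch of the “supermarket search” idea: binarize the code layer, use the bits as a memory address, and retrieve everything stored at that address or at addresses a few bit-flips away. The random 30-bit codes below stand in for the outputs of a trained autoencoder; nothing here is the authors’ implementation.

import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the 30-unit code layer of a trained autoencoder.
codes = rng.random((10000, 30))
bits = (codes > 0.5).astype(np.uint64)        # threshold to get binary codes

def to_address(bit_rows):
    # Pack each row of 30 bits into a single integer memory address.
    powers = 1 << np.arange(bit_rows.shape[-1], dtype=np.uint64)
    return bit_rows @ powers

# The "memory": address -> ids of the documents stored at that address.
memory = {}
for doc_id, addr in enumerate(to_address(bits)):
    memory.setdefault(int(addr), []).append(doc_id)

def shortlist(query_bits, max_flips=1):
    # Everything at the query address, plus everything one bit-flip away.
    base = int(to_address(query_bits))
    hits = list(memory.get(base, []))
    if max_flips >= 1:
        for i in range(query_bits.shape[0]):
            hits += memory.get(base ^ (1 << i), [])
    return hits

print(shortlist(bits[0])[:10])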
How good is a shortlist found this way?
Time series models
Time series models
The conditional RBM model (Sutskever & Hinton 2007). (The diagram shows visible frames at times t-2, t-1 and t, and hidden units at time t.)
Why the autoregressive connections do not cause problems
Generating from a learned model. (The diagram again shows visible frames at t-2, t-1 and t, and hidden units at time t.)
Stacking temporal RBM’s
An application to modeling motion capture data (Taylor, Roweis & Hinton, 2007)
Modeling multiple types of motion
Show Graham Taylor’s movies available at www.cs.toronto.edu/~hinton
Generating the parts of an object: sloppy top-down activation of parts, then clean-up using known interactions between them. (The diagram shows pose parameters, features with top-down support, and a “square”.) It’s like soldiers on a parade ground.
Semi-restricted Boltzmann Machines: hidden units j and visible units i, now with lateral connections among the visible units.
Learning a semi-restricted Boltzmann Machine (data at t = 0, reconstruction at t = 1). 1. Start with a training vector on the visible units. 2. Update all of the hidden units in parallel. 3. Repeatedly update all of the visible units in parallel using mean-field updates (with the hiddens fixed) to get a “reconstruction”. 4. Update all of the hidden units again. (The diagram also shows the corresponding update for a lateral weight between visible units i and k.)
Learning in Semi-restricted Boltzmann Machines (the mean-field update uses the total input to visible unit i, with damping)
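A small numpy sketch of the damped mean-field updates used in step 3 above, with the hidden states held fixed; the sizes, damping factor and lateral weight matrix are illustrative assumptions.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def damped_mean_field(p_vis, hidden, W, L, b_vis, lam=0.5, n_steps=10):
    # W: visible-to-hidden weights; L: symmetric lateral (visible-to-visible)
    # weights with a zero diagonal; lam: damping factor.
    for _ in range(n_steps):
        total_input = hidden @ W.T + p_vis @ L + b_vis
        p_vis = lam * p_vis + (1.0 - lam) * sigmoid(total_input)
    return p_vis

# Toy usage with made-up sizes (20 visible units, 10 hidden units).
rng = np.random.default_rng(0)
W = 0.01 * rng.standard_normal((20, 10))
L = 0.01 * rng.standard_normal((20, 20))
L = (L + L.T) / 2.0
np.fill_diagonal(L, 0.0)
h = (rng.random((1, 10)) < 0.5).astype(float)
p_v = damped_mean_field(rng.random((1, 20)), h, W, L, np.zeros(20))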
Results on modeling natural image patches using a stack of RBM’s (Osindero and Hinton). The model: 400 Gaussian units (pixels) -> hidden MRF with 2000 units -> hidden MRF with 500 units -> 1000 top-level units (no MRF); the two lower sets of connections are directed and the top set is undirected.
Without lateral connections real data samples from model
With lateral connections real data samples from model
A funny way to use an MRF
Why do we whiten data?
Whitening the learning signal instead of the data
Towards a more powerful, multi-linear stackable learning module
Higher order Boltzmann machines (Sejnowski, ~1986)
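Instead of the usual pairwise energy terms, a higher-order Boltzmann machine uses terms in which three (or more) unit states are multiplied together, e.g.

\[ E(\mathbf{s}) \;=\; -\sum_{i<j<k} w_{ijk}\, s_i\, s_j\, s_k . \]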
A picture of a conditional, higher-order Boltzmann machine (Hinton & Lang, 1985): retina-based features, object-based features, and the viewing transform interact through three-way connections.
Using conditional higher-order Boltzmann machines to model image transformations (Memisevic and Hinton, 2007): image(t), image(t+1), and the image transformation.
Readings on deep belief nets