
Introduction to deep learning

Introduction to Deep Learning by Zeynep Kurultay from Algorithmia


  1. INTRODUCTION TO DEEP LEARNING by Zeynep Su Kurultay
  2. Outline ■ Modeling humans in machines ■ Introduction to neural nets ■ What makes an algorithm intelligent? ■ Learning – Supervised learning ■ Deep learning – Neural nets in detail ■ Framework discussion & sample code ■ Future
  3. Modeling humans in machines
  4. Modeling humans in machines. But why?
  5. Neural networks ■ The mammal brain is organized in a deep architecture (Serre, Kreiman, Kouh, Cadieu, Knoblich, & Poggio, 2007); e.g. the visual system has 5 to 10 levels ■ Very popular at the beginning of the 1990s, but fell out of favor after it was found that they were not performing well ■ Why they are gaining power again now: deep architectures might be able to represent some functions that are otherwise not efficiently representable. Breakthrough in 2006/2007 with the Hinton and Bengio papers
  6. Examples around us
  7. Examples around us. Date: November 2014
  8. Examples around us
  9. Examples around us
  10. Examples around us
  11. Examples around us. Image: NasenSpray/Imgur
  12. Examples around us. Image: http://www.telegraph.co.uk/technology/google/11730050/deep-dream-best-images.html?frame=3370388
  13. Examples around us. Image: drkaugumon/Imgur
  14. What makes an algorithm intelligent? Image courtesy of Toptal.com
  15. What makes an algorithm intelligent?
  16. Learning ■ Supervised machine learning: the program is "trained" on a pre-defined set of "training examples", which then facilitates its ability to reach an accurate conclusion when given new data. ■ Semi-supervised machine learning: the program infers the unknown labels through "label propagation", using similarities between examples to infer missing labels from existing ones. ■ Unsupervised machine learning: the program is given a bunch of data and must find patterns and relationships therein, e.g. clustering via a nearest-neighbor algorithm. A sketch contrasting the paradigms follows below.
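The deck does not include code on this slide; the following is a minimal sketch contrasting supervised and unsupervised learning with scikit-learn. The toy data and model choices are illustrative assumptions, not from the slides:

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.cluster import KMeans

# Toy 2-D points forming two loose groups (made-up data)
X = np.array([[0.1, 0.2], [0.2, 0.1], [0.9, 1.0], [1.0, 0.8]])
y = np.array([0, 0, 1, 1])  # labels exist only in the supervised setting

# Supervised: train on labeled examples, then predict for new data
clf = KNeighborsClassifier(n_neighbors=1).fit(X, y)
print(clf.predict([[0.95, 0.9]]))  # predicts class 1

# Unsupervised: no labels; the algorithm finds the grouping itself
km = KMeans(n_clusters=2, random_state=0).fit(X)
print(km.labels_)  # cluster assignments discovered from the data alone
```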
  17. Supervised Learning ■ Binary classification: does this person have that disease? ■ Regression: what is the market value of this house? ■ Multiclass classification: digit recognition, face recognition
  18. Supervised Learning ■ Goal: given a number of features, try to make sense out of them! ■ Example: employee satisfaction rate – what does it depend on? Given those features in a dataset, try to predict the rate
  19. Supervised Learning
  20. Supervised Learning
  21. Supervised Learning
  22. Supervised Learning
  23. Supervised Learning ■ But how do we adjust ourselves? How do we know we are getting better at each step? ■ Measurement of wrongness: loss functions
  24. Loss functions (worked example below)
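The loss-function slide itself is an image; as one concrete example, mean squared error measures "wrongness" as the average squared gap between predictions and targets (the numbers below are made up):

```python
import numpy as np

def mse(y_true, y_pred):
    # Mean squared error: average squared gap between target and prediction
    return np.mean((y_true - y_pred) ** 2)

y_true = np.array([3.0, -0.5, 2.0])
print(mse(y_true, np.array([2.5, 0.0, 2.0])))  # close predictions, small loss
print(mse(y_true, np.array([0.0, 0.0, 0.0])))  # worse predictions, larger loss
```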
  25. Gradient descent. How do we know how to "roll down the hill"? The gradient (the derivatives of the loss function with respect to each of the individual feature weights, i.e. the parameters) tells us "which way is down".
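A minimal sketch of "rolling down the hill" on a one-parameter loss; the quadratic loss and the learning rate are illustrative assumptions, not from the deck:

```python
# Minimize L(w) = (w - 3)^2 by repeatedly stepping against the gradient.
# The derivative dL/dw = 2 * (w - 3) tells us "which way is down".
w = 0.0             # initial guess
lr = 0.1            # learning rate (step size)
for _ in range(50):
    grad = 2 * (w - 3)
    w -= lr * grad  # step downhill, opposite the gradient
print(w)            # converges toward 3, the minimum of the loss
```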
  26. What exactly is deep learning? ■ "a network would need more than one hidden layer to be a deep network; networks with one or two hidden layers are traditional neural networks…" ■ "in my experience, a network can be considered deep when there is at least one hidden layer. Although the term deep learning can be fuzzy, …" ■ "in my own thinking, deep is not related to the number of layers, but it talks about how hard the feature to be discovered is…" ■ – a discussion from StackExchange
  27. Deep learning ■ What is the difference? Remember the quote from Yann LeCun from before? It goes on: ■ "A pattern recognition system is like a black box with a camera at one end, a green light and a red light on top, and a whole bunch of knobs on the front…. Now, imagine a box with 500 million knobs, 1,000 light bulbs, and 10 million images to train it with. That's what a typical Deep Learning system is."
  28. Aim: Learning features ■ Deep learning excels in tasks where the basic unit (a single pixel, a single frequency, or a single word) has very little meaning in and of itself, but a combination of such units has a useful meaning. It can learn these useful combinations of values without any human intervention.
  29. Aim: Learning features (convolutional neural networks)
  30. Neural networks ■ An input layer, an output layer, and one or more hidden layers of units/neurons/perceptrons ■ Each connection between two neurons has a weight w. The best weights can again be found with gradient descent. Image courtesy of http://ljs.academicdirect.org/A15/053_070.htm
  31. Neural networks ■ Example: input vector [7, 1, 2] is fed into the input units ■ Forward propagation ■ Activation function (see the sketch below) Image courtesy of http://ljs.academicdirect.org/A15/053_070.htm
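A sketch of forward propagation for the slide's input vector [7, 1, 2], assuming a sigmoid activation and randomly initialized weights; the layer sizes are hypothetical:

```python
import numpy as np

def sigmoid(z):
    # Activation function: squashes a weighted sum into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([7.0, 1.0, 2.0])           # input vector from the slide
W_hidden = np.random.randn(4, 3) * 0.1  # hypothetical weights: 4 hidden units
W_out = np.random.randn(1, 4) * 0.1     # hypothetical weights: 1 output unit

h = sigmoid(W_hidden @ x)  # forward propagation into the hidden layer
y = sigmoid(W_out @ h)     # forward propagation into the output layer
print(y)
```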
  32. Neural networks ■ Why "deep"? ■ Depth is the number of parameterized transformations a signal encounters as it propagates from the input layer to the output layer, where a parameterized transformation is a processing unit with trainable parameters, such as weights. Image courtesy of http://ljs.academicdirect.org/A15/053_070.htm
  33. Aim: Learning features ■ The goal of deep learning methods is to learn higher-level features from lower-level features.
  34. Other important concepts ■ Overfitting: there is such a thing as learning too much, or too specifically! ■ Regularization: a technique that prevents overfitting
  35. Overfitting ■ Overfitting: there is such a thing as learning too much, or too specifically! ■ Regularization: a technique that prevents overfitting
  36. Overfitting ■ U.S. Census Population over Time (see the sketch below)
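A small sketch of the overfitting idea behind the census figure: a high-degree polynomial fits the noise in the sample and extrapolates wildly, while a simpler model generalizes. The data and polynomial degrees are made up for illustration:

```python
import numpy as np

# Hypothetical noisy samples from an underlying linear trend
rng = np.random.RandomState(0)
x = np.linspace(0, 1, 10)
y = 2 * x + rng.normal(scale=0.1, size=10)

# Degree 1 captures the trend; degree 9 memorizes the noise (overfitting)
for degree in (1, 9):
    coeffs = np.polyfit(x, y, degree)
    print(degree, np.polyval(coeffs, 1.5))  # extrapolate beyond the data
```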
  37. Different frameworks ■ Pylearn2, Lasagne, Caffe, Torch, Theano, Blocks, Plate, Crino, Theanet, DL4J, Keras, …
  38. Different frameworks ■ Theano: – A mathematical expression compiler, designed with machine learning in mind – Lets you define an objective and automatically produces the code that computes the gradient of the objective (sketched below) – Good for experimenting with different loss functions – A slightly lower layer of abstraction, in exchange for more possibilities
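The "define an objective, get the gradient for free" workflow looks roughly like this in Theano; a minimal sketch, not taken from the deck:

```python
import theano
import theano.tensor as T

w = T.dvector('w')      # symbolic vector of parameters
loss = T.sum(w ** 2)    # objective, defined as a symbolic expression
grad = T.grad(loss, w)  # Theano derives the gradient code automatically

f = theano.function([w], [loss, grad])
print(f([1.0, 2.0]))    # loss 5.0, gradient [2.0, 4.0]
```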
  39. Different frameworks ■ Caffe: – Developed by UC Berkeley – A widely used machine-vision library that ported Matlab's implementation of fast convolutional nets to C and C++ – Not originally intended for other deep-learning applications such as text, sound, or time-series data. CORRECTION: there are new implementations of RNNs and LSTMs in Caffe, so it is not only for images any more! – Very fast: over 60M images per day with a single NVIDIA K40 GPU
  40. Different frameworks ■ Torch: – Written in Lua (a scripting language developed in Brazil in the early 1990s) – A highly customized version of it is used by large tech companies such as Google and Facebook
  41. Different frameworks ■ Keras: – A minimalist, highly modular neural network library in the spirit of Torch – Written in Python – Uses Theano under the hood for optimized tensor manipulation on GPU and CPU – Developed with a focus on enabling fast experimentation – 60K images took 30 hours on an Amazon g2.2xlarge instance
  42. Comparing Keras and Theano ■ MNIST digits dataset: serves as a benchmark to compare results against as new articles come out ■ Multilayer Perceptron: a basic feedforward neural network (a Keras sketch follows below)
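A hedged sketch of the kind of Keras multilayer perceptron the comparison refers to; the layer sizes, activations, and optimizer are illustrative choices, not the deck's exact code, and the fit call is commented out because it needs the MNIST arrays:

```python
from keras.models import Sequential
from keras.layers import Dense, Activation

# Basic feedforward net (MLP) for 28x28 MNIST digits flattened to 784 values
model = Sequential()
model.add(Dense(512, input_dim=784))  # hidden layer
model.add(Activation('relu'))
model.add(Dense(10))                  # one output unit per digit class
model.add(Activation('softmax'))

model.compile(loss='categorical_crossentropy', optimizer='sgd')
# model.fit(X_train, Y_train, batch_size=128)  # X_train/Y_train: MNIST arrays
```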
  43. Demo: code snippets – inside the gradient descent. Output = Wx + b
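The original snippet is a slide image; here is a hypothetical reconstruction of the affine step Output = Wx + b together with one gradient-descent update per iteration for a squared loss (the data and learning rate are made up):

```python
import numpy as np

# One linear unit, output = Wx + b, trained on a single (x, target) pair
x = np.array([1.0, 2.0])
target = 5.0
W = np.zeros(2)
b = 0.0
lr = 0.1

for _ in range(100):
    output = W @ x + b     # forward pass: Wx + b
    err = output - target  # gradient of 0.5 * (output - target)**2 w.r.t. output
    W -= lr * err * x      # gradient step for the weights
    b -= lr * err          # gradient step for the bias
print(W @ x + b)           # approaches the target, 5.0
```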
  44. Demo: code snippets – inside the hidden layer
  45. Demo: code snippets – inside the hidden layer
  46. Demo: code snippets – inside the hidden layer
  47. Demo: code snippets – inside the network (reconstructed sketch below)
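The hidden-layer and network snippets on slides 44-47 are images as well; a hypothetical minimal reconstruction of how hidden layers compose into a network might look like this:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class Layer:
    """One fully connected layer computing sigmoid(Wx + b)."""
    def __init__(self, n_in, n_out):
        self.W = np.random.randn(n_out, n_in) * 0.1  # small random weights
        self.b = np.zeros(n_out)

    def forward(self, x):
        return sigmoid(self.W @ x + self.b)

# The network is just layers applied in sequence (forward propagation)
layers = [Layer(3, 4), Layer(4, 1)]
x = np.array([7.0, 1.0, 2.0])
for layer in layers:
    x = layer.forward(x)
print(x)  # network output for the example input
```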
  48. Demo ■ https://algorithmia.com/demo/handwriting
  49. Future of deep learning ■ Deep learning has a lot of hype right now, and it is apparent that it is very useful for specific tasks. ■ "What frontiers and challenges do you think are the most exciting for researchers in the field of neural networks in the next ten years?" ■ "I cannot see ten years into the future. For me, the wall of fog starts at about 5 years. ... I think that the most exciting areas over the next five years will be really understanding videos and text. I will be disappointed if in five years time we do not have something that can watch a YouTube video and tell a story about what happened. I have had a lot of disappointments." – From Geoffrey Hinton's AMA on Reddit
  50. Now & The future ■ Facebook Deep Learning, March 26, 2015. Image courtesy of Venturebeat.com
  51. Join us! ■ Open positions: https://angel.co/algorithmia/jobs/ – Algorithm Developer [this is me!] – Backend Developer – Product Manager – Technical Evangelist
  52. Further resources ■ Introductory: ■ Andrew Ng's Machine Learning course on Coursera ■ Geoffrey Hinton's Neural Networks course on Coursera ■ Advanced: ■ Stanford's Convolutional Neural Networks for Visual Recognition http://cs231n.github.io/ ■ Who is afraid of non-convex loss functions? by Yann LeCun http://videolectures.net/eml07_lecun_wia/ ■ What is wrong with Deep Learning? by Yann LeCun http://techtalks.tv/talks/whats-wrong-with-deep-learning/61639/ ■ For those who like papers, recent advances: ■ Playing Atari with Deep Reinforcement Learning - http://www.cs.toronto.edu/~vmnih/docs/dqn.pdf ■ Unsupervised Face Detection - http://cs.stanford.edu/~quocle/faces_full.pdf
  53. ■ Content: ■ Toptal.com, Deeplearning.net ■ http://www.computerworld.com/article/2918161/emerging-technology/the-ai-ecosystem.html ■ Introduction to Machine Learning CMU-10701 - Deep Learning slides ■ Images: ■ http://www.spyemporium.com/images/products/st-sc1720.jpg ■ http://stats.stackexchange.com/questions/128616/whats-a-real-world-example-of-overfitting ■ http://www.homedepot.com/catalog/productImages/1000/c4/c4c34d2e-56ce-4c11-94c0-67aa19b769fa_1000.jpg ■ http://www.bulborama.com/images/products/1933.jpg ■ https://xkcd.com/1122/, https://xkcd.com/1425/ ■ www.deeplearning.net
