SlideShare una empresa de Scribd logo
1 de 95
Descargar para leer sin conexión
Deep Learning for Computer Vision
Igor Trajkovski
admin@time.mk
twitter.com/mkrobot
facebook.com/mkrobot
SkopjeTechMeetup
29.9.2016
Slides from Andrej Karpathy, Yann LeCun & Geoffrey Hinton
Artificial Neural Network
Artificial Neural Network
Very few assumptions made about the input vector.
In many real-world applications input vectors have structure.
Spectrograms
ImagesText
Convolutional Neural Networks:
A pinch of history
Hubel & Wiesel,
1959
RECEPTIVE FIELDS OF SINGLE NEURONES IN
THE CAT'S STRIATE CORTEX
1962
RECEPTIVE FIELDS, BINOCULAR
INTERACTION
AND FUNCTIONAL ARCHITECTURE IN
THE CAT'S VISUAL CORTEX
https://www.youtube.com/watch?v=8VdFf3egwfg
A bit of history:
Neurocognitron
[Fukushima 1980]
“sandwich” architecture (SCSCSC…)
simple cells: modifiable parameters
complex cells: perform pooling
Gradient-based learning applied to
document recognition
[LeCun, Bottou, Bengio, Haffner 1998]
LeNet-5
https://www.youtube.com/watch?v=FwFduRA_L6Q
car 99%
Computer
Vision
2011
Computer
Vision
2011
+  code complexity :(
ImageNet Classification with Deep
Convolutional Neural Networks
[Krizhevsky, Sutskever, Hinton, 2012]
“AlexNet”
Deng et al.
Russakovsky et al.
NVIDIA et al.
“What I learned from competing against a ConvNet on
ImageNet” (karpathy.github.io)
“What I learned from competing against a ConvNet on
ImageNet” (karpathy.github.io)
TLDR: Human accuracy is somewhere 2-5%.
(depending on how much training or how little life you have)
(slide from Kaiming He’s recent presentation)
[224x224x3]
f 1000 numbers,
indicating class scores
Feature
Extraction
vector describing
various image statistics
[224x224x3]
f 1000 numbers,
indicating class scores
training
training
“Run the image through 20 layers of 3x3
convolutions and train the filters with SGD.”
Convolutional Neural Networks
[224x224x3]
f 1000 numbers,
indicating class scores
training
Only two basic operations are involved throughout:
1.  Dot products w.x
2.  Max operations
parameters
(~10M of them)
preview:
e.g. 200K numbers e.g. 10 numbers
Convolution 1-D
1 -1 -1
-1 1 -1
-1 -1 1
-1 -1 -1 -1 -1 -1 -1 -1 -1
-1 1 -1 -1 -1 -1 -1 1 -1
-1 -1 1 -1 -1 -1 1 -1 -1
-1 -1 -1 1 -1 1 -1 -1 -1
-1 -1 -1 -1 1 -1 -1 -1 -1
-1 -1 -1 1 -1 1 -1 -1 -1
-1 -1 1 -1 -1 -1 1 -1 -1
-1 1 -1 -1 -1 -1 -1 1 -1
-1 -1 -1 -1 -1 -1 -1 -1 -1
Convolution 2-D
1
1 -1 -1
-1 1 -1
-1 -1 1
-1 -1 -1 -1 -1 -1 -1 -1 -1
-1 1 -1 -1 -1 -1 -1 1 -1
-1 -1 1 -1 -1 -1 1 -1 -1
-1 -1 -1 1 -1 1 -1 -1 -1
-1 -1 -1 -1 1 -1 -1 -1 -1
-1 -1 -1 1 -1 1 -1 -1 -1
-1 -1 1 -1 -1 -1 1 -1 -1
-1 1 -1 -1 -1 -1 -1 1 -1
-1 -1 -1 -1 -1 -1 -1 -1 -1
Convolution 2-D
1 1
1 -1 -1
-1 1 -1
-1 -1 1
-1 -1 -1 -1 -1 -1 -1 -1 -1
-1 1 -1 -1 -1 -1 -1 1 -1
-1 -1 1 -1 -1 -1 1 -1 -1
-1 -1 -1 1 -1 1 -1 -1 -1
-1 -1 -1 -1 1 -1 -1 -1 -1
-1 -1 -1 1 -1 1 -1 -1 -1
-1 -1 1 -1 -1 -1 1 -1 -1
-1 1 -1 -1 -1 -1 -1 1 -1
-1 -1 -1 -1 -1 -1 -1 -1 -1
Convolution 2-D
1 1 1
1 -1 -1
-1 1 -1
-1 -1 1
-1 -1 -1 -1 -1 -1 -1 -1 -1
-1 1 -1 -1 -1 -1 -1 1 -1
-1 -1 1 -1 -1 -1 1 -1 -1
-1 -1 -1 1 -1 1 -1 -1 -1
-1 -1 -1 -1 1 -1 -1 -1 -1
-1 -1 -1 1 -1 1 -1 -1 -1
-1 -1 1 -1 -1 -1 1 -1 -1
-1 1 -1 -1 -1 -1 -1 1 -1
-1 -1 -1 -1 -1 -1 -1 -1 -1
Convolution 2-D
1 1 1
1
1 -1 -1
-1 1 -1
-1 -1 1
-1 -1 -1 -1 -1 -1 -1 -1 -1
-1 1 -1 -1 -1 -1 -1 1 -1
-1 -1 1 -1 -1 -1 1 -1 -1
-1 -1 -1 1 -1 1 -1 -1 -1
-1 -1 -1 -1 1 -1 -1 -1 -1
-1 -1 -1 1 -1 1 -1 -1 -1
-1 -1 1 -1 -1 -1 1 -1 -1
-1 1 -1 -1 -1 -1 -1 1 -1
-1 -1 -1 -1 -1 -1 -1 -1 -1
Convolution 2-D
1 1 1
1 1
1 -1 -1
-1 1 -1
-1 -1 1
-1 -1 -1 -1 -1 -1 -1 -1 -1
-1 1 -1 -1 -1 -1 -1 1 -1
-1 -1 1 -1 -1 -1 1 -1 -1
-1 -1 -1 1 -1 1 -1 -1 -1
-1 -1 -1 -1 1 -1 -1 -1 -1
-1 -1 -1 1 -1 1 -1 -1 -1
-1 -1 1 -1 -1 -1 1 -1 -1
-1 1 -1 -1 -1 -1 -1 1 -1
-1 -1 -1 -1 -1 -1 -1 -1 -1
Convolution 2-D
1 1 1
1 1 1
1 -1 -1
-1 1 -1
-1 -1 1
-1 -1 -1 -1 -1 -1 -1 -1 -1
-1 1 -1 -1 -1 -1 -1 1 -1
-1 -1 1 -1 -1 -1 1 -1 -1
-1 -1 -1 1 -1 1 -1 -1 -1
-1 -1 -1 -1 1 -1 -1 -1 -1
-1 -1 -1 1 -1 1 -1 -1 -1
-1 -1 1 -1 -1 -1 1 -1 -1
-1 1 -1 -1 -1 -1 -1 1 -1
-1 -1 -1 -1 -1 -1 -1 -1 -1
Convolution 2-D
1 1 1
1 1 1
1
1 -1 -1
-1 1 -1
-1 -1 1
-1 -1 -1 -1 -1 -1 -1 -1 -1
-1 1 -1 -1 -1 -1 -1 1 -1
-1 -1 1 -1 -1 -1 1 -1 -1
-1 -1 -1 1 -1 1 -1 -1 -1
-1 -1 -1 -1 1 -1 -1 -1 -1
-1 -1 -1 1 -1 1 -1 -1 -1
-1 -1 1 -1 -1 -1 1 -1 -1
-1 1 -1 -1 -1 -1 -1 1 -1
-1 -1 -1 -1 -1 -1 -1 -1 -1
Convolution 2-D
1 1 1
1 1 1
1 1
1 -1 -1
-1 1 -1
-1 -1 1
-1 -1 -1 -1 -1 -1 -1 -1 -1
-1 1 -1 -1 -1 -1 -1 1 -1
-1 -1 1 -1 -1 -1 1 -1 -1
-1 -1 -1 1 -1 1 -1 -1 -1
-1 -1 -1 -1 1 -1 -1 -1 -1
-1 -1 -1 1 -1 1 -1 -1 -1
-1 -1 1 -1 -1 -1 1 -1 -1
-1 1 -1 -1 -1 -1 -1 1 -1
-1 -1 -1 -1 -1 -1 -1 -1 -1
Convolution 2-D
1 1 1
1 1 1
1 1 1
1 -1 -1
-1 1 -1
-1 -1 1
-1 -1 -1 -1 -1 -1 -1 -1 -1
-1 1 -1 -1 -1 -1 -1 1 -1
-1 -1 1 -1 -1 -1 1 -1 -1
-1 -1 -1 1 -1 1 -1 -1 -1
-1 -1 -1 -1 1 -1 -1 -1 -1
-1 -1 -1 1 -1 1 -1 -1 -1
-1 -1 1 -1 -1 -1 1 -1 -1
-1 1 -1 -1 -1 -1 -1 1 -1
-1 -1 -1 -1 -1 -1 -1 -1 -1
Convolution 2-D
1
1 1 1
1 1 1
1 1 1
1 -1 -1
-1 1 -1
-1 -1 1
-1 -1 -1 -1 -1 -1 -1 -1 -1
-1 1 -1 -1 -1 -1 -1 1 -1
-1 -1 1 -1 -1 -1 1 -1 -1
-1 -1 -1 1 -1 1 -1 -1 -1
-1 -1 -1 -1 1 -1 -1 -1 -1
-1 -1 -1 1 -1 1 -1 -1 -1
-1 -1 1 -1 -1 -1 1 -1 -1
-1 1 -1 -1 -1 -1 -1 1 -1
-1 -1 -1 -1 -1 -1 -1 -1 -1
Convolution 2-D
1
1 -1 -1
-1 1 -1
-1 -1 1
-1 -1 -1 -1 -1 -1 -1 -1 -1
-1 1 -1 -1 -1 -1 -1 1 -1
-1 -1 1 -1 -1 -1 1 -1 -1
-1 -1 -1 1 -1 1 -1 -1 -1
-1 -1 -1 -1 1 -1 -1 -1 -1
-1 -1 -1 1 -1 1 -1 -1 -1
-1 -1 1 -1 -1 -1 1 -1 -1
-1 1 -1 -1 -1 -1 -1 1 -1
-1 -1 -1 -1 -1 -1 -1 -1 -1
Convolution 2-D
1 1 -1
1 -1 -1
-1 1 -1
-1 -1 1
-1 -1 -1 -1 -1 -1 -1 -1 -1
-1 1 -1 -1 -1 -1 -1 1 -1
-1 -1 1 -1 -1 -1 1 -1 -1
-1 -1 -1 1 -1 1 -1 -1 -1
-1 -1 -1 -1 1 -1 -1 -1 -1
-1 -1 -1 1 -1 1 -1 -1 -1
-1 -1 1 -1 -1 -1 1 -1 -1
-1 1 -1 -1 -1 -1 -1 1 -1
-1 -1 -1 -1 -1 -1 -1 -1 -1
Convolution 2-D
1 1 -1
1 1 1
-1 1 1
1 -1 -1
-1 1 -1
-1 -1 1
-1 -1 -1 -1 -1 -1 -1 -1 -1
-1 1 -1 -1 -1 -1 -1 1 -1
-1 -1 1 -1 -1 -1 1 -1 -1
-1 -1 -1 1 -1 1 -1 -1 -1
-1 -1 -1 -1 1 -1 -1 -1 -1
-1 -1 -1 1 -1 1 -1 -1 -1
-1 -1 1 -1 -1 -1 1 -1 -1
-1 1 -1 -1 -1 -1 -1 1 -1
-1 -1 -1 -1 -1 -1 -1 -1 -1
Convolution 2-D
1
1 -1 -1
-1 1 -1
-1 -1 1
-1 -1 -1 -1 -1 -1 -1 -1 -1
-1 1 -1 -1 -1 -1 -1 1 -1
-1 -1 1 -1 -1 -1 1 -1 -1
-1 -1 -1 1 -1 1 -1 -1 -1
-1 -1 -1 -1 1 -1 -1 -1 -1
-1 -1 -1 1 -1 1 -1 -1 -1
-1 -1 1 -1 -1 -1 1 -1 -1
-1 1 -1 -1 -1 -1 -1 1 -1
-1 -1 -1 -1 -1 -1 -1 -1 -1
1 1 -1
1 1 1
-1 1 1
55
1 1 -1
1 1 1
-1 1 1
Convolution 2-D
1 -1 -1
-1 1 -1
-1 -1 1
-1 -1 -1 -1 -1 -1 -1 -1 -1
-1 1 -1 -1 -1 -1 -1 1 -1
-1 -1 1 -1 -1 -1 1 -1 -1
-1 -1 -1 1 -1 1 -1 -1 -1
-1 -1 -1 -1 1 -1 -1 -1 -1
-1 -1 -1 1 -1 1 -1 -1 -1
-1 -1 1 -1 -1 -1 1 -1 -1
-1 1 -1 -1 -1 -1 -1 1 -1
-1 -1 -1 -1 -1 -1 -1 -1 -1
=
0.77 -0.11 0.11 0.33 0.55 -0.11 0.33
-0.11 1.00 -0.11 0.33 -0.11 0.11 -0.11
0.11 -0.11 1.00 -0.33 0.11 -0.11 0.55
0.33 0.33 -0.33 0.55 -0.33 0.33 0.33
0.55 -0.11 0.11 -0.33 1.00 -0.11 0.11
-0.11 0.11 -0.11 0.33 -0.11 1.00 -0.11
0.33 -0.11 0.55 0.33 0.11 -0.11 0.77
Convolution 2-D
32
32
3
Convolution Layer
32x32x3 image
width
height
depth
32
32
3
Convolution Layer
5x5x3 filter
32x32x3 image
Convolve the filter with the image
i.e. “slide over the image spatially,
computing dot products”
32
32
3
Convolution Layer
5x5x3 filter
32x32x3 image
Convolve the filter with the image
i.e. “slide over the image spatially,
computing dot products”
Filters always extend the full
depth of the input volume
32
32
3
Convolution Layer
32x32x3 image
5x5x3 filter
1 number:
the result of taking a dot product between the
filter and a small 5x5x3 chunk of the image
(i.e. 5*5*3 = 75-dimensional dot product + bias)
32
32
3
Convolution Layer
32x32x3 image
5x5x3 filter
convolve (slide) over all
spatial locations
activation map
1
28
28
32
32
3
Convolution Layer
32x32x3 image
5x5x3 filter
convolve (slide) over all
spatial locations
activation maps
1
28
28
consider a second, green filter
32
32
3
Convolution Layer
activation maps
6
28
28
For example, if we had 6 5x5 filters, we’ll get 6 separate activation maps:
We stack these up to get a “new image” of size 28x28x6!
32
32
3
Convolution Layer
activation maps
6
28
28
For example, if we had 6 5x5 filters, we’ll get 6 separate activation maps:
We processed [32x32x3] volume into [28x28x6] volume.
Q: how many parameters would this be if we used a fully connected layer instead?
32
32
3
Convolution Layer
activation maps
6
28
28
For example, if we had 6 5x5 filters, we’ll get 6 separate activation maps:
We processed [32x32x3] volume into [28x28x6] volume.
Q: how many parameters would this be if we used a fully connected layer instead?
A: (32*32*3)*(28*28*6) = 14.5M parameters, ~14.5M multiplies
32
32
3
Convolution Layer
activation maps
6
28
28
For example, if we had 6 5x5 filters, we’ll get 6 separate activation maps:
We processed [32x32x3] volume into [28x28x6] volume.
Q: how many parameters are used instead?
32
32
3
Convolution Layer
activation maps
6
28
28
For example, if we had 6 5x5 filters, we’ll get 6 separate activation maps:
We processed [32x32x3] volume into [28x28x6] volume.
Q: how many parameters are used instead? --- And how many multiplies?
A: (5*5*3)*6 = 450 parameters
32
32
3
Convolution Layer
activation maps
6
28
28
For example, if we had 6 5x5 filters, we’ll get 6 separate activation maps:
We processed [32x32x3] volume into [28x28x6] volume.
Q: how many parameters are used instead?
A: (5*5*3)*6 = 450 parameters, (5*5*3)*(28*28*6) = ~350K multiplies
example 5x5 filters
(32 total)
We call the layer convolutional
because it is related to convolution
of two signals:
elementwise multiplication and sum of
a filter and the signal (image)
one filter =>
one activation map
Preview: ConvNet is a sequence of Convolution Layers, interspersed with
activation functions
32
32
3
28
28
6
CONV,
ReLU
e.g. 6
5x5x3
filters
Rectified Linear Units (ReLUs)
0.77 -0.11 0.11 0.33 0.55 -0.11 0.33
-0.11 1.00 -0.11 0.33 -0.11 0.11 -0.11
0.11 -0.11 1.00 -0.33 0.11 -0.11 0.55
0.33 0.33 -0.33 0.55 -0.33 0.33 0.33
0.55 -0.11 0.11 -0.33 1.00 -0.11 0.11
-0.11 0.11 -0.11 0.33 -0.11 1.00 -0.11
0.33 -0.11 0.55 0.33 0.11 -0.11 0.77
0.77
0.77 0
Rectified Linear Units (ReLUs)
0.77 -0.11 0.11 0.33 0.55 -0.11 0.33
-0.11 1.00 -0.11 0.33 -0.11 0.11 -0.11
0.11 -0.11 1.00 -0.33 0.11 -0.11 0.55
0.33 0.33 -0.33 0.55 -0.33 0.33 0.33
0.55 -0.11 0.11 -0.33 1.00 -0.11 0.11
-0.11 0.11 -0.11 0.33 -0.11 1.00 -0.11
0.33 -0.11 0.55 0.33 0.11 -0.11 0.77
0.77 0 0.11 0.33 0.55 0 0.33
Rectified Linear Units (ReLUs)
0.77 -0.11 0.11 0.33 0.55 -0.11 0.33
-0.11 1.00 -0.11 0.33 -0.11 0.11 -0.11
0.11 -0.11 1.00 -0.33 0.11 -0.11 0.55
0.33 0.33 -0.33 0.55 -0.33 0.33 0.33
0.55 -0.11 0.11 -0.33 1.00 -0.11 0.11
-0.11 0.11 -0.11 0.33 -0.11 1.00 -0.11
0.33 -0.11 0.55 0.33 0.11 -0.11 0.77
0.77 0 0.11 0.33 0.55 0 0.33
0 1.00 0 0.33 0 0.11 0
0.11 0 1.00 0 0.11 0 0.55
0.33 0.33 0 0.55 0 0.33 0.33
0.55 0 0.11 0 1.00 0 0.11
0 0.11 0 0.33 0 1.00 0
0.33 0 0.55 0.33 0.11 0 0.77
Rectified Linear Units (ReLUs)
0.77 -0.11 0.11 0.33 0.55 -0.11 0.33
-0.11 1.00 -0.11 0.33 -0.11 0.11 -0.11
0.11 -0.11 1.00 -0.33 0.11 -0.11 0.55
0.33 0.33 -0.33 0.55 -0.33 0.33 0.33
0.55 -0.11 0.11 -0.33 1.00 -0.11 0.11
-0.11 0.11 -0.11 0.33 -0.11 1.00 -0.11
0.33 -0.11 0.55 0.33 0.11 -0.11 0.77
Preview: ConvNet is a sequence of Convolutional Layers, interspersed with
activation functions
32
32
3
CONV,
ReLU
e.g. 6
5x5x3
filters 28
28
6
CONV,
ReLU
e.g. 10
5x5x6
filters
CONV,
ReLU
….
10
24
24
two more layers to go: POOL/FC
Pooling layer
-  makes the representations smaller and more manageable
-  operates over each activation map independently:
1 1 2 4
5 6 7 8
3 2 1 0
1 2 3 4
Single depth slice
x
y
max pool with 2x2 filters
and stride 2 6 8
3 4
MAX POOLING
Fully Connected Layer (FC layer)
-  Contains neurons that connect to the entire input volume,
as in ordinary Neural Networks
Softmax
SCORES PROBABILITIES
http://cs.stanford.edu/people/karpathy/convnetjs/demo/cifar10.html
[ConvNetJS demo: training on CIFAR-10]
Case Study: AlexNet
[Krizhevsky et al. 2012]
Input: 227x227x3 images
First layer (CONV1): 96 11x11 filters applied at stride 4
=>
Q: what is the output volume size? Hint: (227-11)/4+1 = 55
Case Study: AlexNet
[Krizhevsky et al. 2012]
Input: 227x227x3 images
First layer (CONV1): 96 11x11 filters applied at stride 4
=>
Output volume [55x55x96]
Q: What is the total number of parameters in this layer?
Case Study: AlexNet
[Krizhevsky et al. 2012]
Input: 227x227x3 images
First layer (CONV1): 96 11x11 filters applied at stride 4
=>
Output volume [55x55x96]
Parameters: (11*11*3)*96 = 35K
Case Study: AlexNet
[Krizhevsky et al. 2012]
1st layer filters
Case Study: AlexNet
[Krizhevsky et al. 2012]
2nd layer filters
Face recognition
Case Study: AlexNet
[Krizhevsky et al. 2012]
Full (simplified) AlexNet architecture:
[227x227x3] INPUT
[55x55x96] CONV1: 96 11x11 filters at stride 4, pad 0
[27x27x96] MAX POOL1: 3x3 filters at stride 2
[27x27x96] NORM1: Normalization layer
[27x27x256] CONV2: 256 5x5 filters at stride 1, pad 2
[13x13x256] MAX POOL2: 3x3 filters at stride 2
[13x13x256] NORM2: Normalization layer
[13x13x384] CONV3: 384 3x3 filters at stride 1, pad 1
[13x13x384] CONV4: 384 3x3 filters at stride 1, pad 1
[13x13x256] CONV5: 256 3x3 filters at stride 1, pad 1
[6x6x256] MAX POOL3: 3x3 filters at stride 2
[4096] FC6: 4096 neurons
[4096] FC7: 4096 neurons
[1000] FC8: 1000 neurons (class scores)
Compared to LeCun 1998:
1 DATA:
- More data: 10^6 vs. 10^3
2 COMPUTE:
- GPU (~100x speedup)
3 ALGORITHM:
- Deeper: More layers (8 weight layers)
- Fancy regularization (dropout)
- Fancy non-linearity (ReLU)
Case Study: AlexNet
[Krizhevsky et al. 2012]
Full (simplified) AlexNet architecture:
[227x227x3] INPUT
[55x55x96] CONV1: 96 11x11 filters at stride 4, pad 0
[27x27x96] MAX POOL1: 3x3 filters at stride 2
[27x27x96] NORM1: Normalization layer
[27x27x256] CONV2: 256 5x5 filters at stride 1, pad 2
[13x13x256] MAX POOL2: 3x3 filters at stride 2
[13x13x256] NORM2: Normalization layer
[13x13x384] CONV3: 384 3x3 filters at stride 1, pad 1
[13x13x384] CONV4: 384 3x3 filters at stride 1, pad 1
[13x13x256] CONV5: 256 3x3 filters at stride 1, pad 1
[6x6x256] MAX POOL3: 3x3 filters at stride 2
[4096] FC6: 4096 neurons
[4096] FC7: 4096 neurons
[1000] FC8: 1000 neurons (class scores)
Details/Retrospectives:
- first use of ReLU
- used Norm layers (not common anymore)
- heavy data augmentation
- dropout 0.5
- batch size 128
- SGD Momentum 0.9
- Learning rate 1e-2, reduced by 10
manually when val accuracy plateaus
- L2 weight decay 5e-4
- 7 CNN ensemble: 18.2% -> 15.4%
Case Study: VGGNet
[Simonyan and Zisserman, 2014]
best model
Only 3x3 CONV stride 1, pad 1
and 2x2 MAX POOL stride 2
11.2% top 5 error in ILSVRC 2013
->
7.3% top 5 error
Case Study: GoogLeNet [Szegedy et al., 2014]
Inception module
ILSVRC 2014 winner (6.7% top 5 error)
Slide from Kaiming He’s recent presentation https://www.youtube.com/watch?v=1PGLj-uKT1w
Case Study: ResNet [He et al., 2015]
ILSVRC 2015 winner (3.6% top 5 error)
Case Study:
ResNet
[He et al., 2015]
224x224x3
spatial dimension
only 56x56!
VGG: ~2-3 weeks training with 4 GPUs
ResNet: ~2-3 weeks training with 4 GPUs
~$1K each
Addressing other tasks...
Transfer Learning
1. Train on
Imagenet
3. Medium dataset:
finetuning
more data = retrain more of
the network (or all of it)
2. Small dataset:
feature extractor
Freeze these
Train this
Freeze these
Train this
Transfer Learning
CNN Features off-the-shelf: an Astounding Baseline for Recognition [Razavian et al, 2014]
Addressing other tasks...
image CNN features
224x224x3
A block of compute with a few
million parameters.
7x7x512
Addressing other tasks...
image CNN features
224x224x3
A block of compute with a few
million parameters.
7x7x512
predicted thing
desired thing
Addressing other tasks...
image CNN features
224x224x3
A block of compute with a few
million parameters.
7x7x512
predicted thing
desired thing
this part changes
from task to task
Image Classification
thing = a vector of probabilities for different classes
image CNN features
224x224x3
7x7x512
e.g. vector of 1000 numbers giving
probabilities for different classes.
fully connected layer
Image Captioning
image CNN features
224x224x3
7x7x512
A sequence of 10,000-dimensional
vectors giving probabilities of
different words in the caption.
RNN
Localization
image CNN features
224x224x3
7x7x512
fully connected layer
Class
probabilities
(as before)
4 numbers:
-  X coord
-  Y coord
-  Width
-  Height
Reinforcement Learning
image CNN features
160x210x3
fully connected
e.g. vector of 8 numbers giving
probability of wanting to take any
of the 8 possible ATARI actions.
Mnih et al. 2015
ConvNets are everywhere…
e.g. Google Photos search
Face Verification, Taigman et al. 2014 (FAIR)
Self-driving cars[Goodfellow et al. 2014]
Ciresan et al. 2013
Turaga et al 2010
ConvNets are everywhere…
Whale recognition, Kaggle Challenge Satellite image analysis
Mnih and Hinton, 2010
Galaxy Challenge Dielman et al. 2015
WaveNet, van den Oord et al. 2016 Image captioning, Vinyals et al. 2015
ConvNets are everywhere…
AlphaGo, Silver et al 2016
VizDoom
StarCraft
ConvNets are everywhere…
DeepDream reddit.com/r/deepdream
NeuralStyle, Gatys et al. 2015
deepart.io, Prisma, etc.
Practical considerations
when applying ConvNets
What hardware do I use?
Buy your own machine:
-  NVIDIA DIGITS DevBox (TITAN X GPUs)
-  NVIDIA DGX-1 (P100 GPUs)
Build your own machine:
https://graphific.github.io/posts/building-a-deep-learning-dream-machine/
GPUs in the cloud:
-  Amazon AWS (GRID K520 :( )
-  Microsoft Azure (soon); 4x K80 GPUs
-  Cirrascale (“rent-a-box”)
What framework do I use?
Caffe
Torch
Theano
LasagneKeras
TensorFlow
Mxnet
chainer
Nervana’s Neon
Microsoft’s CNTK
Deeplearning4j
...
e.g. with keras.ioThe power is easily accessible.
Q: How do I know what architecture/hyperparameters
to use?
A: don’t be a hero.
1.  Take whatever works best on ILSVRC (latest ResNet)
2.  Download a pretrained model
3.  Potentially add/delete some parts of it
4.  Finetune it on your application.
5.  Play with the regularization strength (dropout rates)
MOOCS
Machine Learning
Coursera.org, Andrew Ng
Neural Networks for Machine Learning
Coursera.org, Geoffrey Hinton
Computational Neuroscience
Coursera.org, Rajesh P.N. Rao
Deep Learning
Udacity.org, Vincent Vanhoucke
Convolutional Neural Networks
Stanford.edu, Andrej Karapthy
(in Skopje) Convolutional Neural Networks & DL for NLP @ edu.time.mk
Thank you!
admin@time.mk
twitter.com/mkrobot
facebook.com/mkrobot

Más contenido relacionado

La actualidad más candente

TechnoBuzz - Tech Quiz
TechnoBuzz - Tech QuizTechnoBuzz - Tech Quiz
TechnoBuzz - Tech QuizCurious_Curie
 
The Amazing Ways Telecom Companies Use Artificial Intelligence And Machine Le...
The Amazing Ways Telecom Companies Use Artificial Intelligence And Machine Le...The Amazing Ways Telecom Companies Use Artificial Intelligence And Machine Le...
The Amazing Ways Telecom Companies Use Artificial Intelligence And Machine Le...Bernard Marr
 
Presentation on failure of nokia
Presentation on failure of nokiaPresentation on failure of nokia
Presentation on failure of nokiaabhishekthakur309
 
SIT tech quiz finals- 2019
SIT tech quiz finals- 2019   SIT tech quiz finals- 2019
SIT tech quiz finals- 2019 Ankit Jena
 
Artificial Intelligence in Telecom – Industry Adoption Analysis
Artificial Intelligence in Telecom – Industry Adoption AnalysisArtificial Intelligence in Telecom – Industry Adoption Analysis
Artificial Intelligence in Telecom – Industry Adoption AnalysisNetscribes
 
General Quiz @ OP Jindal University
General Quiz @ OP Jindal UniversityGeneral Quiz @ OP Jindal University
General Quiz @ OP Jindal UniversityRajan Arora
 
(Freshers' Science Quiz) Takneek quiz
(Freshers' Science Quiz) Takneek quiz(Freshers' Science Quiz) Takneek quiz
(Freshers' Science Quiz) Takneek quizQuiz Club IIT Kanpur
 
Vodafone Ppt
Vodafone PptVodafone Ppt
Vodafone Pptmarksh970
 
K Circle Weekly Quiz - Jan 9 , 2016 by Lucky Kaul
K Circle Weekly Quiz - Jan 9 , 2016 by Lucky KaulK Circle Weekly Quiz - Jan 9 , 2016 by Lucky Kaul
K Circle Weekly Quiz - Jan 9 , 2016 by Lucky KaulLucky Kaul
 
Minimalist Poster Quiz
Minimalist Poster QuizMinimalist Poster Quiz
Minimalist Poster QuizDipanwita Kar
 
Vatsal Vora, Nikita Moghe, Nikhil Hirve
Vatsal Vora, Nikita Moghe, Nikhil HirveVatsal Vora, Nikita Moghe, Nikhil Hirve
Vatsal Vora, Nikita Moghe, Nikhil HirveVarun Joshi
 
Artificial Intelligence - Opportunities and Challenges for Military Modeling ...
Artificial Intelligence - Opportunities and Challenges for Military Modeling ...Artificial Intelligence - Opportunities and Challenges for Military Modeling ...
Artificial Intelligence - Opportunities and Challenges for Military Modeling ...Andy Fawkes
 
Geek-A-Q (comp sci and tech quiz) finals, NIT JAMSHEDPUR
Geek-A-Q (comp sci and tech quiz) finals, NIT JAMSHEDPURGeek-A-Q (comp sci and tech quiz) finals, NIT JAMSHEDPUR
Geek-A-Q (comp sci and tech quiz) finals, NIT JAMSHEDPURPrasanna Gorai
 

La actualidad más candente (20)

TechnoBuzz - Tech Quiz
TechnoBuzz - Tech QuizTechnoBuzz - Tech Quiz
TechnoBuzz - Tech Quiz
 
The Amazing Ways Telecom Companies Use Artificial Intelligence And Machine Le...
The Amazing Ways Telecom Companies Use Artificial Intelligence And Machine Le...The Amazing Ways Telecom Companies Use Artificial Intelligence And Machine Le...
The Amazing Ways Telecom Companies Use Artificial Intelligence And Machine Le...
 
Nokia Strategy Presentation
Nokia Strategy PresentationNokia Strategy Presentation
Nokia Strategy Presentation
 
Presentation on failure of nokia
Presentation on failure of nokiaPresentation on failure of nokia
Presentation on failure of nokia
 
SIT tech quiz finals- 2019
SIT tech quiz finals- 2019   SIT tech quiz finals- 2019
SIT tech quiz finals- 2019
 
Artificial Intelligence in Telecom – Industry Adoption Analysis
Artificial Intelligence in Telecom – Industry Adoption AnalysisArtificial Intelligence in Telecom – Industry Adoption Analysis
Artificial Intelligence in Telecom – Industry Adoption Analysis
 
General Quiz @ OP Jindal University
General Quiz @ OP Jindal UniversityGeneral Quiz @ OP Jindal University
General Quiz @ OP Jindal University
 
(Freshers' Science Quiz) Takneek quiz
(Freshers' Science Quiz) Takneek quiz(Freshers' Science Quiz) Takneek quiz
(Freshers' Science Quiz) Takneek quiz
 
Vodafone Ppt
Vodafone PptVodafone Ppt
Vodafone Ppt
 
The Auto Quiz
The Auto QuizThe Auto Quiz
The Auto Quiz
 
K Circle Weekly Quiz - Jan 9 , 2016 by Lucky Kaul
K Circle Weekly Quiz - Jan 9 , 2016 by Lucky KaulK Circle Weekly Quiz - Jan 9 , 2016 by Lucky Kaul
K Circle Weekly Quiz - Jan 9 , 2016 by Lucky Kaul
 
Minimalist Poster Quiz
Minimalist Poster QuizMinimalist Poster Quiz
Minimalist Poster Quiz
 
Ibm case study
Ibm case studyIbm case study
Ibm case study
 
AGNITUZ QUIZ FINALS
AGNITUZ QUIZ FINALSAGNITUZ QUIZ FINALS
AGNITUZ QUIZ FINALS
 
Telecom ppt
Telecom pptTelecom ppt
Telecom ppt
 
Vatsal Vora, Nikita Moghe, Nikhil Hirve
Vatsal Vora, Nikita Moghe, Nikhil HirveVatsal Vora, Nikita Moghe, Nikhil Hirve
Vatsal Vora, Nikita Moghe, Nikhil Hirve
 
Artificial Intelligence - Opportunities and Challenges for Military Modeling ...
Artificial Intelligence - Opportunities and Challenges for Military Modeling ...Artificial Intelligence - Opportunities and Challenges for Military Modeling ...
Artificial Intelligence - Opportunities and Challenges for Military Modeling ...
 
Hp ppt
Hp pptHp ppt
Hp ppt
 
Geek-A-Q (comp sci and tech quiz) finals, NIT JAMSHEDPUR
Geek-A-Q (comp sci and tech quiz) finals, NIT JAMSHEDPURGeek-A-Q (comp sci and tech quiz) finals, NIT JAMSHEDPUR
Geek-A-Q (comp sci and tech quiz) finals, NIT JAMSHEDPUR
 
Artificial intelligence
Artificial intelligenceArtificial intelligence
Artificial intelligence
 

Destacado

Introducing: Tricode's Software Factory
Introducing: Tricode's Software FactoryIntroducing: Tricode's Software Factory
Introducing: Tricode's Software FactoryTricode (part of Dept)
 
De 4 belangrijkste risicofactoren van het nearshoring proces
De 4 belangrijkste risicofactoren van het nearshoring procesDe 4 belangrijkste risicofactoren van het nearshoring proces
De 4 belangrijkste risicofactoren van het nearshoring procesTricode (part of Dept)
 
Internet Addiction (Social Media Edition)
Internet Addiction (Social Media Edition)Internet Addiction (Social Media Edition)
Internet Addiction (Social Media Edition)Tricode (part of Dept)
 
How Technology is Affecting Society - STM 6
How Technology is Affecting Society - STM 6How Technology is Affecting Society - STM 6
How Technology is Affecting Society - STM 6Tricode (part of Dept)
 
Kids Can Code - an interactive IT workshop
Kids Can Code - an interactive IT workshopKids Can Code - an interactive IT workshop
Kids Can Code - an interactive IT workshopTricode (part of Dept)
 
Monolithic to Microservices Architecture - STM 6
Monolithic to Microservices Architecture - STM 6Monolithic to Microservices Architecture - STM 6
Monolithic to Microservices Architecture - STM 6Tricode (part of Dept)
 
Transfer Learning: An overview
Transfer Learning: An overviewTransfer Learning: An overview
Transfer Learning: An overviewjins0618
 
Porn, the leading influencer of Technology
Porn, the leading influencer of Technology Porn, the leading influencer of Technology
Porn, the leading influencer of Technology Tricode (part of Dept)
 
cvpaper.challenge in CVPR2015 (PRMU2015年12月)
cvpaper.challenge in CVPR2015 (PRMU2015年12月)cvpaper.challenge in CVPR2015 (PRMU2015年12月)
cvpaper.challenge in CVPR2015 (PRMU2015年12月)cvpaper. challenge
 
Backpropagation in Convolutional Neural Network
Backpropagation in Convolutional Neural NetworkBackpropagation in Convolutional Neural Network
Backpropagation in Convolutional Neural NetworkHiroshi Kuwajima
 
Introduction to Deep Learning with Python
Introduction to Deep Learning with PythonIntroduction to Deep Learning with Python
Introduction to Deep Learning with Pythonindico data
 

Destacado (18)

Tricode = Career + Fun
Tricode = Career + FunTricode = Career + Fun
Tricode = Career + Fun
 
Slide empr
Slide emprSlide empr
Slide empr
 
Quality Nearshoring met Tricode
Quality Nearshoring met TricodeQuality Nearshoring met Tricode
Quality Nearshoring met Tricode
 
Introducing: Tricode's Software Factory
Introducing: Tricode's Software FactoryIntroducing: Tricode's Software Factory
Introducing: Tricode's Software Factory
 
Customers speak on Magnolia CMS
Customers speak on Magnolia CMSCustomers speak on Magnolia CMS
Customers speak on Magnolia CMS
 
De 4 belangrijkste risicofactoren van het nearshoring proces
De 4 belangrijkste risicofactoren van het nearshoring procesDe 4 belangrijkste risicofactoren van het nearshoring proces
De 4 belangrijkste risicofactoren van het nearshoring proces
 
Internet Addiction (Social Media Edition)
Internet Addiction (Social Media Edition)Internet Addiction (Social Media Edition)
Internet Addiction (Social Media Edition)
 
How Technology is Affecting Society - STM 6
How Technology is Affecting Society - STM 6How Technology is Affecting Society - STM 6
How Technology is Affecting Society - STM 6
 
Kids Can Code - an interactive IT workshop
Kids Can Code - an interactive IT workshopKids Can Code - an interactive IT workshop
Kids Can Code - an interactive IT workshop
 
RESTful API - Best Practices
RESTful API - Best PracticesRESTful API - Best Practices
RESTful API - Best Practices
 
Monolithic to Microservices Architecture - STM 6
Monolithic to Microservices Architecture - STM 6Monolithic to Microservices Architecture - STM 6
Monolithic to Microservices Architecture - STM 6
 
Intro to JHipster
Intro to JHipster Intro to JHipster
Intro to JHipster
 
Deep learning
Deep learningDeep learning
Deep learning
 
Transfer Learning: An overview
Transfer Learning: An overviewTransfer Learning: An overview
Transfer Learning: An overview
 
Porn, the leading influencer of Technology
Porn, the leading influencer of Technology Porn, the leading influencer of Technology
Porn, the leading influencer of Technology
 
cvpaper.challenge in CVPR2015 (PRMU2015年12月)
cvpaper.challenge in CVPR2015 (PRMU2015年12月)cvpaper.challenge in CVPR2015 (PRMU2015年12月)
cvpaper.challenge in CVPR2015 (PRMU2015年12月)
 
Backpropagation in Convolutional Neural Network
Backpropagation in Convolutional Neural NetworkBackpropagation in Convolutional Neural Network
Backpropagation in Convolutional Neural Network
 
Introduction to Deep Learning with Python
Introduction to Deep Learning with PythonIntroduction to Deep Learning with Python
Introduction to Deep Learning with Python
 

Similar a Deep Learning - STM 6

How the convolutional neural network works behind the hood.
How the convolutional neural network works behind the hood.How the convolutional neural network works behind the hood.
How the convolutional neural network works behind the hood.USMANKHAN73908
 
Lecture 4: How it Works: Convolutional Neural Networks
Lecture 4: How it Works: Convolutional Neural NetworksLecture 4: How it Works: Convolutional Neural Networks
Lecture 4: How it Works: Convolutional Neural NetworksMohamed Loey
 
Day 3 take up convolutional neural network
Day 3  take up convolutional neural networkDay 3  take up convolutional neural network
Day 3 take up convolutional neural networkHuyPhmNht2
 
Optimization techniques in formulation Development- Plackett Burmann Design a...
Optimization techniques in formulation Development- Plackett Burmann Design a...Optimization techniques in formulation Development- Plackett Burmann Design a...
Optimization techniques in formulation Development- Plackett Burmann Design a...D.R. Chandravanshi
 
Deep Learning con CNTK by Pablo Doval
Deep Learning con CNTK by Pablo DovalDeep Learning con CNTK by Pablo Doval
Deep Learning con CNTK by Pablo DovalPlain Concepts
 
гдз алгебра решение задач 7 класс под ред. луканова н.д, 2005 год
гдз алгебра решение задач 7 класс под ред. луканова н.д, 2005 год гдз алгебра решение задач 7 класс под ред. луканова н.д, 2005 год
гдз алгебра решение задач 7 класс под ред. луканова н.д, 2005 год etigyasyujired73
 
Desing of Factorial Experiments
Desing of Factorial ExperimentsDesing of Factorial Experiments
Desing of Factorial ExperimentsRamon Saviñon
 
Estado del Arte de la IA
Estado del Arte de la IAEstado del Arte de la IA
Estado del Arte de la IAPlain Concepts
 
11b Hazard Mapping.ppt
11b Hazard Mapping.ppt11b Hazard Mapping.ppt
11b Hazard Mapping.pptJorge Atau
 
shake! 2017 예선 문제 풀이
shake! 2017 예선 문제 풀이shake! 2017 예선 문제 풀이
shake! 2017 예선 문제 풀이HYUNJEONG KIM
 
Alignment scoring functions
Alignment scoring functionsAlignment scoring functions
Alignment scoring functionsavrilcoghlan
 
Modelling for Strategic Design - IxDA Berlin 09/2013
Modelling for Strategic Design - IxDA Berlin 09/2013Modelling for Strategic Design - IxDA Berlin 09/2013
Modelling for Strategic Design - IxDA Berlin 09/2013Milan Guenther (eda.c)
 

Similar a Deep Learning - STM 6 (20)

How the convolutional neural network works behind the hood.
How the convolutional neural network works behind the hood.How the convolutional neural network works behind the hood.
How the convolutional neural network works behind the hood.
 
Lecture 4: How it Works: Convolutional Neural Networks
Lecture 4: How it Works: Convolutional Neural NetworksLecture 4: How it Works: Convolutional Neural Networks
Lecture 4: How it Works: Convolutional Neural Networks
 
Day 3 take up convolutional neural network
Day 3  take up convolutional neural networkDay 3  take up convolutional neural network
Day 3 take up convolutional neural network
 
Optimization techniques in formulation Development- Plackett Burmann Design a...
Optimization techniques in formulation Development- Plackett Burmann Design a...Optimization techniques in formulation Development- Plackett Burmann Design a...
Optimization techniques in formulation Development- Plackett Burmann Design a...
 
Deep Learning con CNTK by Pablo Doval
Deep Learning con CNTK by Pablo DovalDeep Learning con CNTK by Pablo Doval
Deep Learning con CNTK by Pablo Doval
 
гдз алгебра решение задач 7 класс под ред. луканова н.д, 2005 год
гдз алгебра решение задач 7 класс под ред. луканова н.д, 2005 год гдз алгебра решение задач 7 класс под ред. луканова н.д, 2005 год
гдз алгебра решение задач 7 класс под ред. луканова н.д, 2005 год
 
Desing of Factorial Experiments
Desing of Factorial ExperimentsDesing of Factorial Experiments
Desing of Factorial Experiments
 
Estado del Arte de la IA
Estado del Arte de la IAEstado del Arte de la IA
Estado del Arte de la IA
 
DL_Lecture_29_06_2020.pptx
DL_Lecture_29_06_2020.pptxDL_Lecture_29_06_2020.pptx
DL_Lecture_29_06_2020.pptx
 
11b Hazard Mapping.ppt
11b Hazard Mapping.ppt11b Hazard Mapping.ppt
11b Hazard Mapping.ppt
 
مذكرة انجليزى kg2
مذكرة انجليزى kg2مذكرة انجليزى kg2
مذكرة انجليزى kg2
 
Revision sheet 1_sixth_primary
Revision sheet 1_sixth_primaryRevision sheet 1_sixth_primary
Revision sheet 1_sixth_primary
 
shake! 2017 예선 문제 풀이
shake! 2017 예선 문제 풀이shake! 2017 예선 문제 풀이
shake! 2017 예선 문제 풀이
 
PC63 Remedial Design
PC63 Remedial DesignPC63 Remedial Design
PC63 Remedial Design
 
AP_Peer Monitor Report_MCYP_Monitoring Tool
AP_Peer Monitor Report_MCYP_Monitoring ToolAP_Peer Monitor Report_MCYP_Monitoring Tool
AP_Peer Monitor Report_MCYP_Monitoring Tool
 
Alignment scoring functions
Alignment scoring functionsAlignment scoring functions
Alignment scoring functions
 
TARJETA MADRE
TARJETA MADRETARJETA MADRE
TARJETA MADRE
 
Amon wp en
Amon wp enAmon wp en
Amon wp en
 
Master uji validitas
Master uji validitasMaster uji validitas
Master uji validitas
 
Modelling for Strategic Design - IxDA Berlin 09/2013
Modelling for Strategic Design - IxDA Berlin 09/2013Modelling for Strategic Design - IxDA Berlin 09/2013
Modelling for Strategic Design - IxDA Berlin 09/2013
 

Más de Tricode (part of Dept)

The Top Benefits of Magnolia CMS’s Inspirational Open Suite Ideology
The Top Benefits of Magnolia CMS’s Inspirational Open Suite IdeologyThe Top Benefits of Magnolia CMS’s Inspirational Open Suite Ideology
The Top Benefits of Magnolia CMS’s Inspirational Open Suite IdeologyTricode (part of Dept)
 
Mobile Sensor Networks based on Smartphone devices and Web Services
Mobile Sensor Networks based on Smartphone devices and Web ServicesMobile Sensor Networks based on Smartphone devices and Web Services
Mobile Sensor Networks based on Smartphone devices and Web ServicesTricode (part of Dept)
 
Keeping Your Clients Happy and Your Management Even Happier
Keeping Your Clients Happy and Your Management Even Happier Keeping Your Clients Happy and Your Management Even Happier
Keeping Your Clients Happy and Your Management Even Happier Tricode (part of Dept)
 
AEM Digital Assets Management - What's new in 6.2?
AEM Digital Assets Management - What's new in 6.2?AEM Digital Assets Management - What's new in 6.2?
AEM Digital Assets Management - What's new in 6.2?Tricode (part of Dept)
 
10 nearshoring it trends om in 2016 te volgen
10 nearshoring it trends om in 2016 te volgen 10 nearshoring it trends om in 2016 te volgen
10 nearshoring it trends om in 2016 te volgen Tricode (part of Dept)
 
Why you should use Adobe Experience Manager Mobile
Why you should use Adobe Experience Manager Mobile Why you should use Adobe Experience Manager Mobile
Why you should use Adobe Experience Manager Mobile Tricode (part of Dept)
 
Communication and its Importance to a Developer
Communication and its Importance to a DeveloperCommunication and its Importance to a Developer
Communication and its Importance to a DeveloperTricode (part of Dept)
 
12 hot features to engage and save time with aem 6.2
12 hot features to engage and save time with aem 6.212 hot features to engage and save time with aem 6.2
12 hot features to engage and save time with aem 6.2Tricode (part of Dept)
 
Content Marketing: How to Create Relevant Content for Your Audience
Content Marketing: How to Create Relevant Content for Your AudienceContent Marketing: How to Create Relevant Content for Your Audience
Content Marketing: How to Create Relevant Content for Your AudienceTricode (part of Dept)
 
Adobe Experience Manager - The hub within the Marketing Cloud
Adobe Experience Manager - The hub within the Marketing CloudAdobe Experience Manager - The hub within the Marketing Cloud
Adobe Experience Manager - The hub within the Marketing CloudTricode (part of Dept)
 
Continuous Delivery for Open Source Java projects
Continuous Delivery for Open Source Java projectsContinuous Delivery for Open Source Java projects
Continuous Delivery for Open Source Java projectsTricode (part of Dept)
 

Más de Tricode (part of Dept) (18)

The Top Benefits of Magnolia CMS’s Inspirational Open Suite Ideology
The Top Benefits of Magnolia CMS’s Inspirational Open Suite IdeologyThe Top Benefits of Magnolia CMS’s Inspirational Open Suite Ideology
The Top Benefits of Magnolia CMS’s Inspirational Open Suite Ideology
 
Agile QA 2017: A New Hope
Agile QA 2017: A New HopeAgile QA 2017: A New Hope
Agile QA 2017: A New Hope
 
Mobile Sensor Networks based on Smartphone devices and Web Services
Mobile Sensor Networks based on Smartphone devices and Web ServicesMobile Sensor Networks based on Smartphone devices and Web Services
Mobile Sensor Networks based on Smartphone devices and Web Services
 
Keeping Your Clients Happy and Your Management Even Happier
Keeping Your Clients Happy and Your Management Even Happier Keeping Your Clients Happy and Your Management Even Happier
Keeping Your Clients Happy and Your Management Even Happier
 
AEM Digital Assets Management - What's new in 6.2?
AEM Digital Assets Management - What's new in 6.2?AEM Digital Assets Management - What's new in 6.2?
AEM Digital Assets Management - What's new in 6.2?
 
10 nearshoring it trends om in 2016 te volgen
10 nearshoring it trends om in 2016 te volgen 10 nearshoring it trends om in 2016 te volgen
10 nearshoring it trends om in 2016 te volgen
 
Tricode & Magnolia
Tricode & MagnoliaTricode & Magnolia
Tricode & Magnolia
 
Why you should use Adobe Experience Manager Mobile
Why you should use Adobe Experience Manager Mobile Why you should use Adobe Experience Manager Mobile
Why you should use Adobe Experience Manager Mobile
 
Offshoring: Top 10 verborgen kosten
Offshoring: Top 10 verborgen kostenOffshoring: Top 10 verborgen kosten
Offshoring: Top 10 verborgen kosten
 
Communication and its Importance to a Developer
Communication and its Importance to a DeveloperCommunication and its Importance to a Developer
Communication and its Importance to a Developer
 
Little Brother Is Watching You
Little Brother Is Watching YouLittle Brother Is Watching You
Little Brother Is Watching You
 
12 hot features to engage and save time with aem 6.2
12 hot features to engage and save time with aem 6.212 hot features to engage and save time with aem 6.2
12 hot features to engage and save time with aem 6.2
 
Content Marketing: How to Create Relevant Content for Your Audience
Content Marketing: How to Create Relevant Content for Your AudienceContent Marketing: How to Create Relevant Content for Your Audience
Content Marketing: How to Create Relevant Content for Your Audience
 
Provisioning aem with puppet
Provisioning aem with puppet Provisioning aem with puppet
Provisioning aem with puppet
 
Adobe Experience Manager - The hub within the Marketing Cloud
Adobe Experience Manager - The hub within the Marketing CloudAdobe Experience Manager - The hub within the Marketing Cloud
Adobe Experience Manager - The hub within the Marketing Cloud
 
Continuous Delivery for Open Source Java projects
Continuous Delivery for Open Source Java projectsContinuous Delivery for Open Source Java projects
Continuous Delivery for Open Source Java projects
 
Intro to OSGi
Intro to OSGiIntro to OSGi
Intro to OSGi
 
Online marketing trends 2016
Online marketing trends 2016Online marketing trends 2016
Online marketing trends 2016
 

Último

Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfLoriGlavin3
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxLoriGlavin3
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxLoriGlavin3
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfPrecisely
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxLoriGlavin3
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningLars Bell
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubKalema Edgar
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsPixlogix Infotech
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupFlorian Wilhelm
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brandgvaughan
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024BookNet Canada
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLScyllaDB
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 

Último (20)

Moving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdfMoving Beyond Passwords: FIDO Paris Seminar.pdf
Moving Beyond Passwords: FIDO Paris Seminar.pdf
 
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptxDigital Identity is Under Attack: FIDO Paris Seminar.pptx
Digital Identity is Under Attack: FIDO Paris Seminar.pptx
 
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptxA Deep Dive on Passkeys: FIDO Paris Seminar.pptx
A Deep Dive on Passkeys: FIDO Paris Seminar.pptx
 
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdfHyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
Hyperautomation and AI/ML: A Strategy for Digital Transformation Success.pdf
 
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data PrivacyTrustArc Webinar - How to Build Consumer Trust Through Data Privacy
TrustArc Webinar - How to Build Consumer Trust Through Data Privacy
 
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptxThe Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
The Fit for Passkeys for Employee and Consumer Sign-ins: FIDO Paris Seminar.pptx
 
DSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine TuningDSPy a system for AI to Write Prompts and Do Fine Tuning
DSPy a system for AI to Write Prompts and Do Fine Tuning
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Unleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding ClubUnleash Your Potential - Namagunga Girls Coding Club
Unleash Your Potential - Namagunga Girls Coding Club
 
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
The Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and ConsThe Ultimate Guide to Choosing WordPress Pros and Cons
The Ultimate Guide to Choosing WordPress Pros and Cons
 
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
Streamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project SetupStreamlining Python Development: A Guide to a Modern Project Setup
Streamlining Python Development: A Guide to a Modern Project Setup
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
WordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your BrandWordPress Websites for Engineers: Elevate Your Brand
WordPress Websites for Engineers: Elevate Your Brand
 
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: Loan Stars - Tech Forum 2024
 
Developer Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQLDeveloper Data Modeling Mistakes: From Postgres to NoSQL
Developer Data Modeling Mistakes: From Postgres to NoSQL
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 

Deep Learning - STM 6

  • 1. Deep Learning for Computer Vision Igor Trajkovski admin@time.mk twitter.com/mkrobot facebook.com/mkrobot SkopjeTechMeetup 29.9.2016 Slides from Andrej Karpathy, Yann LeCun & Geoffrey Hinton
  • 3. Artificial Neural Network Very few assumptions made about the input vector.
  • 4. In many real-world applications input vectors have structure. Spectrograms ImagesText
  • 6. Hubel & Wiesel, 1959 RECEPTIVE FIELDS OF SINGLE NEURONES IN THE CAT'S STRIATE CORTEX 1962 RECEPTIVE FIELDS, BINOCULAR INTERACTION AND FUNCTIONAL ARCHITECTURE IN THE CAT'S VISUAL CORTEX https://www.youtube.com/watch?v=8VdFf3egwfg
  • 7. A bit of history: Neurocognitron [Fukushima 1980] “sandwich” architecture (SCSCSC…) simple cells: modifiable parameters complex cells: perform pooling
  • 8. Gradient-based learning applied to document recognition [LeCun, Bottou, Bengio, Haffner 1998] LeNet-5 https://www.youtube.com/watch?v=FwFduRA_L6Q
  • 11. ImageNet Classification with Deep Convolutional Neural Networks [Krizhevsky, Sutskever, Hinton, 2012] “AlexNet” Deng et al. Russakovsky et al. NVIDIA et al.
  • 12. “What I learned from competing against a ConvNet on ImageNet” (karpathy.github.io)
  • 13. “What I learned from competing against a ConvNet on ImageNet” (karpathy.github.io) TLDR: Human accuracy is somewhere 2-5%. (depending on how much training or how little life you have)
  • 14. (slide from Kaiming He’s recent presentation)
  • 15. [224x224x3] f 1000 numbers, indicating class scores Feature Extraction vector describing various image statistics [224x224x3] f 1000 numbers, indicating class scores training training
  • 16. “Run the image through 20 layers of 3x3 convolutions and train the filters with SGD.”
  • 18. [224x224x3] f 1000 numbers, indicating class scores training Only two basic operations are involved throughout: 1.  Dot products w.x 2.  Max operations parameters (~10M of them)
  • 19. preview: e.g. 200K numbers e.g. 10 numbers
  • 21. 1 -1 -1 -1 1 -1 -1 -1 1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 1 -1 -1 -1 -1 -1 1 -1 -1 -1 1 -1 -1 -1 1 -1 -1 -1 -1 -1 1 -1 1 -1 -1 -1 -1 -1 -1 -1 1 -1 -1 -1 -1 -1 -1 -1 1 -1 1 -1 -1 -1 -1 -1 1 -1 -1 -1 1 -1 -1 -1 1 -1 -1 -1 -1 -1 1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 Convolution 2-D
  • 22. 1 1 -1 -1 -1 1 -1 -1 -1 1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 1 -1 -1 -1 -1 -1 1 -1 -1 -1 1 -1 -1 -1 1 -1 -1 -1 -1 -1 1 -1 1 -1 -1 -1 -1 -1 -1 -1 1 -1 -1 -1 -1 -1 -1 -1 1 -1 1 -1 -1 -1 -1 -1 1 -1 -1 -1 1 -1 -1 -1 1 -1 -1 -1 -1 -1 1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 Convolution 2-D
  • 23. 1 1 1 -1 -1 -1 1 -1 -1 -1 1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 1 -1 -1 -1 -1 -1 1 -1 -1 -1 1 -1 -1 -1 1 -1 -1 -1 -1 -1 1 -1 1 -1 -1 -1 -1 -1 -1 -1 1 -1 -1 -1 -1 -1 -1 -1 1 -1 1 -1 -1 -1 -1 -1 1 -1 -1 -1 1 -1 -1 -1 1 -1 -1 -1 -1 -1 1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 Convolution 2-D
  • 24. 1 1 1 1 -1 -1 -1 1 -1 -1 -1 1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 1 -1 -1 -1 -1 -1 1 -1 -1 -1 1 -1 -1 -1 1 -1 -1 -1 -1 -1 1 -1 1 -1 -1 -1 -1 -1 -1 -1 1 -1 -1 -1 -1 -1 -1 -1 1 -1 1 -1 -1 -1 -1 -1 1 -1 -1 -1 1 -1 -1 -1 1 -1 -1 -1 -1 -1 1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 Convolution 2-D
  • 25. 1 1 1 1 1 -1 -1 -1 1 -1 -1 -1 1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 1 -1 -1 -1 -1 -1 1 -1 -1 -1 1 -1 -1 -1 1 -1 -1 -1 -1 -1 1 -1 1 -1 -1 -1 -1 -1 -1 -1 1 -1 -1 -1 -1 -1 -1 -1 1 -1 1 -1 -1 -1 -1 -1 1 -1 -1 -1 1 -1 -1 -1 1 -1 -1 -1 -1 -1 1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 Convolution 2-D
  • 26. 1 1 1 1 1 1 -1 -1 -1 1 -1 -1 -1 1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 1 -1 -1 -1 -1 -1 1 -1 -1 -1 1 -1 -1 -1 1 -1 -1 -1 -1 -1 1 -1 1 -1 -1 -1 -1 -1 -1 -1 1 -1 -1 -1 -1 -1 -1 -1 1 -1 1 -1 -1 -1 -1 -1 1 -1 -1 -1 1 -1 -1 -1 1 -1 -1 -1 -1 -1 1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 Convolution 2-D
  • 27. 1 1 1 1 1 1 1 -1 -1 -1 1 -1 -1 -1 1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 1 -1 -1 -1 -1 -1 1 -1 -1 -1 1 -1 -1 -1 1 -1 -1 -1 -1 -1 1 -1 1 -1 -1 -1 -1 -1 -1 -1 1 -1 -1 -1 -1 -1 -1 -1 1 -1 1 -1 -1 -1 -1 -1 1 -1 -1 -1 1 -1 -1 -1 1 -1 -1 -1 -1 -1 1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 Convolution 2-D
  • 28. 1 1 1 1 1 1 1 1 -1 -1 -1 1 -1 -1 -1 1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 1 -1 -1 -1 -1 -1 1 -1 -1 -1 1 -1 -1 -1 1 -1 -1 -1 -1 -1 1 -1 1 -1 -1 -1 -1 -1 -1 -1 1 -1 -1 -1 -1 -1 -1 -1 1 -1 1 -1 -1 -1 -1 -1 1 -1 -1 -1 1 -1 -1 -1 1 -1 -1 -1 -1 -1 1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 Convolution 2-D
  • 29. 1 1 1 1 1 1 1 1 1 -1 -1 -1 1 -1 -1 -1 1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 1 -1 -1 -1 -1 -1 1 -1 -1 -1 1 -1 -1 -1 1 -1 -1 -1 -1 -1 1 -1 1 -1 -1 -1 -1 -1 -1 -1 1 -1 -1 -1 -1 -1 -1 -1 1 -1 1 -1 -1 -1 -1 -1 1 -1 -1 -1 1 -1 -1 -1 1 -1 -1 -1 -1 -1 1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 Convolution 2-D
  • 30. 1 1 1 1 1 1 1 1 1 1 -1 -1 -1 1 -1 -1 -1 1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 1 -1 -1 -1 -1 -1 1 -1 -1 -1 1 -1 -1 -1 1 -1 -1 -1 -1 -1 1 -1 1 -1 -1 -1 -1 -1 -1 -1 1 -1 -1 -1 -1 -1 -1 -1 1 -1 1 -1 -1 -1 -1 -1 1 -1 -1 -1 1 -1 -1 -1 1 -1 -1 -1 -1 -1 1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 Convolution 2-D
  • 31. 1 1 1 1 1 1 1 1 1 1 1 -1 -1 -1 1 -1 -1 -1 1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 1 -1 -1 -1 -1 -1 1 -1 -1 -1 1 -1 -1 -1 1 -1 -1 -1 -1 -1 1 -1 1 -1 -1 -1 -1 -1 -1 -1 1 -1 -1 -1 -1 -1 -1 -1 1 -1 1 -1 -1 -1 -1 -1 1 -1 -1 -1 1 -1 -1 -1 1 -1 -1 -1 -1 -1 1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 Convolution 2-D
  • 32. 1 1 -1 -1 -1 1 -1 -1 -1 1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 1 -1 -1 -1 -1 -1 1 -1 -1 -1 1 -1 -1 -1 1 -1 -1 -1 -1 -1 1 -1 1 -1 -1 -1 -1 -1 -1 -1 1 -1 -1 -1 -1 -1 -1 -1 1 -1 1 -1 -1 -1 -1 -1 1 -1 -1 -1 1 -1 -1 -1 1 -1 -1 -1 -1 -1 1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 Convolution 2-D
  • 33. 1 1 -1 1 -1 -1 -1 1 -1 -1 -1 1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 1 -1 -1 -1 -1 -1 1 -1 -1 -1 1 -1 -1 -1 1 -1 -1 -1 -1 -1 1 -1 1 -1 -1 -1 -1 -1 -1 -1 1 -1 -1 -1 -1 -1 -1 -1 1 -1 1 -1 -1 -1 -1 -1 1 -1 -1 -1 1 -1 -1 -1 1 -1 -1 -1 -1 -1 1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 Convolution 2-D
  • 34. 1 1 -1 1 1 1 -1 1 1 1 -1 -1 -1 1 -1 -1 -1 1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 1 -1 -1 -1 -1 -1 1 -1 -1 -1 1 -1 -1 -1 1 -1 -1 -1 -1 -1 1 -1 1 -1 -1 -1 -1 -1 -1 -1 1 -1 -1 -1 -1 -1 -1 -1 1 -1 1 -1 -1 -1 -1 -1 1 -1 -1 -1 1 -1 -1 -1 1 -1 -1 -1 -1 -1 1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 Convolution 2-D
  • 35. 1 1 -1 -1 -1 1 -1 -1 -1 1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 1 -1 -1 -1 -1 -1 1 -1 -1 -1 1 -1 -1 -1 1 -1 -1 -1 -1 -1 1 -1 1 -1 -1 -1 -1 -1 -1 -1 1 -1 -1 -1 -1 -1 -1 -1 1 -1 1 -1 -1 -1 -1 -1 1 -1 -1 -1 1 -1 -1 -1 1 -1 -1 -1 -1 -1 1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 1 1 -1 1 1 1 -1 1 1 55 1 1 -1 1 1 1 -1 1 1 Convolution 2-D
  • 36. 1 -1 -1 -1 1 -1 -1 -1 1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 1 -1 -1 -1 -1 -1 1 -1 -1 -1 1 -1 -1 -1 1 -1 -1 -1 -1 -1 1 -1 1 -1 -1 -1 -1 -1 -1 -1 1 -1 -1 -1 -1 -1 -1 -1 1 -1 1 -1 -1 -1 -1 -1 1 -1 -1 -1 1 -1 -1 -1 1 -1 -1 -1 -1 -1 1 -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 = 0.77 -0.11 0.11 0.33 0.55 -0.11 0.33 -0.11 1.00 -0.11 0.33 -0.11 0.11 -0.11 0.11 -0.11 1.00 -0.33 0.11 -0.11 0.55 0.33 0.33 -0.33 0.55 -0.33 0.33 0.33 0.55 -0.11 0.11 -0.33 1.00 -0.11 0.11 -0.11 0.11 -0.11 0.33 -0.11 1.00 -0.11 0.33 -0.11 0.55 0.33 0.11 -0.11 0.77 Convolution 2-D
  • 38. 32 32 3 Convolution Layer 5x5x3 filter 32x32x3 image Convolve the filter with the image i.e. “slide over the image spatially, computing dot products”
  • 39. 32 32 3 Convolution Layer 5x5x3 filter 32x32x3 image Convolve the filter with the image i.e. “slide over the image spatially, computing dot products” Filters always extend the full depth of the input volume
  • 40. 32 32 3 Convolution Layer 32x32x3 image 5x5x3 filter 1 number: the result of taking a dot product between the filter and a small 5x5x3 chunk of the image (i.e. 5*5*3 = 75-dimensional dot product + bias)
  • 41. 32 32 3 Convolution Layer 32x32x3 image 5x5x3 filter convolve (slide) over all spatial locations activation map 1 28 28
  • 42. 32 32 3 Convolution Layer 32x32x3 image 5x5x3 filter convolve (slide) over all spatial locations activation maps 1 28 28 consider a second, green filter
  • 43. 32 32 3 Convolution Layer activation maps 6 28 28 For example, if we had 6 5x5 filters, we’ll get 6 separate activation maps: We stack these up to get a “new image” of size 28x28x6!
  • 44. 32 32 3 Convolution Layer activation maps 6 28 28 For example, if we had 6 5x5 filters, we’ll get 6 separate activation maps: We processed [32x32x3] volume into [28x28x6] volume. Q: how many parameters would this be if we used a fully connected layer instead?
  • 45. 32 32 3 Convolution Layer activation maps 6 28 28 For example, if we had 6 5x5 filters, we’ll get 6 separate activation maps: We processed [32x32x3] volume into [28x28x6] volume. Q: how many parameters would this be if we used a fully connected layer instead? A: (32*32*3)*(28*28*6) = 14.5M parameters, ~14.5M multiplies
  • 46. 32 32 3 Convolution Layer activation maps 6 28 28 For example, if we had 6 5x5 filters, we’ll get 6 separate activation maps: We processed [32x32x3] volume into [28x28x6] volume. Q: how many parameters are used instead?
  • 47. 32 32 3 Convolution Layer activation maps 6 28 28 For example, if we had 6 5x5 filters, we’ll get 6 separate activation maps: We processed [32x32x3] volume into [28x28x6] volume. Q: how many parameters are used instead? --- And how many multiplies? A: (5*5*3)*6 = 450 parameters
  • 48. 32 32 3 Convolution Layer activation maps 6 28 28 For example, if we had 6 5x5 filters, we’ll get 6 separate activation maps: We processed [32x32x3] volume into [28x28x6] volume. Q: how many parameters are used instead? A: (5*5*3)*6 = 450 parameters, (5*5*3)*(28*28*6) = ~350K multiplies
  • 49. example 5x5 filters (32 total) We call the layer convolutional because it is related to convolution of two signals: elementwise multiplication and sum of a filter and the signal (image) one filter => one activation map
  • 50. Preview: ConvNet is a sequence of Convolution Layers, interspersed with activation functions 32 32 3 28 28 6 CONV, ReLU e.g. 6 5x5x3 filters
  • 51. Rectified Linear Units (ReLUs) 0.77 -0.11 0.11 0.33 0.55 -0.11 0.33 -0.11 1.00 -0.11 0.33 -0.11 0.11 -0.11 0.11 -0.11 1.00 -0.33 0.11 -0.11 0.55 0.33 0.33 -0.33 0.55 -0.33 0.33 0.33 0.55 -0.11 0.11 -0.33 1.00 -0.11 0.11 -0.11 0.11 -0.11 0.33 -0.11 1.00 -0.11 0.33 -0.11 0.55 0.33 0.11 -0.11 0.77 0.77
  • 52. 0.77 0 Rectified Linear Units (ReLUs) 0.77 -0.11 0.11 0.33 0.55 -0.11 0.33 -0.11 1.00 -0.11 0.33 -0.11 0.11 -0.11 0.11 -0.11 1.00 -0.33 0.11 -0.11 0.55 0.33 0.33 -0.33 0.55 -0.33 0.33 0.33 0.55 -0.11 0.11 -0.33 1.00 -0.11 0.11 -0.11 0.11 -0.11 0.33 -0.11 1.00 -0.11 0.33 -0.11 0.55 0.33 0.11 -0.11 0.77
  • 53. 0.77 0 0.11 0.33 0.55 0 0.33 Rectified Linear Units (ReLUs) 0.77 -0.11 0.11 0.33 0.55 -0.11 0.33 -0.11 1.00 -0.11 0.33 -0.11 0.11 -0.11 0.11 -0.11 1.00 -0.33 0.11 -0.11 0.55 0.33 0.33 -0.33 0.55 -0.33 0.33 0.33 0.55 -0.11 0.11 -0.33 1.00 -0.11 0.11 -0.11 0.11 -0.11 0.33 -0.11 1.00 -0.11 0.33 -0.11 0.55 0.33 0.11 -0.11 0.77
  • 54. 0.77 0 0.11 0.33 0.55 0 0.33 0 1.00 0 0.33 0 0.11 0 0.11 0 1.00 0 0.11 0 0.55 0.33 0.33 0 0.55 0 0.33 0.33 0.55 0 0.11 0 1.00 0 0.11 0 0.11 0 0.33 0 1.00 0 0.33 0 0.55 0.33 0.11 0 0.77 Rectified Linear Units (ReLUs) 0.77 -0.11 0.11 0.33 0.55 -0.11 0.33 -0.11 1.00 -0.11 0.33 -0.11 0.11 -0.11 0.11 -0.11 1.00 -0.33 0.11 -0.11 0.55 0.33 0.33 -0.33 0.55 -0.33 0.33 0.33 0.55 -0.11 0.11 -0.33 1.00 -0.11 0.11 -0.11 0.11 -0.11 0.33 -0.11 1.00 -0.11 0.33 -0.11 0.55 0.33 0.11 -0.11 0.77
  • 55. Preview: ConvNet is a sequence of Convolutional Layers, interspersed with activation functions 32 32 3 CONV, ReLU e.g. 6 5x5x3 filters 28 28 6 CONV, ReLU e.g. 10 5x5x6 filters CONV, ReLU …. 10 24 24
  • 56. two more layers to go: POOL/FC
  • 57. Pooling layer -  makes the representations smaller and more manageable -  operates over each activation map independently:
  • 58. 1 1 2 4 5 6 7 8 3 2 1 0 1 2 3 4 Single depth slice x y max pool with 2x2 filters and stride 2 6 8 3 4 MAX POOLING
  • 59. Fully Connected Layer (FC layer) -  Contains neurons that connect to the entire input volume, as in ordinary Neural Networks
  • 62. Case Study: AlexNet [Krizhevsky et al. 2012] Input: 227x227x3 images First layer (CONV1): 96 11x11 filters applied at stride 4 => Q: what is the output volume size? Hint: (227-11)/4+1 = 55
  • 63. Case Study: AlexNet [Krizhevsky et al. 2012] Input: 227x227x3 images First layer (CONV1): 96 11x11 filters applied at stride 4 => Output volume [55x55x96] Q: What is the total number of parameters in this layer?
  • 64. Case Study: AlexNet [Krizhevsky et al. 2012] Input: 227x227x3 images First layer (CONV1): 96 11x11 filters applied at stride 4 => Output volume [55x55x96] Parameters: (11*11*3)*96 = 35K
  • 65. Case Study: AlexNet [Krizhevsky et al. 2012] 1st layer filters
  • 66. Case Study: AlexNet [Krizhevsky et al. 2012] 2nd layer filters
  • 68. Case Study: AlexNet [Krizhevsky et al. 2012] Full (simplified) AlexNet architecture: [227x227x3] INPUT [55x55x96] CONV1: 96 11x11 filters at stride 4, pad 0 [27x27x96] MAX POOL1: 3x3 filters at stride 2 [27x27x96] NORM1: Normalization layer [27x27x256] CONV2: 256 5x5 filters at stride 1, pad 2 [13x13x256] MAX POOL2: 3x3 filters at stride 2 [13x13x256] NORM2: Normalization layer [13x13x384] CONV3: 384 3x3 filters at stride 1, pad 1 [13x13x384] CONV4: 384 3x3 filters at stride 1, pad 1 [13x13x256] CONV5: 256 3x3 filters at stride 1, pad 1 [6x6x256] MAX POOL3: 3x3 filters at stride 2 [4096] FC6: 4096 neurons [4096] FC7: 4096 neurons [1000] FC8: 1000 neurons (class scores) Compared to LeCun 1998: 1 DATA: - More data: 10^6 vs. 10^3 2 COMPUTE: - GPU (~100x speedup) 3 ALGORITHM: - Deeper: More layers (8 weight layers) - Fancy regularization (dropout) - Fancy non-linearity (ReLU)
  • 69. Case Study: AlexNet [Krizhevsky et al. 2012] Full (simplified) AlexNet architecture: [227x227x3] INPUT [55x55x96] CONV1: 96 11x11 filters at stride 4, pad 0 [27x27x96] MAX POOL1: 3x3 filters at stride 2 [27x27x96] NORM1: Normalization layer [27x27x256] CONV2: 256 5x5 filters at stride 1, pad 2 [13x13x256] MAX POOL2: 3x3 filters at stride 2 [13x13x256] NORM2: Normalization layer [13x13x384] CONV3: 384 3x3 filters at stride 1, pad 1 [13x13x384] CONV4: 384 3x3 filters at stride 1, pad 1 [13x13x256] CONV5: 256 3x3 filters at stride 1, pad 1 [6x6x256] MAX POOL3: 3x3 filters at stride 2 [4096] FC6: 4096 neurons [4096] FC7: 4096 neurons [1000] FC8: 1000 neurons (class scores) Details/Retrospectives: - first use of ReLU - used Norm layers (not common anymore) - heavy data augmentation - dropout 0.5 - batch size 128 - SGD Momentum 0.9 - Learning rate 1e-2, reduced by 10 manually when val accuracy plateaus - L2 weight decay 5e-4 - 7 CNN ensemble: 18.2% -> 15.4%
  • 70. Case Study: VGGNet [Simonyan and Zisserman, 2014] best model Only 3x3 CONV stride 1, pad 1 and 2x2 MAX POOL stride 2 11.2% top 5 error in ILSVRC 2013 -> 7.3% top 5 error
  • 71. Case Study: GoogLeNet [Szegedy et al., 2014] Inception module ILSVRC 2014 winner (6.7% top 5 error)
  • 72. Slide from Kaiming He’s recent presentation https://www.youtube.com/watch?v=1PGLj-uKT1w Case Study: ResNet [He et al., 2015] ILSVRC 2015 winner (3.6% top 5 error)
  • 73. Case Study: ResNet [He et al., 2015] 224x224x3 spatial dimension only 56x56!
  • 74. VGG: ~2-3 weeks training with 4 GPUs ResNet: ~2-3 weeks training with 4 GPUs ~$1K each
  • 76. Transfer Learning 1. Train on Imagenet 3. Medium dataset: finetuning more data = retrain more of the network (or all of it) 2. Small dataset: feature extractor Freeze these Train this Freeze these Train this
  • 77. Transfer Learning CNN Features off-the-shelf: an Astounding Baseline for Recognition [Razavian et al, 2014]
  • 78. Addressing other tasks... image CNN features 224x224x3 A block of compute with a few million parameters. 7x7x512
  • 79. Addressing other tasks... image CNN features 224x224x3 A block of compute with a few million parameters. 7x7x512 predicted thing desired thing
  • 80. Addressing other tasks... image CNN features 224x224x3 A block of compute with a few million parameters. 7x7x512 predicted thing desired thing this part changes from task to task
  • 81. Image Classification thing = a vector of probabilities for different classes image CNN features 224x224x3 7x7x512 e.g. vector of 1000 numbers giving probabilities for different classes. fully connected layer
  • 82. Image Captioning image CNN features 224x224x3 7x7x512 A sequence of 10,000-dimensional vectors giving probabilities of different words in the caption. RNN
  • 83. Localization image CNN features 224x224x3 7x7x512 fully connected layer Class probabilities (as before) 4 numbers: -  X coord -  Y coord -  Width -  Height
  • 84. Reinforcement Learning image CNN features 160x210x3 fully connected e.g. vector of 8 numbers giving probability of wanting to take any of the 8 possible ATARI actions. Mnih et al. 2015
  • 85. ConvNets are everywhere… e.g. Google Photos search Face Verification, Taigman et al. 2014 (FAIR) Self-driving cars[Goodfellow et al. 2014] Ciresan et al. 2013 Turaga et al 2010
  • 86. ConvNets are everywhere… Whale recognition, Kaggle Challenge Satellite image analysis Mnih and Hinton, 2010 Galaxy Challenge Dielman et al. 2015 WaveNet, van den Oord et al. 2016 Image captioning, Vinyals et al. 2015
  • 87. ConvNets are everywhere… AlphaGo, Silver et al 2016 VizDoom StarCraft
  • 88. ConvNets are everywhere… DeepDream reddit.com/r/deepdream NeuralStyle, Gatys et al. 2015 deepart.io, Prisma, etc.
  • 90. What hardware do I use? Buy your own machine: -  NVIDIA DIGITS DevBox (TITAN X GPUs) -  NVIDIA DGX-1 (P100 GPUs) Build your own machine: https://graphific.github.io/posts/building-a-deep-learning-dream-machine/ GPUs in the cloud: -  Amazon AWS (GRID K520 :( ) -  Microsoft Azure (soon); 4x K80 GPUs -  Cirrascale (“rent-a-box”)
  • 91. What framework do I use? Caffe Torch Theano LasagneKeras TensorFlow Mxnet chainer Nervana’s Neon Microsoft’s CNTK Deeplearning4j ...
  • 92. e.g. with keras.ioThe power is easily accessible.
  • 93. Q: How do I know what architecture/hyperparameters to use? A: don’t be a hero. 1.  Take whatever works best on ILSVRC (latest ResNet) 2.  Download a pretrained model 3.  Potentially add/delete some parts of it 4.  Finetune it on your application. 5.  Play with the regularization strength (dropout rates)
  • 94. MOOCS Machine Learning Coursera.org, Andrew Ng Neural Networks for Machine Learning Coursera.org, Geoffrey Hinton Computational Neuroscience Coursera.org, Rajesh P.N. Rao Deep Learning Udacity.org, Vincent Vanhoucke Convolutional Neural Networks Stanford.edu, Andrej Karapthy (in Skopje) Convolutional Neural Networks & DL for NLP @ edu.time.mk