[course site]

Verónica Vilaplana
veronica.vilaplana@upc.edu
Associate Professor
Universitat Politecnica de Catalunya
Technical University of Catalonia

Convolutional Neural Networks
Day 4 Lecture 1
#DLUPC
Index

•  Motivation:
   •  Local connectivity
   •  Parameter sharing
   •  Pooling and subsampling
•  Layers
   •  Convolutional
   •  Pooling
   •  Fully connected
   •  Activation functions
   •  Batch normalization
   •  Upsampling
•  Examples
Motivation
Neural networks for visual data

•  Example: image recognition
•  Given some input image, identify which object it contains

"sun flower" (Caltech101 dataset)
image size 150x112
neurons connected to 16800 inputs
Neural networks for visual data

•  We can design neural networks that are specifically adapted for such problems
   •  must deal with very high-dimensional inputs
      •  150 x 112 pixels = 16800 inputs, or 3 x 16800 for RGB pixels
   •  can exploit the 2D topology of pixels (or 3D for video data)
   •  can build in invariance to certain variations we can expect
      •  translations, illumination, etc.
•  Convolutional networks are a specialized kind of neural network for processing data that has a known, grid-like topology. They leverage these ideas:
   •  local connectivity
   •  parameter sharing
   •  pooling / subsampling of hidden units

S. Credit: H. Larochelle
Convolutional neural networks
Local connectivity

•  First idea: use a local connectivity of hidden units
   •  each hidden unit is connected only to a subregion (patch) of the input image: its receptive field
   •  it is connected to all channels
      •  1 for a greyscale image
      •  3 (R, G, B) for a color image
      •  …
•  Solves the following problems:
   •  a fully connected hidden layer would have an unmanageable number of parameters
   •  computing the linear activations of the hidden units would be very expensive

S. Credit: H. Larochelle
Convolutional neural networks
Parameter sharing

•  Second idea: share the matrix of parameters across certain units
   •  units organized into the same "feature map" share parameters
   •  hidden units within a feature map cover different positions in the image
•  Solves the following problems:
   •  reduces even more the number of parameters
   •  will extract the same features at every position (features are "equivariant")

Wij is the matrix connecting the ith input channel with the jth feature map

S. Credit: H. Larochelle
Convolutional neural networks
Parameter sharing

•  Each feature map forms a 2D grid of features
•  It can be computed with a discrete convolution of a kernel matrix kij, which is the hidden weights matrix Wij with its rows and columns flipped (see the sketch below)

Input image → Feature maps

yj = f ( Σi kij ∗ xi )

where xi is the ith channel of the input and yj is the jth feature map of the hidden layer
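To make the feature-map equation concrete, here is a minimal NumPy/SciPy sketch (the random 3-channel input and 3x3 kernels are purely illustrative assumptions; the slides do not prescribe any library):

```python
import numpy as np
from scipy.signal import convolve2d

def feature_map(x, k, f=np.tanh):
    """y_j = f(sum_i k_ij * x_i) for one output map j.
    x: list of 2D input channels x_i; k: list of 2D kernels k_ij."""
    acc = sum(convolve2d(xi, kij, mode='valid') for xi, kij in zip(x, k))
    return f(acc)

# toy example: 3-channel 7x7 input, 3x3 kernels (random, for illustration only)
rng = np.random.default_rng(0)
x = [rng.standard_normal((7, 7)) for _ in range(3)]
k = [rng.standard_normal((3, 3)) for _ in range(3)]
print(feature_map(x, k).shape)  # (5, 5): one 5x5 feature map
```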
  
Convolutional neural networks

•  Convolution as feature extraction: applying a filterbank
   •  but the filters are learned

Input → Feature map
Convolutional neural networks
Pooling and subsampling

•  Third idea: pool hidden units in the same neighborhood
   •  pooling is performed in non-overlapping neighborhoods (subsampling)
   •  an alternative to "max" pooling is "average" pooling
   •  pooling reduces dimensionality and provides invariance to small local changes

Max pooling:  yi(j, k) = max_{p,q} xi(j + p, k + q)
Convolutional Neural Networks

•  Convolutional neural networks alternate between convolutional layers (followed by a nonlinearity) and pooling layers (basic architecture)
•  For recognition: the output layer is a regular, fully connected layer with softmax nonlinearity
   •  the output provides an estimate of the conditional probability of each class
•  The network is trained by stochastic gradient descent (& variants)
   •  backpropagation is used similarly as in a fully connected network, to train the weights of the filters
Convolutional Neural Networks

CNN = learning hierarchical representations with increasing levels of abstraction

End-to-end training: joint optimization of features and classifier

Fig. Credit: DLBook
Example: LeNet-5

•  LeCun et al., 1998

MNIST digit classification problem
•  handwritten digits
•  60,000 training examples
•  10,000 test samples
•  10 classes
•  28x28 grayscale images

Conv filters were 5x5, applied at stride 1
Sigmoid or tanh nonlinearity
Subsampling (average pooling) layers were 2x2, applied at stride 2
Fully connected layers at the end
i.e. the architecture is [CONV-POOL-CONV-POOL-FC-FC] (see the sketch below)

Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, Gradient-based learning applied to document recognition, 1998.
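A minimal PyTorch sketch of a LeNet-style [CONV-POOL-CONV-POOL-FC-FC] network for 28x28 inputs; the layer widths are illustrative assumptions, not the exact original LeNet-5 configuration:

```python
import torch
import torch.nn as nn

class LeNetStyle(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5, stride=1, padding=2),   # -> 28x28x6
            nn.Tanh(),
            nn.AvgPool2d(kernel_size=2, stride=2),                 # -> 14x14x6
            nn.Conv2d(6, 16, kernel_size=5, stride=1),             # -> 10x10x16
            nn.Tanh(),
            nn.AvgPool2d(kernel_size=2, stride=2),                 # -> 5x5x16
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(16 * 5 * 5, 120), nn.Tanh(),
            nn.Linear(120, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

print(LeNetStyle()(torch.zeros(1, 1, 28, 28)).shape)  # torch.Size([1, 10])
```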
  
Layers
Convolutional Neural Networks

A regular 3-layer neural network vs. a ConvNet with 3 layers

•  In ConvNets the inputs are 'images' (the architecture is constrained)
•  A ConvNet arranges its neurons in three dimensions (width, height, depth)
•  Every layer transforms a 3D input volume to a 3D output volume of neuron activations

input layer → hidden layer 1 → hidden layer 2 → output layer
Convolutional layer

•  Convolution on a 2D grid

Image source
Convolutional layer

•  Convolution on a volume

32x32x3 input (32 width, 32 height, 3 depth)
5x5x3 filter

Filters always extend the full depth of the input volume.
Convolve the filter with the input, i.e. slide it over the input spatially, computing dot products.
Convolutional layer

•  Convolution on a volume

32x32x3 input, 5x5x3 filter w
Convolve (slide) over all spatial locations → an activation map (or feature map) of size 28x28x1

Each number is the result of the dot product between the filter and a small 5x5x3 patch of the input: a 5x5x3 = 75-dimensional dot product plus a bias,  wᵀx + b
Convolutional layer

32x32x3 input, consider a second 5x5x3 filter
Convolve (slide) over all spatial locations → a second 28x28x1 activation map
Convolutional layer

If we have 6 filters of size 5x5x3, we get 6 separate activation maps.
We stack the maps up to get a new volume of size 28x28x6.

So applying a filterbank to an input (3D matrix) yields a cube-like output, a 3D matrix in which each slice is the output of the convolution with one filter.

32x32x3 input → 28x28x6 activation maps
Convolutional layer

A ConvNet is a sequence of convolutional layers, interspersed with activation functions and pooling layers (and a small number of fully connected layers).

We add more layers of filters. We apply filters (convolutions) to the output volume of the previous layer. The result of each convolution is a slice in the new volume (see the shape check below).

32x32x3 → [CONV, ReLU: 6 filters 5x5x3] → 28x28x6 → [CONV, ReLU: 10 filters 5x5x6] → 24x24x10 → [CONV, ReLU] → …

S. Credit: Stanford cs231_2017
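A minimal sketch (PyTorch, shapes only) of the 32x32x3 → 28x28x6 → 24x24x10 sequence above; the filter counts match the slide, everything else is boilerplate:

```python
import torch
import torch.nn as nn

layers = nn.Sequential(
    nn.Conv2d(3, 6, kernel_size=5),   # 6 filters 5x5x3  -> 28x28x6
    nn.ReLU(),
    nn.Conv2d(6, 10, kernel_size=5),  # 10 filters 5x5x6 -> 24x24x10
    nn.ReLU(),
)
x = torch.zeros(1, 3, 32, 32)         # NCHW: one 32x32x3 image
for layer in layers:
    x = layer(x)
    print(type(layer).__name__, tuple(x.shape))
# Conv2d (1, 6, 28, 28) / ReLU (1, 6, 28, 28) / Conv2d (1, 10, 24, 24) / ReLU (1, 10, 24, 24)
```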
  
Example: filters and activation maps

Example CNN trained for image recognition on the CIFAR dataset.

The network learns filters that activate when they see some specific type of feature at some spatial position in the input. Stacking the activation maps for all filters along the depth dimension forms the full output volume.

http://cs.stanford.edu/people/karpathy/convnetjs/demo/cifar10.html
Convolutional layer

Hyperparameters: number of filters, filter spatial extent F, stride S, padding P

Stride is the number of pixels by which we slide the kernel over the input matrix. A larger stride produces smaller feature maps.

stride 1: 7x7 input (spatially), a 3x3 filter → 5x5 output
Convolutional layer

Hyperparameters: number of filters, filter spatial extent F, stride S, padding P

stride 2: 7x7 input (spatially), a 3x3 filter → 3x3 output
Convolutional layer

Hyperparameters: number of filters, filter spatial extent F, stride S, padding P

No padding (P=0)
Output size: (N - F) / S + 1
e.g. N=7, F=3
  stride 1: (7-3)/1 + 1 = 5
  stride 2: (7-3)/2 + 1 = 3
  stride 3: (7-3)/3 + 1 = 2.33  → not applied

Zero-padding in the border: pad the input volume with zeros around the border so that the input and output width and height are the same.

Padding P=1
Output size: (N - F + 2P) / S + 1
e.g. N=7, F=3, S=1, pad with a 1-pixel border: output size 7x7

In general, CONV layers use stride 1, filters FxF, and zero-padding with P = (F-1)/2 to preserve size spatially (see the helper below).
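A small helper (plain Python, purely illustrative) that evaluates the output-size formula above and reproduces the N=7, F=3 examples:

```python
def conv_out_size(n, f, s=1, p=0):
    """Spatial output size of a convolution: (N - F + 2P) / S + 1.
    Returns None when the filter does not fit evenly (stride not applicable)."""
    out = (n - f + 2 * p) / s + 1
    return int(out) if out.is_integer() else None

print(conv_out_size(7, 3, s=1))        # 5
print(conv_out_size(7, 3, s=2))        # 3
print(conv_out_size(7, 3, s=3))        # None  (2.33 -> not applied)
print(conv_out_size(7, 3, s=1, p=1))   # 7  (padding preserves size)
```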
1x1 convolutions

1x1 convolution layers are used to reduce dimensionality (the number of feature maps).

Example: a 1x1 conv with 64 filters maps a 32x32x128 volume to 32x32x64; each filter has size 1x1x128 and performs a 128-dimensional dot product.
Example: size, parameters

Input volume: 32x32x3
10 5x5 filters with stride 1, padding 2

Output volume size:
(N + 2P – F) / S + 1 = (32 + 2*2 - 5)/1 + 1 = 32 spatially, so 32x32x10

Number of parameters in this layer:
each filter has 5x5x3 + 1 = 76 params (+1 for the bias)
→ 76 x 10 = 760 parameters

S. Credit: Stanford cs231_2017
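The same counts can be checked mechanically; a quick PyTorch sketch of this exact layer (10 filters of 5x5x3, stride 1, padding 2), the framework being an illustrative choice:

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(in_channels=3, out_channels=10, kernel_size=5, stride=1, padding=2)
x = torch.zeros(1, 3, 32, 32)
print(conv(x).shape)                              # torch.Size([1, 10, 32, 32])
print(sum(p.numel() for p in conv.parameters()))  # 760 = (5*5*3 + 1) * 10
```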
  
Summary: conv layer

To summarize, the conv layer:
•  Accepts a volume of size W1 x H1 x D1
•  Requires four hyperparameters:
   •  number of filters K
   •  kernel size F
   •  stride S
   •  amount of zero padding P
•  Produces a volume of size W2 x H2 x D2
   •  W2 = (W1 – F + 2P) / S + 1
   •  H2 = (H1 – F + 2P) / S + 1
   •  D2 = K
•  With parameter sharing, it introduces F·F·D1 weights per filter, for a total of (F·F·D1)·K weights and K biases
•  In the output volume, the d-th depth slice (of size W2xH2) is the result of performing a valid convolution of the d-th filter over the input volume with a stride of S, and then offset by the d-th bias

Common settings:
K = powers of 2 (32, 64, 128, 256)
F=3, S=1, P=1
F=5, S=1, P=2
F=5, S=2, P=? (whatever fits)
F=1, S=1, P=0

S. Credit: Stanford cs231_2017
Activation functions

•  Desirable properties: mostly smooth, continuous, differentiable, fairly linear

Sigmoid:     σ(x) = 1 / (1 + e^(−x))
tanh:        tanh(x) = (e^x − e^(−x)) / (e^x + e^(−x))
ReLU:        max(0, x)
Leaky ReLU:  max(0.1x, x)
Maxout:      max(w1ᵀx + b1, w2ᵀx + b2)
ELU
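A NumPy sketch of the element-wise activations listed above; the 0.1 leaky slope follows the slide, while the ELU α=1 default is an assumption:

```python
import numpy as np

sigmoid    = lambda x: 1.0 / (1.0 + np.exp(-x))
tanh       = np.tanh
relu       = lambda x: np.maximum(0.0, x)
leaky_relu = lambda x: np.maximum(0.1 * x, x)
elu        = lambda x, a=1.0: np.where(x > 0, x, a * (np.exp(x) - 1.0))

x = np.linspace(-2, 2, 5)
for name, f in [("sigmoid", sigmoid), ("tanh", tanh), ("relu", relu),
                ("leaky_relu", leaky_relu), ("elu", elu)]:
    print(name, np.round(f(x), 3))
```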
Activation functions

•  Example
Pooling layer

•  Makes the representations smaller and more manageable for later layers
•  Useful to get invariance to small local changes
•  Operates over each activation map independently

pooling / downsampling: 180x120x64 → 90x60x64
Pooling layer

•  Max pooling
•  Other pooling functions: average pooling

Example: max pool with a 2x2 filter and stride 2, on a single depth slice:

4 1 5 2 2 6          4 9 6
1 2 9 0 2 4    →     3 6 3
2 2 6 4 0 2
3 1 0 3 3 1
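A NumPy sketch of 2x2, stride-2 max pooling that reproduces the example above:

```python
import numpy as np

def max_pool2d(x, f=2, s=2):
    """Max pooling over windows of one depth slice (non-overlapping when f == s)."""
    h, w = x.shape
    out = np.empty(((h - f) // s + 1, (w - f) // s + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = x[i*s:i*s+f, j*s:j*s+f].max()
    return out

x = np.array([[4, 1, 5, 2, 2, 6],
              [1, 2, 9, 0, 2, 4],
              [2, 2, 6, 4, 0, 2],
              [3, 1, 0, 3, 3, 1]])
print(max_pool2d(x))   # [[4. 9. 6.] [3. 6. 3.]]
```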
  
Pooling layer

•  Example
Summary: pooling layer

To summarize, the pooling layer:
•  Accepts a volume of size W1xH1xD1
•  Requires two hyperparameters:
   •  spatial extent F
   •  stride S
•  Produces a volume of size W2xH2xD2
   •  W2 = (W1 - F) / S + 1
   •  H2 = (H1 - F) / S + 1
   •  D2 = D1
•  Introduces zero parameters since it computes a fixed function of the input
•  Common settings: F=2, S=2  or  F=3, S=2

Pros:
-  reduces the number of inputs to the next layer, allowing us to have more feature maps
-  invariant to small translations of the input
Cons:
-  after several layers of pooling we have lost information about the precise position of things
Fully connected layer

•  In the end it is common to add one or more fully (or densely) connected layers.
•  Every neuron in the previous layer is connected to every neuron in the next layer (as in regular neural networks). The activation is computed as a matrix multiplication plus a bias.
•  At the output, a softmax activation is used for classification.

The output of the last convolutional layer is flattened to a single vector, which is input to a fully connected layer (connections and weights not shown here; 4 possible outputs).
Fully connected layers and convolutional layers

•  A convolutional layer can be implemented as a fully connected layer
•  The weight matrix is a large matrix that is mostly zero except for certain blocks (due to local connectivity)

Y = I ∗ h    ⇔    Yv = C · Iv

I: input image 4x4, vectorized to 16x1 (Iv)
Yv: output image 4x1 (later reshaped to 2x2)
h: 3x3 kernel;  C: 4x16 weight matrix (so that Yv = C · Iv)

Input I (4x4):            Kernel h (3x3):      Output Y (2x2):
x0  x1  x2  x3            w00 w01 w02          y0  y1
x4  x5  x6  x7      ∗     w10 w11 w12     =    y2  y3
x8  x9  x10 x11           w20 w21 w22
x12 x13 x14 x15
Fully connected layers and convolutional layers

Y = I ∗ h    ⇔    Yv = C · Iv

(Same example: I is the 4x4 input image vectorized to 16x1 (Iv), Yv is the 4x1 output later reshaped to 2x2, h is the 3x3 kernel and C the corresponding sparse weight matrix.)
Fully connected layers and convolutional layers

•  Fully connected layers can also be viewed as convolutions with kernels that cover the entire input region

Example:
A fully connected layer with K=1024 neurons and an input volume of 32x32x512 can be expressed as a convolutional layer with
  K=1024 filters, F=32 (size of kernel), P=0, S=1
The filter size is exactly the size of the input volume; the output is 1x1x1024 (see the check below).

input 32x32x512, filter 32x32x512 (with 1024 filters) → output 1x1x1024
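A quick PyTorch shape check of this equivalence; the sizes here are scaled down from the slide's 32x32x512 / K=1024 example just to keep the sketch light:

```python
import torch
import torch.nn as nn

x = torch.zeros(1, 32, 8, 8)                       # a small 8x8x32 input volume
as_conv = nn.Conv2d(32, 16, kernel_size=8, stride=1, padding=0)   # F = input size
as_fc = nn.Linear(32 * 8 * 8, 16)                  # the equivalent dense layer

print(as_conv(x).shape)                            # torch.Size([1, 16, 1, 1])
print(as_fc(x.flatten(1)).shape)                   # torch.Size([1, 16])
print(as_conv.weight.numel() == as_fc.weight.numel())   # True: same number of weights
```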
  
Batch normalization layer

•  As learning progresses, the distribution of the layer inputs changes due to parameter updates (internal covariate shift)
•  This can result in most inputs being in the non-linear regime of the activation function, slowing down learning

•  Batch normalization is a technique to reduce this effect
•  Explicitly force the layer activations to have zero mean and unit variance w.r.t. running batch estimates

    x̂(k) = ( x(k) − E[x(k)] ) / sqrt( Var[x(k)] )

•  Adds a learnable scale and bias term to allow the network to still use the nonlinearity

    y(k) = γ(k) x̂(k) + β(k)

Typical placement: FC / Conv → Batch norm → ReLU → FC / Conv → Batch norm → ReLU

Ioffe and Szegedy, 2015. "Batch normalization: accelerating deep network training by reducing internal covariate shift"
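A NumPy sketch of the batch-norm transform for one feature over a mini-batch (the small epsilon for numerical stability is an assumed detail, as in the original paper):

```python
import numpy as np

def batch_norm(x, gamma=1.0, beta=0.0, eps=1e-5):
    """x: activations of one feature over a mini-batch (1D array)."""
    x_hat = (x - x.mean()) / np.sqrt(x.var() + eps)   # zero mean, unit variance
    return gamma * x_hat + beta                       # learnable scale and shift

x = np.array([0.5, 2.0, -1.0, 3.5])
y = batch_norm(x, gamma=2.0, beta=0.1)
print(np.round(y, 3), round(y.mean(), 3))             # mean of y ~= beta
```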
Upsampling layers: recovering spatial shape

•  Motivation: semantic segmentation. Make predictions for all pixels at once.

Problem: convolutions at the original image resolution will be very expensive

S. Credit: Stanford cs231_2017
Upsampling layers: recovering spatial shape

•  Motivation: semantic segmentation. Make predictions for all pixels at once.

Design a network as a sequence of convolutional layers, with downsampling and upsampling.

Other applications: super-resolution, flow estimation, generative modeling

S. Credit: Stanford cs231_2017
Learnable upsampling

•  Recall: 3x3 convolution, stride 1, pad 1

S. Credit: Stanford cs231_2017
Learnable upsampling

•  Recall: 3x3 convolution, stride 2, pad 1

The filter moves 2 pixels in the input for every one pixel in the output.
The stride gives the ratio between movement in the input and the output.

S. Credit: Stanford cs231_2017
Learnable upsampling: transposed convolution

•  3x3 transposed convolution, stride 2, pad 1

The filter moves 2 pixels in the output for every one pixel in the input; we sum where the outputs overlap.
The stride gives the ratio between movement in the input and the output.

Various names:
- transposed convolution
- backward strided convolution
- fractionally strided convolution
- upconvolution
- "deconvolution"

S. Credit: Stanford cs231_2017
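A PyTorch shape sketch of a stride-2 transposed convolution used for 2x upsampling; the output_padding=1 argument is an assumption made here so the output is exactly double the input size:

```python
import torch
import torch.nn as nn

up = nn.ConvTranspose2d(in_channels=64, out_channels=32, kernel_size=3,
                        stride=2, padding=1, output_padding=1)
x = torch.zeros(1, 64, 16, 16)
print(up(x).shape)   # torch.Size([1, 32, 32, 32]): spatial size doubled
```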
  	
  
Some architectures for visual recognition
ImageNet: ILSVRC

Large Scale Visual Recognition Challenge
www.image-net.org/challenges/LSVRC/

Image classification
1000 object classes (categories)
Images:
-  1.2 M train
-  100,000 test
Metric: top-5 error rate (predict 5 classes)
ILSVRC image classification winners
AlexNet (2012)

•  Similar framework to LeNet:
   •  8 layers (5 convolutional, 3 fully connected)
   •  Max pooling, ReLU nonlinearities
   •  650,000 units, 60 million parameters
   •  trained on two GPUs (half of the kernels on each GPU) for a week
   •  data augmentation
   •  dropout regularization

A. Krizhevsky, I. Sutskever, and G. Hinton, ImageNet Classification with Deep Convolutional Neural Networks, NIPS 2012
AlexNet (2012)

Full AlexNet architecture: 8 layers

[227x227x3] INPUT
[55x55x96] CONV1: 96 11x11 filters at stride 4, pad 0
[27x27x96] MAX POOL1: 3x3 filters at stride 2
[27x27x96] NORM1: normalization layer
[27x27x256] CONV2: 256 5x5 filters at stride 1, pad 2
[13x13x256] MAX POOL2: 3x3 filters at stride 2
[13x13x256] NORM2: normalization layer
[13x13x384] CONV3: 384 3x3 filters at stride 1, pad 1
[13x13x384] CONV4: 384 3x3 filters at stride 1, pad 1
[13x13x256] CONV5: 256 3x3 filters at stride 1, pad 1
[6x6x256] MAX POOL3: 3x3 filters at stride 2
[4096] FC6: 4096 neurons
[4096] FC7: 4096 neurons
[1000] FC8: 1000 neurons (class scores)

Details / retrospectives:
- first use of ReLU
- used Norm layers (not common)
- heavy data augmentation
- dropout 0.5
- batch size 128
- SGD momentum 0.9
- learning rate 1e-2, reduced by 10 manually when val accuracy plateaus
- L2 weight decay 5e-4

ILSVRC 2012 winner
7-CNN ensemble: 18.2% → 15.4%
AlexNet (2012)

•  Visualization of the 96 11x11 filters learned by the first layer

A. Krizhevsky, I. Sutskever, and G. Hinton, ImageNet Classification with Deep Convolutional Neural Networks, NIPS 2012
VGGNet-16 (2014)

Visual Geometry Group, Univ. Oxford
Sequence of deeper nets trained progressively
Large receptive fields replaced by 3x3 conv

Only:
3x3 CONV stride 1, pad 1, and
2x2 MAX POOL stride 2
16-19 layers

Shows that depth is a critical component for good performance

TOTAL memory: 24M * 4 bytes ~= 93MB / image (only forward! ~*2 for backward)
most memory is in the early CONV layers

TOTAL params: 138M parameters
most parameters are in the late FC layers

K. Simonyan and A. Zisserman, Very Deep Convolutional Networks for Large-Scale Image Recognition, ICLR 2015
GoogLeNet (2014)

Motivation:
•  The most straightforward way of improving the performance of deep neural networks is by increasing their size, both depth and width
•  Increasing the network size has two drawbacks:
   •  a larger number of parameters → prone to overfitting
   •  dramatically increased use of computational resources
•  Goal: increase the depth and width while keeping the computational budget constant

Compared to AlexNet:
22 layers
12x fewer parameters (5M vs 60M)
2x more compute
6.67% top-5 error (vs 16.4%)

C. Szegedy et al., Going deeper with convolutions, CVPR 2015
GoogLeNet (2014)

The Inception module
•  Apply parallel operations on the input from the previous layer:
   •  multiple kernel sizes for convolution (1x1, 3x3, 5x5)
   •  pooling operation
•  Concatenate all filter outputs together depth-wise
•  Use 1x1 convolutions for dimensionality reduction before the expensive convolutions (see the back-of-the-envelope sketch below)

Conv ops:
1x1 conv, 128: 28x28x128x1x1x256
1x1 conv, 64:  28x28x64x1x1x256
3x3 conv, 192: 28x28x192x3x3x64
1x1 conv, 64:  28x28x64x1x1x256
5x5 conv, 96:  28x28x96x5x5x64
1x1 conv, 64:  28x28x64x1x1x256
Total: 358M ops

Without the 1x1 convolutions:
Total: 854M ops
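To see why the 1x1 bottlenecks help, here is a quick sketch for the 5x5 branch alone, counting multiplies as out_H x out_W x #filters x kernel_H x kernel_W x input_depth (the same pattern as the per-branch figures on the slide):

```python
H = W = 28                                           # spatial size of the module's maps
naive_5x5      = H * W * 96 * 5 * 5 * 256            # 5x5 conv, 96 filters, on 256 maps
reduce_1x1     = H * W * 64 * 1 * 1 * 256            # 1x1 conv, 64 filters (bottleneck)
bottleneck_5x5 = H * W * 96 * 5 * 5 * 64             # 5x5 conv, 96 filters, on 64 maps

print(f"naive 5x5 branch:      {naive_5x5 / 1e6:.0f}M multiplies")            # ~482M
print(f"with 1x1 bottleneck:   {(reduce_1x1 + bottleneck_5x5) / 1e6:.0f}M")   # ~133M
```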
  
GoogLeNet (2014)

Architecture:
•  A stem network
•  Stacked inception modules

(legend: convolution, pooling, other)
GoogLeNet (2014)

•  Auxiliary classifiers
   •  features produced by the layers in the middle of the network should be very discriminative
   •  auxiliary classifiers connected to these intermediate layers were expected to encourage discrimination in the lower stages of the classifier
   •  during training, their loss gets added to the total loss of the network with a discount weight (the losses of the auxiliary classifiers are weighted by 0.3)
   •  at inference time, the auxiliary classifiers are discarded

...and no fully connected layers needed!

(legend: auxiliary classifier, convolution, pooling, softmax)
ResNet (2015)

Motivation
•  Stacking more layers does not mean better performance
   •  with increasing network depth, accuracy gets saturated and then degrades rapidly
   •  such degradation is not caused by overfitting

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, Deep Residual Learning for Image Recognition, CVPR 2016 (Best Paper)
ResNet (2015)

Residual block
•  Hypothesis: the problem is an optimization problem; deeper models are harder to optimize
   •  the deeper model should be able to perform at least as well as the shallower model
   •  a solution by construction: the added layers are identity mappings, the other layers are copied from the learned shallower model
•  Solution: use network layers to fit a residual mapping instead of directly trying to fit the desired underlying mapping

Plain net vs. Residual net

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, Deep Residual Learning for Image Recognition, CVPR 2016 (Best Paper)
ResNet (2015)

•  Similar to GoogLeNet, use a bottleneck layer to improve efficiency

•  Directly performing 3x3 convolutions with 256 feature maps at input and output:
   256 x 256 x 3 x 3 ~ 600K operations
•  Using 1x1 convolutions to reduce 256 to 64 feature maps, followed by 3x3 convolutions, followed by 1x1 convolutions to expand back to 256 maps:
   256 x 64 x 1 x 1 ~ 16K
   64 x 64 x 3 x 3 ~ 36K
   64 x 256 x 1 x 1 ~ 16K
   Total: ~70K

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, Deep Residual Learning for Image Recognition, CVPR 2016 (Best Paper)
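A hedged PyTorch sketch of a residual bottleneck block in this 256 → 64 → 64 → 256 configuration (identity shortcut; the batch-norm placement and bias settings follow common ResNet implementations and are assumptions, not taken from the slide):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Bottleneck(nn.Module):
    """256 -> 64 (1x1) -> 64 (3x3) -> 256 (1x1), plus identity shortcut."""
    def __init__(self, channels=256, mid=64):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, mid, kernel_size=1, bias=False)
        self.bn1 = nn.BatchNorm2d(mid)
        self.conv2 = nn.Conv2d(mid, mid, kernel_size=3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(mid)
        self.conv3 = nn.Conv2d(mid, channels, kernel_size=1, bias=False)
        self.bn3 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = F.relu(self.bn2(self.conv2(out)))
        out = self.bn3(self.conv3(out))
        return F.relu(out + x)          # residual: add the identity shortcut

block = Bottleneck()
print(block(torch.zeros(1, 256, 8, 8)).shape)   # torch.Size([1, 256, 8, 8])
```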
  
ResNet (2015)

ILSVRC 2015 winner (3.6% top-5 error)

MSRA: ILSVRC & COCO 2015 competitions
-  ImageNet Classification: "ultra deep", 152 layers
-  ImageNet Detection: 16% better than 2nd
-  ImageNet Localization: 27% better than 2nd
-  COCO Detection: 11% better than 2nd
-  COCO Segmentation: 12% better than 2nd

2-3 weeks of training on an 8-GPU machine
At run time: faster than a VGGNet! (even though it has 8x more layers)
ILSVRC 2012-2015 summary

Team                                        Year   Place   Error (top-5)   External data
SuperVision – Toronto (AlexNet, 7 layers)   2012   -       16.4%           no
SuperVision                                 2012   1st     15.3%           ImageNet 22k
Clarifai – NYU (7 layers)                   2013   -       11.7%           no
Clarifai                                    2013   1st     11.2%           ImageNet 22k
VGG – Oxford (16 layers)                    2014   2nd     7.32%           no
GoogLeNet (22 layers)                       2014   1st     6.67%           no
ResNet (152 layers)                         2015   1st     3.57%
Human expert*                                              5.1%

http://karpathy.github.io/2014/09/02/what-i-learned-from-competing-against-a-convnet-on-imagenet/
Summary

•  Convolutional neural networks are a specialized kind of neural network for processing data that has a known, grid-like topology
•  CNNs leverage these ideas:
   •  local connectivity
   •  parameter sharing
   •  pooling / subsampling of hidden units
•  Layers: convolutional, non-linear activation, pooling, upsampling, batch normalization
•  Architectures for object recognition in images
   •  LeNet: pioneer net for digit recognition
   •  AlexNet: smaller compute, still memory heavy, lower accuracy
   •  VGG: highest memory, most operations
   •  GoogLeNet: most efficient
   •  ResNet: moderate efficiency depending on model, better accuracy
   •  Inception-v4: hybrid of ResNet and Inception, highest accuracy
Más contenido relacionado

La actualidad más candente

La actualidad más candente (20)

The Perceptron (DLAI D1L2 2017 UPC Deep Learning for Artificial Intelligence)
The Perceptron (DLAI D1L2 2017 UPC Deep Learning for Artificial Intelligence)The Perceptron (DLAI D1L2 2017 UPC Deep Learning for Artificial Intelligence)
The Perceptron (DLAI D1L2 2017 UPC Deep Learning for Artificial Intelligence)
 
Deep Neural Networks (D1L2 Insight@DCU Machine Learning Workshop 2017)
Deep Neural Networks (D1L2 Insight@DCU Machine Learning Workshop 2017)Deep Neural Networks (D1L2 Insight@DCU Machine Learning Workshop 2017)
Deep Neural Networks (D1L2 Insight@DCU Machine Learning Workshop 2017)
 
Optimization for Neural Network Training - Veronica Vilaplana - UPC Barcelona...
Optimization for Neural Network Training - Veronica Vilaplana - UPC Barcelona...Optimization for Neural Network Training - Veronica Vilaplana - UPC Barcelona...
Optimization for Neural Network Training - Veronica Vilaplana - UPC Barcelona...
 
Transfer Learning and Domain Adaptation (DLAI D5L2 2017 UPC Deep Learning for...
Transfer Learning and Domain Adaptation (DLAI D5L2 2017 UPC Deep Learning for...Transfer Learning and Domain Adaptation (DLAI D5L2 2017 UPC Deep Learning for...
Transfer Learning and Domain Adaptation (DLAI D5L2 2017 UPC Deep Learning for...
 
Convolutional Neural Networks (D1L3 2017 UPC Deep Learning for Computer Vision)
Convolutional Neural Networks (D1L3 2017 UPC Deep Learning for Computer Vision)Convolutional Neural Networks (D1L3 2017 UPC Deep Learning for Computer Vision)
Convolutional Neural Networks (D1L3 2017 UPC Deep Learning for Computer Vision)
 
Attention Models (D3L6 2017 UPC Deep Learning for Computer Vision)
Attention Models (D3L6 2017 UPC Deep Learning for Computer Vision)Attention Models (D3L6 2017 UPC Deep Learning for Computer Vision)
Attention Models (D3L6 2017 UPC Deep Learning for Computer Vision)
 
Loss Functions for Deep Learning - Javier Ruiz Hidalgo - UPC Barcelona 2018
Loss Functions for Deep Learning - Javier Ruiz Hidalgo - UPC Barcelona 2018Loss Functions for Deep Learning - Javier Ruiz Hidalgo - UPC Barcelona 2018
Loss Functions for Deep Learning - Javier Ruiz Hidalgo - UPC Barcelona 2018
 
Modern Convolutional Neural Network techniques for image segmentation
Modern Convolutional Neural Network techniques for image segmentationModern Convolutional Neural Network techniques for image segmentation
Modern Convolutional Neural Network techniques for image segmentation
 
Skip RNN: Learning to Skip State Updates in Recurrent Neural Networks
Skip RNN: Learning to Skip State Updates in Recurrent Neural NetworksSkip RNN: Learning to Skip State Updates in Recurrent Neural Networks
Skip RNN: Learning to Skip State Updates in Recurrent Neural Networks
 
Recurrent Neural Networks (DLAI D7L1 2017 UPC Deep Learning for Artificial In...
Recurrent Neural Networks (DLAI D7L1 2017 UPC Deep Learning for Artificial In...Recurrent Neural Networks (DLAI D7L1 2017 UPC Deep Learning for Artificial In...
Recurrent Neural Networks (DLAI D7L1 2017 UPC Deep Learning for Artificial In...
 
Deep Learning without Annotations - Xavier Giro - UPC Barcelona 2018
Deep Learning without Annotations - Xavier Giro - UPC Barcelona 2018Deep Learning without Annotations - Xavier Giro - UPC Barcelona 2018
Deep Learning without Annotations - Xavier Giro - UPC Barcelona 2018
 
The Perceptron (D1L1 Insight@DCU Machine Learning Workshop 2017)
The Perceptron (D1L1 Insight@DCU Machine Learning Workshop 2017)The Perceptron (D1L1 Insight@DCU Machine Learning Workshop 2017)
The Perceptron (D1L1 Insight@DCU Machine Learning Workshop 2017)
 
Perceptrons (D1L2 2017 UPC Deep Learning for Computer Vision)
Perceptrons (D1L2 2017 UPC Deep Learning for Computer Vision)Perceptrons (D1L2 2017 UPC Deep Learning for Computer Vision)
Perceptrons (D1L2 2017 UPC Deep Learning for Computer Vision)
 
Generative Adversarial Networks GAN - Santiago Pascual - UPC Barcelona 2018
Generative Adversarial Networks GAN - Santiago Pascual - UPC Barcelona 2018Generative Adversarial Networks GAN - Santiago Pascual - UPC Barcelona 2018
Generative Adversarial Networks GAN - Santiago Pascual - UPC Barcelona 2018
 
Image Segmentation (D3L1 2017 UPC Deep Learning for Computer Vision)
Image Segmentation (D3L1 2017 UPC Deep Learning for Computer Vision)Image Segmentation (D3L1 2017 UPC Deep Learning for Computer Vision)
Image Segmentation (D3L1 2017 UPC Deep Learning for Computer Vision)
 
Unsupervised Learning (D2L6 2017 UPC Deep Learning for Computer Vision)
Unsupervised Learning (D2L6 2017 UPC Deep Learning for Computer Vision)Unsupervised Learning (D2L6 2017 UPC Deep Learning for Computer Vision)
Unsupervised Learning (D2L6 2017 UPC Deep Learning for Computer Vision)
 
Lecture 7: Recurrent Neural Networks
Lecture 7: Recurrent Neural NetworksLecture 7: Recurrent Neural Networks
Lecture 7: Recurrent Neural Networks
 
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020
Convolutional Neural Networks - Xavier Giro - UPC TelecomBCN Barcelona 2020
 
Tutorial on convolutional neural networks
Tutorial on convolutional neural networksTutorial on convolutional neural networks
Tutorial on convolutional neural networks
 
Reinforcement Learning (Reloaded) - Xavier Giró-i-Nieto - UPC Barcelona 2018
Reinforcement Learning (Reloaded) - Xavier Giró-i-Nieto - UPC Barcelona 2018Reinforcement Learning (Reloaded) - Xavier Giró-i-Nieto - UPC Barcelona 2018
Reinforcement Learning (Reloaded) - Xavier Giró-i-Nieto - UPC Barcelona 2018
 

Similar a Convolutional Neural Networks - Veronica Vilaplana - UPC Barcelona 2018

intro-to-cnn-April_2020.pptx
intro-to-cnn-April_2020.pptxintro-to-cnn-April_2020.pptx
intro-to-cnn-April_2020.pptx
ssuser3aa461
 
Convolutional Neural Networks
Convolutional Neural NetworksConvolutional Neural Networks
Convolutional Neural Networks
milad abbasi
 
Deep Learning
Deep LearningDeep Learning
Deep Learning
Pierre de Lacaze
 

Similar a Convolutional Neural Networks - Veronica Vilaplana - UPC Barcelona 2018 (20)

intro-to-cnn-April_2020.pptx
intro-to-cnn-April_2020.pptxintro-to-cnn-April_2020.pptx
intro-to-cnn-April_2020.pptx
 
Cnn
CnnCnn
Cnn
 
Convolutional Neural Networks
Convolutional Neural NetworksConvolutional Neural Networks
Convolutional Neural Networks
 
convolutional_neural_networks in deep learning
convolutional_neural_networks in deep learningconvolutional_neural_networks in deep learning
convolutional_neural_networks in deep learning
 
build a Convolutional Neural Network (CNN) using TensorFlow in Python
build a Convolutional Neural Network (CNN) using TensorFlow in Pythonbuild a Convolutional Neural Network (CNN) using TensorFlow in Python
build a Convolutional Neural Network (CNN) using TensorFlow in Python
 
Deep Learning
Deep LearningDeep Learning
Deep Learning
 
Deep learning (2)
Deep learning (2)Deep learning (2)
Deep learning (2)
 
Convolutional neural networks
Convolutional neural networksConvolutional neural networks
Convolutional neural networks
 
B.tech_project_ppt.pptx
B.tech_project_ppt.pptxB.tech_project_ppt.pptx
B.tech_project_ppt.pptx
 
Hardware Acceleration for Machine Learning
Hardware Acceleration for Machine LearningHardware Acceleration for Machine Learning
Hardware Acceleration for Machine Learning
 
Convolutional neural network
Convolutional neural networkConvolutional neural network
Convolutional neural network
 
Convolutional Neural Networks (CNN)
Convolutional Neural Networks (CNN)Convolutional Neural Networks (CNN)
Convolutional Neural Networks (CNN)
 
CNN_AH.pptx
CNN_AH.pptxCNN_AH.pptx
CNN_AH.pptx
 
CNN_AH.pptx
CNN_AH.pptxCNN_AH.pptx
CNN_AH.pptx
 
CNN.pptx
CNN.pptxCNN.pptx
CNN.pptx
 
A brief introduction to recent segmentation methods
A brief introduction to recent segmentation methodsA brief introduction to recent segmentation methods
A brief introduction to recent segmentation methods
 
Overview of Convolutional Neural Networks
Overview of Convolutional Neural NetworksOverview of Convolutional Neural Networks
Overview of Convolutional Neural Networks
 
cnn-170917175001 (1).pdf
cnn-170917175001 (1).pdfcnn-170917175001 (1).pdf
cnn-170917175001 (1).pdf
 
CONVOLUTIONAL NEURAL NETWORK
CONVOLUTIONAL NEURAL NETWORKCONVOLUTIONAL NEURAL NETWORK
CONVOLUTIONAL NEURAL NETWORK
 
Network Deconvolution review [cdm]
Network Deconvolution review [cdm]Network Deconvolution review [cdm]
Network Deconvolution review [cdm]
 

Más de Universitat Politècnica de Catalunya

Generation of Synthetic Referring Expressions for Object Segmentation in Videos
Generation of Synthetic Referring Expressions for Object Segmentation in VideosGeneration of Synthetic Referring Expressions for Object Segmentation in Videos
Generation of Synthetic Referring Expressions for Object Segmentation in Videos
Universitat Politècnica de Catalunya
 

Más de Universitat Politècnica de Catalunya (20)

Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
 
Deep Generative Learning for All
Deep Generative Learning for AllDeep Generative Learning for All
Deep Generative Learning for All
 
The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...
The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...
The Transformer in Vision | Xavier Giro | Master in Computer Vision Barcelona...
 
Towards Sign Language Translation & Production | Xavier Giro-i-Nieto
Towards Sign Language Translation & Production | Xavier Giro-i-NietoTowards Sign Language Translation & Production | Xavier Giro-i-Nieto
Towards Sign Language Translation & Production | Xavier Giro-i-Nieto
 
The Transformer - Xavier Giró - UPC Barcelona 2021
The Transformer - Xavier Giró - UPC Barcelona 2021The Transformer - Xavier Giró - UPC Barcelona 2021
The Transformer - Xavier Giró - UPC Barcelona 2021
 
Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...
Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...
Learning Representations for Sign Language Videos - Xavier Giro - NIST TRECVI...
 
Open challenges in sign language translation and production
Open challenges in sign language translation and productionOpen challenges in sign language translation and production
Open challenges in sign language translation and production
 
Generation of Synthetic Referring Expressions for Object Segmentation in Videos
Generation of Synthetic Referring Expressions for Object Segmentation in VideosGeneration of Synthetic Referring Expressions for Object Segmentation in Videos
Generation of Synthetic Referring Expressions for Object Segmentation in Videos
 
Discovery and Learning of Navigation Goals from Pixels in Minecraft
Discovery and Learning of Navigation Goals from Pixels in MinecraftDiscovery and Learning of Navigation Goals from Pixels in Minecraft
Discovery and Learning of Navigation Goals from Pixels in Minecraft
 
Learn2Sign : Sign language recognition and translation using human keypoint e...
Learn2Sign : Sign language recognition and translation using human keypoint e...Learn2Sign : Sign language recognition and translation using human keypoint e...
Learn2Sign : Sign language recognition and translation using human keypoint e...
 
Intepretability / Explainable AI for Deep Neural Networks
Intepretability / Explainable AI for Deep Neural NetworksIntepretability / Explainable AI for Deep Neural Networks
Intepretability / Explainable AI for Deep Neural Networks
 
Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...
Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...
Self-Supervised Audio-Visual Learning - Xavier Giro - UPC TelecomBCN Barcelon...
 
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020
Attention for Deep Learning - Xavier Giro - UPC TelecomBCN Barcelona 2020
 
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...
Generative Adversarial Networks GAN - Xavier Giro - UPC TelecomBCN Barcelona ...
 
Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020
Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020
Q-Learning with a Neural Network - Xavier Giró - UPC Barcelona 2020
 
Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)
Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)
Language and Vision with Deep Learning - Xavier Giró - ACM ICMR 2020 (Tutorial)
 
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
Image Segmentation with Deep Learning - Xavier Giro & Carles Ventura - ISSonD...
 
Curriculum Learning for Recurrent Video Object Segmentation
Curriculum Learning for Recurrent Video Object SegmentationCurriculum Learning for Recurrent Video Object Segmentation
Curriculum Learning for Recurrent Video Object Segmentation
 
Deep Self-supervised Learning for All - Xavier Giro - X-Europe 2020
Deep Self-supervised Learning for All - Xavier Giro - X-Europe 2020Deep Self-supervised Learning for All - Xavier Giro - X-Europe 2020
Deep Self-supervised Learning for All - Xavier Giro - X-Europe 2020
 
Deep Learning Representations for All - Xavier Giro-i-Nieto - IRI Barcelona 2020
Deep Learning Representations for All - Xavier Giro-i-Nieto - IRI Barcelona 2020Deep Learning Representations for All - Xavier Giro-i-Nieto - IRI Barcelona 2020
Deep Learning Representations for All - Xavier Giro-i-Nieto - IRI Barcelona 2020
 

Último

👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
karishmasinghjnh
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
amitlee9823
 
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night StandCall Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Hsr Layout ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
amitlee9823
 
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Thane West Call On 9920725232 With Body to body massage...
amitlee9823
 
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men  🔝Dindigul🔝   Escor...
➥🔝 7737669865 🔝▻ Dindigul Call-girls in Women Seeking Men 🔝Dindigul🔝 Escor...
amitlee9823
 
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Surabaya ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
ZurliaSoop
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
amitlee9823
 
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
Escorts Service Kumaraswamy Layout ☎ 7737669865☎ Book Your One night Stand (B...
amitlee9823
 
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men  🔝malwa🔝   Escorts Ser...
➥🔝 7737669865 🔝▻ malwa Call-girls in Women Seeking Men 🔝malwa🔝 Escorts Ser...
amitlee9823
 

Último (20)

Predicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science ProjectPredicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science Project
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...

Convolutional Neural Networks - Veronica Vilaplana - UPC Barcelona 2018

• 8. Convolutional neural networks: Parameter sharing • Each feature map forms a 2D grid of features • it can be computed with a discrete convolution of a kernel matrix kij, which is the hidden weights matrix Wij with its rows and columns flipped • yj = f( Σi kij ∗ xi ), where xi is the ith input channel and yj is the jth feature map of the hidden layer (figure: input image → convolutions → feature maps)
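To make the feature-map formula concrete, here is a small illustrative sketch (not part of the slides) that computes one feature map yj from a 3-channel input with SciPy's 2D convolution and a ReLU as the nonlinearity f; all sizes are made up for the example.

```python
import numpy as np
from scipy.signal import convolve2d

def feature_map(channels, kernels, f=lambda a: np.maximum(a, 0)):
    """Compute y_j = f( sum_i k_ij * x_i ) for a single feature map j.
    channels: list of input channels x_i (each HxW); kernels: one kernel k_ij per channel."""
    acc = sum(convolve2d(x_i, k_ij, mode="valid") for x_i, k_ij in zip(channels, kernels))
    return f(acc)

rng = np.random.default_rng(0)
x = [rng.standard_normal((5, 5)) for _ in range(3)]   # toy 3-channel 5x5 input
k = [rng.standard_normal((3, 3)) for _ in range(3)]   # one 3x3 kernel per input channel
print(feature_map(x, k).shape)                        # (3, 3)
```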
• 9. Convolutional neural networks • Convolution as feature extraction: applying a filterbank • but the filters are learned (figure: input and one resulting feature map)
• 10. Convolutional neural networks: Pooling and subsampling • Third idea: pool hidden units in the same neighborhood • pooling is performed in non-overlapping neighborhoods (subsampling) • an alternative to "max" pooling is "average" pooling • pooling reduces dimensionality and provides invariance to small local changes • Max pooling: yi(j,k) = max_{p,q} xi(j+p, k+q)
• 11. Convolutional Neural Networks • Convolutional neural networks alternate between convolutional layers (followed by a nonlinearity) and pooling layers (basic architecture) • For recognition: the output layer is a regular, fully connected layer with softmax nonlinearity • the output provides an estimate of the conditional probability of each class • The network is trained by stochastic gradient descent (& variants) • backpropagation is used similarly as in a fully connected network (the filter weights are what is trained)
• 12. Convolutional Neural Networks • CNN = learning hierarchical representations with increasing levels of abstraction • End-to-end training: joint optimization of features and classifier (Fig. credit: DLBook)
• 13. Example: LeNet-5 • LeCun et al., 1998 • MNIST digit classification problem: handwritten digits, 60,000 training examples, 10,000 test samples, 10 classes, 28x28 grayscale images • Conv filters were 5x5, applied at stride 1 • Sigmoid or tanh nonlinearity • Subsampling (average pooling) layers were 2x2, applied at stride 2 • Fully connected layers at the end • i.e. the architecture is [CONV-POOL-CONV-POOL-FC-FC] • Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, Gradient-based learning applied to document recognition, 1998
• 15. Convolutional Neural Networks • In ConvNets inputs are 'images' (the architecture is constrained) • A ConvNet arranges neurons in three dimensions (width, height, depth) • Every layer transforms a 3D input volume into a 3D output volume of neuron activations (figure: a regular 3-layer Neural Network vs. a ConvNet with 3 layers; input layer, hidden layer 1, hidden layer 2, output layer)
• 16. Convolutional layer • Convolution on a 2D grid (image source)
• 17. Convolutional layer • Convolution on a volume: 32x32x3 input (32 width, 32 height, 3 depth), 5x5x3 filter • Filters always extend the full depth of the input volume • Convolve the filter with the input, i.e. slide it over the input spatially, computing dot products
• 18. Convolutional layer • Convolution on a volume: 32x32x3 input, 5x5x3 filter w • Convolve (slide) over all spatial locations, producing a 28x28x1 activation map (or feature map) • Each number is the result of the dot product between the filter and a small 5x5x3 patch of the input: a 5x5x3 = 75-dimensional dot product plus a bias, wᵀx + b
• 19. Convolutional layer • Consider a second 5x5x3 filter: convolving (sliding) it over all spatial locations of the 32x32x3 input yields a second 28x28x1 activation map
• 20. Convolutional layer • If we have 6 filters of size 5x5x3, we get 6 separate activation maps; we stack the maps up to get a new volume of size 28x28x6 • So applying a filterbank to an input (3D matrix) yields a cube-like output, a 3D matrix in which each slice is the output of the convolution with one filter
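The shape bookkeeping above is easy to verify in a framework; a minimal PyTorch sketch (assumed here, not part of the lecture) with six 5x5x3 filters on a 32x32x3 input:

```python
import torch
import torch.nn as nn

conv = nn.Conv2d(in_channels=3, out_channels=6, kernel_size=5, stride=1, padding=0)

x = torch.randn(1, 3, 32, 32)   # one 32x32 RGB image, in (N, C, H, W) layout
y = conv(x)
print(y.shape)                  # torch.Size([1, 6, 28, 28]): the 28x28x6 volume of stacked maps
```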
• 21. Convolutional layer • A ConvNet is a sequence of convolutional layers, interspersed with activation functions and pooling layers (and a small number of fully connected layers) • We add more layers of filters: we apply filters (convolutions) to the output volume of the previous layer, and the result of each convolution is a slice in the new volume • Example: 32x32x3 input → CONV+ReLU (6 filters 5x5x3) → 28x28x6 → CONV+ReLU (10 filters 5x5x6) → 24x24x10 → CONV+ReLU → ... (S. credit: Stanford cs231_2017)
• 22. Example: filters and activation maps • Example CNN trained for image recognition on the CIFAR dataset • The network learns features that activate when they see some specific type of feature at some spatial position in the input; stacking the activation maps of all filters along the depth dimension forms the full output volume • http://cs.stanford.edu/people/karpathy/convnetjs/demo/cifar10.html
• 23. Convolutional layer • Hyperparameters: number of filters, filter spatial extent F, stride S, padding P • Stride is the number of pixels by which we slide the kernel over the input matrix; a larger stride produces smaller feature maps • stride 1: 7x7 input (spatially), a 3x3 filter: 5x5 output
• 24. Convolutional layer • Hyperparameters: number of filters, filter spatial extent F, stride S, padding P • Stride is the number of pixels by which we slide the kernel over the input matrix; a larger stride produces smaller feature maps • stride 2: 7x7 input (spatially), a 3x3 filter: 3x3 output
• 25. Convolutional layer • Hyperparameters: number of filters, filter spatial extent F, stride S, padding P • Padding: pad the input volume with zeros around the border so that the input and output width and height are the same • No padding (P=0): output size (N-F)/S + 1, e.g. N=7, F=3: stride 1 gives (7-3)/1+1 = 5; stride 2 gives (7-3)/2+1 = 3; stride 3 gives (7-3)/3+1 = 2.33, so it is not applied • Padding P=1: output size (N-F+2P)/S + 1, e.g. N=7, F=3, S=1, padded with a 1-pixel border: output size 7x7 • In general, CONV layers use stride 1, FxF filters, and zero-padding with P = (F-1)/2 to preserve the spatial size
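A small helper makes these output-size rules explicit; the function below is an illustrative sketch of the formula (N - F + 2P)/S + 1, not code from the course.

```python
def conv_output_size(n, f, s=1, p=0):
    """Spatial output size of a convolution: (N - F + 2P) / S + 1, or None if the stride does not fit."""
    num = n - f + 2 * p
    return num // s + 1 if num % s == 0 else None

print(conv_output_size(7, 3, s=1))        # 5
print(conv_output_size(7, 3, s=2))        # 3
print(conv_output_size(7, 3, s=3))        # None: (7-3)/3+1 = 2.33, not applied
print(conv_output_size(7, 3, s=1, p=1))   # 7: padding P=(F-1)/2 preserves the size
```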
• 26. 1x1 convolutions • 1x1 convolution layers are used to reduce dimensionality (the number of feature maps) • Example: a 1x1 conv with 64 filters on a 32x32x128 volume; each filter has size 1x1x128 and performs a 128-dimensional dot product, producing a 32x32x64 output
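In PyTorch terms (an illustrative sketch, sizes taken from the example above), the 1x1 projection looks like this:

```python
import torch
import torch.nn as nn

proj = nn.Conv2d(in_channels=128, out_channels=64, kernel_size=1)   # 64 filters of size 1x1x128

x = torch.randn(1, 128, 32, 32)
print(proj(x).shape)   # torch.Size([1, 64, 32, 32]): same spatial size, depth reduced 128 -> 64
```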
• 27. Example: size, parameters • Input volume: 32x32x3, 10 5x5 filters with stride 1, padding 2 • Output volume size: (N + 2P - F)/S + 1 = (32 + 2*2 - 5)/1 + 1 = 32 spatially, so 32x32x10 • Number of parameters in this layer: each filter has 5x5x3 + 1 = 76 params (+1 for the bias), so 76x10 = 760 parameters (S. credit: Stanford cs231_2017)
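The parameter count can be checked directly; a hedged sketch using the same layer configuration:

```python
import torch.nn as nn

conv = nn.Conv2d(in_channels=3, out_channels=10, kernel_size=5, stride=1, padding=2)

print(conv.weight.shape)                            # torch.Size([10, 3, 5, 5])
print(conv.bias.shape)                              # torch.Size([10])
print(sum(p.numel() for p in conv.parameters()))    # 760 = 10 x (5*5*3 + 1)
```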
• 28. Summary: conv layer • To summarize, the Conv layer • Accepts a volume of size W1 x H1 x D1 • Requires four hyperparameters: number of filters K, kernel size F, stride S, amount of zero padding P • Produces a volume of size W2 x H2 x D2, with W2 = (W1 - F + 2P)/S + 1, H2 = (H1 - F + 2P)/S + 1, D2 = K • With parameter sharing, it introduces F·F·D1 weights per filter, for a total of (F·F·D1)·K weights and K biases • In the output volume, the dth depth slice (of size W2xH2) is the result of performing a valid convolution of the dth filter over the input volume with a stride of S, offset by the dth bias • Common settings: K = powers of 2 (32, 64, 128, 256); F=3, S=1, P=1; F=5, S=1, P=2; F=5, S=2, P=? (whatever fits); F=1, S=1, P=0 (S. credit: Stanford cs231_2017)
• 29. Activation functions • Desirable properties: mostly smooth, continuous, differentiable, fairly linear • Sigmoid: σ(x) = 1/(1 + e^(-x)) • tanh: tanh(x) = (e^x - e^(-x))/(e^x + e^(-x)) • ReLU: max(0, x) • Leaky ReLU: max(0.1x, x) • Maxout: max(w1ᵀx + b1, w2ᵀx + b2) • ELU
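The formulas above map directly onto framework primitives; the snippet below is an illustrative sketch (the scalar maxout parameters w1, b1, w2, b2 are made up for the example).

```python
import torch
import torch.nn.functional as F

x = torch.linspace(-3, 3, 7)

sig        = torch.sigmoid(x)                       # 1 / (1 + e^(-x))
tanh       = torch.tanh(x)                          # (e^x - e^(-x)) / (e^x + e^(-x))
relu       = torch.relu(x)                          # max(0, x)
leaky_relu = F.leaky_relu(x, negative_slope=0.1)    # max(0.1x, x)
elu        = F.elu(x)                               # exponential linear unit

w1, b1, w2, b2 = 1.0, 0.5, -1.0, 0.0                # toy maxout parameters
maxout = torch.maximum(w1 * x + b1, w2 * x + b2)    # max(w1^T x + b1, w2^T x + b2)
```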
• 30. Activation functions • Example (figure)
• 31. Pooling layer • Makes the representations smaller and more manageable for later layers • Useful to get invariance to small local changes • Operates over each activation map independently • Example: pooling downsamples a 180x120x64 volume to 90x60x64 (each 180x120 map becomes 90x60)
• 32. Pooling layer • Max pooling • Other pooling functions: average pooling • Example on a single depth slice (4x6), max pooling with a 2x2 filter and stride 2: the input rows [4 1 5 2 2 6], [1 2 9 0 2 4], [2 2 6 4 0 2], [3 1 0 3 3 1] give the output rows [4 9 6], [3 6 3]
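The reconstructed example can be checked numerically; this is an illustrative sketch, not course code.

```python
import torch
import torch.nn.functional as F

x = torch.tensor([[4., 1., 5., 2., 2., 6.],
                  [1., 2., 9., 0., 2., 4.],
                  [2., 2., 6., 4., 0., 2.],
                  [3., 1., 0., 3., 3., 1.]])               # the single depth slice from the slide

y = F.max_pool2d(x[None, None], kernel_size=2, stride=2)   # 2x2 window, stride 2
print(y[0, 0])
# tensor([[4., 9., 6.],
#         [3., 6., 3.]])
```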
• 33. Pooling layer • Example (figure)
• 34. Summary: pooling layer • To summarize, the pooling layer • Accepts a volume of size W1xH1xD1 • Requires two hyperparameters: spatial extent F and stride S • Produces a volume of size W2xH2xD2, with W2 = (W1-F)/S + 1, H2 = (H1-F)/S + 1, D2 = D1 • Introduces zero parameters since it computes a fixed function of the input • Common settings: F=2, S=2 or F=3, S=2 • Pros: reduces the number of inputs to the next layer, allowing us to have more feature maps; invariant to small translations of the input • Cons: after several layers of pooling we have lost information about the precise position of things
• 35. Fully connected layer • At the end it is common to add one or more fully (or densely) connected layers • Every neuron in the previous layer is connected to every neuron in the next layer (as in regular neural networks); the activation is computed as a matrix multiplication plus a bias • At the output, softmax activation for classification • The output of the last convolutional layer is flattened to a single vector which is the input to a fully connected layer (figure: 4 possible outputs; connections and weights not shown)
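A minimal sketch of flatten + fully connected + softmax (the 64x6x6 feature-volume size is assumed for illustration; the slide only fixes the 4 outputs):

```python
import torch
import torch.nn as nn

conv_out = torch.randn(1, 64, 6, 6)        # assumed output volume of the last conv/pool layer

flatten = nn.Flatten()                     # 1 x 64 x 6 x 6  ->  1 x 2304
fc      = nn.Linear(64 * 6 * 6, 4)         # 4 possible outputs

logits = fc(flatten(conv_out))
probs  = torch.softmax(logits, dim=1)      # softmax activation for classification
print(probs.shape, float(probs.sum()))     # torch.Size([1, 4]) 1.0
```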
• 36. Fully connected layers and convolutional layers • A convolutional layer can be implemented as a fully connected layer • The weight matrix is a large matrix that is mostly zero except for certain blocks (due to local connectivity) • Example: an input image I of size 4x4 is vectorized to a 16x1 vector Iv; the output image Yv is 4x1 (later reshaped to 2x2); h is a 3x3 kernel and C is the 4x16 weight matrix built from its coefficients, so Y = I ∗ h becomes Yv = C·Iv
• 37. Fully connected layers and convolutional layers • (figure: the full matrix form of the previous example, Y = I ∗ h written as Yv = C·Iv)
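One way to see the "convolution is a matrix product" point numerically is im2col: unfold every receptive field into a column and the convolution becomes an ordinary matrix multiplication. This is an illustrative sketch (frameworks implement cross-correlation, so the matrix built here corresponds to the unflipped kernel rather than literally the C of the figure).

```python
import torch
import torch.nn.functional as F

x = torch.randn(1, 1, 4, 4)            # 4x4 input image
w = torch.randn(1, 1, 3, 3)            # 3x3 kernel

direct = F.conv2d(x, w)                # 1 x 1 x 2 x 2 output

cols = F.unfold(x, kernel_size=3)      # 1 x 9 x 4: each column is a flattened 3x3 patch
as_matmul = (w.view(1, -1) @ cols).view(1, 1, 2, 2)

print(torch.allclose(direct, as_matmul))   # True
```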
• 38. Fully connected layers and convolutional layers • Fully connected layers can also be viewed as convolutions with kernels that cover the entire input region • Example: a fully connected layer with K=1024 neurons on an input volume of 32x32x512 can be expressed as a convolutional layer with K=1024 filters, F=32 (kernel size), P=0, S=1 • The filter size is exactly the size of the input volume (32x32x512), so the output is 1x1x1024
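The equivalence can be demonstrated by copying the weights of a fully connected layer into a convolution whose kernel covers the whole input. The sketch below uses a scaled-down size (8x8x32 input, 64 neurons) so it runs cheaply; the slide's 32x32x512 / 1024-filter example works identically.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 32, 8, 8)                                   # scaled-down input volume

fc = nn.Linear(32 * 8 * 8, 64)                                 # fully connected layer, K = 64 neurons
conv = nn.Conv2d(32, 64, kernel_size=8, stride=1, padding=0)   # kernel covers the entire input

with torch.no_grad():                                          # reuse the FC parameters in the conv layer
    conv.weight.copy_(fc.weight.view(64, 32, 8, 8))
    conv.bias.copy_(fc.bias)

y_fc   = fc(x.flatten(1))                                      # 1 x 64
y_conv = conv(x)                                               # 1 x 64 x 1 x 1
print(torch.allclose(y_fc, y_conv.flatten(1), atol=1e-5))      # True
```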
• 39. Batch normalization layer • As learning progresses, the distribution of the layer inputs changes due to parameter updates (internal covariate shift) • This can result in most inputs being in the saturated (non-linear) regime of the activation function, slowing down learning • Batch normalization is a technique to reduce this effect • Explicitly force the layer activations to have zero mean and unit variance w.r.t. running batch estimates: x̂(k) = (x(k) − E[x(k)]) / sqrt(Var[x(k)]) • Adds a learnable scale and bias term to allow the network to still use the nonlinearity: y(k) = γ(k) x̂(k) + β(k) • Typical placement: FC/Conv → Batch norm → ReLU • Ioffe and Szegedy, 2015, "Batch normalization: accelerating deep network training by reducing internal covariate shift"
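A hedged sketch of what the batch-norm formula computes per channel, compared against the built-in layer in training mode:

```python
import torch
import torch.nn as nn

x = torch.randn(16, 64, 28, 28)                            # a mini-batch of 64-channel feature maps

mean  = x.mean(dim=(0, 2, 3), keepdim=True)                # E[x^(k)] over the batch and spatial dims
var   = x.var(dim=(0, 2, 3), unbiased=False, keepdim=True)
x_hat = (x - mean) / torch.sqrt(var + 1e-5)                # normalize to zero mean, unit variance
gamma = torch.ones(1, 64, 1, 1)                            # learnable scale (initialized to 1)
beta  = torch.zeros(1, 64, 1, 1)                           # learnable shift (initialized to 0)
y_manual = gamma * x_hat + beta

bn = nn.BatchNorm2d(64)                                    # same computation, plus running estimates
print(torch.allclose(y_manual, bn(x), atol=1e-4))          # True
```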
• 40. Upsampling layers: recovering spatial shape • Motivation: semantic segmentation; make predictions for all pixels at once • Problem: convolutions at the original image resolution would be very expensive (S. credit: Stanford cs231_2017)
• 41. Upsampling layers: recovering spatial shape • Motivation: semantic segmentation; make predictions for all pixels at once • Design the network as a sequence of convolutional layers with downsampling and upsampling • Other applications: super-resolution, flow estimation, generative modeling (S. credit: Stanford cs231_2017)
• 42. Learnable upsampling • Recall: 3x3 convolution, stride 1, pad 1 (S. credit: Stanford cs231_2017)
• 43. Learnable upsampling • Recall: 3x3 convolution, stride 2, pad 1 • The filter moves 2 pixels in the input for every one pixel in the output; the stride gives the ratio between movement in the input and in the output (S. credit: Stanford cs231_2017)
• 44. Learnable upsampling: transposed convolution • 3x3 transposed convolution, stride 2, pad 1 • The filter moves 2 pixels in the output for every one pixel in the input; the stride gives the ratio between movement in the input and in the output; contributions are summed where outputs overlap • Various names: transposed convolution, backward strided convolution, fractionally strided convolution, upconvolution, "deconvolution" (S. credit: Stanford cs231_2017)
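In PyTorch this is nn.ConvTranspose2d; the sketch below is illustrative (the output_padding argument is an implementation detail needed to hit exactly twice the spatial size).

```python
import torch
import torch.nn as nn

up = nn.ConvTranspose2d(in_channels=16, out_channels=16, kernel_size=3,
                        stride=2, padding=1, output_padding=1)

x = torch.randn(1, 16, 8, 8)
print(up(x).shape)     # torch.Size([1, 16, 16, 16]): spatial resolution doubled
```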
• 45. Some architectures for Visual Recognition
• 46. ImageNet: ILSVRC (Large Scale Visual Recognition Challenge) • Image classification: 1000 object classes (categories) • Images: 1.2 M train, 100,000 test • Metric: top-5 error rate (predict 5 classes) • www.image-net.org/challenges/LSVRC/
• 47. ILSVRC image classification winners (figure)
• 48. AlexNet (2012) • Similar framework to LeNet: • 8 layers (5 convolutional, 3 fully connected) • Max pooling, ReLU nonlinearities • 650,000 units, 60 million parameters • trained on two GPUs (half of the kernels on each GPU) for a week • data augmentation • dropout regularization • A. Krizhevsky, I. Sutskever, and G. Hinton, ImageNet Classification with Deep Convolutional Neural Networks, NIPS 2012
• 49. AlexNet (2012) • Full AlexNet architecture: 8 layers • [227x227x3] INPUT • [55x55x96] CONV1: 96 11x11 filters at stride 4, pad 0 • [27x27x96] MAX POOL1: 3x3 filters at stride 2 • [27x27x96] NORM1: normalization layer • [27x27x256] CONV2: 256 5x5 filters at stride 1, pad 2 • [13x13x256] MAX POOL2: 3x3 filters at stride 2 • [13x13x256] NORM2: normalization layer • [13x13x384] CONV3: 384 3x3 filters at stride 1, pad 1 • [13x13x384] CONV4: 384 3x3 filters at stride 1, pad 1 • [13x13x256] CONV5: 256 3x3 filters at stride 1, pad 1 • [6x6x256] MAX POOL3: 3x3 filters at stride 2 • [4096] FC6: 4096 neurons • [4096] FC7: 4096 neurons • [1000] FC8: 1000 neurons (class scores) • Details/retrospectives: first use of ReLU; used norm layers (not common anymore); heavy data augmentation; dropout 0.5; batch size 128; SGD momentum 0.9; learning rate 1e-2, reduced by 10 manually when val accuracy plateaus; L2 weight decay 5e-4 • ILSVRC 2012 winner; 7-CNN ensemble: 18.2% -> 15.4%
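A hedged PyTorch sketch of the layer list above, useful for checking the spatial sizes; it omits the local response normalization layers and the two-GPU split of the original implementation.

```python
import torch
import torch.nn as nn

features = nn.Sequential(
    nn.Conv2d(3, 96, kernel_size=11, stride=4), nn.ReLU(),              # -> 55x55x96
    nn.MaxPool2d(kernel_size=3, stride=2),                              # -> 27x27x96
    nn.Conv2d(96, 256, kernel_size=5, stride=1, padding=2), nn.ReLU(),  # -> 27x27x256
    nn.MaxPool2d(kernel_size=3, stride=2),                              # -> 13x13x256
    nn.Conv2d(256, 384, kernel_size=3, stride=1, padding=1), nn.ReLU(), # -> 13x13x384
    nn.Conv2d(384, 384, kernel_size=3, stride=1, padding=1), nn.ReLU(), # -> 13x13x384
    nn.Conv2d(384, 256, kernel_size=3, stride=1, padding=1), nn.ReLU(), # -> 13x13x256
    nn.MaxPool2d(kernel_size=3, stride=2),                              # -> 6x6x256
)
classifier = nn.Sequential(
    nn.Flatten(),
    nn.Linear(6 * 6 * 256, 4096), nn.ReLU(),                            # FC6
    nn.Linear(4096, 4096), nn.ReLU(),                                   # FC7
    nn.Linear(4096, 1000),                                              # FC8: class scores
)

x = torch.randn(1, 3, 227, 227)
f = features(x)
print(f.shape, classifier(f).shape)   # torch.Size([1, 256, 6, 6]) torch.Size([1, 1000])
```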
• 50. AlexNet (2012) • Visualization of the 96 11x11 filters learned by the first layer • A. Krizhevsky, I. Sutskever, and G. Hinton, ImageNet Classification with Deep Convolutional Neural Networks, NIPS 2012
• 51. VGGNet-16 (2014) • Visual Geometry Group, Univ. Oxford • Sequence of deeper nets trained progressively • Large receptive fields replaced by 3x3 convolutions • Only 3x3 CONV with stride 1, pad 1, and 2x2 MAX POOL with stride 2 • 16-19 layers • Shows that depth is a critical component for good performance • TOTAL memory: 24M * 4 bytes ~= 93 MB / image (forward only! roughly *2 for backward); most memory is in the early CONV layers • TOTAL params: 138M parameters; most parameters are in the late FC layers • K. Simonyan and A. Zisserman, Very Deep Convolutional Networks for Large-Scale Image Recognition, ICLR 2015
• 52. GoogLeNet (2014) • Motivation: the most straightforward way of improving the performance of deep neural networks is to increase their size, both depth and width • Increasing the network size has two drawbacks: a larger number of parameters (prone to overfitting) and a dramatically increased use of computational resources • Goal: increase the depth and width while keeping the computational budget constant • 22 layers; compared to AlexNet: 12x fewer parameters (5M vs 60M), 2x more compute, 6.67% top-5 error (vs 16.4%) • C. Szegedy et al., Going deeper with convolutions, CVPR 2015
• 53. GoogLeNet (2014) • The Inception module • Apply parallel operations on the input from the previous layer: multiple kernel sizes for convolution (1x1, 3x3, 5x5) and a pooling operation • Concatenate all filter outputs together depth-wise • Use 1x1 convolutions for dimensionality reduction before the expensive convolutions • Conv ops (for a 28x28x256 input): 1x1 conv, 128: 28x28x128x1x1x256; 1x1 conv, 64: 28x28x64x1x1x256; 3x3 conv, 192: 28x28x192x3x3x64; 1x1 conv, 64: 28x28x64x1x1x256; 5x5 conv, 96: 28x28x96x5x5x64; 1x1 conv, 64: 28x28x64x1x1x256; Total: 358M ops; without 1x1 convolutions: Total: 854M ops
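A minimal sketch of such a module (branch widths taken from the op-count example above, ReLUs omitted; this is an illustration, not the exact GoogLeNet module):

```python
import torch
import torch.nn as nn

class Inception(nn.Module):
    def __init__(self, in_ch=256):
        super().__init__()
        self.b1 = nn.Conv2d(in_ch, 128, kernel_size=1)                          # 1x1 conv, 128
        self.b2 = nn.Sequential(nn.Conv2d(in_ch, 64, kernel_size=1),            # 1x1 reduction
                                nn.Conv2d(64, 192, kernel_size=3, padding=1))   # 3x3 conv, 192
        self.b3 = nn.Sequential(nn.Conv2d(in_ch, 64, kernel_size=1),            # 1x1 reduction
                                nn.Conv2d(64, 96, kernel_size=5, padding=2))    # 5x5 conv, 96
        self.b4 = nn.Sequential(nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
                                nn.Conv2d(in_ch, 64, kernel_size=1))            # pool + 1x1 projection

    def forward(self, x):
        # concatenate all branch outputs depth-wise
        return torch.cat([self.b1(x), self.b2(x), self.b3(x), self.b4(x)], dim=1)

x = torch.randn(1, 256, 28, 28)
print(Inception()(x).shape)   # torch.Size([1, 480, 28, 28]): 128 + 192 + 96 + 64 maps
```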
• 54. GoogLeNet (2014) • Architecture: a stem network followed by stacked inception modules (figure legend: convolution, pooling, other)
• 55. GoogLeNet (2014) • Auxiliary classifiers • features produced by the layers in the middle of the network should be very discriminative • auxiliary classifiers are connected to these intermediate layers, to encourage discrimination in the lower stages of the network • during training, their loss gets added to the total loss of the network with a discount weight (the losses of the auxiliary classifiers are weighted by 0.3) • at inference time, the auxiliary classifiers are discarded • ...and no fully connected layers needed! (figure legend: auxiliary classifier, convolution, pooling, softmax)
• 56. ResNet (2015) • Motivation • Stacking more layers does not mean better performance • as the network depth increases, accuracy gets saturated and then degrades rapidly • such degradation is not caused by overfitting • Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, Deep Residual Learning for Image Recognition, CVPR 2016 (Best Paper)
• 57. ResNet (2015) • Residual block • Hypothesis: the problem is an optimization problem; deeper models are harder to optimize • A deeper model should be able to perform at least as well as a shallower one: construct it by copying the learned shallower model and setting the added layers to the identity mapping • Solution: use network layers to fit a residual mapping instead of directly trying to fit the desired underlying mapping (figure: plain net vs. residual net) • Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, Deep Residual Learning for Image Recognition, CVPR 2016 (Best Paper)
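A simplified residual block as a sketch of the idea (two 3x3 convolutions plus the identity shortcut; batch normalization and other details of the original blocks are omitted):

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(channels, channels, kernel_size=3, padding=1)
        self.relu = nn.ReLU()

    def forward(self, x):
        residual = self.conv2(self.relu(self.conv1(x)))   # the stacked layers fit F(x)
        return self.relu(residual + x)                    # output is F(x) + x (skip connection)

x = torch.randn(1, 64, 56, 56)
print(ResidualBlock(64)(x).shape)   # same shape as the input
```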
• 58. ResNet (2015) • Similar to GoogLeNet, use a bottleneck layer to improve efficiency • Directly performing 3x3 convolutions with 256 feature maps at input and output: 256 x 256 x 3 x 3 ~ 600K operations • Using 1x1 convolutions to reduce 256 to 64 feature maps, followed by 3x3 convolutions, followed by 1x1 convolutions to expand back to 256 maps: 256 x 64 x 1 x 1 ~ 16K, 64 x 64 x 3 x 3 ~ 36K, 64 x 256 x 1 x 1 ~ 16K, total ~70K • Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun, Deep Residual Learning for Image Recognition, CVPR 2016 (Best Paper)
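The counts behind that comparison can be reproduced in a couple of lines (counting weights, i.e. multiply-accumulates per output position, biases ignored):

```python
direct = 256 * 256 * 3 * 3                 # one 3x3 conv keeping 256 maps: 589_824  (~600K)
bottleneck = (256 * 64 * 1 * 1             # 1x1 conv reducing 256 -> 64 maps: 16_384 (~16K)
              + 64 * 64 * 3 * 3            # 3x3 conv on 64 maps:              36_864 (~36K)
              + 64 * 256 * 1 * 1)          # 1x1 conv expanding 64 -> 256 maps: 16_384 (~16K)
print(direct, bottleneck, round(direct / bottleneck, 1))   # 589824 69632 8.5
```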
• 59. ResNet (2015) • ILSVRC 2015 winner (3.6% top-5 error) • MSRA: ILSVRC & COCO 2015 competitions: ImageNet Classification: "ultra deep", 152 layers; ImageNet Detection: 16% better than 2nd; ImageNet Localization: 27% better than 2nd; COCO Detection: 11% better than 2nd; COCO Segmentation: 12% better than 2nd • 2-3 weeks of training on an 8-GPU machine • At run time: faster than a VGGNet! (even though it has 8x more layers)
• 60. ILSVRC 2012-2015 summary
Team | Year | Place | Error (top-5) | External data
SuperVision – Toronto (AlexNet, 7 layers) | 2012 | - | 16.4% | no
SuperVision | 2012 | 1st | 15.3% | ImageNet 22k
Clarifai – NYU (7 layers) | 2013 | - | 11.7% | no
Clarifai | 2013 | 1st | 11.2% | ImageNet 22k
VGG – Oxford (16 layers) | 2014 | 2nd | 7.32% | no
GoogLeNet (19 layers) | 2014 | 1st | 6.67% | no
ResNet (152 layers) | 2015 | 1st | 3.57% |
Human expert* | | | 5.1% |
http://karpathy.github.io/2014/09/02/what-i-learned-from-competing-against-a-convnet-on-imagenet/
• 61. Summary • Convolutional neural networks are a specialized kind of neural network for processing data that has a known, grid-like topology • CNNs leverage these ideas: local connectivity, parameter sharing, pooling / subsampling of hidden units • Layers: convolutional, non-linear activation, pooling, upsampling, batch normalization • Architectures for object recognition in images: • LeNet: pioneer net for digit recognition • AlexNet: smaller compute, still memory heavy, lower accuracy • VGG: highest memory, most operations • GoogLeNet: most efficient • ResNet: moderate efficiency depending on model, better accuracy • Inception-v4: hybrid of ResNet and Inception, highest accuracy