2. Joint work with Kai Chen, Greg Corrado, Jeff Dean, Matthieu Devin, Rajat Monga, Andrew Ng, Marc'Aurelio Ranzato, Paul Tucker, Ke Yang
Additional Thanks: Samy Bengio, Zhenghao Chen, Tom Dean, Pangwei Koh, Mark Mao, Jiquan Ngiam, Patrick Nguyen, Andrew Saxe, Mark Segal, Jon Shlens, Vincent Vanhoucke, Xiaoyun Wu, Peng Xe, Serena Yeung, Will Zou
3. Machine Learning successes: Face recognition, OCR, Autonomous cars, Email classification, Recommendation systems, Web page ranking
8. Outline
- Reconstruction ICA
- Applications to videos, cancer images
- Ideas for scaling up
- Scaling up: results
12. Invariance explained
Two features detect the same edge at two locations:
Image1: F1 = 1, F2 = 0 (edge at Loc1). Image2: F1 = 0, F2 = 1 (edge at Loc2).
Pooled feature of F1 and F2: sqrt(1^2 + 0^2) = 1 and sqrt(0^2 + 1^2) = 1.
Same value regardless of the location of the edge.
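The pooling arithmetic on this slide can be checked directly; a minimal NumPy sketch (the function name is illustrative):

```python
import numpy as np

def l2_pool(activations):
    """Square-root (L2) pooling over a group of feature activations."""
    a = np.asarray(activations, dtype=float)
    return float(np.sqrt(np.sum(a ** 2)))

# Image1: edge at Loc1 activates F1 only; Image2: edge at Loc2 activates F2 only.
image1 = [1.0, 0.0]   # (F1, F2)
image2 = [0.0, 1.0]

print(l2_pool(image1))  # 1.0
print(l2_pool(image2))  # 1.0 -- same pooled value, different edge location
```

The pooled unit responds identically to either input, which is exactly the translation invariance the slide describes.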
13. Reconstruction ICA: Equivalence between Sparse Coding, Autoencoders, RBMs and ICA
Build a deep architecture by treating the output of one layer as input to another layer.
Le, et al., ICA with Reconstruction Cost for Efficient Overcomplete Feature Learning. NIPS 2011
14. Reconstruction ICA
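The RICA objective from the NIPS 2011 paper replaces ICA's hard orthonormality constraint with a soft reconstruction penalty; a sketch of that cost (the smoothed L1 and the default λ are illustrative choices):

```python
import numpy as np

def rica_cost(W, X, lam=0.1, eps=1e-8):
    """RICA objective: L1 sparsity of the activations W x plus the
    reconstruction penalty ||W^T W x - x||^2, which stands in for
    plain ICA's orthonormality constraint on W.

    W: (k, n) filter matrix; X: (n, m) data, one example per column.
    """
    m = X.shape[1]
    WX = W @ X                                  # feature activations
    recon = W.T @ WX - X                        # reconstruction residual
    sparsity = np.sum(np.sqrt(WX ** 2 + eps))   # smooth L1 penalty
    return (lam * sparsity + 0.5 * np.sum(recon ** 2)) / m
```

Because the constraint is now a penalty, W may be overcomplete (k > n) and the cost can be minimized with ordinary unconstrained optimizers; for an orthonormal square W the reconstruction term vanishes and only the sparsity term remains.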
15. Reconstruction ICA: Data whitening
16. Reconstruction ICA: Data whitening
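Whitening decorrelates the input dimensions and equalizes their variances before feature learning; a minimal ZCA sketch (the eps regularizer value is an illustrative choice):

```python
import numpy as np

def zca_whiten(X, eps=1e-5):
    """ZCA-whiten data so its empirical covariance is (approximately)
    the identity -- a standard preprocessing step for ICA-style methods.

    X: (n, m) matrix with m examples in columns.
    """
    Xc = X - X.mean(axis=1, keepdims=True)   # center each dimension
    cov = Xc @ Xc.T / Xc.shape[1]            # (n, n) empirical covariance
    d, E = np.linalg.eigh(cov)               # eigendecomposition of cov
    W_zca = E @ np.diag(1.0 / np.sqrt(d + eps)) @ E.T
    return W_zca @ Xc
```

ZCA (rather than plain PCA) whitening keeps the result in the original pixel coordinates, which is why whitened image patches still look like images.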
17. Why RICA?
Algorithms compared on speed, ease of training, and invariant features: Sparse Coding, RBMs/Autoencoders, TICA, Reconstruction ICA.
18. Summary of RICA
- Two-layered network
- Reconstruction cost instead of orthogonality constraints
- Learns invariant features
20. Action recognition
Sit up, Drive Car, Get Out of Car, Eat, Answer phone, Kiss, Run, Stand up, Shake hands
Le, et al., Learning hierarchical spatio-temporal features for action recognition with independent subspace analysis. CVPR 2011
22. [Bar charts: accuracy on the KTH, Hollywood2, UCF, and YouTube action-recognition benchmarks. Learned features match or beat engineered features such as Hessian/SURF, HOG, HOF, HOG/HOF, HOG3D, pLSA, GRBMs, 3DCNN, and HMAX on all four datasets.]
23. Cancer classification
[Bar chart: classification accuracy (84%-92%) for Apoptotic, Viable tumor region, and Necrosis classes; RICA features outperform hand-engineered features.]
Le, et al., Learning Invariant Features for Tumor Signatures. ISBI 2012
26. It's better to have more features!
No matter the algorithm, more features are always more successful.
Coates, et al., An Analysis of Single-Layer Networks in Unsupervised Feature Learning. AISTATS 2011
30. Asynchronous Parallel SGDs: Parameter server
Le, et al., Building high-level features using large-scale unsupervised learning. ICML 2012
31. Asynchronous Parallel SGDs: Parameter server
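A toy Downpour-style sketch of the idea on these slides: workers fetch the current parameters from a shared server and push gradients back asynchronously, with no global synchronization barrier (class and function names and the least-squares task are illustrative; the real system shards parameters across many machines):

```python
import threading
import numpy as np

class ParameterServer:
    """Toy parameter server: workers fetch parameters and push
    gradients asynchronously."""
    def __init__(self, dim):
        self.w = np.zeros(dim)
        self.lock = threading.Lock()  # per-update lock; real servers shard instead

    def fetch(self):
        with self.lock:
            return self.w.copy()

    def push(self, grad, lr=0.1):
        with self.lock:
            self.w -= lr * grad       # apply the (possibly stale) gradient

def worker(ps, X, y, steps=200):
    # Each worker runs SGD on its own data shard for a least-squares model.
    for i in range(steps):
        w = ps.fetch()                       # stale-but-recent parameters
        j = i % len(y)
        grad = (X[j] @ w - y[j]) * X[j]      # gradient of 0.5*(x.w - y)^2
        ps.push(grad)                        # asynchronous update

# Fit y = 2*x with two asynchronous workers on disjoint shards.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 1))
y = 2.0 * X[:, 0]
ps = ParameterServer(dim=1)
threads = [threading.Thread(target=worker, args=(ps, X[s], y[s]))
           for s in (slice(0, 50), slice(50, 100))]
for t in threads: t.start()
for t in threads: t.join()
print(ps.w)  # close to [2.0]
```

The point of the asynchrony is throughput: no worker ever waits for the others, and gradients computed on slightly stale parameters still drive the model toward the solution.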
32. Summary of Scaling up
- Local connectivity
- Asynchronous SGDs
… and more:
- RPC vs MapReduce
- Prefetching
- Single vs double precision
- Removing slow machines
- Optimized Softmax
- …
34. Training RICA
Dataset: 10 million 200x200 unlabeled images from YouTube/Web
Train on 2,000 machines (16,000 cores) for 1 week
RICA: 1.15 billion parameters
- 100x larger than previously reported
- Small compared to the visual cortex
35. The face neuron
Top stimuli from the test set. Optimal stimulus by numerical optimization.
36. [Histogram: frequency of feature value for faces vs. random distractors.]
37. Invariance properties
[Plots: feature response under horizontal shifts (0-20 pixels), vertical shifts (0-20 pixels), 3D rotation angle (0-90 degrees), and scale factor (0.4x-1.6x).]
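The invariance plots summarized above can be reproduced in miniature: sweep one transformation and record the feature's response at each step (function names and the toy global-average feature are illustrative):

```python
import numpy as np

def response_under_shifts(feature_fn, image, max_shift=20):
    """Record a feature's response as the input is shifted horizontally,
    mirroring the invariance plots (response vs. pixels of shift)."""
    return [feature_fn(np.roll(image, dx, axis=1))
            for dx in range(max_shift + 1)]

# A pooled (here: global-average) feature is perfectly shift invariant.
img = np.arange(16.0).reshape(4, 4)
curve = response_under_shifts(lambda x: float(x.mean()), img, max_shift=3)
print(curve)  # flat curve: identical response at every shift
```

A flat curve over the sweep is the signature of invariance; a sharply peaked curve would indicate a feature tuned to one specific location.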
38. Top stimuli from the test set. Optimal stimulus by numerical optimization.
39. [Histogram: frequency of feature value for pedestrians vs. random distractors.]
40. Top stimuli from the test set. Optimal stimulus by numerical optimization.
41. [Histogram: frequency of feature value for cat faces vs. random distractors.]
43. ImageNet classification
22,000 categories; 14,000,000 images
Hand-engineered features (SIFT, HOG, LBP), spatial pyramid, sparse coding/compression
45. Best stimuli: Features 1-5
46. Best stimuli: Features 6-9
47. Best stimuli: Features 10-13
48. Random guess: 0.005% | State-of-the-art feature learning (Weston, Bengio '11): 9.5% | From raw pixels: ?
49. Random guess: 0.005% | State-of-the-art feature learning (Weston, Bengio '11): 9.5% | From raw pixels: 15.8%
ImageNet 2009 (10k categories): best published result 17% (Sanchez & Perronnin '11); our method 20%.
Using only 1,000 categories, our method: > 50%.
50. Other results
We also have great features for:
- Speech recognition
- Word-vector embeddings for NLP
51. Conclusions
- RICA learns invariant features
- Face neuron with totally unlabeled data; ImageNet with enough training and data
- State-of-the-art performances on:
  - Action recognition
  - Cancer image classification
  - ImageNet
[Thumbnails: action recognition benchmarks, cancer classification, feature visualization, face neuron.]
52. References
• Q.V. Le, M.A. Ranzato, R. Monga, M. Devin, G. Corrado, K. Chen, J. Dean, A.Y. Ng. Building high-level features using large-scale unsupervised learning. ICML, 2012.
• Q.V. Le, J. Ngiam, Z. Chen, D. Chia, P. Koh, A.Y. Ng. Tiled Convolutional Neural Networks. NIPS, 2010.
• Q.V. Le, W.Y. Zou, S.Y. Yeung, A.Y. Ng. Learning hierarchical spatio-temporal features for action recognition with independent subspace analysis. CVPR, 2011.
• Q.V. Le, J. Ngiam, A. Coates, A. Lahiri, B. Prochnow, A.Y. Ng. On optimization methods for deep learning. ICML, 2011.
• Q.V. Le, A. Karpenko, J. Ngiam, A.Y. Ng. ICA with Reconstruction Cost for Efficient Overcomplete Feature Learning. NIPS, 2011.
• Q.V. Le, J. Han, J. Gray, P. Spellman, A. Borowsky, B. Parvin. Learning Invariant Features for Tumor Signatures. ISBI, 2012.
• I.J. Goodfellow, Q.V. Le, A.M. Saxe, H. Lee, A.Y. Ng. Measuring invariances in deep networks. NIPS, 2009.
http://ai.stanford.edu/~quocle