2. Joint work with Kai Chen, Greg Corrado, Jeff Dean, Matthieu Devin, Rajat Monga, Andrew Ng, Marc'Aurelio Ranzato, Paul Tucker, Ke Yang
Additional Thanks: Samy Bengio, Zhenghao Chen, Tom Dean, Pangwei Koh, Mark Mao, Jiquan Ngiam, Patrick Nguyen, Andrew Saxe, Mark Segal, Jon Shlens, Vincent Vanhoucke, Xiaoyun Wu, Peng Xe, Serena Yeung, Will Zou
3. Machine Learning successes: Face recognition, OCR, Autonomous cars, Email classification, Recommendation systems, Web page ranking
8. Outline
- Reconstruction ICA
- Applications to videos, cancer images
- Ideas for scaling up
- Scaling up: results
12. Invariance explained
Two features detect the same edge at two locations:
Image1: F1 = 1, F2 = 0 (edge at Loc1). Image2: F1 = 0, F2 = 1 (edge at Loc2).
Pooled feature of F1 and F2: sqrt(1^2 + 0^2) = 1 and sqrt(0^2 + 1^2) = 1.
Same value regardless of the location of the edge.
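The pooling arithmetic on this slide can be checked directly; a minimal NumPy sketch (the function name is illustrative):

```python
import numpy as np

def l2_pool(activations):
    """Square-root (L2) pooling over a group of feature activations."""
    a = np.asarray(activations, dtype=float)
    return float(np.sqrt(np.sum(a ** 2)))

# Image1: edge at Loc1 activates F1 only; Image2: edge at Loc2 activates F2 only.
image1 = [1.0, 0.0]   # (F1, F2)
image2 = [0.0, 1.0]

print(l2_pool(image1))  # 1.0
print(l2_pool(image2))  # 1.0 -- same pooled value, different edge location
```

The pooled unit responds identically to either input, which is exactly the translation invariance the slide describes.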
13. Reconstruction ICA: Equivalence between Sparse Coding, Autoencoders, RBMs and ICA
Build a deep architecture by treating the output of one layer as input to another layer.
Le, et al., ICA with Reconstruction Cost for Efficient Overcomplete Feature Learning. NIPS 2011
14. Reconstruction ICA
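The RICA objective from the NIPS 2011 paper replaces ICA's hard orthonormality constraint with a soft reconstruction penalty; a sketch of that cost (the smoothed L1 and the default λ are illustrative choices):

```python
import numpy as np

def rica_cost(W, X, lam=0.1, eps=1e-8):
    """RICA objective: L1 sparsity of the activations W x plus the
    reconstruction penalty ||W^T W x - x||^2, which stands in for
    plain ICA's orthonormality constraint on W.

    W: (k, n) filter matrix; X: (n, m) data, one example per column.
    """
    m = X.shape[1]
    WX = W @ X                                  # feature activations
    recon = W.T @ WX - X                        # reconstruction residual
    sparsity = np.sum(np.sqrt(WX ** 2 + eps))   # smooth L1 penalty
    return (lam * sparsity + 0.5 * np.sum(recon ** 2)) / m
```

Because the constraint is now a penalty, W may be overcomplete (k > n) and the cost can be minimized with ordinary unconstrained optimizers; for an orthonormal square W the reconstruction term vanishes and only the sparsity term remains.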
15. Reconstruction ICA: Data whitening
16. Reconstruction ICA: Data whitening
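Whitening decorrelates the input dimensions and equalizes their variances before feature learning; a minimal ZCA sketch (the eps regularizer value is an illustrative choice):

```python
import numpy as np

def zca_whiten(X, eps=1e-5):
    """ZCA-whiten data so its empirical covariance is (approximately)
    the identity -- a standard preprocessing step for ICA-style methods.

    X: (n, m) matrix with m examples in columns.
    """
    Xc = X - X.mean(axis=1, keepdims=True)   # center each dimension
    cov = Xc @ Xc.T / Xc.shape[1]            # (n, n) empirical covariance
    d, E = np.linalg.eigh(cov)               # eigendecomposition of cov
    W_zca = E @ np.diag(1.0 / np.sqrt(d + eps)) @ E.T
    return W_zca @ Xc
```

ZCA (rather than plain PCA) whitening keeps the result in the original pixel coordinates, which is why whitened image patches still look like images.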
17. Why RICA?
Algorithms compared on speed, ease of training, and invariant features: Sparse Coding, RBMs/Autoencoders, TICA, Reconstruction ICA.
18. Summary of RICA
- Two-layered network
- Reconstruction cost instead of orthogonality constraints
- Learns invariant features
20. Action recognition
Sit up, Drive Car, Get Out of Car, Eat, Answer phone, Kiss, Run, Stand up, Shake hands
Le, et al., Learning hierarchical spatio-temporal features for action recognition with independent subspace analysis. CVPR 2011
22. [Bar charts: accuracy on the KTH, Hollywood2, UCF, and YouTube action-recognition benchmarks. Learned features match or beat engineered features such as Hessian/SURF, HOG, HOF, HOG/HOF, HOG3D, pLSA, GRBMs, 3DCNN, and HMAX on all four datasets.]
23. Cancer classification
[Bar chart: classification accuracy (84%-92%) for Apoptotic, Viable tumor region, and Necrosis classes; RICA features outperform hand-engineered features.]
Le, et al., Learning Invariant Features for Tumor Signatures. ISBI 2012
26. It's better to have more features!
No matter the algorithm, more features are always more successful.
Coates, et al., An Analysis of Single-Layer Networks in Unsupervised Feature Learning. AISTATS 2011
30. Asynchronous Parallel SGDs: Parameter server
Le, et al., Building high-level features using large-scale unsupervised learning. ICML 2012
31. Asynchronous Parallel SGDs: Parameter server
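A toy Downpour-style sketch of the idea on these slides: workers fetch the current parameters from a shared server and push gradients back asynchronously, with no global synchronization barrier (class and function names and the least-squares task are illustrative; the real system shards parameters across many machines):

```python
import threading
import numpy as np

class ParameterServer:
    """Toy parameter server: workers fetch parameters and push
    gradients asynchronously."""
    def __init__(self, dim):
        self.w = np.zeros(dim)
        self.lock = threading.Lock()  # per-update lock; real servers shard instead

    def fetch(self):
        with self.lock:
            return self.w.copy()

    def push(self, grad, lr=0.1):
        with self.lock:
            self.w -= lr * grad       # apply the (possibly stale) gradient

def worker(ps, X, y, steps=200):
    # Each worker runs SGD on its own data shard for a least-squares model.
    for i in range(steps):
        w = ps.fetch()                       # stale-but-recent parameters
        j = i % len(y)
        grad = (X[j] @ w - y[j]) * X[j]      # gradient of 0.5*(x.w - y)^2
        ps.push(grad)                        # asynchronous update

# Fit y = 2*x with two asynchronous workers on disjoint shards.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 1))
y = 2.0 * X[:, 0]
ps = ParameterServer(dim=1)
threads = [threading.Thread(target=worker, args=(ps, X[s], y[s]))
           for s in (slice(0, 50), slice(50, 100))]
for t in threads: t.start()
for t in threads: t.join()
print(ps.w)  # close to [2.0]
```

The point of the asynchrony is throughput: no worker ever waits for the others, and gradients computed on slightly stale parameters still drive the model toward the solution.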
32. Summary of Scaling up
- Local connectivity
- Asynchronous SGDs
… and more:
- RPC vs MapReduce
- Prefetching
- Single vs double precision
- Removing slow machines
- Optimized Softmax
- …
34. Training RICA
Dataset: 10 million 200x200 unlabeled images from YouTube/Web
Train on 2,000 machines (16,000 cores) for 1 week
RICA: 1.15 billion parameters
- 100x larger than previously reported
- Small compared to the visual cortex
35. The face neuron
Top stimuli from the test set. Optimal stimulus by numerical optimization.
36. [Histogram: frequency of feature value for faces vs. random distractors.]
37. Invariance properties
[Plots: feature response under horizontal shifts (0-20 pixels), vertical shifts (0-20 pixels), 3D rotation angle (0-90 degrees), and scale factor (0.4x-1.6x).]
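The invariance plots summarized above can be reproduced in miniature: sweep one transformation and record the feature's response at each step (function names and the toy global-average feature are illustrative):

```python
import numpy as np

def response_under_shifts(feature_fn, image, max_shift=20):
    """Record a feature's response as the input is shifted horizontally,
    mirroring the invariance plots (response vs. pixels of shift)."""
    return [feature_fn(np.roll(image, dx, axis=1))
            for dx in range(max_shift + 1)]

# A pooled (here: global-average) feature is perfectly shift invariant.
img = np.arange(16.0).reshape(4, 4)
curve = response_under_shifts(lambda x: float(x.mean()), img, max_shift=3)
print(curve)  # flat curve: identical response at every shift
```

A flat curve over the sweep is the signature of invariance; a sharply peaked curve would indicate a feature tuned to one specific location.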
38. Top stimuli from the test set. Optimal stimulus by numerical optimization.
39. [Histogram: frequency of feature value for pedestrians vs. random distractors.]
40. Top stimuli from the test set. Optimal stimulus by numerical optimization.
41. [Histogram: frequency of feature value for cat faces vs. random distractors.]
43. ImageNet classification
22,000 categories; 14,000,000 images
Hand-engineered features (SIFT, HOG, LBP), spatial pyramid, sparse coding/compression
45. Best stimuli: Features 1-5
46. Best stimuli: Features 6-9
47. Best stimuli: Features 10-13
48. Random guess: 0.005% | State-of-the-art feature learning (Weston, Bengio '11): 9.5% | From raw pixels: ?
49. Random guess: 0.005% | State-of-the-art feature learning (Weston, Bengio '11): 9.5% | From raw pixels: 15.8%
ImageNet 2009 (10k categories): best published result 17% (Sanchez & Perronnin '11); our method 20%.
Using only 1,000 categories, our method: > 50%.
50. Other results
We also have great features for:
- Speech recognition
- Word-vector embeddings for NLP
51. Conclusions
- RICA learns invariant features
- Face neuron with totally unlabeled data; ImageNet with enough training and data
- State-of-the-art performances on:
  - Action recognition
  - Cancer image classification
  - ImageNet
[Thumbnails: action recognition benchmarks, cancer classification, feature visualization, face neuron.]
52. References
• Q.V. Le, M.A. Ranzato, R. Monga, M. Devin, G. Corrado, K. Chen, J. Dean, A.Y. Ng. Building high-level features using large-scale unsupervised learning. ICML, 2012.
• Q.V. Le, J. Ngiam, Z. Chen, D. Chia, P. Koh, A.Y. Ng. Tiled Convolutional Neural Networks. NIPS, 2010.
• Q.V. Le, W.Y. Zou, S.Y. Yeung, A.Y. Ng. Learning hierarchical spatio-temporal features for action recognition with independent subspace analysis. CVPR, 2011.
• Q.V. Le, J. Ngiam, A. Coates, A. Lahiri, B. Prochnow, A.Y. Ng. On optimization methods for deep learning. ICML, 2011.
• Q.V. Le, A. Karpenko, J. Ngiam, A.Y. Ng. ICA with Reconstruction Cost for Efficient Overcomplete Feature Learning. NIPS, 2011.
• Q.V. Le, J. Han, J. Gray, P. Spellman, A. Borowsky, B. Parvin. Learning Invariant Features for Tumor Signatures. ISBI, 2012.
• I.J. Goodfellow, Q.V. Le, A.M. Saxe, H. Lee, A.Y. Ng. Measuring invariances in deep networks. NIPS, 2009.
http://ai.stanford.edu/~quocle