Quoc Le, Stanford & Google - Tera Scale Deep Learning
1. Tera-scale deep learning
Quoc V. Le
Stanford University and Google
Joint work with Kai Chen, Greg Corrado, Jeff Dean, Matthieu Devin, Rajat Monga, Andrew Ng, Marc'Aurelio Ranzato, Paul Tucker, Ke Yang
2. Machine Learning successes
Face recognition
OCR
Autonomous car
Email classification
Recommendation systems
Web page ranking
3. The role of Feature Extraction in Pattern Recognition
Feature extraction (mostly hand-crafted features) -> Classifier
4. Hand-Crafted Features
Computer vision: SIFT/HOG, SURF, …
Speech Recognition: MFCC, Spectrogram, ZCR, …
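As an illustration of a hand-crafted feature feeding a classifier, the sketch below computes the zero-crossing rate (ZCR) listed above and applies a fixed threshold to it. The function names, the threshold value, the class labels, and the toy signals are assumptions made for illustration; none of them come from the talk.

```python
def zero_crossing_rate(signal):
    """Fraction of adjacent sample pairs whose signs differ."""
    if len(signal) < 2:
        return 0.0
    crossings = sum(
        1 for a, b in zip(signal, signal[1:]) if (a >= 0) != (b >= 0)
    )
    return crossings / (len(signal) - 1)


def classify_signal(signal, threshold=0.5):
    """Hypothetical rule: a high ZCR is labeled 'noisy', otherwise 'tonal'."""
    return "noisy" if zero_crossing_rate(signal) > threshold else "tonal"


# Toy signals (assumed): one slow oscillation and one that flips every sample.
slow_wave = [1, 2, 3, 2, 1, -1, -2, -3, -2, -1]
fast_wave = [1, -1, 1, -1, 1, -1, 1, -1, 1, -1]

slow_label = classify_signal(slow_wave)
fast_label = classify_signal(fast_wave)
```

The feature is designed by hand and the classifier only sees the resulting number, which is the division of labor the slides describe.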
5. New feature-designing paradigm
Unsupervised Feature Learning / Deep Learning
Shows promise for small datasets
Expensive and typically applied to small problems
7. Brain Simulation
Watching 10 million YouTube video frames
Train on 2000 machines (16000 cores) for 1 week
1.15 billion parameters
- 100x larger than previously reported
- Small compared to visual cortex
(Diagram: Image feeding a stack of three Autoencoder layers)
Le, et al., Building high-level features using large-scale unsupervised learning. ICML 2012
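To make the autoencoder idea concrete, here is a minimal sketch of a single-hidden-unit linear autoencoder with tied weights, trained by plain gradient descent to reconstruct unlabeled 2-D points. It is a toy under stated assumptions (tied weights, linear units, full-batch updates, made-up data) and is not the architecture or training procedure used in the work above; all function names are hypothetical.

```python
def reconstruct(w, x):
    """Encode x to one hidden unit h = w.x, then decode back as h * w."""
    h = sum(wi * xi for wi, xi in zip(w, x))
    return h, [h * wi for wi in w]


def mean_loss(w, data):
    """Mean squared reconstruction error over the dataset."""
    total = 0.0
    for x in data:
        _, x_hat = reconstruct(w, x)
        total += sum((a - b) ** 2 for a, b in zip(x_hat, x))
    return total / len(data)


def train(w, data, lr=0.02, steps=300):
    """Full-batch gradient descent on the reconstruction error."""
    w = list(w)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for x in data:
            h, x_hat = reconstruct(w, x)
            r = [a - b for a, b in zip(x_hat, x)]       # residual x_hat - x
            rw = sum(ri * wi for ri, wi in zip(r, w))
            for i in range(len(w)):
                # Gradient of ||h*w - x||^2 with h = w.x (tied weights).
                grad[i] += 2.0 * (h * r[i] + rw * x[i])
        w = [wi - lr * g / len(data) for wi, g in zip(w, grad)]
    return w


# Toy unlabeled data (assumed) lying along the direction (0.6, 0.8).
data = [[0.6 * t, 0.8 * t] for t in (-2.0, -1.0, 1.0, 2.0)]
w0 = [0.5, 0.5]
w = train(w0, data)
loss_before = mean_loss(w0, data)
loss_after = mean_loss(w, data)
```

No labels are used anywhere: the only training signal is how well the input is reproduced, and the learned weight vector ends up aligned with the direction along which the data varies.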
8. Key results
Face detector
Human body detector
Cat detector
Totally unsupervised!
~85% correct in classifying face vs no face
Le, et al., Building high-level features using large-scale unsupervised learning. ICML 2012
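One plausible way to turn a single learned unit into a face vs no-face classifier is to threshold its activation and report the accuracy at the best threshold. The slide does not spell out the evaluation protocol, so treat the sketch below as an assumption; the function name, activations, and labels are made up for illustration.

```python
def best_threshold_accuracy(activations, labels):
    """Try each observed activation as a threshold; return (threshold, accuracy)."""
    best = (None, 0.0)
    for t in sorted(set(activations)):
        # Predict "face" (label 1) when the activation reaches the threshold.
        correct = sum(
            1 for a, y in zip(activations, labels) if (a >= t) == bool(y)
        )
        acc = correct / len(labels)
        if acc > best[1]:
            best = (t, acc)
    return best


# Hypothetical unit activations and ground-truth labels (1 = face, 0 = no face).
activations = [0.1, 0.2, 0.8, 0.9, 0.4, 0.6]
labels = [0, 0, 1, 1, 1, 0]
threshold, accuracy = best_threshold_accuracy(activations, labels)
```

Labels are used here only to score the unit after training, which is consistent with the features themselves being learned without supervision.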
9. ImageNet classification
Random guess: 0.005%
State-of-the-art (Weston, Bengio '11): 9.5%
Feature learning from raw pixels: 15.8%
ImageNet 2009 (10k categories): Best published result: 17% (Sanchez & Perronnin '11), Our method: 20%
Using only 1000 categories, our method > 50%
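The figures above can be put in perspective with a little arithmetic. The sketch below assumes uniform random guessing, so chance accuracy is 1 divided by the number of categories; the category count implied by 0.005% is an inference and is not stated on the slide. Function names are hypothetical.

```python
def chance_accuracy_percent(num_classes):
    """Accuracy (in percent) of guessing uniformly among num_classes."""
    return 100.0 / num_classes


def relative_gain(new, old):
    """Relative improvement of new over old, as a fraction of old."""
    return (new - old) / old


# Category count implied by a 0.005% random-guess accuracy (inferred, not stated).
implied_classes = round(100.0 / 0.005)

# Relative improvement of 15.8% over 9.5%, and of 20% over 17%.
gain_full = relative_gain(15.8, 9.5)
gain_10k = relative_gain(20.0, 17.0)

# Chance level for the 10k-category setting, in percent.
chance_10k = chance_accuracy_percent(10_000)
```

Under these assumptions the first comparison is roughly a 66% relative improvement and the 10k-category one roughly 18%.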
10. Scaling up Deep Learning
                    Prior art                   Our work
# Examples          100,000                     10,000,000
# Dimensions        1,000                       10,000
# Parameters        10,000,000                  1,000,000,000
Data set size       Gbytes                      Tbytes
Learned features    Edge filters from images    High-level features (face, cat detectors)
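A quick calculation shows the scale factors in the comparison above and why a model of this size is awkward to hold on one machine. The 4-bytes-per-parameter figure assumes 32-bit floats, which the slides do not specify; variable names are illustrative.

```python
prior_art = {"examples": 100_000, "dimensions": 1_000, "parameters": 10_000_000}
this_work = {"examples": 10_000_000, "dimensions": 10_000, "parameters": 1_000_000_000}

# Scale-up factor for each quantity in the comparison.
scale = {key: this_work[key] // prior_art[key] for key in prior_art}

# Assumption: each parameter is stored as a 32-bit float (4 bytes).
BYTES_PER_PARAM = 4
param_gigabytes = this_work["parameters"] * BYTES_PER_PARAM / 1e9
```

Under the float32 assumption the parameters alone take about 4 GB, before counting gradients or activations, which helps motivate splitting the model across machines.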
11. Summary of Scaling up
- Local connectivity (Model Parallelism)
- Asynchronous SGDs (Clever optimization / Data parallelism)
- RPCs
- Prefetching
- Single
- Removing slow machines
- Lots of optimization
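The asynchronous SGD and data-parallelism items can be sketched as workers that each own a data shard, compute gradients against a possibly stale copy of the parameters, and push them to a shared parameter server without waiting for each other. The code below is a deterministic single-process simulation in which round-robin scheduling stands in for real concurrency; the class names, learning rate, and toy regression problem are all assumptions and do not describe the actual system.

```python
class ParameterServer:
    """Holds the shared parameter and applies gradients as they arrive."""

    def __init__(self, w=0.0, lr=0.02):
        self.w = w
        self.lr = lr

    def pull(self):
        return self.w

    def push(self, grad):
        self.w -= self.lr * grad


class Worker:
    """Owns one data shard and a possibly stale local copy of the parameter."""

    def __init__(self, shard, server):
        self.shard = shard
        self.server = server
        self.local_w = server.pull()
        self.cursor = 0

    def step(self):
        x, y = self.shard[self.cursor % len(self.shard)]
        self.cursor += 1
        grad = 2.0 * (self.local_w * x - y) * x   # d/dw of (w*x - y)^2
        self.server.push(grad)                    # no barrier with other workers
        self.local_w = self.server.pull()         # refresh only after pushing


TRUE_W = 3.0  # toy target: y = 3 * x
xs = [0.5, 0.75, 1.0, 1.25, 1.5, 0.6, 0.9, 1.2]
shards = [[(x, TRUE_W * x) for x in xs[i::4]] for i in range(4)]

server = ParameterServer()
workers = [Worker(shard, server) for shard in shards]
for _ in range(400):
    for worker in workers:   # round-robin stands in for real concurrency
        worker.step()
learned_w = server.pull()
```

Each worker's gradient is computed from parameters that are three updates old by the time it is applied, yet with a small enough learning rate the shared parameter still converges to the target.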
14. Conclusions
• Scale deep learning 100x larger using distributed training on 1000 machines
• Brain simulation -> Cat neuron
• State-of-the-art performances on
  – Object recognition (ImageNet)
  – Action Recognition
  – Cancer image classification
• Other applications
  – Speech recognition
  – Machine Translation
(Figures: ImageNet accuracy chart with random guess 0.005%, best published result 9.5%, our method 15.8%; model parallelism and data parallelism with a parameter server; cat neuron and face neuron visualizations)
15. References
• Q.V. Le, M.A. Ranzato, R. Monga, M. Devin, G. Corrado, K. Chen, J. Dean, A.Y. Ng. Building high-level features using large-scale unsupervised learning. ICML, 2012.
• Q.V. Le, J. Ngiam, Z. Chen, D. Chia, P. Koh, A.Y. Ng. Tiled Convolutional Neural Networks. NIPS, 2010.
• Q.V. Le, W.Y. Zou, S.Y. Yeung, A.Y. Ng. Learning hierarchical spatio-temporal features for action recognition with independent subspace analysis. CVPR, 2011.
• Q.V. Le, J. Ngiam, A. Coates, A. Lahiri, B. Prochnow, A.Y. Ng. On optimization methods for deep learning. ICML, 2011.
• Q.V. Le, A. Karpenko, J. Ngiam, A.Y. Ng. ICA with Reconstruction Cost for Efficient Overcomplete Feature Learning. NIPS, 2011.
• Q.V. Le, J. Han, J. Gray, P. Spellman, A. Borowsky, B. Parvin. Learning Invariant Features for Tumor Signatures. ISBI, 2012.
• I.J. Goodfellow, Q.V. Le, A.M. Saxe, H. Lee, A.Y. Ng. Measuring invariances in deep networks. NIPS, 2009.
http://ai.stanford.edu/~quocle