Quoc Le, Stanford & Google - Tera Scale Deep Learning

1. Tera-scale deep learning. Quoc V. Le, Stanford University and Google. Joint work with Kai Chen, Greg Corrado, Jeff Dean, Matthieu Devin, Rajat Monga, Andrew Ng, Marc'Aurelio Ranzato, Paul Tucker, Ke Yang

2. Machine Learning successes: Face recognition, OCR, Autonomous car, Email classification, Recommendation systems, Web page ranking

3. The role of Feature Extraction in Pattern Recognition: Classifier, Feature extraction (mostly hand-crafted features)

4. Hand-Crafted Features. Computer vision: SIFT/HOG, SURF, … Speech recognition: MFCC, Spectrogram, ZCR, …

5. New feature-designing paradigm: Unsupervised Feature Learning / Deep Learning. Shows promise on small datasets. Expensive, and typically applied to small problems

6. The Trend of Big Data

7. Brain Simulation
Watching 10 million YouTube video frames
Train on 2000 machines (16000 cores) for 1 week
1.15 billion parameters
- 100x larger than previously reported
- Small compared to visual cortex
[Diagram labels: Image, Autoencoder, Autoencoder, Autoencoder]
Le, et al., Building high-level features using large-scale unsupervised learning. ICML 2012
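To make the term "autoencoder" on this slide concrete, here is a minimal sketch of a single tiny autoencoder trained by SGD on squared reconstruction error. It is an illustration only: the tied weights, sigmoid hidden units, layer sizes, learning rate, and all function names are assumptions, and it is not the billion-parameter architecture described in the cited paper.

```python
import math
import random

def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))

def encode(W, b, x):
    """Hidden code h = sigmoid(W x + b)."""
    return [sigmoid(sum(w * xj for w, xj in zip(row, x)) + bi)
            for row, bi in zip(W, b)]

def decode(W, c, h):
    """Linear reconstruction x_hat = W^T h + c (tied weights: an assumption)."""
    return [sum(W[i][j] * h[i] for i in range(len(W))) + c[j]
            for j in range(len(c))]

def train_step(W, b, c, x, lr):
    """One SGD step on sum((x_hat - x)^2); returns the error before the update."""
    h = encode(W, b, x)
    x_hat = decode(W, c, h)
    err = [xh - xi for xh, xi in zip(x_hat, x)]
    # Backpropagate through the decoder into the hidden pre-activations.
    delta = [sum(err[j] * W[i][j] for j in range(len(x))) * h[i] * (1.0 - h[i])
             for i in range(len(W))]
    for i in range(len(W)):
        for j in range(len(x)):
            # Tied weights receive a decoder term and an encoder term.
            W[i][j] -= lr * 2.0 * (err[j] * h[i] + delta[i] * x[j])
        b[i] -= lr * 2.0 * delta[i]
    for j in range(len(x)):
        c[j] -= lr * 2.0 * err[j]
    return sum(e * e for e in err)

rng = random.Random(0)
n_in, n_hidden = 4, 2  # toy sizes, chosen only for illustration
W = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_hidden)]
b = [0.0] * n_hidden
c = [0.0] * n_in
x = [0.9, 0.1, 0.8, 0.2]  # one made-up input vector
errors = [train_step(W, b, c, x, lr=0.05) for _ in range(100)]
print(errors[0], errors[-1])
```

The reconstruction error shrinks as training proceeds. No labels are used anywhere, which is the sense in which the features are learned without supervision.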

8. Key results: Face detector, Human body detector, Cat detector. Totally unsupervised! ~85% correct in classifying face vs no face. Le, et al., Building high-level features using large-scale unsupervised learning. ICML 2012

9. ImageNet classification
Random guess: 0.005%
State-of-the-art (Weston, Bengio '11): 9.5%
Feature learning from raw pixels: 15.8%
ImageNet 2009 (10k categories): Best published result: 17% (Sanchez & Perronnin '11), Our method: 20%
Using only 1000 categories, our method > 50%
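The figures on this slide can be put in perspective with a little arithmetic. The sketch below derives two numbers that are not stated on the slide and are therefore inferences: the number of equally likely categories implied by a 0.005% random-guess rate, and the relative improvement of 15.8% over 9.5%. The function names are hypothetical.

```python
def implied_categories(random_guess_pct):
    """Number of equally likely classes implied by a random-guess accuracy (in percent)."""
    return round(100.0 / random_guess_pct)

def relative_improvement(new_pct, old_pct):
    """Fractional gain of new_pct over old_pct."""
    return (new_pct - old_pct) / old_pct

n_categories = implied_categories(0.005)
gain_full = relative_improvement(15.8, 9.5)   # full ImageNet figures from the slide
gain_10k = relative_improvement(20.0, 17.0)   # ImageNet 2009 (10k categories) figures
print(n_categories, round(gain_full, 3), round(gain_10k, 3))
```

Under these assumptions the random-guess rate corresponds to roughly 20,000 categories, and the feature-learning result is about two thirds better, in relative terms, than the 9.5% baseline.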

10. Scaling up Deep Learning
                                Prior art       Our work
# Examples                      100,000         10,000,000
# Dimensions                    1,000           10,000
# Parameters                    10,000,000      1,000,000,000
Data set size                   Gbytes          Tbytes
Learned features from images    Edge filters    High-level features (face, cat detectors)

11. Summary of Scaling up
- Local connectivity (Model Parallelism)
- Asynchronous SGDs (Clever optimization / Data parallelism)
- RPCs
- Prefetching
- Single
- Removing slow machines
- Lots of optimization

12. Locally connected networks
[Diagram labels: Image, Features, Machine #1, Machine #2, Machine #3, Machine #4]
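The idea behind this slide can be sketched in a few lines: when each feature looks at only one region of the image, the weights for that region can live on a single machine, and the machines compute their features independently. The example below is a simplified single-process illustration, not the authors' system. The 1-D "image", the non-overlapping regions, the sizes, and all function names are assumptions.

```python
import random

def make_machine(n_features, patch_size, rng):
    """Weights held by one machine: each feature sees only this machine's patch."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(patch_size)]
            for _ in range(n_features)]

def split_image(image, n_machines):
    """Cut the image into equal, non-overlapping contiguous patches."""
    size = len(image) // n_machines
    return [image[k * size:(k + 1) * size] for k in range(n_machines)]

def local_forward(weights, patch):
    """Features computed by one machine from its own patch only."""
    return [sum(w * p for w, p in zip(row, patch)) for row in weights]

def locally_connected_forward(machines, image):
    """Concatenate the features produced independently by every machine."""
    features = []
    for weights, patch in zip(machines, split_image(image, len(machines))):
        features.extend(local_forward(weights, patch))
    return features

def parameter_count(machines):
    return sum(len(row) for weights in machines for row in weights)

rng = random.Random(0)
machines = [make_machine(3, 4, rng) for _ in range(4)]  # 4 machines, 3 features each
image = [rng.random() for _ in range(16)]               # toy 16-pixel image
features = locally_connected_forward(machines, image)
dense_parameters = len(features) * len(image)           # cost of a fully connected layer
print(len(features), parameter_count(machines), dense_parameters)
```

With four machines the locally connected layer stores a quarter of the weights a fully connected layer of the same output size would need, and no machine has to hold the whole weight matrix.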

13. Asynchronous Parallel SGDs (Alex Smola's talk)
[Diagram label: Parameter server]
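A parameter server with asynchronous workers can be sketched as follows. Each worker keeps a possibly stale copy of the parameters, computes a gradient on its own data shard, pushes the gradient to the server, and then fetches fresh parameters. This is a deterministic single-process simulation under stated assumptions (a one-parameter linear model, round-robin scheduling, made-up data, hypothetical class names); a real deployment would run workers on separate machines and talk to the server over RPCs.

```python
class ParameterServer:
    """Holds the shared parameter and applies gradients as they arrive."""
    def __init__(self, w, lr):
        self.w = w
        self.lr = lr

    def fetch(self):
        return self.w

    def push(self, grad):
        self.w -= self.lr * grad

class Worker:
    """Computes gradients on its own shard using a possibly stale parameter copy."""
    def __init__(self, shard, server):
        self.shard = shard
        self.server = server
        self.local_w = server.fetch()
        self.pos = 0

    def step(self):
        x, y = self.shard[self.pos % len(self.shard)]
        self.pos += 1
        # Gradient of (w * x - y)^2, evaluated at the stale local copy.
        grad = 2.0 * (self.local_w * x - y) * x
        self.server.push(grad)
        self.local_w = self.server.fetch()

def run_async_sgd(n_workers=4, rounds=100, lr=0.1):
    # Toy data from y = 3 * x; the target slope 3.0 is an arbitrary choice.
    data = [(k / 10.0, 3.0 * k / 10.0) for k in range(1, 9)]
    server = ParameterServer(w=0.0, lr=lr)
    workers = [Worker(data[i::n_workers], server) for i in range(n_workers)]
    for _ in range(rounds):
        for worker in workers:  # round-robin: each copy lags by n_workers - 1 updates
            worker.step()
    return server.w

learned_w = run_async_sgd()
print(learned_w)
```

Even though every gradient is computed from parameters that are a few updates old, the shared parameter still converges to the target slope in this toy setting, because the learning rate is small relative to the staleness.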

14. Conclusions
•  Scale deep learning 100x larger using distributed training on 1000 machines
•  Brain simulation -> Cat neuron
•  State-of-the-art performances on
   –  Object recognition (ImageNet)
   –  Action recognition
   –  Cancer image classification
•  Other applications
   –  Speech recognition
   –  Machine translation
[Figure panels: ImageNet (random guess 0.005%, best published result 9.5%, our method 15.8%); Model parallelism; Data parallelism with parameter server; Cat neuron; Face neuron]

15. References
•  Q.V. Le, M.A. Ranzato, R. Monga, M. Devin, G. Corrado, K. Chen, J. Dean, A.Y. Ng. Building high-level features using large-scale unsupervised learning. ICML, 2012.
•  Q.V. Le, J. Ngiam, Z. Chen, D. Chia, P. Koh, A.Y. Ng. Tiled Convolutional Neural Networks. NIPS, 2010.
•  Q.V. Le, W.Y. Zou, S.Y. Yeung, A.Y. Ng. Learning hierarchical spatio-temporal features for action recognition with independent subspace analysis. CVPR, 2011.
•  Q.V. Le, J. Ngiam, A. Coates, A. Lahiri, B. Prochnow, A.Y. Ng. On optimization methods for deep learning. ICML, 2011.
•  Q.V. Le, A. Karpenko, J. Ngiam, A.Y. Ng. ICA with Reconstruction Cost for Efficient Overcomplete Feature Learning. NIPS, 2011.
•  Q.V. Le, J. Han, J. Gray, P. Spellman, A. Borowsky, B. Parvin. Learning Invariant Features for Tumor Signatures. ISBI, 2012.
•  I.J. Goodfellow, Q.V. Le, A.M. Saxe, H. Lee, A.Y. Ng. Measuring invariances in deep networks. NIPS, 2009.
http://ai.stanford.edu/~quocle
