This document provides an overview of deep learning, including what it is, why it is difficult, and problems to consider. Deep learning uses neural networks with 3 or more layers to perform pattern recognition on unlabeled and unstructured data like images and text. It is computationally intensive and requires large datasets and specialized hardware like GPUs. Some challenges include dealing with messy real-world data, scaling networks across large clusters, combining different neural network types, and tuning hyperparameters.
2. Overview
● What is Deep Learning?
● Why is it hard?
● Problems to think about
● Conclusions
3. What is Deep Learning?
Pattern recognition on unlabeled & unstructured data.
4. What is Deep Learning?
● Deep Neural Networks >= 3 Layers
● For media/unstructured data
● Automatic Feature Engineering
● Benefits From Complex Architectures
● Computationally Intensive
● Accelerates With Special Hardware
6. Deep Networks >= 3 Layers
● Backpropagation-era, old-school ANNs had exactly 3 layers (input, hidden, output)
7. Deep Networks
● Neural networks themselves can act as hidden layers
● Different types of layers can be interchanged/stacked
● Multiple layer types, each with its own hyperparameters and loss functions
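The "interchangeable, stackable layers" idea can be sketched in a few lines of NumPy. This is an illustrative toy (forward pass only, no training); the `Dense` and `ReLU` names are just conventional labels, not from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

class Dense:
    """Fully connected layer: y = xW + b."""
    def __init__(self, n_in, n_out):
        self.W = rng.normal(0, 0.1, (n_in, n_out))
        self.b = np.zeros(n_out)
    def forward(self, x):
        return x @ self.W + self.b

class ReLU:
    """Elementwise nonlinearity; interchangeable with Dense in the stack."""
    def forward(self, x):
        return np.maximum(0, x)

# "Deep" = three or more stacked layers; any layer exposing forward() plugs in.
network = [Dense(4, 8), ReLU(), Dense(8, 8), ReLU(), Dense(8, 2)]

def predict(layers, x):
    for layer in layers:
        x = layer.forward(x)
    return x

x = rng.normal(size=(5, 4))   # batch of 5 examples, 4 features each
out = predict(network, x)
print(out.shape)              # (5, 2)
```

Because every layer has the same `forward` interface, swapping in a different layer type only requires matching the input/output sizes.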
13. Other kinds
● Memory Networks
● Deep Reinforcement Learning
● Adversarial Architectures
● New recursive ConvNet variant to come in 2016?
● Over 9,000 layers? (22 is already pretty common)
17. Benefits from Complex Architectures
Google’s result combined:
● LSTMs (learning captions)
● Word Embeddings
● Convolutional features from images (aligned to be the same size as the embeddings)
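The alignment step in that last bullet is just a learned linear projection: pooled ConvNet features are mapped into the word-embedding dimension so the caption LSTM can treat the image as one more token. A minimal sketch, assuming illustrative sizes (2048 conv features, 64-dim embeddings):

```python
import numpy as np

rng = np.random.default_rng(1)

embedding_dim = 64
conv_features = rng.normal(size=(1, 2048))   # e.g. pooled ConvNet output

# Learned linear projection aligns image features with the word-embedding
# size, so the caption model can consume image and word vectors uniformly.
W_proj = rng.normal(0, 0.02, (2048, embedding_dim))
image_token = conv_features @ W_proj

print(image_token.shape)  # (1, 64)
```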
18. Computationally Intensive
● One iteration of ImageNet (a 1k-label dataset with over 1M examples) takes 7 hours on GPUs
● Project Adam
● Google Brain
20. Software Engineering Concerns
● Pipelines to deal with messy data, not canned problems... (Real life is not Kaggle, people.)
● Scale/maintenance (clusters of GPUs aren't handled well today)
● Different kinds of parallelism (model and data)
21. Model vs Data Parallelism
● Model parallelism shards the model itself across servers (HPC style)
● Data parallelism splits each mini-batch across workers
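The data-parallel case is easy to show concretely: each worker computes a gradient on its slice of the mini-batch, and the slices' gradients are averaged. A toy sketch with a linear least-squares model (the model and shard sizes are illustrative, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy linear model y = Xw with mean-squared-error loss.
w = rng.normal(size=3)
X, y = rng.normal(size=(8, 3)), rng.normal(size=8)

def gradient(w, X, y):
    # d/dw of mean((Xw - y)^2) = 2 X^T (Xw - y) / n
    return 2 * X.T @ (X @ w - y) / len(y)

# Data parallelism: each "worker" gets a slice of the mini-batch,
# computes a local gradient, and the results are averaged.
shards = [(X[:4], y[:4]), (X[4:], y[4:])]
local_grads = [gradient(w, Xs, ys) for Xs, ys in shards]
data_parallel_grad = np.mean(local_grads, axis=0)

# With equal shard sizes this matches the single-machine gradient.
assert np.allclose(data_parallel_grad, gradient(w, X, y))
```

Model parallelism, by contrast, would split `w` itself across machines, which is why it resembles classic HPC sharding more than batch splitting.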
22. Vectorizing unstructured data
● Data is stored in different databases
● Different kinds of files (raw)
● Deep learning works well on mixed signals
24. Production Stacks today
● Hadoop/Spark alone aren't enough
● GPUs aren't friendly to the average programmer
● Cluster management of GPUs as a schedulable resource isn't typically done
● Many frameworks don't work well in a distributed environment (getting better, though)
25. Problems With Neural Nets
● Loss functions
● Scaling data
● Mixing different neural nets
● Hyperparameter tuning
27. Scaling Data
● Zero mean and unit variance
● Scale to the [0, 1] range
● Other forms of preprocessing relative to the distribution of the data
● Preprocessing can also be columnwise (e.g. for categorical features)
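The first two bullets correspond to standardization and min-max scaling, both applied columnwise (per feature). A minimal NumPy sketch with made-up numbers:

```python
import numpy as np

X = np.array([[1.0, 200.0],
              [2.0, 400.0],
              [3.0, 600.0]])

# Zero mean, unit variance (computed columnwise, per feature).
standardized = (X - X.mean(axis=0)) / X.std(axis=0)

# Min-max scaling to [0, 1], also columnwise.
minmax = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

print(standardized.mean(axis=0))   # ~[0, 0]
print(minmax.min(axis=0))          # [0. 0.]
print(minmax.max(axis=0))          # [1. 1.]
```

Whatever statistics you compute here (means, mins, maxes) must be saved from the training set and reused at inference time, or the network sees differently scaled inputs.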
28. Mixing and Matching Neural Networks
● Video: ConvNet + recurrent net
● Convolutional RBMs?
● Convolutional -> Subsampling -> Fully Connected
● DBNs: different hidden and visible units for each layer
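The Convolutional -> Subsampling -> Fully Connected pipeline can be sketched with naive NumPy ops. This is a forward pass only, with illustrative sizes; real frameworks fuse and accelerate these steps.

```python
import numpy as np

rng = np.random.default_rng(3)

def conv2d_valid(img, kernel):
    """Naive 'valid' 2D convolution (cross-correlation, as in most DL libraries)."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i+kh, j:j+kw] * kernel)
    return out

def maxpool2(x):
    """2x2 max-pooling (the 'subsampling' step)."""
    h, w = x.shape
    return x[:h//2*2, :w//2*2].reshape(h//2, 2, w//2, 2).max(axis=(1, 3))

img = rng.normal(size=(8, 8))
kernel = rng.normal(size=(3, 3))
W_fc = rng.normal(size=(9, 4))    # fully connected weights: 9 pooled values -> 4 outputs

features = np.maximum(0, conv2d_valid(img, kernel))   # conv + ReLU -> 6x6
pooled = maxpool2(features)                           # subsample   -> 3x3
logits = pooled.reshape(-1) @ W_fc                    # fully connected -> 4 outputs
print(logits.shape)  # (4,)
```

Each stage just has to hand the next one a tensor of the agreed shape, which is what makes these mixes of layer types composable.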
30. Hyperparameter Tuning (2)
● Grid search for neural nets (Don’t do it!)
● Bayesian optimization (Getting better. There are at least priors here.)
● Gradient-based approaches (Your hyperparameters are a neural net, so there are neural nets optimizing your neural nets...)
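The usual cheap alternative to the grid search the slide warns against is random search: with the same trial budget it samples each hyperparameter dimension more densely than a grid does. A minimal sketch, where `validation_score` is a stand-in for actually training a network:

```python
import random

random.seed(0)

def validation_score(lr, momentum):
    # Stand-in for training a net and measuring validation accuracy;
    # peaks at lr=0.01, momentum=0.9 in this toy example.
    return -(lr - 0.01) ** 2 - (momentum - 0.9) ** 2

# Random search: sample hyperparameter combinations instead of
# enumerating a fixed grid, then keep the best-scoring one.
trials = [(random.uniform(1e-4, 1e-1), random.uniform(0.5, 0.99))
          for _ in range(50)]
best = max(trials, key=lambda hp: validation_score(*hp))
print(best)
```

Bayesian methods improve on this by using earlier trials to decide where to sample next, rather than sampling blindly.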