This document provides an overview of deep learning, including what it is, why it is difficult, and problems to consider. Deep learning uses neural networks with 3 or more layers to perform pattern recognition on unlabeled and unstructured data like images and text. It is computationally intensive and requires large datasets and specialized hardware like GPUs. Some challenges include dealing with messy real-world data, scaling networks across large clusters, combining different neural network types, and tuning hyperparameters.
2. Overview
● What is Deep Learning?
● Why is it hard?
● Problems to think about
● Conclusions
3. What is Deep Learning?
Pattern recognition on unlabeled & unstructured data.
4. What is Deep Learning?
● Deep Neural Networks >= 3 Layers
● For media/unstructured data
● Automatic Feature Engineering
● Benefits From Complex Architectures
● Computationally Intensive
● Accelerates With Special Hardware
6. Deep Networks >= 3 Layers
● Backpropagation-era, old-school ANNs had exactly 3 layers (input, hidden, output)
7. Deep Networks
● Neural networks themselves can act as hidden layers
● Different types of layers can be interchanged/stacked
● Multiple layer types, each with its own hyperparameters and loss functions
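The "interchangeable, stackable layers" idea can be sketched in a few lines of NumPy. This is an illustrative toy (forward pass only, no training); the `Dense` and `ReLU` names are just conventional labels, not from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

class Dense:
    """Fully connected layer: y = xW + b."""
    def __init__(self, n_in, n_out):
        self.W = rng.normal(0, 0.1, (n_in, n_out))
        self.b = np.zeros(n_out)
    def forward(self, x):
        return x @ self.W + self.b

class ReLU:
    """Elementwise nonlinearity; interchangeable with Dense in the stack."""
    def forward(self, x):
        return np.maximum(0, x)

# "Deep" = three or more stacked layers; any layer exposing forward() plugs in.
network = [Dense(4, 8), ReLU(), Dense(8, 8), ReLU(), Dense(8, 2)]

def predict(layers, x):
    for layer in layers:
        x = layer.forward(x)
    return x

x = rng.normal(size=(5, 4))   # batch of 5 examples, 4 features each
out = predict(network, x)
print(out.shape)              # (5, 2)
```

Because every layer has the same `forward` interface, swapping in a different layer type only requires matching the input/output sizes.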
13. Other kinds
● Memory Networks
● Deep Reinforcement Learning
● Adversarial Architectures
● New recursive ConvNet variant to come in 2016?
● Over 9,000 layers? (22 is already pretty common)
17. Benefits from Complex Architectures
Google’s result combined:
● LSTMs (learning captions)
● Word Embeddings
● Convolutional features from images (aligned to be the same size as the embeddings)
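The alignment step in that last bullet is just a learned linear projection: pooled ConvNet features are mapped into the word-embedding dimension so the caption LSTM can treat the image as one more token. A minimal sketch, assuming illustrative sizes (2048 conv features, 64-dim embeddings):

```python
import numpy as np

rng = np.random.default_rng(1)

embedding_dim = 64
conv_features = rng.normal(size=(1, 2048))   # e.g. pooled ConvNet output

# Learned linear projection aligns image features with the word-embedding
# size, so the caption model can consume image and word vectors uniformly.
W_proj = rng.normal(0, 0.02, (2048, embedding_dim))
image_token = conv_features @ W_proj

print(image_token.shape)  # (1, 64)
```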
18. Computationally Intensive
● One iteration of ImageNet (a 1k-label dataset with over 1M examples) takes 7 hours on GPUs
● Project Adam
● Google Brain
20. Software Engineering Concerns
● Pipelines to deal with messy data, not canned problems... (Real life is not Kaggle, people.)
● Scale/maintenance (clusters of GPUs aren't handled well today)
● Different kinds of parallelism (model and data)
21. Model vs Data Parallelism
● Model parallelism shards the model itself across servers (HPC style)
● Data parallelism splits each mini-batch across workers
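The data-parallel case is easy to show concretely: each worker computes a gradient on its slice of the mini-batch, and the slices' gradients are averaged. A toy sketch with a linear least-squares model (the model and shard sizes are illustrative, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy linear model y = Xw with mean-squared-error loss.
w = rng.normal(size=3)
X, y = rng.normal(size=(8, 3)), rng.normal(size=8)

def gradient(w, X, y):
    # d/dw of mean((Xw - y)^2) = 2 X^T (Xw - y) / n
    return 2 * X.T @ (X @ w - y) / len(y)

# Data parallelism: each "worker" gets a slice of the mini-batch,
# computes a local gradient, and the results are averaged.
shards = [(X[:4], y[:4]), (X[4:], y[4:])]
local_grads = [gradient(w, Xs, ys) for Xs, ys in shards]
data_parallel_grad = np.mean(local_grads, axis=0)

# With equal shard sizes this matches the single-machine gradient.
assert np.allclose(data_parallel_grad, gradient(w, X, y))
```

Model parallelism, by contrast, would split `w` itself across machines, which is why it resembles classic HPC sharding more than batch splitting.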
22. Vectorizing unstructured data
● Data is stored in different databases
● Different kinds of files (raw)
● Deep learning works well on mixed signals
24. Production Stacks today
● Hadoop/Spark alone aren't enough
● GPUs aren't friendly to the average programmer
● Cluster management of GPUs as a schedulable resource isn't typically done
● Many frameworks don't work well in a distributed environment (getting better, though)
25. Problems With Neural Nets
● Loss functions
● Scaling data
● Mixing different neural nets
● Hyperparameter tuning
27. Scaling Data
● Zero mean and unit variance
● Scale to the [0, 1] range
● Other forms of preprocessing relative to the distribution of the data
● Preprocessing can also be columnwise (e.g. for categorical features)
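The first two bullets correspond to standardization and min-max scaling, both applied columnwise (per feature). A minimal NumPy sketch with made-up numbers:

```python
import numpy as np

X = np.array([[1.0, 200.0],
              [2.0, 400.0],
              [3.0, 600.0]])

# Zero mean, unit variance (computed columnwise, per feature).
standardized = (X - X.mean(axis=0)) / X.std(axis=0)

# Min-max scaling to [0, 1], also columnwise.
minmax = (X - X.min(axis=0)) / (X.max(axis=0) - X.min(axis=0))

print(standardized.mean(axis=0))   # ~[0, 0]
print(minmax.min(axis=0))          # [0. 0.]
print(minmax.max(axis=0))          # [1. 1.]
```

Whatever statistics you compute here (means, mins, maxes) must be saved from the training set and reused at inference time, or the network sees differently scaled inputs.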
28. Mixing and Matching Neural Networks
● Video: ConvNet + recurrent net
● Convolutional RBMs?
● Convolutional -> Subsampling -> Fully Connected
● DBNs: different hidden and visible units for each layer
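The Convolutional -> Subsampling -> Fully Connected pipeline can be sketched with naive NumPy ops. This is a forward pass only, with illustrative sizes; real frameworks fuse and accelerate these steps.

```python
import numpy as np

rng = np.random.default_rng(3)

def conv2d_valid(img, kernel):
    """Naive 'valid' 2D convolution (cross-correlation, as in most DL libraries)."""
    kh, kw = kernel.shape
    h, w = img.shape
    out = np.empty((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(img[i:i+kh, j:j+kw] * kernel)
    return out

def maxpool2(x):
    """2x2 max-pooling (the 'subsampling' step)."""
    h, w = x.shape
    return x[:h//2*2, :w//2*2].reshape(h//2, 2, w//2, 2).max(axis=(1, 3))

img = rng.normal(size=(8, 8))
kernel = rng.normal(size=(3, 3))
W_fc = rng.normal(size=(9, 4))    # fully connected weights: 9 pooled values -> 4 outputs

features = np.maximum(0, conv2d_valid(img, kernel))   # conv + ReLU -> 6x6
pooled = maxpool2(features)                           # subsample   -> 3x3
logits = pooled.reshape(-1) @ W_fc                    # fully connected -> 4 outputs
print(logits.shape)  # (4,)
```

Each stage just has to hand the next one a tensor of the agreed shape, which is what makes these mixes of layer types composable.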
30. Hyperparameter Tuning (2)
● Grid search for neural nets (Don’t do it!)
● Bayesian optimization (Getting better. There are at least priors here.)
● Gradient-based approaches (Your hyperparameters are a neural net, so there are neural nets optimizing your neural nets...)
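The usual cheap alternative to the grid search the slide warns against is random search: with the same trial budget it samples each hyperparameter dimension more densely than a grid does. A minimal sketch, where `validation_score` is a stand-in for actually training a network:

```python
import random

random.seed(0)

def validation_score(lr, momentum):
    # Stand-in for training a net and measuring validation accuracy;
    # peaks at lr=0.01, momentum=0.9 in this toy example.
    return -(lr - 0.01) ** 2 - (momentum - 0.9) ** 2

# Random search: sample hyperparameter combinations instead of
# enumerating a fixed grid, then keep the best-scoring one.
trials = [(random.uniform(1e-4, 1e-1), random.uniform(0.5, 0.99))
          for _ in range(50)]
best = max(trials, key=lambda hp: validation_score(*hp))
print(best)
```

Bayesian methods improve on this by using earlier trials to decide where to sample next, rather than sampling blindly.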