Bringing Deep Learning into production

Brief introduction
• CTO & co-founder of Agile Lab
• Data & tech addict
• Contributor to Spark Notebook
• Spark early adopter
• Certified Cassandra Architect
• Deep Learning enthusiast

Who is Agile Lab?
GO BIG (data) or GO HOME
http://www.meetup.com/it-IT/Torino-Scala-Programming-Big-Data-Meetup/

What we do
• High-scalability applications
• Decision Support Systems: data engineering, data mining and data «meaning»
• Big Data Strategies
• Training: Reactive, NoSQL, Big Data, Machine learning

What is Deep Learning
• Deep learning is just another name for artificial neural networks
• An algorithm is deep if the input is passed through several
non-linearities before being output
• Deep learning is discovering the features that best represent the
problem, rather than just a way to combine them
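A minimal sketch of the "several non-linearities" point, assuming a tiny fully connected network. The weights, layer sizes and the `dense` / `forward` helpers are illustrative and not from the talk. The input goes through two tanh non-linearities before it is output.

```python
import math

def dense(inputs, weights, biases):
    """Affine layer: one weighted sum (plus bias) per output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]

def forward(inputs, layers):
    """Pass the input through every layer, applying tanh after each one."""
    activation = inputs
    for weights, biases in layers:
        activation = [math.tanh(z) for z in dense(activation, weights, biases)]
    return activation

# Hypothetical weights: 2 inputs -> 3 hidden units -> 1 output.
layers = [
    ([[0.5, -0.2], [0.1, 0.4], [-0.3, 0.8]], [0.0, 0.1, -0.1]),
    ([[0.7, -0.5, 0.2]], [0.05]),
]

output = forward([1.0, 2.0], layers)
print(output)
```

Adding more entries to `layers` makes the network deeper. In a real framework the weights are learned from data, which is where the feature discovery mentioned above happens.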

Do you want to start with Deep Learning?
Let’s choose the right tools!

Deep Learning Frameworks
• Deeplearning4J
• TensorFlow
• Caffe
• Theano
• Torch
• Spark ML MultilayerPerceptrons
• H2O
• CNTK
• MATLAB
• maxDNN
And many others

How to choose
Background
Target Environment
Vision

Background
Productivity!
• Big Data Engineer: Scala, Java
• Math Engineer: Java, Python
• Statistician: R, Python

Target Environment
Trained model should be deployable!
[Diagram: Trained Model, Dev Env, Prod Env]

Target Environment
[Diagram: Prod Env, Dev Env; Training, Data Cleaning, ETL, Scheduling, ML Pipeline]
- Track model performance over time
- Care about SLA
- Continuous tweaks
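Tracking model performance over time can be sketched as below. This is only an illustration: the `PerformanceTracker` class, the baseline, the tolerance and the daily accuracy values are all assumptions, not part of the talk. The idea is to keep a rolling window of a quality metric and flag the model for retraining when the window average falls too far below the baseline.

```python
from collections import deque

class PerformanceTracker:
    """Keeps a rolling window of a quality metric and flags degradation."""

    def __init__(self, baseline, tolerance=0.05, window=3):
        self.baseline = baseline      # metric measured when the model was deployed
        self.tolerance = tolerance    # accepted drop before raising a flag
        self.scores = deque(maxlen=window)

    def record(self, score):
        self.scores.append(score)

    def rolling_mean(self):
        return sum(self.scores) / len(self.scores)

    def needs_retraining(self):
        # Only decide once the window is full, to avoid reacting to one bad day.
        if len(self.scores) < self.scores.maxlen:
            return False
        return self.rolling_mean() < self.baseline - self.tolerance

# Made-up daily accuracy values for a model deployed at 0.90 accuracy.
tracker = PerformanceTracker(baseline=0.90)
for daily_accuracy in [0.91, 0.89, 0.86, 0.82, 0.80]:
    tracker.record(daily_accuracy)

print(tracker.needs_retraining())
```

In production the scores would come from a scheduled evaluation job, and the flag would trigger the training pipeline rather than a print.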

Enterprise Architecture
[Architecture diagram. Components: External Sources, Internal Business Sources, Internal System Sources, Data Integration Layer, Enterprise Service Bus, HADOOP, Online DataStore, Analytics, DeepLearning, Value Added Services, API Services]

Easy Wins
• Training pipeline should run on Spark or Hadoop
• Trained model should be represented as Java objects

Vision: keep scaling in mind
• High-level dynamic languages are incredibly productive for prototyping and data exploration
• Scaling to larger data sets quickly runs into performance limitations
• Keep scaling requirements in mind from the beginning

Vision: simplify the pipeline
• Copy & sample data from Dev Env to Data Scientist Env
• Prototype in Python or R
• Train model
• Predict on validation data
• Translate model to match Prod Env (Java, MapReduce, Spark)
• Deploy training pipeline and model

Easy Wins
• Data scientists should work directly on the distributed environment
• Data scientists and big data engineers should cooperate on the same platform

TensorFlow
Strengths:
- Powered By Google
- Nice UI
Weaknesses:
- Powered By Google
- No support for “inline” matrix operations, which makes it slow
Opportunities:
- Awesome community
Threats:
- No Scala or Java integration
- No commercial support

Theano
Strengths:
- Grand Daddy of deep learning
- RNN and CNN
- Computational graph abstraction
- Python
Weaknesses:
- No support for Hadoop or Spark
- No plug & play nets
Opportunities:
- Great community
Threats:

Torch
Strengths:
- GPU support
- Lots of pretrained models and packages
- Easy to use
Weaknesses:
- Lua language
Opportunities:
- Backed by DeepMind and Facebook
Threats:

Caffe
Strengths:
- C++ & Python
- Good Performance
- GPU Support
Weaknesses:
- Focused on image processing
Opportunities:
- Backed by Yahoo for Spark integration
- GPU clustering
Threats:

DeepLearning4j
Strengths:
- GPU support
- Java and Scala
- Full DNN set
- Supports Hadoop, Spark & Akka
Weaknesses:
- Not for dummies
Opportunities:
- Commercial support - SkyMind
Threats:
- Not so sexy for data scientists because of Java/Scala

H2O
• Easy-to-use Web UI
• Multi-language API
• Runs directly on HDFS or S3
• Model is a Java POJO
• Big Data ready
• Really fast
• Compressed data
• Regularization
• Grid search
• GPU support is still on the roadmap
• CNN and RNN are too

H2O – Sparkling Water
• Python, R and Scala API
• Best Kagglers use H2O
• Tons of tools for profiling and tuning
• Leverages Spark
• Best in class algorithms – battle tested
• Regularization
• Grid search
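To make "regularization" and "grid search" concrete, here is a minimal sketch in plain Python. It does not use the H2O API: the one-dimensional ridge model, the helper names and the data are all assumptions chosen for brevity. One model is fitted per candidate L2 penalty, and the one with the lowest validation error is kept.

```python
def fit_ridge(xs, ys, l2):
    """Closed-form 1-D ridge regression without intercept: w = sum(xy) / (sum(x^2) + l2)."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + l2)

def mse(w, xs, ys):
    """Mean squared error of the prediction w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def grid_search(train, valid, l2_grid):
    """Fit one model per candidate penalty; keep the lowest validation error."""
    results = []
    for l2 in l2_grid:
        w = fit_ridge(*train, l2)
        results.append((mse(w, *valid), l2, w))
    return min(results)

# Made-up data, roughly y = 2x with a little noise.
train = ([1.0, 2.0, 3.0, 4.0], [2.1, 3.9, 6.2, 7.8])
valid = ([5.0, 6.0], [10.1, 11.9])

best_error, best_l2, best_w = grid_search(train, valid, [0.0, 0.1, 1.0, 10.0])
print(best_l2, round(best_w, 3))
```

A larger `l2` shrinks the weight towards zero. On this nearly noise-free toy data the smallest penalty wins, while on noisy, high-dimensional data a non-zero penalty usually generalizes better, which is why the penalty is searched rather than fixed.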

Workflow
[Diagram: Training Set, training, Java POJO]
Embeddable in:
• J2EE App
• Spark Job
• MR Job
• DWH as UDF
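The idea behind the exported model can be illustrated with a short sketch. It is written in Python for brevity and is not an actual H2O POJO: the `ExportedModel` class and its coefficients are hypothetical. The point is that the trained model is reduced to plain data plus a `predict` method, so a host application can embed it without the training library.

```python
import math
from dataclasses import dataclass
from typing import Sequence

@dataclass(frozen=True)
class ExportedModel:
    """Hypothetical exported logistic model: only coefficients, no training code."""
    weights: Sequence[float]
    bias: float

    def predict(self, features: Sequence[float]) -> float:
        # Linear score followed by the logistic function.
        z = self.bias + sum(w * x for w, x in zip(self.weights, features))
        return 1.0 / (1.0 + math.exp(-z))

# Coefficients would come from the training step; these values are made up.
model = ExportedModel(weights=(0.8, -0.4), bias=0.1)

# Any host (web app, batch job, UDF wrapper) only needs to call predict().
score = model.predict([1.5, 2.0])
print(round(score, 3))
```

Because the object has no dependency on the training environment, the same artifact can be scored inside each of the targets listed above.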

Spark as middleware
Using Spark as middleware, you can leverage:
• Deeplearning4J
• H2O
• TensorFlow (Arimo extension)
• Caffe (Yahoo extension)
• ML MultilayerPerceptrons and future implementations
No tech provider lock-in

Our Stack for Enterprise
• Ready for Enterprise and the Hadoop world
• Deployable into Java Env
• Notebook (Flow)
• H2O for out-of-the-box algorithms
• Deeplearning4J for advanced DNN and n-dimensional array manipulation
• Good usability for both data scientists and big data engineers
• Enterprise support along the whole stack

Thanks!
We are hiring!
paolo.platter@agilelab.it
