SlideShare a Scribd company logo
1 of 33
Download to read offline
Brief introduction
• CTO & co-founder of Agile Lab
• Data & Tech addicted
• Contributor of Spark Notebook
• Spark early adopter
• Certified Cassandra Architect
• DeepLearning enthusiast
Who is Agile Lab ?
GO BIG (data) or GO HOME
http://www.meetup.com/it-IT/Torino-Scala-Programming-Big-Data-Meetup/
What we do
Applications
High scalability
Decision Support
Systems
data engineering, data mining and data
«meaning»
Big Data Strategies
Training
Reactive, NoSQL, Big Data, Machine
learning
Why Deep Learning
Deep Learning is trending
What is Deep Learning
• Deep learning is just another name for artificial neural networks
• An algorithm is deep if the input is passed through several non-li
nearity before being output
• Deep learning is discovering the features that best represent the
problem, rather than just a way to combine them
Deep Learning: Use cases
Do you want start with Deep
Learning ?
Let’s choose the right tools !!
Deep Learning Frameworks
• Deeplearning4J
• TensorFlow
• Caffe
• Theano
• Torch
• Spark ML MultilayerPerceptrons
• H2O
• CNTK
• MatLab
• maxDNN
And many others
How to choose
Background
Target Environment
Vision
Background
Productivity !!
• Scala
• Java
Big Data
Engineer
• Java
• Python
Math
Engineer
• R
• Python
Statistician
Target Environment
Trained model should
be deployable !! Trained
Model
Dev Env
Prod Env
Target Environment
Prod Env Dev Env
Training
Data
Cleaning
ETLScheduling
ML Pipeline
- Track model performance over time
- Care about SLA
- Continous tweaks
Enterprise Architecture
HADOOP
Online
DataStore
Enterprise Service BUS
DataIntegrationLayer
Data Integration Layer
DataIntegrationLayer
External
Sources
ANALYTICS
VALUE
ADDED
SERVICES
API
SERVICES
Internal
Business
Sources
Internal
System
Sources
DeepLearning
Easy Wins
Training pipeline should run
on Spark or Hadoop
Trained Model should be
represented in Java objects
Vision: keep in mind Scaling
High Level dynamic languages
are incredibly productive for
prototyping and data exploration
Scaling on larger data sets
quickly runs into performance
limitations
Keep in mind scaling
requirements from beginning
Vision: simplify the pipeline
Copy & Sample data from Dev Env to Data
Scientist Env
Prototype in Python or R
Train model
Predict on validation Data
Translate Model to match Prod Env 
Java, MapReduce, Spark
Deploy training pipeline and model
Easy Wins
Datascientists should work
directly on distributed
environment
Datascientist and big data
engineers should co-operate
on the same platform
SWOT Analysis
Tensor Flow
Strenghts:
- Powered By Google
- Nice UI
Weaknesses:
- Powered By Google
- No support for “inline” matrix operations
 Slow
Opportunities:
- Awesome community
Threats:
- No Scala or Java integration
- No commercial support
Theano
Strenghts:
- Grand Daddy of deep learning
- RNN and CNN
- Computational graph abstraction
- Python
Weaknesses:
- No support for Hadoop or Spark
- No plug & play nets
Opportunities:
- Great community
Threats:
- No Scala or Java integration
- No commercial support
Torch
Strenghts:
- GPU support
- Lots of pretrained models and packages
- Easy to use
Weaknesses:
- Lua language
Opportunities:
- Backed by DeepMind and Facebook
Threats:
- No Scala or Java integration
- No commercial support
Caffè
Strenghts:
- C++ & Python
- Good Performance
- GPU Support
Weaknesses:
- Focused on image processing
Opportunities:
- Backed by Yahoo for Spark integration
- Gpu Clustering
Threats:
- No commercial support
DeepLearning4j
Strenghts:
- GPU support
- Java and Scala
- Full DNN set
- Support Hadoop, Spark & Akka
Weaknesses:
- Not for dummies
Opportunities:
- Commercial support - SkyMind
Threats:
- Not so sexy for DataScientist because of
Java/Scala
H2O
• Easy to use Web UI
• Multi language API
• Run directly on HDFS or S3
• Model is Java PoJo
• Big Data Ready
• Really Fast
• Compressed data
• Regularization
• Grid Search
• GPU is still on roadmap
• CNN and RNN too
H2O - Flow
H20 – Sparkling Water
• Python, R and Scala API
• Best Kagglers use H20
• Tons of tools for profiling and tu
ning
• Spark leverage
• Best in class algorithms – battle
tested
• Regolarization
• Grid search
H20 – Sparkling Water
Workflow
POJO Java
Training Set
Embeddable in:
• J2EE App
• Spark Job
• MR Job
• DWH as UDF
training
Spark as middleware
Using Spark as middleware, you can leverage :
• Deeplearning4J
• H2O
• TensorFlow ( Arimo Extension)
• Caffe ( Yahoo Extension )
• ML MultilayerPerceptrons and future implementations
NO tech provider Lock-in
Our Stack for Enterprise
• Ready for Enterprise and Hadoop World
• Deployable into Java Env
• Notebook ( Flow )
• H2O for out of the box algorithms
• DeepLearning 4J for advanced DNN and
n-dimension array manipulation
• Good usability for both DataScientists and
Big Data Engineers
• Enterprise Support along the whole stack
Thanks!
We are hiring !
paolo.platter@agilelab.it

More Related Content

What's hot

Deploying MLlib for Scoring in Structured Streaming with Joseph Bradley
Deploying MLlib for Scoring in Structured Streaming with Joseph BradleyDeploying MLlib for Scoring in Structured Streaming with Joseph Bradley
Deploying MLlib for Scoring in Structured Streaming with Joseph BradleyDatabricks
 
Skymind Open Power Summit ISV Round Table
Skymind Open Power Summit ISV Round TableSkymind Open Power Summit ISV Round Table
Skymind Open Power Summit ISV Round TableAdam Gibson
 
A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & Deep Learning ...
A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & Deep Learning ...A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & Deep Learning ...
A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & Deep Learning ...Databricks
 
H2O World - Survey of Available Machine Learning Frameworks - Brendan Herger
H2O World - Survey of Available Machine Learning Frameworks - Brendan HergerH2O World - Survey of Available Machine Learning Frameworks - Brendan Herger
H2O World - Survey of Available Machine Learning Frameworks - Brendan HergerSri Ambati
 
Using PySpark to Process Boat Loads of Data
Using PySpark to Process Boat Loads of DataUsing PySpark to Process Boat Loads of Data
Using PySpark to Process Boat Loads of DataRobert Dempsey
 
Machine Learning Pipelines
Machine Learning PipelinesMachine Learning Pipelines
Machine Learning Pipelinesjeykottalam
 
A Predictive Analytics Workflow on DICOM Images using Apache Spark with Anahi...
A Predictive Analytics Workflow on DICOM Images using Apache Spark with Anahi...A Predictive Analytics Workflow on DICOM Images using Apache Spark with Anahi...
A Predictive Analytics Workflow on DICOM Images using Apache Spark with Anahi...Databricks
 
H2O Rains with Databricks Cloud - NY 02.16.16
H2O Rains with Databricks Cloud - NY 02.16.16H2O Rains with Databricks Cloud - NY 02.16.16
H2O Rains with Databricks Cloud - NY 02.16.16Sri Ambati
 
Productionizing H2O Models with Apache Spark with Jakub Hava and Michal Maloh...
Productionizing H2O Models with Apache Spark with Jakub Hava and Michal Maloh...Productionizing H2O Models with Apache Spark with Jakub Hava and Michal Maloh...
Productionizing H2O Models with Apache Spark with Jakub Hava and Michal Maloh...Databricks
 
Production ready big ml workflows from zero to hero daniel marcous @ waze
Production ready big ml workflows from zero to hero daniel marcous @ wazeProduction ready big ml workflows from zero to hero daniel marcous @ waze
Production ready big ml workflows from zero to hero daniel marcous @ wazeIdo Shilon
 
Building an ML Platform with Ray and MLflow
Building an ML Platform with Ray and MLflowBuilding an ML Platform with Ray and MLflow
Building an ML Platform with Ray and MLflowDatabricks
 
Improving the Life of Data Scientists: Automating ML Lifecycle through MLflow
Improving the Life of Data Scientists: Automating ML Lifecycle through MLflowImproving the Life of Data Scientists: Automating ML Lifecycle through MLflow
Improving the Life of Data Scientists: Automating ML Lifecycle through MLflowDatabricks
 
Deploying Python Machine Learning Models with Apache Spark with Brandon Hamri...
Deploying Python Machine Learning Models with Apache Spark with Brandon Hamri...Deploying Python Machine Learning Models with Apache Spark with Brandon Hamri...
Deploying Python Machine Learning Models with Apache Spark with Brandon Hamri...Databricks
 
Patterns and Anti-Patterns for Memorializing Data Science Project Artifacts
Patterns and Anti-Patterns for Memorializing Data Science Project ArtifactsPatterns and Anti-Patterns for Memorializing Data Science Project Artifacts
Patterns and Anti-Patterns for Memorializing Data Science Project ArtifactsDatabricks
 
Simplify Distributed TensorFlow Training for Fast Image Categorization at Sta...
Simplify Distributed TensorFlow Training for Fast Image Categorization at Sta...Simplify Distributed TensorFlow Training for Fast Image Categorization at Sta...
Simplify Distributed TensorFlow Training for Fast Image Categorization at Sta...Databricks
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsDatabricks
 
Constrained Optimization with Genetic Algorithms and Project Bonsai
Constrained Optimization with Genetic Algorithms and Project BonsaiConstrained Optimization with Genetic Algorithms and Project Bonsai
Constrained Optimization with Genetic Algorithms and Project BonsaiIvo Andreev
 
Scalable Automatic Machine Learning in H2O
Scalable Automatic Machine Learning in H2OScalable Automatic Machine Learning in H2O
Scalable Automatic Machine Learning in H2OSri Ambati
 
Semantic Image Logging Using Approximate Statistics & MLflow
Semantic Image Logging Using Approximate Statistics & MLflowSemantic Image Logging Using Approximate Statistics & MLflow
Semantic Image Logging Using Approximate Statistics & MLflowDatabricks
 
Embracing a Taxonomy of Types to Simplify Machine Learning with Leah McGuire
Embracing a Taxonomy of Types to Simplify Machine Learning with Leah McGuireEmbracing a Taxonomy of Types to Simplify Machine Learning with Leah McGuire
Embracing a Taxonomy of Types to Simplify Machine Learning with Leah McGuireDatabricks
 

What's hot (20)

Deploying MLlib for Scoring in Structured Streaming with Joseph Bradley
Deploying MLlib for Scoring in Structured Streaming with Joseph BradleyDeploying MLlib for Scoring in Structured Streaming with Joseph Bradley
Deploying MLlib for Scoring in Structured Streaming with Joseph Bradley
 
Skymind Open Power Summit ISV Round Table
Skymind Open Power Summit ISV Round TableSkymind Open Power Summit ISV Round Table
Skymind Open Power Summit ISV Round Table
 
A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & Deep Learning ...
A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & Deep Learning ...A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & Deep Learning ...
A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & Deep Learning ...
 
H2O World - Survey of Available Machine Learning Frameworks - Brendan Herger
H2O World - Survey of Available Machine Learning Frameworks - Brendan HergerH2O World - Survey of Available Machine Learning Frameworks - Brendan Herger
H2O World - Survey of Available Machine Learning Frameworks - Brendan Herger
 
Using PySpark to Process Boat Loads of Data
Using PySpark to Process Boat Loads of DataUsing PySpark to Process Boat Loads of Data
Using PySpark to Process Boat Loads of Data
 
Machine Learning Pipelines
Machine Learning PipelinesMachine Learning Pipelines
Machine Learning Pipelines
 
A Predictive Analytics Workflow on DICOM Images using Apache Spark with Anahi...
A Predictive Analytics Workflow on DICOM Images using Apache Spark with Anahi...A Predictive Analytics Workflow on DICOM Images using Apache Spark with Anahi...
A Predictive Analytics Workflow on DICOM Images using Apache Spark with Anahi...
 
H2O Rains with Databricks Cloud - NY 02.16.16
H2O Rains with Databricks Cloud - NY 02.16.16H2O Rains with Databricks Cloud - NY 02.16.16
H2O Rains with Databricks Cloud - NY 02.16.16
 
Productionizing H2O Models with Apache Spark with Jakub Hava and Michal Maloh...
Productionizing H2O Models with Apache Spark with Jakub Hava and Michal Maloh...Productionizing H2O Models with Apache Spark with Jakub Hava and Michal Maloh...
Productionizing H2O Models with Apache Spark with Jakub Hava and Michal Maloh...
 
Production ready big ml workflows from zero to hero daniel marcous @ waze
Production ready big ml workflows from zero to hero daniel marcous @ wazeProduction ready big ml workflows from zero to hero daniel marcous @ waze
Production ready big ml workflows from zero to hero daniel marcous @ waze
 
Building an ML Platform with Ray and MLflow
Building an ML Platform with Ray and MLflowBuilding an ML Platform with Ray and MLflow
Building an ML Platform with Ray and MLflow
 
Improving the Life of Data Scientists: Automating ML Lifecycle through MLflow
Improving the Life of Data Scientists: Automating ML Lifecycle through MLflowImproving the Life of Data Scientists: Automating ML Lifecycle through MLflow
Improving the Life of Data Scientists: Automating ML Lifecycle through MLflow
 
Deploying Python Machine Learning Models with Apache Spark with Brandon Hamri...
Deploying Python Machine Learning Models with Apache Spark with Brandon Hamri...Deploying Python Machine Learning Models with Apache Spark with Brandon Hamri...
Deploying Python Machine Learning Models with Apache Spark with Brandon Hamri...
 
Patterns and Anti-Patterns for Memorializing Data Science Project Artifacts
Patterns and Anti-Patterns for Memorializing Data Science Project ArtifactsPatterns and Anti-Patterns for Memorializing Data Science Project Artifacts
Patterns and Anti-Patterns for Memorializing Data Science Project Artifacts
 
Simplify Distributed TensorFlow Training for Fast Image Categorization at Sta...
Simplify Distributed TensorFlow Training for Fast Image Categorization at Sta...Simplify Distributed TensorFlow Training for Fast Image Categorization at Sta...
Simplify Distributed TensorFlow Training for Fast Image Categorization at Sta...
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature Aggregations
 
Constrained Optimization with Genetic Algorithms and Project Bonsai
Constrained Optimization with Genetic Algorithms and Project BonsaiConstrained Optimization with Genetic Algorithms and Project Bonsai
Constrained Optimization with Genetic Algorithms and Project Bonsai
 
Scalable Automatic Machine Learning in H2O
Scalable Automatic Machine Learning in H2OScalable Automatic Machine Learning in H2O
Scalable Automatic Machine Learning in H2O
 
Semantic Image Logging Using Approximate Statistics & MLflow
Semantic Image Logging Using Approximate Statistics & MLflowSemantic Image Logging Using Approximate Statistics & MLflow
Semantic Image Logging Using Approximate Statistics & MLflow
 
Embracing a Taxonomy of Types to Simplify Machine Learning with Leah McGuire
Embracing a Taxonomy of Types to Simplify Machine Learning with Leah McGuireEmbracing a Taxonomy of Types to Simplify Machine Learning with Leah McGuire
Embracing a Taxonomy of Types to Simplify Machine Learning with Leah McGuire
 

Similar to Bringing Deep Learning into production

Deep Learning on Apache® Spark™ : Workflows and Best Practices
Deep Learning on Apache® Spark™ : Workflows and Best PracticesDeep Learning on Apache® Spark™ : Workflows and Best Practices
Deep Learning on Apache® Spark™ : Workflows and Best PracticesJen Aman
 
Deep Learning on Apache® Spark™: Workflows and Best Practices
Deep Learning on Apache® Spark™: Workflows and Best PracticesDeep Learning on Apache® Spark™: Workflows and Best Practices
Deep Learning on Apache® Spark™: Workflows and Best PracticesDatabricks
 
Deep Learning on Apache® Spark™: Workflows and Best Practices
Deep Learning on Apache® Spark™: Workflows and Best PracticesDeep Learning on Apache® Spark™: Workflows and Best Practices
Deep Learning on Apache® Spark™: Workflows and Best PracticesJen Aman
 
Data Science and CDSW
Data Science and CDSWData Science and CDSW
Data Science and CDSWJason Hubbard
 
PPT5: Neuron Introduction
PPT5: Neuron IntroductionPPT5: Neuron Introduction
PPT5: Neuron Introductionakira-ai
 
Big Data Introduction - Solix empower
Big Data Introduction - Solix empowerBig Data Introduction - Solix empower
Big Data Introduction - Solix empowerDurga Gadiraju
 
Build Deep Learning Applications for Big Data Platforms (CVPR 2018 tutorial)
Build Deep Learning Applications for Big Data Platforms (CVPR 2018 tutorial)Build Deep Learning Applications for Big Data Platforms (CVPR 2018 tutorial)
Build Deep Learning Applications for Big Data Platforms (CVPR 2018 tutorial)Jason Dai
 
Ncku csie talk about Spark
Ncku csie talk about SparkNcku csie talk about Spark
Ncku csie talk about SparkGiivee The
 
A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, and Deep Learnin...
A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, and Deep Learnin...A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, and Deep Learnin...
A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, and Deep Learnin...Databricks
 
Data Con LA 2018 - A Tale of DL Frameworks: TensorFlow, Keras, & Deep Learnin...
Data Con LA 2018 - A Tale of DL Frameworks: TensorFlow, Keras, & Deep Learnin...Data Con LA 2018 - A Tale of DL Frameworks: TensorFlow, Keras, & Deep Learnin...
Data Con LA 2018 - A Tale of DL Frameworks: TensorFlow, Keras, & Deep Learnin...Data Con LA
 
The Hadoop Guarantee: Keeping Analytics Running On Time
The Hadoop Guarantee: Keeping Analytics Running On TimeThe Hadoop Guarantee: Keeping Analytics Running On Time
The Hadoop Guarantee: Keeping Analytics Running On TimeInside Analysis
 
Apache Deep Learning 201 - Philly Open Source
Apache Deep Learning 201 - Philly Open SourceApache Deep Learning 201 - Philly Open Source
Apache Deep Learning 201 - Philly Open SourceTimothy Spann
 
Analyzing Hadoop Data Using Sparklyr

Analyzing Hadoop Data Using Sparklyr
Analyzing Hadoop Data Using Sparklyr

Analyzing Hadoop Data Using Sparklyr
Cloudera, Inc.
 
Architecting an Open Source AI Platform 2018 edition
Architecting an Open Source AI Platform   2018 editionArchitecting an Open Source AI Platform   2018 edition
Architecting an Open Source AI Platform 2018 editionDavid Talby
 
.NET per la Data Science e oltre
.NET per la Data Science e oltre.NET per la Data Science e oltre
.NET per la Data Science e oltreMarco Parenzan
 
Deploying Data Science Engines to Production
Deploying Data Science Engines to ProductionDeploying Data Science Engines to Production
Deploying Data Science Engines to ProductionMostafa Majidpour
 
Apache Spark in Scientific Applications
Apache Spark in Scientific ApplicationsApache Spark in Scientific Applications
Apache Spark in Scientific ApplicationsDr. Mirko Kämpf
 
Apache Spark in Scientific Applciations
Apache Spark in Scientific ApplciationsApache Spark in Scientific Applciations
Apache Spark in Scientific ApplciationsDr. Mirko Kämpf
 
BDTC2015 databricks-辛湜-state of spark
BDTC2015 databricks-辛湜-state of sparkBDTC2015 databricks-辛湜-state of spark
BDTC2015 databricks-辛湜-state of sparkJerry Wen
 
Apache Spark for Everyone - Women Who Code Workshop
Apache Spark for Everyone - Women Who Code WorkshopApache Spark for Everyone - Women Who Code Workshop
Apache Spark for Everyone - Women Who Code WorkshopAmanda Casari
 

Similar to Bringing Deep Learning into production (20)

Deep Learning on Apache® Spark™ : Workflows and Best Practices
Deep Learning on Apache® Spark™ : Workflows and Best PracticesDeep Learning on Apache® Spark™ : Workflows and Best Practices
Deep Learning on Apache® Spark™ : Workflows and Best Practices
 
Deep Learning on Apache® Spark™: Workflows and Best Practices
Deep Learning on Apache® Spark™: Workflows and Best PracticesDeep Learning on Apache® Spark™: Workflows and Best Practices
Deep Learning on Apache® Spark™: Workflows and Best Practices
 
Deep Learning on Apache® Spark™: Workflows and Best Practices
Deep Learning on Apache® Spark™: Workflows and Best PracticesDeep Learning on Apache® Spark™: Workflows and Best Practices
Deep Learning on Apache® Spark™: Workflows and Best Practices
 
Data Science and CDSW
Data Science and CDSWData Science and CDSW
Data Science and CDSW
 
PPT5: Neuron Introduction
PPT5: Neuron IntroductionPPT5: Neuron Introduction
PPT5: Neuron Introduction
 
Big Data Introduction - Solix empower
Big Data Introduction - Solix empowerBig Data Introduction - Solix empower
Big Data Introduction - Solix empower
 
Build Deep Learning Applications for Big Data Platforms (CVPR 2018 tutorial)
Build Deep Learning Applications for Big Data Platforms (CVPR 2018 tutorial)Build Deep Learning Applications for Big Data Platforms (CVPR 2018 tutorial)
Build Deep Learning Applications for Big Data Platforms (CVPR 2018 tutorial)
 
Ncku csie talk about Spark
Ncku csie talk about SparkNcku csie talk about Spark
Ncku csie talk about Spark
 
A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, and Deep Learnin...
A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, and Deep Learnin...A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, and Deep Learnin...
A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, and Deep Learnin...
 
Data Con LA 2018 - A Tale of DL Frameworks: TensorFlow, Keras, & Deep Learnin...
Data Con LA 2018 - A Tale of DL Frameworks: TensorFlow, Keras, & Deep Learnin...Data Con LA 2018 - A Tale of DL Frameworks: TensorFlow, Keras, & Deep Learnin...
Data Con LA 2018 - A Tale of DL Frameworks: TensorFlow, Keras, & Deep Learnin...
 
The Hadoop Guarantee: Keeping Analytics Running On Time
The Hadoop Guarantee: Keeping Analytics Running On TimeThe Hadoop Guarantee: Keeping Analytics Running On Time
The Hadoop Guarantee: Keeping Analytics Running On Time
 
Apache Deep Learning 201 - Philly Open Source
Apache Deep Learning 201 - Philly Open SourceApache Deep Learning 201 - Philly Open Source
Apache Deep Learning 201 - Philly Open Source
 
Analyzing Hadoop Data Using Sparklyr

Analyzing Hadoop Data Using Sparklyr
Analyzing Hadoop Data Using Sparklyr

Analyzing Hadoop Data Using Sparklyr

 
Architecting an Open Source AI Platform 2018 edition
Architecting an Open Source AI Platform   2018 editionArchitecting an Open Source AI Platform   2018 edition
Architecting an Open Source AI Platform 2018 edition
 
.NET per la Data Science e oltre
.NET per la Data Science e oltre.NET per la Data Science e oltre
.NET per la Data Science e oltre
 
Deploying Data Science Engines to Production
Deploying Data Science Engines to ProductionDeploying Data Science Engines to Production
Deploying Data Science Engines to Production
 
Apache Spark in Scientific Applications
Apache Spark in Scientific ApplicationsApache Spark in Scientific Applications
Apache Spark in Scientific Applications
 
Apache Spark in Scientific Applciations
Apache Spark in Scientific ApplciationsApache Spark in Scientific Applciations
Apache Spark in Scientific Applciations
 
BDTC2015 databricks-辛湜-state of spark
BDTC2015 databricks-辛湜-state of sparkBDTC2015 databricks-辛湜-state of spark
BDTC2015 databricks-辛湜-state of spark
 
Apache Spark for Everyone - Women Who Code Workshop
Apache Spark for Everyone - Women Who Code WorkshopApache Spark for Everyone - Women Who Code Workshop
Apache Spark for Everyone - Women Who Code Workshop
 

More from Paolo Platter

kafka simplicity and complexity
kafka simplicity and complexitykafka simplicity and complexity
kafka simplicity and complexityPaolo Platter
 
Wasp2 - IoT and Streaming Platform
Wasp2 - IoT and Streaming PlatformWasp2 - IoT and Streaming Platform
Wasp2 - IoT and Streaming PlatformPaolo Platter
 
Agile Lab_BigData_Meetup
Agile Lab_BigData_MeetupAgile Lab_BigData_Meetup
Agile Lab_BigData_MeetupPaolo Platter
 
Massive Streaming Analytics with Spark Streaming
Massive Streaming Analytics with Spark StreamingMassive Streaming Analytics with Spark Streaming
Massive Streaming Analytics with Spark StreamingPaolo Platter
 

More from Paolo Platter (6)

kafka simplicity and complexity
kafka simplicity and complexitykafka simplicity and complexity
kafka simplicity and complexity
 
Wasp2 - IoT and Streaming Platform
Wasp2 - IoT and Streaming PlatformWasp2 - IoT and Streaming Platform
Wasp2 - IoT and Streaming Platform
 
Meetup tensorframes
Meetup tensorframesMeetup tensorframes
Meetup tensorframes
 
Agile Lab_BigData_Meetup
Agile Lab_BigData_MeetupAgile Lab_BigData_Meetup
Agile Lab_BigData_Meetup
 
Massive Streaming Analytics with Spark Streaming
Massive Streaming Analytics with Spark StreamingMassive Streaming Analytics with Spark Streaming
Massive Streaming Analytics with Spark Streaming
 
Scala Intro
Scala IntroScala Intro
Scala Intro
 

Recently uploaded

call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️Delhi Call girls
 
Announcing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK SoftwareAnnouncing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK SoftwareJim McKeeth
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...Health
 
10 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 202410 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 2024Mind IT Systems
 
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdfPayment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdfkalichargn70th171
 
AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplatePresentation.STUDIO
 
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrainmasabamasaba
 
VTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnVTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnAmarnathKambale
 
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfonteinmasabamasaba
 
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM TechniquesAI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM TechniquesVictorSzoltysek
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️Delhi Call girls
 
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) SolutionIntroducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) SolutionOnePlan Solutions
 
%in Lydenburg+277-882-255-28 abortion pills for sale in Lydenburg
%in Lydenburg+277-882-255-28 abortion pills for sale in Lydenburg%in Lydenburg+277-882-255-28 abortion pills for sale in Lydenburg
%in Lydenburg+277-882-255-28 abortion pills for sale in Lydenburgmasabamasaba
 
%in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park %in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park masabamasaba
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsArshad QA
 
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...masabamasaba
 
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyviewmasabamasaba
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsJhone kinadey
 
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdfintroduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdfVishalKumarJha10
 

Recently uploaded (20)

call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
Announcing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK SoftwareAnnouncing Codolex 2.0 from GDK Software
Announcing Codolex 2.0 from GDK Software
 
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Pushp Vihar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
+971565801893>>SAFE AND ORIGINAL ABORTION PILLS FOR SALE IN DUBAI AND ABUDHAB...
 
10 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 202410 Trends Likely to Shape Enterprise Technology in 2024
10 Trends Likely to Shape Enterprise Technology in 2024
 
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdfPayment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
Payment Gateway Testing Simplified_ A Step-by-Step Guide for Beginners.pdf
 
AI & Machine Learning Presentation Template
AI & Machine Learning Presentation TemplateAI & Machine Learning Presentation Template
AI & Machine Learning Presentation Template
 
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
%in Bahrain+277-882-255-28 abortion pills for sale in Bahrain
 
VTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learnVTU technical seminar 8Th Sem on Scikit-learn
VTU technical seminar 8Th Sem on Scikit-learn
 
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
%in kaalfontein+277-882-255-28 abortion pills for sale in kaalfontein
 
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM TechniquesAI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
AI Mastery 201: Elevating Your Workflow with Advanced LLM Techniques
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) SolutionIntroducing Microsoft’s new Enterprise Work Management (EWM) Solution
Introducing Microsoft’s new Enterprise Work Management (EWM) Solution
 
%in Lydenburg+277-882-255-28 abortion pills for sale in Lydenburg
%in Lydenburg+277-882-255-28 abortion pills for sale in Lydenburg%in Lydenburg+277-882-255-28 abortion pills for sale in Lydenburg
%in Lydenburg+277-882-255-28 abortion pills for sale in Lydenburg
 
%in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park %in ivory park+277-882-255-28 abortion pills for sale in ivory park
%in ivory park+277-882-255-28 abortion pills for sale in ivory park
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview Questions
 
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
%+27788225528 love spells in Boston Psychic Readings, Attraction spells,Bring...
 
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
%in Hazyview+277-882-255-28 abortion pills for sale in Hazyview
 
Right Money Management App For Your Financial Goals
Right Money Management App For Your Financial GoalsRight Money Management App For Your Financial Goals
Right Money Management App For Your Financial Goals
 
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdfintroduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
introduction-to-automotive Andoid os-csimmonds-ndctechtown-2021.pdf
 

Bringing Deep Learning into production

  • 1.
  • 2. Brief introduction • CTO & co-founder of Agile Lab • Data & Tech addicted • Contributor of Spark Notebook • Spark early adopter • Certified Cassandra Architect • DeepLearning enthusiast
  • 3. Who is Agile Lab ? GO BIG (data) or GO HOME http://www.meetup.com/it-IT/Torino-Scala-Programming-Big-Data-Meetup/
  • 4. What we do Applications High scalability Decision Support Systems data engineering, data mining and data «meaning» Big Data Strategies Training Reactive, NoSQL, Big Data, Machine learning
  • 6. Deep Learning is trending
  • 7. What is Deep Learning • Deep learning is just another name for artificial neural networks • An algorithm is deep if the input is passed through several non-li nearity before being output • Deep learning is discovering the features that best represent the problem, rather than just a way to combine them
  • 9. Do you want start with Deep Learning ? Let’s choose the right tools !!
  • 10. Deep Learning Frameworks • Deeplearning4J • TensorFlow • Caffe • Theano • Torch • Spark ML MultilayerPerceptrons • H2O • CNTK • MatLab • maxDNN And many others
  • 11. How to choose Background Target Environment Vision
  • 12. Background Productivity !! • Scala • Java Big Data Engineer • Java • Python Math Engineer • R • Python Statistician
  • 13. Target Environment Trained model should be deployable !! Trained Model Dev Env Prod Env
  • 14. Target Environment Prod Env Dev Env Training Data Cleaning ETLScheduling ML Pipeline - Track model performance over time - Care about SLA - Continous tweaks
  • 15. Enterprise Architecture HADOOP Online DataStore Enterprise Service BUS DataIntegrationLayer Data Integration Layer DataIntegrationLayer External Sources ANALYTICS VALUE ADDED SERVICES API SERVICES Internal Business Sources Internal System Sources DeepLearning
  • 16. Easy Wins Training pipeline should run on Spark or Hadoop Trained Model should be represented in Java objects
  • 17. Vision: keep in mind Scaling High Level dynamic languages are incredibly productive for prototyping and data exploration Scaling on larger data sets quickly runs into performance limitations Keep in mind scaling requirements from beginning
  • 18. Vision: simplify the pipeline Copy & Sample data from Dev Env to Data Scientist Env Prototype in Python or R Train model Predict on validation Data Translate Model to match Prod Env  Java, MapReduce, Spark Deploy training pipeline and model
  • 19. Easy Wins Datascientists should work directly on distributed environment Datascientist and big data engineers should co-operate on the same platform
  • 21. Tensor Flow Strenghts: - Powered By Google - Nice UI Weaknesses: - Powered By Google - No support for “inline” matrix operations  Slow Opportunities: - Awesome community Threats: - No Scala or Java integration - No commercial support
  • 22. Theano Strenghts: - Grand Daddy of deep learning - RNN and CNN - Computational graph abstraction - Python Weaknesses: - No support for Hadoop or Spark - No plug & play nets Opportunities: - Great community Threats: - No Scala or Java integration - No commercial support
  • 23. Torch Strenghts: - GPU support - Lots of pretrained models and packages - Easy to use Weaknesses: - Lua language Opportunities: - Backed by DeepMind and Facebook Threats: - No Scala or Java integration - No commercial support
  • 24. Caffè Strenghts: - C++ & Python - Good Performance - GPU Support Weaknesses: - Focused on image processing Opportunities: - Backed by Yahoo for Spark integration - Gpu Clustering Threats: - No commercial support
  • 25. DeepLearning4j Strenghts: - GPU support - Java and Scala - Full DNN set - Support Hadoop, Spark & Akka Weaknesses: - Not for dummies Opportunities: - Commercial support - SkyMind Threats: - Not so sexy for DataScientist because of Java/Scala
  • 26. H2O • Easy to use Web UI • Multi language API • Run directly on HDFS or S3 • Model is Java PoJo • Big Data Ready • Really Fast • Compressed data • Regularization • Grid Search • GPU is still on roadmap • CNN and RNN too
  • 28. H20 – Sparkling Water • Python, R and Scala API • Best Kagglers use H20 • Tons of tools for profiling and tu ning • Spark leverage • Best in class algorithms – battle tested • Regolarization • Grid search
  • 30. Workflow POJO Java Training Set Embeddable in: • J2EE App • Spark Job • MR Job • DWH as UDF training
  • 31. Spark as middleware Using Spark as middleware, you can leverage : • Deeplearning4J • H2O • TensorFlow ( Arimo Extension) • Caffe ( Yahoo Extension ) • ML MultilayerPerceptrons and future implementations NO tech provider Lock-in
  • 32. Our Stack for Enterprise • Ready for Enterprise and Hadoop World • Deployable into Java Env • Notebook ( Flow ) • H2O for out of the box algorithms • DeepLearning 4J for advanced DNN and n-dimension array manipulation • Good usability for both DataScientists and Big Data Engineers • Enterprise Support along the whole stack
  • 33. Thanks! We are hiring ! paolo.platter@agilelab.it