Bucharest Big Data Meetup
Tech talks, use cases, all big data related topics
June 5th meetup
6:30 PM - 7:00 PM getting together
7:00 - 7:40 Productionizing Machine Learning,
Cosmin Pintoiu & Costina Batica @ Lentiq
7:40 - 8:15 Technology showdown: database vs blockchain,
Felix Crisan @ Blockchain Romania
8:15 - 8:45 Pizza and drinks sponsored by Netopia.
Sponsored by
Organizer
Valentina Crisan
Productionizing
Machine Learning
Cosmin Pintoiu – Solution Architect
cosmin.pintoiu@lentiq.com
Costina Batica – Software Developer
Agenda
1. Machine Learning made easy
2. From prototype to production (motivation for workflow and RCB)
3. Reusable code blocks (implementation)
4. Workflow Manager
5. Model Server
6. Demo time
7. Roadmap
8. Conclusions and Q&A
https://medium.freecodecamp.org/a-beginners-guide-to-training-and-deploying-machine-learning-models-using-python-48a313502e5a
Machine Learning
made easy
✓ Gathering data
✓ Preparing that data
✓ Choosing a model
✓ Training
✓ Evaluation
✓ Hyperparameter tuning
✓ Prediction
The truth is
✓ Productionizing ML is hard.
✓ Only 5% of machine learning models and data science projects will be used in a production environment.
✓ Most of the models developed will be “deployed” on PowerPoint slides.
✓ If we want to be among this 5%, we have to understand the problem of ML deployment and how to solve it.
Challenges of making the DS process successful
Small to medium companies
✓ Lack of resources
✓ Lack of skills and knowledge
✓ Difficult to set up their own environment
✓ Need additional developers for moving models into production
✓ Need DevOps for maintaining the environment
✓ Struggle to define the most valuable business problem
Enterprises
✓ Over-centralized -> smaller and localized teams cannot be agile
✓ Lack of collaboration
✓ Difficult to integrate new technologies in the enterprise stack
✓ Lack of visibility
✓ Difficult to scale and put models in production
✓ Centralized data ownership
A complete cycle
✓ Gathering data
✓ Preparing that data
✓ Choosing a model
✓ Training
✓ Evaluation
✓ Hyperparameter tuning
✓ Prediction
We now have a model. What’s next?!
✓ Serialize
✓ Deploy
✓ Serving / Scoring
✓ Scaling
✓ Update
✓ Monitor
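The first steps after training (serialize, deploy, serve) can be illustrated with a minimal sketch. It assumes a toy one-feature linear model and uses JSON as a stand-in serialization format; the function names and the file name are hypothetical and are not part of the platform presented in the talk.

```python
import json
import os
import tempfile


def train(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    return {"version": 1, "slope": slope, "intercept": mean_y - slope * mean_x}


def serialize(model, path):
    """Write the model to a file that can be stored or sent over the network."""
    with open(path, "w") as f:
        json.dump(model, f)


def load(path):
    """Read the model back, as a serving process would do at start-up."""
    with open(path) as f:
        return json.load(f)


def predict(model, x):
    return model["slope"] * x + model["intercept"]


# Train in one place, serialize, then score with the reloaded copy.
model = train([1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0])
path = os.path.join(tempfile.mkdtemp(), "model-v1.json")  # hypothetical file name
serialize(model, path)
served = load(path)
prediction = predict(served, 10.0)
print(prediction)
```

The point of the sketch is the separation: the process that trains the model and the process that serves it only share the serialized artifact.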
From prototype to production
[Diagram, training side: continuous features -> Vector Assembler -> Scaler -> scaled continuous feature vector; categorical features -> String Indexer -> One-hot encoder -> categorical feature vector; both vectors -> Vector Assembler -> final feature vector -> Linear Regression model -> Object Storage.]
[Diagram, serving side: end-user application -> request -> Inference API -> Load Balancer -> Model Server replicas -> prediction.]
[Diagram labels: "RCB & Workflow manager" and "Model Server".]
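The feature pipeline in the diagram (scaler, string indexer, one-hot encoder, vector assembler) can be sketched in plain Python. This is an illustration of the idea only, not the Spark ML implementation: the helper names and the sample data are assumptions, and unlike Spark's default encoder this version keeps one slot per category.

```python
def fit_scaler(rows):
    """Per-column mean and population standard deviation of continuous features."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # A constant column has zero deviation; fall back to 1.0 to avoid dividing by zero.
    stds = [(sum((v - m) ** 2 for v in c) / len(c)) ** 0.5 or 1.0
            for c, m in zip(cols, means)]
    return means, stds


def scale(row, means, stds):
    return [(v - m) / s for v, m, s in zip(row, means, stds)]


def fit_string_indexer(values):
    """Map each category to an index, most frequent first (ties broken alphabetically)."""
    counts = {}
    for v in values:
        counts[v] = counts.get(v, 0) + 1
    ordered = sorted(counts, key=lambda v: (-counts[v], v))
    return {v: i for i, v in enumerate(ordered)}


def one_hot(index, size):
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


def assemble(*vectors):
    """Concatenate feature vectors into one final feature vector."""
    out = []
    for v in vectors:
        out.extend(v)
    return out


# Hypothetical sample data: two continuous columns and one categorical column.
continuous = [[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]]
categorical = ["red", "blue", "red"]

means, stds = fit_scaler(continuous)
index = fit_string_indexer(categorical)

final = [
    assemble(scale(row, means, stds), one_hot(index[cat], len(index)))
    for row, cat in zip(continuous, categorical)
]
for vector in final:
    print(vector)
```

Each final vector would then be fed to the regression model; the fitted means, deviations, and category index have to be stored with the model so that serving applies exactly the same transformations.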
Reusable code blocks
What is a Reusable Code Block?
You can think of a Reusable Code Block as a template for
creating tasks. Users typically package frequent tasks such
as cleaning data, anonymizing data, training models etc.
Reusable Code Blocks are shared with the entire data lake
and are stored in Lentiq's global registry, meaning that any
user with access to it can reuse the code block to perform
similar tasks.
There are two possible sources for an RCB:
✓ Custom Docker image: the image needs to be uploaded to a public repository
✓ A Jupyter Notebook (using Kaniko): one can choose the notebook from a list of available published notebooks, which are shared with all the users with access to the notebook’s data lake. This makes cooperation and sharing knowledge between departments easy.
Dockerless containers. RCB and Kaniko
Kaniko is an open source tool created by Google for
building container images from a Dockerfile and pushing
them to a remote registry, without having root access to a
Docker daemon.
Kaniko enables building container images in environments
that cannot easily or securely run a Docker daemon, like a
Kubernetes cluster or a container.
It executes each command within a Dockerfile completely
in user-space, so the build does not require privileges.
Privileged mode should be avoided at all costs to ensure a
secure environment.
How does it work?
Kaniko builds as a root user within a container in an
unprivileged environment. The Kaniko executor then
fetches and extracts the base-image file system to root (the
base image is the image in the FROM line of the
Dockerfile).
It executes each command in order, and takes a snapshot of
the file system after each command. This snapshot is
created in user-space by walking the filesystem and
comparing it to the prior state that was stored in memory.
It appends any modifications to the filesystem as a new
layer to the base image, and makes any relevant changes to
image metadata. After executing every command in the
Dockerfile, the executor pushes the newly built image to
the desired registry.
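The snapshot-and-diff step described above can be illustrated with a toy sketch. It is not Kaniko's code: it walks a temporary directory, compares each snapshot with the previous one held in memory, and records the difference as a "layer". The helper names, file paths, and the three stand-in commands are all hypothetical.

```python
import hashlib
import os
import tempfile


def snapshot(root):
    """Walk the tree and return {relative_path: sha256 of file contents}."""
    state = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            with open(full, "rb") as f:
                digest = hashlib.sha256(f.read()).hexdigest()
            state[os.path.relpath(full, root)] = digest
    return state


def diff_layer(before, after):
    """Files added or modified, and files removed, since the prior snapshot."""
    changed = sorted(p for p, h in after.items() if before.get(p) != h)
    deleted = sorted(p for p in before if p not in after)
    return {"changed": changed, "deleted": deleted}


def write(root, rel, text):
    path = os.path.join(root, rel)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w") as f:
        f.write(text)


root = tempfile.mkdtemp()
# Stands in for the extracted base image file system.
write(root, os.path.join("etc", "os-release"), "base image\n")

# Stand-ins for Dockerfile commands: each one mutates the file system.
commands = [
    lambda: write(root, os.path.join("app", "requirements.txt"), "numpy\n"),
    lambda: write(root, os.path.join("app", "train.py"), "print('train')\n"),
    lambda: os.remove(os.path.join(root, "app", "requirements.txt")),
]

layers = []
state = snapshot(root)  # prior state kept in memory
for run in commands:
    run()
    new_state = snapshot(root)
    layers.append(diff_layer(state, new_state))
    state = new_state

for number, layer in enumerate(layers, start=1):
    print(number, layer)
```

Every command produces one layer that contains only what changed, which is why the base image itself never has to be rewritten.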
Running Kaniko in a Kubernetes cluster
Kaniko is run as a container in the cluster.
The Job spec needs three arguments:
✓ --dockerfile: the path to the Dockerfile
✓ --context: the path to the build context. This can be:
• a GitHub repository (cloned using an init container)
• a place Kaniko has access to, like a GCS or S3 storage bucket (compressed tar file)
• a local directory (specified with an emptyDir volume)
✓ --destination: the repository where Kaniko pushes the image (any registry supported by Docker credential helpers)
Besides the Kubernetes Job definition, we have some additional requirements:
✓ A Kubernetes cluster …
✓ A Kubernetes secret mounted as a data volume under /kaniko/.docker/config.json:
• Contains the registry credentials required for pushing the final image
• For example, to push the image to a Docker Hub repository, you need to execute:
• Otherwise, you create a Kubernetes secret with your registry credentials using the following command:
✓ A ConfigMap to store the Jupyter Notebook
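As a rough sketch of how the three arguments and the credentials secret fit together, the snippet below builds a Kubernetes Job manifest as a Python dictionary and prints it as JSON, which kubectl accepts. The job name, bucket, registry, and secret name are hypothetical placeholders, and the manifest is a minimal assumption-laden example rather than the spec used in the talk.

```python
import json


def kaniko_job(name, dockerfile, context, destination, secret_name):
    """Build a minimal Job manifest that runs the Kaniko executor once."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [{
                        "name": "kaniko",
                        "image": "gcr.io/kaniko-project/executor:latest",
                        "args": [
                            f"--dockerfile={dockerfile}",
                            f"--context={context}",
                            f"--destination={destination}",
                        ],
                        # Registry credentials appear as /kaniko/.docker/config.json.
                        "volumeMounts": [{
                            "name": "docker-config",
                            "mountPath": "/kaniko/.docker",
                        }],
                    }],
                    "volumes": [{
                        "name": "docker-config",
                        "secret": {
                            "secretName": secret_name,
                            "items": [{
                                "key": ".dockerconfigjson",
                                "path": "config.json",
                            }],
                        },
                    }],
                }
            }
        },
    }


# All values below are hypothetical placeholders.
job = kaniko_job(
    name="build-rcb-image",
    dockerfile="Dockerfile",
    context="s3://example-bucket/rcb-context.tar.gz",
    destination="registry.example.com/team/rcb-clean-data:0.1",
    secret_name="regcred",
)
print(json.dumps(job, indent=2))
```

Saving the printed JSON to a file and applying it to a cluster would start the build; the ConfigMap holding the notebook is left out to keep the example short.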
Building inside a Kubernetes cluster
Why do we need workflows?
How it used to be:
✓ Train a model
✓ Run a pipeline using scripts
✓ Set manual triggers
✓ Wait for jobs, ETL
✓ Monitor
What happened:
✓ The need for more iterations
✓ More experimentation
✓ More work for ops
✓ Tedious and repetitive tasks
✓ Reduced productivity
We needed a tool to automate, schedule, and share machine learning pipelines.
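The core of such a tool is running tasks in dependency order instead of by hand. The sketch below shows that idea with the standard-library graphlib module (Python 3.9+); the task names and their toy bodies are hypothetical, and scheduling, retries, and sharing are deliberately left out.

```python
from graphlib import TopologicalSorter


def run_workflow(tasks, dependencies):
    """Run each task after its upstream tasks.

    tasks: name -> callable taking the dict of results produced so far
    dependencies: name -> set of upstream task names
    """
    order = list(TopologicalSorter(dependencies).static_order())
    results = {}
    for name in order:
        results[name] = tasks[name](results)
    return order, results


# Hypothetical four-step pipeline.
tasks = {
    "ingest": lambda r: [3.0, None, 5.0, 4.0],
    "clean": lambda r: [v for v in r["ingest"] if v is not None],
    "train": lambda r: sum(r["clean"]) / len(r["clean"]),  # "model" = mean value
    "evaluate": lambda r: max(abs(v - r["train"]) for v in r["clean"]),
}
dependencies = {
    "ingest": set(),
    "clean": {"ingest"},
    "train": {"clean"},
    "evaluate": {"train"},
}

order, results = run_workflow(tasks, dependencies)
print(order)
print(results["train"], results["evaluate"])
```

Declaring dependencies rather than call order is what lets a workflow manager rerun only the affected steps, or trigger the whole pipeline on a schedule.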
Introducing workflow manager
Model serialization / persistence
✓ Develop - create a model in any framework (scikit-learn, SparkML, TensorFlow, etc.)
✓ Serialize - save it in a format that can be stored and transmitted over the network
✓ Serving - use the model for online / batch inference (prediction or scoring)
[Diagram: models (regression, clustering, random forest, K-means, XGBoost, neural network) built with Spark MLlib, scikit-learn, or TensorFlow -> one runtime -> Object Storage -> Model Server replicas behind a Load Balancer; the end-user application sends a request to the Inference API and receives a prediction.]
Why MLeap?
✓ MLeap is a common serialization format and execution engine for machine learning pipelines.
✓ Minimizes the effort to serve models within a production environment
✓ MLeap provides simple interfaces to execute entire ML pipelines, from feature transformers to classifiers, regressions, clustering algorithms, and neural networks
https://www.slideshare.net/JenAman/mleap-productionize-data-science-workflows-using-spark
What else is there?
✓ PMML (XML) - Predictive Model Markup Language
✓ ONNX (protobuf, DL, tensors) - Open Neural Network Exchange
✓ NNEF (DL, tensors) - Neural Network Exchange Format
✓ PFA (JSON) - Portable Format for Analytics
[Comparison chart: hard-coded models (SQL, Java, Ruby), PMML, emerging solutions (yHat, DataRobot), and enterprise solutions (Microsoft, IBM, SAS), compared on: quick to implement, open source, committed to Spark/Hadoop, API server, infrastructure.]
https://www.slideshare.net/JenAman/mleap-productionize-data-science-workflows-using-spark
What do we want from a model server?
✓ Low latency
✓ Scale fast (horizontally)
✓ Reliable and robust
✓ Model versioning and in-place updates
✓ Monitoring and management
✓ Both online and batch mode
✓ Auto scaling
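To make the serving side concrete, here is a minimal sketch of an inference endpoint built only on the Python standard library. It assumes a toy linear model held in a dictionary and a hypothetical /predict route and JSON payload shape; it shows online scoring with a version tag in the response and none of the other requirements (scaling, monitoring, batch mode).

```python
import http.client
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


def make_server(model, host="127.0.0.1", port=0):
    """Return an HTTP server that scores POST /predict requests with `model`."""

    class Handler(BaseHTTPRequestHandler):
        def do_POST(self):
            if self.path != "/predict":
                self.send_error(404)
                return
            length = int(self.headers.get("Content-Length", 0))
            payload = json.loads(self.rfile.read(length))
            predictions = [model["slope"] * x + model["intercept"]
                           for x in payload["inputs"]]
            body = json.dumps({"version": model["version"],
                               "predictions": predictions}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, *args):
            pass  # keep the sketch quiet

    return HTTPServer((host, port), Handler)


# Start one model server on a free local port and send it a single request.
server = make_server({"version": 1, "slope": 2.0, "intercept": 1.0})
threading.Thread(target=server.serve_forever, daemon=True).start()

conn = http.client.HTTPConnection("127.0.0.1", server.server_port, timeout=5)
conn.request("POST", "/predict",
             body=json.dumps({"inputs": [10.0]}),
             headers={"Content-Type": "application/json"})
response = json.loads(conn.getresponse().read())
conn.close()

server.shutdown()
server.server_close()
print(response)
```

Because the handler is stateless apart from the loaded model, several copies of this process could sit behind a load balancer, which is what makes horizontal scaling possible.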
Demo time
Auto Scaling ML serving
Roadmap
✓ One of the hardest problems to solve is scaling in a cost-effective way
✓ We might see hundreds of API calls for predictions at 01:00 PM but 100,000 calls at 07:00 PM (the time when most users use the app)
✓ We need:
• Target metric
• Min-max capacity
• Cool down period
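The three ingredients (target metric, min-max capacity, cool down period) can be combined in a small target-tracking sketch. The proportional rule used here is the same shape as the one documented for the Kubernetes Horizontal Pod Autoscaler; the class name, the numbers, and the choice to apply the cool down to both scale-out and scale-in are assumptions made for illustration.

```python
import math


def desired_replicas(current, metric, target, min_replicas, max_replicas):
    """Scale proportionally to metric / target, clamped to the allowed capacity."""
    wanted = math.ceil(current * metric / target)
    return max(min_replicas, min(max_replicas, wanted))


class Autoscaler:
    def __init__(self, target, min_replicas, max_replicas, cooldown_s):
        self.target = target
        self.min_replicas = min_replicas
        self.max_replicas = max_replicas
        self.cooldown_s = cooldown_s
        self.replicas = min_replicas
        self.last_change = None  # time of the last scaling action, if any

    def observe(self, now_s, metric_per_replica):
        """Feed one metric sample; return the replica count after the decision."""
        wanted = desired_replicas(self.replicas, metric_per_replica, self.target,
                                  self.min_replicas, self.max_replicas)
        in_cooldown = (self.last_change is not None
                       and now_s - self.last_change < self.cooldown_s)
        if wanted != self.replicas and not in_cooldown:
            self.replicas = wanted
            self.last_change = now_s
        return self.replicas


# Target: 100 requests/s per replica, 2 to 20 replicas, 300 s cool down.
scaler = Autoscaler(target=100, min_replicas=2, max_replicas=20, cooldown_s=300)
history = [
    scaler.observe(0, 50),     # quiet period: stays at the minimum
    scaler.observe(60, 500),   # evening peak: scale out
    scaler.observe(120, 20),   # load drops, but still inside the cool down
    scaler.observe(400, 20),   # cool down elapsed: scale back in
]
print(history)
```

The cool down keeps the replica count from oscillating when traffic is noisy, at the price of paying for idle capacity a little longer.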
Monitoring and troubleshooting
Roadmap
✓ Monitoring and profiling of production traffic
✓ Monitoring of model performance
✓ Model interpretation (interpretability is as important as creating a model)
HyperParameter tuning
Roadmap
✓ Multiple trials
✓ Bayesian methods
✓ Hyper-parameter tuning using ParamGrid
✓ Run multiple experiments in parallel or sequentially
✓ Ideally, learn from previous runs and experiments to guide future experiments
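A parameter grid evaluated by parallel trials can be sketched with the standard library alone. The objective below is a made-up function standing in for "train a model and return its validation loss", and the parameter names are hypothetical. Plain grid search does not learn from earlier trials; that is what the Bayesian methods mentioned above add.

```python
import itertools
from concurrent.futures import ThreadPoolExecutor


def param_grid(space):
    """Expand {name: [values]} into the list of all parameter combinations."""
    names = sorted(space)
    return [dict(zip(names, combo))
            for combo in itertools.product(*(space[n] for n in names))]


def evaluate(params):
    """Hypothetical validation loss; stands in for training and scoring a model."""
    return (params["learning_rate"] - 0.1) ** 2 + (params["depth"] - 4) ** 2 / 100


def grid_search(space, objective, max_workers=4):
    """Run every trial (in parallel threads) and return the best one."""
    trials = param_grid(space)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        losses = list(pool.map(objective, trials))
    best = min(range(len(trials)), key=losses.__getitem__)
    return trials[best], losses[best], len(trials)


space = {"learning_rate": [0.01, 0.1, 1.0], "depth": [2, 4, 8]}
best_params, best_loss, n_trials = grid_search(space, evaluate)
print(n_trials, best_params, best_loss)
```

Since the trials are independent, the same structure works with processes or cluster jobs instead of threads when each trial is an actual training run.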
Distributed training (TensorFlow & SparkML)
Roadmap
✓ Training can take a very long time
✓ 1,000 CPUs, GPUs, TPUs
✓ HorovodRunner: distributed deep learning
Conclusions
✓ Jupyter notebooks / code can be encapsulated inside Docker containers to be shared and reused
✓ A workflow engine automates and schedules machine learning pipelines
✓ Machine learning models are queried via REST APIs
✓ Scalable model serving / inference using Model Server
References:
✓ https://github.com/mlflow/mlflow-example/
✓ https://towardsdatascience.com/the-7-steps-of-machine-learning-2877d7e5548e
✓ https://www.anaconda.com/productionizing-and-deploying-data-science-projects/
✓ https://events.linuxfoundation.org/wp-content/uploads/2017/12/Productionizing-ML-Pipelines-with-the-Portable-Format-for-Analytics-Nick-Pentreath-IBM.pdf
✓ https://hackernoon.com/a-guide-to-scaling-machine-learning-models-in-production-aa8831163846
✓ https://blog.algorithmia.com/deploying-machine-learning-at-scale/
✓ https://medium.freecodecamp.org/a-beginners-guide-to-training-and-deploying-machine-learning-models-using-python-48a313502e5a
✓ https://towardsdatascience.com/how-to-train-your-neural-networks-in-parallel-with-keras-and-apache-spark-ea8a3f48cae6
✓ https://hydrosphere.io/serving-docs/latest/components/runtimes.html

FULL ENJOY Call Girls In Mahipalpur Delhi Contact Us 8377877756
 
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 BookingVIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
VIP Call Girls Palanpur 7001035870 Whatsapp Number, 24/07 Booking
 
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
(INDIRA) Call Girl Meerut Call Now 8617697112 Meerut Escorts 24x7
 
Unit 2- Effective stress & Permeability.pdf
Unit 2- Effective stress & Permeability.pdfUnit 2- Effective stress & Permeability.pdf
Unit 2- Effective stress & Permeability.pdf
 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghly
 
COST-EFFETIVE and Energy Efficient BUILDINGS ptx
COST-EFFETIVE  and Energy Efficient BUILDINGS ptxCOST-EFFETIVE  and Energy Efficient BUILDINGS ptx
COST-EFFETIVE and Energy Efficient BUILDINGS ptx
 
Thermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.pptThermal Engineering -unit - III & IV.ppt
Thermal Engineering -unit - III & IV.ppt
 
Top Rated Call Girls In chittoor 📱 {7001035870} VIP Escorts chittoor
Top Rated Call Girls In chittoor 📱 {7001035870} VIP Escorts chittoorTop Rated Call Girls In chittoor 📱 {7001035870} VIP Escorts chittoor
Top Rated Call Girls In chittoor 📱 {7001035870} VIP Escorts chittoor
 
Unit 1 - Soil Classification and Compaction.pdf
Unit 1 - Soil Classification and Compaction.pdfUnit 1 - Soil Classification and Compaction.pdf
Unit 1 - Soil Classification and Compaction.pdf
 
chapter 5.pptx: drainage and irrigation engineering
chapter 5.pptx: drainage and irrigation engineeringchapter 5.pptx: drainage and irrigation engineering
chapter 5.pptx: drainage and irrigation engineering
 
Work-Permit-Receiver-in-Saudi-Aramco.pptx
Work-Permit-Receiver-in-Saudi-Aramco.pptxWork-Permit-Receiver-in-Saudi-Aramco.pptx
Work-Permit-Receiver-in-Saudi-Aramco.pptx
 
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bangalore ☎ 7737669865 🥵 Book Your One night Stand
 
Call Girls in Netaji Nagar, Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Netaji Nagar, Delhi 💯 Call Us 🔝9953056974 🔝 Escort ServiceCall Girls in Netaji Nagar, Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
Call Girls in Netaji Nagar, Delhi 💯 Call Us 🔝9953056974 🔝 Escort Service
 
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced LoadsFEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
FEA Based Level 3 Assessment of Deformed Tanks with Fluid Induced Loads
 
Thermal Engineering Unit - I & II . ppt
Thermal Engineering  Unit - I & II . pptThermal Engineering  Unit - I & II . ppt
Thermal Engineering Unit - I & II . ppt
 
Double Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torqueDouble Revolving field theory-how the rotor develops torque
Double Revolving field theory-how the rotor develops torque
 
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
 

Productionizing Machine Learning - Bigdata meetup 5-06-2019

  • 1. Bucharest Big Data Meetup Tech talks, use cases, all big data related topics June 5th meetup 6:30 PM - 7:00 PM getting together 7:00 - 7:40 Productionizing Machine Learning, Cosmin Pintoiu & Costina Batica @ Lentiq 7:40 - 8:15 Technology showdown: database vs blockchain, Felix Crisan @ Blockchain Romania 8:15 - 8:45 Pizza and drinks sponsored by Netopia. Sponsored by Organizer Valentina Crisan
  • 2. Productionizing Machine Learning Cosmin Pintoiu – Solution Architect cosmin.pintoiu@lentiq.com Costina Batica – Software Developer
  • 3. Agenda 1. Machine Learning made easy 2. From prototype to production (motivation for workflow and RCB) 3. Reusable code blocks (implementation) 4. Workflow Manager 5. Model Server 6. Demo time 7. Roadmap 8. Conclusions and Q&A
  • 4. Machine Learning made easy (source: https://medium.freecodecamp.org/a-beginners-guide-to-training-and-deploying-machine-learning-models-using-python-48a313502e5a)
      - Gathering data
      - Preparing that data
      - Choosing a model
      - Training
      - Evaluation
      - Hyperparameter tuning
      - Prediction
  • 5.
  • 6. The truth is
      - Productionizing ML is hard.
      - Only 5% of machine learning models and data science projects will be used in a production environment.
      - Most of the models developed will be "deployed" on PowerPoint slides.
      - If we want to be among this 5%, we have to understand the problem of ML deployment and how to solve it.
  • 7. Challenges of making the DS process successful
      Small to medium companies:
      - Lack of resources
      - Lack of skills and knowledge
      - Difficult to set up their own environment
      - Need additional developers for moving models into production
      - Need DevOps for maintaining the environment
      - Struggle to define the most valuable business problem
      Enterprises:
      - Over-centralized -> smaller and localized teams cannot be agile
      - Lack of collaboration
      - Difficult to integrate new technologies in the enterprise stack
      - Lack of visibility
      - Difficult to scale and put models in production
      - Centralized data ownership
  • 8. A complete cycle
      - Gathering data
      - Preparing that data
      - Choosing a model
      - Training
      - Evaluation
      - Hyperparameter tuning
      - Prediction
      We now have a model. What's next?
      - Serialize
      - Deploy
      - Serving / Scoring
      - Scaling
      - Update
      - Monitor
  • 9.
  • 10. From prototype to production
      - Continuous features -> Vector Assembler -> Scaler -> Scaled Continuous Feature Vector
      - Categorical features -> String Indexer -> One Hot Encoder -> Categorical Feature Vector
      - Vector Assembler -> Final Feature Vector -> Linear Regression -> Model
      - Model -> Object Storage -> Model Servers behind a Load Balancer, which receives the Inference API request
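The feature pipeline on this slide can be sketched in plain Python. This is only an illustration of the stages, not the Spark ML code used in the talk; the function names (`scale`, `string_index`, `one_hot`, `assemble`) and the sample data are hypothetical.

```python
from statistics import mean, pstdev


def scale(column):
    """Standardize a continuous column to zero mean and unit variance."""
    mu, sigma = mean(column), pstdev(column)
    return [(x - mu) / sigma if sigma else 0.0 for x in column]


def string_index(column):
    """Map each category to an integer index, most frequent category first."""
    counts = {}
    for value in column:
        counts[value] = counts.get(value, 0) + 1
    ordered = sorted(counts, key=lambda v: (-counts[v], v))
    return {value: i for i, value in enumerate(ordered)}


def one_hot(index, size):
    """Encode an integer index as a one-hot vector of the given size."""
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


def assemble(*parts):
    """Concatenate several feature vectors into one final feature vector."""
    return [x for part in parts for x in part]


# Hypothetical sample data: one continuous and one categorical feature.
areas = [50.0, 70.0, 90.0]
cities = ["Bucharest", "Cluj", "Bucharest"]

scaled = scale(areas)
index = string_index(cities)
features = [
    assemble([s], one_hot(index[c], len(index)))
    for s, c in zip(scaled, cities)
]
```

Each row of `features` is the kind of final feature vector that would be fed to the regression model.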
  • 11. From prototype to production (diagram sections: RCB & Workflow manager, Model Server)
      - Continuous features -> Vector Assembler -> Scaler -> Scaled Continuous Feature Vector
      - Categorical features -> String Indexer -> One Hot Encoder -> Categorical Feature Vector
      - Vector Assembler -> Final Feature Vector -> Linear Regression -> Model
      - Model -> Object Storage -> Model Servers behind a Load Balancer
      - End User Application -> Inference API request -> Load Balancer; the prediction is returned to the application
  • 13. What is a Reusable Code Block? You can think of a Reusable Code Block (RCB) as a template for creating tasks. Users typically package frequent tasks such as cleaning data, anonymizing data, or training models. Reusable Code Blocks are shared with the entire data lake and are stored in Lentiq's global registry, meaning that any user with access to it can reuse the code block to perform similar tasks. There are two possible sources for an RCB:
      - Custom Docker image: the image needs to be uploaded to a public repository.
      - A Jupyter Notebook (using Kaniko): one can choose the notebook from a list of available published notebooks, which are shared with all the users with access to the notebook's data lake. This makes cooperation and sharing knowledge between departments easy.
  • 15. Dockerless containers: RCB and Kaniko. Kaniko is an open source tool created by Google for building container images from a Dockerfile and pushing them to a remote registry, without having root access to a Docker daemon. Kaniko enables building container images in environments that cannot easily or securely run a Docker daemon, like a Kubernetes cluster or a container. It executes each command within a Dockerfile completely in user-space, so the build does not require privileges. Privileged mode should be avoided at all costs to ensure a secure environment.
  • 16. How does it work? Kaniko builds as a root user within a container in an unprivileged environment. The Kaniko executor then fetches and extracts the base-image file system to root (the base image is the image in the FROM line of the Dockerfile). It executes each command in order, and takes a snapshot of the file system after each command. This snapshot is created in user-space by walking the filesystem and comparing it to the prior state that was stored in memory. It appends any modifications to the filesystem as a new layer to the base image, and makes any relevant changes to image metadata. After executing every command in the Dockerfile, the executor pushes the newly built image to the desired registry.
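The snapshot-and-diff step described on this slide can be illustrated with a small simulation. This is a simplified sketch, not Kaniko's actual code: the filesystem is modelled as a dictionary of path to bytes, and the names `snapshot`, `diff_layer`, and the sample commands are hypothetical.

```python
import hashlib


def snapshot(filesystem):
    """Record the state of a simulated filesystem as path -> content hash."""
    return {path: hashlib.sha256(data).hexdigest()
            for path, data in filesystem.items()}


def diff_layer(previous, filesystem):
    """Compare the filesystem to the prior snapshot and return the new layer."""
    current = snapshot(filesystem)
    changed = {p: filesystem[p] for p, h in current.items()
               if previous.get(p) != h}
    deleted = sorted(p for p in previous if p not in current)
    return changed, deleted, current


# The extracted base image (the image named in the FROM line).
fs = {"/etc/os-release": b"base image"}
state = snapshot(fs)

# Simulated Dockerfile commands, each one modifying the filesystem.
commands = [
    lambda f: f.update({"/app/requirements.txt": b"mleap\n"}),
    lambda f: f.update({"/app/notebook.ipynb": b"{}"}),
]

layers = []
for run in commands:
    run(fs)                                    # execute the command
    changed, deleted, state = diff_layer(state, fs)
    layers.append(sorted(changed))             # one new layer per command
```

After the loop, `layers` holds one entry per command, each listing only the files that command added or modified.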
  • 18. Running Kaniko in a Kubernetes cluster. Kaniko is run as a container in the cluster. The Job spec needs three arguments:
      - --dockerfile: a path to a Dockerfile
      - --context: the build context. This can be:
          - a GitHub repository (cloned using an init container)
          - a place Kaniko has access to, like a GCS or S3 storage bucket (compressed tar file)
          - a local directory (specified with an emptyDir volume)
      - --destination: the repository where Kaniko pushes the image (any registry supported by Docker credential helpers)
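As a rough sketch of what such a Job spec might look like, the snippet below builds the manifest as a Python dictionary and prints it as JSON (which `kubectl apply -f` accepts as well as YAML). The job name, bucket, image destination, and the secret name `regcred` are hypothetical placeholders; the executor image and the three flags are the ones documented by the Kaniko project.

```python
import json


def kaniko_job(name, dockerfile, context, destination, secret_name="regcred"):
    """Build a Kubernetes Job manifest that runs the Kaniko executor."""
    return {
        "apiVersion": "batch/v1",
        "kind": "Job",
        "metadata": {"name": name},
        "spec": {
            "template": {
                "spec": {
                    "restartPolicy": "Never",
                    "containers": [{
                        "name": "kaniko",
                        "image": "gcr.io/kaniko-project/executor:latest",
                        "args": [
                            f"--dockerfile={dockerfile}",
                            f"--context={context}",
                            f"--destination={destination}",
                        ],
                        # Registry credentials are read from /kaniko/.docker/config.json
                        "volumeMounts": [{
                            "name": "docker-config",
                            "mountPath": "/kaniko/.docker",
                        }],
                    }],
                    "volumes": [{
                        "name": "docker-config",
                        "secret": {
                            "secretName": secret_name,
                            "items": [{"key": ".dockerconfigjson",
                                       "path": "config.json"}],
                        },
                    }],
                }
            }
        },
    }


# Hypothetical values for illustration only.
job = kaniko_job(
    name="build-rcb",
    dockerfile="Dockerfile",
    context="s3://example-bucket/context.tar.gz",
    destination="example-user/example-rcb:latest",
)
print(json.dumps(job, indent=2))
```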
  • 19. Building inside a Kubernetes cluster. Besides the Kubernetes Job definition, we have some additional requirements:
      - A Kubernetes cluster …
      - A Kubernetes secret mounted as a data volume under /kaniko/.docker/config.json:
          - Contains the registry credentials required for pushing the final image
          - For example, to push the image to a Docker Hub repository, you need to execute:
          - Otherwise, you create a Kubernetes secret with your registry credentials using the following command:
      - A configmap to store the Jupyter Notebook
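The commands from this slide were not captured in the transcript. As an illustration only, the sketch below shows the general shape of the `config.json` credentials file that the secret is expected to contain, following the standard Docker client format. The helper name `docker_config` and the credentials are hypothetical.

```python
import base64
import json


def docker_config(registry, username, password):
    """Build the contents of a Docker config.json with basic-auth credentials."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8")).decode("ascii")
    return {"auths": {registry: {"auth": token}}}


# Hypothetical credentials; Docker Hub uses this registry key.
config = docker_config("https://index.docker.io/v1/", "demo-user", "demo-password")
config_json = json.dumps(config, indent=2)
print(config_json)
```

The resulting JSON is what Kaniko would read from `/kaniko/.docker/config.json` once the secret is mounted.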
  • 21. Why do we need workflows?
      How it used to be:
      - Train a model
      - Run a pipeline using scripts
      - Set manual triggers
      - Wait for jobs, ETL
      - Monitor
      What happened:
      - The need for more iterations
      - More experimentation
      - More work for ops
      - Tedious and repetitive tasks
      - Reduced productivity
      We needed a tool to automate, schedule, and share machine learning pipelines.
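To make the idea of an automated pipeline concrete, here is a minimal sketch of running tasks in dependency order with the standard library's `graphlib` (Python 3.9+). It is not Lentiq's Workflow Manager; the task names and the `run_pipeline` helper are hypothetical, and scheduling and sharing are left out.

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks it depends on.
pipeline = {
    "clean_data": set(),
    "anonymize_data": {"clean_data"},
    "train_model": {"anonymize_data"},
    "evaluate_model": {"train_model"},
    "deploy_model": {"evaluate_model"},
}


def run_pipeline(dag, tasks):
    """Run every task once, after all of its dependencies have run."""
    executed = []
    for name in TopologicalSorter(dag).static_order():
        tasks[name]()
        executed.append(name)
    return executed


# Stand-in task bodies that only record that they ran.
log = []
tasks = {name: (lambda n=name: log.append(n)) for name in pipeline}
order = run_pipeline(pipeline, tasks)
```

A real workflow engine would add triggers, retries, and a schedule on top of this ordering step.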
  • 23. Model serialization / persistence
      - Develop: create a model in any framework (scikit-learn, SparkML, TensorFlow, etc.)
      - Serialize: save it in a format that can be stored and transmitted over the network
      - Serving: use the model for online / batch inference (prediction or scoring)
      Diagram: models (Regression, Clustering, Random Forest, K-means, XGBoost, Neural Network) built with Spark MLlib, scikit-learn, or TensorFlow run on one runtime. They are kept in Object Storage and served by Model Servers behind a Load Balancer; the End User Application sends an Inference API request and receives a prediction.
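The develop, serialize, and serve steps can be sketched end to end with the standard library. This is a toy example using JSON as the serialization format, not MLeap's bundle format or any framework's real persistence API; all function names and data are hypothetical.

```python
import json


def train_linear_regression(xs, ys):
    """Develop: fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    return {"type": "linear_regression",
            "slope": slope,
            "intercept": mean_y - slope * mean_x}


def serialize(model):
    """Serialize: turn the model into bytes that can be stored or transmitted."""
    return json.dumps(model).encode("utf-8")


def load(payload):
    """Model server side: rebuild the model from the stored bytes."""
    return json.loads(payload.decode("utf-8"))


def predict(model, x):
    """Serving: score a single input with the loaded model."""
    return model["slope"] * x + model["intercept"]


model = train_linear_regression([1.0, 2.0, 3.0], [3.0, 5.0, 7.0])
payload = serialize(model)      # would be written to object storage
served = load(payload)          # would be loaded by a model server
prediction = predict(served, 4.0)
```

The point of the design is that the serving side depends only on the serialized payload, not on the training code.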
  • 24. Why MLeap?
      - MLeap is a common serialization format and execution engine for machine learning pipelines.
      - It minimizes the effort to serve models within a production environment.
      - MLeap provides simple interfaces to execute entire ML pipelines, from feature transformers to classifiers, regressions, clustering algorithms, and neural networks.
  • 25. What else is there? (source: https://www.slideshare.net/JenAman/mleap-productionize-data-science-workflows-using-spark)
      - PMML (XML): Predictive Model Markup Language
      - ONNX (protobuf, DL, tensors): Open Neural Network Exchange
      - NNEF (DL, tensors): Neural Network Exchange Format
      - PFA (JSON): Portable Format for Analytics
      Comparison chart: hard-coded models (SQL, Java, Ruby), PMML, emerging solutions (yHat, DataRobot), and enterprise solutions (Microsoft, IBM, SAS), compared on: quick to implement, open sourced, committed to Spark/Hadoop, API server, infrastructure.
  • 26. What do we want from a model server? (source: https://www.slideshare.net/JenAman/mleap-productionize-data-science-workflows-using-spark)
      - Low latency
      - Scale fast (horizontally)
      - Reliable and robust
      - Model versioning and in-place updates
      - Monitoring and management
      - Both online and batch mode
      - Auto scaling
  • 28. Roadmap: Auto Scaling ML serving
      - One of the hardest problems to solve is scaling in a cost-effective way.
      - We might see hundreds of API calls for predictions at 01:00 PM but 100,000 calls at 07:00 PM (the time when most users use the app).
      - We need:
          - Target metric
          - Min-max capacity
          - Cool down period
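The three ingredients listed on this slide (target metric, min-max capacity, cool down period) can be combined in a small sketch. It uses the proportional rule also found in the Kubernetes Horizontal Pod Autoscaler, desired = ceil(current × metric / target), which is an assumption here rather than something stated in the talk; the class name and the numbers are hypothetical.

```python
import math


def desired_replicas(current, metric, target, min_replicas, max_replicas):
    """Scale proportionally to metric/target, clamped to the min-max capacity."""
    wanted = math.ceil(current * metric / target)
    return max(min_replicas, min(max_replicas, wanted))


class Autoscaler:
    """Toy autoscaler: `metric` is the average load per replica (e.g. requests/s)."""

    def __init__(self, target, min_replicas, max_replicas, cooldown_s):
        self.target = target
        self.min_replicas = min_replicas
        self.max_replicas = max_replicas
        self.cooldown_s = cooldown_s
        self.replicas = min_replicas
        self.last_change = None

    def step(self, now_s, metric):
        wanted = desired_replicas(self.replicas, metric, self.target,
                                  self.min_replicas, self.max_replicas)
        in_cooldown = (self.last_change is not None
                       and now_s - self.last_change < self.cooldown_s)
        # Only change capacity outside the cool down period.
        if wanted != self.replicas and not in_cooldown:
            self.replicas = wanted
            self.last_change = now_s
        return self.replicas


scaler = Autoscaler(target=100, min_replicas=1, max_replicas=20, cooldown_s=300)
history = [
    scaler.step(0, 250),     # load above target: scale up
    scaler.step(60, 400),    # still in cool down: no change
    scaler.step(400, 400),   # cool down over: scale up again
    scaler.step(800, 1000),  # clamped to the maximum capacity
]
```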
  • 29. Roadmap: Monitoring and troubleshooting
      - Monitoring and profiling of production traffic
      - Monitoring of model performance
      - Model interpretation (interpretability is as important as creating a model)
  • 30. Roadmap: Hyperparameter tuning
      - Multiple trials
      - Bayesian methods
      - Hyperparameter tuning using ParamGrid
      - Ideally, learn from previous runs
      - Run multiple experiments in parallel or sequentially
      - Ideally, learn from previous experiments (to guide future experiments)
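A parameter grid in the spirit of the "ParamGrid" bullet can be sketched in plain Python: enumerate every combination, run one trial per combination, and keep the best. This is an exhaustive grid search, not a Bayesian method and not Spark's own grid builder; the parameter names and the `validation_error` stand-in are hypothetical.

```python
from itertools import product


def grid(param_grid):
    """Yield one dict per combination of the listed parameter values."""
    names = sorted(param_grid)
    for values in product(*(param_grid[name] for name in names)):
        yield dict(zip(names, values))


def validation_error(params):
    """Hypothetical stand-in for training a model and measuring its error."""
    return (params["reg_param"] - 0.1) ** 2 + (params["max_iter"] - 50) ** 2 / 1e4


param_grid = {"reg_param": [0.01, 0.1, 1.0], "max_iter": [10, 50, 100]}

# One trial per combination; the trials are independent, so they could
# equally be run in parallel.
trials = [(validation_error(p), p) for p in grid(param_grid)]
best_error, best_params = min(trials, key=lambda t: t[0])
```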
  • 31. Roadmap: Distributed training (TensorFlow & SparkML)
      - It can take a long time to train
      - 1,000 CPUs, GPUs, TPUs
      - HorovodRunner: Distributed Deep Learning
  • 32. Conclusions
      - Jupyter notebooks / code can be encapsulated inside Docker containers to be shared and reused
      - A workflow engine automates and schedules machine learning pipelines
      - Machine learning models are queried via REST APIs
      - Scalable model serving / inference using the Model Server
  • 33.
  • 34.
  • 35. References:
      - https://github.com/mlflow/mlflow-example/
      - https://towardsdatascience.com/the-7-steps-of-machine-learning-2877d7e5548e
      - https://www.anaconda.com/productionizing-and-deploying-data-science-projects/
      - https://events.linuxfoundation.org/wp-content/uploads/2017/12/Productionizing-ML-Pipelines-with-the-Portable-Format-for-Analytics-Nick-Pentreath-IBM.pdf
      - https://hackernoon.com/a-guide-to-scaling-machine-learning-models-in-production-aa8831163846
      - https://blog.algorithmia.com/deploying-machine-learning-at-scale/
      - https://medium.freecodecamp.org/a-beginners-guide-to-training-and-deploying-machine-learning-models-using-python-48a313502e5a
      - https://towardsdatascience.com/how-to-train-your-neural-networks-in-parallel-with-keras-and-apache-spark-ea8a3f48cae6
      - https://hydrosphere.io/serving-docs/latest/components/runtimes.html