Deploying machine learning models from training to production requires companies to deal with the complexity of moving workloads through different pipelines and re-writing code from scratch.
3. 85% of AI Projects Never Make it to Production
[Diagram: research environment vs. production pipeline]
Research environment: manual extraction, in-memory analysis, small-scale training, manual evaluation
Production pipeline (build from scratch with a large team): real-time ingestion, preparation at scale, training with many params & large data, real-time events & data features
Data sources: ETL, Streaming, APIs, Sync
4. Because Model Development Is Just the First Step
Develop and Test Locally
Package
─
• Dependencies
• Parameters
• Run scripts
• Build
Scale-out
─
• Load-balance
• Data partitions
• Model distribution
• AutoML
Tune
─
• Parallelism
• GPU support
• Query tuning
• Caching
Instrument
─
• Monitoring
• Logging
• Versioning
• Security
Automate
─
• CI/CD
• Workflows
• Rolling upgrades
• A/B testing
Weeks with one data scientist or developer
Months with a large team of developers, scientists, data engineers and DevOps
Production
5. What Is an Automated ML Pipeline?
[Diagram: automated ML pipeline stages]
Ingest: ETL, streaming, logs, scrapers, ..
Prepare: join, aggregate, split, ..
Train: with hyper-params, multiple algorithms
Validate: selected model with test data
Deploy ++: test, deploy, monitor model & API servers
End-to-end pipeline orchestration and tracking
Serverless: ML & analytics functions
Features/Data: fast, secure, versioned
Artifacts passed between stages: base features, train + test datasets, model, report, report metrics, RT features, feedback
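The stages above can be sketched as a chain of plain functions with a tracking record. This is a minimal illustration, not the presenters' implementation: `PipelineRun`, `run_pipeline`, the synthetic data and the threshold "model" are all hypothetical names and stand-ins chosen for the example.

```python
import random
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class PipelineRun:
    """Tracks what every step produced so a run can be inspected later."""
    artifacts: dict[str, Any] = field(default_factory=dict)
    log: list[str] = field(default_factory=list)


def ingest(artifacts):
    # Stand-in for ETL/streaming: synthetic (value, label) rows.
    rng = random.Random(0)
    values = [rng.uniform(0, 10) for _ in range(200)]
    return [(x, int(x > 6.0)) for x in values]


def prepare(artifacts):
    # Split the ingested rows into train and test datasets.
    rows = artifacts["ingest"]
    cut = int(len(rows) * 0.8)
    return {"train": rows[:cut], "test": rows[cut:]}


def accuracy(threshold, rows):
    return sum(int(x > threshold) == y for x, y in rows) / len(rows)


def train(artifacts):
    # Try several hyper-parameter values and keep the best one.
    candidates = [2.0, 4.0, 6.0, 8.0]
    train_rows = artifacts["prepare"]["train"]
    return max(candidates, key=lambda t: accuracy(t, train_rows))


def validate(artifacts):
    # Score the selected model on the held-out test data.
    return accuracy(artifacts["train"], artifacts["prepare"]["test"])


def deploy(artifacts):
    # Gate deployment on the validation metric (0.9 is an assumed gate).
    if artifacts["validate"] < 0.9:
        raise RuntimeError("model failed the validation gate")
    threshold = artifacts["train"]
    return lambda x: int(x > threshold)


STEPS: list[tuple[str, Callable]] = [
    ("ingest", ingest),
    ("prepare", prepare),
    ("train", train),
    ("validate", validate),
    ("deploy", deploy),
]


def run_pipeline(steps, run):
    """Run the steps in order, recording each output under the step name."""
    for name, step in steps:
        run.artifacts[name] = step(run.artifacts)
        run.log.append(name)
    return run


run = run_pipeline(STEPS, PipelineRun())
print(run.log, run.artifacts["train"], run.artifacts["validate"])
```

Keeping every step's output in one record is what makes the run traceable: the deployed predictor, the metric that gated it and the data it was trained on all hang off the same object.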
6. Modern Data-Science Platform Architecture
[Diagram: platform layers]
Auto ML, Experiment Tracking, Feature Store, Workflows (Kubeflow), Pipeline Orchestration
Managed Functions and Services
Serverless Automation
Shared GPU/CPU Resources
Data layer: data lake or object store, real-time data and DBaaS
7. Serverless Enables:
Resource Elasticity, Automated Deployment and Operations
                 Serverless Today      Data Prep and Training
Task lifespan    Millisecs to mins     Secs to hours
Scaling          Load-balancer         Partition, shuffle, reduce, hyper-params, RDD
State            Stateless             Stateful
Input            Event                 Params, datasets
So why not use Serverless for training and data prep?
It is time to extend serverless to data science!
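One way to picture the "hyper-params" scaling mode is a stateless task fanned out over a parameter grid. The sketch below uses a local thread pool as a stand-in for an elastic serverless runtime; `train_task`, `fan_out` and the toy scoring formula are hypothetical and only illustrate the shape of the idea.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def train_task(params):
    """Hypothetical task: takes params in, returns a result, keeps no state."""
    lr, depth = params["lr"], params["depth"]
    # Toy objective standing in for a real training run.
    score = 1.0 - abs(lr - 0.1) - 0.05 * abs(depth - 4)
    return {"params": params, "score": score}


def fan_out(task, grid, max_workers=4):
    """Expand the grid into all combinations and run one task per combination."""
    combos = [dict(zip(grid, values)) for values in product(*grid.values())]
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(task, combos))


grid = {"lr": [0.01, 0.1, 0.5], "depth": [2, 4, 8]}
results = fan_out(train_task, grid)
best = max(results, key=lambda r: r["score"])
print(len(results), best["params"])
```

Because each task receives everything it needs through its parameters, the same pattern would carry over to data partitions: replace the grid with a list of partition identifiers.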
8. ML & Analytics Functions Architecture
[Diagram: ML function inside an ML pipeline]
ML Pipeline: Inputs, ML Function, Outputs
ML Function: user code OR ML service
Runtime / SaaS (e.g. Spark, Dask, Horovod, Nuclio, ..)
Connected resources: data / feature stores, secrets, artifacts & models, ops
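The diagram suggests that user code only sees a context object carrying inputs, parameters and secrets, and writes results and artifacts back through it. A minimal sketch of that contract follows; `FunctionContext` and its methods are hypothetical names for illustration, not the API of any specific library.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class FunctionContext:
    """Hypothetical context handed to an ML function by the runtime."""
    params: dict[str, Any]
    inputs: dict[str, Any]
    secrets: dict[str, str] = field(default_factory=dict)
    results: dict[str, Any] = field(default_factory=dict)
    artifacts: dict[str, Any] = field(default_factory=dict)

    def log_result(self, key, value):
        # Scalar outputs, e.g. metrics.
        self.results[key] = value

    def log_artifact(self, key, obj):
        # Larger outputs, e.g. datasets or models.
        self.artifacts[key] = obj


def summarize(context):
    """User code: reads inputs and params, logs outputs, touches nothing else."""
    rows = context.inputs["dataset"]
    column = context.params["column"]
    values = [row[column] for row in rows]
    context.log_result("mean", sum(values) / len(values))
    context.log_artifact("summary", {"count": len(values), "max": max(values)})


ctx = FunctionContext(
    params={"column": "amount"},
    inputs={"dataset": [{"amount": 10}, {"amount": 20}, {"amount": 60}]},
)
summarize(ctx)
print(ctx.results, ctx.artifacts)
```

Since the function depends only on the context, the same code could run locally or be placed as a pipeline step, with the runtime deciding where inputs come from and where outputs are stored.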
9. Kubeflow + Serverless: Automated ML Pipelines
What is Kubeflow?
▪ Operators for ML frameworks (lifecycle management, scale-out, ..)
▪ Managed notebooks
▪ ML pipeline automation
▪ With serverless, we automate the deployment, execution, scaling and monitoring of our code
10. Automating The Development & Tracking Workflow
[Diagram: development and tracking workflow]
Steps: write and test locally; specify runtime configuration (image, deps, cpu/gpu/mem, data, volumes, ..); run/scale on the cluster; build (if needed); document & publish; run in a pipeline
Use published functions
Track experiments/runs, functions and data
11. MLOps Automation: The CI/CD Way
[Diagram: CI/CD workflow]
Steps: write and test locally; specify runtime & pipeline config (image, deps, cpu/gpu/mem, data, volumes, .., steps); build (if needed); document & publish; run in a pipeline
Track experiments/runs, functions and data
Trigger: process pull request (automated)
Feedback (comment)
Demo: https://github.com/mlrun/demo-github-actions
12. Case Study: Payoneer
Fraud Prevention
• 4M global customers
• 200 countries and territories - streaming global commerce
• Understanding illicit patterns of behavior in real time based on 90 different parameters
• Proactively preventing money laundering before it occurs
Want to move from fraud detection to prevention and cut time to production
13. Traditional Fraud-Detection Architecture (Hadoop)
SQL Server operational database -> ETL to the DWH every 30 min -> data warehouse mirror table -> offline processing (SQL) -> feature vector -> batch prediction using R Server
40 minutes to identify a suspicious money-laundering account
40 Precious Minutes (detect fraud after the fact)
Long and complex process to production
14. Moving To Real-Time Fraud Prevention
SQL Server operational database -> CDC (real-time) -> real-time ingestion -> online + offline feature store
Model training (sklearn)
Model inferencing (Nuclio) -> block account!
Queue
Analysis
12 Seconds (prevent fraud)
12 seconds to detect and prevent fraud!
Automated dev to production using a serverless approach
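The real-time path can be sketched as an event handler that updates online features and scores them on every event. This is only an illustration under stated assumptions: the slide's model is trained with sklearn on 90 parameters, whereas the sketch substitutes a simple hypothetical rule (`MAX_TRANSFERS_IN_WINDOW`), and `OnlineFeatureStore` and `handle_event` are invented names.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 60
MAX_TRANSFERS_IN_WINDOW = 3  # hypothetical rule standing in for the trained model


class OnlineFeatureStore:
    """Keeps a sliding window of recent events per account."""

    def __init__(self):
        self._events = defaultdict(deque)

    def update(self, account, timestamp, amount):
        window = self._events[account]
        window.append((timestamp, amount))
        # Drop events that fell out of the time window.
        while window and window[0][0] <= timestamp - WINDOW_SECONDS:
            window.popleft()
        return {"count": len(window), "total": sum(a for _, a in window)}


def score(features):
    # Stand-in for model inference: flag unusually frequent transfers.
    return features["count"] > MAX_TRANSFERS_IN_WINDOW


def handle_event(store, blocked, event):
    """Process one change event; block the account as soon as it is flagged."""
    features = store.update(event["account"], event["ts"], event["amount"])
    if score(features):
        blocked.add(event["account"])
    return event["account"] in blocked


store, blocked = OnlineFeatureStore(), set()
stream = [
    {"account": "a", "ts": 0, "amount": 100},
    {"account": "a", "ts": 10, "amount": 100},
    {"account": "b", "ts": 15, "amount": 50},
    {"account": "a", "ts": 20, "amount": 100},
    {"account": "a", "ts": 30, "amount": 100},
]
decisions = [handle_event(store, blocked, e) for e in stream]
print(decisions, blocked)
```

The difference from the batch architecture is that features are updated and scored per event, so the blocking decision is available before the next transfer rather than after the next ETL cycle.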
15. Models Require Continuous Monitoring And Updates
MLOps lifecycle with drift detection:
• Automated data-prep and training
• Automated model deployment
• Real-time model & drift monitoring
• Periodic drift analysis
• Automated remediation
• Retrain, ensembles, …
[Diagram: drift detection loop]
Components: training; batch (Parquet) reference data; serving; requests; tracking stream; real-time model monitoring; TSDB; model analysis; serverless drift detection; fix
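Periodic drift analysis can be sketched by comparing the reference (training) distribution of a feature with what the serving side observes. The slides do not name a metric; the example below assumes the Population Stability Index (PSI) with a 0.2 alert threshold, a common rule of thumb, and all function names are hypothetical.

```python
import math
from bisect import bisect_right

DRIFT_THRESHOLD = 0.2  # assumed rule-of-thumb threshold, not from the slides


def bin_fractions(values, edges, eps=1e-6):
    """Fraction of values per bin; out-of-range values go to the edge bins."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        i = min(max(bisect_right(edges, v) - 1, 0), len(counts) - 1)
        counts[i] += 1
    # eps avoids log(0) for empty bins.
    return [max(c / len(values), eps) for c in counts]


def psi(reference, serving, edges):
    """Population Stability Index between reference and serving samples."""
    ref = bin_fractions(reference, edges)
    cur = bin_fractions(serving, edges)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


def needs_remediation(reference, serving, edges):
    # A True result would trigger the "fix" step, e.g. retraining.
    return psi(reference, serving, edges) > DRIFT_THRESHOLD


edges = [0, 2, 4, 6, 8]
reference = [1, 2, 3, 4, 5, 6, 7, 8]
shifted = [7, 7, 7, 8, 8, 8, 7, 7]
print(needs_remediation(reference, reference, edges))
print(needs_remediation(reference, shifted, edges))
```

In the loop shown on the slide, the reference sample would come from the training batch and the serving sample from the tracking stream, with the check itself running as a scheduled function.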