Clipper: A Low-Latency Online Prediction Serving System
Daniel Crankshaw
Spark Summit East, February 2017
[Figure: Learning: Big Data → Training → Big Model]
Timescale: minutes to days
Systems: offline and batch optimized
Heavily studied; a major focus of the AMPLab
[Figure: Learning (Big Data → Training → Big Model) followed by Inference: an Application sends a Query and receives a Decision; the inference system is marked "?"]
[Figure: Learning (Big Data → Training → Big Model) → Inference (Application sends a Query, receives a Decision)]
Timescale: ~20 milliseconds
Systems: online and latency optimized
Less studied…
[Figure: Learning → Inference, with Feedback from the Application's decisions flowing back to Learning]
Timescale: hours to weeks
Systems: combination of systems
Less studied…
[Figure: the full Learning → Inference → Feedback loop, annotated:]
Responsive (~10ms)
Adaptive (~1 second)
Example: Fraud Detection
Serving Predictions Today: Offline Scoring
[Figure: An Offline Batch System trains a Big Model on Big Data, then scores a table of inputs X into predictions Y]
[Figure: The X → Y table is loaded into an Online Serving System; each Application Query is answered by looking up the Decision in a KV-Store]
Problems:
• Requires full set of queries ahead of time
  - Small and bounded input domain
• Wasted computation and space
  - Can render and store unneeded predictions
• No feedback and costly to update
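To make these problems concrete, here is a minimal Python sketch, assuming a hypothetical threshold "fraud model" and a plain dict standing in for the KV-store. Every prediction is rendered in advance, so a query outside the precomputed domain has no answer, and updating the model means rescoring the whole domain.

```python
from typing import Callable, Dict, Hashable, Iterable, Optional


def precompute_scores(
    model: Callable[[Hashable], str], query_domain: Iterable[Hashable]
) -> Dict[Hashable, str]:
    """Offline batch scoring: render a prediction for every anticipated query."""
    return {x: model(x) for x in query_domain}


def serve(kv_store: Dict[Hashable, str], query: Hashable) -> Optional[str]:
    """Online path: a plain key-value lookup; None if the query was never scored."""
    return kv_store.get(query)


def toy_model(amount: int) -> str:
    # Hypothetical model: flag any transaction above a fixed amount.
    return "fraud" if amount > 1000 else "ok"


# The input domain must be small, bounded, and known ahead of time.
store = precompute_scores(toy_model, range(0, 2001, 500))
print(serve(store, 1500))  # prints fraud
print(serve(store, 1234))  # prints None: this amount was never precomputed
```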
Serving Predictions Today: Online Scoring
[Figure: The Application sends a Query to an Online Serving System, which renders the prediction with the model in real time and returns a Decision]
[Figure: Fraud detection: Fraud Dataset → Training → Big Model; the Application sends a Query and receives a Decision; Feedback flows back to Learning]
Many applications and many models
[Figure: Applications (Content Rec., Fraud Detection, Personal Asst., Robotic Control, Machine Translation) and ML frameworks (Create, VW, Caffe), connected by an unknown layer ("???")]
Can we decouple models and applications?
Requirements
• The system must not stand in the way of applications and models evolving independently; it should enable their separate evolution and development
• From the perspective of the data scientist:
  - Ease of application evolution: model rollout and application deployment
  - Support for the wide range of frameworks that data scientists use
  - Improve accuracy using cutting-edge techniques and frameworks
  - Experiment with models in live predictions
  - Don't have to worry about applications (performance)
• From the perspective of the frontend developer:
  - Stable, reliable, performant APIs (need systems that meet their SLOs)
  - Scale the system and hardware to meet application demands
  - Don't worry about models (oblivious to the underlying implementations)
Prediction-Serving System: Requirements
• Decouple applications from models and allow them to evolve independently from each other
• The Frontend Dev perspective: focus on building reliable, low-latency applications
  - Provide stable, reliable, performant APIs to meet SLAs
  - Scale system and hardware to meet application demands
  - Oblivious to the implementations of the underlying models
• The Data Scientist perspective: focus on making accurate predictions
  - Support many models and frameworks simultaneously
  - Simple deployment and online experimentation
  - (Mostly) oblivious to system performance and workload demands
Clipper
From the Frontend Dev perspective
[Figure: Applications call Clipper through a Predict/Feedback RPC/REST query interface]
Management REST API: create_application(), deploy_model(), replicate_model(), inspect_instance()
From the Data Scientist perspective
Implement Model API:
class ModelContainer:
    def __init__(self, model_data): ...
    def predict_batch(self, inputs): ...
• Implemented in many languages
  - Python
  - Java
  - C/C++
  - R
  - …
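As an illustration of the two-method Model API, here is a minimal Python sketch of a container wrapping a hypothetical linear model. The `model_data` layout (a dict with `weights` and `bias`) is an assumption made for this example; it is not Clipper's shipped container code.

```python
from typing import List, Sequence


class ModelContainer:
    """Hypothetical container exposing a linear scorer through the Model API."""

    def __init__(self, model_data: dict):
        # model_data is assumed to hold learned weights and a bias term.
        self.weights: List[float] = list(model_data["weights"])
        self.bias: float = float(model_data["bias"])

    def predict_batch(self, inputs: Sequence[Sequence[float]]) -> List[float]:
        # One output per input, in order, so the serving layer can batch
        # several queries into a single call and split the results back out.
        return [
            sum(w * x for w, x in zip(self.weights, row)) + self.bias
            for row in inputs
        ]


mc = ModelContainer({"weights": [2.0, -1.0], "bias": 0.5})
print(mc.predict_batch([[1.0, 1.0], [0.0, 3.0]]))  # prints [1.5, -2.5]
```

Because the container only has to honor `predict_batch`, the same shape could wrap any framework's model.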
From the Data Scientist perspective
Model implementation packaged in container
[Figure: Clipper connects over RPC to Model Containers (MC), e.g. one running Caffe]
Clipper Decouples Applications and Models
[Figure: Applications → Predict/Feedback RPC/REST interface → Clipper → RPC → Model Containers (MC), e.g. Caffe]
Clipper Generalizes Models Across ML Frameworks
[Figure: Applications (Content Rec., Fraud Detection, Personal Asst., Robotic Control, Machine Translation) → Clipper → ML frameworks (Create, VW, Caffe)]
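The decoupling idea can be illustrated with a toy, in-process Python sketch. `ToyPredictionServer`, `add_model`, and `set_application_model` are hypothetical names invented here, not Clipper's management API; the point is only that an application addresses a stable name while the model behind it can be swapped.

```python
from typing import Callable, Dict, List, Sequence

# Any callable with the Model API's predict_batch shape can be plugged in.
PredictBatch = Callable[[Sequence[float]], List[float]]


class ToyPredictionServer:
    """Hypothetical in-process stand-in for a layer between apps and models."""

    def __init__(self) -> None:
        self.models: Dict[str, PredictBatch] = {}
        self.app_to_model: Dict[str, str] = {}

    def add_model(self, name: str, predict_batch: PredictBatch) -> None:
        self.models[name] = predict_batch

    def set_application_model(self, app: str, model_name: str) -> None:
        self.app_to_model[app] = model_name

    def predict(self, app: str, x: float) -> float:
        # The application names only itself, never a framework or a model.
        model = self.models[self.app_to_model[app]]
        return model([x])[0]


server = ToyPredictionServer()
server.add_model("v1", lambda xs: [2 * x for x in xs])
server.set_application_model("fraud-detection", "v1")
print(server.predict("fraud-detection", 3.0))  # prints 6.0

# Roll out a new model; the application's call is unchanged.
server.add_model("v2", lambda xs: [3 * x for x in xs])
server.set_application_model("fraud-detection", "v2")
print(server.predict("fraud-detection", 3.0))  # prints 9.0
```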
DEMO
Clipper
[Figure: Clipper sits between applications and ML frameworks (Create, VW, Caffe)]
Key Insight:
The challenges of prediction serving can be addressed between end-user applications and machine learning frameworks.
As a result, Clipper is able to:
• hide complexity
  - by providing a common prediction interface to applications
• bound latency and maximize throughput
  - through caching, adaptive batching, and model scale-out
• enable robust online learning and personalization
  - through model selection and ensemble algorithms
without modifying machine learning frameworks or end-user applications.
Challenges
• Managing heterogeneity everywhere
  - Different types of models (different software, different resource requirements) in a production environment
  - Different application performance requirements (workloads, latencies)
• Scheduling (space-time resource management)
  - Where and when to send prediction queries to models
• Latency-accuracy tradeoffs
  - Marginal utility of allocating additional resources
• How to use feedback to improve accuracy in real time
Clipper Architecture
[Figure: Applications → Predict/Observe RPC/REST interface → Clipper → RPC → Model Containers (MC), e.g. Caffe]
• Model Selection Layer: improve accuracy through bandit methods and ensembles, online learning, and personalization
• Model Abstraction Layer: provide a common interface to models while bounding latency and maximizing throughput
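The slides do not say which bandit method the Model Selection Layer uses. As one hedged illustration, here is a Python sketch of Exp3, a standard bandit algorithm for choosing among k models from reward feedback. The exploration rate, the reward definition (1 for a correct prediction, 0 otherwise), and the simulated model accuracies are all assumptions.

```python
import math
import random
from typing import List


class Exp3Selector:
    """Exp3 bandit: choose one of k models per query and learn from rewards."""

    def __init__(self, k: int, gamma: float = 0.1, seed: int = 0) -> None:
        self.k = k
        self.gamma = gamma  # exploration rate (assumed value)
        self.weights: List[float] = [1.0] * k
        self.rng = random.Random(seed)

    def probabilities(self) -> List[float]:
        # Mix the weight-proportional distribution with uniform exploration.
        total = sum(self.weights)
        return [(1 - self.gamma) * w / total + self.gamma / self.k for w in self.weights]

    def select(self) -> int:
        return self.rng.choices(range(self.k), weights=self.probabilities())[0]

    def update(self, chosen: int, reward: float) -> None:
        # reward must be in [0, 1]; dividing by the selection probability
        # gives an unbiased estimate of the chosen model's reward.
        estimate = reward / self.probabilities()[chosen]
        self.weights[chosen] *= math.exp(self.gamma * estimate / self.k)
        # Rescale so the largest weight is 1; probabilities are unchanged.
        top = max(self.weights)
        self.weights = [w / top for w in self.weights]


# Simulated feedback: model 1 is right more often (hypothetical accuracies).
selector = Exp3Selector(k=2, gamma=0.1, seed=42)
env = random.Random(7)
accuracy = [0.3, 0.9]
for _ in range(2000):
    i = selector.select()
    selector.update(i, 1.0 if env.random() < accuracy[i] else 0.0)
print([round(p, 3) for p in selector.probabilities()])
```

The uniform `gamma / k` term keeps every model reachable, so the selector can notice if a previously weak model starts doing better.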
Clipper Architecture
[Figure: Applications → Predict/Observe RPC/REST interface → Model Selection Layer (Selection Policy) → Model Abstraction Layer (Caching, Adaptive Batching) → RPC → Model Containers (MC), e.g. Caffe]
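Model Abstraction Layer
[Figure: Model Abstraction Layer (Caching, Adaptive Batching) beneath a Correction Layer (Correction Policy), connected over RPC to Model Containers (MC), e.g. Caffe]
Provide a common interface to models while bounding latency and maximizing throughput.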
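The slides list caching as one job of the Model Abstraction Layer but do not describe its design. A minimal Python sketch, assuming least-recently-used eviction and cache keys of (model name, input): a repeated query for the same input skips the model call entirely.

```python
from collections import OrderedDict
from typing import Callable, Hashable, List, Sequence, Tuple

Key = Tuple[str, Hashable]


class PredictionCache:
    """LRU cache of (model, input) -> prediction in front of model containers."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.entries: "OrderedDict[Key, float]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def predict(
        self,
        model: str,
        x: Hashable,
        predict_batch: Callable[[Sequence[Hashable]], List[float]],
    ) -> float:
        key = (model, x)
        if key in self.entries:
            self.hits += 1
            self.entries.move_to_end(key)  # mark as most recently used
            return self.entries[key]
        self.misses += 1
        y = predict_batch([x])[0]  # cache miss: evaluate the model
        self.entries[key] = y
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used
        return y


calls = []


def slow_model(xs):
    calls.append(list(xs))  # record each real model invocation
    return [x * 10 for x in xs]


cache = PredictionCache(capacity=2)
cache.predict("m", 1, slow_model)
cache.predict("m", 1, slow_model)  # served from the cache
cache.predict("m", 2, slow_model)
cache.predict("m", 3, slow_model)  # evicts ("m", 1)
cache.predict("m", 1, slow_model)  # miss again
print(cache.hits, cache.misses, len(calls))  # prints 1 4 4
```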
Common Interface → Simplifies Deployment:
• Evaluate models using original code & systems
• Models run in separate processes as Docker containers
  - Resource isolation
  - Scale-out
Adaptive Batching to Improve Throughput
• Problem: frameworks are optimized for batch processing, not latency
• A single page load may generate many queries
• Optimal batch size depends on:
  - hardware configuration
  - model and framework
  - system load
• Clipper solution: be as slow as allowed…
  - Increase batch size until the latency objective is exceeded (Additive Increase)
  - If latency exceeds the SLO, cut batch size by a fraction (Multiplicative Decrease)
• Why batching helps:
  - Hardware acceleration
  - Helps amortize system overhead
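The additive-increase/multiplicative-decrease rule fits in a few lines. A minimal Python sketch, assuming a step of one query, a 10% backoff, and a made-up linear latency model standing in for real measurements; none of these constants come from the slides.

```python
class AIMDBatchSizer:
    """Adaptive batch sizing: additive increase, multiplicative decrease vs. an SLO."""

    def __init__(self, slo_ms: float, step: int = 1, backoff: float = 0.1) -> None:
        self.slo_ms = slo_ms
        self.step = step  # additive increase per batch (assumed value)
        self.backoff = backoff  # fraction cut when the SLO is missed (assumed value)
        self.batch_size = 1

    def record(self, observed_latency_ms: float) -> int:
        """Update the batch size after a batch completes; return the next size."""
        if observed_latency_ms > self.slo_ms:
            # Multiplicative decrease, never below one query per batch.
            self.batch_size = max(1, int(self.batch_size * (1 - self.backoff)))
        else:
            # Additive increase while latency stays within the objective.
            self.batch_size += self.step
        return self.batch_size


def simulated_latency_ms(batch_size: int) -> float:
    # Hypothetical cost model: fixed per-batch overhead plus per-query work.
    return 2.0 + 0.5 * batch_size


sizer = AIMDBatchSizer(slo_ms=20.0)
for _ in range(200):
    sizer.record(simulated_latency_ms(sizer.batch_size))
print(sizer.batch_size)
```

Under this cost model the batch size climbs until a batch just misses the 20 ms SLO, then oscillates in a narrow band just below that point, which is the "be as slow as allowed" behavior.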
Batching Results
[Figure: batching results, with the latency SLO marked]
Up to 25.5x throughput increase from batching
[Figure: Learning on Big Data is slow and produces a slow-changing model; Clipper's Model Selection Layer (Selection Policy) sits in the inference path, performing real-time model selection and ensembles using Feedback from the Application]
Bring Learning into the Serving Tier
What can we learn?
• Dynamically weight mixture of experts
• Select best model for each user
• Use ensemble to estimate prediction confidence
• Don't try to retrain models
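The "what can we learn" list can be illustrated with a small online ensemble over fixed models. A minimal Python sketch, assuming binary (0/1) predictions and a Hedge-style multiplicative weight update with an assumed learning rate; this is one standard technique for weighting a mixture of experts, not a description of Clipper's own policy. The models themselves are never retrained; only their weights move.

```python
import math
from typing import Dict, List


class WeightedEnsemble:
    """Combine fixed (not retrained) models; reweight them online from feedback."""

    def __init__(self, model_names: List[str], eta: float = 0.5) -> None:
        self.eta = eta  # learning rate for the multiplicative update (assumed)
        self.weights: Dict[str, float] = {name: 1.0 for name in model_names}

    def predict(self, predictions: Dict[str, int]) -> int:
        """Weighted vote over binary labels (0/1) returned by each model."""
        score = sum(self.weights[m] * (1 if y == 1 else -1) for m, y in predictions.items())
        return 1 if score >= 0 else 0

    def confidence(self, predictions: Dict[str, int]) -> float:
        """Share of total weight that agrees with the ensemble's output."""
        label = self.predict(predictions)
        agree = sum(self.weights[m] for m, y in predictions.items() if y == label)
        return agree / sum(self.weights.values())

    def feedback(self, predictions: Dict[str, int], true_label: int) -> None:
        # Hedge-style update: shrink the weight of every model that was wrong.
        for m, y in predictions.items():
            if y != true_label:
                self.weights[m] *= math.exp(-self.eta)


ens = WeightedEnsemble(["a", "b", "c"])
# Hypothetical feedback stream: model "c" is always right, "a" and "b" wrong.
for _ in range(5):
    ens.feedback({"a": 0, "b": 0, "c": 1}, true_label=1)
print(ens.predict({"a": 0, "b": 0, "c": 1}))  # prints 1
```

Keeping one such ensemble per user is one way to approximate "select best model for each user"; the `confidence` value drops whenever heavily weighted models disagree.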
Road Map
• Open source on GitHub: https://github.com/ucbrise/clipper
  - Kick the tires, try out our tutorial
• Alpha release in mid-April
  - Focused on reliability and performance for serving single-model applications
  - First-class support for Scikit-Learn and Spark models, arbitrary Python functions
  - Coordinating initial set of features with RISE Lab sponsors and collaborators
• After alpha release
  - Support for selection policies and multi-model applications
  - Model performance monitoring to detect and correct accuracy degradation
  - New task scheduler design to leverage model and resource heterogeneity
"Clipper: A Low-Latency Online Prediction Serving System" [NSDI '17]
https://arxiv.org/abs/1612.03079
crankshaw@cs.berkeley.edu

04242024_CCC TUG_Joins and Relationshipsccctableauusergroup
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxfirstjob4
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiSuhani Kapoor
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 

Último (20)

VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
 
Edukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFxEdukaciniai dropshipping via API with DroFx
Edukaciniai dropshipping via API with DroFx
 
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
VIP High Profile Call Girls Amravati Aarushi 8250192130 Independent Escort Se...
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al BarshaAl Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
Al Barsha Escorts $#$ O565212860 $#$ Escort Service In Al Barsha
 
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
 
Week-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interactionWeek-01-2.ppt BBB human Computer interaction
Week-01-2.ppt BBB human Computer interaction
 
Mature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptxMature dropshipping via API with DroFx.pptx
Mature dropshipping via API with DroFx.pptx
 
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
VIP Call Girls Service Miyapur Hyderabad Call +91-8250192130
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data Analyst
 
CebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptxCebaBaby dropshipping via API with DroFX.pptx
CebaBaby dropshipping via API with DroFX.pptx
 
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls CP 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Punjabi Bagh 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一定制英国白金汉大学毕业证(UCB毕业证书)																			成绩单原版一比一
定制英国白金汉大学毕业证(UCB毕业证书) 成绩单原版一比一
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
 
Introduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptxIntroduction-to-Machine-Learning (1).pptx
Introduction-to-Machine-Learning (1).pptx
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 

Clipper: A Low-Latency Online Prediction Serving System: Spark Summit East talk by Dan Crankshaw

  • 1. Clipper: A Low-Latency Online Prediction Serving System. Daniel Crankshaw, Spark Summit East, February 2017.
  • 2. Learning: Big Data → Training → Big Model. Timescale: minutes to days. Systems: offline and batch optimized. Heavily studied; a major focus of the AMPLab.
  • 4. Inference: the application sends a query to the big model and receives a decision. Timescale: ~20 milliseconds. Systems: online and latency optimized. Less studied.
  • 6. Feedback: feedback from the application flows back into training. Timescale: hours to weeks. Systems: a combination of systems. Less studied.
  • 10. Serving Predictions Today: an offline batch system trains a big model on big data.
  • 11. Serving Predictions Today: Offline Scoring. The offline batch system uses the trained model to score inputs X, producing predictions Y.
  • 12. Serving Predictions Today: Offline Scoring. The application sends a query to an online serving system, which looks up the precomputed decision (X → Y) in a KV-store.
  • 13. Serving Predictions Today: Offline Scoring. The application's query goes to an online serving system that looks up the precomputed decision (X → Y) in a KV-store. Problems:
    ◦ Requires the full set of queries ahead of time
    ◦ Small and bounded input domain
    ◦ Wasted computation and space: can render and store unneeded predictions
    ◦ No feedback, and costly to update
  • 14. Serving Predictions Today: Online Scoring. The application sends a query to an online serving system, which renders the prediction with the model in real time.
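Slides 13 and 14 contrast offline and online scoring. The minimal Python sketch below makes that contrast concrete. The model is a toy function and the KV-store is a plain dict; both are assumptions for illustration and are not part of Clipper. Offline scoring can only answer inputs it precomputed. Online scoring renders a prediction for any input at request time.

```python
from typing import Callable, Dict, Hashable, Iterable, Optional

Model = Callable[[Hashable], float]  # any function from input X to prediction Y


def offline_score(model: Model, expected_queries: Iterable[Hashable]) -> Dict[Hashable, float]:
    """Batch job: precompute a prediction for every query we expect to see."""
    return {x: model(x) for x in expected_queries}


def serve_offline(kv_store: Dict[Hashable, float], x: Hashable) -> Optional[float]:
    """Offline scoring at serving time: a pure lookup; unseen inputs get no answer."""
    return kv_store.get(x)


def serve_online(model: Model, x: Hashable) -> float:
    """Online scoring: render the prediction with the model at request time."""
    return model(x)


if __name__ == "__main__":
    def toy_fraud_model(amount: float) -> float:  # stand-in for a trained model
        return 1.0 if amount > 1000 else 0.0

    kv_store = offline_score(toy_fraud_model, [10, 500, 5000])
    print(serve_offline(kv_store, 5000))        # 1.0 (precomputed)
    print(serve_offline(kv_store, 2500))        # None (never scored)
    print(serve_online(toy_fraud_model, 2500))  # 1.0 (computed on demand)
```

The None returned for the unseen input is a small instance of slide 13's first problem: offline scoring requires the full set of queries ahead of time.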
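  • 17. Many applications and many models: applications such as content recommendation, fraud detection, personal assistants, robotic control, and machine translation must be connected (???) to models built with frameworks such as Create, VW, and Caffe.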
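  • 18. Can we decouple models and applications? The same applications (content recommendation, fraud detection, personal assistants, robotic control, machine translation) and frameworks (Create, VW, Caffe) appear here, with the layer between them still unknown (???).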
  • 19. Requirements:
    ◦ The system cannot stand in the way of the independent evolution of applications and models; it should enable their separate evolution and development
    ◦ From the perspective of the data scientist:
      ▪ Ease of application evolution, model rollout, and application deployment
      ▪ Support for the wide range of frameworks data scientists use to improve accuracy with cutting-edge techniques
      ▪ Online experimentation with models
      ▪ No need to worry about application performance
    ◦ From the perspective of the frontend developer:
      ▪ Stable, reliable, performant APIs (systems that meet their SLOs)
      ▪ Scale the system and hardware to meet application demands
      ▪ No need to worry about models (oblivious to the underlying implementations)
  • 20. Requirements:
    ◦ Decouple applications from models and allow them to evolve independently of each other
    ◦ The Data Scientist perspective: focus on making accurate predictions
      ▪ Support many models and frameworks
      ▪ Simple deployment and online experimentation
      ▪ (Mostly) oblivious to system performance and workload demands
    ◦ The Frontend Dev perspective: focus on building reliable, low-latency applications
      ▪ Provide stable, reliable, performant APIs (systems that meet their SLOs)
      ▪ Scale the system and hardware to meet application demands
      ▪ Oblivious to the implementations of the underlying models
  • 21. Prediction-Serving System: Requirements
    ◦ Decouple applications from models and allow them to evolve independently of each other
    ◦ The Frontend Dev perspective: focus on building reliable, low-latency applications
      ▪ Provide stable, reliable, performant APIs to meet SLAs
      ▪ Scale the system and hardware to meet application demands
      ▪ Oblivious to the implementations of the underlying models
    ◦ The Data Scientist perspective: focus on making accurate predictions
      ▪ Support many models and frameworks simultaneously
      ▪ Simple deployment and online experimentation
      ▪ (Mostly) oblivious to system performance and workload demands
  • 22. From the Frontend Dev perspective: applications send Predict and Feedback calls to Clipper through an RPC/REST query interface, and a management REST API exposes create_application(), deploy_model(), replicate_model(), and inspect_instance().
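The decoupling described on slide 22 can be shown with a toy, in-memory sketch. The class ToyServingSystem and its method signatures are hypothetical. The method names echo the slide, but this is not Clipper's actual management API. The point is that redeploying a model changes nothing on the application's side of predict().

```python
from typing import Any, Callable, Dict, List, Tuple

PredictBatch = Callable[[List[Any]], List[Any]]


class ToyServingSystem:
    """In-memory stand-in for a prediction-serving system (hypothetical API)."""

    def __init__(self) -> None:
        self._apps: Dict[str, Tuple[str, Any]] = {}             # app -> (model name, default output)
        self._models: Dict[str, Tuple[int, PredictBatch]] = {}  # model -> (live version, batch fn)

    # Management side: used when setting up applications and rolling out models.
    def create_application(self, app: str, model: str, default_output: Any) -> None:
        self._apps[app] = (model, default_output)

    def deploy_model(self, model: str, version: int, predict_batch: PredictBatch) -> None:
        # A new version replaces the old one behind the same model name.
        self._models[model] = (version, predict_batch)

    # Query side: the only call the frontend developer depends on.
    def predict(self, app: str, x: Any) -> Dict[str, Any]:
        model, default_output = self._apps[app]
        if model not in self._models:
            # No model deployed yet: fall back to the application's default.
            return {"output": default_output, "default": True}
        version, predict_batch = self._models[model]
        return {"output": predict_batch([x])[0], "default": False, "version": version}


if __name__ == "__main__":
    serving = ToyServingSystem()
    serving.create_application("fraud", "fraud-model", default_output=0.0)
    print(serving.predict("fraud", 500))  # default: no model deployed yet
    serving.deploy_model("fraud-model", 1, lambda xs: [1.0 if x > 100 else 0.0 for x in xs])
    print(serving.predict("fraud", 500))  # same call, now answered by version 1
```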
  • 23. From the Data Scientist perspective: implement the Model API, a class ModelContainer with the methods __init__(self, model_data) and predict_batch(self, inputs).
  • 24. From the Data Scientist perspective: implement the Model API (class ModelContainer with __init__(self, model_data) and predict_batch(self, inputs)). The API is implemented in many languages: Python, Java, C/C++, R, …
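Below is a runnable Python version of the two-method Model API from slides 23 and 24, filled in with a toy linear model. The contents of model_data (a "weights" list and a "bias") are an assumption for illustration. A real container would load a trained model from a framework at this point instead.

```python
from typing import List, Sequence


class ModelContainer:
    """Sketch of the two-method Model API, backed by a toy linear model.

    Assumption: model_data looks like {"weights": [...], "bias": ...}.
    """

    def __init__(self, model_data: dict) -> None:
        self.weights: List[float] = list(model_data["weights"])
        self.bias: float = float(model_data.get("bias", 0.0))

    def predict_batch(self, inputs: Sequence[Sequence[float]]) -> List[float]:
        # The serving system hands over a whole batch and expects exactly
        # one output per input, in the same order.
        outputs = []
        for features in inputs:
            if len(features) != len(self.weights):
                raise ValueError("feature vector length does not match model weights")
            outputs.append(sum(w * x for w, x in zip(self.weights, features)) + self.bias)
        return outputs


if __name__ == "__main__":
    mc = ModelContainer({"weights": [2.0, -1.0], "bias": 0.5})
    print(mc.predict_batch([[1.0, 1.0], [3.0, 0.0]]))  # [1.5, 6.5]
```

Taking a batch rather than a single input is what lets the serving layer apply the adaptive batching described later in the talk without changing the model code.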
  • 25. From the Data Scientist perspective: the model implementation is packaged in a container, the Model Container (MC).
  • 26. From the Data Scientist perspective: Clipper communicates over RPC with each Model Container (MC), such as one wrapping a Caffe model.
  • 27. From the Data Scientist perspective: applications call Clipper's Predict/Feedback RPC/REST interface, and Clipper calls the Model Containers (MCs), such as Caffe, over RPC.
  • 28. Clipper Decouples Applications and Models: applications talk only to Clipper's Predict/Feedback RPC/REST interface, while models run behind it in Model Containers (MCs) reached over RPC.
  • 29. Clipper Generalizes Models Across ML Frameworks: Clipper sits between the applications (content recommendation, fraud detection, personal assistants, robotic control, machine translation) and the frameworks (Create, VW, Caffe).
  • 30. DEMO
  • 31. Key Insight: the challenges of prediction serving can be addressed between end-user applications and machine learning frameworks (Create, VW, Caffe). As a result, without modifying machine learning frameworks or end-user applications, Clipper is able to:
    ◦ hide complexity, by providing a common prediction interface to applications
    ◦ bound latency and maximize throughput, through caching, adaptive batching, and model scale-out
    ◦ enable robust online learning and personalization, through model selection and ensemble algorithms
  • 32. Clipper Decouples Applications and Models: as a result, it can hide complexity (a common prediction interface), bound latency and maximize throughput (caching, adaptive batching, model scale-out), and enable robust online learning and personalization (model selection and ensembles), without modifying machine learning frameworks or end-user applications.
  • 33. Challenges:
    ◦ Managing heterogeneity everywhere
      ▪ Different types of models (different software, different resource requirements) in a production environment
      ▪ Different application performance requirements (workloads, latencies)
    ◦ Scheduling (space-time resource management)
      ▪ Where and when to send prediction queries to models
    ◦ Latency-accuracy tradeoffs
      ▪ Marginal utility of allocating additional resources
    ◦ How to use feedback to improve accuracy in real time
  • 34. Clipper Architecture: applications call Clipper through a Predict/Observe RPC/REST interface, and Clipper calls the Model Containers (MCs), such as Caffe, over RPC. Two layers:
    ◦ Model Selection Layer: improve accuracy through bandit methods and ensembles, online learning, and personalization
    ◦ Model Abstraction Layer: provide a common interface to models while bounding latency and maximizing throughput
  • 35. Clipper Architecture: the Model Selection Layer applies a selection policy; below it, the Model Abstraction Layer provides caching and adaptive batching in front of the Model Containers (MCs).
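Caching in the Model Abstraction Layer can be illustrated with a bounded least-recently-used (LRU) cache keyed by model and input. This sketch assumes hashable inputs and deterministic models. The slides do not describe Clipper's actual cache design or eviction policy, so LRU here is only an illustrative choice.

```python
from collections import OrderedDict
from typing import Any, Callable, Hashable, Tuple


class PredictionCache:
    """Bounded LRU cache of predictions keyed by (model id, query input).

    Assumes inputs are hashable and a model's output for a given input
    does not change while the entry is cached.
    """

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._entries: "OrderedDict[Tuple[str, Hashable], Any]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get_or_compute(self, model_id: str, x: Hashable, predict: Callable[[Hashable], Any]) -> Any:
        key = (model_id, x)
        if key in self._entries:
            self.hits += 1
            self._entries.move_to_end(key)  # mark as most recently used
            return self._entries[key]
        self.misses += 1
        y = predict(x)  # cache miss: query the model container
        self._entries[key] = y
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)  # evict the least recently used entry
        return y


if __name__ == "__main__":
    def square(v: int) -> int:  # toy model
        return v * v

    cache = PredictionCache(capacity=2)
    print(cache.get_or_compute("m1", 3, square))  # 9 (miss: computed)
    print(cache.get_or_compute("m1", 3, square))  # 9 (hit: served from cache)
    print(cache.hits, cache.misses)               # 1 1
```

Keying on the model identifier keeps predictions from different models apart. If a model is redeployed under the same name, the key would also need a version (or the cache would need clearing) to avoid serving stale predictions.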
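  • 36. Model Abstraction Layer: below the Correction Layer (correction policy), it provides caching and adaptive batching in front of the Model Containers (MCs), such as Caffe, reached over RPC. Goal: provide a common interface to models while bounding latency and maximizing throughput.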
  • 37. The Model Abstraction Layer (caching, adaptive batching) sits below the Correction Layer (correction policy) and talks over RPC to the Model Containers (MCs), such as Caffe. Common Interface → Simplifies Deployment:
    ◦ Evaluate models using original code & systems
    ◦ Models run in separate processes as Docker containers
    ◦ Resource isolation
  • 38. The Model Abstraction Layer (caching, adaptive batching) sits below the Correction Layer (correction policy) and talks over RPC to replicated Model Containers (MCs), such as Caffe. Common Interface → Simplifies Deployment:
    ◦ Evaluate models using original code & systems
    ◦ Models run in separate processes as Docker containers
    ◦ Resource isolation
    ◦ Scale-out
    Problem: frameworks are optimized for batch processing, not latency.
  • 39. Adaptive Batching to Improve Throughput. A single page load may generate many queries.
    ◦ Why batching helps: hardware acceleration, and it helps amortize system overhead
    ◦ The optimal batch size depends on: hardware configuration; model and framework; system load
    ◦ Clipper's solution: be as slow as allowed
      ▪ Increase the batch size until the latency objective is exceeded (additive increase)
      ▪ If latency exceeds the SLO, cut the batch size by a fraction (multiplicative decrease)
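The additive-increase/multiplicative-decrease rule on slide 39 can be sketched as a small controller. The step of 1, the backoff factor of 0.9, and the linear latency model (5 ms of fixed overhead plus 0.5 ms per item) are hypothetical values chosen for illustration. The slide gives none of them.

```python
from dataclasses import dataclass


@dataclass
class AIMDBatcher:
    """Additive-increase / multiplicative-decrease batch sizing (sketch).

    slo_ms:  latency objective for processing one batch
    step:    additive increase after a batch meets the SLO (assumed value)
    backoff: fraction of the batch size kept after an SLO miss (assumed value)
    """

    slo_ms: float
    step: int = 1
    backoff: float = 0.9
    batch_size: int = 1

    def observe(self, latency_ms: float) -> int:
        """Record the latency of the last batch and return the next batch size."""
        if latency_ms <= self.slo_ms:
            self.batch_size += self.step  # additive increase: probe for more throughput
        else:
            # multiplicative decrease: back off quickly once the SLO is exceeded
            self.batch_size = max(1, int(self.batch_size * self.backoff))
        return self.batch_size


def simulated_latency_ms(batch_size: int) -> float:
    """Hypothetical cost model: fixed per-call overhead plus a per-item cost."""
    return 5.0 + 0.5 * batch_size


if __name__ == "__main__":
    batcher = AIMDBatcher(slo_ms=20.0)
    for _ in range(200):
        batcher.observe(simulated_latency_ms(batcher.batch_size))
    b = batcher.batch_size
    print(b, simulated_latency_ms(b), b / simulated_latency_ms(b))
```

With a 20 ms objective and this latency model, the largest batch that meets the SLO is 30. The controller therefore cycles between 27 and 31 rather than staying at 1. Because the 5 ms overhead is paid once per batch, a batch of 30 completes 30 items in 20 ms, compared with 1 item in 5.5 ms unbatched. That is the amortization effect the slide describes.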
  • 40. Batching Results (chart with the SLO marked): up to 25.5x throughput increase from batching.
  • 41. Clipper Architecture: applications call the Predict/Observe RPC/REST interface; the Model Selection Layer (selection policy) sits above the Model Abstraction Layer (caching, adaptive batching), which calls the Model Containers (MCs), such as Caffe, over RPC.
  • 43. Bring Learning into the Serving Tier: real-time model selection and ensembles. Clipper's Model Selection Layer (selection policy) sits in front of slow-changing models such as Caffe. What can we learn?
    ◦ Dynamically weight a mixture of experts
    ◦ Select the best model for each user
    ◦ Use an ensemble to estimate prediction confidence
    ◦ Don't try to retrain models
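Slide 34 mentions bandit methods for model selection but does not name an algorithm. The sketch below uses Exp3, one standard bandit algorithm, to choose among several deployed models online, with feedback serving as the reward. Several details are assumptions: the reward definition (1 for a correct prediction, 0 otherwise), the exploration rate gamma, and the simulated accuracies. This is not presented as Clipper's implementation.

```python
import math
import random
from typing import List


class Exp3Selector:
    """Exp3 bandit over K deployed models (illustrative sketch).

    Assumes rewards in [0, 1], e.g. 1.0 if feedback later confirms the
    chosen model's prediction was correct, else 0.0.
    """

    def __init__(self, num_models: int, gamma: float = 0.1, seed: int = 0) -> None:
        if num_models < 1 or not 0.0 < gamma <= 1.0:
            raise ValueError("need num_models >= 1 and 0 < gamma <= 1")
        self.k = num_models
        self.gamma = gamma  # exploration rate
        self.weights: List[float] = [1.0] * num_models
        self.rng = random.Random(seed)

    def probabilities(self) -> List[float]:
        # Mix the weight-proportional distribution with uniform exploration.
        total = sum(self.weights)
        return [(1.0 - self.gamma) * w / total + self.gamma / self.k for w in self.weights]

    def select(self) -> int:
        # Draw the model that will answer the next query.
        return self.rng.choices(range(self.k), weights=self.probabilities())[0]

    def update(self, chosen: int, reward: float) -> None:
        # Importance-weighted reward: dividing by the selection probability
        # keeps the estimate unbiased even though only one model was observed.
        p = self.probabilities()[chosen]
        self.weights[chosen] *= math.exp(self.gamma * (reward / p) / self.k)
        # Rescale so weights stay bounded; probabilities are unchanged.
        top = max(self.weights)
        self.weights = [w / top for w in self.weights]


if __name__ == "__main__":
    accuracies = [0.2, 0.9, 0.3]  # hypothetical per-model accuracy
    selector = Exp3Selector(num_models=3, gamma=0.1, seed=1)
    feedback = random.Random(2)  # simulates whether each prediction was correct
    for _ in range(3000):
        m = selector.select()
        correct = feedback.random() < accuracies[m]
        selector.update(m, 1.0 if correct else 0.0)
    print([round(p, 3) for p in selector.probabilities()])
```

Exp3 never stops exploring: every model keeps at least gamma/K probability of being chosen. That keeps feedback flowing for every model, not only the current favorite.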
  • 44. Road Map:
    ◦ Open source on GitHub: https://github.com/ucbrise/clipper
      ▪ Kick the tires, try out our tutorial
    ◦ Alpha release in mid-April
      ▪ Focused on reliability and performance for serving single-model applications
      ▪ First-class support for Scikit-Learn and Spark models, arbitrary Python functions
      ▪ Coordinating initial set of features with RISE Lab sponsors and collaborators
    ◦ After alpha release
      ▪ Support for selection policies and multi-model applications
      ▪ Model performance monitoring to detect and correct accuracy degradation
      ▪ New task scheduler design to leverage model and resource heterogeneity
    "Clipper: A Low-Latency Online Prediction Serving System" [NSDI '17], https://arxiv.org/abs/1612.03079
    crankshaw@cs.berkeley.edu