© 2019-2020, Anyscale.io
Ray.Serve: A new scalable machine learning model serving library on Ray
Simon Mo
xmo@anyscale.io
@simon_mo_
Ray Ecosystem
A system for building scalable Python (and Java) applications.
● Reinforcement learning
● Hyperparameter tuning and distributed training
● Serving (this talk)
● Distributed applications
● Data analytics
Big Picture: Machine Learning Lifecycle
[Diagram with three stages: Model Development (Offline Training Data, Data Collection, Cleaning & Visualization, Feature Eng. & Model Design, Training & Validation); Training (Training Pipelines, Live Data, Validation, Trained Models); Inference (End User Application, Query, Prediction, Prediction Service, Feedback, Logic)]
Goal: serve predictions for large-scale, interactive applications
[Diagram: the Inference stage. The End User Application sends a Query to the Prediction Service and receives a Prediction.]
Two common approaches
● Embed model evaluation in the web server
● Offload prediction to an external service
Embed model evaluation in server
The web server already serves HTTP routes:
/api/healthz
/api/db_query
/api/image/id/..
Prediction is added as one more route:
/api/image/predict
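A minimal sketch of the embedded approach, using only the Python standard library. The toy model, the JSON request shape, and the `call` helper are assumptions made for illustration; only the route names come from the slides.

```python
import io
import json
from wsgiref.util import setup_testing_defaults


def _load_model():
    """Hypothetical stand-in for loading a trained model."""
    def predict(pixels):
        # Toy rule: "bright" when the mean pixel value exceeds 0.5.
        return "bright" if sum(pixels) / len(pixels) > 0.5 else "dark"
    return predict


# The model is loaded once and kept as a module-level global.
MODEL = _load_model()


def app(environ, start_response):
    """WSGI application: an ordinary route plus one prediction route."""
    path = environ.get("PATH_INFO", "")
    if path == "/api/healthz":
        status, body = "200 OK", {"ok": True}
    elif path == "/api/image/predict":
        length = int(environ.get("CONTENT_LENGTH") or 0)
        payload = json.loads(environ["wsgi.input"].read(length) or b"{}")
        status, body = "200 OK", {"label": MODEL(payload["pixels"])}
    else:
        status, body = "404 Not Found", {"error": "unknown route"}
    data = json.dumps(body).encode()
    start_response(status, [("Content-Type", "application/json"),
                            ("Content-Length", str(len(data)))])
    return [data]


def call(path, payload=None):
    """Invoke the app in-process, without opening a socket."""
    raw = json.dumps(payload).encode() if payload is not None else b""
    environ = {}
    setup_testing_defaults(environ)
    environ.update({
        "PATH_INFO": path,
        "REQUEST_METHOD": "POST" if raw else "GET",
        "CONTENT_LENGTH": str(len(raw)),
        "wsgi.input": io.BytesIO(raw),
    })
    captured = {}

    def start_response(status, headers):
        captured["status"] = status

    body = b"".join(app(environ, start_response))
    return captured["status"], json.loads(body)
```

For example, `call("/api/image/predict", {"pixels": [0.9, 0.8]})` runs the model inside the same process that answers `/api/healthz`. This is where both the simplicity and the lack of isolation come from.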
The web server approach
+ Simplicity
+ End-to-end control over how the model is served
x One query at a time
x Model loaded once, as global variable
x No isolation
x No fine-grained replication
The web server approach (continued)
x Process-pool based deployment -> memory issue
[Diagram: an initial process forks worker processes; requests are load balanced across the workers]
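The slides do not spell out the memory issue. One plausible reading is that each worker process ends up holding its own copy of the model, so memory grows with the pool size. A back-of-the-envelope sketch under that assumption follows; the function name and all figures are hypothetical.

```python
def pool_memory_mb(num_workers: int, model_mb: float, base_mb: float = 50.0) -> float:
    """Estimate total memory when every worker keeps a private model copy.

    Assumption: nothing is shared between workers, so each one pays for
    its own interpreter overhead (base_mb) plus the full model (model_mb).
    """
    return num_workers * (base_mb + model_mb)


# Hypothetical figures: a 500 MB model served by 1, 4, and 16 workers.
estimates = {n: pool_memory_mb(n, model_mb=500.0) for n in (1, 4, 16)}
```

Under these assumed numbers, adding workers for throughput multiplies the model's footprint even though the model itself is identical in every worker.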
The web server approach (continued)
x No complex pipeline
○ Single model
○ Pipeline
○ A/B Test (80% / 20%)
○ Ensemble
○ Cascade (high confidence / low confidence)
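The four composition patterns named above can be sketched as plain Python functions. This only illustrates what each pattern means, not how any serving system implements it. The toy models, the majority-vote rule, and the 0.9 confidence threshold are assumptions.

```python
import random
from collections import Counter


def pipeline(stages):
    """Pipeline: feed the output of each stage into the next one."""
    def run(x):
        for stage in stages:
            x = stage(x)
        return x
    return run


def ab_test(model_a, model_b, share_a=0.8, rng=None):
    """A/B test: route roughly share_a of the queries to model_a."""
    rng = rng or random.Random()

    def run(x):
        return model_a(x) if rng.random() < share_a else model_b(x)
    return run


def ensemble(models):
    """Ensemble: query every model and return the majority label."""
    def run(x):
        votes = Counter(model(x)[0] for model in models)
        return votes.most_common(1)[0][0]
    return run


def cascade(cheap, expensive, threshold=0.9):
    """Cascade: keep the cheap answer only when its confidence is high."""
    def run(x):
        label, confidence = cheap(x)
        if confidence >= threshold:
            return label, confidence
        return expensive(x)
    return run


# Hypothetical models: each returns (label, confidence).
def small_model(x):
    return ("cat" if x > 0 else "dog", min(abs(x), 1.0))


def large_model(x):
    return ("cat" if x > 0.1 else "dog", 0.99)
```

Each pattern needs routing logic between models. That logic is awkward to express when a single model lives behind a single web server route.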
Two common approaches
● Embed model evaluation in the web server
● Offload prediction to an external service
Offload inference to external service
/api/healthz
/api/db_query
/api/image/id/..
/api/image/predict
Steps in handling a request:
-> HTTP
API Validation
Business Logic
Input Transformation
Inference
Output Transformation
Business Logic
<- HTTP
Two ways to split the steps between the Web Server and the External Service:
● External Service runs Input Transformation, Inference, and Output Transformation; Web Server runs the rest
● External Service runs Inference only; Web Server also runs Input Transformation and Output Transformation
External services are mostly “tensor-in, tensor-out”
[Diagram: the same request steps, from -> HTTP through <- HTTP, annotated with "Complexity"]
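A minimal sketch of what a "tensor-in, tensor-out" split looks like from the web server's side. The external service is simulated by a local function, and the label set, pixel normalization, and request shape are assumptions for illustration.

```python
from typing import Dict, List

LABELS = ["cat", "dog"]  # hypothetical label set


def external_service(tensor: List[float]) -> List[float]:
    """Stand-in for the remote service: numbers in, scores out, nothing else."""
    brightness = sum(tensor) / len(tensor)
    return [brightness, 1.0 - brightness]


def input_transformation(request: Dict) -> List[float]:
    """Web server side: turn raw 0-255 pixel values into a normalized tensor."""
    return [p / 255.0 for p in request["pixels"]]


def output_transformation(scores: List[float]) -> Dict:
    """Web server side: turn raw scores back into an application-level answer."""
    best = max(range(len(scores)), key=scores.__getitem__)
    return {"label": LABELS[best], "score": scores[best]}


def handle_predict(request: Dict) -> Dict:
    """Request handler that keeps everything except inference in the web server."""
    if not request.get("pixels"):           # API validation
        return {"error": "pixels required"}
    tensor = input_transformation(request)  # stays with the web server
    scores = external_service(tensor)       # would be a network call in practice
    return output_transformation(scores)    # stays with the web server
```

The transformation code lives in a different place from the model it belongs to, which is the split between model evaluation logic and transformation logic.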
External services approach
+ Separation of concerns
x Need to scale separately
x Model evaluation logic split from transformation logic
x Hard to learn
x Hard to debug
Ray.Serve
+ Simplicity
+ End to end control
+ Enable complex pipelines
+ Programmability and Observability
Serve API
Programmable Serving System
● YAML -> Python
● serve.create_backend
● serve.create_endpoint
● serve.split
● serve.scale
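To make the "YAML -> Python" idea concrete, here is a toy, in-process imitation of the four calls listed above. It is not Ray Serve and does not use Ray. The method names mirror the slide, but every argument list, the `ToyServe` class, and the `call` and `replicas` helpers are assumptions made for illustration. Consult the Ray documentation for the real signatures.

```python
import random
from typing import Callable, Dict


class ToyServe:
    """Hypothetical stand-in that mimics the shape of a programmable serving API."""

    def __init__(self, seed=None):
        self._backends: Dict[str, Callable] = {}
        self._replicas: Dict[str, int] = {}
        self._traffic: Dict[str, Dict[str, float]] = {}
        self._rng = random.Random(seed)

    def create_backend(self, tag: str, func: Callable) -> None:
        """Register a model (any callable) under a tag, with one replica."""
        self._backends[tag] = func
        self._replicas[tag] = 1

    def create_endpoint(self, name: str) -> None:
        """Declare a named endpoint with no traffic assigned yet."""
        self._traffic[name] = {}

    def split(self, endpoint: str, weights: Dict[str, float]) -> None:
        """Divide an endpoint's traffic between backends; weights must sum to 1."""
        unknown = set(weights) - set(self._backends)
        if unknown:
            raise KeyError(f"unknown backends: {sorted(unknown)}")
        if abs(sum(weights.values()) - 1.0) > 1e-9:
            raise ValueError("weights must sum to 1")
        self._traffic[endpoint] = dict(weights)

    def scale(self, tag: str, num_replicas: int) -> None:
        """Record the desired replica count (the toy does not start processes)."""
        if tag not in self._backends:
            raise KeyError(tag)
        self._replicas[tag] = num_replicas

    def replicas(self, tag: str) -> int:
        return self._replicas[tag]

    def call(self, endpoint: str, request):
        """Pick a backend according to the traffic split and invoke it."""
        weights = self._traffic[endpoint]
        if not weights:
            raise RuntimeError("no backend assigned to this endpoint")
        tags = list(weights)
        tag = self._rng.choices(tags, weights=[weights[t] for t in tags])[0]
        return self._backends[tag](request)


# Configuration expressed as ordinary Python statements instead of a YAML file.
toy = ToyServe(seed=0)
toy.create_backend("model:v1", lambda req: "v1:" + req)
toy.create_backend("model:v2", lambda req: "v2:" + req)
toy.create_endpoint("predict")
toy.split("predict", {"model:v1": 0.8, "model:v2": 0.2})
toy.scale("model:v1", 4)
```

Because the configuration is code, a traffic split or a replica count can be changed from a script or an interactive session rather than by editing and redeploying a configuration file.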
Kubernetes? Service Mesh?
● Serve provides a layer on top of Kubernetes
○ Easy to serve a simple model
○ Easy to serve a complex pipeline
○ API definition and the model in the same place
○ Built-in service mesh for flexible routing
Serve runs on top of K8s
[Diagram: Ray Serve spans six Models, which run in three Pods]
Comparison
Serve's advantages compared to:
TFServing
+ Scale to any number of nodes
+ Support arbitrary frameworks
Seldon
+ Imperative pipelines
+ Flexible queuing policy
Sagemaker
+ Better batching
+ Deploy anywhere
“Flask”
+ Fine-grained replication
+ Isolated deployment
Try it out today
pip install ray[serve]
from ray.experimental import serve
- Ready for early adopters
- #serve channel in Slack
- Coming soon:
  - Performance benchmark
  - Deployment tutorial
Questions?
