Building an ML Platform with Ray and MLflow
Amog Kamsetty and Archit Kulkarni
Ray Team @ Anyscale
The Team
Archit Kulkarni, Amog Kamsetty, Dmitri Gekhtman, Edward Oakes, Richard Liaw, Kai Fricke, Simon Mo, Kathryn Zhou
Overview of Talk
▪ What are ML Platforms?
▪ Ray and its libraries
▪ MLflow
▪ Demo: An ML Platform built with MLflow and Ray
What are ML Platforms?
Typical ML Process
(Diagram of a typical ML process; surviving labels: fuzzy search, NLP, DL, …)
Typical ML Process -- Simplified
Execution
- Feature engineering
- Training
- Including tuning
- Serving
- Offline scoring, inference
- Online serving
Management
- Tracking
- Data, Code, Configurations
- Reproducing Results
- Deployment
- Deploy in a variety of environments
Challenges with the ML Process
Data/Features
• Data Preparation
• Data Analysis
• Feature Engineering
• Data Pipeline
• Data Management/Feature Store
• Manages big data clusters
Model
• ML Expertise
• Implement SOTA ML Research
• Experimentation
• Manage GPU infrastructure
• Scalable training & hyperparameter tuning
Production
• A/B Testing
• Model Evaluation
• Analysis of Predictions
• Deploy in a variety of environments
• CI/CD
• Highly Available prediction service
Data/Research Scientist
Engineers
Challenges with the ML Process
Data/Research Scientist
Software/Data/ML Engineer
ML Platform Abstraction
ML Platforms -- Scale
- LinkedIn:
  - 500+ “AI engineers” building models; 50+ MLP engineers
  - > 50% offline compute demand (12K servers each with 256 GB RAM)
  - More than 2x a year
- Uber Michelangelo, Airbnb Bighead, Facebook FBLearner, etc.
- Globally, a few Billion $ now, growing 40%+ YoY
- Many companies building ML Platforms from the ground up
ML Platforms -- Landscape
(Source: Intel Capital)
Typical ML Process -- Simplified
Execution
- Feature engineering 🔪
- Training 🍳
- Including tuning 🧂
- Serving 🍽
- Offline scoring, inference
- Online serving
Management
- Tracking 📝
- Data, Code, Configurations
- Reproducing Results 📖
- Deployment 🚚 💻
- Deploy in a variety of environments
Ray and its Libraries
What is Ray?
• A simple/general library for distributed computing
  • Single machine or 100s of nodes
  • Agnostic to the type of work
• An ecosystem of libraries (for scaling ML and more)
  • Native: Ray RLlib, Ray Tune, Ray Serve
  • Third party: Modin, Dask, Horovod, XGBoost, PyTorch Lightning
• Tools for launching clusters on any cloud provider
Three key ideas
1. Execute remote functions as tasks, and instantiate remote classes as actors
  • Support both stateful and stateless computations
2. Asynchronous execution using futures
  • Enable parallelism
3. Distributed (immutable) object store
  • Efficient communication (send arguments by reference)
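The second idea, asynchronous execution using futures, can be sketched with Python's standard library alone. This is only an analogy, not Ray: `concurrent.futures` runs work in local threads, has no distributed object store, and cannot pass a future directly into another task. The function names mirror the slides; the list inputs are stand-ins chosen for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def read_array(values):
    # Stand-in for loading data; here the "file" is just a Python list.
    return list(values)


def add(a, b):
    # Element-wise sum of two equal-length lists.
    return [x + y for x, y in zip(a, b)]


with ThreadPoolExecutor(max_workers=2) as pool:
    # submit() returns immediately with a Future, so both reads can overlap.
    f1 = pool.submit(read_array, [1, 2, 3])
    f2 = pool.submit(read_array, [10, 20, 30])
    # result() blocks until the value is ready, similar in spirit to ray.get.
    total = add(f1.result(), f2.result())

print(total)
```

The point of the pattern is that submitting work and waiting for it are separate steps, which is what lets independent calls run in parallel.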
Ray API
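API
Functions -> Tasks
    import numpy as np

    def read_array(file):
        # Read array "a" from "file" (assumes a NumPy .npy file; the slide leaves the format open).
        a = np.load(file)
        return a

    def add(a, b):
        return np.add(a, b)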
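API
Functions -> Tasks
    import numpy as np
    import ray

    @ray.remote
    def read_array(file):
        # Read array "a" from "file" (assumes a NumPy .npy file; the slide leaves the format open).
        a = np.load(file)
        return a

    @ray.remote
    def add(a, b):
        return np.add(a, b)

    # Each .remote() call returns immediately with an object reference (a future).
    id1 = read_array.remote("/input1")
    id2 = read_array.remote("/input2")
    # References can be passed straight to another task; Ray resolves them before add runs.
    id3 = add.remote(id1, id2)
    # ray.get blocks until the result is available.
    ray.get(id3)
(Task graph: read_array -> id1, read_array -> id2, add(id1, id2) -> id3)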
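Classes -> Actors
    import ray

    @ray.remote
    class Counter(object):
        def __init__(self):
            self.value = 0

        def inc(self):
            self.value += 1
            return self.value

    c = Counter.remote()   # start the actor; its state lives in one worker process
    id4 = c.inc.remote()   # method calls also return object references
    id5 = c.inc.remote()
    ray.get([id4, id5])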
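To make the actor idea concrete without a cluster, here is a minimal standard-library sketch of the same Counter. It is not how Ray implements actors; it only illustrates the model the slides describe: state owned by a single worker, method calls queued as messages, and each call returning a future. The class name `CounterActor` and its `stop` method are hypothetical.

```python
import queue
import threading
from concurrent.futures import Future


class CounterActor:
    """Toy actor: all state is owned by one worker thread."""

    def __init__(self):
        self._value = 0
        self._mailbox = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        # Messages are handled one at a time, so _value needs no lock.
        while True:
            future = self._mailbox.get()
            if future is None:  # shutdown signal
                break
            self._value += 1
            future.set_result(self._value)

    def inc(self):
        # Enqueue the call and hand back a future immediately.
        future = Future()
        self._mailbox.put(future)
        return future

    def stop(self):
        self._mailbox.put(None)
        self._thread.join()


counter = CounterActor()
f4 = counter.inc()
f5 = counter.inc()
results = [f4.result(), f5.result()]
counter.stop()
print(results)
```

Because the mailbox is first-in, first-out and has a single consumer, the two increments are applied in the order they were submitted.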
API
Functions -> Tasks
    import numpy as np
    import ray

    @ray.remote
    def read_array(file):
        # Read array "a" from "file" (assumes a NumPy .npy file; the slide leaves the format open).
        a = np.load(file)
        return a

    @ray.remote(num_gpus=1)  # each call of this task is scheduled with one GPU reserved
    def add(a, b):
        return np.add(a, b)

    id1 = read_array.remote("/input1")
    id2 = read_array.remote("/input2")
    id3 = add.remote(id1, id2)
Classes -> Actors
    @ray.remote(num_gpus=1)  # the actor holds one GPU for its lifetime
    class Counter(object):
        def __init__(self):
            self.value = 0

        def inc(self):
            self.value += 1
            return self.value

    c = Counter.remote()
    id4 = c.inc.remote()
    id5 = c.inc.remote()
    ray.get([id4, id5])
Ray Ecosystem
(Diagram: Ray, a universal framework for distributed computing, at the base; an ecosystem of native libraries and 3rd-party libraries on top; "Your app here!" alongside them.)
Ray Tune
Ray Tune: Scalable Hyperparameter Tuning
Wide variety of algorithms: HyperBand, PBT, Bayesian optimization
Compatible with ML frameworks
Ray Tune focuses on simplifying execution
Easily launch distributed multi-GPU tuning jobs
Automatic fault tolerance to save 3x on GPU costs
    $ ray up {cluster config}
    ray.init(address="auto")
    tune.run(func, num_samples=100)
Ray Tune interoperates with other HPO libraries
• Ax
• Optuna
• scikit-optimize
• …
    # ConvNet and steps are placeholders for your own model class and step count.
    def train_model(config={}):
        model = ConvNet(config)
        for i in range(steps):
            current_loss = model.train()

    from ray import tune

    def train_model(config={}):
        model = ConvNet(config)
        for i in range(steps):
            current_loss = model.train()
            # Report the metric back to Tune after every step.
            tune.report(loss=current_loss)
    def train_model(config):
        model = ConvNet(config)
        for i in range(epochs):
            current_loss = model.train()
            tune.report(loss=current_loss)

    # A single trial with a fixed learning rate.
    tune.run(train_model,
             config={"lr": 0.1})

    # 100 trials, each with a learning rate sampled uniformly from [0.001, 0.1].
    tune.run(
        train_model,
        config={"lr": tune.uniform(0.001, 0.1)},
        num_samples=100
    )
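What the sampled search space amounts to can be sketched in plain Python. This is a stand-in for illustration only, not Ray Tune: it runs trials sequentially in one process, and the objective (a loss minimized at a learning rate of 0.03) and the helper `random_search` are hypothetical.

```python
import random


def train_model(config):
    # Hypothetical objective: pretend the loss is minimized at lr = 0.03.
    return (config["lr"] - 0.03) ** 2


def random_search(objective, low, high, num_samples, seed=0):
    rng = random.Random(seed)
    trials = []
    for _ in range(num_samples):
        # Draw one configuration per trial, uniformly from [low, high].
        config = {"lr": rng.uniform(low, high)}
        trials.append((objective(config), config))
    # The best trial is the one with the lowest loss.
    best = min(trials, key=lambda trial: trial[0])
    return best, trials


(best_loss, best_config), trials = random_search(
    train_model, 0.001, 0.1, num_samples=100
)
print(best_config, best_loss)
```

Each trial is independent of the others, which is why this kind of search parallelizes well across a cluster.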
    from ray import tune
    from ray.tune.schedulers import ASHAScheduler, PopulationBasedTraining

    def train_model(config):
        model = ConvNet(config)
        for i in range(epochs):
            current_loss = model.train()
            tune.report(loss=current_loss)

    # Early-stop poorly performing trials with ASHA.
    # metric/mode tell the scheduler which reported value to compare and in which direction.
    tune.run(
        train_model,
        config={"lr": tune.uniform(0.001, 0.1)},
        num_samples=100,
        metric="loss",
        mode="min",
        scheduler=ASHAScheduler())

    # Or swap in Population Based Training by changing only the scheduler.
    tune.run(
        train_model,
        config={"lr": tune.uniform(0.001, 0.1)},
        num_samples=100,
        metric="loss",
        mode="min",
        scheduler=PopulationBasedTraining(...))
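The idea behind an early-stopping scheduler can be illustrated with synchronous successive halving, the simpler algorithm that ASHA runs asynchronously. The sketch below is not Ray Tune's implementation; the loss curve, the reduction factor of 2, and the function names are assumptions made for the example.

```python
import random


def successive_halving(configs, train_step, max_rounds=3, reduction_factor=2):
    """Train every config a little, keep the best fraction, and repeat."""
    survivors = [{"config": c, "loss": float("inf"), "steps": 0} for c in configs]
    for _ in range(max_rounds):
        for trial in survivors:
            trial["steps"] += 1
            trial["loss"] = train_step(trial["config"], trial["steps"])
        # Rank by current loss and stop the worse-performing trials early.
        survivors.sort(key=lambda trial: trial["loss"])
        keep = max(1, len(survivors) // reduction_factor)
        survivors = survivors[:keep]
    return survivors[0]


def train_step(config, step):
    # Hypothetical loss curve: improves with more steps, best near lr = 0.03.
    return abs(config["lr"] - 0.03) + 1.0 / step


rng = random.Random(0)
configs = [{"lr": rng.uniform(0.001, 0.1)} for _ in range(8)]
best = successive_halving(configs, train_step)
print(best)
```

With eight starting configurations, only four get a second step and two get a third, so most of the compute goes to the promising trials.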
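    import os
    from ray import tune

    def train_model(config, checkpoint_dir=None):
        model = ConvNet(config)
        # Resume from a checkpoint if Tune restores this trial.
        if checkpoint_dir is not None:
            model.load_checkpoint(os.path.join(checkpoint_dir, "model.pt"))
        for i in range(epochs):
            current_loss = model.train()
            # tune.checkpoint_dir needs the current step to name the checkpoint.
            with tune.checkpoint_dir(step=i) as dir:
                model.save_checkpoint(os.path.join(dir, "model.pt"))
            tune.report(loss=current_loss)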
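The save-then-resume pattern that makes fault tolerance possible can be shown end to end with the standard library. This is a sketch under stated assumptions, not Tune's checkpoint mechanism: the state is a small dictionary stored as JSON, and the `stop_after` parameter is a hypothetical way to simulate an interrupted trial.

```python
import json
import os
import tempfile


def train(total_epochs, checkpoint_dir, stop_after=None):
    path = os.path.join(checkpoint_dir, "state.json")
    state = {"epoch": 0, "loss": 1.0}
    # Resume from the last checkpoint if one exists.
    if os.path.exists(path):
        with open(path) as f:
            state = json.load(f)
    while state["epoch"] < total_epochs:
        if stop_after is not None and state["epoch"] >= stop_after:
            break  # simulate the trial being interrupted
        state["epoch"] += 1
        state["loss"] *= 0.5  # stand-in for one epoch of training
        # Save after every epoch so at most one epoch of work is lost.
        with open(path, "w") as f:
            json.dump(state, f)
    return state


with tempfile.TemporaryDirectory() as ckpt_dir:
    first = train(total_epochs=4, checkpoint_dir=ckpt_dir, stop_after=2)
    resumed = train(total_epochs=4, checkpoint_dir=ckpt_dir)

print(first, resumed)
```

The second call picks up at epoch 2 instead of starting over, which is the saving that checkpointed trials rely on when a node is lost.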
Ray Serve
Ray Serve is a web framework built for model serving
Model serving in Python
Ray Serve is high-performance and flexible
• Framework-agnostic
• Easily scales
• Supports batching
• Query your endpoints from HTTP and from Python
• Easily integrate with other tools
Ray Serve is built on top of Ray
As a user, there is no need to think about:
• Interprocess communication
• Failure management
• Scheduling
Just tell Ray Serve to scale up your model.
Serve functions and stateful classes.
Ray Serve will use multiple replicas to parallelize across cores and across nodes in your cluster.
Ray Serve API
Flexibility
Query your model from HTTP:
> curl "http://127.0.0.1:8000/my/route"
Or query from Python using ServeHandle:
MLflow
Challenges of ML in production
• It’s difficult to keep track of experiments.
• It’s difficult to reproduce code.
• There’s no standard way to package and deploy models.
• There’s no central store to manage models (their versions and stage transitions).
Source: mlflow.org
What is MLflow?
• Open-source ML lifecycle management tool
• Single solution for all of the above challenges
• Library-agnostic and language-agnostic
• (Works with your existing code)
Four key functions of MLflow
Source: MLflow
MLflow Tracking
MLflow Models
Ray + MLflow
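Ray Tune + MLflow Tracking
    from ray import tune
    from ray.tune.integration.mlflow import MLflowLoggerCallback

    def train_model(config):
        model = ConvNet(config)
        for i in range(epochs):
            current_loss = model.train()
            tune.report(loss=current_loss)

    tune.run(
        train_model,
        config={"lr": tune.uniform(0.001, 0.1)},
        num_samples=100,
        # Pass the experiment name by keyword; the first positional argument is the tracking URI.
        callbacks=[MLflowLoggerCallback(experiment_name="my_experiment")])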
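Ray Tune + MLflow Tracking
    import mlflow
    import xgboost as xgb
    from ray import tune
    from ray.tune.integration.mlflow import mlflow_mixin

    @mlflow_mixin
    def train_model(config):
        # Let MLflow log parameters and metrics from the training library automatically.
        mlflow.autolog()
        xgboost_results = xgb.train(config, ...)

    tune.run(
        train_model,
        config={"lr": tune.uniform(0.001, 0.1)},
        num_samples=100)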
MLflow + Ray Serve
    > pip install mlflow-ray-serve
    > ray start --head
    > serve start
MLflow deployments CLI
Create deployment
    > mlflow deployments create -t ray-serve -m <model URI> --name my_model -C num_replicas=100
Model URI:
• models:/MyModel/1
• runs:/93203689db9c4b50afb6869
• s3://<bucket>/<path>
• ...
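The URI forms listed above differ mainly in their scheme, which a few lines of standard-library Python can make explicit. This is an illustrative sketch only, not MLflow's own URI resolver; the function `describe_model_uri`, its return fields, and the example bucket and run values are hypothetical.

```python
from urllib.parse import urlsplit


def describe_model_uri(uri):
    parts = urlsplit(uri)
    if parts.scheme == "models":
        # models:/<registered model name>/<version or stage>
        name, _, version = parts.path.lstrip("/").partition("/")
        return {"kind": "registry", "name": name, "version_or_stage": version}
    if parts.scheme == "runs":
        # runs:/<run id>/<artifact path relative to the run>
        run_id, _, artifact_path = parts.path.lstrip("/").partition("/")
        return {"kind": "run", "run_id": run_id, "artifact_path": artifact_path}
    # Anything else is treated as a plain storage location, e.g. s3://bucket/path.
    return {
        "kind": "storage",
        "scheme": parts.scheme,
        "location": parts.netloc + parts.path,
    }


print(describe_model_uri("models:/MyModel/1"))
print(describe_model_uri("s3://my-bucket/models/demo"))
```

Whichever form is used, the deployment command takes it through the same `-m` argument, so a model can be promoted from a run to the registry without changing the serving setup.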
MLflow deployments Python API
Create model
Integrating with Ray Serve is easy.
• Ray Serve endpoints can be called from Python.
• Clean conceptual separation:
• Ray Serve handles data plane (processing)
• MLflow handles control plane (metadata, configuration)
Demo: An ML Platform built with MLflow and Ray
Acknowledgements
Thanks to Jules Damji, Sid Murching, and Paul Ogilvie for their help and guidance with MLflow.
Thanks to Dmitri Gekhtman, Kai Fricke, Simon Mo, Edward Oakes, Richard Liaw, Kathryn Zhou and the rest of the Ray team!
Feedback
Your feedback is important to us.
Don’t forget to rate and review the sessions.

Más contenido relacionado

La actualidad más candente

Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesDatabricks
 
How to Utilize MLflow and Kubernetes to Build an Enterprise ML Platform
How to Utilize MLflow and Kubernetes to Build an Enterprise ML PlatformHow to Utilize MLflow and Kubernetes to Build an Enterprise ML Platform
How to Utilize MLflow and Kubernetes to Build an Enterprise ML PlatformDatabricks
 
MLFlow: Platform for Complete Machine Learning Lifecycle
MLFlow: Platform for Complete Machine Learning Lifecycle MLFlow: Platform for Complete Machine Learning Lifecycle
MLFlow: Platform for Complete Machine Learning Lifecycle Databricks
 
Large Language Models Bootcamp
Large Language Models BootcampLarge Language Models Bootcamp
Large Language Models BootcampData Science Dojo
 
MLflow: Infrastructure for a Complete Machine Learning Life Cycle with Mani ...
 MLflow: Infrastructure for a Complete Machine Learning Life Cycle with Mani ... MLflow: Infrastructure for a Complete Machine Learning Life Cycle with Mani ...
MLflow: Infrastructure for a Complete Machine Learning Life Cycle with Mani ...Databricks
 
An introduction to computer vision with Hugging Face
An introduction to computer vision with Hugging FaceAn introduction to computer vision with Hugging Face
An introduction to computer vision with Hugging FaceJulien SIMON
 
Productionalizing Models through CI/CD Design with MLflow
Productionalizing Models through CI/CD Design with MLflowProductionalizing Models through CI/CD Design with MLflow
Productionalizing Models through CI/CD Design with MLflowDatabricks
 
Leveraging Knowledge Graphs in your Enterprise Knowledge Management System
Leveraging Knowledge Graphs in your Enterprise Knowledge Management SystemLeveraging Knowledge Graphs in your Enterprise Knowledge Management System
Leveraging Knowledge Graphs in your Enterprise Knowledge Management SystemSemantic Web Company
 
Building NLP applications with Transformers
Building NLP applications with TransformersBuilding NLP applications with Transformers
Building NLP applications with TransformersJulien SIMON
 
Performance Analysis of Apache Spark and Presto in Cloud Environments
Performance Analysis of Apache Spark and Presto in Cloud EnvironmentsPerformance Analysis of Apache Spark and Presto in Cloud Environments
Performance Analysis of Apache Spark and Presto in Cloud EnvironmentsDatabricks
 
Apply MLOps at Scale
Apply MLOps at ScaleApply MLOps at Scale
Apply MLOps at ScaleDatabricks
 
MLOps Bridging the gap between Data Scientists and Ops.
MLOps Bridging the gap between Data Scientists and Ops.MLOps Bridging the gap between Data Scientists and Ops.
MLOps Bridging the gap between Data Scientists and Ops.Knoldus Inc.
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)James Serra
 
MLflow with Databricks
MLflow with DatabricksMLflow with Databricks
MLflow with DatabricksLiangjun Jiang
 
Machine Learning Model Deployment: Strategy to Implementation
Machine Learning Model Deployment: Strategy to ImplementationMachine Learning Model Deployment: Strategy to Implementation
Machine Learning Model Deployment: Strategy to ImplementationDataWorks Summit
 
MLflow: Infrastructure for a Complete Machine Learning Life Cycle
MLflow: Infrastructure for a Complete Machine Learning Life CycleMLflow: Infrastructure for a Complete Machine Learning Life Cycle
MLflow: Infrastructure for a Complete Machine Learning Life CycleDatabricks
 
Redis + Kafka = Performance at Scale | Julien Ruaux, Redis Labs
Redis + Kafka = Performance at Scale | Julien Ruaux, Redis LabsRedis + Kafka = Performance at Scale | Julien Ruaux, Redis Labs
Redis + Kafka = Performance at Scale | Julien Ruaux, Redis LabsHostedbyConfluent
 

La actualidad más candente (20)

Scaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark PipelinesScaling and Unifying SciKit Learn and Apache Spark Pipelines
Scaling and Unifying SciKit Learn and Apache Spark Pipelines
 
How to Utilize MLflow and Kubernetes to Build an Enterprise ML Platform
How to Utilize MLflow and Kubernetes to Build an Enterprise ML PlatformHow to Utilize MLflow and Kubernetes to Build an Enterprise ML Platform
How to Utilize MLflow and Kubernetes to Build an Enterprise ML Platform
 
MLOps.pptx
MLOps.pptxMLOps.pptx
MLOps.pptx
 
MLFlow: Platform for Complete Machine Learning Lifecycle
MLFlow: Platform for Complete Machine Learning Lifecycle MLFlow: Platform for Complete Machine Learning Lifecycle
MLFlow: Platform for Complete Machine Learning Lifecycle
 
Large Language Models Bootcamp
Large Language Models BootcampLarge Language Models Bootcamp
Large Language Models Bootcamp
 
MLflow: Infrastructure for a Complete Machine Learning Life Cycle with Mani ...
 MLflow: Infrastructure for a Complete Machine Learning Life Cycle with Mani ... MLflow: Infrastructure for a Complete Machine Learning Life Cycle with Mani ...
MLflow: Infrastructure for a Complete Machine Learning Life Cycle with Mani ...
 
An introduction to computer vision with Hugging Face
An introduction to computer vision with Hugging FaceAn introduction to computer vision with Hugging Face
An introduction to computer vision with Hugging Face
 
Productionalizing Models through CI/CD Design with MLflow
Productionalizing Models through CI/CD Design with MLflowProductionalizing Models through CI/CD Design with MLflow
Productionalizing Models through CI/CD Design with MLflow
 
What is MLOps
What is MLOpsWhat is MLOps
What is MLOps
 
Leveraging Knowledge Graphs in your Enterprise Knowledge Management System
Leveraging Knowledge Graphs in your Enterprise Knowledge Management SystemLeveraging Knowledge Graphs in your Enterprise Knowledge Management System
Leveraging Knowledge Graphs in your Enterprise Knowledge Management System
 
Building NLP applications with Transformers
Building NLP applications with TransformersBuilding NLP applications with Transformers
Building NLP applications with Transformers
 
Performance Analysis of Apache Spark and Presto in Cloud Environments
Performance Analysis of Apache Spark and Presto in Cloud EnvironmentsPerformance Analysis of Apache Spark and Presto in Cloud Environments
Performance Analysis of Apache Spark and Presto in Cloud Environments
 
Apply MLOps at Scale
Apply MLOps at ScaleApply MLOps at Scale
Apply MLOps at Scale
 
MLOps Bridging the gap between Data Scientists and Ops.
MLOps Bridging the gap between Data Scientists and Ops.MLOps Bridging the gap between Data Scientists and Ops.
MLOps Bridging the gap between Data Scientists and Ops.
 
Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)Data Lakehouse, Data Mesh, and Data Fabric (r1)
Data Lakehouse, Data Mesh, and Data Fabric (r1)
 
MLflow with Databricks
MLflow with DatabricksMLflow with Databricks
MLflow with Databricks
 
MLOps for production-level machine learning
MLOps for production-level machine learningMLOps for production-level machine learning
MLOps for production-level machine learning
 
Machine Learning Model Deployment: Strategy to Implementation
Machine Learning Model Deployment: Strategy to ImplementationMachine Learning Model Deployment: Strategy to Implementation
Machine Learning Model Deployment: Strategy to Implementation
 
MLflow: Infrastructure for a Complete Machine Learning Life Cycle
MLflow: Infrastructure for a Complete Machine Learning Life CycleMLflow: Infrastructure for a Complete Machine Learning Life Cycle
MLflow: Infrastructure for a Complete Machine Learning Life Cycle
 
Redis + Kafka = Performance at Scale | Julien Ruaux, Redis Labs
Redis + Kafka = Performance at Scale | Julien Ruaux, Redis LabsRedis + Kafka = Performance at Scale | Julien Ruaux, Redis Labs
Redis + Kafka = Performance at Scale | Julien Ruaux, Redis Labs
 

Similar a Building an ML Platform with Ray and MLflow

Ray and Its Growing Ecosystem
Ray and Its Growing EcosystemRay and Its Growing Ecosystem
Ray and Its Growing EcosystemDatabricks
 
Automate ml workflow_transmogrif_ai-_chetan_khatri_berlin-scala
Automate ml workflow_transmogrif_ai-_chetan_khatri_berlin-scalaAutomate ml workflow_transmogrif_ai-_chetan_khatri_berlin-scala
Automate ml workflow_transmogrif_ai-_chetan_khatri_berlin-scalaChetan Khatri
 
Improving the Life of Data Scientists: Automating ML Lifecycle through MLflow
Improving the Life of Data Scientists: Automating ML Lifecycle through MLflowImproving the Life of Data Scientists: Automating ML Lifecycle through MLflow
Improving the Life of Data Scientists: Automating ML Lifecycle through MLflowDatabricks
 
slide-keras-tf.pptx
slide-keras-tf.pptxslide-keras-tf.pptx
slide-keras-tf.pptxRithikRaj25
 
Building machine learning service in your business — Eric Chen (Uber) @PAPIs ...
Building machine learning service in your business — Eric Chen (Uber) @PAPIs ...Building machine learning service in your business — Eric Chen (Uber) @PAPIs ...
Building machine learning service in your business — Eric Chen (Uber) @PAPIs ...PAPIs.io
 
Python and Oracle : allies for best of data management
Python and Oracle : allies for best of data managementPython and Oracle : allies for best of data management
Python and Oracle : allies for best of data managementLaurent Leturgez
 
Rails Tips and Best Practices
Rails Tips and Best PracticesRails Tips and Best Practices
Rails Tips and Best PracticesDavid Keener
 
ACM Sunnyvale Meetup.pdf
ACM Sunnyvale Meetup.pdfACM Sunnyvale Meetup.pdf
ACM Sunnyvale Meetup.pdfAnyscale
 
Flux - Open Machine Learning Stack / Pipeline
Flux - Open Machine Learning Stack / PipelineFlux - Open Machine Learning Stack / Pipeline
Flux - Open Machine Learning Stack / PipelineJan Wiegelmann
 
Viktor Tsykunov: Azure Machine Learning Service
Viktor Tsykunov: Azure Machine Learning ServiceViktor Tsykunov: Azure Machine Learning Service
Viktor Tsykunov: Azure Machine Learning ServiceLviv Startup Club
 
S1 DML Syntax and Invocation
S1 DML Syntax and InvocationS1 DML Syntax and Invocation
S1 DML Syntax and InvocationArvind Surve
 
DML Syntax and Invocation process
DML Syntax and Invocation processDML Syntax and Invocation process
DML Syntax and Invocation processArvind Surve
 
Eclipse Con Europe 2014 How to use DAWN Science Project
Eclipse Con Europe 2014 How to use DAWN Science ProjectEclipse Con Europe 2014 How to use DAWN Science Project
Eclipse Con Europe 2014 How to use DAWN Science ProjectMatthew Gerring
 
Michael Hall [InfluxData] | Become an InfluxDB Pro in 20 Minutes | InfluxDays...
Michael Hall [InfluxData] | Become an InfluxDB Pro in 20 Minutes | InfluxDays...Michael Hall [InfluxData] | Become an InfluxDB Pro in 20 Minutes | InfluxDays...
Michael Hall [InfluxData] | Become an InfluxDB Pro in 20 Minutes | InfluxDays...InfluxData
 
D Trace Support In My Sql Guide To Solving Reallife Performance Problems
D Trace Support In My Sql Guide To Solving Reallife Performance ProblemsD Trace Support In My Sql Guide To Solving Reallife Performance Problems
D Trace Support In My Sql Guide To Solving Reallife Performance ProblemsMySQLConference
 
A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & PyTorch with B...
A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & PyTorch with B...A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & PyTorch with B...
A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & PyTorch with B...Databricks
 
Building Deep Reinforcement Learning Applications on Apache Spark with Analyt...
Building Deep Reinforcement Learning Applications on Apache Spark with Analyt...Building Deep Reinforcement Learning Applications on Apache Spark with Analyt...
Building Deep Reinforcement Learning Applications on Apache Spark with Analyt...Databricks
 

Similar a Building an ML Platform with Ray and MLflow (20)

Ray and Its Growing Ecosystem
Ray and Its Growing EcosystemRay and Its Growing Ecosystem
Ray and Its Growing Ecosystem
 
Automate ml workflow_transmogrif_ai-_chetan_khatri_berlin-scala
Automate ml workflow_transmogrif_ai-_chetan_khatri_berlin-scalaAutomate ml workflow_transmogrif_ai-_chetan_khatri_berlin-scala
Automate ml workflow_transmogrif_ai-_chetan_khatri_berlin-scala
 
Improving the Life of Data Scientists: Automating ML Lifecycle through MLflow
Improving the Life of Data Scientists: Automating ML Lifecycle through MLflowImproving the Life of Data Scientists: Automating ML Lifecycle through MLflow
Improving the Life of Data Scientists: Automating ML Lifecycle through MLflow
 
slide-keras-tf.pptx
slide-keras-tf.pptxslide-keras-tf.pptx
slide-keras-tf.pptx
 
Building machine learning service in your business — Eric Chen (Uber) @PAPIs ...
Building machine learning service in your business — Eric Chen (Uber) @PAPIs ...Building machine learning service in your business — Eric Chen (Uber) @PAPIs ...
Building machine learning service in your business — Eric Chen (Uber) @PAPIs ...
 
Python and Oracle : allies for best of data management
Python and Oracle : allies for best of data managementPython and Oracle : allies for best of data management
Python and Oracle : allies for best of data management
 
ProgrammingPrimerAndOOPS
ProgrammingPrimerAndOOPSProgrammingPrimerAndOOPS
ProgrammingPrimerAndOOPS
 
Rails Tips and Best Practices
Rails Tips and Best PracticesRails Tips and Best Practices
Rails Tips and Best Practices
 
ACM Sunnyvale Meetup.pdf
ACM Sunnyvale Meetup.pdfACM Sunnyvale Meetup.pdf
ACM Sunnyvale Meetup.pdf
 
Flux - Open Machine Learning Stack / Pipeline
Flux - Open Machine Learning Stack / PipelineFlux - Open Machine Learning Stack / Pipeline
Flux - Open Machine Learning Stack / Pipeline
 
Database programming
Database programmingDatabase programming
Database programming
 
Viktor Tsykunov: Azure Machine Learning Service
Viktor Tsykunov: Azure Machine Learning ServiceViktor Tsykunov: Azure Machine Learning Service
Viktor Tsykunov: Azure Machine Learning Service
 
S1 DML Syntax and Invocation
S1 DML Syntax and InvocationS1 DML Syntax and Invocation
S1 DML Syntax and Invocation
 
DML Syntax and Invocation process
DML Syntax and Invocation processDML Syntax and Invocation process
DML Syntax and Invocation process
 
Eclipse Con Europe 2014 How to use DAWN Science Project
Eclipse Con Europe 2014 How to use DAWN Science ProjectEclipse Con Europe 2014 How to use DAWN Science Project
Eclipse Con Europe 2014 How to use DAWN Science Project
 
Deep Learning for Computer Vision: Software Frameworks (UPC 2016)
Deep Learning for Computer Vision: Software Frameworks (UPC 2016)Deep Learning for Computer Vision: Software Frameworks (UPC 2016)
Deep Learning for Computer Vision: Software Frameworks (UPC 2016)
 
Michael Hall [InfluxData] | Become an InfluxDB Pro in 20 Minutes | InfluxDays...
Michael Hall [InfluxData] | Become an InfluxDB Pro in 20 Minutes | InfluxDays...Michael Hall [InfluxData] | Become an InfluxDB Pro in 20 Minutes | InfluxDays...
Michael Hall [InfluxData] | Become an InfluxDB Pro in 20 Minutes | InfluxDays...
 
D Trace Support In My Sql Guide To Solving Reallife Performance Problems
D Trace Support In My Sql Guide To Solving Reallife Performance ProblemsD Trace Support In My Sql Guide To Solving Reallife Performance Problems
D Trace Support In My Sql Guide To Solving Reallife Performance Problems
 
A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & PyTorch with B...
A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & PyTorch with B...A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & PyTorch with B...
A Tale of Three Deep Learning Frameworks: TensorFlow, Keras, & PyTorch with B...
 
Building Deep Reinforcement Learning Applications on Apache Spark with Analyt...
Building Deep Reinforcement Learning Applications on Apache Spark with Analyt...Building Deep Reinforcement Learning Applications on Apache Spark with Analyt...
Building Deep Reinforcement Learning Applications on Apache Spark with Analyt...
 

Más de Databricks

DW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDW Migration Webinar-March 2022.pptx
DW Migration Webinar-March 2022.pptxDatabricks
 
Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Data Lakehouse Symposium | Day 1 | Part 1
Data Lakehouse Symposium | Day 1 | Part 1Databricks
 
Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Data Lakehouse Symposium | Day 1 | Part 2
Data Lakehouse Symposium | Day 1 | Part 2Databricks
 
Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Data Lakehouse Symposium | Day 2
Data Lakehouse Symposium | Day 2Databricks
 
Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Data Lakehouse Symposium | Day 4
Data Lakehouse Symposium | Day 4Databricks
 
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop5 Critical Steps to Clean Your Data Swamp When Migrating Off of Hadoop
5 Critical Steps to Clean Your Data Swamp When Migrating Off of HadoopDatabricks
 
Democratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDemocratizing Data Quality Through a Centralized Platform
Democratizing Data Quality Through a Centralized PlatformDatabricks
 
Learn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceLearn to Use Databricks for Data Science
Learn to Use Databricks for Data ScienceDatabricks
 
Why APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringWhy APM Is Not the Same As ML Monitoring
Why APM Is Not the Same As ML MonitoringDatabricks
 
The Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixThe Function, the Context, and the Data—Enabling ML Ops at Stitch Fix
The Function, the Context, and the Data—Enabling ML Ops at Stitch FixDatabricks
 
Stage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationStage Level Scheduling Improving Big Data and AI Integration
Stage Level Scheduling Improving Big Data and AI IntegrationDatabricks
 
Simplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchSimplify Data Conversion from Spark to TensorFlow and PyTorch
Simplify Data Conversion from Spark to TensorFlow and PyTorchDatabricks
 
Scaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesScaling your Data Pipelines with Apache Spark on Kubernetes
Scaling your Data Pipelines with Apache Spark on KubernetesDatabricks
 
Sawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsSawtooth Windows for Feature Aggregations
Sawtooth Windows for Feature AggregationsDatabricks
 
Redis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkRedis + Apache Spark = Swiss Army Knife Meets Kitchen Sink
Redis + Apache Spark = Swiss Army Knife Meets Kitchen SinkDatabricks
 
Re-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkRe-imagine Data Monitoring with whylogs and Spark
Re-imagine Data Monitoring with whylogs and SparkDatabricks
 
Raven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesRaven: End-to-end Optimization of ML Prediction Queries
Raven: End-to-end Optimization of ML Prediction QueriesDatabricks
 
Processing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkProcessing Large Datasets for ADAS Applications using Apache Spark
Processing Large Datasets for ADAS Applications using Apache SparkDatabricks
 
Massive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeMassive Data Processing in Adobe Using Delta Lake
Massive Data Processing in Adobe Using Delta LakeDatabricks
 
Machine Learning CI/CD for Email Attack Detection
Machine Learning CI/CD for Email Attack DetectionMachine Learning CI/CD for Email Attack Detection
Machine Learning CI/CD for Email Attack DetectionDatabricks
 

Building an ML Platform with Ray and MLflow

  • 1. Building an ML Platform with Ray and MLflow Amog Kamsetty and Archit Kulkarni Ray Team @ Anyscale
  • 2. The Team Archit Kulkarni Amog Kamsetty Dmitri Gekhtman Edward Oakes Richard Liaw Kai Fricke Simon Mo Kathryn Zhou
  • 3. Overview of Talk ▪ What are ML Platforms? ▪ Ray and its libraries ▪ MLflow ▪ Demo: An ML Platform built with MLflow and Ray
  • 4. What are ML Platforms?
  • 6. Execution - Feature engineering - Training - Including tuning - Serving - Offline scoring, inference - Online serving Typical ML Process -- Simplified Management - Tracking - Data, Code, Configurations - Reproducing Results - Deployment - Deploy in a variety of environments
  • 7. Challenges with the ML Process Data/Features • Data Preparation • Data Analysis • Feature Engineering • Data Pipeline • Data Management/Feature Store • Manages big data clusters Model • ML Expertise • Implement SOTA ML Research • Experimentation • Manage GPU infrastructure • Scalable training & hyperparameter tuning Production • A/B Testing • Model Evaluation • Analysis of Predictions • Deploy in a variety of environments • CI/CD • Highly Available prediction service Data/Research Scientist Engineers
  • 8. Challenges with the ML Process Data • Data Preparation • Data Analysis • Feature Engineering • Data Pipeline • Data Management/Feature Store • Manages big data clusters Model • ML Expertise • Implement SOTA ML Research • Experimentation • Manage GPU infrastructure • Scalable training & hyperparameter tuning Production • A/B Testing • Model Evaluation • Analysis of Predictions • Deploy in a variety of environments • CI/CD • Highly Available prediction service Data/Research Scientist Software/Data/ ML Engineer ML Platform Abstraction
  • 9. ML Platforms -- Scale - LinkedIn: - 500+ “AI engineers” building models; 50+ MLP engineers - > 50% offline compute demand (12K servers each with 256G RAM) - More than 2x a year - Uber Michelangelo, AirBnB Bighead, Facebook FBLearner, etc. - Globally, a few Billion $ now, growing 40%+ YoY - Many companies building ML Platforms from the ground up
  • 10. ML Platforms -- Landscape (Source: Intel Capital)
  • 11. ML Platforms -- Landscape (Source: Intel Capital)
  • 12. Execution - Feature engineering 🔪 - Training 🍳 - Including tuning 🧂 - Serving 🍽 - Offline scoring, inference - Online serving Typical ML Process -- Simplified Management - Tracking 📝 - Data, Code, Configurations - Reproducing Results 📖 - Deployment 🚚 💻 - Deploy in a variety of environments
  • 13. Execution - Feature engineering 🔪 - Training 🍳 - Including tuning 🧂 - Serving 🍽 - Offline scoring, inference - Online serving Typical ML Process -- Simplified Management - Tracking 📝 - Data, Code, Configurations - Reproducing Results 📖 - Deployment 🚚 💻 - Variety of environments
  • 14. Ray and its Libraries
  • 15. What is Ray? • A simple/general library for distributed computing • Single machine or 100s of nodes • Agnostic to the type of work • An ecosystem of libraries (for scaling ML and more) • Native: Ray RLlib, Ray Tune, Ray Serve • Third party: Modin, Dask, Horovod, XGBoost, PyTorch Lightning • Tools for launching clusters on any cloud provider
  • 16. Three key ideas Execute remote functions as tasks, and instantiate remote classes as actors • Support both stateful and stateless computations Asynchronous execution using futures • Enable parallelism Distributed (immutable) object store • Efficient communication (send arguments by reference)
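The futures idea on this slide can be illustrated without Ray. The sketch below uses Python's standard-library `concurrent.futures` as an analogy only: it runs threads inside one process and has no distributed object store, so it is not how Ray itself works. The `read_array` and `add` helpers are hypothetical stand-ins that mirror the names used on the following slides.

```python
from concurrent.futures import ThreadPoolExecutor


def read_array(values):
    # Hypothetical stand-in for reading an array from a file.
    return list(values)


def add(a, b):
    # Element-wise sum of two equal-length lists.
    return [x + y for x, y in zip(a, b)]


def run_pipeline():
    with ThreadPoolExecutor(max_workers=2) as pool:
        # submit() returns a Future immediately, so both reads can overlap.
        f1 = pool.submit(read_array, [1, 2, 3])
        f2 = pool.submit(read_array, [10, 20, 30])
        # result() blocks until the value is ready, similar in spirit to ray.get().
        return add(f1.result(), f2.result())


result = run_pipeline()
```

In Ray the equivalent handles are object references, and they can be passed directly into another remote call so the data does not have to travel back to the caller first.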
  • 18. API Functions -> Tasks def read_array(file): # read array "a" from "file" return a def add(a, b): return np.add(a, b)
  • 19. API Functions -> Tasks @ray.remote def read_array(file): # read array "a" from "file" return a @ray.remote def add(a, b): return np.add(a, b)
  • 20. API Functions -> Tasks @ray.remote def read_array(file): # read array "a" from "file" return a @ray.remote def add(a, b): return np.add(a, b) id1 = read_array.remote("/input1") id1 read_array
  • 21. API Functions -> Tasks @ray.remote def read_array(file): # read array "a" from "file" return a @ray.remote def add(a, b): return np.add(a, b) id1 = read_array.remote("/input1") id2 = read_array.remote("/input2") id1 read_array id2 zeros read_array
  • 22. API Functions -> Tasks @ray.remote def read_array(file): # read array "a" from "file" return a @ray.remote def add(a, b): return np.add(a, b) id1 = read_array.remote("/input1") id2 = read_array.remote("/input2") id3 = add.remote(id1, id2) id1 read_array id2 zeros read_array id3 add
  • 23. API Functions -> Tasks @ray.remote def read_array(file): # read array "a" from "file" return a @ray.remote def add(a, b): return np.add(a, b) id1 = read_array.remote("/input1") id2 = read_array.remote("/input2") id3 = add.remote(id1, id2); ray.get(id3) id1 read_array id2 zeros read_array id3 add
  • 24. API Functions -> Tasks @ray.remote def read_array(file): # read array "a" from "file" return a @ray.remote def add(a, b): return np.add(a, b) id1 = read_array.remote("/input1") id2 = read_array.remote("/input2") id3 = add.remote(id1, id2) Classes -> Actors
  • 25. API Functions -> Tasks @ray.remote def read_array(file): # read array "a" from "file" return a @ray.remote def add(a, b): return np.add(a, b) id1 = read_array.remote("/input1") id2 = read_array.remote("/input2") id3 = add.remote(id1, id2) Classes -> Actors @ray.remote class Counter(object): def __init__(self): self.value = 0 def inc(self): self.value += 1 return self.value
  • 26. API Functions -> Tasks @ray.remote def read_array(file): # read array "a" from "file" return a @ray.remote def add(a, b): return np.add(a, b) id1 = read_array.remote("/input1") id2 = read_array.remote("/input2") id3 = add.remote(id1, id2) Classes -> Actors @ray.remote class Counter(object): def __init__(self): self.value = 0 def inc(self): self.value += 1 return self.value c = Counter.remote() id4 = c.inc.remote() id5 = c.inc.remote() ray.get([id4, id5])
  • 27. API Functions -> Tasks @ray.remote def read_array(file): # read array "a" from "file" return a @ray.remote(num_gpus=1) def add(a, b): return np.add(a, b) id1 = read_array.remote("/input1") id2 = read_array.remote("/input2") id3 = add.remote(id1, id2) Classes -> Actors @ray.remote(num_gpus=1) class Counter(object): def __init__(self): self.value = 0 def inc(self): self.value += 1 return self.value c = Counter.remote() id4 = c.inc.remote() id5 = c.inc.remote() ray.get([id4, id5])
  • 28. at Anyscale Your app here! Native Libraries 3rd Party Libraries Ecosystem Universal framework for Distributed computing Ray Ecosystem
  • 30. Ray Tune: Scalable Hyperparameter Tuning Wide variety of algorithms Compatible with ML frameworks HYPERBAND PBT BAYESIAN OPT.
  • 31. Ray Tune focuses on simplifying execution Easily launch distributed multi-gpu tuning jobs Automatic fault tolerance to save 3x on GPU costs https://www.vecteezy.com/ $ ray up {cluster config} ray.init(address="auto") tune.run(func, num_samples=100)
  • 32. Ray Tune interoperates with other HPO libraries Ray Tune Ax Optuna scikit-optimize …
  • 33. def train_model(config={}): model = ConvNet(config) for i in range(steps): current_loss = model.train()
  • 34. from ray import tune def train_model(config={}): model = ConvNet(config) for i in range(steps): current_loss = model.train() tune.report(loss=current_loss)
  • 35. def train_model(config): model = ConvNet(config) for i in range(epochs): current_loss = model.train() tune.report(loss=current_loss) tune.run(train_model, config={"lr": 0.1})
  • 36. tune.run( train_model, config={"lr": tune.uniform(0.001, 0.1)}, num_samples=100 ) def train_model(config): model = ConvNet(config) for i in range(epochs): current_loss = model.train() tune.report(loss=current_loss)
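Conceptually, a uniform search space combined with `num_samples=100` amounts to drawing 100 learning rates at random and keeping the best result. The plain-Python sketch below illustrates that idea only; it is not Ray Tune's implementation, it runs sequentially, and `toy_loss` is a hypothetical objective standing in for a real training run.

```python
import random


def toy_loss(lr):
    # Hypothetical objective with its minimum at lr = 0.03.
    return (lr - 0.03) ** 2


def random_search(num_samples, low, high, seed=0):
    """Sample configs uniformly and return the (config, loss) pair with the lowest loss."""
    rng = random.Random(seed)
    best = None
    for _ in range(num_samples):
        config = {"lr": rng.uniform(low, high)}
        loss = toy_loss(config["lr"])
        if best is None or loss < best[1]:
            best = (config, loss)
    return best


best_config, best_loss = random_search(100, 0.001, 0.1)
```

What Tune adds on top of this loop, per the slides, is running the trials in parallel across a cluster and handling failures.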
  • 37. tune.run( train_model, config={"lr": tune.uniform(0.001, 0.1)}, num_samples=100, scheduler=ASHAScheduler()) def train_model(config): model = ConvNet(config) for i in range(epochs): current_loss = model.train() tune.report(loss=current_loss)
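A scheduler such as ASHA stops unpromising trials early so that compute goes to the better ones. The sketch below shows plain synchronous successive halving, a simpler relative of ASHA, and is not Ray Tune's `ASHAScheduler`. The `toy_evaluate` function and the `min_steps` and `eta` parameters are assumptions made for illustration.

```python
import random


def successive_halving(configs, evaluate, min_steps=1, eta=2):
    """Keep the best 1/eta of the configs each round and multiply the budget by eta."""
    survivors = list(configs)
    steps = min_steps
    while len(survivors) > 1:
        # Score every surviving config with the current training budget.
        scored = sorted(survivors, key=lambda cfg: evaluate(cfg, steps))
        keep = max(1, len(scored) // eta)
        survivors = scored[:keep]
        steps *= eta
    return survivors[0]


def toy_evaluate(config, steps):
    # Hypothetical loss: shrinks with more steps and is lowest near lr = 0.03.
    return abs(config["lr"] - 0.03) * (1.0 + 1.0 / steps)


rng = random.Random(0)
candidates = [{"lr": rng.uniform(0.001, 0.1)} for _ in range(16)]
winner = successive_halving(candidates, toy_evaluate)
```

With 16 candidates and `eta=2`, the rounds keep 8, 4, 2 and then 1 config, so most configurations only ever receive the smallest budget.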
  • 38. tune.run( train_model, config={"lr": tune.uniform(0.001, 0.1)}, num_samples=100, scheduler=PopulationBasedTraining(...)) def train_model(config, checkpoint_dir=None): model = ConvNet(config) if checkpoint_dir is not None: model.load_checkpoint(os.path.join(checkpoint_dir, "model.pt")) for i in range(epochs): current_loss = model.train() with tune.checkpoint_dir(step=i) as dir: model.save_checkpoint(os.path.join(dir, "model.pt")) tune.report(loss=current_loss)
  • 40. Ray Serve is a Web Framework Built for Model Serving
  • 42. Ray Serve is high-performance and flexible • Framework-agnostic • Easily scales • Supports batching • Query your endpoints from HTTP and from Python • Easily integrate with other tools
  • 43. Ray Serve is built on top of Ray For user, no need to think about: • Interprocess communication • Failure management • Scheduling Just tell Ray Serve to scale up your model.
  • 44. Serve functions and stateful classes. Ray Serve will use multiple replicas to parallelize across cores and across nodes in your cluster. Ray Serve API
  • 45. Flexibility Query your model from HTTP: > curl "http://127.0.0.1:8000/my/route" Or query from Python using ServeHandle:
  • 47. Challenges of ML in production • It’s difficult to keep track of experiments. • It’s difficult to reproduce code. • There’s no standard way to package and deploy models. • There’s no central store to manage models (their versions and stage transitions). Source: mlflow.org
  • 48. What is MLflow? • Open-source ML lifecycle management tool • Single solution for all of the above challenges • Library-agnostic and language-agnostic • (Works with your existing code)
  • 49. Four key functions of MLflow Source: MLflow
  • 53. Ray Tune + MLflow Tracking def train_model(config): model = ConvNet(config) for i in range(epochs): current_loss = model.train() tune.report(loss=current_loss) tune.run( train_model, config={"lr": tune.uniform(0.001, 0.1)}, num_samples=100, callbacks=[MLflowLoggerCallback(experiment_name="my_experiment")])
  • 54. Ray Tune + MLflow Tracking @mlflow_mixin def train_model(config): mlflow.autolog() xgboost_results = xgb.train(config, ...) tune.run( train_model, config={"lr": tune.uniform(0.001, 0.1)}, num_samples=100)
  • 55. + > pip install mlflow-ray-serve > ray start --head > serve start
  • 56. MLflow deployments CLI Create deployment > mlflow deployments create -t ray-serve -m <model URI> --name my_model -C num_replicas=100 Model URI: • models:/MyModel/1 • runs:/93203689db9c4b50afb6869 • s3://<bucket>/<path> • ...
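The model URI forms listed on this slide differ mainly in their scheme, which indicates where the model is loaded from. The sketch below is a hypothetical helper for telling them apart with the standard library; it is not MLflow's own resolution logic, and the example URIs used in the note after it are made up.

```python
from urllib.parse import urlparse


def describe_model_uri(uri):
    """Hypothetical helper: report which kind of location a model URI points at."""
    parsed = urlparse(uri)
    parts = [p for p in parsed.path.split("/") if p]
    if parsed.scheme == "models":
        # models:/<name>/<version or stage>
        return {"source": "model registry", "name": parts[0], "version": parts[1]}
    if parsed.scheme == "runs":
        # runs:/<run id>/<optional artifact path>
        return {"source": "tracking run", "run_id": parts[0]}
    # Anything else (s3://, local paths, ...) is treated as plain artifact storage.
    return {"source": "artifact store", "scheme": parsed.scheme}


info = describe_model_uri("models:/MyModel/1")
```

For `models:/MyModel/1` the helper reports the registry entry `MyModel` at version `1`, while a URI such as `s3://my-bucket/some/path` falls through to the artifact-store case.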
  • 57. MLflow deployments Python API Create model
  • 58. Integrating with Ray Serve is easy. • Ray Serve endpoints can be called from Python. • Clean conceptual separation: • Ray Serve handles data plane (processing) • MLflow handles control plane (metadata, configuration)
  • 59. Demo: An ML Platform built with MLflow and Ray
  • 60. Acknowledgements Thanks to Jules Damji, Sid Murching, and Paul Ogilvie for their help and guidance with MLflow. Thanks to Dmitri Gekhtman, Kai Fricke, Simon Mo, Edward Oakes, Richard Liaw, Kathryn Zhou and the rest of the Ray team!
  • 62. Feedback Your feedback is important to us. Don’t forget to rate and review the sessions.