In the last several months, MLflow has introduced significant platform enhancements that simplify machine learning lifecycle management. Expanded autologging capabilities, including a new integration with scikit-learn, have streamlined the instrumentation and experimentation process in MLflow Tracking. Additionally, schema management functionality has been incorporated into MLflow Models, enabling users to seamlessly inspect and control model inference APIs for batch and real-time scoring. In this session, we will explore these new features. We will share MLflow’s development roadmap, providing an overview of near-term advancements in the platform.
What’s new in MLflow
1. What’s new in MLflow?
Corey Zumar
Software Engineer, Databricks
A System to Accelerate the Machine Learning Lifecycle
2. Overview of new developments
Feature deep dives
Autologging for scikit-learn
Model schemas & input examples
MLflow’s plugin ecosystem
What’s next for MLflow?
Outline
3. MLflow: An Open Source ML Platform
TRACKING: Experiment management
PROJECTS: Reproducible runs
MODELS: Model packaging and deployment
MODEL REGISTRY: Model management
Lifecycle: Raw Data → Data Prep → Training → Deployment
Roles: Data Engineer, ML Engineer, Application Developer
Any Language
Any ML Library
4. MLflow: Overview of new developments
TRACKING
- Scikit-learn autologging (1.11)
- Fast.ai autologging (1.9)
- UI accessibility, syntax highlighting for artifacts + PDF support (1.9, 1.11)
PROJECTS
- Backend plugin support (1.9)
- YARN execution backend
- Expanded artifact resolution capabilities (1.9)
MODELS
- Model schemas & input examples (1.9)
- Deployment plugin support (1.9)
- Spacy model flavor (1.8)
MODEL REGISTRY
- Tags for registered models and model versions (1.9)
- Enhanced version comparison UI, including schemas (1.11)
- Simplified model archiving (1.10)
5. Overview of new developments
Feature deep dives
Autologging for scikit-learn
Model signatures & input examples
MLflow’s plugin ecosystem
What’s next for MLflow?
Outline
22. Model Schemas (new in 1.9)
Specify input and output data types for models
Input Schema: zipcode: string, sqft: double, distance: double
Output Schema: price: double
log_model(…)
Check compatibility and validate new model versions: Incompatible schemas!
23. Model Schemas
Infer model input / output signature from data
infer_signature(inputs, outputs)
inputs: ['year built': long, 'year sold': long, 'lot area': long, 'zip code': long, 'quality': long]
outputs: ['sale price': double]
log_model(..., signature)
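The inference step on this slide can be illustrated with a minimal, standard-library-only sketch. This is not MLflow's implementation of infer_signature: the helper name infer_column_types, the type-name mapping, and the sample values are all assumptions made for illustration. Only the column names and type names come from the slide.

```python
from typing import Any, Dict, List

# Assumed mapping from Python types to the type names shown on the slide.
_TYPE_NAMES = {bool: "boolean", int: "long", float: "double", str: "string"}


def infer_column_types(rows: List[Dict[str, Any]]) -> Dict[str, str]:
    """Infer a {column name: type name} schema from a list of records.

    Hypothetical helper for illustration; not part of MLflow.
    """
    if not rows:
        raise ValueError("need at least one row to infer a schema")
    schema: Dict[str, str] = {}
    for row in rows:
        for name, value in row.items():
            type_name = _TYPE_NAMES.get(type(value))
            if type_name is None:
                raise TypeError(
                    f"unsupported type for column {name!r}: {type(value).__name__}"
                )
            previous = schema.setdefault(name, type_name)
            if previous != type_name:
                # Widen long -> double when a column mixes ints and floats.
                if {previous, type_name} == {"long", "double"}:
                    schema[name] = "double"
                else:
                    raise TypeError(
                        f"column {name!r} mixes {previous} and {type_name}"
                    )
    return schema


# Made-up sample rows that use the column names from the slide.
inputs = [
    {"year built": 1995, "year sold": 2008, "lot area": 8450, "zip code": 94105, "quality": 7},
    {"year built": 1976, "year sold": 2007, "lot area": 9600, "zip code": 94110, "quality": 6},
]
outputs = [{"sale price": 208500.0}, {"sale price": 181500.0}]

signature = {
    "inputs": infer_column_types(inputs),
    "outputs": infer_column_types(outputs),
}
print(signature)
```

In MLflow itself, the inferred signature is what gets passed to log_model, as the slide shows; the dictionary above only mimics the shape of that information.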
24. Model Schemas
Validate inputs against schema during inference
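import pandas as pd

# "condition" is passed here, but the input schema from the previous slide expects "quality"
input_frame = pd.DataFrame.from_dict({
    "year built": data["year built"],
    "year sold": data["year sold"],
    "lot area": data["lot area"],
    "zip code": data["zip code"],
    "condition": data["condition"],
})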
Input Schema → Model → Output Schema
Result: Schema Mismatch Error
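The check that produces this error can be sketched in plain Python. This is a simplified illustration under stated assumptions, not MLflow's enforcement logic: SchemaMismatchError and validate_input are hypothetical names, the sample values are made up, and the sketch only checks for missing columns and wrong types (it does not model how extra columns are handled).

```python
from typing import Any, Dict


class SchemaMismatchError(ValueError):
    """Hypothetical error type used only in this sketch."""


# Assumed mapping from the slide's type names to Python types.
_PYTHON_TYPES = {"long": int, "double": float, "string": str}


def validate_input(schema: Dict[str, str], record: Dict[str, Any]) -> None:
    """Raise SchemaMismatchError if a required column is missing or mistyped."""
    missing = sorted(set(schema) - set(record))
    if missing:
        raise SchemaMismatchError(f"missing columns: {missing}")
    for name, type_name in schema.items():
        expected = _PYTHON_TYPES[type_name]
        value = record[name]
        # bool is a subclass of int, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            raise SchemaMismatchError(
                f"column {name!r} expected {type_name}, got {type(value).__name__}"
            )


# Input schema as shown on the signature-inference slide.
input_schema = {
    "year built": "long",
    "year sold": "long",
    "lot area": "long",
    "zip code": "long",
    "quality": "long",
}

# Made-up values; "condition" is supplied instead of "quality", as on the slide.
bad_record = {
    "year built": 1995,
    "year sold": 2008,
    "lot area": 8450,
    "zip code": 94105,
    "condition": 5,
}
good_record = {**{k: v for k, v in bad_record.items() if k != "condition"}, "quality": 7}

try:
    validate_input(input_schema, bad_record)
except SchemaMismatchError as err:
    print(f"Schema mismatch: {err}")

validate_input(input_schema, good_record)  # passes silently
```

Failing before the model is invoked means the caller sees which column is wrong rather than an opaque error from inside the model.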
34. Overview of new developments
Feature deep dives
Autologging for scikit-learn
Model schemas & input examples
MLflow’s plugin ecosystem
What’s next for MLflow?
Outline
37. Creating a plugin
1. Implement the plugin interface
(https://mlflow.org/docs/latest/plugins.html)
2. Add an MLflow plugin entrypoint & upload your package to PyPI
3. Add your plugin to the MLflow docs:
https://mlflow.org/docs/latest/plugins.html#community-plugins
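Step 2 can be sketched as the keyword arguments a plugin's setup.py might pass to setuptools. The package, module, and class names below are hypothetical. The entry point group names ("mlflow.deployments", "mlflow.project_backend") are the ones described in the MLflow plugin documentation to the best of my knowledge; check them against the docs linked above before relying on them.

```python
def plugin_setup_kwargs() -> dict:
    """Return setup() keyword arguments for a hypothetical MLflow plugin package."""
    return {
        # Hypothetical package metadata.
        "name": "mlflow-example-plugin",
        "version": "0.1.0",
        "packages": ["mlflow_example_plugin"],
        "install_requires": ["mlflow"],
        "entry_points": {
            # Registers a deployment target named "example", which would be
            # selected with `-t example` in `mlflow deployments` commands.
            "mlflow.deployments": "example=mlflow_example_plugin.deployment",
            # Registers a project backend named "example-backend", which would be
            # selected with `mlflow run --backend example-backend`.
            "mlflow.project_backend": (
                "example-backend=mlflow_example_plugin.backend:ExampleBackend"
            ),
        },
    }


# In a real setup.py this would be:
#     from setuptools import setup
#     setup(**plugin_setup_kwargs())
for group, target in plugin_setup_kwargs()["entry_points"].items():
    print(f"{group}: {target}")
```

Once a package declaring these entry points is installed next to MLflow, the plugin is discovered by name, so no change to MLflow itself is needed.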
38. Deployments API (new in 1.9)
Pluggable way to create and manage deployment endpoints in MLflow
Used in new endpoint
Other integrations being ported:
mlflow deployments create -t redisai -n spam -m models:/SpamScorer/production
mlflow deployments predict -t redisai -n spam -f emails.json
39. Project Backend Plugins (new in 1.9)
Pluggable way to execute MLflow Projects on a variety of compute resources
Used for new YARN backend
Other integrations being ported:
mlflow run --backend yarn https://github.com/mlflow/mlflow#examples/pytorch
40. Community Plugins
Elasticsearch backend for MLflow Tracking (experimental) (https://pypi.org/project/mlflow-elasticsearchstore/)
Model deployment to RedisAI (https://pypi.org/project/mlflow-redisai/)
Project execution on YARN (https://pypi.org/project/mlflow-yarn/)
Artifact storage in SQL Server (https://pypi.org/project/mlflow-dbstore/)
Artifact storage in Alibaba Cloud OSS (https://pypi.org/project/aliyunstoreplugin/)
41. Overview of new developments
Feature deep dives
Autologging for scikit-learn
Model schemas & input examples
MLflow’s plugin ecosystem
What’s next for MLflow?
Outline
42. What’s next for MLflow?
Model explainability integrations (see Data + AI Summit Europe 2020)
Support for tensor input / output schemas
Expanded input example & schema collection for autologging (XGBoost, TensorFlow, and more)
Diagram: autolog() → Model with Input Schema and Output Schema
43. Model Serving on Databricks
Tracking: Experiment tracking
Logged Model
Model Registry: Model management (stages: Staging, Production, Archived)
Model Serving (new, public preview): Turnkey serving for MLflow models, exposed as a REST Endpoint
Deployment Backends
Users: Data Scientists, Application Engineers
Consumers: Reports, Applications, ...