
MLflow 1.0 Meetup

MLflow 1.0 is coming soon as the first stable release of MLflow. It also packs many cleanups and improvements, such as simpler metadata management, search APIs and HDFS support. In this talk, we’ll present these new features in detail, and then discuss additional MLflow components that Databricks and other companies are working on for the rest of 2019. These new tools include a model registry to share and track models, as well as a multi-step workflow abstraction, both of which were announced at Spark + AI Summit 2019.

Published in: Software

  1. What’s New in 1.0 and Beyond. Matei Zaharia, Clemens Mewald, Richard Zang
  2. Outline: MLflow Intro (Matei Zaharia); Overview of Components (Clemens Mewald); MLflow 1.0 & Roadmap (Clemens Mewald); Demo (Richard Zang)
  3. ML Lifecycle Challenges (diagram labels): Delta; Tuning; Model Mgmt; Raw Data; ETL; Train; Featurize; Score/Serve; Batch + Realtime; Monitor; Alert, Debug; Deploy; AutoML, hyperparameter search; Experiment Tracking; Remote Cloud Execution; Project Mgmt (scale teams); Model Exchange; Data Drift; Model Drift; Orchestration (Airflow); A/B Testing; CI/CD/Jenkins push to prod; Feature Repository; Lifecycle mgmt.; Retrain; Update Features; Production Logs. Challenges: Zoo of Ecosystem Frameworks; Collaboration; Scale; Governance. An open source platform for the machine learning lifecycle
  4. What is MLflow? An open source, extensible framework to manage the complete ML lifecycle.
  5. MLflow Community Growth: 600k; 100+; 40. Comparison: Apache Spark took 3 years to get to 100 contributors, and has 1.2M downloads/month on PyPI
  6. Community Growth in Context. Time till 100 contributors: MLflow = 1 year, Spark = 3 years. 600,000 monthly downloads on PyPI. Package downloads last month: mlflow 600,000; h2o 45,000; sagemaker 50,000; pyspark 729,000; scikit-learn 7,743,000
  7. Some Users & Contributors
  8. Supported Integrations: June '18
  9. Supported Integrations: June '19
  10. What Does the 1.0 Release Mean? API stability of the original components: safe to build apps and integrations around them long term. Time to start adding some new features!
  11. MLflow Components. Tracking: record and query experiments (code, data, config, results). Projects: packaging format for reproducible runs on any platform. Models: general model format that supports diverse deployment tools. Links: mlflow.org, github.com/mlflow, twitter.com/MLflow, databricks.com/mlflow
  12. MLflow Tracking (diagram labels): Notebooks; Local Apps; Cloud Jobs; Tracking Server; UI; API; Python or REST API
  13. Key Concepts in Tracking. Parameters: key-value inputs to your code. Metrics: numeric values (can update over time). Artifacts: arbitrary files, including models. Source: what code ran?
  14. MLflow Projects (diagram labels): Project Spec; Code; Data; Config; Local Execution; Remote Execution
  15. Example MLflow Project. Layout: my_project/ containing MLproject, conda.yaml, main.py, model.py, ... MLproject contents: conda_env: conda.yaml; entry_points: main: parameters: training_data: path; lambda: {type: float, default: 0.1}; command: python main.py {training_data} {lambda}. Run from the shell with $ mlflow run git://<my_project> or from Python with mlflow.run("git://<my_project>", ...)
  16. MLflow Models (diagram labels): Model Format; Flavor 1; Flavor 2; Run Sources; Inference Code; Batch & Stream Scoring; Cloud Serving Tools. Simple model flavors usable by many tools
  17. Example MLflow Model. Layout: my_model/ containing MLmodel and estimator/ (saved_model.pb, variables/, ...). MLmodel contents: run_id: 769915006efd4c4bbd662461; time_created: 2018-06-28T12:34; flavors: tensorflow (saved_model_dir: estimator; signature_def_key: predict) and python_function (loader_module: mlflow.tensorflow). The tensorflow flavor is usable by tools that understand the TensorFlow model format; the python_function flavor is usable by any tool that can run Python (Docker, Spark, etc.)
  18. MLflow Components. Tracking: record and query experiments (code, data, config, results). Projects: packaging format for reproducible runs on any platform. Models: general model format that supports diverse deployment tools
  19. What’s new with
  20. Selected New Features in MLflow 1.0: support for logging metrics per user-defined step; improved search; HDFS support for artifacts; ONNX model flavor [experimental]; deploying an MLflow Model as a Docker image [experimental]
  21. Support for logging metrics per user-defined step. Metrics logged at the end of a run, e.g.: overall accuracy, overall AUC, overall loss. Metrics logged while training, e.g.: accuracy per minibatch, AUC per minibatch, loss per minibatch. Currently visualized by logging order.
  22. Support for logging metrics per user-defined step. New step argument for log_metric: defines the x coordinate for the metric, and the ordering and scale of the horizontal axis in visualizations. Signature: log_metric(key, value, step=None). Example calls: log_metric("exp", 1, 10); log_metric("exp", 2, 1000); log_metric("exp", 4, 10000); log_metric("exp", 8, 100000); log_metric("exp", 16, 1000000)
  23. Improved Search. The search API supports a simplified version of the SQL WHERE clause, e.g.: params.model = "LogisticRegression" and metrics.error <= 0.05
  24. Improved Search. The search API supports a simplified version of the SQL WHERE clause, e.g.: params.model = "LogisticRegression" and metrics.error <= 0.05. Python API example: all_experiments = [exp.experiment_id for exp in MlflowClient().list_experiments()]; runs = MlflowClient().search_runs(all_experiments, "params.model='LogisticRegression' and metrics.error<=0.05", ViewType.ALL)
  25. Improved Search. The search API supports a simplified version of the SQL WHERE clause. UI example: params.model = "LogisticRegression" and metrics.error <= 0.05. Python API example: all_experiments = [exp.experiment_id for exp in MlflowClient().list_experiments()]; runs = MlflowClient().search_runs(all_experiments, "params.model='LogisticRegression' and metrics.error<=0.05", ViewType.ALL)
  26. HDFS Support for Artifacts. API: mlflow.log_artifact(local_path, artifact_path=None). Supported artifact stores: AWS S3, Azure Blob Store, Google Cloud Storage, HDFS, DBFS, NFS, FTP, SFTP
  27. ONNX Model Flavor [Experimental]. ONNX models export both the ONNX native format and Pyfunc. API: mlflow.onnx.load_model(model_uri); mlflow.onnx.log_model(onnx_model, artifact_path, conda_env=None); mlflow.onnx.save_model(onnx_model, path, conda_env=None, mlflow_model=<mlflow.models.Model object>). Supported model flavors: Scikit, TensorFlow, MLlib, H2O, PyTorch, Keras, MLeap, Python Function, R Function, ONNX
  28. Docker Build [Experimental]. Commands: $ mlflow models build-docker -m "runs:/some-run-uuid/my-model" -n "my-image-name"; $ docker run -p 5001:8080 "my-image-name". Builds a Docker image whose default entrypoint serves the specified MLflow model at port 8080 within the container.
  29. Beyond 1.0
  30. What users want to see next
  31. What’s coming soon: new component: Model Registry; version-controlled registry of models; model lifecycle management; model monitoring
  32. What’s coming soon: new component: Model Registry; version-controlled registry of models; model lifecycle management; model monitoring; auto-logging from common frameworks
  33. What’s coming soon: new component: Model Registry; version-controlled registry of models; model lifecycle management; model monitoring; auto-logging from common frameworks; parallel coordinates plot
  34. What’s coming soon: new component: Model Registry; version-controlled registry of models; model lifecycle management; model monitoring; auto-logging from common frameworks; parallel coordinates plot; Kubernetes remote run
  35. What’s coming soon: new component: Model Registry; version-controlled registry of models; model lifecycle management; model monitoring; auto-logging from common frameworks; parallel coordinates plot; Kubernetes remote run; Delta Lake integration (Delta.io) for data versioning
  36. What’s coming soon: new component: Model Registry; version-controlled registry of models; model lifecycle management; model monitoring; auto-logging from common frameworks; parallel coordinates plot; Kubernetes remote run; Delta Lake integration (Delta.io) for data versioning; and more...
  37. Demo
  38. Thank You
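
Slide 22 describes the `step` argument of `log_metric`. The following is a minimal Python sketch of the idea, not MLflow's implementation: `ToyRun` and its `series` method are hypothetical names, and treating a missing step as 0 is an assumption made for the sketch. It illustrates that once each value carries its own step, a metric can be ordered along the x axis by step rather than by logging order.

```python
from collections import defaultdict


class ToyRun:
    """Hypothetical in-memory stand-in for a tracking run (not part of MLflow)."""

    def __init__(self):
        # metric key -> list of (step, value) pairs in logging order
        self.metrics = defaultdict(list)

    def log_metric(self, key, value, step=None):
        # Assumption for this sketch: a missing step is stored as 0.
        self.metrics[key].append((0 if step is None else step, value))

    def series(self, key):
        # Order points by their user-defined step, not by logging order.
        return sorted(self.metrics[key], key=lambda point: point[0])


run = ToyRun()
# Same (value, step) pairs as on the slide, deliberately logged out of order.
run.log_metric("exp", 4, 10000)
run.log_metric("exp", 1, 10)
run.log_metric("exp", 16, 1000000)
run.log_metric("exp", 2, 1000)
run.log_metric("exp", 8, 100000)

points = run.series("exp")
```

Here `points` holds `(step, value)` pairs sorted by step, which is the ordering a step-aware plot would use for its horizontal axis.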

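Slides 23 to 25 describe search filters as a simplified SQL WHERE clause. As a rough illustration of what such a filter means, the sketch below parses the slide's example expression and applies it to a few made-up runs. It is a toy evaluator, not MLflow's parser: `parse_filter`, `matches`, and the run dictionaries are hypothetical, only `and`-joined comparisons are handled, and values that themselves contain the word "and" are not supported.

```python
import operator
import re

# Comparison operators accepted by this toy filter language.
_OPS = {
    "=": operator.eq, "!=": operator.ne,
    "<": operator.lt, "<=": operator.le,
    ">": operator.gt, ">=": operator.ge,
}
_CLAUSE = re.compile(r"^\s*(params|metrics)\.(\w+)\s*(<=|>=|!=|=|<|>)\s*(.+?)\s*$")


def parse_filter(expr):
    """Split an 'a and b' filter into (kind, key, op, value) clauses."""
    clauses = []
    for part in re.split(r"\s+and\s+", expr.strip(), flags=re.IGNORECASE):
        match = _CLAUSE.match(part)
        if match is None:
            raise ValueError(f"cannot parse clause: {part!r}")
        kind, key, op, raw = match.groups()
        # Metrics are compared as numbers, params as (unquoted) strings.
        value = float(raw) if kind == "metrics" else raw.strip("'\"")
        clauses.append((kind, key, _OPS[op], value))
    return clauses


def matches(run, clauses):
    """Return True when the run satisfies every clause."""
    return all(
        key in run[kind] and op(run[kind][key], value)
        for kind, key, op, value in clauses
    )


# Made-up runs, for illustration only.
runs = [
    {"params": {"model": "LogisticRegression"}, "metrics": {"error": 0.04}},
    {"params": {"model": "LogisticRegression"}, "metrics": {"error": 0.10}},
    {"params": {"model": "RandomForest"}, "metrics": {"error": 0.03}},
]
clauses = parse_filter('params.model = "LogisticRegression" and metrics.error <= 0.05')
selected = [run for run in runs if matches(run, clauses)]
```

Only the first run is selected: it is the only one that is both a LogisticRegression model and has an error of at most 0.05.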
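
Slide 15 shows an MLproject entry point whose `command` contains `{training_data}` and `{lambda}` placeholders. The sketch below illustrates, under stated assumptions, how such a template could be filled from user-supplied parameters and declared defaults. `ENTRY_POINT` mirrors the slide's entry point as a plain dictionary, and `build_command` is a hypothetical helper; MLflow's own resolution logic may differ in its details.

```python
import shlex

# The slide's "main" entry point, written as a plain dictionary for this sketch.
ENTRY_POINT = {
    "parameters": {
        "training_data": {"type": "path"},
        "lambda": {"type": "float", "default": 0.1},
    },
    "command": "python main.py {training_data} {lambda}",
}


def build_command(entry_point, user_params):
    """Fill the command template from user values, falling back to defaults."""
    values = {}
    for name, spec in entry_point["parameters"].items():
        if name in user_params:
            value = user_params[name]
        elif "default" in spec:
            value = spec["default"]
        else:
            raise ValueError(f"missing required parameter: {name}")
        if spec["type"] == "float":
            value = float(value)  # reject non-numeric input early
        # Quote each value so spaces or shell metacharacters stay in one argument.
        values[name] = shlex.quote(str(value))
    return entry_point["command"].format_map(values)


cmd = build_command(ENTRY_POINT, {"training_data": "data/train.csv"})
```

With only `training_data` supplied, `lambda` falls back to its declared default of 0.1; omitting `training_data`, which has no default, raises a `ValueError`.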