End-to-End ML Pipelines
TFX + KubeFlow + Airflow + MLflow + TPU
Chris Fregly

Hands-on Learning with KubeFlow + Keras/TensorFlow 2.0 + TF Extended (TFX) + Kubernetes + PyTorch + XGBoost + Airflow + MLflow + Spark + Jupyter + TPU

RSVP Here: https://www.eventbrite.com/e/full-day-workshop-kubeflow-kerastensorflow-20-tf-extended-tfx-kubernetes-pytorch-xgboost-airflow-tickets-63362929227

Description

In this workshop, we build real-world machine learning pipelines using TensorFlow Extended (TFX), KubeFlow, and Airflow.

Described in the 2017 KDD paper "TFX: A TensorFlow-Based Production-Scale Machine Learning Platform", TFX is used internally by thousands of Google data scientists and engineers across every major product line within Google.

KubeFlow is a modern, end-to-end pipeline orchestration framework that embraces the latest AI best practices including hyper-parameter tuning, distributed model training, and model tracking.

Airflow is the most widely used pipeline orchestration framework in machine learning.



Pre-requisites

Modern browser - and that's it!

Every attendee will receive a cloud instance

Nothing will be installed on your local laptop

Everything can be downloaded at the end of the workshop



Location

Online Workshop

The link will be sent a few hours before the start of the workshop.

Only registered users will receive the link.

If you do not receive the link a few hours before the start of the workshop, please send your Eventbrite registration confirmation to support@pipeline.ai for help.



Agenda

1. Create a Kubernetes cluster

2. Install KubeFlow, Airflow, TFX, and Jupyter

3. Set Up ML Training Pipelines with KubeFlow and Airflow

4. Transform Data with TFX Transform

5. Validate Training Data with TFX Data Validation

6. Train Models with Jupyter, Keras/TensorFlow 2.0, PyTorch, XGBoost, and KubeFlow

7. Run a Notebook Directly on Kubernetes Cluster with KubeFlow

8. Analyze Models using TFX Model Analysis and Jupyter

9. Perform Hyper-Parameter Tuning with KubeFlow

10. Select the Best Model using KubeFlow Experiment Tracking

11. Reproduce Model Training with TFX Metadata Store and Pachyderm

12. Deploy the Model to Production with TensorFlow Serving and Istio

13. Save and Download your Workspace



Key Takeaways

Attendees will gain experience training, analyzing, and serving real-world Keras/TensorFlow 2.0 models in production using model frameworks and open-source tools.


Video: https://youtu.be/AaBqhGEwxXI
GitHub: https://github.com/PipelineAI/pipeline

  1. End-to-End ML Pipelines: TFX + KubeFlow + Airflow + MLflow + TPU. Chris Fregly, Founder @ PipelineAI
  2. Who Am I? (@cfregly) Founder @ PipelineAI (Real-time Machine Learning and AI in Production); Former Databricks, Netflix; Apache Spark Contributor; O’Reilly Author (High Performance TensorFlow in Production); Meetup Organizer (Advanced Spark and TensorFlow Meetup)
  3. Advanced Spark and TensorFlow Meetup (Global, Monthly Events): https://meetup.com/Advanced-Spark-and-TensorFlow-Meetup
  4. Upcoming Full-Day Workshop on Saturday, November 2, 2019! https://pipeline.ai @cfregly @PipelineAI
  5. Who are you? 1 OK with Command Line? 2 OK with Python? 3 OK with Linear Algebra? 4 OK with Docker? 5 OK with Jupyter Notebook?
  6. Recent Poll (July 2019)
  7. Recent Comment from Popular VC Investor in Silicon Valley: 4,000 Stars = $6,000,000 Seed. $1,500 per GitHub Star?! (Please star our repo ASAP!!)
  8. Community Edition: https://community.pipeline.ai
  9. Note #1 of 10: IGNORE WARNINGS & ERRORS. Everything will be OK!
  10. Note #2 of 10: THERE IS A LOT OF MATERIAL HERE. Many opportunities to explore on your own. (Don’t upload sensitive data)
  11. Note #3 of 10: YOU HAVE YOUR OWN INSTANCE. 16 CPU, 104 GB RAM, 200 GB SSD
  12. Note #4 of 10: DATASETS. Chicago Taxi Dataset (and various others)
  13. Note #5 of 10: SOME NOTEBOOKS TAKE MINUTES. Please be patient. (We are using large datasets)
  14. Note #6 of 10: QUESTIONS? Post questions to Zoom chat or Q&A. (Antje and I will answer soon)
  15. Note #7 of 10: KUBEFLOW IS NOT A SILVER BULLET. There are still gaps in the pipeline. (But gaps are getting smaller)
  16. Note #8 of 10: THIS IS NOT CLOUD DEPENDENT* (*Except for 2 small exceptions… Patches are underway.)
  17. Note #9 of 10: PRIMARILY TENSORFLOW 1.x. TF 2.x is not fully supported by TFX. (We have a section on TF 2)
  18. Note #10 of 10: SHUT DOWN EACH NOTEBOOK AFTER USE. We are using complex browser voodoo.
  19. Why TFX and Why KubeFlow? Improve Training/Serving Consistency; Unify Disparate Systems; Manage Pipeline Complexity; Improve Portability; Wrangle Large Datasets; Improve Model Quality; Manage Versions; Composability; Distributed Training; Configure. [Diagram labels: Systems 1-6; Data Ingestion, Data Analysis, Data Transform, Data Validation, Data Splitting, Build Model, Model Validation, Training At Scale, Ad-Hoc Training, Serving, Logging, Monitoring, Roll-out]
  20. Agenda: 1 Set Up Environment with Kubernetes; 2 TensorFlow Extended (TFX); 3 ML Pipelines with Airflow and KubeFlow; 4 Hyper-Parameter Tuning with KubeFlow; 5 Deploy Notebook with Kubernetes
  21. Bonus Extras! 6 TPUs; 7 MLflow; 8 Papermill; 9 TensorFlow Privacy; 10 TensorFlow 2.0; 11 Keras Tuner; 12 A/B Tests; 13 Metrics and Monitoring
  22. Hands On: 00_Explore_Environment
  23. 1.0 Environment Overview: 1.1 Kubernetes; 1.2 TensorFlow Extended (TFX); 1.3 Airflow ML Pipelines; 1.4 KubeFlow ML Pipelines; 1.5 MLflow Pipelines; 1.6 Hyper-Parameter Tuning (Katib); 1.7 Prediction Traffic Router (Istio)
  24. 1.1 Kubernetes. [Diagram labels: Kubernetes; NFS, Ceph, Cassandra, MySQL, Spark, Airflow, TensorFlow, Caffe, TF-Serving, Flask+Scikit, Jupyter; Operating system (Linux, Windows); CPU, Memory, Disk, SSD, GPU, FPGA, ASIC, NIC; GCP, AWS, Azure, On-prem; Namespace, Quota, Logging, Monitoring, RBAC]
  25. Hands On: 01_Explore_Kubernetes_Cluster
  26. 1.2 TensorFlow Extended (TFX). [Pipeline diagram: Feature Load, Feature Analyze, Feature Transform, Model Train, Model Evaluate, Model Deploy, Reproduce Training]
  27. 1.3 Airflow ML Pipelines
  28. 1.4 KubeFlow ML Pipelines
  29. 1.5 MLflow Experiment Tracking
  30. 1.6 Hyper-Parameter Tuning (Katib)
  31. 1.7 Prediction Traffic Routing (Istio)
  32. Agenda: 1 Set Up Environment with Kubernetes; 2 TensorFlow Extended (TFX); 3 ML Pipelines with Airflow and KubeFlow; 4 Hyper-Parameter Tuning with KubeFlow; 5 Deploy Notebook with Kubernetes
  33. 2.0 TFX Components: 2.1 TFX Internals; 2.2 TFX Libraries; 2.3 TFX Components
  34. 2.1 TFX Internals. Driver/Publisher: Moves data to/from Metadata Store. Executor: Runs the Actual Processing Code. Metadata Store: Artifact, execution, and lineage info; Track inputs & outputs of all components; Stores training run including inputs & outputs; Analysis, validation, and versioning results
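
The driver / executor / publisher split on the TFX Internals slide can be pictured with a small sketch. This is not TFX code: `MetadataStore` and `run_component` are hypothetical names, and the store is a plain in-memory stand-in used only to show who reads and who writes.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataStore:
    """Hypothetical in-memory stand-in for a metadata store."""
    artifacts: dict = field(default_factory=dict)   # artifact name -> value
    executions: list = field(default_factory=list)  # lineage records

    def get(self, name):
        return self.artifacts[name]

    def put(self, name, value):
        self.artifacts[name] = value


def run_component(store, name, input_names, output_name, executor):
    # Driver: resolve the input artifacts from the metadata store.
    inputs = {n: store.get(n) for n in input_names}
    # Executor: run the actual processing code.
    output = executor(**inputs)
    # Publisher: write the output and its lineage back to the store.
    store.put(output_name, output)
    store.executions.append(
        {"component": name, "inputs": list(input_names), "output": output_name}
    )
    return output


store = MetadataStore()
store.put("examples", [3.0, 5.0, 7.0])
run_component(
    store, "StatisticsGen", ["examples"], "stats",
    lambda examples: {"mean": sum(examples) / len(examples)},
)
```

Because every run leaves an execution record, the store can later answer which inputs produced a given artifact, which is the lineage idea the slide refers to.
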
  35. 2.2 TFX Libraries. TFX Components Use These: 2.2.1 TensorFlow Data Validation (TFDV); 2.2.2 TensorFlow Transform (TFT); 2.2.3 TensorFlow Model Analysis (TFMA); 2.2.4 TensorFlow Metadata (TFMD) + ML Metadata (MLMD)
  36. 2.2.1 TFX Libraries - TFDV. TensorFlow Data Validation (TFDV): Find Missing, Redundant & Important Features; Identify Features with Unusually-Large Scale; `infer_schema()` Generates Schema; Describe Feature Ranges; Detect Data Drift. [Figure panels: Uniformly Distributed Data; Non-Uniformly Distributed Data]
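
To make "Detect Data Drift" concrete, here is a minimal sketch that compares a categorical feature across two days using the L-infinity distance between normalized value counts, the same kind of measure TFDV applies to categorical features. The function names, the sample data, and the 0.1 threshold are assumptions for illustration; TFDV's own comparator is configured through the schema rather than called like this.

```python
from collections import Counter


def normalized_counts(values):
    """Turn raw categorical values into a {value: frequency} distribution."""
    counts = Counter(values)
    total = sum(counts.values())
    return {k: c / total for k, c in counts.items()}


def linf_distance(baseline, current):
    """Largest absolute difference in frequency for any single value."""
    p, q = normalized_counts(baseline), normalized_counts(current)
    return max(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in set(p) | set(q))


def has_drifted(baseline, current, threshold=0.1):
    # threshold is an illustrative value, not a recommended default
    return linf_distance(baseline, current) > threshold


yesterday = ["cash"] * 50 + ["card"] * 50
today = ["cash"] * 20 + ["card"] * 80
drifted = has_drifted(yesterday, today)
```
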
  37. Hands On: 02_TensorFlow_Data_Validation (TFDV)
  38. 2.2.2 TFX Libraries - TFT. TensorFlow Transform (TFT): Preprocess `tf.Example` data with TensorFlow; Useful for data that requires a full pass (Normalize all inputs by mean and std dev; Create vocabulary of strings → integers over all data; Bucketize features based on entire data distribution); Outputs a TensorFlow graph (Re-used across both training and serving); Uses Apache Beam (local mode) for Parallel Analysis (Can also use distributed mode); `preprocessing_fn(inputs)`: Primary Fn to Implement
      import tensorflow as tf
      import tensorflow_transform as tft

      def preprocessing_fn(inputs):
          x = inputs['x']
          y = inputs['y']
          s = inputs['s']
          x_centered = x - tft.mean(x)
          y_normalized = tft.scale_to_0_1(y)
          s_integerized = tft.compute_and_apply_vocabulary(s)
          x_centered_times_y_normalized = x_centered * y_normalized
          return {
              'x_centered': x_centered,
              'y_normalized': y_normalized,
              'x_centered_times_y_normalized': x_centered_times_y_normalized,
              's_integerized': s_integerized
          }
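
Why these transforms need a full pass can be seen without TensorFlow. The sketch below is a plain-Python analogy, not the TFT API: `analyze` and `transform` are hypothetical helpers that split the work into one pass computing global statistics and a per-row step that reuses them, which is also why the same transform can be applied unchanged at serving time. Unlike `tft.compute_and_apply_vocabulary`, which orders by frequency, this vocabulary uses first-seen order.

```python
def analyze(rows):
    """Full pass over the dataset to compute global statistics."""
    xs = [r["x"] for r in rows]
    ys = [r["y"] for r in rows]
    vocab = {}
    for r in rows:
        vocab.setdefault(r["s"], len(vocab))  # first-seen order (assumption)
    return {"x_mean": sum(xs) / len(xs), "y_min": min(ys), "y_max": max(ys), "vocab": vocab}


def transform(row, stats):
    """Per-row transform that only needs the precomputed statistics."""
    y_range = stats["y_max"] - stats["y_min"]
    return {
        "x_centered": row["x"] - stats["x_mean"],
        "y_normalized": (row["y"] - stats["y_min"]) / y_range if y_range else 0.0,
        "s_integerized": stats["vocab"].get(row["s"], -1),  # -1 for unseen strings
    }


rows = [
    {"x": 1.0, "y": 1.0, "s": "hello"},
    {"x": 2.0, "y": 2.0, "s": "world"},
    {"x": 3.0, "y": 3.0, "s": "hello"},
]
stats = analyze(rows)
transformed = [transform(r, stats) for r in rows]
```
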
  39. Hands On: 03_TensorFlow_Transform (TFT)
  40. Hands On: 03a_TensorFlow_Transform_Advanced (TFT)
  41. 2.2.3 TFX Libraries - TFMA. TensorFlow Model Analysis (TFMA): Analyze Model on Different Slices of Dataset; Track Metrics Over Time (“Next Day Eval”); `EvalSavedModel` Contains Slicing Info; TFMA Pipeline: Read, Extract, Evaluate, Write; i.e., Ensure Model Works Fairly Across All Users
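
Slicing is easiest to see on a tiny example. The following is a plain-Python sketch of the idea, not the TFMA API: `accuracy_by_slice` is a hypothetical helper, and the `trip_start_hour` column and the four records are made up for illustration.

```python
from collections import defaultdict


def accuracy_by_slice(examples, slice_key=None):
    """Return {slice_value: accuracy}; slice_key=None means the overall slice."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for ex in examples:
        key = "overall" if slice_key is None else ex[slice_key]
        totals[key] += 1
        hits[key] += int(ex["label"] == ex["prediction"])
    return {k: hits[k] / totals[k] for k in totals}


examples = [
    {"trip_start_hour": 8, "label": 1, "prediction": 1},
    {"trip_start_hour": 8, "label": 0, "prediction": 0},
    {"trip_start_hour": 23, "label": 1, "prediction": 0},
    {"trip_start_hour": 23, "label": 0, "prediction": 0},
]
overall = accuracy_by_slice(examples)
by_hour = accuracy_by_slice(examples, "trip_start_hour")
```

A model that looks acceptable overall can still be noticeably worse on one slice, which is the fairness concern the slide raises.
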
  42. Hands On: 04_TensorFlow_Model_Analysis (TFMA)
  43. 2.2.4 TFX Libraries - Metadata. TensorFlow Metadata (TFMD); ML Metadata (MLMD): Record and Retrieve Experiment Metadata; Artifact, Execution, and Lineage Info; Track Inputs / Outputs of All TFX Components; Stores Training Run Info; Analysis and Validation Results; Model Versioning Info
  44. 2.3 TFX Components: 2.3.1 ExampleGen; 2.3.2 StatisticsGen; 2.3.3 SchemaGen; 2.3.4 ExampleValidator; 2.3.5 Transform; 2.3.6 Trainer; 2.3.7 Evaluator; 2.3.8 ModelValidator; 2.3.9 Model Pusher; 2.3.10 Slack (!!)
  45. 2.3.1 ExampleGen: Load Training Data Into TFX Pipeline; Supports External Data Sources; Supports CSV and TFRecord Formats; Converts Data to tf.Example. Note: TFX Pipelines require tf.Example (?!); Difficult to use non-TF models like XGBoost
      import os
      from tfx.utils.dsl_utils import csv_input
      from tfx.components.example_gen.csv_example_gen.component import CsvExampleGen

      # base_dir is assumed to be defined earlier and to contain data/simple
      examples = csv_input(os.path.join(base_dir, 'data/simple'))
      example_gen = CsvExampleGen(input_base=examples)
  46. 2.3.2 StatisticsGen: Generates Statistics on Training Data; Global `mean` and `stddev` per input feature; Consumes tf.Example instances
      from tfx import components

      # examples_gen is the upstream ExampleGen component
      compute_eval_stats = components.StatisticsGen(
          input_data=examples_gen.outputs.eval_examples,
          name='compute-eval-stats'
      )
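
The global `mean` and `stddev` per feature can be computed in a single streaming pass. The sketch below uses Welford's online algorithm in plain Python; it is an illustration of the statistic, not what StatisticsGen runs internally, and `RunningStats`, `compute_statistics`, and the sample records are hypothetical.

```python
import math


class RunningStats:
    """Welford's online algorithm for mean and standard deviation."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the running mean

    def update(self, value):
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (value - self.mean)

    @property
    def stddev(self):
        # Population standard deviation.
        return math.sqrt(self._m2 / self.count) if self.count else 0.0


def compute_statistics(examples):
    """One pass over the examples, one RunningStats per feature."""
    stats = {}
    for example in examples:
        for feature, value in example.items():
            stats.setdefault(feature, RunningStats()).update(value)
    return stats


examples = [
    {"fare": 2.0, "miles": 1.0},
    {"fare": 4.0, "miles": 1.0},
    {"fare": 6.0, "miles": 1.0},
]
stats = compute_statistics(examples)
```
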
  47. 2.3.3 SchemaGen: Schema Needed by Some TFX Components; Data Types, Value Ranges, Optional, Required; Consumes Data from StatisticsGen; Schema used by TFDV, TFT, TFMA Libraries; Uses TFDV Library to infer schema (Best effort and basic; Human should verify)
      feature {
        name: "age"
        value_count { min: 1 max: 1 }
        type: FLOAT
        presence { min_fraction: 1 min_count: 1 }
      }

      from tfx import components

      infer_schema = components.SchemaGen(
          stats=compute_training_stats.outputs.output)
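
A "best effort and basic" inference can be sketched in a few lines. This plain-Python version is only an analogy for what a schema generator does: `infer_schema` here is a hypothetical helper (not the TFDV function of the same name), and the output dictionary layout is an assumption rather than the schema proto shown on the slide.

```python
def infer_schema(examples):
    """Infer a basic per-feature schema: type, presence, and numeric range."""
    schema = {}
    total = len(examples)
    for example in examples:
        for name, value in example.items():
            if value is None:
                continue  # a missing value lowers the presence fraction
            entry = schema.setdefault(
                name, {"type": type(value).__name__, "count": 0, "min": None, "max": None}
            )
            entry["count"] += 1
            if isinstance(value, (int, float)):
                entry["min"] = value if entry["min"] is None else min(entry["min"], value)
                entry["max"] = value if entry["max"] is None else max(entry["max"], value)
    for entry in schema.values():
        entry["min_fraction"] = entry.pop("count") / total
        entry["required"] = entry["min_fraction"] == 1.0
    return schema


examples = [
    {"age": 31.0, "company": "taxi-a"},
    {"age": 45.0, "company": None},
]
schema = infer_schema(examples)
```

The inferred ranges only reflect the data seen so far, which is why the slide recommends that a human verify the result.
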
  48. 2.3.4 ExampleValidator: Identifies Anomalies in Training Data; Used with serving data to detect drift / skew; Uses StatisticsGen and SchemaGen Outputs; Produces Validation Results; Uses TFDV Library for Input Validation
      from tfx import components

      # As shown on the slide: the upstream SchemaGen step whose output the validator consumes
      infer_schema = components.SchemaGen(
          stats=compute_training_stats.outputs.output
      )
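
Validation against a schema amounts to checking each example for missing, mistyped, out-of-range, or unexpected features. The sketch below shows that logic in plain Python; `validate_examples`, the schema layout, and the sample serving records are hypothetical and do not mirror the ExampleValidator or TFDV interfaces.

```python
def validate_examples(examples, schema):
    """Return a list of human-readable anomalies found in `examples`."""
    anomalies = []
    for index, example in enumerate(examples):
        for name, spec in schema.items():
            value = example.get(name)
            if value is None:
                if spec.get("required"):
                    anomalies.append(f"row {index}: missing required feature '{name}'")
                continue
            if not isinstance(value, spec["type"]):
                anomalies.append(f"row {index}: '{name}' has type {type(value).__name__}")
            elif "min" in spec and not spec["min"] <= value <= spec["max"]:
                anomalies.append(f"row {index}: '{name}'={value} is out of range")
        for name in example.keys() - schema.keys():
            anomalies.append(f"row {index}: unexpected feature '{name}'")
    return anomalies


schema = {"age": {"type": float, "required": True, "min": 0.0, "max": 120.0}}
serving_examples = [{"age": 42.0}, {"age": 500.0}, {}, {"age": 30.0, "zip": "60601"}]
anomalies = validate_examples(serving_examples, schema)
```
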
  49. 2.3.5 Transform: Uses Data from ExampleGen & SchemaGen; Transformations Become Part of TF Graph (!!); Helps Avoid Training/Serving Skew; Uses TFT Library for Transformations; Transformations Require Full Pass Thru Dataset; Global Reduction Across All Batches; Create Word Embeddings, Normalize, PCA
      def preprocessing_fn(inputs):
          # inputs: map from feature keys to raw not-yet-transformed features
          # outputs: map from string feature key to transformed feature operations
          outputs = {}
          return outputs
  50. 2.3.6 Trainer: Trains / Validates tf.Examples from Transform; Uses schema.proto from SchemaGen; Produces SavedModel and EvalSavedModel; Uses Core TensorFlow Python API; Works with TensorFlow 1.x Estimator API; TensorFlow 2.0 Keras Support Coming Soon
      from tfx import components

      trainer = components.Trainer(
          module_file=taxi_pipeline_utils,
          train_files=transform_training.outputs.output,
          eval_files=transform_eval.outputs.output,
          schema=infer_schema.outputs.output,
          tf_transform_dir=transform_training.outputs.output,
          train_steps=10000,
          eval_steps=5000)
  51. 2.3.7 Evaluator: Uses EvalSavedModel from Trainer; Writes Analysis Results to ML Metadata Store; Uses TFMA Library for Analysis; TFMA Uses Apache Beam to Scale Analysis
      from tfx import components
      import tensorflow_model_analysis as tfma

      taxi_eval_spec = [
          tfma.SingleSliceSpec(),
          tfma.SingleSliceSpec(columns=['trip_start_hour'])
      ]
      model_analyzer = components.Evaluator(
          examples=examples_gen.outputs.eval_examples,
          eval_spec=taxi_eval_spec,
          model_exports=trainer.outputs.output)
  52. 2.3.8 ModelValidator: Validate Models from Trainer; Uses Data from SchemaGen & StatisticsGen; Compares New Models to Baseline (Baseline == current model in production); New Model is Good if Meets/Exceeds Metrics; If Good, Notify Pusher to Deploy New Model; Simulate “Next Day Evaluation” On New Data
      from tfx import components
      import tensorflow_model_analysis as tfma

      taxi_mv_spec = [tfma.SingleSliceSpec()]
      model_validator = components.ModelValidator(
          examples=examples_gen.outputs.output,
          model=trainer.outputs.output)
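
The "meets or exceeds the baseline, then notify the pusher" rule can be written down directly. This is a plain-Python sketch of the decision only, under the assumption that accuracy-like metrics should not decrease and loss-like metrics should not increase; `bless_model`, `maybe_push`, and the metric values are hypothetical and not part of TFX.

```python
def bless_model(candidate_metrics, baseline_metrics, higher_is_better=("accuracy", "auc")):
    """Bless the candidate only if it meets or exceeds the baseline on every metric."""
    for name, baseline_value in baseline_metrics.items():
        candidate_value = candidate_metrics[name]
        if name in higher_is_better:
            if candidate_value < baseline_value:
                return False
        elif candidate_value > baseline_value:  # e.g. loss: lower is better
            return False
    return True


def maybe_push(candidate_metrics, baseline_metrics, push):
    """Call `push` (a stand-in for the Pusher) only when the model is blessed."""
    blessed = bless_model(candidate_metrics, baseline_metrics)
    if blessed:
        push()
    return blessed


production = {"accuracy": 0.90, "loss": 0.30}  # baseline == current model in production
pushed = []
first = maybe_push({"accuracy": 0.92, "loss": 0.28}, production, lambda: pushed.append("v2"))
second = maybe_push({"accuracy": 0.95, "loss": 0.35}, production, lambda: pushed.append("v3"))
```

The second candidate has better accuracy but worse loss, so it is not blessed and nothing is pushed.
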
  53. 2.3.9 Model Pusher (Deployer): Push Good Model to Deployment Target; Uses Trained SavedModel; Writes Version Data to Metadata Store; Write to FileSystem or TensorFlow Hub
      from tfx import components

      pusher = components.Pusher(
          model_export=trainer.outputs.output,
          model_blessing=model_validator.outputs.blessing,
          serving_model_dir=serving_model_dir)
  54. 2.3.10 Slack Component (!!): Runs After ModelValidator; Adds Human-in-the-Loop Step to Pipeline; TFX Sends Message to Slack with Model URI; Asks Human to Review the New Model; Respond ‘LGTM’, ‘approve’, ‘decline’, ‘reject’; Requires Slack API Setup / Integration
      # In the shell first: export SLACK_BOT_TOKEN={your_token}
      import os

      # SlackComponent comes from the custom-component example linked below
      _channel_id = 'my-channel-id'
      _slack_token = os.environ['SLACK_BOT_TOKEN']
      slack_validator = SlackComponent(
          model_export=trainer.outputs.output,
          model_blessing=model_validator.outputs.blessing,
          slack_token=_slack_token,
          channel_id=_channel_id,
          timeout_sec=3600,
      )
      https://github.com/tensorflow/tfx/tree/master/tfx/examples/custom_components/slack/slack_component
  55. Agenda: 1 Set Up Environment with Kubernetes; 2 TensorFlow Extended (TFX); 3 ML Pipelines with Airflow and KubeFlow; 4 Hyper-Parameter Tuning with KubeFlow; 5 Deploy Notebook with Kubernetes
  56. 3.0 ML Pipelines with Airflow and KubeFlow: 3.1 Airflow; 3.2 KubeFlow
  57. 3.1 Airflow: Most Widely-Used Workflow Orchestrator; Define Execution Graphs in Python; Decent UI; Good Community Support
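
"Define Execution Graphs in Python" boils down to declaring tasks and their upstream dependencies, then running them in a valid order. The sketch below shows that idea with the standard library only (Python 3.9+ for `graphlib`); it is not the Airflow API. The task names follow the TFX components from the earlier slides, and `run_pipeline` is a hypothetical helper.

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks it depends on.
dag = {
    "example_gen": set(),
    "statistics_gen": {"example_gen"},
    "schema_gen": {"statistics_gen"},
    "example_validator": {"statistics_gen", "schema_gen"},
    "transform": {"example_gen", "schema_gen"},
    "trainer": {"transform", "schema_gen"},
    "evaluator": {"example_gen", "trainer"},
    "model_validator": {"example_gen", "trainer"},
    "pusher": {"trainer", "model_validator"},
}


def run_pipeline(dag, run_task):
    """Run every task after all of its upstream tasks have finished."""
    order = []
    for task in TopologicalSorter(dag).static_order():
        run_task(task)
        order.append(task)
    return order


order = run_pipeline(dag, run_task=lambda task: None)
```

An orchestrator adds scheduling, retries, and a UI on top, but the dependency graph is the part the pipeline author writes.
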
  58. Hands On: 05_Airflow_ML_Pipelines (Chicago Taxi Dataset)
  59. Hands On: 06_Airflow_Feature_Analysis
  60. Hands On: 07_Airflow_Model_Analysis
  61. 3.2 KubeFlow: Pipelines Based on Argo CI/CD Project from Intuit; TFJob & PyTorch Job; Supports Distributed Training; TensorFlow & PyTorch Jobs; KubeFlow Fairing Project (!!); Run a notebook as a production job; Deploy training code with dependencies
  62. Hands On: 08_Simple_KubeFlow_ML_Pipeline
  63. Hands On: 09_Advanced_KubeFlow_ML_Pipeline (Chicago Taxi Dataset)
  64. Hands On: 10_Distributed_TensorFlow_Job
  65. Hands On: 10a_Distributed_PyTorch_Job
  66. Agenda: 1 Set Up Environment with Kubernetes; 2 TensorFlow Extended (TFX); 3 ML Pipelines with Airflow and KubeFlow; 4 Hyper-Parameter Tuning with KubeFlow; 5 Deploy Notebook with Kubernetes
  67. 4.0 Hyper-Parameter Tuning. Experiment: Single Optimization Run; Single Objective Function Across Runs; Contains Many Trials. Trial: List of Param Values. Suggestion: Optimization Algorithm. Job: Evaluates a Trial; Calculates Objective
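
The four terms on the hyper-parameter tuning slide map onto a very small loop. The sketch below uses random search as the suggestion algorithm and a toy objective; it illustrates the vocabulary only and is not the Katib API. `Trial`, `Experiment`, `random_suggestion`, `run_experiment`, and `toy_loss` are hypothetical names.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Trial:
    params: dict             # one concrete list of parameter values
    objective: float = None  # filled in by the job that evaluates the trial


@dataclass
class Experiment:
    search_space: dict       # parameter name -> (low, high)
    trials: list = field(default_factory=list)

    def best_trial(self):
        # Lower objective is better in this sketch.
        return min(self.trials, key=lambda t: t.objective)


def random_suggestion(search_space, rng):
    """Suggestion: the optimization algorithm proposing the next parameters."""
    return {name: rng.uniform(low, high) for name, (low, high) in search_space.items()}


def run_experiment(experiment, objective_fn, max_trials, seed=0):
    rng = random.Random(seed)
    for _ in range(max_trials):
        trial = Trial(params=random_suggestion(experiment.search_space, rng))
        trial.objective = objective_fn(**trial.params)  # the "job" evaluating the trial
        experiment.trials.append(trial)
    return experiment.best_trial()


def toy_loss(learning_rate):
    """Stand-in for a training run; minimized at learning_rate = 0.1."""
    return (learning_rate - 0.1) ** 2


experiment = Experiment(search_space={"learning_rate": (0.0, 1.0)})
best = run_experiment(experiment, toy_loss, max_trials=20)
```

Swapping `random_suggestion` for grid search or a Bayesian optimizer changes the suggestion step without touching the experiment, trial, or job.
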
  68. Hands On: 11_Hyper_Parameter_Tuning
  69. Agenda: 1 Set Up Environment with Kubernetes; 2 TensorFlow Extended (TFX); 3 ML Pipelines with Airflow and KubeFlow; 4 Hyper-Parameter Tuning with KubeFlow; 5 Deploy Notebook with Kubernetes
  70. 5.0 Deploy Notebook as Job: 5.1 Wrap Model in a Docker Image; 5.2 Deploy Job to Kubernetes
  71. 5.1 Create Docker Image
  72. 5.2 Deploy Notebook as Job
  73. Hands On: 12_Deploy_Notebook_Xgboost
  74. Hands On: 12a_Deploy_Notebook_TensorFlow
  75. Agenda: 1 Set Up Environment with Kubernetes; 2 TensorFlow Extended (TFX); 3 ML Pipelines with TFX, Airflow, and KubeFlow; 4 Hyper-Parameter Tuning with TFX and KubeFlow; 5 Deploy Notebook with Kubernetes
  76. Bonus Extras! 6 TPUs; 7 MLflow; 8 Papermill; 9 TensorFlow Privacy; 10 TensorFlow 2.0; 11 Keras Tuner; 12 A/B Tests; 13 Metrics and Monitoring
  77. 6.0 TPUs
  78. Hands On: 13_TPU_Keras_MNIST
  79. 7.0 MLflow: 7.1 Experiment Tracking; 7.2 Hyper-Parameter Tuning; 7.3 Kubernetes-based Jobs
  80. Hands On: 14_MLflow_Scikit_Learn
  81. Hands On: 14a_MLflow_Keras
  82. Hands On: 14b_MLflow_TensorFlow
  83. 8.0 Papermill
  84. Hands On: 15_Papermill_Notebook_Job
  85. 9.0 TensorFlow Privacy (Differential Privacy)
  86. Hands On: 16_TF_Privacy
  87. 10.0 TensorFlow 2.0
  88. 11.0 Keras Tuner
  89. Hands On: 17_Keras_Tuner
  90. 12.0 Model Serving
  91. Hands On: 18_Simple_Serving_REST
  92. Hands On: 18a_AB_Test_REST
  93. Bonus Extras! 6 TPUs; 7 MLflow; 8 Papermill; 9 TensorFlow Privacy; 10 TensorFlow 2.0; 11 Keras Tuner; 12 A/B Tests; 13 Metrics and Monitoring
  94. Thank you! https://pipeline.ai @cfregly @PipelineAI Next Workshop: Nov 2, 2019
