Flock: Data Science Platform @ CISL

In this talk, we will present the basic features and functionality of Flock, an end-to-end research platform that we are developing at CISL which simplifies and automates the integration of machine learning solutions in data engines. Flock makes use of MLflow for model and experiment tracking but extends and complements it by providing automatic logging, model optimizations and support for the ONNX model format.

We will showcase Flock's features through a demo using Microsoft's Azure Data Studio and SQL Server.
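Flock's automatic logging rewrites the user's training script so that tracking calls are injected without the data scientist writing them. As a purely illustrative sketch (not Flock's actual implementation), Python's standard `ast` module can add an `mlflow.log_param` call after each constant assignment in a script:

```python
import ast

def auto_instrument(source: str) -> str:
    """Toy sketch of automatic logging: insert an mlflow.log_param call
    after every top-level constant assignment in the user's script."""
    tree = ast.parse(source)
    new_body = []
    for stmt in tree.body:
        new_body.append(stmt)
        if (isinstance(stmt, ast.Assign)
                and len(stmt.targets) == 1
                and isinstance(stmt.targets[0], ast.Name)
                and isinstance(stmt.value, ast.Constant)):
            name = stmt.targets[0].id
            new_body.append(ast.parse(f"mlflow.log_param('{name}', {name})").body[0])
    tree.body = new_body
    return ast.unparse(tree)  # Python 3.9+

print(auto_instrument("n_leaves = 8\nn_trees = 100"))
```

The demo later in the deck shows exactly this kind of transformation: the user's script on the left, the instrumented version with `mlflow.log_param`/`log_model` calls on the right.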



  1. Chapter 1 / Chapter 2
  2. Applied research group. Collaborating with the Azure Data product group. Open-sourcing our code: Apache Hadoop, REEF, Heron, MLflow.
  3. Our labs by the numbers: 637 patents; GAed or Public Preview features just this year; 0.5M LoC in OSS; 130+ publications in top-tier conferences/journals; 1.1M LoC in products; 600k servers running our code in Azure/Cosmos.
  4. Systems considered thus far: Cloud Providers, Private Services, OSS.
  5. Capabilities compared across systems (legend: Good Support / OK Support / No Support / Unknown): Training, Experiment Tracking, Managed Notebooks, Pipelines/Projects, Multi-Framework, Proprietary Algos, Distributed Training, Auto ML, Serving, Batch prediction, On-prem deployment, Model Monitoring, Model Validation, Data Management, Data Provenance, Data testing, Feature Store, Featurization DSL, Labelling, In-DB ML.
  6. Let Data Scientists do Data Science!
  7. (Architecture diagram, offline/online.) Offline: data-driven development; model training with LightGBM; NN model transform to ONNX; ONNX optimization; application tracking via Job-id and job telemetry. Online: solution deployment; optimized ONNX served as a pyfunc; Dhalion policies; close/update incidents.
  8. DEMO: Python code — the user's original script and the version Flock produces after automatic instrumentation.

User code:

import pandas as pd
import lightgbm as lgb
from sklearn import metrics

data_train = pd.read_csv("global_train_x_label_with_mapping.csv")
data_test = pd.read_csv("global_test_x_label_with_mapping.csv")
train_x = data_train.iloc[:, :-1].values
train_y = data_train.iloc[:, -1].values
test_x = data_test.iloc[:, :-1].values
test_y = data_test.iloc[:, -1].values
n_leaves = 8
n_trees = 100
clf = lgb.LGBMClassifier(num_leaves=n_leaves, n_estimators=n_trees)
clf.fit(train_x, train_y)
score = metrics.precision_score(test_y, clf.predict(test_x), average='macro')
print("Precision Score on Test Data: " + str(score))

Instrumented code (generated by Flock):

import multiprocessing
from functools import partial

import lightgbm as lgb
import mlflow
import mlflow.onnx
import mlflow.pyfunc
import mlflow.sklearn
import onnx
import pandas as pd
import torch
from onnx import optimizer
from sklearn import metrics
from flock import get_tree_parameters, LightGBMBinaryClassifier_Batched

data_train = pd.read_csv('global_train_x_label_with_mapping.csv')
data_test = pd.read_csv('global_test_x_label_with_mapping.csv')
train_x = data_train.iloc[:, :-1].values
train_y = data_train.iloc[:, -1].values
test_x = data_test.iloc[:, :-1].values
test_y = data_test.iloc[:, -1].values
n_leaves = 8
n_trees = 100
clf = lgb.LGBMClassifier(num_leaves=n_leaves, n_estimators=n_trees)
mlflow.log_param('clf_init_n_estimators', n_trees)
mlflow.log_param('clf_init_num_leaves', n_leaves)
clf.fit(train_x, train_y)
mlflow.sklearn.log_model(clf, 'clf_model')
score = metrics.precision_score(test_y, clf.predict(test_x), average='macro')
mlflow.log_param('precision_score_average', 'macro')
mlflow.log_param('score', score)
print('Precision Score on Test Data: ' + str(score))

activation = 'sigmoid'
torch.set_num_threads(1)
device = torch.device('cpu')
model_name = 'griffon'
model = clf.booster_.dump_model()
n_features = clf.n_features_
tree_infos = model['tree_info']
pool = multiprocessing.Pool(8)
parameters = pool.map(partial(get_tree_parameters, n_features=n_features), tree_infos)
lgb_nn = LightGBMBinaryClassifier_Batched(parameters, n_features, activation).to(device)
torch.onnx.export(lgb_nn, torch.randn(1, n_features).to(device),
                  model_name + '_nn.onnx', export_params=True,
                  operator_export_type=torch.onnx.OperatorExportTypes.ONNX_ATEN_FALLBACK)
passes = ['eliminate_deadend', 'eliminate_identity', 'eliminate_nop_monotone_argmax',
          'eliminate_nop_transpose', 'eliminate_unused_initializer',
          'extract_constant_to_initializer', 'fuse_consecutive_concats',
          'fuse_consecutive_reduce_unsqueeze', 'fuse_consecutive_squeezes',
          'fuse_consecutive_transposes', 'fuse_matmul_add_bias_into_gemm',
          'fuse_transpose_into_gemm', 'lift_lexical_references']
model = onnx.load(model_name + '_nn.onnx')
opt_model = optimizer.optimize(model, passes)
mlflow.onnx.log_model(opt_model, 'opt_model')
pyfunc_loaded = mlflow.pyfunc.load_pyfunc('opt_model', run_id=mlflow.active_run().info.run_uuid)
scoring = pyfunc_loaded.predict(pd.DataFrame(test_x[:1].astype('float32'))).values
print('Scoring through mlflow pyfunc: ', scoring)
mlflow.log_param('pyfunc_scoring', scoring[0][0])
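The instrumented code compiles the trained LightGBM trees into a neural-network form (`LightGBMBinaryClassifier_Batched`) so the model can be exported as an ONNX graph of tensor operations. A toy numpy sketch of that idea for a single decision stump — all names and shapes here are illustrative, not Flock's API:

```python
import numpy as np

# One decision stump: if x[feature] < threshold, emit the left leaf value,
# otherwise the right one. Expressed with tensor ops (matmul + compare +
# gather), the shape of computation an ONNX graph can represent.
feature, threshold = 2, 0.5
leaf_values = np.array([0.2, 0.8])      # [left_leaf, right_leaf]

selector = np.zeros((4, 1))             # one-hot feature selector
selector[feature, 0] = 1.0

def stump_forward(X):
    picked = X @ selector               # (n, 1): the split feature per row
    right = (picked >= threshold).astype(int)   # 0 = go left, 1 = go right
    return leaf_values[right[:, 0]]     # gather the leaf value per row

X = np.array([[0.0, 0.0, 0.1, 0.0],
              [0.0, 0.0, 0.9, 0.0]])
print(stump_forward(X))                 # row 1 routes left, row 2 right
```

A full tree (or forest) generalizes this with batched matrices over all splits, which is what makes the ONNX optimizer's matmul/Gemm fusion passes in the code above worthwhile.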
  9. Current OnCall Workflow: a job goes out of SLA and Support is alerted; a support engineer (SE) spends hours of manual labor looking through hundreds of metrics; after 5-6 hours of investigation, the reason for the job slowdown is found. Revised OnCall Workflow with Griffon: a job goes out of SLA and the SE is alerted; the Job ID is fed through Griffon and the top reasons for job slowdown are generated automatically; the reason is found in the top five generated by Griffon. All the metrics Griffon has looked at can be ruled out, and the SE can direct their efforts to a smaller set of metrics.
  10. ONNX: interoperability across ML frameworks. An open format to represent ML models, backed by Microsoft, Amazon, Facebook, and several hardware vendors.
  11. Train a model using a popular framework such as TensorFlow → convert the model to the ONNX format → perform inference efficiently across multiple platforms and hardware using ONNX Runtime.
  12. ONNX Runtime and optimizations. Key design points: graph IR; support for multiple backends (e.g., CPU, GPU, FPGA); graph optimizations via a rule-based optimizer inspired by DB optimizers. Improved inference time and memory consumption; examples: 117 msec → 34 msec, 250 MB → 200 MB.
  13. ONNX Runtime in production: ~40 ONNX models in production; >10 orgs are migrating their models to ONNX Runtime; 2.7x average speedup.
  14. ONNX Runtime in production: Office grammar-checking model, 14.6x reduction in latency.
  15. Train a sklearn model.
  16. Deploy the server:

mlflow models serve -m /artifacts/model -p 1234

Perform inference (ONNX Runtime is automatically invoked):

curl -X POST -H "Content-Type: application/json; format=pandas-split" --data '{"columns":["alcohol","chlorides","citric acid","density","fixed acidity","free sulfur dioxide","pH","residual sugar","sulphates","total sulfur dioxide","volatile acidity"],"data":[[12.8,0.029,0.48,0.98,6.2,29,3.33,1.2,0.39,75,0.66]]}' http://127.0.0.1:1234/invocations

Response: [6.379428821398614]
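The same request can be issued from Python. This sketch only builds the pandas-split payload the `/invocations` endpoint expects; actually posting it requires the `mlflow models serve` process above to be running:

```python
import json

columns = ["alcohol", "chlorides", "citric acid", "density", "fixed acidity",
           "free sulfur dioxide", "pH", "residual sugar", "sulphates",
           "total sulfur dioxide", "volatile acidity"]
row = [12.8, 0.029, 0.48, 0.98, 6.2, 29, 3.33, 1.2, 0.39, 75, 0.66]
payload = json.dumps({"columns": columns, "data": [row]})

# To send it (server must be up):
# import urllib.request
# req = urllib.request.Request(
#     "http://127.0.0.1:1234/invocations", payload.encode(),
#     {"Content-Type": "application/json; format=pandas-split"})
# print(urllib.request.urlopen(req).read())
print(payload)
```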
