SlideShare una empresa de Scribd logo
1 de 61
Descargar para leer sin conexión
28.03.19 DFKI – DIMA – TU Berlin
Database Systems and Information
Management Group
TU Berlin
https://www.dima.tu-berlin.de/
Intelligent Analytics for Massive Data
German Research Center for Artificial
Intelligence
https://www.dfki.de/ 1
Continuous Deployment of Machine
Learning Pipelines
Behrouz Derakhshan behrouz.derakhshan@dfki.de
Alireza Rezaei Mahdiraji alireza.rm@dfki.de
Tilmann Rabl rabl@tu-berlin.de
Volker Markl volker.markl@tu-berlin.de
28.03.19 DFKI – DIMA – TU Berlin 2
Life Cycle of Machine Learning Applications
28.03.19 DFKI – DIMA – TU Berlin 2
■ Life cycle of ML application does not end with training
Life Cycle of Machine Learning Applications
28.03.19 DFKI – DIMA – TU Berlin 2
■ Life cycle of ML application does not end with training
■ Models and Pipelines must be deployed to answer
prediction queries
Life Cycle of Machine Learning Applications
28.03.19 DFKI – DIMA – TU Berlin 2
■ Life cycle of ML application does not end with training
■ Models and Pipelines must be deployed to answer
prediction queries
■ Deployed models and pipelines should be monitored and
trained further
Life Cycle of Machine Learning Applications
28.03.19 DFKI – DIMA – TU Berlin 2
■ Life cycle of ML application does not end with training
■ Models and Pipelines must be deployed to answer
prediction queries
■ Deployed models and pipelines should be monitored and
trained further
Life Cycle of Machine Learning Applications
Focus of this talk
28.03.19 DFKI – DIMA – TU Berlin 3
2. Deployment
Improvement By Online Learning
Data
Preprocessing
Model
Training
Model
1. Training
Data
Preprocessing
Model
Training
Model
Deployment Platform
3. Query Answering
Prediction
Query Prediction
Training DataTraining Data
28.03.19 DFKI – DIMA – TU Berlin 3
2. Deployment
Improvement By Online Learning
Data
Preprocessing
Model
Training
Model
1. Training
Data
Preprocessing
Model
Training
Model
Deployment Platform
3. Query Answering
Prediction
Query Prediction
Training
Data
Training DataTraining Data
28.03.19 DFKI – DIMA – TU Berlin 3
2. Deployment
Improvement By Online Learning
Data
Preprocessing
Model
Training
Model
1. Training
Data
Preprocessing
Model
Training
Model
Deployment Platform
3. Query Answering
Prediction
Query Prediction
Training
Data
Training DataTraining Data
4. Online Learning
28.03.19 DFKI – DIMA – TU Berlin 3
2. Deployment
Improvement By Online Learning
Data
Preprocessing
Model
Training
Model
1. Training
Data
Preprocessing
Model
Training
Model
Deployment Platform
3. Query Answering
Prediction
Query Prediction
Training
Data
Training DataTraining Data
• Efficient and Fast
4. Online Learning
28.03.19 DFKI – DIMA – TU Berlin 3
2. Deployment
Improvement By Online Learning
Data
Preprocessing
Model
Training
Model
1. Training
Data
Preprocessing
Model
Training
Model
Deployment Platform
3. Query Answering
Prediction
Query Prediction
Training
Data
Training DataTraining Data
• Cannot guarantee high-quality models
• Efficient and Fast
4. Online Learning
28.03.19 DFKI – DIMA – TU Berlin 4
2. Deployment
Improvement By Retraining
Data
Preprocessing
Model
Training
Model
Data
Preprocessing
Model
Training
Model
Deployment Platform
Prediction
Query Prediction
Training
Data
Training Data
1. Training
3. Query Answering
4. Online Learning
28.03.19 DFKI – DIMA – TU Berlin 4
2. Deployment
Improvement By Retraining
Data
Preprocessing
Model
Training
Model
Data
Preprocessing
Model
Training
Model
Deployment Platform
Prediction
Query Prediction
New Training Data
5. Gather Training Data
Training
Data
Training Data
1. Training
3. Query Answering
4. Online Learning
28.03.19 DFKI – DIMA – TU Berlin 4
2. Deployment
Improvement By Retraining
Data
Preprocessing
Model
Training
Model
Data
Preprocessing
Model
Training
Model
Deployment Platform
Prediction
Query Prediction
New Training Data
5. Gather Training Data
Training
Data
6. Retraining
Training Data
1. Training
3. Query Answering
4. Online Learning
28.03.19 DFKI – DIMA – TU Berlin 4
2. Deployment
Improvement By Retraining
Data
Preprocessing
Model
Training
Model
Data
Preprocessing
Model
Training
Model
Deployment Platform
Prediction
Query Prediction
New Training Data
5. Gather Training Data
Training
Data
6. Retraining
Historical
Training Data
1. Training
3. Query Answering
4. Online Learning
28.03.19 DFKI – DIMA – TU Berlin 4
2. Deployment
Improvement By Retraining
Data
Preprocessing
Model
Training
Model
Data
Preprocessing
Model
Training
Model
Deployment Platform
Prediction
Query Prediction
New Training Data
5. Gather Training Data
Training
Data
6. Retraining
Historical
Training Data
• Provides high-quality models
1. Training
3. Query Answering
4. Online Learning
28.03.19 DFKI – DIMA – TU Berlin 4
2. Deployment
Improvement By Retraining
Data
Preprocessing
Model
Training
Model
Data
Preprocessing
Model
Training
Model
Deployment Platform
Prediction
Query Prediction
New Training Data
5. Gather Training Data
Training
Data
6. Retraining
Historical
Training Data
• Provides high-quality models
• Time-consuming and resource-intensive
• Out-of-core
1. Training
3. Query Answering
4. Online Learning
28.03.19 DFKI – DIMA – TU Berlin 5
Ideal Deployment Platform
Can a platform provide the same level of quality as Retraining
and perform (almost) as efficiently as Online Learning?
28.03.19 DFKI – DIMA – TU Berlin 6
Continuous Deployment Platform
Prediction
Query Prediction
Training
Data
Continuous Deployment Platform
• Train the model inside the platform
• Compute features and cache them
• Update data preprocessing statistics
• Replace Retraining with Proactive Training
Data
Preprocessing
Model
Training
Model
Historical Training Data Online Learning
Proactive Training
Cache computed
Features
Stats
28.03.19 DFKI – DIMA – TU Berlin 7
Historical Training Data
Preprocess MaterializeDiscretize Sample Update
Raw Data
Chunks
Preprocessed
Features
Model
Continuous Deployment Platform
Scheduled Execution
Raw
Data
28.03.19 DFKI – DIMA – TU Berlin 8
Historical Training Data
Preprocess MaterializeDiscretize Sample Update
Raw Data
Chunks
Preprocessed
Features
Model
Continuous Deployment Platform
Scheduled Execution
Raw
Data
Data Preparation Phase
28.03.19 DFKI – DIMA – TU Berlin 9
Historical Training Data
Preprocess MaterializeDiscretize Sample Update
Raw Data
Chunks
Preprocessed
Features
Model
Continuous Deployment Platform
Scheduled Execution
Raw
Data
Proactive Training Phase
28.03.19 DFKI – DIMA – TU Berlin 10
Historical Training Data
Data Preparation Phase
Raw
Data
Raw Data
Chunks
Discretize
…..
t0 t1 t2 t3
t0 t1 t2 t3 tn-3 tn-2 tn-1 tn
28.03.19 DFKI – DIMA – TU Berlin 10
Historical Training Data
Data Preparation Phase
Raw
Data
Raw Data
Chunks
Discretize Preprocess
Parser
Missing Value
Imputer
Standard
Scaler
Feature
Hasher
…..
t0 t1 t2 t3
t0 t1 t2 t3 tn-3 tn-2 tn-1 tn
28.03.19 DFKI – DIMA – TU Berlin 10
Historical Training Data
Data Preparation Phase
Raw
Data
Raw Data
Chunks
Discretize Preprocess
Feature
Chunks
Parser
Missing Value
Imputer
Standard
Scaler
Feature
Hasher
…..
…..
Cache
Layer
t0 t1 t2 t3
t0 t1 t2 t3
t0 t1 t2 t3 tn-3 tn-2 tn-1 tn
28.03.19 DFKI – DIMA – TU Berlin 10
Historical Training Data
Data Preparation Phase
Raw
Data
Raw Data
Chunks
Discretize Preprocess
Feature
Chunks
Parser
Missing Value
Imputer
Standard
Scaler
Feature
Hasher
Stats Stats
…..
…..
Cache
Layer
t0 t1 t2 t3
t0 t1 t2 t3
t0 t1 t2 t3 tn-3 tn-2 tn-1 tn
Update and Store Statistics
28.03.19 DFKI – DIMA – TU Berlin 11
Storage Requirement
Historical Training Data
…..
…..
t0 t1 t2 t3 tn-3 tn-2 tn-1 tn
Cache
Layer
28.03.19 DFKI – DIMA – TU Berlin 11
Storage Requirement
Historical Training Data
…..
…..
Removed Cached Features
t0 t1 t2 t3 tn-3 tn-2 tn-1 tn
Cache
Layer
28.03.19 DFKI – DIMA – TU Berlin 12
Historical Training Data
…..
…..
Sample Update ModelMaterialize
Cache
Layer
Scheduled Execution
Proactive Training Phase
28.03.19 DFKI – DIMA – TU Berlin 12
Historical Training Data
…..
…..
Sample Update ModelMaterialize
Cache
Layer
mini-batch SGD Algorithm
Input: = training dataset
Output: = trained model
1: initialize
2: for = do
3:
4:
5
6: end for
7: return
𝐷
𝑚
𝑚0
𝑖 1…𝑛 
𝑠𝑖 = 𝑠𝑎𝑚𝑝𝑙𝑒 𝑓𝑟𝑜𝑚 𝐷
𝑔 = 𝛁𝐽(𝑠𝑖, 𝑚𝑖−1)
:     𝑚𝑖 = 𝑚𝑖−1 − 𝜂𝑖−1 𝑔
𝑚𝑛
Scheduled Execution
Proactive Training Phase
28.03.19 DFKI – DIMA – TU Berlin 12
Historical Training Data
…..
…..
Sample Update ModelMaterialize
Cache
Layer
mini-batch SGD Algorithm
Input: = training dataset
Output: = trained model
1: initialize
2: for = do
3:
4:
5
6: end for
7: return
𝐷
𝑚
𝑚0
𝑖 1…𝑛 
𝑠𝑖 = 𝑠𝑎𝑚𝑝𝑙𝑒 𝑓𝑟𝑜𝑚 𝐷
𝑔 = 𝛁𝐽(𝑠𝑖, 𝑚𝑖−1)
:     𝑚𝑖 = 𝑚𝑖−1 − 𝜂𝑖−1 𝑔
𝑚𝑛
Scheduled Execution
Proactive Training Phase
28.03.19 DFKI – DIMA – TU Berlin 12
Historical Training Data
…..
…..
Sample Update ModelMaterialize
Cache
Layer
mini-batch SGD Algorithm
Input: = training dataset
Output: = trained model
1: initialize
2: for = do
3:
4:
5
6: end for
7: return
𝐷
𝑚
𝑚0
𝑖 1…𝑛 
𝑠𝑖 = 𝑠𝑎𝑚𝑝𝑙𝑒 𝑓𝑟𝑜𝑚 𝐷
𝑔 = 𝛁𝐽(𝑠𝑖, 𝑚𝑖−1)
:     𝑚𝑖 = 𝑚𝑖−1 − 𝜂𝑖−1 𝑔
𝑚𝑛
Scheduled Execution
Proactive Training Phase
28.03.19 DFKI – DIMA – TU Berlin 12
Historical Training Data
…..
…..
Sample Update ModelMaterialize
Cache
Layer
mini-batch SGD Algorithm
Input: = training dataset
Output: = trained model
1: initialize
2: for = do
3:
4:
5
6: end for
7: return
𝐷
𝑚
𝑚0
𝑖 1…𝑛 
𝑠𝑖 = 𝑠𝑎𝑚𝑝𝑙𝑒 𝑓𝑟𝑜𝑚 𝐷
𝑔 = 𝛁𝐽(𝑠𝑖, 𝑚𝑖−1)
:     𝑚𝑖 = 𝑚𝑖−1 − 𝜂𝑖−1 𝑔
𝑚𝑛
Scheduled Execution
Proactive Training Phase
28.03.19 DFKI – DIMA – TU Berlin 12
Historical Training Data
…..
…..
Sample Update ModelMaterialize
Cache
Layer
mini-batch SGD Algorithm
Input: = training dataset
Output: = trained model
1: initialize
2: for = do
3:
4:
5
6: end for
7: return
𝐷
𝑚
𝑚0
𝑖 1…𝑛 
𝑠𝑖 = 𝑠𝑎𝑚𝑝𝑙𝑒 𝑓𝑟𝑜𝑚 𝐷
𝑔 = 𝛁𝐽(𝑠𝑖, 𝑚𝑖−1)
:     𝑚𝑖 = 𝑚𝑖−1 − 𝜂𝑖−1 𝑔
𝑚𝑛
Scheduled Execution
Proactive Training Phase
28.03.19 DFKI – DIMA – TU Berlin 13
Data Materializing
Historical Training Data
…..
…..
Sample Update ModelMaterialize
Cache
Layer
28.03.19 DFKI – DIMA – TU Berlin 13
Data Materializing
Historical Training Data
…..
…..
Sample Update ModelMaterialize
Parser
Missing Value
Imputer
Standard
Scaler
Feature
Hasher
Stats Stats
Cache
Layer
Statistics precomputed during the Data Preparation Phase
28.03.19 DFKI – DIMA – TU Berlin 14
Evaluation
Datasets Size #Instances Initial Deployment
URL 2.1 GB 2.4 M Day 0 Day 1-120
Taxi 42 GB 280 M Jan 15 Feb 15 – Jun 16
Parser
Missing Value
Imputer
Standard
Scaler
Feature
Hasher
SVM Model
URL
Pipeline
Parser
Feature
Extractor
Anomaly
Detector
Standard
Scaler
Linear Regression
Model
Taxi
Pipeline
28.03.19 DFKI – DIMA – TU Berlin 15
Quality
Can Proactive Training provide the same level of quality as Retraining?
28.03.19 DFKI – DIMA – TU Berlin 15
Quality
−
−
−
−
−
−
−
−
−
−
−
−
− −
−
−
−
−
−
−
−
−
− − −
−
−
−
−
−
−
−
−
−
−
−
−
− −
−
−
−
−
−
−
−
−
−
− −
−
−
−
−
−
−
−
−
−
−
−
−
− −
−
−
−
−
−
−
−
−
−
− −
2.1
2.2
2.3
day1 day30 day60 day90 day120
Misclassification%
− − −Proactive Retraining Online
Can Proactive Training provide the same level of quality as Retraining?
Cumulative Prequential Prediction Error Rate for the URL Pipeline During the Deployment
28.03.19 DFKI – DIMA – TU Berlin 16
Training Time
Can Proactive Training perform (almost) as efficiently as Online Learning?
28.03.19 DFKI – DIMA – TU Berlin 16
Training Time
− − − − − − − − − − − − − − − − − − − − − − − −
−
− − −
− −
− −
− −
− −
− −
− −
− −
− −
− −
− −
− − − − − − − − − − − − − − − − − − − − − − − −
0
250
500
750
day1 day30 day60 day90 day120
Time(m)
− − −Proactive Retraining Online
Cumulative Training Time for the URL Pipeline During the Deployment
Can Proactive Training perform (almost) as efficiently as Online Learning?
28.03.19 DFKI – DIMA – TU Berlin 16
Training Time
− − − − − − − − − − − − − − − − − − − − − − − −
−
− − −
− −
− −
− −
− −
− −
− −
− −
− −
− −
− −
− − − − − − − − − − − − − − − − − − − − − − − −
0
250
500
750
day1 day30 day60 day90 day120
Time(m)
− − −Proactive Retraining Online
Cumulative Training Time for the URL Pipeline During the Deployment
Can Proactive Training perform (almost) as efficiently as Online Learning?
Proactive training provides same level of model accuracy as Retraining, while
matching the speed of Online Learning
28.03.19 DFKI – DIMA – TU Berlin 17
Feature Caching and Statistics Computation
What are the effects of Statistics Computation and Feature Caching ?
111
91
55
0
40
80
120
No
Optimiziation
0% Cached 100% Cached
Time(m)
Total Training Time in Presence of Statistics Computation and Feature Caching
28.03.19 DFKI – DIMA – TU Berlin 17
Feature Caching and Statistics Computation
What are the effects of Statistics Computation and Feature Caching ?
111
91
55
0
40
80
120
No
Optimiziation
0% Cached 100% Cached
Time(m)
Total Training Time in Presence of Statistics Computation and Feature Caching
Statistics Computation and Feature Caching improves the performance of
Proactive training by a factor of 2
28.03.19 DFKI – DIMA – TU Berlin 18
Summary
Continuous Deployment Platform
• Proactive Training, instead of
Offline Retraining
• Feature Caching
• Online Statistics Computation
• Reduces the total training time
• Achieves high quality
Historical Training Data
Preprocess MaterializeDiscretize Sample Update
Raw
Data
Raw Data
Chunks
Preprocessed
Features
Model
Schedule Execution
2.23
2.24
2.25
2.26
2.27
0 400 800
Time(m)
Misclassification%
Proactive Retraining Online
28.03.19 DFKI – DIMA – TU Berlin 19
1. D. Crankshaw, X. Wang, G. Zhou, M. Franklin, et al. 2016. Clipper: A Low-Latency Online Prediction
Serving System. arXiv preprint arXiv:1612.03079 (2016).
2. D. Crankshaw, P. Bailis, J. Gonzalez, H. Li, et al. 2014. The missing piece incomplex analytics: Low
latency, scalable model management and serving with velox.
3. D. Baylor, E. Breck, H. Cheng, N. Fiedel, et al. 2017. TFX: A TensorFlow-Based Production-Scale
Machine Learning Platform. In Proceedings of the 23rd ACM SIGKDD International Conference on
Knowledge Discovery and Data Mining. ACM, 1387–1395.
4. L. Bottou. 2010. Large-scale machine learning with stochastic gradient descent. In Proceedings of
COMPSTAT’2010. Springer, 177–186.
5. M. Zaharia, M. Chowdhury, M. Franklin, S. Shenker, and I. Stoica. 2010. Spark: cluster computing with
working sets. HotCloud 10 (2010), 10–10.
6. O. Chapelle. [n. d.]. NYC Taxi & Lomousine Commision Trip Record Data. http://www.nyc.gov/html/tlc/
html/about/trip_record_data.shtml. [Online;accessed 10-April-2018].
7. J. Ma, L. Saul, S. Savage, and G. Voelker. 2009. Identifying suspicious URLs: an application of large-
scale online learning. In Proceedings of the 26th annual international conference on machine learning.
ACM, 681–688.
8. D. Kingma and J. Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint
arXiv:1412.6980 (2014).
9. M. Zeiler. 2012. ADADELTA: an adaptive learning rate method. arXiv preprint arXiv:1212.5701 (2012).
10. T. Tieleman and G. Hinton. 2012. Lecture 6.5-rmsprop: Divide the gradient by a running average of its
recent magnitude. COURSERA: Neural networks for machine learning 4, 2 (2012), 26–31.
References
28.03.19 DFKI – DIMA – TU Berlin 20
Backup Slides
28.03.19 DFKI – DIMA – TU Berlin 21
Preprocess MaterializeDiscretize Sample Update
Raw Data
Chunks
Preprocessed
Features
Model
Data Manager
Historical Training Data
Raw
Data
Data Manager
• Data Discretizing
• Data Sampling
• Historical Data Management Scheduled Execution
28.03.19 DFKI – DIMA – TU Berlin 22
Scheduled Execution
Historical Training Data
Preprocess MaterializeDiscretize Sample Update
Raw
Data
Raw Data
Chunks
Preprocessed
Features
Model
Pipeline Manager
Pipeline Manager
• Data Preprocessing
• Data Materialization
• Pipeline Component Management
28.03.19 DFKI – DIMA – TU Berlin 23
Historical Training Data
Preprocess MaterializeDiscretize Sample Update
Raw
Data
Raw Data
Chunks
Preprocessed
Features
Model
Model Updater
Model Updater
• Online Training
• Proactive Training
Scheduled Execution
28.03.19 DFKI – DIMA – TU Berlin 24
Historical Training Data
Preprocess MaterializeDiscretize Sample Update
Raw
Data
Raw Data
Chunks
Preprocessed
Features
Model
Scheduler
Scheduler
• Schedule Proactive Training
Scheduled Execution
28.03.19 DFKI – DIMA – TU Berlin 25
■ TFX
□ Manual Retraining
□ No Online Learning
■ Velox
□ Automatic Retraining
□ Online Learning
■ Clipper
□ No Retraining
□ No Online Learning
□ Ensemble of Models
Related Work
28.03.19 DFKI – DIMA – TU Berlin 26
Proactive Training vs Periodical Retraining (Taxi)
Cumulative Prequential Prediction Error Rate for the Taxi Pipeline During the Deployment
−
−
− −
−
−
− −
−
−
−
−
−
−
−
−
−
−
− − − −
−
−
−
−
− −
−
−
− −
−
−
−
−
−
−
−
−
−
−
− − − −
−
−
−
−
− −
−
−
− −
−
−
−
−
−
−
−
−
−
−
− − − −
−
−
0.0960
0.0965
0.0970
0.0975
0.0980
Feb15 Jul15 Jan16 June16
RMSLE
− − −Proactive Retraining Online
28.03.19 DFKI – DIMA – TU Berlin 27
Proactive Training vs Periodical Retraining (Taxi)
Cumulative Training Time for the Taxi Pipeline During the Deployment
− − − − − − − − − − − − − − − − − − − − − − − −
−
−
− −
−
− −
−
− −
−
−
− −
−
− −
−
− −
−
− −
−
− − − − − − − − − − − − − − − − − − − − − − − −
0
500
1000
1500
Feb15 Jul15 Jan16 June16
Time(m)
− − −Proactive Retraining Online
28.03.19 DFKI – DIMA – TU Berlin 28
Materialization and Statistics Computation (Taxi)
●
●
●
●300
400
500
600
700
800
0.0 0.2 0.6 1.0
Cached Features Ratio
Time(m)
● TimeBased
Uniform
WindowBased
NoOptimization
Materialization Utilization Rate
for different ratio of Cached Features Taxi
Sampling
Ratio of Cached Features
m = 0.2 m = 0.6
Uniform 0.51 0.90
Window-based 0.57 1.0
Time-based 0.65 0.97
Materialization Utilization Rate:
Ratio of preprocessed features that
skipped the materialization step
28.03.19 DFKI – DIMA – TU Berlin 29
Materialization and Statistics Computation (URL)
Sampling
Ratio of Cached Features
m = 0.2 m = 0.6
Uniform 0.52 0.91
Window-based 0.58 1.0
Time-based 0.68 0.97
Materialization Utilization Rate
for different ratio of Cached Features URL
●
●
●
●
60
70
80
90
100
110
0.0 0.2 0.6 1.0
Cached Features Ratio
Time(m)
● TimeBased
Uniform
WindowBased
NoOptimization
Materialization Utilization Rate:
Ratio of preprocessed features that
skipped the materialization step
28.03.19 DFKI – DIMA – TU Berlin 30
Time vs Quality (Taxi)
0.0974
0.0975
0.0976
0 1000 2000
Time(m)
RMSLE
Proactive Retraining Online
28.03.19 DFKI – DIMA – TU Berlin 31
Adaptation
URL Taxi
r = 1E-2 r = 1E-3 r = 1E-4 r = 1E-2 r = 1E-3 r = 1E-4
Adam 0.030 0.026 0.035 0.09553 0.09551 0.09551
RMSProp 0.030 0.027 0.034 0.09552 0.09552 0.09550
Adadelta 0.029 0.028 0.034 0.09609 0.09610 0.09619
Effect of Learning Rate Adaption during the Deployment
Effect Learning Rate Adaption and Regularization Parameter on Initial Training
Tuning
− − −Adam Rmsprop Adadelta
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
2.10
2.15
2.20
2.25
2.30
2.35
day1 day15 day30
URL
Misclassification%
−
−
− −
− − − − −
−
−
− −
− − − − −
−
−
− −
− − − − −
0.085
0.090
0.095
Feb15 Apr15
TaxiRMSLE
28.03.19 DFKI – DIMA – TU Berlin 32
Effect of Sampling on model quality
Effect of Sampling Method on the Error Rate
− − −Timebased Windowbased Uniform
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
−
2.10
2.15
2.20
2.25
2.30
day1 day15 day30
URL
Misclassification%
−
−
− −
− − − − −
−
−
− −
− − − − −
−
−
− −
− − − − −
0.085
0.090
0.095
Feb15 Apr15
Taxi
RMSLE
28.03.19 DFKI – DIMA – TU Berlin 33
Materialization Process
28.03.19 DFKI – DIMA – TU Berlin 34
Ads CTR USE Case Figure

Más contenido relacionado

Similar a Continuous Deployment of Machine Learning Pipelines

CD Theory With Lab Component-IMLCIS.doc
CD Theory With Lab Component-IMLCIS.docCD Theory With Lab Component-IMLCIS.doc
CD Theory With Lab Component-IMLCIS.docGayathriRHICETCSESTA
 
Product Engineer Certified Lean Six Sigma Black Belt by IASSC
Product Engineer Certified Lean Six Sigma Black Belt by IASSCProduct Engineer Certified Lean Six Sigma Black Belt by IASSC
Product Engineer Certified Lean Six Sigma Black Belt by IASSCHAKKACHE Mohamed
 
AzureML Welcome to the future of Predictive Analytics
AzureML Welcome to the future of Predictive Analytics AzureML Welcome to the future of Predictive Analytics
AzureML Welcome to the future of Predictive Analytics Ruben Pertusa Lopez
 
Deep Learning for Autonomous Driving
Deep Learning for Autonomous DrivingDeep Learning for Autonomous Driving
Deep Learning for Autonomous DrivingJan Wiegelmann
 
Driving Enterprise Adoption: Tragedies, Triumphs and Our NEXT
Driving Enterprise Adoption: Tragedies, Triumphs and Our NEXTDriving Enterprise Adoption: Tragedies, Triumphs and Our NEXT
Driving Enterprise Adoption: Tragedies, Triumphs and Our NEXTDataWorks Summit
 
PYTHON AND DATA SCIENCE FOR INVESTMENT PROFESSIONALS
PYTHON AND DATA SCIENCE FOR INVESTMENT PROFESSIONALSPYTHON AND DATA SCIENCE FOR INVESTMENT PROFESSIONALS
PYTHON AND DATA SCIENCE FOR INVESTMENT PROFESSIONALSQuantUniversity
 
Certified Data Science Specialist Course Preview Dr. Nickholas
Certified Data Science Specialist Course Preview Dr. NickholasCertified Data Science Specialist Course Preview Dr. Nickholas
Certified Data Science Specialist Course Preview Dr. NickholasiTrainMalaysia1
 
Machine Learning for .NET Developers - ADC21
Machine Learning for .NET Developers - ADC21Machine Learning for .NET Developers - ADC21
Machine Learning for .NET Developers - ADC21Gülden Bilgütay
 
Den Datenschatz heben und Zeit- und Energieeffizienz steigern: Mathematik und...
Den Datenschatz heben und Zeit- und Energieeffizienz steigern: Mathematik und...Den Datenschatz heben und Zeit- und Energieeffizienz steigern: Mathematik und...
Den Datenschatz heben und Zeit- und Energieeffizienz steigern: Mathematik und...Joachim Schlosser
 
Data Science in the Elastic Stack
Data Science in the Elastic StackData Science in the Elastic Stack
Data Science in the Elastic StackRochelle Sonnenberg
 
Deployment Design Patterns - Deploying Machine Learning and Deep Learning Mod...
Deployment Design Patterns - Deploying Machine Learning and Deep Learning Mod...Deployment Design Patterns - Deploying Machine Learning and Deep Learning Mod...
Deployment Design Patterns - Deploying Machine Learning and Deep Learning Mod...All Things Open
 
AllThingsOpen 2018 - Deployment Design Patterns (Dan Zaratsian)
AllThingsOpen 2018 - Deployment Design Patterns (Dan Zaratsian)AllThingsOpen 2018 - Deployment Design Patterns (Dan Zaratsian)
AllThingsOpen 2018 - Deployment Design Patterns (Dan Zaratsian)dtz001
 
Alluxio Monthly Webinar - Accelerate AI Path to Production
Alluxio Monthly Webinar - Accelerate AI Path to ProductionAlluxio Monthly Webinar - Accelerate AI Path to Production
Alluxio Monthly Webinar - Accelerate AI Path to ProductionAlluxio, Inc.
 
Scaling machine learning to millions of users with Apache Beam
Scaling machine learning to millions of users with Apache BeamScaling machine learning to millions of users with Apache Beam
Scaling machine learning to millions of users with Apache BeamTatiana Al-Chueyr
 
The A-Z of Data: Introduction to MLOps
The A-Z of Data: Introduction to MLOpsThe A-Z of Data: Introduction to MLOps
The A-Z of Data: Introduction to MLOpsDataPhoenix
 
( Big ) Data Management - Data Mining and Machine Learning - Global concepts ...
( Big ) Data Management - Data Mining and Machine Learning - Global concepts ...( Big ) Data Management - Data Mining and Machine Learning - Global concepts ...
( Big ) Data Management - Data Mining and Machine Learning - Global concepts ...Nicolas Sarramagna
 
Supply Chain Management
Supply Chain ManagementSupply Chain Management
Supply Chain ManagementKori Bori
 
Key projects Data Science and Engineering
Key projects Data Science and EngineeringKey projects Data Science and Engineering
Key projects Data Science and EngineeringVijayananda Mohire
 

Similar a Continuous Deployment of Machine Learning Pipelines (20)

CD Theory With Lab Component-IMLCIS.doc
CD Theory With Lab Component-IMLCIS.docCD Theory With Lab Component-IMLCIS.doc
CD Theory With Lab Component-IMLCIS.doc
 
Product Engineer Certified Lean Six Sigma Black Belt by IASSC
Product Engineer Certified Lean Six Sigma Black Belt by IASSCProduct Engineer Certified Lean Six Sigma Black Belt by IASSC
Product Engineer Certified Lean Six Sigma Black Belt by IASSC
 
AzureML Welcome to the future of Predictive Analytics
AzureML Welcome to the future of Predictive Analytics AzureML Welcome to the future of Predictive Analytics
AzureML Welcome to the future of Predictive Analytics
 
Introducing MLOps.pdf
Introducing MLOps.pdfIntroducing MLOps.pdf
Introducing MLOps.pdf
 
Deep Learning for Autonomous Driving
Deep Learning for Autonomous DrivingDeep Learning for Autonomous Driving
Deep Learning for Autonomous Driving
 
Driving Enterprise Adoption: Tragedies, Triumphs and Our NEXT
Driving Enterprise Adoption: Tragedies, Triumphs and Our NEXTDriving Enterprise Adoption: Tragedies, Triumphs and Our NEXT
Driving Enterprise Adoption: Tragedies, Triumphs and Our NEXT
 
Deep learning in manufacturing predicting and preventing manufacturing defect...
Deep learning in manufacturing predicting and preventing manufacturing defect...Deep learning in manufacturing predicting and preventing manufacturing defect...
Deep learning in manufacturing predicting and preventing manufacturing defect...
 
PYTHON AND DATA SCIENCE FOR INVESTMENT PROFESSIONALS
PYTHON AND DATA SCIENCE FOR INVESTMENT PROFESSIONALSPYTHON AND DATA SCIENCE FOR INVESTMENT PROFESSIONALS
PYTHON AND DATA SCIENCE FOR INVESTMENT PROFESSIONALS
 
Certified Data Science Specialist Course Preview Dr. Nickholas
Certified Data Science Specialist Course Preview Dr. NickholasCertified Data Science Specialist Course Preview Dr. Nickholas
Certified Data Science Specialist Course Preview Dr. Nickholas
 
Machine Learning for .NET Developers - ADC21
Machine Learning for .NET Developers - ADC21Machine Learning for .NET Developers - ADC21
Machine Learning for .NET Developers - ADC21
 
Den Datenschatz heben und Zeit- und Energieeffizienz steigern: Mathematik und...
Den Datenschatz heben und Zeit- und Energieeffizienz steigern: Mathematik und...Den Datenschatz heben und Zeit- und Energieeffizienz steigern: Mathematik und...
Den Datenschatz heben und Zeit- und Energieeffizienz steigern: Mathematik und...
 
Data Science in the Elastic Stack
Data Science in the Elastic StackData Science in the Elastic Stack
Data Science in the Elastic Stack
 
Deployment Design Patterns - Deploying Machine Learning and Deep Learning Mod...
Deployment Design Patterns - Deploying Machine Learning and Deep Learning Mod...Deployment Design Patterns - Deploying Machine Learning and Deep Learning Mod...
Deployment Design Patterns - Deploying Machine Learning and Deep Learning Mod...
 
AllThingsOpen 2018 - Deployment Design Patterns (Dan Zaratsian)
AllThingsOpen 2018 - Deployment Design Patterns (Dan Zaratsian)AllThingsOpen 2018 - Deployment Design Patterns (Dan Zaratsian)
AllThingsOpen 2018 - Deployment Design Patterns (Dan Zaratsian)
 
Alluxio Monthly Webinar - Accelerate AI Path to Production
Alluxio Monthly Webinar - Accelerate AI Path to ProductionAlluxio Monthly Webinar - Accelerate AI Path to Production
Alluxio Monthly Webinar - Accelerate AI Path to Production
 
Scaling machine learning to millions of users with Apache Beam
Scaling machine learning to millions of users with Apache BeamScaling machine learning to millions of users with Apache Beam
Scaling machine learning to millions of users with Apache Beam
 
The A-Z of Data: Introduction to MLOps
The A-Z of Data: Introduction to MLOpsThe A-Z of Data: Introduction to MLOps
The A-Z of Data: Introduction to MLOps
 
( Big ) Data Management - Data Mining and Machine Learning - Global concepts ...
( Big ) Data Management - Data Mining and Machine Learning - Global concepts ...( Big ) Data Management - Data Mining and Machine Learning - Global concepts ...
( Big ) Data Management - Data Mining and Machine Learning - Global concepts ...
 
Supply Chain Management
Supply Chain ManagementSupply Chain Management
Supply Chain Management
 
Key projects Data Science and Engineering
Key projects Data Science and EngineeringKey projects Data Science and Engineering
Key projects Data Science and Engineering
 

Último

BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceDelhi Call girls
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% SecurePooja Nehwal
 
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...amitlee9823
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Researchmichael115558
 
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...amitlee9823
 
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...karishmasinghjnh
 
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night StandCall Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramMoniSankarHazra
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...SUHANI PANDEY
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...amitlee9823
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Delhi Call girls
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxolyaivanovalion
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...amitlee9823
 
hybrid Seed Production In Chilli & Capsicum.pptx
hybrid Seed Production In Chilli & Capsicum.pptxhybrid Seed Production In Chilli & Capsicum.pptx
hybrid Seed Production In Chilli & Capsicum.pptx9to5mart
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Standamitlee9823
 

Último (20)

BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
 
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men  🔝Bangalore🔝   Esc...
➥🔝 7737669865 🔝▻ Bangalore Call-girls in Women Seeking Men 🔝Bangalore🔝 Esc...
 
CHEAP Call Girls in Rabindra Nagar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Rabindra Nagar  (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Rabindra Nagar  (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Rabindra Nagar (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Discover Why Less is More in B2B Research
Discover Why Less is More in B2B ResearchDiscover Why Less is More in B2B Research
Discover Why Less is More in B2B Research
 
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men  🔝Mathura🔝   Escorts...
➥🔝 7737669865 🔝▻ Mathura Call-girls in Women Seeking Men 🔝Mathura🔝 Escorts...
 
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
👉 Amritsar Call Girl 👉📞 6367187148 👉📞 Just📲 Call Ruhi Call Girl Phone No Amri...
 
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night StandCall Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Attibele ☎ 7737669865 🥵 Book Your One night Stand
 
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night StandCall Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Nandini Layout ☎ 7737669865 🥵 Book Your One night Stand
 
Capstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics ProgramCapstone Project on IBM Data Analytics Program
Capstone Project on IBM Data Analytics Program
 
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
VIP Model Call Girls Hinjewadi ( Pune ) Call ON 8005736733 Starting From 5K t...
 
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
Call Girls Hsr Layout Just Call 👗 7737669865 👗 Top Class Call Girl Service Ba...
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
Predicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science ProjectPredicting Loan Approval: A Data Science Project
Predicting Loan Approval: A Data Science Project
 
Midocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFxMidocean dropshipping via API with DroFx
Midocean dropshipping via API with DroFx
 
Abortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get CytotecAbortion pills in Jeddah | +966572737505 | Get Cytotec
Abortion pills in Jeddah | +966572737505 | Get Cytotec
 
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts ServiceCall Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
Call Girls In Shalimar Bagh ( Delhi) 9953330565 Escorts Service
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
 
hybrid Seed Production In Chilli & Capsicum.pptx
hybrid Seed Production In Chilli & Capsicum.pptxhybrid Seed Production In Chilli & Capsicum.pptx
hybrid Seed Production In Chilli & Capsicum.pptx
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
 

Continuous Deployment of Machine Learning Pipelines

  • 1. 28.03.19 DFKI – DIMA – TU Berlin Database Systems and Information Management Group TU Berlin https://www.dima.tu-berlin.de/ Intelligent Analytics for Massive Data German Research Center for Artificial Intelligence https://www.dfki.de/ 1 Continuous Deployment of Machine Learning Pipelines Behrouz Derakhshan behrouz.derakhshan@dfki.de Alireza Rezaei Mahdiraji alireza.rm@dfki.de Tilmann Rabl rabl@tu-berlin.de Volker Markl volker.markl@tu-berlin.de
  • 2. 28.03.19 DFKI – DIMA – TU Berlin 2 Life Cycle of Machine Learning Applications
  • 3. 28.03.19 DFKI – DIMA – TU Berlin 2 ■ Life cycle of ML application does not end with training Life Cycle of Machine Learning Applications
  • 4. 28.03.19 DFKI – DIMA – TU Berlin 2 ■ Life cycle of ML application does not end with training ■ Models and Pipelines must be deployed to answer prediction queries Life Cycle of Machine Learning Applications
  • 5. 28.03.19 DFKI – DIMA – TU Berlin 2 ■ Life cycle of ML application does not end with training ■ Models and Pipelines must be deployed to answer prediction queries ■ Deployed models and pipelines should be monitored and trained further Life Cycle of Machine Learning Applications
  • 6. 28.03.19 DFKI – DIMA – TU Berlin 2 ■ Life cycle of ML application does not end with training ■ Models and Pipelines must be deployed to answer prediction queries ■ Deployed models and pipelines should be monitored and trained further Life Cycle of Machine Learning Applications Focus of this talk
  • 7. 28.03.19 DFKI – DIMA – TU Berlin 3 2. Deployment Improvement By Online Learning Data Preprocessing Model Training Model 1. Training Data Preprocessing Model Training Model Deployment Platform 3. Query Answering Prediction Query Prediction Training DataTraining Data
  • 8. 28.03.19 DFKI – DIMA – TU Berlin 3 2. Deployment Improvement By Online Learning Data Preprocessing Model Training Model 1. Training Data Preprocessing Model Training Model Deployment Platform 3. Query Answering Prediction Query Prediction Training Data Training DataTraining Data
  • 9. 28.03.19 DFKI – DIMA – TU Berlin 3 2. Deployment Improvement By Online Learning Data Preprocessing Model Training Model 1. Training Data Preprocessing Model Training Model Deployment Platform 3. Query Answering Prediction Query Prediction Training Data Training DataTraining Data 4. Online Learning
  • 10. 28.03.19 DFKI – DIMA – TU Berlin 3 2. Deployment Improvement By Online Learning Data Preprocessing Model Training Model 1. Training Data Preprocessing Model Training Model Deployment Platform 3. Query Answering Prediction Query Prediction Training Data Training DataTraining Data • Efficient and Fast 4. Online Learning
  • 11. 28.03.19 DFKI – DIMA – TU Berlin 3 2. Deployment Improvement By Online Learning Data Preprocessing Model Training Model 1. Training Data Preprocessing Model Training Model Deployment Platform 3. Query Answering Prediction Query Prediction Training Data Training DataTraining Data • Cannot guarantee high-quality models • Efficient and Fast 4. Online Learning
  • 12. 28.03.19 DFKI – DIMA – TU Berlin 4 2. Deployment Improvement By Retraining Data Preprocessing Model Training Model Data Preprocessing Model Training Model Deployment Platform Prediction Query Prediction Training Data Training Data 1. Training 3. Query Answering 4. Online Learning
  • 13. 28.03.19 DFKI – DIMA – TU Berlin 4 2. Deployment Improvement By Retraining Data Preprocessing Model Training Model Data Preprocessing Model Training Model Deployment Platform Prediction Query Prediction New Training Data 5. Gather Training Data Training Data Training Data 1. Training 3. Query Answering 4. Online Learning
  • 14. 28.03.19 DFKI – DIMA – TU Berlin 4 2. Deployment Improvement By Retraining Data Preprocessing Model Training Model Data Preprocessing Model Training Model Deployment Platform Prediction Query Prediction New Training Data 5. Gather Training Data Training Data 6. Retraining Training Data 1. Training 3. Query Answering 4. Online Learning
  • 15. 28.03.19 DFKI – DIMA – TU Berlin 4 2. Deployment Improvement By Retraining Data Preprocessing Model Training Model Data Preprocessing Model Training Model Deployment Platform Prediction Query Prediction New Training Data 5. Gather Training Data Training Data 6. Retraining Historical Training Data 1. Training 3. Query Answering 4. Online Learning
  • 16. 28.03.19 DFKI – DIMA – TU Berlin 4 2. Deployment Improvement By Retraining Data Preprocessing Model Training Model Data Preprocessing Model Training Model Deployment Platform Prediction Query Prediction New Training Data 5. Gather Training Data Training Data 6. Retraining Historical Training Data • Provides high-quality models 1. Training 3. Query Answering 4. Online Learning
  • 17. 28.03.19 DFKI – DIMA – TU Berlin 4 2. Deployment Improvement By Retraining Data Preprocessing Model Training Model Data Preprocessing Model Training Model Deployment Platform Prediction Query Prediction New Training Data 5. Gather Training Data Training Data 6. Retraining Historical Training Data • Provides high-quality models • Time-consuming and resource-intensive • Out-of-core 1. Training 3. Query Answering 4. Online Learning
  • 18. 28.03.19 DFKI – DIMA – TU Berlin 5 Ideal Deployment Platform Can a platform provide the same level of quality as Retraining and perform (almost) as efficiently as Online Learning?
  • 19. 28.03.19 DFKI – DIMA – TU Berlin 6 Continuous Deployment Platform Prediction Query Prediction Training Data Continuous Deployment Platform • Train the model inside the platform • Compute features and cache them • Update data preprocessing statistics • Replace Retraining with Proactive Training Data Preprocessing Model Training Model Historical Training Data Online Learning Proactive Training Cache computed Features Stats
  • 20. 28.03.19 DFKI – DIMA – TU Berlin 7 Historical Training Data Preprocess MaterializeDiscretize Sample Update Raw Data Chunks Preprocessed Features Model Continuous Deployment Platform Scheduled Execution Raw Data
  • 21. 28.03.19 DFKI – DIMA – TU Berlin 8 Historical Training Data Preprocess MaterializeDiscretize Sample Update Raw Data Chunks Preprocessed Features Model Continuous Deployment Platform Scheduled Execution Raw Data Data Preparation Phase
  • 22. 28.03.19 DFKI – DIMA – TU Berlin 9 Historical Training Data Preprocess MaterializeDiscretize Sample Update Raw Data Chunks Preprocessed Features Model Continuous Deployment Platform Scheduled Execution Raw Data Proactive Training Phase
  • 23. 28.03.19 DFKI – DIMA – TU Berlin 10 Historical Training Data Data Preparation Phase Raw Data Raw Data Chunks Discretize ….. t0 t1 t2 t3 t0 t1 t2 t3 tn-3 tn-2 tn-1 tn
  • 24. 28.03.19 DFKI – DIMA – TU Berlin 10 Historical Training Data Data Preparation Phase Raw Data Raw Data Chunks Discretize Preprocess Parser Missing Value Imputer Standard Scaler Feature Hasher ….. t0 t1 t2 t3 t0 t1 t2 t3 tn-3 tn-2 tn-1 tn
  • 25. 28.03.19 DFKI – DIMA – TU Berlin 10 Historical Training Data Data Preparation Phase Raw Data Raw Data Chunks Discretize Preprocess Feature Chunks Parser Missing Value Imputer Standard Scaler Feature Hasher ….. ….. Cache Layer t0 t1 t2 t3 t0 t1 t2 t3 t0 t1 t2 t3 tn-3 tn-2 tn-1 tn
  • 26. 28.03.19 DFKI – DIMA – TU Berlin 10 Historical Training Data Data Preparation Phase Raw Data Raw Data Chunks Discretize Preprocess Feature Chunks Parser Missing Value Imputer Standard Scaler Feature Hasher Stats Stats ….. ….. Cache Layer t0 t1 t2 t3 t0 t1 t2 t3 t0 t1 t2 t3 tn-3 tn-2 tn-1 tn Update and Store Statistics
  • 27. 28.03.19 DFKI – DIMA – TU Berlin 11 Storage Requirement Historical Training Data ….. ….. t0 t1 t2 t3 tn-3 tn-2 tn-1 tn Cache Layer
  • 28. 28.03.19 DFKI – DIMA – TU Berlin 11 Storage Requirement Historical Training Data ….. ….. Removed Cached Features t0 t1 t2 t3 tn-3 tn-2 tn-1 tn Cache Layer
  • 29. 28.03.19 DFKI – DIMA – TU Berlin 12 Historical Training Data ….. ….. Sample Update ModelMaterialize Cache Layer Scheduled Execution Proactive Training Phase
  • 30. 28.03.19 DFKI – DIMA – TU Berlin 12 Historical Training Data ….. ….. Sample Update ModelMaterialize Cache Layer mini-batch SGD Algorithm Input: = training dataset Output: = trained model 1: initialize 2: for = do 3: 4: 5 6: end for 7: return 𝐷 𝑚 𝑚0 𝑖 1…𝑛  𝑠𝑖 = 𝑠𝑎𝑚𝑝𝑙𝑒 𝑓𝑟𝑜𝑚 𝐷 𝑔 = 𝛁𝐽(𝑠𝑖, 𝑚𝑖−1) :     𝑚𝑖 = 𝑚𝑖−1 − 𝜂𝑖−1 𝑔 𝑚𝑛 Scheduled Execution Proactive Training Phase
  • 31. 28.03.19 DFKI – DIMA – TU Berlin 12 Historical Training Data ….. ….. Sample Update ModelMaterialize Cache Layer mini-batch SGD Algorithm Input: = training dataset Output: = trained model 1: initialize 2: for = do 3: 4: 5 6: end for 7: return 𝐷 𝑚 𝑚0 𝑖 1…𝑛  𝑠𝑖 = 𝑠𝑎𝑚𝑝𝑙𝑒 𝑓𝑟𝑜𝑚 𝐷 𝑔 = 𝛁𝐽(𝑠𝑖, 𝑚𝑖−1) :     𝑚𝑖 = 𝑚𝑖−1 − 𝜂𝑖−1 𝑔 𝑚𝑛 Scheduled Execution Proactive Training Phase
  • 32. 28.03.19 DFKI – DIMA – TU Berlin 12 Historical Training Data ….. ….. Sample Update ModelMaterialize Cache Layer mini-batch SGD Algorithm Input: = training dataset Output: = trained model 1: initialize 2: for = do 3: 4: 5 6: end for 7: return 𝐷 𝑚 𝑚0 𝑖 1…𝑛  𝑠𝑖 = 𝑠𝑎𝑚𝑝𝑙𝑒 𝑓𝑟𝑜𝑚 𝐷 𝑔 = 𝛁𝐽(𝑠𝑖, 𝑚𝑖−1) :     𝑚𝑖 = 𝑚𝑖−1 − 𝜂𝑖−1 𝑔 𝑚𝑛 Scheduled Execution Proactive Training Phase
  • 33. 28.03.19 DFKI – DIMA – TU Berlin 12 Historical Training Data ….. ….. Sample Update ModelMaterialize Cache Layer mini-batch SGD Algorithm Input: = training dataset Output: = trained model 1: initialize 2: for = do 3: 4: 5 6: end for 7: return 𝐷 𝑚 𝑚0 𝑖 1…𝑛  𝑠𝑖 = 𝑠𝑎𝑚𝑝𝑙𝑒 𝑓𝑟𝑜𝑚 𝐷 𝑔 = 𝛁𝐽(𝑠𝑖, 𝑚𝑖−1) :     𝑚𝑖 = 𝑚𝑖−1 − 𝜂𝑖−1 𝑔 𝑚𝑛 Scheduled Execution Proactive Training Phase
  • 34. 28.03.19 DFKI – DIMA – TU Berlin 12 Historical Training Data ….. ….. Sample Update ModelMaterialize Cache Layer mini-batch SGD Algorithm Input: = training dataset Output: = trained model 1: initialize 2: for = do 3: 4: 5 6: end for 7: return 𝐷 𝑚 𝑚0 𝑖 1…𝑛  𝑠𝑖 = 𝑠𝑎𝑚𝑝𝑙𝑒 𝑓𝑟𝑜𝑚 𝐷 𝑔 = 𝛁𝐽(𝑠𝑖, 𝑚𝑖−1) :     𝑚𝑖 = 𝑚𝑖−1 − 𝜂𝑖−1 𝑔 𝑚𝑛 Scheduled Execution Proactive Training Phase
  • 35. 28.03.19 DFKI – DIMA – TU Berlin 13 Data Materializing Historical Training Data ….. ….. Sample Update ModelMaterialize Cache Layer
  • 36. 28.03.19 DFKI – DIMA – TU Berlin 13 Data Materializing Historical Training Data ….. ….. Sample Update ModelMaterialize Parser Missing Value Imputer Standard Scaler Feature Hasher Stats Stats Cache Layer Statistics precomputed during the Data Preparation Phase
  • 37. 28.03.19 DFKI – DIMA – TU Berlin 14 Evaluation Datasets Size #Instances Initial Deployment URL 2.1 GB 2.4 M Day 0 Day 1-120 Taxi 42 GB 280 M Jan 15 Feb 15 – Jun 16 Parser Missing Value Imputer Standard Scaler Feature Hasher SVM Model URL Pipeline Parser Feature Extractor Anomaly Detector Standard Scaler Linear Regression Model Taxi Pipeline
  • 38. 28.03.19 DFKI – DIMA – TU Berlin 15 Quality Can Proactive Training provide the same level of quality as Retraining?
  • 39. 28.03.19 DFKI – DIMA – TU Berlin 15 Quality − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − 2.1 2.2 2.3 day1 day30 day60 day90 day120 Misclassification% − − −Proactive Retraining Online Can Proactive Training provide the same level of quality as Retraining? Cumulative Prequential Prediction Error Rate for the URL Pipeline During the Deployment
  • 40. 28.03.19 DFKI – DIMA – TU Berlin 16 Training Time Can Proactive Training perform (almost) as efficiently as Online Learning?
  • 41. 28.03.19 DFKI – DIMA – TU Berlin 16 Training Time − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − 0 250 500 750 day1 day30 day60 day90 day120 Time(m) − − −Proactive Retraining Online Cumulative Training Time for the URL Pipeline During the Deployment Can Proactive Training perform (almost) as efficiently as Online Learning?
  • 42. 28.03.19 DFKI – DIMA – TU Berlin 16 Training Time − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − 0 250 500 750 day1 day30 day60 day90 day120 Time(m) − − −Proactive Retraining Online Cumulative Training Time for the URL Pipeline During the Deployment Can Proactive Training perform (almost) as efficiently as Online Learning? Proactive training provides same level of model accuracy as Retraining, while matching the speed of Online Learning
  • 43. 28.03.19 DFKI – DIMA – TU Berlin 17 Feature Caching and Statistics Computation What are the effects of Statistics Computation and Feature Caching ? 111 91 55 0 40 80 120 No Optimiziation 0% Cached 100% Cached Time(m) Total Training Time in Presence of Statistics Computation and Feature Caching
  • 44. 28.03.19 DFKI – DIMA – TU Berlin 17 Feature Caching and Statistics Computation What are the effects of Statistics Computation and Feature Caching ? 111 91 55 0 40 80 120 No Optimiziation 0% Cached 100% Cached Time(m) Total Training Time in Presence of Statistics Computation and Feature Caching Statistics Computation and Feature Caching improves the performance of Proactive training by a factor of 2
  • 45. 28.03.19 DFKI – DIMA – TU Berlin 18 Summary Continuous Deployment Platform • Proactive Training, instead of Offline Retraining • Feature Caching • Online Statistics Computation • Reduces the total training time • Achieves high quality Historical Training Data Preprocess MaterializeDiscretize Sample Update Raw Data Raw Data Chunks Preprocessed Features Model Schedule Execution 2.23 2.24 2.25 2.26 2.27 0 400 800 Time(m) Misclassification% Proactive Retraining Online
  • 46. 28.03.19 DFKI – DIMA – TU Berlin 19 1. D. Crankshaw, X. Wang, G. Zhou, M. Franklin, et al. 2016. Clipper: A Low-Latency Online Prediction Serving System. arXiv preprint arXiv:1612.03079 (2016). 2. D. Crankshaw, P. Bailis, J. Gonzalez, H. Li, et al. 2014. The missing piece incomplex analytics: Low latency, scalable model management and serving with velox. 3. D. Baylor, E. Breck, H. Cheng, N. Fiedel, et al. 2017. TFX: A TensorFlow-Based Production-Scale Machine Learning Platform. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 1387–1395. 4. L. Bottou. 2010. Large-scale machine learning with stochastic gradient descent. In Proceedings of COMPSTAT’2010. Springer, 177–186. 5. M. Zaharia, M. Chowdhury, M. Franklin, S. Shenker, and I. Stoica. 2010. Spark: cluster computing with working sets. HotCloud 10 (2010), 10–10. 6. O. Chapelle. [n. d.]. NYC Taxi & Lomousine Commision Trip Record Data. http://www.nyc.gov/html/tlc/ html/about/trip_record_data.shtml. [Online;accessed 10-April-2018]. 7. J. Ma, L. Saul, S. Savage, and G. Voelker. 2009. Identifying suspicious URLs: an application of large- scale online learning. In Proceedings of the 26th annual international conference on machine learning. ACM, 681–688. 8. D. Kingma and J. Ba. 2014. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014). 9. M. Zeiler. 2012. ADADELTA: an adaptive learning rate method. arXiv preprint arXiv:1212.5701 (2012). 10. T. Tieleman and G. Hinton. 2012. Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural networks for machine learning 4, 2 (2012), 26–31. References
  • 47. 28.03.19 DFKI – DIMA – TU Berlin 20 Backup Slides
  • 48. 28.03.19 DFKI – DIMA – TU Berlin 21 Preprocess MaterializeDiscretize Sample Update Raw Data Chunks Preprocessed Features Model Data Manager Historical Training Data Raw Data Data Manager • Data Discretizing • Data Sampling • Historical Data Management Scheduled Execution
  • 49. 28.03.19 DFKI – DIMA – TU Berlin 22 Scheduled Execution Historical Training Data Preprocess MaterializeDiscretize Sample Update Raw Data Raw Data Chunks Preprocessed Features Model Pipeline Manager Pipeline Manager • Data Preprocessing • Data Materialization • Pipeline Component Management
  • 50. 28.03.19 DFKI – DIMA – TU Berlin 23 Historical Training Data Preprocess MaterializeDiscretize Sample Update Raw Data Raw Data Chunks Preprocessed Features Model Model Updater Model Updater • Online Training • Proactive Training Scheduled Execution
  • 51. 28.03.19 DFKI – DIMA – TU Berlin 24 Historical Training Data Preprocess MaterializeDiscretize Sample Update Raw Data Raw Data Chunks Preprocessed Features Model Scheduler Scheduler • Schedule Proactive Training Scheduled Execution
  • 52. 28.03.19 DFKI – DIMA – TU Berlin 25 ■ TFX □ Manual Retraining □ No Online Learning ■ Velox □ Automatic Retraining □ Online Learning ■ Clipper □ No Retraining □ No Online Learning □ Ensemble of Models Related Work
  • 53. 28.03.19 DFKI – DIMA – TU Berlin 26 Proactive Training vs Periodical Retraining (Taxi) Cumulative Prequential Prediction Error Rate for the Taxi Pipeline During the Deployment − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − 0.0960 0.0965 0.0970 0.0975 0.0980 Feb15 Jul15 Jan16 June16 RMSLE − − −Proactive Retraining Online
  • 54. 28.03.19 DFKI – DIMA – TU Berlin 27 Proactive Training vs Periodical Retraining (Taxi) Cumulative Training Time for the Taxi Pipeline During the Deployment − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − − 0 500 1000 1500 Feb15 Jul15 Jan16 June16 Time(m) − − −Proactive Retraining Online
  • 55. 28.03.19 DFKI – DIMA – TU Berlin 28 Materialization and Statistics Computation (Taxi) ● ● ● ●300 400 500 600 700 800 0.0 0.2 0.6 1.0 Cached Features Ratio Time(m) ● TimeBased Uniform WindowBased NoOptimization Materialization Utilization Rate for different ratio of Cached Features Taxi Sampling Ratio of Cached Features m = 0.2 m = 0.6 Uniform 0.51 0.90 Window-based 0.57 1.0 Time-based 0.65 0.97 Materialization Utilization Rate: Ratio of preprocessed features that skipped the materialization step
  • 56. 28.03.19 DFKI – DIMA – TU Berlin 29 Materialization and Statistics Computation (URL) Sampling Ratio of Cached Features m = 0.2 m = 0.6 Uniform 0.52 0.91 Window-based 0.58 1.0 Time-based 0.68 0.97 Materialization Utilization Rate for different ratio of Cached Features URL ● ● ● ● 60 70 80 90 100 110 0.0 0.2 0.6 1.0 Cached Features Ratio Time(m) ● TimeBased Uniform WindowBased NoOptimization Materialization Utilization Rate: Ratio of preprocessed features that skipped the materialization step
  • 57. 28.03.19 DFKI – DIMA – TU Berlin 30 Time vs Quality (Taxi) 0.0974 0.0975 0.0976 0 1000 2000 Time(m) RMSLE Proactive Retraining Online
  • 58. 28.03.19 DFKI – DIMA – TU Berlin 31 Adaptation URL Taxi r = 1E-2 r = 1E-3 r = 1E-4 r = 1E-2 r = 1E-3 r = 1E-4 Adam 0.030 0.026 0.035 0.09553 0.09551 0.09551 RMSProp 0.030 0.027 0.034 0.09552 0.09552 0.09550 Adadelta 0.029 0.028 0.034 0.09609 0.09610 0.09619 Effect of Learning Rate Adaption during the Deployment Effect Learning Rate Adaption and Regularization Parameter on Initial Training Tuning − − −Adam Rmsprop Adadelta − − − − − − − − − − − − − − − − − − − − − 2.10 2.15 2.20 2.25 2.30 2.35 day1 day15 day30 URL Misclassification% − − − − − − − − − − − − − − − − − − − − − − − − − − − 0.085 0.090 0.095 Feb15 Apr15 TaxiRMSLE
  • 59. 28.03.19 DFKI – DIMA – TU Berlin 32 Effect of Sampling on model quality Effect of Sampling Method on the Error Rate − − −Timebased Windowbased Uniform − − − − − − − − − − − − − − − − − − − − − 2.10 2.15 2.20 2.25 2.30 day1 day15 day30 URL Misclassification% − − − − − − − − − − − − − − − − − − − − − − − − − − − 0.085 0.090 0.095 Feb15 Apr15 Taxi RMSLE
  • 60. 28.03.19 DFKI – DIMA – TU Berlin 33 Materialization Process
  • 61. 28.03.19 DFKI – DIMA – TU Berlin 34 Ads CTR USE Case Figure