INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION FEATURE STORE SUMMARY DEMO/WORKSHOP
Distributed Deep Learning Using Hopsworks
CGI Trainee Program Workshop
Kim Hammar
kim@logicalclocks.com
Before we start:
1. Register for an account at: www.hops.site
2. Follow the instructions at: http://bit.ly/2EnZQgW
DISTRIBUTED COMPUTING + DEEP LEARNING = ?

[Figure: a distributed computing cluster next to a neural network (inputs x0,1..x0,3, hidden units x1,1..x1,3, biases b0, b1, output ŷ)]

Why combine the two?
We like challenging problems
More productive data science
Unreasonable effectiveness of data1
To achieve state-of-the-art results2

1 Chen Sun et al. “Revisiting Unreasonable Effectiveness of Data in Deep Learning Era”. In: CoRR abs/1707.02968 (2017). arXiv: 1707.02968. URL: http://arxiv.org/abs/1707.02968.
2 Jeffrey Dean et al. “Large Scale Distributed Deep Networks”. In: Advances in Neural Information Processing Systems 25. Ed. by F. Pereira et al. Curran Associates, Inc., 2012, pp. 1223–1231.
DISTRIBUTED DEEP LEARNING (DDL): PREDICTABLE SCALING

3 Jeff Dean. Building Intelligent Systems with Large Scale Deep Learning. https://www.scribd.com/document/355752799/Jeff-Dean-s-Lecture-for-YC-AI. 2018.
DDL IS NOT A SECRET ANYMORE

Frameworks for DDL: TensorflowOnSpark, CaffeOnSpark, Distributed TF
Companies using DDL: [company logos in original slide]

4 Tal Ben-Nun and Torsten Hoefler. “Demystifying Parallel and Distributed Deep Learning: An In-Depth Concurrency Analysis”. In: CoRR abs/1802.09941 (2018). arXiv: 1802.09941. URL: http://arxiv.org/abs/1802.09941.
DDL REQUIRES AN ENTIRE SOFTWARE/INFRASTRUCTURE STACK

Distributed training (executors exchanging gradients) is only one piece. The full stack also includes:
Data Collection
Feature Engineering
Data Validation
Hyperparameter Tuning
Model Serving
Pipeline Management
A/B Testing
Monitoring
Hardware Management
Distributed Systems
OUTLINE

1. Hopsworks: Background of the platform
2. Managed Distributed Deep Learning using HopsYARN, HopsML, PySpark, and Tensorflow
3. Black-Box Optimization (Hyperparameter Tuning) using Hopsworks, Metadata Store, PySpark, and Maggy5
4. Feature Store: data management for machine learning
5. Coffee Break
6. Demo: end-to-end ML pipeline
7. Hands-on Workshop: try out Hopsworks on our cluster in Luleå

5 Moritz Meister and Sina Sheikholeslami. Maggy. https://github.com/logicalclocks/maggy. 2019.
HOPSWORKS

The platform is built up in layers:
HopsFS (distributed file system)
HopsYARN (GPU/CPU as a resource)
Frameworks (ML/Data)
Distributed Metadata (available from a REST API)
ML/AI Assets: Feature Store, Pipelines, Experiments, Models

APIs:

from hops import featurestore
from hops import experiment
features = featurestore.get_features([
    "average_attendance",
    "average_player_age"])
experiment.collective_all_reduce(features, model)
INNER AND OUTER LOOP OF LARGE SCALE DEEP LEARNING

Outer loop: a search method proposes hyperparameters h and receives back a metric τ for each trial.

Inner loop: worker1, worker2, ..., workerN each train a replica of the model on their own shard of the data (1..N) and synchronize with the other workers.

[Figure: N replicated networks, one per worker, reading their data partitions and exchanging updates in a synchronization step]
INNER LOOP: DISTRIBUTED DEEP LEARNING

[Figure: features (x1, ..., xn) are fed to a model θ, which produces a prediction ŷ; the loss L(y, ŷ) yields the gradient ∇θL(y, ŷ) used to update θ]

In the distributed setting, each executor e1..e4 computes a gradient on its own data partition p1..p4 before the gradients are combined.
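The inner loop above can be sketched in plain Python: each worker computes a gradient on its own data partition, and a synchronization step averages the gradients before the shared parameters are updated. This is a stand-in for a collective all-reduce; the 1-D linear model and the data here are illustrative, not part of Hopsworks.

```python
# Data-parallel SGD sketch: one parameter w of a linear model y = w * x
# with squared loss, trained across N simulated workers.

def partition(data, n_workers):
    """Shard the dataset across workers (worker i gets every i-th sample)."""
    return [data[i::n_workers] for i in range(n_workers)]

def local_gradient(w, shard):
    """Gradient of the mean squared loss over one worker's shard."""
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)

def all_reduce_mean(grads):
    """Synchronization step: average the per-worker gradients."""
    return sum(grads) / len(grads)

def train(data, n_workers=4, lr=0.01, steps=200):
    shards = partition(data, n_workers)
    w = 0.0
    for _ in range(steps):
        grads = [local_gradient(w, s) for s in shards]  # parallel in reality
        w -= lr * all_reduce_mean(grads)                # shared update
    return w

# Data generated from y = 3x, so training should recover w close to 3.
data = [(x, 3.0 * x) for x in range(1, 9)]
w = train(data)
```

Because every worker sees the averaged gradient, the result is the same update a single machine would compute over the full dataset, which is why data-parallel training converges to the same solution.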
DISTRIBUTED DEEP LEARNING IN PRACTICE

Implementation of distributed algorithms is becoming a commodity (TF, PyTorch, etc.)

The hardest parts of DDL are now:
Cluster management
Allocating GPUs
Data management
Operations & performance
HOPSWORKS DDL SOLUTION

from hops import experiment
experiment.collective_all_reduce(train_fn)

Behind this call, the client API sends resource requests to the HopsYARN RM, which allocates YARN containers with GPUs as a resource. Each container runs a Spark executor inside its own conda env; the Spark driver runs in a container of its own. The executors report their IPs to each other (e.g. 192.168.1.1 to 192.168.1.4), exchange gradients during training, and read and write data through the Hops Distributed File System (HopsFS).

Hide complexity behind a simple API
Allocate resources using PySpark
Allocate GPUs for Spark executors using HopsYARN
Serve sharded training data to workers from HopsFS
Use HopsFS for aggregating logs, checkpoints and results
Store experiment metadata in the metastore
Use dynamic allocation for interactive resource management
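The train_fn handed to the experiment API is just a Python function that trains a model and returns a metric. A minimal local sketch is below; the toy model and data are made up, and here we simply call the function directly, whereas on a Hopsworks cluster the same function would be passed to experiment.collective_all_reduce, which runs it on every executor.

```python
def train_fn():
    """Train a toy 1-D linear model and return the final training loss.

    On Hopsworks this function would run inside each Spark executor;
    here we just call it locally for illustration.
    """
    data = [(x, 2.0 * x) for x in range(1, 6)]  # synthetic data, y = 2x
    w, lr = 0.0, 0.02
    for _ in range(300):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    # Mean squared loss after training; should be close to zero.
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

final_loss = train_fn()
# On a Hopsworks cluster you would instead write:
# from hops import experiment
# experiment.collective_all_reduce(train_fn)
```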
OUTER LOOP: BLACK BOX OPTIMIZATION

[Figure: the outer loop wraps the inner loop; a search method proposes hyperparameters h and receives a metric τ from each distributed training run]

Example use-case from one of our clients:
Goal: Train a One-Class GAN model for fraud detection
Problem: GANs are extremely sensitive to hyperparameters, and there exists a very large space of possible hyperparameters
Example hyperparameters to tune: learning rates η, optimizers, layers, etc.

[Figure: GAN architecture; random noise z feeds a generator, and the discriminator sees both real input x and generated samples]

[Figure: a 3D search space over learning rate (0.00 to 0.10), number of layers (2 to 12), and neurons per layer (25 to 45); a shared task queue feeds configurations η1, .., η5 to parallel workers w1..w4]

Parallelizing the search raises several questions:
Which algorithm to use for search?
How to monitor progress?
How to aggregate results?
Fault tolerance?

This should be managed with platform support!
PARALLEL EXPERIMENTS
from hops import experiment
experiment.random_search(train_fn)
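Under the hood, random search simply samples configurations from the search space, evaluates one trial per configuration, and keeps the best result. A self-contained sketch with a made-up objective function follows; the search space mirrors the slide above, while experiment.random_search distributes the trials over Spark executors instead of running this loop sequentially.

```python
import random

random.seed(0)  # reproducible trials for the example

SEARCH_SPACE = {
    "lr": (0.0, 0.1),            # continuous range
    "num_layers": range(2, 13),  # discrete choices
    "neurons": range(25, 46),
}

def sample_hparams():
    """Draw one configuration uniformly from the search space."""
    return {
        "lr": random.uniform(*SEARCH_SPACE["lr"]),
        "num_layers": random.choice(SEARCH_SPACE["num_layers"]),
        "neurons": random.choice(SEARCH_SPACE["neurons"]),
    }

def run_trial(h):
    """Stand-in for training a model; lower is better. The pretend
    optimum is lr=0.01 with 6 layers of 35 neurons."""
    return ((h["lr"] - 0.01) ** 2
            + 0.001 * (h["num_layers"] - 6) ** 2
            + 0.0001 * (h["neurons"] - 35) ** 2)

def random_search(n_trials=50):
    trials = [(run_trial(h), h)
              for h in (sample_hparams() for _ in range(n_trials))]
    return min(trials, key=lambda t: t[0])

best_metric, best_hparams = random_search()
```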
ASYNCHRONOUS SEARCH WORKFLOW

[Figure: workers run trials, each mapping suggested hyperparameters λ to a metric α, and send heartbeats to a coordinator; the coordinator keeps a global task queue, tracks trial progress (accuracy over epochs for configurations such as lr=0.0021, layers=5 or lr=0.01, layers=2), and runs black-box optimizers solving min_x f(x), x ∈ S; it sends suggested tasks, collects results, issues early-stop signals, and persists checkpoints]
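The coordinator's early-stop decision can be sketched with a median stopping rule: a running trial is killed at the first epoch where its metric falls below the median of what finished trials had reached at the same epoch. The learning curves below are synthetic, and this sequential function is only a stand-in for the heartbeat-driven exchange between executors and the driver described above.

```python
import statistics

def median_stopping_rule(curve, history, warmup=2):
    """Return the epoch at which `curve` would be early-stopped, or None.

    curve:   per-epoch accuracies of the running trial
    history: per-epoch accuracy curves of finished trials
    warmup:  never stop during the first `warmup` epochs
    """
    for epoch, acc in enumerate(curve):
        if epoch < warmup or not history:
            continue
        median = statistics.median(h[epoch] for h in history)
        if acc < median:
            return epoch  # coordinator would send an early-stop signal
    return None

# Synthetic learning curves: two healthy finished trials and two candidates.
finished = [
    [0.2, 0.4, 0.6, 0.7, 0.8],
    [0.3, 0.5, 0.6, 0.8, 0.85],
]
poor_trial = [0.1, 0.2, 0.25, 0.3, 0.3]
good_trial = [0.35, 0.5, 0.65, 0.8, 0.9]

stopped_at = median_stopping_rule(poor_trial, finished)  # stopped early
kept = median_stopping_rule(good_trial, finished)        # runs to completion
```

Early stopping like this is what makes asynchronous search pay off: workers stop wasting GPU time on hopeless configurations and pull the next suggestion from the task queue instead.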
INNER AND OUTER LOOP OF LARGE SCALE DEEP LEARNING

[Recap figure: the outer loop (search method, hyperparameters h, metric τ) wrapped around the inner distributed training loop across workers 1..N]
FEATURE STORE

[Figure: a feature transformation ϕ(x) turns raw data matrices (x1,1 .. xn,n) into feature vectors (y1, ..., yn) consumed by a model producing ŷ; without a feature store, where this transformation lives is an open question]

“Data is the hardest part of ML and the most important piece to get right. Modelers spend most of their time selecting and transforming features at training time and then building the pipelines to deliver those features to production models.” - Uber6

6 Jeremy Hermann and Mike Del Balso. Scaling Machine Learning at Uber with Michelangelo. https://eng.uber.com/scaling-michelangelo/. 2018.
WHAT IS A FEATURE?

A feature is a measurable property of some data sample. A feature could be:
An aggregate value (min, max, mean, sum)
A raw value (a pixel, a word from a piece of text)
A value from a database table (the age of a customer)
A derived representation: e.g. an embedding or a cluster

Features are the fuel for AI systems:

[Figure: features (x1, ..., xn) feed a model θ whose prediction ŷ is scored by the loss L(y, ŷ), producing the gradient ∇θL(y, ŷ)]
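The feature types listed above are easy to illustrate in plain Python: aggregate features are reductions over a column, and a raw-value feature is a column passed through unchanged. The data and feature names here are made up for the example.

```python
# Raw data samples: (customer_age, purchase_amount) pairs, illustrative only.
transactions = [(34, 120.0), (45, 80.0), (29, 200.0), (51, 40.0)]

amounts = [amount for _, amount in transactions]

features = {
    # Aggregate features: reductions over the purchase-amount column.
    "min_purchase": min(amounts),
    "max_purchase": max(amounts),
    "mean_purchase": sum(amounts) / len(amounts),
    "sum_purchase": sum(amounts),
    # Raw-value feature: a column taken as-is (the age of a customer).
    "customer_ages": [age for age, _ in transactions],
}
```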
FEATURE ENGINEERING IS CRUCIAL FOR MODEL PERFORMANCE

[Figure: points of two classes that overlap along a single feature x1 become linearly separable once a second, engineered feature x2 is added]
DISENTANGLE YOUR ML PIPELINES WITH A FEATURE STORE

Feature Store: a data management platform for machine learning, and the interface between data engineering and data science.

Models are trained using sets of features. The features are fetched from the feature store and can overlap between models.

[Figure: data sources feed the feature store, which serves Dataset 1, Dataset 2, ..., Dataset n to downstream models (neural networks, decision trees, regression models)]

Beyond serving features, the feature store also provides:
Backfilling
Analysis
Versioning
Documentation
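The sharing and versioning idea can be mimicked with a toy in-memory registry: feature columns are registered once under a name and version, and every model fetches (and can share) them by name. The class and data below are illustrative only, not the Hopsworks featurestore API.

```python
class ToyFeatureStore:
    """Minimal in-memory feature store: named, versioned feature columns."""

    def __init__(self):
        self._features = {}  # (name, version) -> list of values

    def register(self, name, values, version=1):
        """Data engineering side: publish a feature column once."""
        self._features[(name, version)] = values

    def get_features(self, names, version=1):
        """Data science side: fetch several feature columns by name."""
        return {n: self._features[(n, version)] for n in names}

fs = ToyFeatureStore()
fs.register("average_attendance", [30000, 25000, 42000])
fs.register("average_player_age", [24.1, 26.3, 23.8])

# Different models can fetch overlapping feature sets from the same store:
training_data = fs.get_features(["average_attendance", "average_player_age"])
```

Versioning each column is what makes backfilling and reproducible training possible: a model can pin the exact feature version it was trained on.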
SUMMARY

Deep Learning is going distributed
Algorithms for DDL are available in several frameworks
Applying DDL in practice brings a lot of operational complexity
Hopsworks is a platform for scale-out deep learning and big data processing
Hopsworks makes DDL simpler by providing simple abstractions for distributed training, parallel experiments, and much more
@hopshadoop
www.hops.io
@logicalclocks
www.logicalclocks.com
We are open source:
https://github.com/logicalclocks/hopsworks
https://github.com/hopshadoop/hops
Thanks to Logical Clocks Team: Jim Dowling, Seif Haridi, Theo Kakantousis, Fabio Buso,
Gautier Berthou, Ermias Gebremeskel, Mahmoud Ismail, Salman Niazi, Antonios Kouzoupis, Robin Andersson,
Alex Ormenisan, Rasmus Toivonen and Steffen Grohsschmiedt.
DEMO SETTING

[Figure: raw/structured data in the data lake flows through feature computation into the feature store as curated features, which feed the model]
Hands-on Workshop

1. If you haven’t registered, do it now on hops.site
2. Cheatsheet: http://snurran.sics.se/hops/kim/workshop_cheat.txt
EXERCISE 1 (HELLO HOPSWORKS)

1. Create a Deep Learning Tour Project on Hopsworks
2. Start a Jupyter Notebook with the config:
“Experiment” Mode
1 GPU
4000 (MB) memory for the driver (appmaster)
8000 (MB) memory for the executor
The rest can be left at the defaults
3. Create a new “PySpark” notebook
4. In the first cell, write:
print("Hello Hopsworks")
5. Execute the cell (Ctrl + <Enter>)
EXERCISE 2 (DISTRIBUTED HELLO HOPSWORKS WITH
GPU)
1. Add a new cell with the contents:

def executor():
    print("Hello from GPU")

2. Add a new cell with the contents:

from hops import experiment
experiment.launch(executor)

3. Execute the two cells in order (Ctrl + <Enter>)
4. Go to the Application UI
[Screenshots: the Application UI for the distributed Hello Hopsworks experiment]
EXERCISE 3 (LOAD MNIST FROM HOPSFS)
1. Add a new cell with the contents:

from hops import hdfs
import tensorflow as tf

def create_tf_dataset():
    train_files = [hdfs.project_path() +
                   "TestJob/data/mnist/train/train.tfrecords"]
    dataset = tf.data.TFRecordDataset(train_files)
    def decode(example):
        example = tf.parse_single_example(example, {
            'image_raw': tf.FixedLenFeature([], tf.string),
            'label': tf.FixedLenFeature([], tf.int64)})
        image = tf.reshape(tf.decode_raw(example['image_raw'],
                                         tf.uint8), (28, 28, 1))
        label = tf.one_hot(tf.cast(example['label'], tf.int32), 10)
        return image, label
    return dataset.map(decode).batch(128).repeat()
EXERCISE 3 (LOAD MNIST FROM HOPSFS)
1. Add a new cell with the contents:
create_tf_dataset()
2. Execute the two cells in order (Ctrl + <Enter>)
EXERCISE 4 (DEFINE CNN MODEL)
from tensorflow import keras

def create_model():
    model = keras.Sequential()
    model.add(keras.layers.Conv2D(filters=32, kernel_size=3, padding='same',
                                  activation='relu', input_shape=(28, 28, 1)))
    model.add(keras.layers.BatchNormalization())
    model.add(keras.layers.MaxPooling2D(pool_size=2))
    model.add(keras.layers.Dropout(0.3))
    model.add(keras.layers.Conv2D(filters=64, kernel_size=3,
                                  padding='same', activation='relu'))
    model.add(keras.layers.BatchNormalization())
    model.add(keras.layers.MaxPooling2D(pool_size=2))
    model.add(keras.layers.Dropout(0.3))
    model.add(keras.layers.Flatten())
    model.add(keras.layers.Dense(128, activation='relu'))
    model.add(keras.layers.Dropout(0.5))
    model.add(keras.layers.Dense(10, activation='softmax'))
    return model
EXERCISE 4 (DEFINE CNN MODEL)
1. Add a new cell with the contents:
create_model().summary()
2. Execute the two cells in order (Ctrl + <Enter>)
EXERCISE 5 (DEFINE & RUN THE EXPERIMENT)
1. Add a new cell with the contents:
from hops import tensorboard
from tensorflow.python.keras.callbacks import TensorBoard

def train_fn():
    dataset = create_tf_dataset()
    model = create_model()
    model.compile(loss=keras.losses.categorical_crossentropy,
                  optimizer=keras.optimizers.Adam(), metrics=['accuracy'])
    tb_callback = TensorBoard(log_dir=tensorboard.logdir())
    model_ckpt_callback = keras.callbacks.ModelCheckpoint(
        tensorboard.logdir(), monitor='acc')
    history = model.fit(dataset, epochs=50, steps_per_epoch=80,
                        callbacks=[tb_callback, model_ckpt_callback])
    return history.history["acc"][-1]
EXERCISE 5 (DEFINE & RUN THE EXPERIMENT)
1. Add a new cell with the contents:
experiment.launch(train_fn)
2. Execute the two cells in order (Ctrl + <Enter>)
3. Go to the Application UI and monitor the training progress
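The launcher pattern behind experiment.launch can be sketched in plain Python: it wraps a training function, runs it, and records the metric the function returns. A minimal single-process sketch follows; the names here are hypothetical, and the real hops experiment.launch runs train_fn inside Spark executors and aggregates logs and checkpoints in HopsFS rather than running locally.

```python
# Minimal single-process sketch of the experiment-launcher pattern.
# Illustration only: names are hypothetical, not the hops API.

def launch(train_fn, name="no-name"):
    # Run the wrapped training function and collect its returned metric.
    metric = train_fn()
    return {"experiment": name, "metric": metric}

# Stand-in training function that returns the final "accuracy", mirroring
# the `history.history["acc"][-1]` return value of train_fn above.
def dummy_train_fn():
    history = {"acc": [0.81, 0.92, 0.95]}
    return history["acc"][-1]

result = launch(dummy_train_fn, name="mnist-cnn")
print(result)  # {'experiment': 'mnist-cnn', 'metric': 0.95}
```

Returning a single scalar from the training function is what lets the platform later compare runs in the hyperparameter-tuning (outer-loop) setting discussed earlier.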
REFERENCES
Example notebooks: https://github.com/logicalclocks/hops-examples
HopsML8
Hopsworks9
Hopsworks' feature store10
Maggy: https://github.com/logicalclocks/maggy
8 Logical Clocks AB. HopsML: Python-First ML Pipelines. https://hops.readthedocs.io/en/latest/hopsml/hopsML.html. 2018.
9 Jim Dowling. Introducing Hopsworks. https://www.logicalclocks.com/introducing-hopsworks/. 2018.
10 Kim Hammar and Jim Dowling. Feature Store: the missing data layer in ML pipelines? https://www.logicalclocks.com/feature-store/. 2018.

CGI trainees workshop Distributed Deep Learning, 24/5 2019, Kim Hammar

  • 1. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION FEATURE STORE SUMMARY DEMO/WORKSHOP Distributed Deep Learning Using Hopsworks CGI Trainee Program Workshop Kim Hammar kim@logicalclocks.com
  • 2. Before we start.. 1. Register for an account at: www.hops.site 2. Follow the instructions at: http://bit.ly/2EnZQgW
  • 3. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION FEATURE STORE SUMMARY DEMO/WORKSHOP DISTRIBUTED COMPUTING + DEEP LEARNING = ? b0 x0,1 x0,2 x0,3 b1 x1,1 x1,2 x1,3 ˆy Distributed Computing Deep Learning Why Combine the two? 2em11 Chen Sun et al. “Revisiting Unreasonable Effectiveness of Data in Deep Learning Era”. In: CoRR abs/1707.02968 (2017). arXiv: 1707.02968. URL: http://arxiv.org/abs/1707.02968. 2em12 Jeffrey Dean et al. “Large Scale Distributed Deep Networks”. In: Advances in Neural Information Processing Systems 25. Ed. by F. Pereira et al. Curran Associates, Inc., 2012, pp. 1223–1231.
  • 4. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION FEATURE STORE SUMMARY DEMO/WORKSHOP DISTRIBUTED COMPUTING + DEEP LEARNING = ? b0 x0,1 x0,2 x0,3 b1 x1,1 x1,2 x1,3 ˆy Distributed Computing Deep Learning Why Combine the two? We like challenging problems 2em11 Chen Sun et al. “Revisiting Unreasonable Effectiveness of Data in Deep Learning Era”. In: CoRR abs/1707.02968 (2017). arXiv: 1707.02968. URL: http://arxiv.org/abs/1707.02968. 2em12 Jeffrey Dean et al. “Large Scale Distributed Deep Networks”. In: Advances in Neural Information Processing Systems 25. Ed. by F. Pereira et al. Curran Associates, Inc., 2012, pp. 1223–1231.
  • 5. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION FEATURE STORE SUMMARY DEMO/WORKSHOP DISTRIBUTED COMPUTING + DEEP LEARNING = ? b0 x0,1 x0,2 x0,3 b1 x1,1 x1,2 x1,3 ˆy Distributed Computing Deep Learning Why Combine the two? We like challenging problems More productive data science Unreasonable effectiveness of data1 To achieve state-of-the-art results2 2em11 Chen Sun et al. “Revisiting Unreasonable Effectiveness of Data in Deep Learning Era”. In: CoRR abs/1707.02968 (2017). arXiv: 1707.02968. URL: http://arxiv.org/abs/1707.02968. 2em12 Jeffrey Dean et al. “Large Scale Distributed Deep Networks”. In: Advances in Neural Information Processing Systems 25. Ed. by F. Pereira et al. Curran Associates, Inc., 2012, pp. 1223–1231.
  • 6. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION FEATURE STORE SUMMARY DEMO/WORKSHOP DISTRIBUTED DEEP LEARNING (DDL): PREDICTABLE SCALING 3 2em13 Jeff Dean. Building Intelligent Systems withLarge Scale Deep Learning. https : / / www . scribd . com / document/355752799/Jeff-Dean-s-Lecture-for-YC-AI. 2018.
  • 7. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION FEATURE STORE SUMMARY DEMO/WORKSHOP DISTRIBUTED DEEP LEARNING (DDL): PREDICTABLE SCALING
  • 8. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION FEATURE STORE SUMMARY DEMO/WORKSHOP DDL IS NOT A SECRET ANYMORE 4 2em14 Tal Ben-Nun and Torsten Hoefler. “Demystifying Parallel and Distributed Deep Learning: An In-Depth Concurrency Analysis”. In: CoRR abs/1802.09941 (2018). arXiv: 1802.09941. URL: http://arxiv.org/abs/ 1802.09941.
  • 9. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION FEATURE STORE SUMMARY DEMO/WORKSHOP DDL IS NOT A SECRET ANYMORE TensorflowOnSpark CaffeOnSpark Distributed TF Frameworks for DDL Companies using DDL
  • 10. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION FEATURE STORE SUMMARY DEMO/WORKSHOP DDL REQUIRES AN ENTIRE SOFTWARE/INFRASTRUCTURE STACK e1 e2 e3 Distributed Training e4 Gradient Gradient Gradient Gradient Distributed Systems Data Validation Feature Engineering Data Collection Hardware Management HyperParameter Tuning Model Serving Pipeline Management A/B Testing Monitoring
  • 11. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION FEATURE STORE SUMMARY DEMO/WORKSHOP OUTLINE 1. Hopsworks: Background of the platform 2. Managed Distributed Deep Learning using HopsYARN, HopsML, PySpark, and Tensorflow 3. Black-Box Optimization (Hyperparameter Tuning) using Hopsworks, Metadata Store, PySpark, and Maggy5 4. Feature Store data management for machine learning 5. Coffee Break 6. Demo, end-to-end ML pipeline 7. Hands-on Workshop, try out Hopsworks on our cluster in Luleå 2em15 Moritz Meister and Sina Sheikholeslami. Maggy. https://github.com/logicalclocks/maggy. 2019.
  • 12. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION FEATURE STORE SUMMARY DEMO/WORKSHOP HOPSWORKS
  • 13. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION FEATURE STORE SUMMARY DEMO/WORKSHOP HOPSWORKS HopsFS
  • 14. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION FEATURE STORE SUMMARY DEMO/WORKSHOP HOPSWORKS HopsFS HopsYARN (GPU/CPU as a resource)
  • 15. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION FEATURE STORE SUMMARY DEMO/WORKSHOP HOPSWORKS HopsFS HopsYARN (GPU/CPU as a resource) Frameworks (ML/Data)
  • 16. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION FEATURE STORE SUMMARY DEMO/WORKSHOP HOPSWORKS HopsFS HopsYARN (GPU/CPU as a resource) Frameworks (ML/Data) Feature Store Pipelines Experiments Models b0 x0,1 x0,2 x0,3 b1 x1,1 x1,2 x1,3 ˆy ML/AI Assets
  • 17. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION FEATURE STORE SUMMARY DEMO/WORKSHOP HOPSWORKS HopsFS HopsYARN (GPU/CPU as a resource) Frameworks (ML/Data) Feature Store Pipelines Experiments Models b0 x0,1 x0,2 x0,3 b1 x1,1 x1,2 x1,3 ˆy ML/AI Assets from hops import featurestore from hops import experiment featurestore.get_features([ "average_attendance", "average_player_age"]) experiment.collective_all_reduce(features , model) APIs
  • 18. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION FEATURE STORE SUMMARY DEMO/WORKSHOP HOPSWORKS HopsFS HopsYARN (GPU/CPU as a resource) Frameworks (ML/Data) Distributed Metadata (Available from REST API) Feature Store Pipelines Experiments Models b0 x0,1 x0,2 x0,3 b1 x1,1 x1,2 x1,3 ˆy ML/AI Assets from hops import featurestore from hops import experiment featurestore.get_features([ "average_attendance", "average_player_age"]) experiment.collective_all_reduce(features , model) APIs
  • 19. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION FEATURE STORE SUMMARY DEMO/WORKSHOP INNER AND OUTER LOOP OF LARGE SCALE DEEP LEARNING Inner loop b0 x0,1 x0,2 x0,3 b1 x1,1 x1,2 x1,3 b1 x1,1 x1,2 x1,3 b1 x1,1 x1,2 x1,3 b1 x1,1 x1,2 x1,3 ˆy b0 x0,1 x0,2 x0,3 b1 x1,1 x1,2 x1,3 b1 x1,1 x1,2 x1,3 b1 x1,1 x1,2 x1,3 b1 x1,1 x1,2 x1,3 ˆy b0 x0,1 x0,2 x0,3 b1 x1,1 x1,2 x1,3 b1 x1,1 x1,2 x1,3 b1 x1,1 x1,2 x1,3 b1 x1,1 x1,2 x1,3 ˆy . . . worker1 worker2 workerN Data Synchronization 1 2 N
  • 20. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION FEATURE STORE SUMMARY DEMO/WORKSHOP INNER AND OUTER LOOP OF LARGE SCALE DEEP LEARNING Outer loop Metric τ Search Method hparams h Inner loop b0 x0,1 x0,2 x0,3 b1 x1,1 x1,2 x1,3 b1 x1,1 x1,2 x1,3 b1 x1,1 x1,2 x1,3 b1 x1,1 x1,2 x1,3 ˆy b0 x0,1 x0,2 x0,3 b1 x1,1 x1,2 x1,3 b1 x1,1 x1,2 x1,3 b1 x1,1 x1,2 x1,3 b1 x1,1 x1,2 x1,3 ˆy b0 x0,1 x0,2 x0,3 b1 x1,1 x1,2 x1,3 b1 x1,1 x1,2 x1,3 b1 x1,1 x1,2 x1,3 b1 x1,1 x1,2 x1,3 ˆy . . . worker1 worker2 workerN Data Synchronization 1 2 N
  • 21. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION FEATURE STORE SUMMARY DEMO/WORKSHOP INNER AND OUTER LOOP OF LARGE SCALE DEEP LEARNING Outer loop Metric τ Search Method hparams h Inner loop b0 x0,1 x0,2 x0,3 b1 x1,1 x1,2 x1,3 b1 x1,1 x1,2 x1,3 b1 x1,1 x1,2 x1,3 b1 x1,1 x1,2 x1,3 ˆy b0 x0,1 x0,2 x0,3 b1 x1,1 x1,2 x1,3 b1 x1,1 x1,2 x1,3 b1 x1,1 x1,2 x1,3 b1 x1,1 x1,2 x1,3 ˆy b0 x0,1 x0,2 x0,3 b1 x1,1 x1,2 x1,3 b1 x1,1 x1,2 x1,3 b1 x1,1 x1,2 x1,3 b1 x1,1 x1,2 x1,3 ˆy . . . worker1 worker2 workerN Data Synchronization 1 2 N
  • 22. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION FEATURE STORE SUMMARY DEMO/WORKSHOP INNER LOOP: DISTRIBUTED DEEP LEARNING     x1 ... xn     Features b0 x0,1 x0,2 x0,3 b1 x1,1 x1,2 x1,3 ˆy Model θ ˆy Prediction L(y, ˆy) Loss Gradient θL(y, ˆy)
  • 23. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION FEATURE STORE SUMMARY DEMO/WORKSHOP INNER LOOP: DISTRIBUTED DEEP LEARNING e1 e2 e3 e4 p1 p2 p3 p4 Gradient Gradient Gradient Gradient Data Partition Data Partition Data Partition Data Partition
  • 24. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION FEATURE STORE SUMMARY DEMO/WORKSHOP DISTRIBUTED DEEP LEARNING IN PRACTICE Implementation of distributed algorithms is becoming a commodity (TF, PyTorch etc) The hardest part of DDL is now: Cluster management Allocating GPUs Data management Operations & performance ? Models GPUs Data Distribution b0 x0,1 x0,2 x0,3 b1 x1,1 x1,2 x1,3 ˆy
  • 25. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION FEATURE STORE SUMMARY DEMO/WORKSHOP HOPSWORKS DDL SOLUTION
  • 26. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION FEATURE STORE SUMMARY DEMO/WORKSHOP HOPSWORKS DDL SOLUTION from hops import experiment experiment.collective_all_reduce(train_fn)
  • 27. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION FEATURE STORE SUMMARY DEMO/WORKSHOP HOPSWORKS DDL SOLUTION from hops import experiment experiment. collective_all_reduce( train_fn ) HopsYARN RM YARN container GPU as a resource YARN container GPU as a resource YARN container GPU as a resource YARN container GPU as a resource Resource requests Client API YARN container
  • 28. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION FEATURE STORE SUMMARY DEMO/WORKSHOP HOPSWORKS DDL SOLUTION from hops import experiment experiment. collective_all_reduce( train_fn ) HopsYARN RM YARN container GPU as a resource Spark executor YARN container GPU as a resource Spark executor YARN container GPU as a resource Spark executor YARN container GPU as a resource Spark executor Resource requests Client API YARN container Spark driver
  • 29. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION FEATURE STORE SUMMARY DEMO/WORKSHOP HOPSWORKS DDL SOLUTION from hops import experiment experiment. collective_all_reduce( train_fn ) HopsYARN RM YARN container GPU as a resource conda env Spark executor YARN container GPU as a resource conda env Spark executor YARN container GPU as a resource conda env Spark executor YARN container GPU as a resource conda env Spark executor Resource requests Client API YARN container conda env Spark driver
  • 30. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION FEATURE STORE SUMMARY DEMO/WORKSHOP HOPSWORKS DDL SOLUTION from hops import experiment experiment. collective_all_reduce( train_fn ) HopsYARN RM YARN container GPU as a resource conda env Spark executor YARN container GPU as a resource conda env Spark executor YARN container GPU as a resource conda env Spark executor YARN container GPU as a resource conda env Spark executor Resource requests Client API Here is my ip: 192.168.1.1 Here is my ip: 192.168.1.2 Here is my ip: 192.168.1.3 Here is my ip: 192.168.1.4 YARN container conda env Spark driver
  • 31. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION FEATURE STORE SUMMARY DEMO/WORKSHOP HOPSWORKS DDL SOLUTION from hops import experiment experiment. collective_all_reduce( train_fn ) HopsYARN RM YARN container GPU as a resource conda env Spark executor YARN container GPU as a resource conda env Spark executor YARN container GPU as a resource conda env Spark executor YARN container GPU as a resource conda env Spark executor Resource requests Client API Gradient Gradient GradientGradient YARN container conda env Spark driver Hops Distributed File System (HopsFS)
  • 32. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION FEATURE STORE SUMMARY DEMO/WORKSHOP HOPSWORKS DDL SOLUTION from hops import experiment experiment. collective_all_reduce( train_fn ) HopsYARN RM YARN container GPU as a resource conda env Spark executor YARN container GPU as a resource conda env Spark executor YARN container GPU as a resource conda env Spark executor YARN container GPU as a resource conda env Spark executor Resource requests Client API Gradient Gradient GradientGradient YARN container conda env Spark driver Hops Distributed File System (HopsFS) Hide complexity behind simple API Allocate resources using pyspark Allocate GPUs for spark executors using HopsYARN Serve sharded training data to workers from HopsFS Use HopsFS for aggregating logs, checkpoints and results Store experiment metadata in metastore Use dynamic allocation for interactive resource management
  • 33. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION FEATURE STORE SUMMARY DEMO/WORKSHOP OUTER LOOP: BLACK BOX OPTIMIZATION Outer loop Metric τ Search Method hparams h Inner loop b0 x0,1 x0,2 x0,3 b1 x1,1 x1,2 x1,3 b1 x1,1 x1,2 x1,3 b1 x1,1 x1,2 x1,3 b1 x1,1 x1,2 x1,3 ˆy b0 x0,1 x0,2 x0,3 b1 x1,1 x1,2 x1,3 b1 x1,1 x1,2 x1,3 b1 x1,1 x1,2 x1,3 b1 x1,1 x1,2 x1,3 ˆy b0 x0,1 x0,2 x0,3 b1 x1,1 x1,2 x1,3 b1 x1,1 x1,2 x1,3 b1 x1,1 x1,2 x1,3 b1 x1,1 x1,2 x1,3 ˆy . . . worker1 worker2 workerN Data Synchronization 1 2 N
  • 34. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION FEATURE STORE SUMMARY DEMO/WORKSHOP OUTER LOOP: BLACK BOX OPTIMIZATION Outer loop Metric τ Search Method hparams h Inner loop b0 x0,1 x0,2 x0,3 b1 x1,1 x1,2 x1,3 b1 x1,1 x1,2 x1,3 b1 x1,1 x1,2 x1,3 b1 x1,1 x1,2 x1,3 ˆy b0 x0,1 x0,2 x0,3 b1 x1,1 x1,2 x1,3 b1 x1,1 x1,2 x1,3 b1 x1,1 x1,2 x1,3 b1 x1,1 x1,2 x1,3 ˆy b0 x0,1 x0,2 x0,3 b1 x1,1 x1,2 x1,3 b1 x1,1 x1,2 x1,3 b1 x1,1 x1,2 x1,3 b1 x1,1 x1,2 x1,3 ˆy . . . worker1 worker2 workerN Data Synchronization 1 2 N
  • 35. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION FEATURE STORE SUMMARY DEMO/WORKSHOP OUTER LOOP: BLACK BOX OPTIMIZATION     x1 ... xn     Features Hyperparameters η, num_layers, neurons b0 x0,1 x0,2 x0,3 b1 x1,1 x1,2 x1,3 ˆy Model θ ˆy Prediction L(y, ˆy) Loss Gradient θL(y, ˆy)
  • 36. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION FEATURE STORE SUMMARY DEMO/WORKSHOP OUTER LOOP: BLACK BOX OPTIMIZATION Example Use-Case from one of our clients: Goal: Train a One-Class GAN model for fraud detection Problem: GANs are extremely sensitive to hyperparameters and there exists a very large space of possible hyperparameters. Example hyperparameters to tune: learning rates η, optimizers, layers.. etc. Real input x Random Noise z Generator b0 x0,1 x0,2 x0,3 b1 x1,1 x1,2 x1,3 ˆy Discriminator b0 x0,1 x0,2 x0,3 b1 x1,1 x1,2 x1,3 ˆy
  • 37. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION FEATURE STORE SUMMARY DEMO/WORKSHOP OUTER LOOP: BLACK BOX OPTIMIZATION Num Neurons/Layer 25 30 35 40 45 Num Layers 2 4 6 8 10 12 LearningRate 0.00 0.02 0.04 0.06 0.08 0.10 Search Space     x1 ... xn     Features Hyperparameters η, num_layers, neurons b0 x0,1 x0,2 x0,3 b1 x1,1 x1,2 x1,3 ˆy Model θ ˆy Prediction L(y, ˆy) Loss Gradient θL(y, ˆy)
  • 38. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION FEATURE STORE SUMMARY DEMO/WORKSHOP OUTER LOOP: BLACK BOX OPTIMIZATION Num Neurons/Layer 25 30 35 40 45 Num Layers 2 4 6 8 10 12 LearningRate 0.00 0.02 0.04 0.06 0.08 0.10 Search Space η1, .. η2, .. η3, .. η4, .. η5, .. Shared Task Queue Parallel Workers w1 w1 w1 w1     x1 ... xn     Features Hyperparameters η, num_layers, neurons b0 x0,1 x0,2 x0,3 b1 x1,1 x1,2 x1,3 ˆy Model θ ˆy Prediction L(y, ˆy) Loss Gradient θL(y, ˆy)
  • 39. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION FEATURE STORE SUMMARY DEMO/WORKSHOP
  • 40. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION FEATURE STORE SUMMARY DEMO/WORKSHOP OUTER LOOP: BLACK BOX OPTIMIZATION Num Neurons/Layer 25 30 35 40 45 Num Layers 2 4 6 8 10 12 LearningRate 0.00 0.02 0.04 0.06 0.08 0.10 Search Space η1, .. η2, .. η3, .. η4, .. η5, .. Shared Task Queue Parallel Workers w1 w1 w1 w1     x1 ... xn     Features Hyperparameters η, num_layers, neurons b0 x0,1 x0,2 x0,3 b1 x1,1 x1,2 x1,3 ˆy Model θ ˆy Prediction L(y, ˆy) Loss Gradient θL(y, ˆy)
  • 41. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION FEATURE STORE SUMMARY DEMO/WORKSHOP OUTER LOOP: BLACK BOX OPTIMIZATION Which algorithm to use for search? [Same search-space/task-queue/training-loop diagram]
  • 42. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION FEATURE STORE SUMMARY DEMO/WORKSHOP OUTER LOOP: BLACK BOX OPTIMIZATION How to monitor progress? Which algorithm to use for search? [Same search-space/task-queue/training-loop diagram]
  • 43. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION FEATURE STORE SUMMARY DEMO/WORKSHOP OUTER LOOP: BLACK BOX OPTIMIZATION How to aggregate results? How to monitor progress? Which algorithm to use for search? [Same search-space/task-queue/training-loop diagram]
  • 44. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION FEATURE STORE SUMMARY DEMO/WORKSHOP OUTER LOOP: BLACK BOX OPTIMIZATION How to aggregate results? How to monitor progress? Which algorithm to use for search? Fault tolerance? [Same search-space/task-queue/training-loop diagram]
  • 45. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION FEATURE STORE SUMMARY DEMO/WORKSHOP OUTER LOOP: BLACK BOX OPTIMIZATION How to aggregate results? How to monitor progress? Which algorithm to use for search? Fault tolerance? This should be managed with platform support! [Same search-space/task-queue/training-loop diagram]
  • 46. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION FEATURE STORE SUMMARY DEMO/WORKSHOP PARALLEL EXPERIMENTS
    from hops import experiment
    experiment.random_search(train_fn)
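The `experiment.random_search` call above hands a user-defined training function to parallel workers that each evaluate randomly sampled hyperparameters. As a rough, self-contained illustration of that pattern without the hops dependency (the search space, worker count, and the toy objective below are all invented for this sketch):

```python
import random
from concurrent.futures import ThreadPoolExecutor

# Hypothetical search space mirroring the slides: learning rate,
# number of layers, neurons per layer.
SPACE = {
    "lr": [0.001, 0.01, 0.05, 0.1],
    "num_layers": [2, 4, 6, 8],
    "neurons": [25, 30, 35, 40],
}

def sample_trial(rng):
    """Draw one random hyperparameter combination from SPACE."""
    return {k: rng.choice(v) for k, v in SPACE.items()}

def train_fn(hparams):
    """Toy stand-in for a training function: a synthetic 'accuracy'
    that peaks at lr=0.01 and num_layers=4."""
    return 1.0 - abs(hparams["lr"] - 0.01) - 0.01 * abs(hparams["num_layers"] - 4)

def random_search(train_fn, num_trials=20, seed=0):
    """Evaluate num_trials random configurations on parallel workers
    and return (best_metric, best_hparams)."""
    rng = random.Random(seed)
    trials = [sample_trial(rng) for _ in range(num_trials)]
    with ThreadPoolExecutor(max_workers=4) as pool:  # the parallel workers
        metrics = list(pool.map(train_fn, trials))
    return max(zip(metrics, trials), key=lambda mt: mt[0])

metric, hparams = random_search(train_fn)
```

The platform version adds what the sketch leaves out: GPU scheduling, result aggregation, monitoring, and fault tolerance.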
  • 47. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION FEATURE STORE SUMMARY DEMO/WORKSHOP ASYNCHRONOUS SEARCH WORKFLOW [Diagram: parallel workers each run a trial (model producing metric α) on suggested hyperparameters λ; a coordinator with a global task queue and black-box optimizers (min_x f(x), x ∈ S) sends suggested tasks to the workers and collects results] [Plot: trial accuracy vs. epochs for different (lr, layers) configurations]
  • 48. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION FEATURE STORE SUMMARY DEMO/WORKSHOP ASYNCHRONOUS SEARCH WORKFLOW [Diagram: same coordinator/worker workflow, with the workers additionally sending heartbeats to the coordinator]
  • 49. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION FEATURE STORE SUMMARY DEMO/WORKSHOP ASYNCHRONOUS SEARCH WORKFLOW [Diagram: same coordinator/worker workflow with heartbeats, and the coordinator additionally signalling early stop to workers]
  • 50. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION FEATURE STORE SUMMARY DEMO/WORKSHOP ASYNCHRONOUS SEARCH WORKFLOW [Diagram: same coordinator/worker workflow with heartbeats and early stop, and the workers additionally writing checkpoints]
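The workflow on these slides (a global task queue, workers reporting metrics back, coordinator-issued early stops) can be sketched in a single-threaded toy form. The class names and the "stop if behind the best by a margin" rule below are illustrative assumptions, not the hops implementation:

```python
from queue import Queue

class Coordinator:
    """Toy coordinator: keeps a global task queue of trials and tells a
    trial to early-stop when its metric falls behind the best seen so
    far by more than `margin`. Purely illustrative."""

    def __init__(self, trials, margin=0.2):
        self.queue = Queue()
        for trial in trials:
            self.queue.put(trial)
        self.best = float("-inf")
        self.margin = margin
        self.finished = []

    def next_trial(self):
        return None if self.queue.empty() else self.queue.get()

    def report(self, trial, metric):
        """Heartbeat-style report from a worker; True means early-stop."""
        self.best = max(self.best, metric)
        return metric < self.best - self.margin

def worker(coordinator, evaluate, epochs=5):
    """Pull trials from the queue; report a metric after every epoch."""
    while (trial := coordinator.next_trial()) is not None:
        for epoch in range(1, epochs + 1):
            metric = evaluate(trial, epoch)
            if coordinator.report(trial, metric):
                break  # early-stopped by the coordinator
        coordinator.finished.append((trial, metric))

# Toy trials: each "trial" is just a learning rate; its metric grows
# linearly with epochs, so the small rates get early-stopped quickly.
coord = Coordinator([0.1, 0.01, 0.001])
worker(coord, evaluate=lambda lr, epoch: lr * epoch)
```

In the real system the workers run on separate machines, which is why heartbeats and checkpoints matter: they let the coordinator detect failures and resume trials.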
  • 51. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION FEATURE STORE SUMMARY DEMO/WORKSHOP INNER AND OUTER LOOP OF LARGE SCALE DEEP LEARNING [Diagram: outer loop: a search method proposes hyperparameters h and receives back a metric τ; inner loop: workers 1..N each train a replica of the model on a shard of the data and synchronize]
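A toy rendering of the two loops on this slide: the outer loop proposes a hyperparameter (here just a learning rate) and reads back a metric, while the inner loop trains data-parallel workers that synchronize by averaging their gradients. The model, data, and learning-rate grid are made up for illustration:

```python
# Outer loop: search hyperparameters (here only the learning rate).
# Inner loop: synchronous data-parallel training of a toy model
# y = w * x, where each "worker" owns one shard of the data.

def shard_gradient(w, shard):
    """Mean gradient of 0.5*(w*x - y)^2 over one worker's shard."""
    return sum((w * x - y) * x for x, y in shard) / len(shard)

def inner_loop(lr, shards, steps=100):
    """Data-parallel SGD: average the per-worker gradients each step."""
    w = 0.0
    for _ in range(steps):
        grads = [shard_gradient(w, shard) for shard in shards]
        w -= lr * sum(grads) / len(grads)  # synchronization point
    loss = sum(0.5 * (w * x - y) ** 2 for shard in shards for x, y in shard)
    return w, loss

# Data follows y = 2x, split across two workers.
shards = [[(1.0, 2.0), (2.0, 4.0)], [(3.0, 6.0), (4.0, 8.0)]]

# Outer loop: try a few learning rates, keep the one with lowest loss.
results = {lr: inner_loop(lr, shards)[1] for lr in (0.001, 0.01, 0.1)}
best_lr = min(results, key=results.get)
```

The nesting is the point: every outer-loop trial pays the full cost of an inner-loop training run, which is why both loops need to be parallelized.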
  • 52. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION FEATURE STORE SUMMARY DEMO/WORKSHOP FEATURE STORE
  • 53. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION FEATURE STORE SUMMARY DEMO/WORKSHOP FEATURE STORE [Diagram: a raw data matrix (x1,1 ... xn,n) and labels (y1 ... yn) are transformed by a feature function ϕ(x) into inputs for a model predicting ŷ]
  • 54. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION FEATURE STORE SUMMARY DEMO/WORKSHOP FEATURE STORE [Diagram: raw data matrix, feature function ϕ(x), and model prediction ŷ, with a shrug ¯\_(ツ)_/¯ over how features get from raw data to the model]
  • 55. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION FEATURE STORE SUMMARY DEMO/WORKSHOP FEATURE STORE [Diagram: raw data matrix, feature function ϕ(x), and model prediction ŷ, with a shrug ¯\_(ツ)_/¯ over the feature pipeline] “Data is the hardest part of ML and the most important piece to get right. Modelers spend most of their time selecting and transforming features at training time and then building the pipelines to deliver those features to production models.” - Uber [6] [6] Uber Engineering. Scaling Machine Learning at Uber with Michelangelo. 2018.
  • 56. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION FEATURE STORE SUMMARY DEMO/WORKSHOP FEATURE STORE [Diagram: same as slide 55, with a Feature Store now filling the gap between the raw data and the model] “Data is the hardest part of ML and the most important piece to get right. Modelers spend most of their time selecting and transforming features at training time and then building the pipelines to deliver those features to production models.” - Uber [7] [7] Uber Engineering. Scaling Machine Learning at Uber with Michelangelo. 2018.
  • 57. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION FEATURE STORE SUMMARY DEMO/WORKSHOP WHAT IS A FEATURE? A feature is a measurable property of some data sample. A feature could be: an aggregate value (min, max, mean, sum); a raw value (a pixel, a word from a piece of text); a value from a database table (the age of a customer); a derived representation, e.g. an embedding or a cluster. Features are the fuel for AI systems: [Diagram: features x1..xn feed a model with parameters θ, producing a prediction ŷ, a loss L(y, ŷ), and a gradient ∇θL(y, ŷ)]
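For instance, the aggregate-value features listed above could be computed like this (the records and feature names are toy examples, just to make the definition concrete):

```python
# Toy transaction records; names are invented for illustration.
transactions = [
    {"customer": "a", "amount": 10.0},
    {"customer": "a", "amount": 30.0},
    {"customer": "b", "amount": 5.0},
]

def aggregate_features(rows, customer):
    """Aggregate-value features (min, max, mean, sum) for one customer."""
    amounts = [r["amount"] for r in rows if r["customer"] == customer]
    return {
        "min_amount": min(amounts),
        "max_amount": max(amounts),
        "mean_amount": sum(amounts) / len(amounts),
        "sum_amount": sum(amounts),
    }

features = aggregate_features(transactions, "a")
```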
  • 58. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION FEATURE STORE SUMMARY DEMO/WORKSHOP FEATURE ENGINEERING IS CRUCIAL FOR MODEL PERFORMANCE [Plot: two classes of points (•, ◦) plotted along a single feature x1]
  • 59. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION FEATURE STORE SUMMARY DEMO/WORKSHOP FEATURE ENGINEERING IS CRUCIAL FOR MODEL PERFORMANCE [Plot: the same two classes of points (•, ◦) plotted in two features (x1, x2)]
  • 60. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION FEATURE STORE SUMMARY DEMO/WORKSHOP FEATURE ENGINEERING IS CRUCIAL FOR MODEL PERFORMANCE [Plot: the same points in (x1, x2), where the classes can now be separated]
  • 61. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION FEATURE STORE SUMMARY DEMO/WORKSHOP DISENTANGLE YOUR ML PIPELINES WITH A FEATURE STORE Feature Store: a data management platform for machine learning; the interface between data engineering and data science. Models: models are trained using sets of features; the features are fetched from the feature store and can overlap between models. [Diagram: data sources feed Dataset 1 .. Dataset n into the feature store, which serves several kinds of models (a neural network, a decision tree, and others)]
  • 62. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION FEATURE STORE SUMMARY DEMO/WORKSHOP DISENTANGLE YOUR ML PIPELINES WITH A FEATURE STORE [Same datasets/feature-store/models diagram, annotated with: Backfilling]
  • 63. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION FEATURE STORE SUMMARY DEMO/WORKSHOP DISENTANGLE YOUR ML PIPELINES WITH A FEATURE STORE [Same datasets/feature-store/models diagram, annotated with: Backfilling, Analysis]
  • 64. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION FEATURE STORE SUMMARY DEMO/WORKSHOP DISENTANGLE YOUR ML PIPELINES WITH A FEATURE STORE [Same datasets/feature-store/models diagram, annotated with: Backfilling, Analysis, Versioning]
  • 65. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION FEATURE STORE SUMMARY DEMO/WORKSHOP DISENTANGLE YOUR ML PIPELINES WITH A FEATURE STORE [Same datasets/feature-store/models diagram, annotated with: Backfilling, Analysis, Versioning, Documentation]
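To make the versioning and documentation ideas on these slides concrete, here is a hypothetical in-memory feature store. Everything in it is an invented illustration of the concept, not the Hopsworks feature store API:

```python
class FeatureStore:
    """Hypothetical in-memory feature store: versioned, documented
    feature groups that several models can share. NOT the Hopsworks API."""

    def __init__(self):
        self._groups = {}  # (name, version) -> {"features", "doc"}

    def register(self, name, features, doc="", version=1):
        self._groups[(name, version)] = {"features": features, "doc": doc}

    def get(self, name, version=None):
        if version is None:  # default to the latest registered version
            version = max(v for (n, v) in self._groups if n == name)
        return self._groups[(name, version)]["features"]

fs = FeatureStore()
fs.register("customer_features", {"age": [31, 44]}, doc="raw ages", version=1)
fs.register("customer_features", {"age_scaled": [0.31, 0.44]},
            doc="min-max scaled ages", version=2)
```

Pinning a version lets an old model keep training on the exact features it was built with, while new models pick up the latest version by default.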
  • 66. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION FEATURE STORE SUMMARY DEMO/WORKSHOP SUMMARY Deep learning is going distributed. Algorithms for DDL are available in several frameworks. Applying DDL in practice brings a lot of operational complexity. Hopsworks is a platform for scale-out deep learning and big data processing. Hopsworks makes DDL simpler by providing simple abstractions for distributed training, parallel experiments, and much more. @hopshadoop www.hops.io @logicalclocks www.logicalclocks.com We are open source: https://github.com/logicalclocks/hopsworks https://github.com/hopshadoop/hops Thanks to the Logical Clocks team: Jim Dowling, Seif Haridi, Theo Kakantousis, Fabio Buso, Gautier Berthou, Ermias Gebremeskel, Mahmoud Ismail, Salman Niazi, Antonios Kouzoupis, Robin Andersson, Alex Ormenisan, Rasmus Toivonen and Steffen Grohsschmiedt.
  • 67. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION FEATURE STORE SUMMARY DEMO/WORKSHOP Demo setting [Diagram: raw/structured data → feature computation → data lake → feature store (curated features) → model]
  • 68. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION FEATURE STORE SUMMARY DEMO/WORKSHOP Hands-on Workshop 1. If you haven’t registered, do it now on hops.site 2. Cheatsheet: http://snurran.sics.se/hops/ kim/workshop_cheat.txt
  • 69. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION FEATURE STORE SUMMARY DEMO/WORKSHOP EXERCISE 1 (HELLO HOPSWORKS) 1. Create a Deep Learning Tour Project on Hopsworks 2. Start a Jupyter Notebook with the config: “Experiment” Mode 1 GPU 4000 (MB) memory for the driver (appmaster) 8000 (MB) memory for the executor Rest can be default 3. Create a new “PySpark” notebook 4. In the first cell, write: print("Hello Hopsworks") 5. Execute the cell (Ctrl + <Enter>)
  • 70. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION FEATURE STORE SUMMARY DEMO/WORKSHOP EXERCISE 2 (DISTRIBUTED HELLO HOPSWORKS WITH GPU) 1. Add a new cell with the contents:
    def executor():
        print("Hello from GPU")
2. Add a new cell with the contents:
    from hops import experiment
    experiment.launch(executor)
3. Execute the two cells in order (Ctrl + <Enter>) 4. Go to the Application UI
  • 71. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION FEATURE STORE SUMMARY DEMO/WORKSHOP EXERCISE 2 (DISTRIBUTED HELLO HOPSWORKS WITH GPU) [Screenshot-only slide]
  • 72. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION FEATURE STORE SUMMARY DEMO/WORKSHOP EXERCISE 2 (DISTRIBUTED HELLO HOPSWORKS WITH GPU) [Screenshot-only slide]
  • 73. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION FEATURE STORE SUMMARY DEMO/WORKSHOP EXERCISE 3 (LOAD MNIST FROM HOPSFS) 1. Add a new cell with the contents:
    from hops import hdfs
    import tensorflow as tf

    def create_tf_dataset():
        train_files = [hdfs.project_path() + "TestJob/data/mnist/train/train.tfrecords"]
        dataset = tf.data.TFRecordDataset(train_files)
        def decode(example):
            example = tf.parse_single_example(example, {
                'image_raw': tf.FixedLenFeature([], tf.string),
                'label': tf.FixedLenFeature([], tf.int64)})
            image = tf.reshape(tf.decode_raw(example['image_raw'], tf.uint8), (28, 28, 1))
            label = tf.one_hot(tf.cast(example['label'], tf.int32), 10)
            return image, label
        return dataset.map(decode).batch(128).repeat()
  • 74. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION FEATURE STORE SUMMARY DEMO/WORKSHOP EXERCISE 3 (LOAD MNIST FROM HOPSFS) 1. Add a new cell with the contents: create_tf_dataset() 2. Execute the two cells in order (Ctrl + <Enter>)
  • 75. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION FEATURE STORE SUMMARY DEMO/WORKSHOP EXERCISE 4 (DEFINE CNN MODEL)
    from tensorflow import keras

    def create_model():
        model = keras.Sequential()
        model.add(keras.layers.Conv2D(filters=32, kernel_size=3, padding='same',
                                      activation='relu', input_shape=(28, 28, 1)))
        model.add(keras.layers.BatchNormalization())
        model.add(keras.layers.MaxPooling2D(pool_size=2))
        model.add(keras.layers.Dropout(0.3))
        model.add(keras.layers.Conv2D(filters=64, kernel_size=3, padding='same',
                                      activation='relu'))
        model.add(keras.layers.BatchNormalization())
        model.add(keras.layers.MaxPooling2D(pool_size=2))
        model.add(keras.layers.Dropout(0.3))
        model.add(keras.layers.Flatten())
        model.add(keras.layers.Dense(128, activation='relu'))
        model.add(keras.layers.Dropout(0.5))
        model.add(keras.layers.Dense(10, activation='softmax'))
        return model
  • 76. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION FEATURE STORE SUMMARY DEMO/WORKSHOP EXERCISE 4 (DEFINE CNN MODEL) 1. Add a new cell with the contents: create_model().summary() 2. Execute the two cells in order (Ctrl + <Enter>)
  • 77. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION FEATURE STORE SUMMARY DEMO/WORKSHOP EXERCISE 5 (DEFINE & RUN THE EXPERIMENT) 1. Add a new cell with the contents:
    from hops import tensorboard
    from tensorflow.python.keras.callbacks import TensorBoard

    def train_fn():
        dataset = create_tf_dataset()
        model = create_model()
        model.compile(loss=keras.losses.categorical_crossentropy,
                      optimizer=keras.optimizers.Adam(), metrics=['accuracy'])
        tb_callback = TensorBoard(log_dir=tensorboard.logdir())
        model_ckpt_callback = keras.callbacks.ModelCheckpoint(
            tensorboard.logdir(), monitor='acc')
        history = model.fit(dataset, epochs=50, steps_per_epoch=80,
                            callbacks=[tb_callback])
        return history.history["acc"][-1]
  • 78. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION FEATURE STORE SUMMARY DEMO/WORKSHOP EXERCISE 5 (DEFINE & RUN THE EXPERIMENT) 1. Add a new cell with the contents: experiment.launch(train_fn) 2. Execute the two cells in order (Ctrl + <Enter>) 3. Go to the Application UI and monitor the training progress
  • 79. INTRO HOPSWORKS DISTRIBUTED DL BLACK-BOX OPTIMIZATION FEATURE STORE SUMMARY DEMO/WORKSHOP REFERENCES Example notebooks: https://github.com/logicalclocks/hops-examples HopsML [8] Hopsworks [9] Hopsworks' feature store [10] Maggy: https://github.com/logicalclocks/maggy
[8] Logical Clocks AB. HopsML: Python-First ML Pipelines. https://hops.readthedocs.io/en/latest/hopsml/hopsML.html. 2018.
[9] Jim Dowling. Introducing Hopsworks. https://www.logicalclocks.com/introducing-hopsworks/. 2018.
[10] Kim Hammar and Jim Dowling. Feature Store: the missing data layer in ML pipelines? https://www.logicalclocks.com/feature-store/. 2018.