HopsML Meetup talk on Hopsworks + ROCm/AMD, June 2019

2. ©2018 Logical Clocks AB. All Rights Reserved
Great Hedge of India
•The East India Company was one of the industrial world’s first monopolies.
•They assembled a thorny hedge (not a wall!) spanning India.
•You paid customs duty to bring salt over the wall (sorry, hedge).
In 2019, Nvidia GeForce graphics cards are not allowed to be used in a data center. Monopolies are not good for deep learning!
[Image from Wikipedia]
3. ©2019 Logical Clocks AB. All Rights Reserved
Nvidia™ 2080Ti vs AMD Radeon™ VII ResNet-50 Benchmark

Nvidia™ 2080Ti
Memory: 11 GB
TensorFlow 1.12
CUDA 10.0.130, cuDNN 7.4.1
Model: ResNet-50
Dataset: ImageNet (synthetic)
------------------------------------------------------------
FP32 total images/sec: ~322
FP16 total images/sec: ~560

AMD Radeon™ VII
Memory: 16 GB
TensorFlow 1.13.1
ROCm 2.3
Model: ResNet-50
Dataset: ImageNet (synthetic)
------------------------------------------------------------
FP32 total images/sec: ~302
FP16 total images/sec: ~415

Sources:
https://lambdalabs.com/blog/2080-ti-deep-learning-benchmarks/
https://www.phoronix.com/scan.php?page=article&item=nvidia-rtx2080ti-tensorflow&num=2
https://github.com/ROCmSoftwarePlatform/tensorflow-upstream/issues/173
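The relative standing of the two cards can be read directly off the throughput figures above; a quick sketch of the arithmetic (numbers taken from this slide):

```python
# Relative ResNet-50 throughput (AMD Radeon VII vs Nvidia 2080Ti),
# computed from the images/sec figures reported on this slide.
nvidia = {"fp32": 322, "fp16": 560}
amd = {"fp32": 302, "fp16": 415}

for precision in ("fp32", "fp16"):
    ratio = amd[precision] / nvidia[precision]
    print(f"{precision}: Radeon VII reaches {ratio:.0%} of the 2080Ti")
```

The Radeon VII is close at FP32 (~94%) but trails at FP16 (~74%), where the 2080Ti's tensor cores help.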
4. ©2019 Logical Clocks AB. All Rights Reserved
#UnifiedAnalytics #SparkAISummit
AMD ML Software Strategy: an Open Source Foundation for Machine Learning
•Latest machine learning frameworks
•Docker and Kubernetes support
•Optimized math & communication libraries
•Up-streamed for Linux kernel distributions
[Stack diagram: Spark / machine learning apps and data platform tools on top; frameworks, middleware and libraries (Eigen, RCCL, MIOpen, BLAS/FFT/RNG); programming models (OpenMP, HIP, OpenCL™, Python); the fully open-source ROCm platform; devices (GPU, CPU, APU, DLA)]
5. ©2019 Logical Clocks AB. All Rights Reserved
Distro: Upstream Linux Kernel Support
Linux Kernel 4.17: 700+ upstream ROCm driver commits since the 4.12 kernel
https://github.com/RadeonOpenCompute/ROCK-Kernel-Driver
6. ©2019 Logical Clocks AB. All Rights Reserved
Languages: Multiple Programming Options
Programming models: OpenMP, HIP, OpenCL, Python
LLVM -> AMDGCN compiler emits AMDGPU code
LLVM: https://llvm.org/docs/AMDGPUUsage.html
CLANG HIP: https://clang.llvm.org/doxygen/HIP_8h_source.html
7. ©2019 Logical Clocks AB. All Rights Reserved
ROCm Distributed Training
•RCCL: optimized collective communication operations library
•Easy MPI integration
•Support for InfiniBand and RoCE high-speed network fabrics
•ROCm-enabled UCX
•ROCm with ROCnRDMA
[Chart: ResNet-50 multi-GPU scaling (PCIe, CPU parameter server): 1 GPU = 1.00x, 2 GPUs = 1.99x, 4 GPUs = 3.98x, 8 GPUs = 7.64x]
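The scaling chart's speedup factors imply a parallel efficiency for each GPU count; a quick calculation from the reported numbers:

```python
# Parallel efficiency implied by the ResNet-50 multi-GPU scaling chart:
# efficiency = measured speedup / number of GPUs (figures from this slide).
speedups = {1: 1.00, 2: 1.99, 4: 3.98, 8: 7.64}

for n_gpus, speedup in speedups.items():
    efficiency = speedup / n_gpus
    print(f"{n_gpus} GPU(s): {speedup:.2f}x speedup, {efficiency:.1%} efficiency")
```

Even at 8 GPUs the efficiency stays above 95%, i.e. near-linear scaling despite the PCIe/parameter-server setup.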
8. ©2019 Logical Clocks AB. All Rights Reserved
ROCm -> Spark / TensorFlow
•Spark / TensorFlow applications run unchanged on ROCm
•Hopsworks runs Spark/TensorFlow on YARN and Conda
9. ©2019 Logical Clocks AB. All Rights Reserved
YARN support for ROCm in Hops
A Container is a CGroup that isolates CPU, memory, and GPU resources and has a conda environment and TLS certs.
[Diagram: a Resource Manager schedules the Driver and Executors into containers across multiple Node Managers]
10. ©2018 Logical Clocks AB. All Rights Reserved
Distributed Deep Learning in Hopsworks
[Diagram: Driver and Executors 1..N, each with its own conda_env, reading and writing Experiments, TensorBoard logs, Models, Training Data, and Logs in HopsFS (HDFS)]
11. ©2018 Logical Clocks AB. All Rights Reserved
Hyperparameter Optimization

# RUNS ON THE EXECUTORS
def train(lr, dropout):
    def input_fn():  # return dataset
        ...
    optimizer = ...
    model = ...
    model.add(Conv2D(...))
    model.compile(...)
    model.fit(...)
    model.evaluate(...)

# RUNS ON THE DRIVER
hparams = {'lr': [0.001, 0.0001],
           'dropout': [0.25, 0.5, 0.75]}
experiment.grid_search(train, hparams)

https://github.com/logicalclocks/hops-examples
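Conceptually, a grid search enumerates the Cartesian product of the hyperparameter lists and calls the train function once per combination. A minimal plain-Python sketch of that idea (no Hopsworks; `grid_search` here is a local stand-in, not the `experiment.grid_search` API, which additionally fans the runs out to Spark executors in parallel):

```python
from itertools import product

# Sketch of what a grid search evaluates: one train() call per
# combination of hyperparameter values.
def grid_search(train, hparams):
    names = list(hparams)
    results = []
    for values in product(*(hparams[n] for n in names)):
        combo = dict(zip(names, values))
        results.append((combo, train(**combo)))
    return results

# Toy train function standing in for real model training.
def train(lr, dropout):
    return lr * dropout  # placeholder "metric"

runs = grid_search(train, {'lr': [0.001, 0.0001],
                           'dropout': [0.25, 0.5, 0.75]})
print(len(runs))  # 2 lr values x 3 dropout values = 6 combinations
```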
12. ©2018 Logical Clocks AB. All Rights Reserved
Distributed Training

# RUNS ON THE EXECUTORS
def train():
    def input_fn():  # return dataset
        ...
    model = ...
    optimizer = ...
    model.compile(...)
    rc = tf.estimator.RunConfig(
        'CollectiveAllReduceStrategy')
    keras_estimator = tf.keras.estimator.model_to_estimator(...)
    tf.estimator.train_and_evaluate(
        keras_estimator, input_fn)

# RUNS ON THE DRIVER
experiment.collective_all_reduce(train)

https://github.com/logicalclocks/hops-examples
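The core primitive behind CollectiveAllReduceStrategy (and libraries like RCCL) is the all-reduce collective: after the operation, every worker holds the element-wise sum of all workers' gradient vectors. A toy sketch of the semantics, assuming simulated in-process "workers" (real implementations use ring or tree algorithms over the network fabric):

```python
# All-reduce semantics: every worker ends up with the element-wise sum
# of all workers' local gradient vectors. This toy version just sums
# and broadcasts; it illustrates the result, not the network algorithm.
def all_reduce(worker_grads):
    summed = [sum(vals) for vals in zip(*worker_grads)]
    return [list(summed) for _ in worker_grads]  # every worker gets a copy

# Four simulated workers, each holding local gradients for 3 parameters.
grads = [[1.0, 2.0, 3.0],
         [0.5, 0.5, 0.5],
         [2.0, 0.0, 1.0],
         [0.5, 1.5, 0.5]]
print(all_reduce(grads)[0])  # [4.0, 4.0, 5.0]
```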
13. ©2019 Logical Clocks AB. All Rights Reserved
Horizontally Scalable ML Pipelines with Hopsworks
[Pipeline diagram, orchestrated by Airflow: Raw Data / Event Data -> Ingest -> Data Prep -> Feature Store -> Experiment/Train -> Deploy -> Serving, with HopsFS for storage, logs flowing into a Metadata Store, and monitoring throughout]
14. ©2018 Logical Clocks AB. All Rights Reserved
ML Pipelines of Jupyter Notebooks
[End-to-end ML pipeline in Airflow. Feature backfill pipeline: Select Features, File Format -> Feature Engineering -> Feature Store. Training and deployment pipeline: Feature Store -> Experiment, Train Model -> Validate & Deploy Model]
15. ©2018 Logical Clocks AB. All Rights Reserved
Online Model Serving and Monitoring
Link predictions with outcomes to measure model performance.
[Diagram: a request reaches Hopsworks over HTTPS; 1. Access control; 2. Build feature vector from the Feature Store; 3. Make prediction via a Model Server on Kubernetes (serving model images); 4. Log prediction to the Data Lake for monitoring]
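The four numbered steps in the serving diagram can be sketched as a single request path. All names below (`check_access`, `feature_store`, `model_server`, `prediction_log`) are hypothetical stand-ins for illustration, not the Hopsworks API:

```python
# Sketch of the online serving flow: access control, feature lookup,
# prediction, and prediction logging for later monitoring.
feature_store = {"user_42": [0.1, 0.7, 0.3]}  # entity id -> feature vector
prediction_log = []                            # stand-in for the Data Lake

def check_access(token):                       # 1. Access control
    return token == "valid-token"

def build_feature_vector(entity_id):           # 2. Build feature vector
    return feature_store[entity_id]

def model_server(features):                    # 3. Make prediction
    return sum(features)                       # toy "model"

def serve(token, entity_id):
    if not check_access(token):
        return None
    features = build_feature_vector(entity_id)
    prediction = model_server(features)
    prediction_log.append((entity_id, prediction))  # 4. Log prediction
    return prediction

print(serve("valid-token", "user_42"))
```

Logging each prediction alongside the entity id is what later lets you join predictions with observed outcomes to measure model performance.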
16. Summary
•Hopsworks now supports both Nvidia (CUDA) and AMD (ROCm) GPUs
 -Hopsworks 0.10+
•New AMD GPUs will challenge Nvidia’s hegemony in deep learning
 -Radeon VII (Vega)
 -Navi architecture GPUs coming in July (RX 5700)
  •1.25x performance per clock and 1.5x performance per watt
  •GDDR6 memory and PCIe 4.0 support
https://databricks.com/session/rocm-and-distributed-deep-learning-on-spark-and-tensorflow
17. ©2019 Logical Clocks AB. All Rights Reserved
Try it Out!
1. Register for an account at: www.hops.site
@logicalclocks
www.logicalclocks.com
robzor92