HopsML Meetup talk on Hopsworks + ROCm/AMD, June 2019

2. ©2018 Logical Clocks AB. All Rights Reserved
Great Hedge of India
•The East India Company was one of the industrial world’s first monopolies.
•They assembled a thorny hedge (not a wall!) spanning India.
•You paid customs duty to bring salt over the wall (sorry, hedge).
In 2019, Nvidia GeForce graphics cards are not allowed to be used in a data center. Monopolies are not good for deep learning!
[Image from Wikipedia]
3. ©2019 Logical Clocks AB. All Rights Reserved
Nvidia™ 2080Ti vs AMD Radeon™ VII ResNet-50 Benchmark

Nvidia™ 2080Ti
Memory: 11 GB
TensorFlow 1.12
CUDA 10.0.130, cuDNN 7.4.1
Model: ResNet-50
Dataset: ImageNet (synthetic)
------------------------------------------------------------
FP32 total images/sec: ~322
FP16 total images/sec: ~560

AMD Radeon™ VII
Memory: 16 GB
TensorFlow 1.13.1
ROCm 2.3
Model: ResNet-50
Dataset: ImageNet (synthetic)
------------------------------------------------------------
FP32 total images/sec: ~302
FP16 total images/sec: ~415

Sources:
https://lambdalabs.com/blog/2080-ti-deep-learning-benchmarks/
https://www.phoronix.com/scan.php?page=article&item=nvidia-rtx2080ti-tensorflow&num=2
https://github.com/ROCmSoftwarePlatform/tensorflow-upstream/issues/173
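The relative standing of the two cards can be read directly off the throughput figures above; a quick sketch of the arithmetic (numbers taken from this slide):

```python
# Relative ResNet-50 throughput (AMD Radeon VII vs Nvidia 2080Ti),
# computed from the images/sec figures reported on this slide.
nvidia = {"fp32": 322, "fp16": 560}
amd = {"fp32": 302, "fp16": 415}

for precision in ("fp32", "fp16"):
    ratio = amd[precision] / nvidia[precision]
    print(f"{precision}: Radeon VII reaches {ratio:.0%} of the 2080Ti")
```

The Radeon VII is close at FP32 (~94%) but trails at FP16 (~74%), where the 2080Ti's tensor cores help.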
4. ©2019 Logical Clocks AB. All Rights Reserved
#UnifiedAnalytics #SparkAISummit
AMD ML Software Strategy: an Open Source Foundation for Machine Learning
•Latest machine learning frameworks
•Docker and Kubernetes support
•Optimized math & communication libraries
•Up-streamed for Linux kernel distributions
[Stack diagram: Spark / machine learning apps and data platform tools on top; frameworks, middleware and libraries (Eigen, RCCL, MIOpen, BLAS/FFT/RNG); programming models (OpenMP, HIP, OpenCL™, Python); the fully open-source ROCm platform; devices (GPU, CPU, APU, DLA)]
5. ©2019 Logical Clocks AB. All Rights Reserved
Distro: Upstream Linux Kernel Support
Linux Kernel 4.17: 700+ upstream ROCm driver commits since the 4.12 kernel
https://github.com/RadeonOpenCompute/ROCK-Kernel-Driver
6. ©2019 Logical Clocks AB. All Rights Reserved
Languages: Multiple Programming Options
Programming models: OpenMP, HIP, OpenCL, Python
LLVM -> AMDGCN compiler emits AMDGPU code
LLVM: https://llvm.org/docs/AMDGPUUsage.html
CLANG HIP: https://clang.llvm.org/doxygen/HIP_8h_source.html
7. ©2019 Logical Clocks AB. All Rights Reserved
ROCm Distributed Training
•RCCL: optimized collective communication operations library
•Easy MPI integration
•Support for InfiniBand and RoCE high-speed network fabrics
•ROCm-enabled UCX
•ROCm with ROCnRDMA
[Chart: ResNet-50 multi-GPU scaling (PCIe, CPU parameter server): 1 GPU = 1.00x, 2 GPUs = 1.99x, 4 GPUs = 3.98x, 8 GPUs = 7.64x]
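The scaling chart's speedup factors imply a parallel efficiency for each GPU count; a quick calculation from the reported numbers:

```python
# Parallel efficiency implied by the ResNet-50 multi-GPU scaling chart:
# efficiency = measured speedup / number of GPUs (figures from this slide).
speedups = {1: 1.00, 2: 1.99, 4: 3.98, 8: 7.64}

for n_gpus, speedup in speedups.items():
    efficiency = speedup / n_gpus
    print(f"{n_gpus} GPU(s): {speedup:.2f}x speedup, {efficiency:.1%} efficiency")
```

Even at 8 GPUs the efficiency stays above 95%, i.e. near-linear scaling despite the PCIe/parameter-server setup.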
8. ©2019 Logical Clocks AB. All Rights Reserved
ROCm -> Spark / TensorFlow
•Spark / TensorFlow applications run unchanged on ROCm
•Hopsworks runs Spark/TensorFlow on YARN and Conda
9. ©2019 Logical Clocks AB. All Rights Reserved
YARN support for ROCm in Hops
A Container is a CGroup that isolates CPU, memory, and GPU resources and has a conda environment and TLS certs.
[Diagram: a Resource Manager schedules the Driver and Executors into containers across multiple Node Managers]
10. ©2018 Logical Clocks AB. All Rights Reserved
Distributed Deep Learning in Hopsworks
[Diagram: Driver and Executors 1..N, each with its own conda_env, reading and writing Experiments, TensorBoard logs, Models, Training Data, and Logs in HopsFS (HDFS)]
11. ©2018 Logical Clocks AB. All Rights Reserved
Hyperparameter Optimization

# RUNS ON THE EXECUTORS
def train(lr, dropout):
    def input_fn():  # return dataset
        ...
    optimizer = ...
    model = ...
    model.add(Conv2D(...))
    model.compile(...)
    model.fit(...)
    model.evaluate(...)

# RUNS ON THE DRIVER
hparams = {'lr': [0.001, 0.0001],
           'dropout': [0.25, 0.5, 0.75]}
experiment.grid_search(train, hparams)

https://github.com/logicalclocks/hops-examples
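Conceptually, a grid search enumerates the Cartesian product of the hyperparameter lists and calls the train function once per combination. A minimal plain-Python sketch of that idea (no Hopsworks; `grid_search` here is a local stand-in, not the `experiment.grid_search` API, which additionally fans the runs out to Spark executors in parallel):

```python
from itertools import product

# Sketch of what a grid search evaluates: one train() call per
# combination of hyperparameter values.
def grid_search(train, hparams):
    names = list(hparams)
    results = []
    for values in product(*(hparams[n] for n in names)):
        combo = dict(zip(names, values))
        results.append((combo, train(**combo)))
    return results

# Toy train function standing in for real model training.
def train(lr, dropout):
    return lr * dropout  # placeholder "metric"

runs = grid_search(train, {'lr': [0.001, 0.0001],
                           'dropout': [0.25, 0.5, 0.75]})
print(len(runs))  # 2 lr values x 3 dropout values = 6 combinations
```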
12. ©2018 Logical Clocks AB. All Rights Reserved
Distributed Training

# RUNS ON THE EXECUTORS
def train():
    def input_fn():  # return dataset
        ...
    model = ...
    optimizer = ...
    model.compile(...)
    rc = tf.estimator.RunConfig(
        'CollectiveAllReduceStrategy')
    keras_estimator = tf.keras.estimator.model_to_estimator(...)
    tf.estimator.train_and_evaluate(
        keras_estimator, input_fn)

# RUNS ON THE DRIVER
experiment.collective_all_reduce(train)

https://github.com/logicalclocks/hops-examples
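The core primitive behind CollectiveAllReduceStrategy (and libraries like RCCL) is the all-reduce collective: after the operation, every worker holds the element-wise sum of all workers' gradient vectors. A toy sketch of the semantics, assuming simulated in-process "workers" (real implementations use ring or tree algorithms over the network fabric):

```python
# All-reduce semantics: every worker ends up with the element-wise sum
# of all workers' local gradient vectors. This toy version just sums
# and broadcasts; it illustrates the result, not the network algorithm.
def all_reduce(worker_grads):
    summed = [sum(vals) for vals in zip(*worker_grads)]
    return [list(summed) for _ in worker_grads]  # every worker gets a copy

# Four simulated workers, each holding local gradients for 3 parameters.
grads = [[1.0, 2.0, 3.0],
         [0.5, 0.5, 0.5],
         [2.0, 0.0, 1.0],
         [0.5, 1.5, 0.5]]
print(all_reduce(grads)[0])  # [4.0, 4.0, 5.0]
```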
13. ©2019 Logical Clocks AB. All Rights Reserved
Horizontally Scalable ML Pipelines with Hopsworks
[Pipeline diagram, orchestrated by Airflow: Raw Data / Event Data -> Ingest -> Data Prep -> Feature Store -> Experiment/Train -> Deploy -> Serving, with HopsFS for storage, logs flowing into a Metadata Store, and monitoring throughout]
14. ©2018 Logical Clocks AB. All Rights Reserved
ML Pipelines of Jupyter Notebooks
[End-to-end ML pipeline in Airflow. Feature backfill pipeline: Select Features, File Format -> Feature Engineering -> Feature Store. Training and deployment pipeline: Feature Store -> Experiment, Train Model -> Validate & Deploy Model]
15. ©2018 Logical Clocks AB. All Rights Reserved
Online Model Serving and Monitoring
Link predictions with outcomes to measure model performance.
[Diagram: a request reaches Hopsworks over HTTPS; 1. Access control; 2. Build feature vector from the Feature Store; 3. Make prediction via a Model Server on Kubernetes (serving model images); 4. Log prediction to the Data Lake for monitoring]
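The four numbered steps in the serving diagram can be sketched as a single request path. All names below (`check_access`, `feature_store`, `model_server`, `prediction_log`) are hypothetical stand-ins for illustration, not the Hopsworks API:

```python
# Sketch of the online serving flow: access control, feature lookup,
# prediction, and prediction logging for later monitoring.
feature_store = {"user_42": [0.1, 0.7, 0.3]}  # entity id -> feature vector
prediction_log = []                            # stand-in for the Data Lake

def check_access(token):                       # 1. Access control
    return token == "valid-token"

def build_feature_vector(entity_id):           # 2. Build feature vector
    return feature_store[entity_id]

def model_server(features):                    # 3. Make prediction
    return sum(features)                       # toy "model"

def serve(token, entity_id):
    if not check_access(token):
        return None
    features = build_feature_vector(entity_id)
    prediction = model_server(features)
    prediction_log.append((entity_id, prediction))  # 4. Log prediction
    return prediction

print(serve("valid-token", "user_42"))
```

Logging each prediction alongside the entity id is what later lets you join predictions with observed outcomes to measure model performance.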
16. Summary
•Hopsworks now supports both Nvidia (CUDA) and AMD (ROCm) GPUs
 -Hopsworks 0.10+
•New AMD GPUs will challenge Nvidia’s hegemony in deep learning
 -Radeon VII (Vega)
 -Navi architecture GPUs coming in July (RX 5700)
  •1.25x performance per clock and 1.5x performance per watt
  •GDDR6 memory and PCIe 4.0 support
https://databricks.com/session/rocm-and-distributed-deep-learning-on-spark-and-tensorflow
17. ©2019 Logical Clocks AB. All Rights Reserved
Try it Out!
1. Register for an account at: www.hops.site
@logicalclocks
www.logicalclocks.com
robzor92