AMD/ROCm for Hopsworks
HopsML 4th Meetup, Stockholm
June 4th, 2019
jim_dowling
CEO @ Logical Clocks
Assoc Prof @ KTH
robzor92
©2018 Logical Clocks AB. All Rights Reserved
Great Hedge of India
• The East India Company was one of the industrial world’s first monopolies.
• It assembled a thorny hedge (not a wall!) spanning India.
• You paid customs duty to bring salt over the wall (sorry, hedge).
In 2019, Nvidia GeForce graphics cards are not allowed to be used in a data center. Monopolies are not good for deep learning!
[Image from Wikipedia]
©2019 Logical Clocks AB. All Rights Reserved
Nvidia™ 2080Ti vs AMD Radeon™ VII ResNet-50 Benchmark

Nvidia™ 2080Ti
Memory: 11 GB
TensorFlow 1.12
CUDA 10.0.130, cuDNN 7.4.1
Model: ResNet-50
Dataset: ImageNet (synthetic)
------------------------------------------------------------
FP32 total images/sec: ~322
FP16 total images/sec: ~560
https://lambdalabs.com/blog/2080-ti-deep-learning-benchmarks/
https://www.phoronix.com/scan.php?page=article&item=nvidia-rtx2080ti-tensorflow&num=2

AMD Radeon™ VII
Memory: 16 GB
TensorFlow 1.13.1
ROCm: 2.3
Model: ResNet-50
Dataset: ImageNet (synthetic)
------------------------------------------------------------
FP32 total images/sec: ~302
FP16 total images/sec: ~415
https://github.com/ROCmSoftwarePlatform/tensorflow-upstream/issues/173
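To make the comparison easier to read, the throughput figures from the slide can be turned into ratios. The snippet below is only a sketch: the numbers are the approximate values quoted on the slide, the two cards were measured with different TensorFlow versions, and the function names are illustrative.

```python
# Approximate throughput (images/sec) as quoted on the benchmark slide.
throughput = {
    "Nvidia 2080Ti": {"fp32": 322, "fp16": 560},
    "AMD Radeon VII": {"fp32": 302, "fp16": 415},
}


def relative_throughput(results, card, baseline, precision):
    """Throughput of `card` as a fraction of `baseline` at one precision."""
    return results[card][precision] / results[baseline][precision]


def fp16_speedup(results, card):
    """How much faster FP16 is than FP32 on the same card."""
    return results[card]["fp16"] / results[card]["fp32"]


for precision in ("fp32", "fp16"):
    ratio = relative_throughput(
        throughput, "AMD Radeon VII", "Nvidia 2080Ti", precision
    )
    print(f"{precision}: Radeon VII reaches {ratio:.0%} of 2080Ti throughput")

for card in throughput:
    print(f"{card}: FP16 is {fp16_speedup(throughput, card):.2f}x FP32")
```

On these figures the two cards are close at FP32, while the gap widens at FP16.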
AMD ML Software Strategy: Open Source Foundation for Machine Learning
Fully Open Source ROCm Platform
• Latest Machine Learning Frameworks
• Docker and Kubernetes support
• Optimized Math & Communication Libraries
• Up-Streamed for Linux Kernel Distributions
[Diagram: ROCm software stack]
Spark / Machine Learning Apps
Data Platform, Tools
Frameworks
Middleware and Libraries: MIOpen; BLAS, FFT, RNG; RCCL; Eigen
ROCm: OpenMP, HIP, OpenCL™, Python
Devices: GPU, CPU, APU, DLA
Distro: Upstream Linux Kernel Support
Linux Kernel 4.17
700+ upstream ROCm driver commits since 4.12 kernel
https://github.com/RadeonOpenCompute/ROCK-Kernel-Driver
Languages: Multiple Programming Options
Programming Models: OpenMP, Python, OpenCL, HIP
[Diagram: programming models | LLVM -> AMDGCN Compiler | AMDGPU Code]
LLVM: https://llvm.org/docs/AMDGPUUsage.html
CLANG HIP: https://clang.llvm.org/doxygen/HIP_8h_source.html
ROCm Distributed Training
• RCCL: optimized collective communication operations library
• Easy MPI integration
• Support for InfiniBand and RoCE high-speed network fabrics
• ROCm enabled UCX
• ROCm w/ ROCnRDMA
[Chart: ResNet-50 multi-GPU scaling (PCIe, CPU parameter server, 1/2/4/8 GPU)]
1 GPU: 1.00x
2 GPU: 1.99x
4 GPU: 3.98x
8 GPU: 7.64x
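The speedup figures from the scaling chart can be converted into scaling efficiency (speedup divided by GPU count). This is a small sketch using only the four values shown on the slide; the function name is illustrative.

```python
# Speedup relative to one GPU, as read from the ResNet-50 scaling chart.
speedups = {1: 1.00, 2: 1.99, 4: 3.98, 8: 7.64}


def scaling_efficiency(speedup_by_gpus):
    """Efficiency = measured speedup / ideal (linear) speedup."""
    return {n: s / n for n, s in speedup_by_gpus.items()}


for gpus, eff in scaling_efficiency(speedups).items():
    print(f"{gpus} GPU(s): {eff:.1%} of linear scaling")
```

Efficiency stays above 99% up to 4 GPUs and is about 95.5% at 8 GPUs.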
ROCm -> Spark / TensorFlow
• Spark / TensorFlow applications run unchanged on ROCm
• Hopsworks runs Spark/TensorFlow on YARN and Conda
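One way to picture "applications run unchanged" is that the choice between CUDA and ROCm is made at the platform level, outside the application code. The sketch below is an assumption, not how Hopsworks does it: it simply checks whether the vendor command-line tools `rocm-smi` or `nvidia-smi` are on the PATH. The function name and the returned labels are hypothetical.

```python
import shutil


def detect_gpu_stack(which=shutil.which):
    """Guess the GPU software stack from the vendor CLI tools on PATH.

    `which` is injectable so the logic can be exercised without any GPU.
    """
    if which("rocm-smi"):
        return "rocm"
    if which("nvidia-smi"):
        return "cuda"
    return "cpu"


# The training code itself would stay the same in all three cases;
# only the environment it is launched in differs.
print(f"Detected stack: {detect_gpu_stack()}")
```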
YARN support for ROCm in Hops
A Container is a CGroup that isolates CPU, memory, and GPU resources and has a conda environment and TLS certs.
[Diagram: Resource Manager; three Node Managers; Containers running the Driver and Executors]
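The idea of a container as an isolated slice of CPU, memory, and GPU resources can be sketched with a toy model. This is not the YARN or Hops API: the `NodeManager` and `Container` classes and their fields are hypothetical and only show how a node might hand out whole GPUs to containers.

```python
from dataclasses import dataclass, field


@dataclass
class Container:
    vcores: int
    memory_mb: int
    gpu_ids: list


@dataclass
class NodeManager:
    vcores: int
    memory_mb: int
    free_gpus: list
    containers: list = field(default_factory=list)

    def allocate(self, vcores, memory_mb, gpus):
        """Grant a container if the node has enough of every resource."""
        if (vcores > self.vcores or memory_mb > self.memory_mb
                or gpus > len(self.free_gpus)):
            return None
        granted = [self.free_gpus.pop(0) for _ in range(gpus)]
        self.vcores -= vcores
        self.memory_mb -= memory_mb
        container = Container(vcores, memory_mb, granted)
        self.containers.append(container)
        return container


node = NodeManager(vcores=8, memory_mb=16384, free_gpus=[0, 1])
first = node.allocate(vcores=4, memory_mb=8192, gpus=1)
second = node.allocate(vcores=4, memory_mb=8192, gpus=1)
third = node.allocate(vcores=1, memory_mb=1024, gpus=1)  # node is full
print(first, second, third)
```

Each GPU is granted to at most one container, which mirrors the exclusive isolation described above.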
Distributed Deep Learning in Hopsworks
[Diagram: Driver and Executor 1 … Executor N, each with its own conda_env; HopsFS (HDFS) holds TensorBoard, Models, Experiments, Training Data, Logs]
Hyperparameter Optimization
from hops import experiment

# RUNS ON THE EXECUTORS
def train(lr, dropout):
    def input_fn(): …  # return dataset
    optimizer = …
    model = …
    model.add(Conv2D(…))
    model.compile(…)
    model.fit(…)
    model.evaluate(…)

# RUNS ON THE DRIVER
hparams = {'lr': [0.001, 0.0001],
           'dropout': [0.25, 0.5, 0.75]}
experiment.grid_search(train, hparams)
https://github.com/logicalclocks/hops-examples
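A grid search evaluates every combination of the listed hyperparameter values. The sketch below only illustrates that expansion with the standard library; it is not the hops implementation, and `expand_grid` is a hypothetical helper.

```python
from itertools import product


def expand_grid(hparams):
    """Return one dict per combination of hyperparameter values."""
    names = list(hparams)
    value_lists = [hparams[name] for name in names]
    return [dict(zip(names, values)) for values in product(*value_lists)]


hparams = {"lr": [0.001, 0.0001], "dropout": [0.25, 0.5, 0.75]}
trials = expand_grid(hparams)

print(f"{len(trials)} trials")
for trial in trials:
    print(trial)
```

With two learning rates and three dropout values this yields six trials, each of which could run on a separate executor.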
Distributed Training
import tensorflow as tf
from hops import experiment

# RUNS ON THE EXECUTORS
def train():
    def input_fn(): …  # return dataset
    model = …
    optimizer = …
    model.compile(…)
    # TF 1.13 contrib API: pass the strategy object, not its name
    rc = tf.estimator.RunConfig(
        train_distribute=tf.contrib.distribute.CollectiveAllReduceStrategy())
    keras_estimator = tf.keras.estimator.model_to_estimator(…)
    tf.estimator.train_and_evaluate(
        keras_estimator,
        tf.estimator.TrainSpec(input_fn),
        tf.estimator.EvalSpec(input_fn))

# RUNS ON THE DRIVER
experiment.collective_all_reduce(train)
https://github.com/logicalclocks/hops-examples
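Collective all-reduce training relies on every worker ending each step with the same averaged gradients. The pure-Python sketch below shows only that result, under the assumption of equal-sized gradient lists; it does not model how RCCL or TensorFlow move data between GPUs, and `all_reduce_mean` is a hypothetical name.

```python
def all_reduce_mean(worker_grads):
    """Average gradients element-wise and give every worker a copy.

    worker_grads: one list of floats per worker, all of equal length.
    """
    num_workers = len(worker_grads)
    summed = [sum(values) for values in zip(*worker_grads)]
    mean = [total / num_workers for total in summed]
    return [list(mean) for _ in range(num_workers)]


# Two workers, each holding gradients for the same two parameters.
reduced = all_reduce_mean([[1.0, 2.0], [3.0, 4.0]])
print(reduced)
```

Because every worker applies the same averaged update, the model replicas stay in sync without a central parameter server.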
Horizontally Scalable ML Pipelines with Hopsworks
[Diagram labels: Raw Data, Event Data; pipeline stages Ingest, Data Prep, Feature Store, Experiment/Train, Deploy; Serving, Monitor; Airflow; HopsFS; Feature Store; logs; Metadata Store]
ML Pipelines of Jupyter Notebooks
[Diagram: End-to-End ML Pipeline, orchestrated by Airflow, built around the Feature Store]
Pipelines: Feature Backfill Pipeline; Training and Deployment Pipeline
Stages: Feature Engineering; Select Features, File Format; Experiment, Train Model; Validate & Deploy Model
Online Model Serving and Monitoring
Link Predictions with Outcomes to measure Model Performance
1. Access Control
2. Build Feature Vector
3. Make Prediction
4. Log Prediction
[Diagram: Request/Response over <<HTTPS>> to Hopsworks; Model Serving on Kubernetes (Model Server, Images); Feature Store; Data Lake; Monitor]
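The four numbered steps of a serving request can be sketched as a single handler. Everything here is a stand-in: the in-memory `FEATURE_STORE`, `API_KEYS`, and `PREDICTION_LOG`, the toy `model` function, and the response format are hypothetical and are not the Hopsworks serving API.

```python
# Hypothetical in-memory stand-ins for the real services.
FEATURE_STORE = {"customer_42": [0.25, 0.75]}
API_KEYS = {"secret-key"}
PREDICTION_LOG = []


def model(features):
    """Toy model: the mean of the feature vector."""
    return sum(features) / len(features)


def handle_request(api_key, entity_id):
    # 1. Access control
    if api_key not in API_KEYS:
        return {"status": 403}
    # 2. Build feature vector from the feature store
    features = FEATURE_STORE[entity_id]
    # 3. Make prediction
    prediction = model(features)
    # 4. Log prediction so it can later be joined with the outcome
    PREDICTION_LOG.append(
        {"entity_id": entity_id, "features": features, "prediction": prediction}
    )
    return {"status": 200, "prediction": prediction}


print(handle_request("secret-key", "customer_42"))
print(handle_request("wrong-key", "customer_42"))
```

Logging the features alongside each prediction is what makes it possible to link predictions with outcomes when measuring model performance.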
Summary
• Hopsworks now supports both Nvidia (CUDA) and AMD (ROCm) GPUs
  - Hopsworks 0.10+
• New AMD GPUs will challenge Nvidia’s hegemony in DL
  - Radeon VII (Vega)
  - Navi architecture GPUs coming in July (RX 5700)
    • 1.25x performance per clock and 1.5x performance per watt
    • GDDR6 memory and PCIe 4.0 support
https://databricks.com/session/rocm-and-distributed-deep-learning-on-spark-and-tensorflow
Try it Out!
1. Register for an account at: www.hops.site
@logicalclocks
www.logicalclocks.com
