SlideShare una empresa de Scribd logo
1 de 20
Descargar para leer sin conexión
Deep Learning : Convergence of HPC and
Hyperscale
Sumit Sanyal
CEO
Technology Revolution :
Deep Neural Nets
Ø  A new paradigm in programming
Ø  DNNs remarkably effective at tackling many problems
Ø  Designing new NN architectures as opposed to “programming”
Ø  Training as opposed to “compiling”
Ø  Trained weights as the “new” binaries
Ø  Big Neural Nets required to process Big Data
Ø  Videos, images, speech and text
Ø  DNNs are significantly increasing recognition accuracies which have stagnated
for decades
Ø  Used to structure Big Data
Make a
guess at NN
architecture
Run on
training
data, does
it perform
well?
No
Run on
test data,
does it
perform
well?
No
Yes
DONE!
Design Bigger
Network
Need more Training
Data
Yes
Neural network development
Iterate
Iterate
Train on
labeled
dataset
Wait 1-7+ days
3
Ø  Deep Neural Nets require a lot of compute cycles!
Ø  An example – image classification:
Ø  Ratio of (Compute cycles : IO bandwidth) significantly higher than non AI algorithms
Ø  Training AlexNet (for image classification) requires ~27,000 flops/input data byte
Ø  Training VGG ~150,000 flops/ data byte
Ø  R3 / R2 à Volume (compute) / Surface(IO BW)
Ø  Significantly higher for Deep Nets
Ø  Power dissipation challenges
Ø  Compute density limited by DC cooling capacity
Ø  At 1 ~uW / MHz (current state-of-art in 28 nm) requires 300 Watts!
Compute vs. IO
AI is no longer bored J
Neural Net Computations
Ø  All Deep Neural Net implementations have the following properties
Ø  Small set of non-linear transforms
Ø  Small set of linear algebra primitives
Ø  Relatively modest dynamic range of weight/data values
Ø  Very regular/repetitive data flows
Ø  Only persistent memory requirement is for weights
Ø  Updated while learning, fixed for recognition
Ø Variance in the size of the net across applications is >105
Compute cycles will be commoditized; not computers!
Why minds.ai?
minds.ai was founded on the basis that Deep Learning is the next
frontier for HPC – it’s what we do
“…around 2008 my group at Stanford started advocating
shifting deep learning to GPUs (this was really
controversial at that time; but now everyone does it); and
I'm now advocating shifting to HPC (High
Performance Computing/Supercomputing) tactics for
scaling up deep learning. Machine learning should
embrace HPC. These methods will make researchers
more efficient and help accelerate the progress of our
whole field.” – Andrew Ng, Feb 2016
HPC Design
Expertise
Neural Network
Design Expertise
Image and Video
Recognition
Expertise
GPU Programming
Expertise
+ + +
Ø  Instruction level – SIMD, VLIW etc.
Ø  Thread level – warps
Ø  Processor level – many cores
Ø  Server level – many GPUs in a server
Ø  Cluster level – many servers with high BW interconnect
Ø  Data Center level
Ø  Planet level J
7 levels of parallelism
Ø Exponential growth in Data Centers
Ø  Commoditization of Enterprise silicon
Ø  Traditional mobile players announcing ASICs for enterprise
compute
Ø Higher demands for compute density
Ø  GPGPUs have won the first round
Ø  Dennardian scaling is breaking down
Ø Power dissipation will emerge as major challenge
Ø  Chip level, server level and DC level
ASICs for DNNs
Age of dark silicon
Transistor property Dennardian Post Dennardian
# of transistors
( Q )
S2 S2
Peak clock frequency
( F )
S S
Capacitance
( C )
1/S 1/S
Supply Voltage
( Vdd )
1 / S2 1
Dynamic Power
(QFCVdd
2)
1 S2
Active Silicon 1 1/S2
* S is the ratio of feature size between next generation processes
Ø  Dark silicon will drive heterogeneity
Ø  Multi core architectures with different Instruction Sets
Ø  Power aware scheduling across cores
Ø  Decreasing parts of the chip can run at full clock frequency
Ø  Specialized silicon for server blades
Ø  Bridges to intra server and inter server communications
Ø  Last level caching support, caching across MPI
Ø  Distributed compute in network interfaces
Heterogeneity in Enterprise silicon
Ø  Instruction level – SIMD, VLIW etc.
Ø  Thread level – warps
Ø  Processor level – many cores
Ø  Server level – many GPUs in a server
Ø  Cluster level – many servers with high BW interconnect
Ø  Data Center level
Ø  Planet level J
7 levels of parallelism
Ø  Deep Learning is driving the convergence of High Performance
Compute (HPC) and Hyperscale (Data Centers)
Ø  Traditional HPC ecosystems : expensive and bleeding edge
Ø  DC infrastructure : commodity and homogeneous
Ø  Single or dual CPU servers common
Ø  All of this is changing
Ø  GPGPUs now common in DCs, initial resistance
Ø  InfiniBand penetration has reached a tipping point
Ø  Dense compute clusters require high bandwidth interconnects
Heterogeneity in Hyperscale
Ø  Intra server vs. Inter server bandwidths
Ø  Inter server bandwidths will grow faster than intra
server
Ø  Lead to larger, denser servers
Ø  8 or more GPGPUs per server for DNN training jobs
Ø  Will co-exist with CPU based servers for search and
database operations
Ø  Many kinds of servers, one size fits all does not work
Server Architectures
Training Server Reference Design
CPU
GPU
GPU
GPU
GPU
GPU
GPU
GPU
GPU
CPU
IB NIC
IB NIC
Ethernet
PCIe Root Hub
SSD
SSD
GPU Training Cluster Architecture
Server Node
IB Switch
Server Node
Server Node
Server Node
Server Node
Server Node
Server Node
Ethernet Switch
RAID & Access
•  Scalable up to 7 Server Nodes
•  8 GPUs per Node
•  ~2.5kw per Server Node
•  100Gbps IB Interconnect
Ø  Instruction level – SIMD, VLIW etc.
Ø  Thread level – warps
Ø  Processor level – many cores
Ø  Server level – many GPUs in a server
Ø  Cluster level – many servers with high BW interconnect
Ø  Data Center level
Ø  Planet level J
7 levels of parallelism
Ø  Computing super clusters
Ø  105 variability in size of compute ‘jobs’
Ø  Large number of collocated servers running the same job
Ø  High BW, low latency interconnect
Ø  Clusters could grow to significant fraction of a Data Center
Ø  Architecture of clusters will be hierarchical and heterogeneous
Ø  Edge servers for security and Data management
Ø  Dedicated RAID servers
Ø  Dedicated compute servers
Ø  Control and management nodes
Ø  Multi DC training clusters for big models are technically feasible
Ø  Scientific community has performed trans continental simulations
Data Center Architectures
Thank you.
•  Sumit Sanyal, Founder and CEO
sumit@minds.ai
•  Steve Kuo, Co-Founder
steve@minds.ai
Contacts
19
Accelerated Deep Learning

Más contenido relacionado

La actualidad más candente

High-Performance and Scalable Designs of Programming Models for Exascale Systems
High-Performance and Scalable Designs of Programming Models for Exascale SystemsHigh-Performance and Scalable Designs of Programming Models for Exascale Systems
High-Performance and Scalable Designs of Programming Models for Exascale Systemsinside-BigData.com
 
Accelerating apache spark with rdma
Accelerating apache spark with rdmaAccelerating apache spark with rdma
Accelerating apache spark with rdmainside-BigData.com
 
CUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computingCUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computinginside-BigData.com
 
HPC Best Practices: Application Performance Optimization
HPC Best Practices: Application Performance OptimizationHPC Best Practices: Application Performance Optimization
HPC Best Practices: Application Performance Optimizationinside-BigData.com
 
Programming Models for Exascale Systems
Programming Models for Exascale SystemsProgramming Models for Exascale Systems
Programming Models for Exascale Systemsinside-BigData.com
 
DDN: Protecting Your Data, Protecting Your Hardware
DDN: Protecting Your Data, Protecting Your HardwareDDN: Protecting Your Data, Protecting Your Hardware
DDN: Protecting Your Data, Protecting Your Hardwareinside-BigData.com
 
Energy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic TuningEnergy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic Tuninginside-BigData.com
 
OpenCAPI-based Image Analysis Pipeline for 18 GB/s kilohertz-framerate X-ray ...
OpenCAPI-based Image Analysis Pipeline for 18 GB/s kilohertz-framerate X-ray ...OpenCAPI-based Image Analysis Pipeline for 18 GB/s kilohertz-framerate X-ray ...
OpenCAPI-based Image Analysis Pipeline for 18 GB/s kilohertz-framerate X-ray ...Ganesan Narayanasamy
 
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...inside-BigData.com
 
Overview of the MVAPICH Project and Future Roadmap
Overview of the MVAPICH Project and Future RoadmapOverview of the MVAPICH Project and Future Roadmap
Overview of the MVAPICH Project and Future Roadmapinside-BigData.com
 
Challenges and Opportunities for HPC Interconnects and MPI
Challenges and Opportunities for HPC Interconnects and MPIChallenges and Opportunities for HPC Interconnects and MPI
Challenges and Opportunities for HPC Interconnects and MPIinside-BigData.com
 
High Performance Interconnects: Assessment & Rankings
High Performance Interconnects: Assessment & RankingsHigh Performance Interconnects: Assessment & Rankings
High Performance Interconnects: Assessment & Rankingsinside-BigData.com
 
High Performance Interconnects: Landscape, Assessments & Rankings
High Performance Interconnects: Landscape, Assessments & RankingsHigh Performance Interconnects: Landscape, Assessments & Rankings
High Performance Interconnects: Landscape, Assessments & Rankingsinside-BigData.com
 
Introducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi ClusterIntroducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi Clusterinside-BigData.com
 
Regarding Clouds, Mainframes, and Desktops … and Linux
Regarding Clouds, Mainframes, and Desktops … and LinuxRegarding Clouds, Mainframes, and Desktops … and Linux
Regarding Clouds, Mainframes, and Desktops … and LinuxRobert Sutor
 
Mellanox Announces HDR 200 Gb/s InfiniBand Solutions
Mellanox Announces HDR 200 Gb/s InfiniBand SolutionsMellanox Announces HDR 200 Gb/s InfiniBand Solutions
Mellanox Announces HDR 200 Gb/s InfiniBand Solutionsinside-BigData.com
 

La actualidad más candente (20)

High-Performance and Scalable Designs of Programming Models for Exascale Systems
High-Performance and Scalable Designs of Programming Models for Exascale SystemsHigh-Performance and Scalable Designs of Programming Models for Exascale Systems
High-Performance and Scalable Designs of Programming Models for Exascale Systems
 
Accelerating apache spark with rdma
Accelerating apache spark with rdmaAccelerating apache spark with rdma
Accelerating apache spark with rdma
 
Overview of HPC Interconnects
Overview of HPC InterconnectsOverview of HPC Interconnects
Overview of HPC Interconnects
 
CUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computingCUDA-Python and RAPIDS for blazing fast scientific computing
CUDA-Python and RAPIDS for blazing fast scientific computing
 
HPC Best Practices: Application Performance Optimization
HPC Best Practices: Application Performance OptimizationHPC Best Practices: Application Performance Optimization
HPC Best Practices: Application Performance Optimization
 
Programming Models for Exascale Systems
Programming Models for Exascale SystemsProgramming Models for Exascale Systems
Programming Models for Exascale Systems
 
State of ARM-based HPC
State of ARM-based HPCState of ARM-based HPC
State of ARM-based HPC
 
OpenPOWER Webinar
OpenPOWER Webinar OpenPOWER Webinar
OpenPOWER Webinar
 
DDN: Protecting Your Data, Protecting Your Hardware
DDN: Protecting Your Data, Protecting Your HardwareDDN: Protecting Your Data, Protecting Your Hardware
DDN: Protecting Your Data, Protecting Your Hardware
 
Energy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic TuningEnergy Efficient Computing using Dynamic Tuning
Energy Efficient Computing using Dynamic Tuning
 
OpenCAPI-based Image Analysis Pipeline for 18 GB/s kilohertz-framerate X-ray ...
OpenCAPI-based Image Analysis Pipeline for 18 GB/s kilohertz-framerate X-ray ...OpenCAPI-based Image Analysis Pipeline for 18 GB/s kilohertz-framerate X-ray ...
OpenCAPI-based Image Analysis Pipeline for 18 GB/s kilohertz-framerate X-ray ...
 
HPC Network Stack on ARM
HPC Network Stack on ARMHPC Network Stack on ARM
HPC Network Stack on ARM
 
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
How to Achieve High-Performance, Scalable and Distributed DNN Training on Mod...
 
Overview of the MVAPICH Project and Future Roadmap
Overview of the MVAPICH Project and Future RoadmapOverview of the MVAPICH Project and Future Roadmap
Overview of the MVAPICH Project and Future Roadmap
 
Challenges and Opportunities for HPC Interconnects and MPI
Challenges and Opportunities for HPC Interconnects and MPIChallenges and Opportunities for HPC Interconnects and MPI
Challenges and Opportunities for HPC Interconnects and MPI
 
High Performance Interconnects: Assessment & Rankings
High Performance Interconnects: Assessment & RankingsHigh Performance Interconnects: Assessment & Rankings
High Performance Interconnects: Assessment & Rankings
 
High Performance Interconnects: Landscape, Assessments & Rankings
High Performance Interconnects: Landscape, Assessments & RankingsHigh Performance Interconnects: Landscape, Assessments & Rankings
High Performance Interconnects: Landscape, Assessments & Rankings
 
Introducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi ClusterIntroducing HPC with a Raspberry Pi Cluster
Introducing HPC with a Raspberry Pi Cluster
 
Regarding Clouds, Mainframes, and Desktops … and Linux
Regarding Clouds, Mainframes, and Desktops … and LinuxRegarding Clouds, Mainframes, and Desktops … and Linux
Regarding Clouds, Mainframes, and Desktops … and Linux
 
Mellanox Announces HDR 200 Gb/s InfiniBand Solutions
Mellanox Announces HDR 200 Gb/s InfiniBand SolutionsMellanox Announces HDR 200 Gb/s InfiniBand Solutions
Mellanox Announces HDR 200 Gb/s InfiniBand Solutions
 

Similar a Deep Learning Convergence of HPC and Hyperscale

The Convergence of HPC and Deep Learning
The Convergence of HPC and Deep LearningThe Convergence of HPC and Deep Learning
The Convergence of HPC and Deep Learninginside-BigData.com
 
Ceph Community Talk on High-Performance Solid Sate Ceph
Ceph Community Talk on High-Performance Solid Sate Ceph Ceph Community Talk on High-Performance Solid Sate Ceph
Ceph Community Talk on High-Performance Solid Sate Ceph Ceph Community
 
Modern processor art
Modern processor artModern processor art
Modern processor artwaqasjadoon11
 
Modern processor art
Modern processor artModern processor art
Modern processor artwaqasjadoon11
 
Introduction to DPDK
Introduction to DPDKIntroduction to DPDK
Introduction to DPDKKernel TLV
 
Implementing data and databases on K8s within the Dutch government
Implementing data and databases on K8s within the Dutch governmentImplementing data and databases on K8s within the Dutch government
Implementing data and databases on K8s within the Dutch governmentDoKC
 
Ceph Day Shanghai - SSD/NVM Technology Boosting Ceph Performance
Ceph Day Shanghai - SSD/NVM Technology Boosting Ceph Performance Ceph Day Shanghai - SSD/NVM Technology Boosting Ceph Performance
Ceph Day Shanghai - SSD/NVM Technology Boosting Ceph Performance Ceph Community
 
Ceph Day London 2014 - Best Practices for Ceph-powered Implementations of Sto...
Ceph Day London 2014 - Best Practices for Ceph-powered Implementations of Sto...Ceph Day London 2014 - Best Practices for Ceph-powered Implementations of Sto...
Ceph Day London 2014 - Best Practices for Ceph-powered Implementations of Sto...Ceph Community
 
Cuda meetup presentation 5
Cuda meetup presentation 5Cuda meetup presentation 5
Cuda meetup presentation 5Rihards Gailums
 
Accelerating TensorFlow with RDMA for high-performance deep learning
Accelerating TensorFlow with RDMA for high-performance deep learningAccelerating TensorFlow with RDMA for high-performance deep learning
Accelerating TensorFlow with RDMA for high-performance deep learningDataWorks Summit
 
Distributed DNN training: Infrastructure, challenges, and lessons learned
Distributed DNN training: Infrastructure, challenges, and lessons learnedDistributed DNN training: Infrastructure, challenges, and lessons learned
Distributed DNN training: Infrastructure, challenges, and lessons learnedWee Hyong Tok
 
Parallelism Processor Design
Parallelism Processor DesignParallelism Processor Design
Parallelism Processor DesignSri Prasanna
 
G rpc talk with intel (3)
G rpc talk with intel (3)G rpc talk with intel (3)
G rpc talk with intel (3)Intel
 
Building a High Performance Analytics Platform
Building a High Performance Analytics PlatformBuilding a High Performance Analytics Platform
Building a High Performance Analytics PlatformSantanu Dey
 

Similar a Deep Learning Convergence of HPC and Hyperscale (20)

The Convergence of HPC and Deep Learning
The Convergence of HPC and Deep LearningThe Convergence of HPC and Deep Learning
The Convergence of HPC and Deep Learning
 
Introduction to GPU Programming
Introduction to GPU ProgrammingIntroduction to GPU Programming
Introduction to GPU Programming
 
Ceph Community Talk on High-Performance Solid Sate Ceph
Ceph Community Talk on High-Performance Solid Sate Ceph Ceph Community Talk on High-Performance Solid Sate Ceph
Ceph Community Talk on High-Performance Solid Sate Ceph
 
Modern processor art
Modern processor artModern processor art
Modern processor art
 
Danish presentation
Danish presentationDanish presentation
Danish presentation
 
Modern processor art
Modern processor artModern processor art
Modern processor art
 
processor struct
processor structprocessor struct
processor struct
 
Manycores for the Masses
Manycores for the MassesManycores for the Masses
Manycores for the Masses
 
Introduction to DPDK
Introduction to DPDKIntroduction to DPDK
Introduction to DPDK
 
Implementing data and databases on K8s within the Dutch government
Implementing data and databases on K8s within the Dutch governmentImplementing data and databases on K8s within the Dutch government
Implementing data and databases on K8s within the Dutch government
 
Ceph Day Shanghai - SSD/NVM Technology Boosting Ceph Performance
Ceph Day Shanghai - SSD/NVM Technology Boosting Ceph Performance Ceph Day Shanghai - SSD/NVM Technology Boosting Ceph Performance
Ceph Day Shanghai - SSD/NVM Technology Boosting Ceph Performance
 
Ceph Day London 2014 - Best Practices for Ceph-powered Implementations of Sto...
Ceph Day London 2014 - Best Practices for Ceph-powered Implementations of Sto...Ceph Day London 2014 - Best Practices for Ceph-powered Implementations of Sto...
Ceph Day London 2014 - Best Practices for Ceph-powered Implementations of Sto...
 
Cuda meetup presentation 5
Cuda meetup presentation 5Cuda meetup presentation 5
Cuda meetup presentation 5
 
Deep Learning at Scale
Deep Learning at ScaleDeep Learning at Scale
Deep Learning at Scale
 
Cloud Networking Trends
Cloud Networking TrendsCloud Networking Trends
Cloud Networking Trends
 
Accelerating TensorFlow with RDMA for high-performance deep learning
Accelerating TensorFlow with RDMA for high-performance deep learningAccelerating TensorFlow with RDMA for high-performance deep learning
Accelerating TensorFlow with RDMA for high-performance deep learning
 
Distributed DNN training: Infrastructure, challenges, and lessons learned
Distributed DNN training: Infrastructure, challenges, and lessons learnedDistributed DNN training: Infrastructure, challenges, and lessons learned
Distributed DNN training: Infrastructure, challenges, and lessons learned
 
Parallelism Processor Design
Parallelism Processor DesignParallelism Processor Design
Parallelism Processor Design
 
G rpc talk with intel (3)
G rpc talk with intel (3)G rpc talk with intel (3)
G rpc talk with intel (3)
 
Building a High Performance Analytics Platform
Building a High Performance Analytics PlatformBuilding a High Performance Analytics Platform
Building a High Performance Analytics Platform
 

Más de inside-BigData.com

Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...inside-BigData.com
 
Transforming Private 5G Networks
Transforming Private 5G NetworksTransforming Private 5G Networks
Transforming Private 5G Networksinside-BigData.com
 
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...inside-BigData.com
 
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...inside-BigData.com
 
HPC Impact: EDA Telemetry Neural Networks
HPC Impact: EDA Telemetry Neural NetworksHPC Impact: EDA Telemetry Neural Networks
HPC Impact: EDA Telemetry Neural Networksinside-BigData.com
 
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean MonitoringBiohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoringinside-BigData.com
 
Machine Learning for Weather Forecasts
Machine Learning for Weather ForecastsMachine Learning for Weather Forecasts
Machine Learning for Weather Forecastsinside-BigData.com
 
HPC AI Advisory Council Update
HPC AI Advisory Council UpdateHPC AI Advisory Council Update
HPC AI Advisory Council Updateinside-BigData.com
 
Fugaku Supercomputer joins fight against COVID-19
Fugaku Supercomputer joins fight against COVID-19Fugaku Supercomputer joins fight against COVID-19
Fugaku Supercomputer joins fight against COVID-19inside-BigData.com
 
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPODHPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPODinside-BigData.com
 
Versal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud AccelerationVersal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud Accelerationinside-BigData.com
 
Zettar: Moving Massive Amounts of Data across Any Distance Efficiently
Zettar: Moving Massive Amounts of Data across Any Distance EfficientlyZettar: Moving Massive Amounts of Data across Any Distance Efficiently
Zettar: Moving Massive Amounts of Data across Any Distance Efficientlyinside-BigData.com
 
Scaling TCO in a Post Moore's Era
Scaling TCO in a Post Moore's EraScaling TCO in a Post Moore's Era
Scaling TCO in a Post Moore's Erainside-BigData.com
 
Efficient Model Selection for Deep Neural Networks on Massively Parallel Proc...
Efficient Model Selection for Deep Neural Networks on Massively Parallel Proc...Efficient Model Selection for Deep Neural Networks on Massively Parallel Proc...
Efficient Model Selection for Deep Neural Networks on Massively Parallel Proc...inside-BigData.com
 
Adaptive Linear Solvers and Eigensolvers
Adaptive Linear Solvers and EigensolversAdaptive Linear Solvers and Eigensolvers
Adaptive Linear Solvers and Eigensolversinside-BigData.com
 
Scientific Applications and Heterogeneous Architectures
Scientific Applications and Heterogeneous ArchitecturesScientific Applications and Heterogeneous Architectures
Scientific Applications and Heterogeneous Architecturesinside-BigData.com
 
SW/HW co-design for near-term quantum computing
SW/HW co-design for near-term quantum computingSW/HW co-design for near-term quantum computing
SW/HW co-design for near-term quantum computinginside-BigData.com
 

Más de inside-BigData.com (20)

Major Market Shifts in IT
Major Market Shifts in ITMajor Market Shifts in IT
Major Market Shifts in IT
 
Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...Preparing to program Aurora at Exascale - Early experiences and future direct...
Preparing to program Aurora at Exascale - Early experiences and future direct...
 
Transforming Private 5G Networks
Transforming Private 5G NetworksTransforming Private 5G Networks
Transforming Private 5G Networks
 
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
The Incorporation of Machine Learning into Scientific Simulations at Lawrence...
 
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
Evolving Cyberinfrastructure, Democratizing Data, and Scaling AI to Catalyze ...
 
HPC Impact: EDA Telemetry Neural Networks
HPC Impact: EDA Telemetry Neural NetworksHPC Impact: EDA Telemetry Neural Networks
HPC Impact: EDA Telemetry Neural Networks
 
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean MonitoringBiohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
Biohybrid Robotic Jellyfish for Future Applications in Ocean Monitoring
 
Machine Learning for Weather Forecasts
Machine Learning for Weather ForecastsMachine Learning for Weather Forecasts
Machine Learning for Weather Forecasts
 
HPC AI Advisory Council Update
HPC AI Advisory Council UpdateHPC AI Advisory Council Update
HPC AI Advisory Council Update
 
Fugaku Supercomputer joins fight against COVID-19
Fugaku Supercomputer joins fight against COVID-19Fugaku Supercomputer joins fight against COVID-19
Fugaku Supercomputer joins fight against COVID-19
 
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPODHPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
HPC at Scale Enabled by DDN A3i and NVIDIA SuperPOD
 
Versal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud AccelerationVersal Premium ACAP for Network and Cloud Acceleration
Versal Premium ACAP for Network and Cloud Acceleration
 
Zettar: Moving Massive Amounts of Data across Any Distance Efficiently
Zettar: Moving Massive Amounts of Data across Any Distance EfficientlyZettar: Moving Massive Amounts of Data across Any Distance Efficiently
Zettar: Moving Massive Amounts of Data across Any Distance Efficiently
 
Scaling TCO in a Post Moore's Era
Scaling TCO in a Post Moore's EraScaling TCO in a Post Moore's Era
Scaling TCO in a Post Moore's Era
 
Efficient Model Selection for Deep Neural Networks on Massively Parallel Proc...
Efficient Model Selection for Deep Neural Networks on Massively Parallel Proc...Efficient Model Selection for Deep Neural Networks on Massively Parallel Proc...
Efficient Model Selection for Deep Neural Networks on Massively Parallel Proc...
 
Data Parallel Deep Learning
Data Parallel Deep LearningData Parallel Deep Learning
Data Parallel Deep Learning
 
Making Supernovae with Jets
Making Supernovae with JetsMaking Supernovae with Jets
Making Supernovae with Jets
 
Adaptive Linear Solvers and Eigensolvers
Adaptive Linear Solvers and EigensolversAdaptive Linear Solvers and Eigensolvers
Adaptive Linear Solvers and Eigensolvers
 
Scientific Applications and Heterogeneous Architectures
Scientific Applications and Heterogeneous ArchitecturesScientific Applications and Heterogeneous Architectures
Scientific Applications and Heterogeneous Architectures
 
SW/HW co-design for near-term quantum computing
SW/HW co-design for near-term quantum computingSW/HW co-design for near-term quantum computing
SW/HW co-design for near-term quantum computing
 

Último

Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Mark Simos
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsMark Billinghurst
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationRidwan Fadjar
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piececharlottematthew16
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024Stephanie Beckett
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfAlex Barbosa Coqueiro
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024The Digital Insurer
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Wonjun Hwang
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Mattias Andersson
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsRizwan Syed
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024Lorenzo Miniero
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostZilliz
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Commit University
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr LapshynFwdays
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 3652toLead Limited
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024BookNet Canada
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfAddepto
 

Último (20)

Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
Tampa BSides - Chef's Tour of Microsoft Security Adoption Framework (SAF)
 
Human Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR SystemsHuman Factors of XR: Using Human Factors to Design XR Systems
Human Factors of XR: Using Human Factors to Design XR Systems
 
My Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 PresentationMy Hashitalk Indonesia April 2024 Presentation
My Hashitalk Indonesia April 2024 Presentation
 
Story boards and shot lists for my a level piece
Story boards and shot lists for my a level pieceStory boards and shot lists for my a level piece
Story boards and shot lists for my a level piece
 
What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024What's New in Teams Calling, Meetings and Devices March 2024
What's New in Teams Calling, Meetings and Devices March 2024
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Unraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdfUnraveling Multimodality with Large Language Models.pdf
Unraveling Multimodality with Large Language Models.pdf
 
My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024My INSURER PTE LTD - Insurtech Innovation Award 2024
My INSURER PTE LTD - Insurtech Innovation Award 2024
 
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
Bun (KitWorks Team Study 노별마루 발표 2024.4.22)
 
Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?Are Multi-Cloud and Serverless Good or Bad?
Are Multi-Cloud and Serverless Good or Bad?
 
Scanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL CertsScanning the Internet for External Cloud Exposures via SSL Certs
Scanning the Internet for External Cloud Exposures via SSL Certs
 
SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024SIP trunking in Janus @ Kamailio World 2024
SIP trunking in Janus @ Kamailio World 2024
 
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage CostLeverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
Leverage Zilliz Serverless - Up to 50X Saving for Your Vector Storage Cost
 
Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!Nell’iperspazio con Rocket: il Framework Web di Rust!
Nell’iperspazio con Rocket: il Framework Web di Rust!
 
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
"Federated learning: out of reach no matter how close",Oleksandr Lapshyn
 
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptxE-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
E-Vehicle_Hacking_by_Parul Sharma_null_owasp.pptx
 
Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365Ensuring Technical Readiness For Copilot in Microsoft 365
Ensuring Technical Readiness For Copilot in Microsoft 365
 
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
Transcript: New from BookNet Canada for 2024: BNC CataList - Tech Forum 2024
 
DMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special EditionDMCC Future of Trade Web3 - Special Edition
DMCC Future of Trade Web3 - Special Edition
 
Gen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdfGen AI in Business - Global Trends Report 2024.pdf
Gen AI in Business - Global Trends Report 2024.pdf
 

Deep Learning Convergence of HPC and Hyperscale

  • 1. Deep Learning : Convergence of HPC and Hyperscale Sumit Sanyal CEO
  • 2. Technology Revolution : Deep Neural Nets Ø  A new paradigm in programming Ø  DNNs remarkably effective at tackling many problems Ø  Designing new NN architectures as opposed to “programming” Ø  Training as opposed to “compiling” Ø  Trained weights as the “new” binaries Ø  Big Neural Nets required to process Big Data Ø  Videos, images, speech and text Ø  DNNs are significantly increasing recognition accuracies which have stagnated for decades Ø  Used to structure Big Data
  • 3. Make a guess at NN architecture Run on training data, does it perform well? No Run on test data, does it perform well? No Yes DONE! Design Bigger Network Need more Training Data Yes Neural network development Iterate Iterate Train on labeled dataset Wait 1-7+ days 3
  • 4. Ø  Deep Neural Nets require a lot of compute cycles! Ø  An example – image classification: Ø  Ratio of (Compute cycles : IO bandwidth) significantly higher than non AI algorithms Ø  Training AlexNet (for image classification) requires ~27,000 flops/input data byte Ø  Training VGG ~150,000 flops/ data byte Ø  R3 / R2 à Volume (compute) / Surface(IO BW) Ø  Significantly higher for Deep Nets Ø  Power dissipation challenges Ø  Compute density limited by DC cooling capacity Ø  At 1 ~uW / MHz (current state-of-art in 28 nm) requires 300 Watts! Compute vs. IO AI is no longer bored J
  • 5. Neural Net Computations Ø  All Deep Neural Net implementations have the following properties Ø  Small set of non-linear transforms Ø  Small set of linear algebra primitives Ø  Relatively modest dynamic range of weight/data values Ø  Very regular/repetitive data flows Ø  Only persistent memory requirement is for weights Ø  Updated while learning, fixed for recognition Ø Variance in the size of the net across applications is >105 Compute cycles will be commoditized; not computers!
  • 6. Why minds.ai? minds.ai was founded on the basis that Deep Learning is the next frontier for HPC – it’s what we do “…around 2008 my group at Stanford started advocating shifting deep learning to GPUs (this was really controversial at that time; but now everyone does it); and I'm now advocating shifting to HPC (High Performance Computing/Supercomputing) tactics for scaling up deep learning. Machine learning should embrace HPC. These methods will make researchers more efficient and help accelerate the progress of our whole field.” – Andrew Ng, Feb 2016 HPC Design Expertise Neural Network Design Expertise Image and Video Recognition Expertise GPU Programming Expertise + + +
  • 7. Ø  Instruction level – SIMD, VLIW etc. Ø  Thread level – warps Ø  Processor level – many cores Ø  Server level – many GPUs in a server Ø  Cluster level – many servers with high BW interconnect Ø  Data Center level Ø  Planet level J 7 levels of parallelism
  • 8. Ø Exponential growth in Data Centers Ø  Commoditization of Enterprise silicon Ø  Traditional mobile players announcing ASICs for enterprise compute Ø Higher demands for compute density Ø  GPGPUs have won the first round Ø  Dennardian scaling is breaking down Ø Power dissipation will emerge as major challenge Ø  Chip level, server level and DC level ASICs for DNNs
  • 9. Age of dark silicon Transistor property Dennardian Post Dennardian # of transistors ( Q ) S2 S2 Peak clock frequency ( F ) S S Capacitance ( C ) 1/S 1/S Supply Voltage ( Vdd ) 1 / S2 1 Dynamic Power (QFCVdd 2) 1 S2 Active Silicon 1 1/S2 * S is the ratio of feature size between next generation processes
  • 10. Ø  Dark silicon will drive heterogeneity Ø  Multi core architectures with different Instruction Sets Ø  Power aware scheduling across cores Ø  Decreasing parts of the chip can run at full clock frequency Ø  Specialized silicon for server blades Ø  Bridges to intra server and inter server communications Ø  Last level caching support, caching across MPI Ø  Distributed compute in network interfaces Heterogeneity in Enterprise silicon
  • 11. Ø  Instruction level – SIMD, VLIW etc. Ø  Thread level – warps Ø  Processor level – many cores Ø  Server level – many GPUs in a server Ø  Cluster level – many servers with high BW interconnect Ø  Data Center level Ø  Planet level J 7 levels of parallelism
  • 12. Ø  Deep Learning is driving the convergence of High Performance Compute (HPC) and Hyperscale (Data Centers) Ø  Traditional HPC ecosystems : expensive and bleeding edge Ø  DC infrastructure : commodity and homogeneous Ø  Single or dual CPU servers common Ø  All of this is changing Ø  GPGPUs now common in DCs, initial resistance Ø  InfiniBand penetration has reached a tipping point Ø  Dense compute clusters require high bandwidth interconnects Heterogeneity in Hyperscale
  • 13. Ø  Intra server vs. Inter server bandwidths Ø  Inter server bandwidths will grow faster than intra server Ø  Lead to larger, denser servers Ø  8 or more GPGPUs per server for DNN training jobs Ø  Will co-exist with CPU based servers for search and database operations Ø  Many kinds of servers, one size fits all does not work Server Architectures
  • 14. Training Server Reference Design CPU GPU GPU GPU GPU GPU GPU GPU GPU CPU IB NIC IB NIC Ethernet PCIe Root Hub SSD SSD
  • 15. GPU Training Cluster Architecture Server Node IB Switch Server Node Server Node Server Node Server Node Server Node Server Node Ethernet Switch RAID & Access •  Scalable up to 7 Server Nodes •  8 GPUs per Node •  ~2.5kw per Server Node •  100Gbps IB Interconnect
  • 16. Ø  Instruction level – SIMD, VLIW etc. Ø  Thread level – warps Ø  Processor level – many cores Ø  Server level – many GPUs in a server Ø  Cluster level – many servers with high BW interconnect Ø  Data Center level Ø  Planet level J 7 levels of parallelism
  • 17. Ø  Computing super clusters Ø  105 variability in size of compute ‘jobs’ Ø  Large number of collocated servers running the same job Ø  High BW, low latency interconnect Ø  Clusters could grow to significant fraction of a Data Center Ø  Architecture of clusters will be hierarchical and heterogeneous Ø  Edge servers for security and Data management Ø  Dedicated RAID servers Ø  Dedicated compute servers Ø  Control and management nodes Ø  Multi DC training clusters for big models are technically feasible Ø  Scientific community has performed trans continental simulations Data Center Architectures
  • 19. •  Sumit Sanyal, Founder and CEO sumit@minds.ai •  Steve Kuo, Co-Founder steve@minds.ai Contacts 19