Renee Yao | NVIDIA Senior Product Marketing Manager |
GTC Israel 2018
SIMPLIFYING AI
INFRASTRUCTURE: LESSONS
IN SCALING ON DGX SYSTEMS
2
NVIDIA
Pioneered GPU Computing | Founded 1993 | $9.7B | 11,000 Employees
© 2018 NetApp, Inc. All rights reserved. NetApp Confidential – Limited Use Only
3
DATA
Historically we never had large datasets
The MNIST (1999) database
contains 60,000 training images
and 10,000 testing images.
COMPUTE
We never had enough compute
[Figure: performance vs. year, 1980 to 2020, log scale from 10^2 to 10^7]
Single-threaded perf: 1.5X per year, then 1.1X per year
GPU-Computing perf: 1.5X per year, 1000x by 2025
Original data up to the year 2010 collected and plotted by M. Horowitz, F. Labonte, O.
Shacham, K. Olukotun, L. Hammond, and C. Batten. New plot and data collected for
2010-2015 by K. Rupp
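The growth rates on this chart compound, which is where the 1000x figure comes from. The short sketch below only works through that arithmetic. The function names are illustrative and not part of the original deck.

```python
import math


def years_to_reach(multiple: float, yearly_growth: float) -> float:
    """Years of compounding at `yearly_growth` needed to reach `multiple`."""
    return math.log(multiple) / math.log(yearly_growth)


def growth_after(years: float, yearly_growth: float) -> float:
    """Total gain after compounding `yearly_growth` for `years` years."""
    return yearly_growth ** years


# 1.5X per year needs roughly 17 years to compound to 1000x.
gpu_years = years_to_reach(1000, 1.5)
# Over the same period, 1.1X per year compounds to only about 5x.
cpu_gain = growth_after(gpu_years, 1.1)

print(f"1.5X/year reaches 1000x in {gpu_years:.1f} years")
print(f"1.1X/year over the same period gives {cpu_gain:.1f}x")
```

The gap between roughly 1000x and roughly 5x over the same span is the point the slide makes about single-threaded versus GPU-computing performance.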
END-TO-END PRODUCT FAMILY
DESKTOP: TITAN
WORKSTATION: DGX Station
DATA CENTER: Tesla V100, Tesla P4/T4
AUTOMOTIVE: Drive AGX Pegasus
EMBEDDED: Jetson AGX Xavier
VIRTUAL WS: Virtual GPU
SERVER PLATFORM: HGX1 / HGX2
FULLY INTEGRATED AI SYSTEMS: DGX-1, DGX-2
HPC / TRAINING, INFERENCE
5
DESIGNING AN AI PLATFORM
6
AI PLATFORM CONSIDERATIONS
Factors impacting deep learning deployment decisions
COST
“I have limited budget, need lowest up-front cost possible”
PERFORMANCE
“I want the most GPU bang for the buck”
PRODUCTIVITY
“Must get started now, line of business wants to deliver results yesterday”
7
PLATFORM IMPACT ON AI TCO
1. Designing and Building an AI Compute Platform – from Scratch
Timeline: Day 1 to Month 3
Stages shown: Study & exploration, Platform Design, Productive Experimentation, HW & SW Integration, Troubleshooting, Software eng’g, Software optimization, Design and Build for Scale, Software re-optimization, Insights, Training at Scale
Cost axes: CAPEX, OPEX
“DIY” TCO: Time and budget spent on things other than data science
8
TAKING A FULL-STACK MINDSET TO
AI SYSTEM DEPLOYMENT
Support Model
Accelerate Time-to-Resolution for
AI/DL Issues
AI/DL Software Stack
Maximize GPU Performance
Operating System Image
Maintain stability while keeping pace
with the latest
Hardware Architecture
Look beyond the spec sheet
9
INTRODUCING THE DGX PORTFOLIO
OF AI SUPERCOMPUTERS
10
PURPOSE-BUILT, NOT RE-PURPOSED
NVIDIA DGX STATION
AI WORKSTATION
NVIDIA DGX-1, DGX-2
AI DATA CENTER
• Universal SW for Deep Learning
• Predictable execution across
platforms
• Pervasive reach
NVIDIA AI SOFTWARE STACK
The Essential
Instrument for AI
Research
DGX-1
The Personal
AI Supercomputer
DGX Station
The World’s Most Powerful
AI System for the Most
Complex AI Challenges
DGX-2
11
SOFTWARE PERFORMANCE PROOF-POINTS
FROM THE FIELD
Global Technology Firm Specializing in Digital Media (ResNet50 Training)
Home-grown “optimized” TensorFlow stack: 1680 images/sec
DGX TensorFlow stack: 2600 images/sec (1.5X FASTER)
World-Leading Medical Research Center (ResNet50 Training)
Home-grown TensorFlow stack: 1238 images/sec
DGX TensorFlow stack: 2600 images/sec (2.1X FASTER)
12
NVIDIA DGX SYSTEMS MOMENTUM BUILDING
Barriers Toppled, the Unsolvable Solved – a Sampling of DGX Systems Impact
April 2016 to August 2018:
OpenAI, NYU, UC Berkeley, CSIRO, MIT, CMU, Fidelity, RIKEN, CloudSight,
Mass. General, DFKI, IDSIA, Ford, SAP, NVIDIA SATURNV launch, Hologic,
Avitas Systems (A GE Venture), SK Telecom, PayPal, Chinese Academy of Sciences,
FAIR, Microsoft, University of Michigan, Comcast, Nimbix, Noodle.ai,
Oak Ridge National Laboratory, BHGE, Zenuity, Swiss Federal Railway
13
DGX STATION OVERVIEW
14
NVIDIA DGX STATION
Groundbreaking AI – at your desk
The Fastest Personal Supercomputer
for Researchers and Data Scientists
Revolutionary form factor -
designed for the desk, whisper-quiet
Start experimenting in hours,
not weeks, powered by DGX Stack
Productivity that goes from desk
to data center to cloud
Breakthrough performance and
precision – powered by Volta
15
The Personal AI Supercomputer
for Researchers and Data Scientists
Key Features
1. 4 x NVIDIA Tesla V100 GPU (NOW 32 GB)
2. 2nd-gen NVLink (4-way)
3. Water-cooled design
4. 3 x DisplayPort (4K resolution)
5. Intel Xeon E5-2698 20-core
6. 256GB DDR4 RAM
16
DGX STATION: 72X FASTER THAN CPU
Dual Socket CPU Server: 711 hours (1X)
4-way GPU Workstation: 36.4 hours (20X)
DGX Station: 9.9 hours (72X)
Workload: ResNet50, 90 epochs to solution | CPU Server: Dual Xeon E5-2699 v4, 2.6GHz
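The speedup multiples on this slide follow directly from the time-to-solution figures. As a quick check, the sketch below divides the CPU baseline by each system's training time. The helper name is illustrative, and the hour values are the ones quoted on the slide.

```python
def speedup(baseline_hours: float, system_hours: float) -> float:
    """Speedup of a system relative to a baseline, from time to solution."""
    return baseline_hours / system_hours


# Time-to-solution figures quoted on the slide (ResNet50, 90 epochs).
CPU_SERVER_HOURS = 711.0
system_hours = {
    "4-way GPU Workstation": 36.4,
    "DGX Station": 9.9,
}

speedups = {name: speedup(CPU_SERVER_HOURS, h) for name, h in system_hours.items()}
for name, factor in speedups.items():
    print(f"{name}: {factor:.1f}x faster than the dual-socket CPU server")
```

Rounded to whole numbers, the results match the 20X and 72X labels on the chart.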
17
JUMP START YOUR AI JOURNEY
Training AI on DGX Station
4x Speedup
8 Months Payback
Healthcare
10x Speedup
Self-Driving Cars
3x Speedup
2 Months Payback
Retail
8x Speedup
Smart City
18
Most mutations found when sequencing tumours are unknown
and often ignored. It takes a powerful machine to uncover
the significance of each mutation and its interaction with
drugs, learning from large, multi-scale image data of
cells expressing the mutations.
NovellusDx leverages DGX Station and a containerized deep
learning framework to obtain very accurate readings of
intracellular signalling pathway activity that remain stable
over time and under other biological perturbations, with 10x
better accuracy, $70k annual savings, and 4x faster training.
In a clinical trial, progression-free survival (PFS), the
number of months the disease does not progress, increased 3x.
AI-Powered Tumor
Mutation Induced
Signaling Activity
Monitoring
19
For a self-driving car to reach the same level of accuracy as
a human, it will need to have travelled 11 billion miles of
test-drives, taking many years to complete.
Cognata is shaving years off this training process by enabling
virtual cars to experience the world of driving in a strikingly
realistic simulated environment.
With the NVIDIA DGX Station, Cognata:
• Accelerates training of DNN-based generative models by a
factor of 10
• Simultaneously runs dozens of AV simulations to
accumulate millions of virtual miles and to identify and
improve edge cases
The AI-powered solution enabled Cognata to increase
performance, save money, improve productivity and make
the world a safer place.
NVIDIA DGX
ACCELERATES
AUTONOMOUS VEHICLE
READINESS
20
The in-store retail experience is not as easy or streamlined
as the online retail experience.
Tracxpoint created a shopping cart, called AIC, that is fully
integrated with hardware and AI-powered, GPU-accelerated
software.
Trained on DGX Station and inferenced using TensorRT and
DeepStream 2.0 on Jetson TX2, AIC can recognize 100,000
individual products in under a second, with high accuracy,
a 3x performance gain, and a 2-month ROI compared to cloud
solutions.
Customers now simply place products in their cart (no need
to search for barcodes), communicate in real-time with
suppliers to get personalized offers while shopping, navigate
inside the supermarket, and then pay digitally on cart.
AI-Powered Shopping
Cart for Seamless
Online Retail
Experience in Store
Usability
Speed
Trust
21
Providing the sensor with the ability to analyze the data it
picks up has been top of mind for governments, police,
security agencies, banking, smart cities, retail, and
the transportation industry. Collecting, analyzing, and storing
data can be difficult, costly, and error-prone.
AnyVision, the world’s leading designer and developer of
recognition platforms, offers a wide range of capabilities,
including face recognition, human body recognition and
object identification.
Powered by a cutting-edge, deep neural network on NVIDIA
DGX Systems, Tesla, and Jetson, AnyVision is the first to
provide 1:1 and 1:N face recognition and can detect 115
million individuals in 0.2 seconds per database, with an 8x
performance increase in training.
Revolutionize Security
in Smart City with AI-
Powered Facial
Recognition Platform
22
DGX-1 OVERVIEW
23
NVIDIA DGX-1: THE ESSENTIAL TOOL OF AI
Highest Performance, Fully Integrated System
1 PFLOPS | 8x Tesla V100 32GB | 300 GB/s NVLink Hybrid Cube Mesh
2x Xeon | 8 TB RAID 0 | Quad 100Gbps, Dual 10GbE | 3U — 3500W
8 TB SSD | 8 x Tesla V100 16GB/32GB
24
DGX-1: 140X FASTER THAN CPU
Dual Socket CPU: 711 hours (1X)
8-way GPU Server: 15.5 hours (46X)
DGX-1: 5.1 hours (140X)
Workload: ResNet50, 90 epochs to solution | CPU Server: Dual Xeon E5-2699 v4, 2.6GHz
26
WORLD-CLASS
RAILWAY LOGISTICS
10,671 trains
1.26 million riders
3,232 km of track
300 tunnels
6,000 bridges
30,000 switches
1 train, 11 switches: 30 possible ways
2 trains: 900 ways
80 trains: 10^80 possibilities
> # of observed atoms in the universe
One full day of experiments | 17 seconds
One day of whole train traffic in Switzerland | 0.3 seconds
86,000 steps in 0.3 seconds
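The progression from 30 ways for one train to 900 ways for two suggests that the joint routing count is the per-train count raised to the number of trains. The sketch below works that out under the assumption, not stated on the slide, that every train independently has the same 30 options. The function name is illustrative.

```python
import math

# From the slide: one train passing 11 switches has 30 possible ways.
ROUTES_PER_TRAIN = 30


def joint_routings(trains: int, routes_per_train: int = ROUTES_PER_TRAIN) -> int:
    """Number of joint routings if each train chooses independently (assumption)."""
    return routes_per_train ** trains


two_trains = joint_routings(2)
eighty_trains = joint_routings(80)
# Order of magnitude of the 80-train count (Python ints are arbitrary precision).
exponent = math.log10(eighty_trains)

print(f"2 trains: {two_trains} ways")
print(f"80 trains: about 10^{exponent:.0f} ways")
```

Under this independence assumption the 80-train count is far above the commonly cited estimate of roughly 10^80 atoms in the observable universe, so the slide's comparison holds with room to spare. Exhaustive search is therefore not an option, and the simulation speed quoted on the slide is what makes experimentation practical.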
27
DGX-2 OVERVIEW
28
NVIDIA DGX-2
LIMITLESS DEEP LEARNING FOR EXPLORATION
WITHOUT BOUNDARIES
The World’s Most Powerful Deep Learning System
for the Most Complex Deep Learning Challenges
• Performance to Train the Previously Impossible
• Revolutionary AI Network Fabric
• Fastest Path to AI Scale
• Powered by NVIDIA GPU Cloud
For More Information: nvidia.com/dgx-2
29
DESIGNED TO TRAIN THE PREVIOUSLY IMPOSSIBLE
1. NVIDIA Tesla V100 32GB
2. Two GPU Boards: 8 V100 32GB GPUs per board, 6 NVSwitches per board, 512GB Total HBM2 Memory, interconnected by Plane Card
3. Twelve NVSwitches: 2.4 TB/sec bi-section bandwidth
4. Eight EDR InfiniBand/100 GigE: 1600 Gb/sec Total Bi-directional Bandwidth
5. PCIe Switch Complex
6. Two Intel Xeon Platinum CPUs
7. 1.5 TB System Memory
8. 30 TB NVMe SSDs Internal Storage
9. Dual 10/25 Gb/sec Ethernet
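Several of the DGX-2 totals on this slide are products of the per-board and per-port figures. The sketch below recomputes them as a consistency check. The `Dgx2Spec` class is an illustrative container invented for this example, not an NVIDIA API, and the bi-directional figure assumes each 100 Gb/sec port counts once per direction.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dgx2Spec:
    """Per-component figures as quoted on the slide (illustrative container)."""
    gpu_boards: int = 2
    gpus_per_board: int = 8
    hbm2_gb_per_gpu: int = 32
    nvswitches_per_board: int = 6
    network_ports: int = 8
    gbps_per_port: int = 100

    @property
    def total_gpus(self) -> int:
        return self.gpu_boards * self.gpus_per_board

    @property
    def total_hbm2_gb(self) -> int:
        return self.total_gpus * self.hbm2_gb_per_gpu

    @property
    def total_nvswitches(self) -> int:
        return self.gpu_boards * self.nvswitches_per_board

    @property
    def bidirectional_network_gbps(self) -> int:
        # Assumption: each port is counted once for each direction.
        return self.network_ports * self.gbps_per_port * 2


spec = Dgx2Spec()
print(spec.total_gpus, "GPUs,", spec.total_hbm2_gb, "GB HBM2")
print(spec.total_nvswitches, "NVSwitches,", spec.bidirectional_network_gbps, "Gb/sec")
```

The computed values line up with the 512GB, twelve-NVSwitch, and 1600 Gb/sec totals on the slide.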
30
10X PERFORMANCE GAIN IN LESS THAN A YEAR
Time to Train (days):
DGX-1 with V100 (Sep ’17): 15 days
DGX-2 (Q3 ’18): 1.5 days, 10 Times Faster
Software improvements across the stack including NCCL, cuDNN, etc.
Workload: FairSeq, 55 epochs to solution. PyTorch training performance.
THE PERFORMANCE OF 300 SKYLAKE SERVERS
300 Skylake Gold CPU Servers: 15 racks, $2.7M in servers
One DGX-2: SAME performance | 1/8 THE COST | 60X LESS SPACE | 18X LESS POWER
32
2X HIGHER PERFORMANCE WITH NVSWITCH
Weather Simulation (ECMWF benchmark): 2.4X FASTER
Language Processing (Mixture of Experts): 2.7X FASTER
Comparison: DGX-2 with NVSwitch vs. 2x DGX-1 (Volta)
2 DGX-1V servers have dual socket Xeon E5-2698 v4 Processor. 8 x V100 GPUs. Servers connected via 4X 100Gb IB ports |
DGX-2 server has dual-socket Xeon Platinum 8168 Processor. 16 V100 GPUs
33
CRISIS MANAGEMENT
SOLUTION
Natural disasters are increasingly causing major destruction
to life, property and economies. DFKI is using the NVIDIA
DGX-2 to evolve DeepEye, which uses satellite images
enriched with social media content to identify natural
disasters, into a crisis management solution. With
the increased GPU memory and fully connected GPUs based on
the NVSwitch architecture, DFKI can build bigger models and
process more data to aid rescuers in their decision-making
for faster, more efficient dispatching of resources.
34
“Fujifilm applies AI in a wide range of fields. In
healthcare, multiple NVIDIA GPUs will deliver
high-speed computation to develop AI supporting
image diagnostics. The introduction of this
supercomputer will massively increase our
processing power. We expect that AI learning that
once took days to complete can now be
completed within hours.”
Akira Yoda
chief digital officer of FUJIFILM Corporation
- Pharmaceuticals
- BioCDMO
- Regenerative medicine
- Analyzing and
recognizing medical
images
- Simulations display
materials and fine
chemicals
35
DL FROM DEVELOPMENT TO PRODUCTION
Accelerate Deep Learning Value
[Diagram labels]
Path: From Idea, To Desk, To Data Center, To Edge, To Results
Steps: Procure DGX Station; Install, Build, Test; Experiment; Refine Model; Train at Scale; Deploy; Insights
Phases: Fast Bring-up, Productive Experimentation, Training, Inference
Transitions: installed, iterate, refine, re-train, scale
36
THE CHALLENGE OF SCALING AI
Addressing design, deployment and operations bottlenecks
DESIGN
GUESSWORK
DEPLOYMENT
COMPLEXITY
MULTIPLE POINTS
OF SUPPORT
NVIDIA CONFIDENTIAL. DO NOT DISTRIBUTE.
BUILDING AI FOR SDC IS HARD
Every neural net in our DRIVE
Software stack needs to
handle 1000s of conditions
and geolocations
Hazards, Animals, Bicycles, Pedestrians, Vehicles
Backlit, Snow, Day, Clear, Fog, Rain, Cloudy, Street Lamps, Night, Twilight
SDC SCALE TODAY AT NVIDIA
1PB collected/week
12-camera+Radar+Lidar
RIG mounted on 30 cars
4,000 GPUs in cluster
= 500 PFLOPs
1,500 labelers
20M objects labeled/mo
15PB active training
+ test dataset
20 unique models
50 labeling tasks
40
“A-HA” MOMENTS IN AI DATA CENTERS
Best Practices from Building the World’s Largest Deep Learning Environments
Rack Design | Networking | Storage | Facilities | Software
• DL drives close to operational limits
• Similarities to HPC best practices
• 100GbE preferred
• High-bandwidth, ultra-low latency
• Datasets range from 10k’s to millions of objects
• Terabyte levels of storage and up
• Assume higher watts per rack
• Higher FLOPS/watt = less DC floorspace required
• Scale requires “cluster-aware” software
Example:
• Autonomous vehicle = 1TB / hr
• 100’s PB of raw data
• Billions of total images
• Millions of images for AI training
• 10+ neural nets and 10+ parallel experiments
• 1 DGX-1 runs 1 deep neural net in 1 day
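The example figures above can be turned into a rough capacity estimate. The sketch below is a back-of-the-envelope calculation, assuming the lower bounds on the slide (10 nets, 10 experiments, 100 PB), one day per run on one DGX-1, and 1 PB = 1000 TB. The function names and the eight-system example are illustrative.

```python
import math


def training_runs(neural_nets: int, parallel_experiments: int) -> int:
    """Total runs if every net is trained once per parallel experiment."""
    return neural_nets * parallel_experiments


def wall_clock_days(runs: int, dgx1_systems: int, days_per_run: float = 1.0) -> float:
    """Days needed when each DGX-1 handles one run at a time."""
    return math.ceil(runs / dgx1_systems) * days_per_run


def driving_hours(raw_data_pb: float, tb_per_hour: float = 1.0) -> float:
    """Hours of driving that generate the given volume of raw data."""
    return raw_data_pb * 1000 / tb_per_hour


runs = training_runs(neural_nets=10, parallel_experiments=10)
print(f"{runs} runs, i.e. {runs} DGX-1 days of training per round")
print(f"With 8 DGX-1 systems: {wall_clock_days(runs, 8):.0f} days per round")
print(f"100 PB of raw data is about {driving_hours(100):,.0f} hours of driving")
```

Even at these lower bounds, a single system would need about a hundred days per round of experiments, which is the motivation for the multi-system designs discussed in the following slides.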
41
THE NEW GPU DATA CENTER LEARNINGS
42
GPU READY VS. CPU ONLY DATA CENTER
1/40th the footprint | 1/20th the power
43
1) POWER & COOLING
Cooling Techniques:
● Hot or cold aisle
containment
● Rear water door heat
exchangers
● Component-level
water cooling
Minimize power & floor space needs for better performance efficiency
Improved performance per watt | Improved performance per $$ | Higher Density
44
SAMPLE GPU SERVER
CONFIGURATIONS
DNN, Analytics,
and HPC workloads
Things to consider:
- Characterize peak power loads to avoid unnecessary downtime
- Liquid cool – 3500 times more heat than 10 air-cooled systems
- Component cool – capture 60-80% of server heat, reduce cost by 50%, 2-5x increase in density
- Rule of Thumb: 100 cfm/kW of server load + a 5% overhead for air-leakage
MaxQ
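The airflow rule of thumb on this slide is simple enough to apply directly. The sketch below is a minimal calculation of that rule, using the 35 kW rack load that appears elsewhere in this deck as the example input. The function name and parameter names are illustrative.

```python
def required_airflow_cfm(
    server_load_kw: float,
    cfm_per_kw: float = 100.0,
    leakage_overhead: float = 0.05,
) -> float:
    """Airflow needed per the rule of thumb: 100 cfm/kW plus 5% for air leakage."""
    return server_load_kw * cfm_per_kw * (1.0 + leakage_overhead)


# Example input: a 35 kW rack, the rack power quoted elsewhere in this deck.
rack_airflow = required_airflow_cfm(35.0)
print(f"35 kW rack: about {rack_airflow:,.0f} cfm")
```

For a 35 kW rack this comes to roughly 3,675 cfm, which gives a sense of why the deck also lists rear-door heat exchangers and component-level water cooling.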
45
2) COMPUTE NETWORK
100 Gb Ethernet
Minimize Ethernet Adaptor Load on CPU
Support Cut-Through Communication
Support RDMA
Layer-two network: spine-leaf topology
for high bisection bandwidth
Fewer layer-three networks: minimize
routing bottlenecks
Design localized traffic for scalable apps
EDR (100 Gb) or HDR (200 Gb)
InfiniBand
Option 1 - fat-tree networks to
maximize the total cluster bandwidth
Option 2 - multiple InfiniBand
connections per node for dense GPU
nodes
High-bandwidth, low-latency, and highly efficient
46
EXAMPLE: MULTI-NODE SERVER COMPARISON
with different High-Speed Interconnects
EDR InfiniBand is 20X the performance of the 10Gb Ethernet based solution > 2x app performance
47
3) STORAGE REQUIREMENT
Workload dependent
Parallel HPC Applications
- Support multiple processes accessing the same files simultaneously
Accelerated Analytics Applications
- Support many threads and quick access to small pieces of data
Vision Based DL Apps
- Dominated by reads
- Requires high streaming bandwidth
- Fast random access
- Fast memory mapped (mmap) performance
Recurrent NN Apps
- Require any combination of fast bandwidth with random and small files
48
STORAGE ARCHITECTURE CONSIDERATIONS
Use Case | Adequate Read Cache? | Network Type Recommendation | Network File System Options
Data Analytics | N/A | 10 GbE | Object-Storage, NFS, or other system with good multi-threaded read and small file performance
HPC | N/A | 10/40/100 GbE, InfiniBand | NFS, or HPC targeted file system with support for large # of clients and fast single-node performance, support multi-threaded writes
DL, 256x256 images | Yes | 10 GbE | NFS or storage with good small file support
DL, 1080p images | Yes | 10/40 GbE, InfiniBand | High-end NFS, HPC file system or storage with fast streaming performance
DL, 4k images | Yes | 40 GbE, InfiniBand | HPC Filesystem, high-end NFS or storage with fast streaming performance capable of 3+ GB/s per node
DL, uncompressed images | Yes | InfiniBand, 40/100 GbE | HPC Filesystem, high-end NFS or storage with fast streaming performance capable of 3+ GB/s per node
DL, datasets that are not cached | No | InfiniBand, 10/40/100 GbE | Same as above, aggregate storage performance must scale to meet all applications simultaneously
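To see why image size drives the storage recommendation, it helps to estimate the read bandwidth a training node needs. The sketch below assumes uncompressed 8-bit RGB images, which is a simplification since most training sets are stored compressed and need less bandwidth. The function names are illustrative, and the 2600 images/sec rate is the ResNet50 throughput quoted earlier in this deck.

```python
def uncompressed_image_bytes(
    width: int, height: int, channels: int = 3, bytes_per_channel: int = 1
) -> int:
    """Size of one raw image (assumption: 8-bit RGB, no compression)."""
    return width * height * channels * bytes_per_channel


def read_bandwidth_gb_s(images_per_sec: float, image_bytes: int) -> float:
    """Sustained read bandwidth in GB/s needed to feed a node."""
    return images_per_sec * image_bytes / 1e9


def images_per_sec_at(bandwidth_gb_s: float, image_bytes: int) -> float:
    """Image rate that a given storage bandwidth can sustain."""
    return bandwidth_gb_s * 1e9 / image_bytes


small = uncompressed_image_bytes(256, 256)
full_hd = uncompressed_image_bytes(1920, 1080)

small_bw = read_bandwidth_gb_s(2600, small)
hd_rate = images_per_sec_at(3.0, full_hd)
print(f"256x256 at 2600 images/sec: {small_bw:.2f} GB/s")
print(f"3 GB/s sustains about {hd_rate:.0f} uncompressed 1080p images/sec")
```

Under these assumptions, small images fit within a 10 GbE link, while uncompressed 1080p frames consume a 3 GB/s budget at only a few hundred images per second, which is consistent with the table's move to InfiniBand or 40/100 GbE for larger images.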
MAGLEV DATA CENTER ARCHITECTURE
Each rack:
9 DGX-1 = 72 TESLA V100 GPUs = 9 PFLOPs
12 CPU nodes for services & data management
1.2PB per rack of cache can front object storage
[Diagram: MagLev Platform on Kubernetes across four 35kW racks, each with 12 CPU nodes and 9 DGX-1 systems; Cloud Provider Object Storage and On Premise Object Storage; 1PB per week, 15PB today]
4) Manage Workflow
51
DATA SCIENCE WORKFLOW
[Diagram labels: All Data, ETL, Manage Data, Structured Data Store, Data Preparation, Training, Model Training, Visualization, Evaluate, Inference, Deploy]
Slow Training Times for Data Scientists
52
RAPIDS OPEN SOURCE SOFTWARE
Breakthrough performance for data science and machine learning workflows
MAGLEV
End to End Platform to Enable Industry-Grade AI Dev
“Collect ⇨ Select ⇨ Label ⇨ Train ⇨ Test” as programmatic workflows
Cloud: Kubernetes over 4000 GPU Cluster (= 480 PFLOPs)
Ingest: 1PB per week, 15PB today
Data stores: Data Lake, Selected Datasets, Labeled Datasets, Trained Models, Metrics & Logs
Jobs: Data selection Job #1, #2, … #N; Training Job #1, #2, … #N; Testing Job #1, … #N
User interfaces: Labeling UI, ML/Metrics UI
Run Multi-Step Workflow (workflow = sequence of map jobs)
1,500 labelers, 20M objects labeled per month
Large AI Dev team, 20 models actively developed
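The slide describes a workflow as a sequence of map jobs. The toy sketch below illustrates only that idea and is not MagLev code: the `select` and `label` functions, the record fields, and `run_workflow` are all hypothetical names invented for this example.

```python
from typing import Any, Callable, Dict, Iterable, List, Sequence, Tuple

Record = Dict[str, Any]
MapJob = Callable[[Record], Record]


def select(record: Record) -> Record:
    """Hypothetical selection step: flag even-numbered frames."""
    return {**record, "selected": record["frame"] % 2 == 0}


def label(record: Record) -> Record:
    """Hypothetical labeling step: attach a label to selected frames only."""
    return {**record, "label": "vehicle" if record["selected"] else None}


def run_workflow(
    records: Iterable[Record], jobs: Sequence[MapJob]
) -> Tuple[List[Record], List[str]]:
    """Apply each map job to every record in order and keep a trace of job names."""
    current = list(records)
    trace: List[str] = []
    for job in jobs:
        current = [job(record) for record in current]
        trace.append(job.__name__)
    return current, trace


results, trace = run_workflow([{"frame": 0}, {"frame": 1}], [select, label])
print(trace)
print(results)
```

Because every step is a pure function over records, each job can be fanned out across a cluster and the recorded trace shows which steps produced a given result, which is the kind of traceability the next slide emphasizes.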
MAGLEV: AUTOMATION & TRACEABILITY
Empower Prod engineers to run or schedule complete workflows & version everything
Roles: ML Developer, Production Engineer
Activities: Develop Applications, Run/Debug Applications, Analyze Experiments/results, Publish, Optimize app perf, Deploy prod applications
Launchers: Manual Workflow Launcher, Git+CI based Workflow Launcher
Code Repository: App #1, App #2, App #N
Traced Asset Repository: Models, Datasets, Metrics, Code Version
Targets: NVIDIA DRIVE Car, 4000-GPU Cluster
56
INTRODUCING DGX REFERENCE
ARCHITECTURE SOLUTIONS
57
HIGH-DENSITY COMPUTE REFERENCE ARCH.
Nine DGX-1 Servers
• Eight Tesla V100 GPUs
• NVIDIA GPUDirect™ over RDMA support
• Run at MaxQ
• 100 GbE networking (up to 4 x 100 GbE)
Twelve Storage Nodes
• 192 GB RAM
• 3.8 TB SSD
• 100 TB HDD (1.2 PB Total HDD)
• 50 GbE networking
Network
• In-rack: 100 GbE to DGX-1 servers
• In-rack: 50 GbE to storage nodes
• Out-of-rack: 4 x 100 GbE (up to 8)
Rack
• 35 kW Power
• 42U x 1200 mm x 700 mm (minimum)
• Rear Door Cooler
4 POD design with cooling
DGX-1 POD
• NVIDIA DGX POD™
• Support scalability to hundreds of nodes
• Based on proven SATURNV architecture
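The per-rack totals in this reference architecture can be derived from the component figures quoted in this deck: 8 GPUs, 1 PFLOPS and 3500 W per DGX-1, and 100 TB of HDD per storage node. The sketch below recomputes them. The `RackSpec` class is an illustrative container, and the power figure is a peak estimate that ignores the MaxQ setting mentioned above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RackSpec:
    """Component figures quoted in the deck (illustrative container)."""
    dgx1_servers: int = 9
    gpus_per_dgx1: int = 8
    pflops_per_dgx1: float = 1.0
    dgx1_peak_watts: int = 3500
    storage_nodes: int = 12
    hdd_tb_per_node: int = 100
    rack_power_kw: float = 35.0

    @property
    def total_gpus(self) -> int:
        return self.dgx1_servers * self.gpus_per_dgx1

    @property
    def total_pflops(self) -> float:
        return self.dgx1_servers * self.pflops_per_dgx1

    @property
    def total_hdd_pb(self) -> float:
        return self.storage_nodes * self.hdd_tb_per_node / 1000

    @property
    def dgx1_peak_kw(self) -> float:
        # Peak draw of the DGX-1 servers alone, before storage nodes and switches.
        return self.dgx1_servers * self.dgx1_peak_watts / 1000


rack = RackSpec()
headroom_kw = rack.rack_power_kw - rack.dgx1_peak_kw
print(rack.total_gpus, "GPUs,", rack.total_pflops, "PFLOPS,", rack.total_hdd_pb, "PB HDD")
print(f"DGX-1 peak draw {rack.dgx1_peak_kw} kW, leaving {headroom_kw} kW of the rack budget")
```

At peak, the nine DGX-1 servers alone would use most of the 35 kW budget, which may be one reason the design specifies running them at MaxQ.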
58
DGX REFERENCE ARCHITECTURE SOLUTIONS
Growing ecosystem of offers for enterprise IT - more to come!
Benefits:
• No more design
guesswork
• Faster, simpler
deployment
• Predictable
performance at
scale
• Simplified, single-
point of support
59
SUPPORTING AI:
ALTERNATIVE APPROACHES
Installed/
running
Problem!
“My PyTorch CNN model
is running 30% slower
than yesterday!”
“OK let me look into it”
IT Admin
60
Installed/
running
Problem!
Framework?
Libraries?
O/S?
GPU?
Drivers?
Server?
Network?
Storage?
???
IT SUPPORTING A DEEP LEARNING SYSTEM
61
Installed/
running
Problem!
Open source / forum
Open source / forum
Framework?
Libraries?
O/S?
GPU?
Drivers?
Server?
Network?
Storage?
Multiple paths to
problem resolution
Server, Storage & Network
Solution Providers
SUPPORTING AI:
ALTERNATIVE APPROACHES
62
SUPPORTING AI WITH DGX REFERENCE
ARCHITECTURE SOLUTIONS
IT Admin:
“My PyTorch CNN model
is running 30% slower
than yesterday!”
DGX AI Expertise:
“Update to PyTorch
container XX.XX”
Problem!
Running!
VARs
DGX RA Solution
Storage
63
THE VALUE OF AI INFRASTRUCTURE
REFERENCE ARCHITECTURES
Reference architectures from
NVIDIA and leading storage partners
SCALABLE
PERFORMANCE
Simplified, validated, converged
infrastructure offers
FASTER, SIMPLIFIED
DEPLOYMENT
TRUSTED EXPERTISE
AND SUPPORT
Available through select partners
as a turnkey solution
DGX RA
Solution
Storage
Effortless Productivity, Best Performance, Lowest TCO
64
PLATFORM IMPACT ON AI/DL TCO
1. Designing and Building an AI Compute Platform – from Scratch
Timeline: Day 1 to Month 3
Stages shown: Study & exploration, Platform Design, Productive Experimentation, HW & SW Integration, Troubleshooting, Software eng’g, Software optimization, Design and Build for Scale, Software re-optimization, Insights, Training at Scale
Cost axes: CAPEX, OPEX
“DIY” TCO: Time and budget spent on things other than data science
2. Deploying an Integrated, Full-Stack AI Solution
Timeline: Day 1 to Week 1 (deployment cycle shortened)
Stages shown: Study & exploration, Install and Deploy DGX, Productive Experimentation, Training at Scale, Insights
Cost axis: CAPEX
DGX TCO compared with “DIY” TCO
65
NVIDIA DGX
SYSTEMS
Faster AI Innovation
and Insight
The World’s First Portfolio of
Purpose-Built AI Supercomputers
Powered by NVIDIA GPU Cloud
Get Started in AI – Faster
Effortless Productivity
Performance Without Compromise
For More Information: nvidia.com/dgx
THANK YOU

More Related Content

What's hot

AI Hardware Landscape 2021
AI Hardware Landscape 2021AI Hardware Landscape 2021
AI Hardware Landscape 2021Grigory Sapunov
 
State of AI Report 2022 - ONLINE.pptx
State of AI Report 2022 - ONLINE.pptxState of AI Report 2022 - ONLINE.pptx
State of AI Report 2022 - ONLINE.pptxEithuThutun
 
An AI accelerator ASIC architecture
An AI accelerator ASIC architectureAn AI accelerator ASIC architecture
An AI accelerator ASIC architectureKhanh Le
 
Generative AI in telecom.pdf
Generative AI in telecom.pdfGenerative AI in telecom.pdf
Generative AI in telecom.pdfJamieDornan2
 
"The Xilinx AI Engine: High Performance with Future-proof Architecture Adapta...
"The Xilinx AI Engine: High Performance with Future-proof Architecture Adapta..."The Xilinx AI Engine: High Performance with Future-proof Architecture Adapta...
"The Xilinx AI Engine: High Performance with Future-proof Architecture Adapta...Edge AI and Vision Alliance
 
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...Sri Ambati
 
The ECP Exascale Computing Project
The ECP Exascale Computing ProjectThe ECP Exascale Computing Project
The ECP Exascale Computing Projectinside-BigData.com
 
An introduction to computer vision with Hugging Face
An introduction to computer vision with Hugging FaceAn introduction to computer vision with Hugging Face
An introduction to computer vision with Hugging FaceJulien SIMON
 
Introduction to Ethereum Blockchain & Smart Contract
Introduction to Ethereum Blockchain & Smart ContractIntroduction to Ethereum Blockchain & Smart Contract
Introduction to Ethereum Blockchain & Smart ContractThanh Nguyen
 
High Performance Computing
High Performance ComputingHigh Performance Computing
High Performance ComputingDell World
 
GPU Virtualization in Embedded Automotive Solutions
GPU Virtualization in Embedded Automotive SolutionsGPU Virtualization in Embedded Automotive Solutions
GPU Virtualization in Embedded Automotive SolutionsGlobalLogic Ukraine
 
“Powering the Connected Intelligent Edge and the Future of On-Device AI,” a P...
“Powering the Connected Intelligent Edge and the Future of On-Device AI,” a P...“Powering the Connected Intelligent Edge and the Future of On-Device AI,” a P...
“Powering the Connected Intelligent Edge and the Future of On-Device AI,” a P...Edge AI and Vision Alliance
 
presentation.pdf
presentation.pdfpresentation.pdf
presentation.pdfcaa28steve
 
Status of the Memory Industry 2019 Report by Yole Developpement
Status of the Memory Industry 2019 Report by Yole DeveloppementStatus of the Memory Industry 2019 Report by Yole Developpement
Status of the Memory Industry 2019 Report by Yole DeveloppementYole Developpement
 
How does a blockchain work?
How does a blockchain work?How does a blockchain work?
How does a blockchain work?Deloitte UK
 
“Accelerating Newer ML Models Using the Qualcomm AI Stack,” a Presentation fr...
“Accelerating Newer ML Models Using the Qualcomm AI Stack,” a Presentation fr...“Accelerating Newer ML Models Using the Qualcomm AI Stack,” a Presentation fr...
“Accelerating Newer ML Models Using the Qualcomm AI Stack,” a Presentation fr...Edge AI and Vision Alliance
 
InfiniBand Presentation
InfiniBand PresentationInfiniBand Presentation
InfiniBand PresentationShekhar Kumar
 
AI in Manufacturing - John.pdf
AI in Manufacturing - John.pdfAI in Manufacturing - John.pdf
AI in Manufacturing - John.pdfJohn Chang
 
Generative AI - The New Reality: How Key Players Are Progressing
Generative AI - The New Reality: How Key Players Are Progressing Generative AI - The New Reality: How Key Players Are Progressing
Generative AI - The New Reality: How Key Players Are Progressing Vishal Sharma
 

What's hot (20)

NVIDIA Keynote #GTC21
NVIDIA Keynote #GTC21 NVIDIA Keynote #GTC21
NVIDIA Keynote #GTC21
 
AI Hardware Landscape 2021
AI Hardware Landscape 2021AI Hardware Landscape 2021
AI Hardware Landscape 2021
 
State of AI Report 2022 - ONLINE.pptx
State of AI Report 2022 - ONLINE.pptxState of AI Report 2022 - ONLINE.pptx
State of AI Report 2022 - ONLINE.pptx
 
An AI accelerator ASIC architecture
An AI accelerator ASIC architectureAn AI accelerator ASIC architecture
An AI accelerator ASIC architecture
 
Generative AI in telecom.pdf
Generative AI in telecom.pdfGenerative AI in telecom.pdf
Generative AI in telecom.pdf
 
"The Xilinx AI Engine: High Performance with Future-proof Architecture Adapta...
"The Xilinx AI Engine: High Performance with Future-proof Architecture Adapta..."The Xilinx AI Engine: High Performance with Future-proof Architecture Adapta...
"The Xilinx AI Engine: High Performance with Future-proof Architecture Adapta...
 
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
Open Source h2oGPT with Retrieval Augmented Generation (RAG), Web Search, and...
 
The ECP Exascale Computing Project
The ECP Exascale Computing ProjectThe ECP Exascale Computing Project
The ECP Exascale Computing Project
 
An introduction to computer vision with Hugging Face
An introduction to computer vision with Hugging FaceAn introduction to computer vision with Hugging Face
An introduction to computer vision with Hugging Face
 
Introduction to Ethereum Blockchain & Smart Contract
Introduction to Ethereum Blockchain & Smart ContractIntroduction to Ethereum Blockchain & Smart Contract
Introduction to Ethereum Blockchain & Smart Contract
 
High Performance Computing
High Performance ComputingHigh Performance Computing
High Performance Computing
 
GPU Virtualization in Embedded Automotive Solutions
GPU Virtualization in Embedded Automotive SolutionsGPU Virtualization in Embedded Automotive Solutions
GPU Virtualization in Embedded Automotive Solutions
 
“Powering the Connected Intelligent Edge and the Future of On-Device AI,” a P...
“Powering the Connected Intelligent Edge and the Future of On-Device AI,” a P...“Powering the Connected Intelligent Edge and the Future of On-Device AI,” a P...
“Powering the Connected Intelligent Edge and the Future of On-Device AI,” a P...
 
presentation.pdf
presentation.pdfpresentation.pdf
presentation.pdf
 
Status of the Memory Industry 2019 Report by Yole Developpement
Status of the Memory Industry 2019 Report by Yole DeveloppementStatus of the Memory Industry 2019 Report by Yole Developpement
Status of the Memory Industry 2019 Report by Yole Developpement
 
How does a blockchain work?
How does a blockchain work?How does a blockchain work?
How does a blockchain work?
 
“Accelerating Newer ML Models Using the Qualcomm AI Stack,” a Presentation fr...
“Accelerating Newer ML Models Using the Qualcomm AI Stack,” a Presentation fr...“Accelerating Newer ML Models Using the Qualcomm AI Stack,” a Presentation fr...
“Accelerating Newer ML Models Using the Qualcomm AI Stack,” a Presentation fr...
 
InfiniBand Presentation
InfiniBand PresentationInfiniBand Presentation
InfiniBand Presentation
 
AI in Manufacturing - John.pdf
AI in Manufacturing - John.pdfAI in Manufacturing - John.pdf
AI in Manufacturing - John.pdf
 
Generative AI - The New Reality: How Key Players Are Progressing
Generative AI - The New Reality: How Key Players Are Progressing Generative AI - The New Reality: How Key Players Are Progressing
Generative AI - The New Reality: How Key Players Are Progressing
 

Similar to Simplifying AI Infrastructure: Lessons in Scaling on DGX Systems

BAT40 NVIDIA Stampfli Künstliche Intelligenz, Roboter und autonome Fahrzeuge ...
BAT40 NVIDIA Stampfli Künstliche Intelligenz, Roboter und autonome Fahrzeuge ...BAT40 NVIDIA Stampfli Künstliche Intelligenz, Roboter und autonome Fahrzeuge ...
BAT40 NVIDIA Stampfli Künstliche Intelligenz, Roboter und autonome Fahrzeuge ...BATbern
 
Introduction to Deep Learning (NVIDIA)
Introduction to Deep Learning (NVIDIA)Introduction to Deep Learning (NVIDIA)
Introduction to Deep Learning (NVIDIA)Rakuten Group, Inc.
 

Simplifying AI Infrastructure: Lessons in Scaling on DGX Systems

  • 1. Renee Yao | NVIDIA Senior Product Marketing Manager | GTC Israel 2018 SIMPLIFYING AI INFRASTRUCTURE: LESSONS IN SCALING ON DGX SYSTEMS
  • 2. NVIDIA: Pioneered GPU Computing | Founded 1993 | $9.7B | 11,000 Employees
  • 3. DATA: Historically we never had large datasets. The MNIST (1999) database contains 60,000 training images and 10,000 testing images. COMPUTE: We never had enough compute. [Chart, 1980 to 2020, log scale: single-threaded perf grew 1.5X per year, then 1.1X per year; GPU-computing perf grows 1.5X per year, 1000x by 2025.] Original data up to the year 2010 collected and plotted by M. Horowitz, F. Labonte, O. Shacham, K. Olukotun, L. Hammond, and C. Batten. New plot and data collected for 2010-2015 by K. Rupp.
  • 4. END-TO-END PRODUCT FAMILY FULLY INTEGRATED AI SYSTEMS DESKTOP TITAN WORKSTATION DGX Station DATA CENTER Tesla V100 DATA CENTER Tesla V100 AUTOMOTIVE EMBEDDED Tesla P4/T4 Drive AGX Pegasus Jetson AGX Xavier VIRTUAL WS Virtual GPU SERVER PLATFORM HGX1/ HGX2 HPC / TRAINING INFERENCE DGX-1 DGX-2
  • 5. DESIGNING AN AI PLATFORM
  • 6. AI PLATFORM CONSIDERATIONS: Factors impacting deep learning deployment decisions. COST: “I have limited budget, need lowest up-front cost possible.” PERFORMANCE: “I want the most GPU bang for the buck.” PRODUCTIVITY: “Must get started now, line of business wants to deliver results yesterday.”
  • 7. PLATFORM IMPACT ON AI TCO. 1. Designing and Building an AI Compute Platform – from Scratch. Stages, Day 1 to Month 3 (CAPEX and OPEX): Study & exploration; Platform Design; Productive Experimentation; HW & SW Integration; Troubleshooting; Software engineering; Software optimization; Design and Build for Scale; Software re-optimization; Insights; Training at Scale. “DIY” TCO: time and budget spent on things other than data science.
  • 8. TAKING A FULL-STACK MINDSET TO AI SYSTEM DEPLOYMENT. Support Model: Accelerate Time-to-Resolution for AI/DL Issues. AI/DL Software Stack: Maximize GPU Performance. Operating System Image: Maintain stability while keeping pace with the latest. Hardware Architecture: Look beyond the spec sheet.
  • 9. INTRODUCING THE DGX PORTFOLIO OF AI SUPERCOMPUTERS
  • 10. PURPOSE-BUILT, NOT RE-PURPOSED. NVIDIA DGX STATION: AI WORKSTATION. NVIDIA DGX-1, DGX-2: AI DATA CENTER. NVIDIA AI SOFTWARE STACK: • Universal SW for Deep Learning • Predictable execution across platforms • Pervasive reach. DGX Station: The Personal AI Supercomputer. DGX-1: The Essential Instrument for AI Research. DGX-2: The World’s Most Powerful AI System for the Most Complex AI Challenges.
  • 11. SOFTWARE PERFORMANCE PROOF-POINTS FROM THE FIELD (ResNet50 Training). Global Technology Firm Specializing in Digital Media: home-grown “optimized” TensorFlow stack, 1680 images/sec; DGX TensorFlow stack, 2600 images/sec; 1.5X FASTER. World-Leading Medical Research Center: home-grown TensorFlow stack, 1238 images/sec; DGX TensorFlow stack, 2600 images/sec; 2.1X FASTER.
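The speedup figures on this slide follow from the quoted throughput numbers. A minimal sketch of that arithmetic, assuming speedup is simply the ratio of images per second (the function and variable names are illustrative, not from the deck):

```python
def throughput_speedup(baseline_images_per_sec: float, new_images_per_sec: float) -> float:
    """Ratio of new throughput to baseline throughput."""
    return new_images_per_sec / baseline_images_per_sec


# Throughput figures quoted on the slide (ResNet50 training, images/sec).
DGX_STACK = 2600
HOME_GROWN_OPTIMIZED = 1680
HOME_GROWN = 1238

speedup_vs_optimized = throughput_speedup(HOME_GROWN_OPTIMIZED, DGX_STACK)
speedup_vs_home_grown = throughput_speedup(HOME_GROWN, DGX_STACK)

print(f"vs. home-grown 'optimized' stack: {speedup_vs_optimized:.1f}X")
print(f"vs. home-grown stack: {speedup_vs_home_grown:.1f}X")
```

The two ratios round to the 1.5X and 2.1X shown on the slide.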
  • 12. NVIDIA DGX SYSTEMS MOMENTUM BUILDING: Barriers Toppled, the Unsolvable Solved – a Sampling of DGX Systems Impact, April 2016 to August 2018. OpenAI, NYU, UC Berkeley, CSIRO, MIT, CMU, Fidelity, RIKEN, CloudSight, Mass. General, DFKI, IDSIA, Ford, SAP, NVIDIA SATURNV launch, Hologic, Avitas Systems (A GE Venture), SK Telecom, PayPal, Chinese Academy Of Sciences, FAIR, Microsoft, University of Michigan, Comcast, Nimbix, Noodle.ai, Oak Ridge National Laboratory, BHGE, Zenuity, Swiss Federal Railway
  • 14. NVIDIA DGX STATION: Groundbreaking AI – at your desk. The Fastest Personal Supercomputer for Researchers and Data Scientists. Revolutionary form factor, designed for the desk, whisper-quiet. Start experimenting in hours, not weeks, powered by DGX Stack. Productivity that goes from desk to data center to cloud. Breakthrough performance and precision, powered by Volta.
  • 15. NVIDIA DGX STATION: Groundbreaking AI – at your desk. The Personal AI Supercomputer for Researchers and Data Scientists. Key Features: 1. 4 x NVIDIA Tesla V100 GPU (NOW 32 GB) 2. 2nd-gen NVLink (4-way) 3. Water-cooled design 4. 3 x DisplayPort (4K resolution) 5. Intel Xeon E5-2698 20-core 6. 256GB DDR4 RAM
  • 16. DGX STATION: 72X FASTER THAN CPU. Dual Socket CPU Server: 711 hours (1X). 4-way GPU Workstation: 36.4 hours (20X). DGX Station: 9.9 hours (72X). Workload: ResNet50, 90 epochs to solution | CPU Server: Dual Xeon E5-2699 v4, 2.6GHz
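The 20X and 72X labels can be reproduced from the time-to-solution numbers on this slide. The sketch below assumes speedup is baseline hours divided by accelerated hours, rounded to the nearest integer; the names are hypothetical and only the hour values come from the slide.

```python
CPU_SERVER_HOURS = 711.0  # dual-socket CPU server, ResNet50, 90 epochs


def time_speedup(baseline_hours: float, accelerated_hours: float) -> float:
    """How many times faster the accelerated system reaches the same solution."""
    return baseline_hours / accelerated_hours


# Time-to-solution figures quoted on the slide, in hours.
systems = {
    "4-way GPU Workstation": 36.4,
    "DGX Station": 9.9,
}

speedups = {name: round(time_speedup(CPU_SERVER_HOURS, hours))
            for name, hours in systems.items()}

for name, factor in speedups.items():
    saved = CPU_SERVER_HOURS - systems[name]
    print(f"{name}: {factor}X faster, {saved:.1f} hours saved per training run")
```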
  • 17. JUMP START YOUR AI JOURNEY: Training AI on DGX Station. Healthcare: 4x Speedup, 8 Months Payback. Self-Driving Cars: 10x Speedup. Retail: 3x Speedup, 2 Months Payback. Smart City: 8x Speedup.
  • 18. AI-Powered Tumor Mutation Induced Signaling Activity Monitoring. Most mutations found when sequencing tumours are unknown and often ignored. It requires a strong machine to uncover the significance of each mutation and its interaction with drugs, learning from large and multi-scaled image data of cells expressing the mutations. NovellusDx leverages DGX Station and a containerized deep learning framework to obtain very accurate readings of intracellular signalling pathway activity that are very stable through time and other biological perturbations, with 10x better accuracy, $70k annual saving, and 4x faster training. In a clinical trial measuring progression-free survival (PFS), the clinical parameter (i.e. the number of months the disease does not progress) increased 3x.
  • 19. NVIDIA DGX ACCELERATES AUTONOMOUS VEHICLE READINESS. For a self-driving car to reach the same level of accuracy as a human, it will need to have travelled 11 billion miles of test-drives, taking many years to complete. Cognata is shaving years off this training process by enabling virtual cars to experience the world of driving in a strikingly realistic simulated environment. With the NVIDIA DGX Station, Cognata: • Accelerates training of DNN-based generative models by a factor of 10x • Simultaneously runs dozens of AV simulations to accumulate millions of virtual miles and to identify and improve edge cases. The AI-powered solution enabled Cognata to increase performance, save money, improve productivity and make the world a safer place.
  • 20. AI-Powered Shopping Cart for Seamless Online Retail Experience in Store: Usability, Speed, Trust. The in-store retail experience is not as easy or streamlined as the online retail experience. Tracxpoint created a shopping cart, called AIC, that is fully integrated with hardware and AI-powered, GPU-accelerated software. Trained on DGX Station and running inference with TensorRT and DeepStream 2.0 on Jetson TX2, AIC can recognize 100,000 individual products in under a second, with high accuracy, a 3x performance gain, and 2 months ROI compared to cloud solutions. Customers now simply place products in their cart (no need to search for barcodes), communicate in real-time with suppliers to get personalized offers while shopping, navigate inside the supermarket, and then pay digitally on cart.
  • 21. Revolutionize Security in Smart City with AI-Powered Facial Recognition Platform. Providing the sensor with the ability to analyze the data it picks up has been top of mind for governments, police, security agencies, banking, smart cities, retail, and the transportation industry. Collecting, analyzing, and storing data can be difficult, costly, and error-prone. AnyVision, the world’s leading designer and developer of recognition platforms, offers a wide range of capabilities, including face recognition, human body recognition and object identification. Powered by a cutting-edge, deep neural network on NVIDIA DGX Systems, Tesla, and Jetson, AnyVision is the first to provide 1:1 and 1:N face recognition and can detect 115 million individuals in 0.2 seconds per database, with an 8x performance increase in training.
  • 23. NVIDIA DGX-1: THE ESSENTIAL TOOL OF AI. Highest Performance, Fully Integrated System. 1 PFLOPS | 8x Tesla V100 32GB | 300 GB/s NVLink Hybrid Cube Mesh | 2x Xeon | 8 TB RAID 0 | Quad 100Gbps, Dual 10GbE | 3U, 3500W
  • 24. DGX-1: 140X FASTER THAN CPU. Dual Socket CPU: 711 hours (1X). 8-way GPU Server: 15.5 hours (46X). DGX-1: 5.1 hours (140X). Workload: ResNet50, 90 epochs to solution | CPU Server: Dual Xeon E5-2699 v4, 2.6GHz
  • 25. WORLD-CLASS RAILWAY LOGISTICS. 10,671 trains | 1.26 million riders | 3,232 km of track | 300 tunnels | 6,000 bridges | 30,000 switches. 1 train, 11 switches: 30 possible ways. 2 trains: 900 ways. 80 trains: 10^80 possibilities, more than the # of observed atoms in the universe. One full day of experiments: 17 seconds. One day of whole train traffic in Switzerland: 0.3 seconds (86,000 steps in 0.3 seconds).
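The growth in routing possibilities on this slide can be sketched as a simple power law. This assumes each train independently has the 30 routes quoted for one train, which is a simplification for illustration and not necessarily how the railway models it; the 80-train figure is taken as 10 to the power 80.

```python
ROUTES_PER_TRAIN = 30  # "1 train, 11 switches, 30 possible ways"


def joint_routings(num_trains: int) -> int:
    """Number of joint routings if every train independently picks one of 30 routes."""
    return ROUTES_PER_TRAIN ** num_trains


print(joint_routings(1))  # routes for a single train
print(joint_routings(2))  # matches the slide's 900 ways for 2 trains

# Under this independence assumption, 80 trains exceed the slide's 10^80 bound.
print(joint_routings(80) > 10 ** 80)

# Simulation rate implied by "86,000 steps in 0.3 seconds".
steps_per_second = 86_000 / 0.3
print(f"{steps_per_second:,.0f} simulation steps per second")
```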
  • 27. NVIDIA DGX-2: LIMITLESS DEEP LEARNING FOR EXPLORATION WITHOUT BOUNDARIES. The World’s Most Powerful Deep Learning System for the Most Complex Deep Learning Challenges • Performance to Train the Previously Impossible • Revolutionary AI Network Fabric • Fastest Path to AI Scale • Powered by NVIDIA GPU Cloud. For More Information: nvidia.com/dgx-2
  • 28. DESIGNED TO TRAIN THE PREVIOUSLY IMPOSSIBLE. NVIDIA Tesla V100 32GB | Two GPU Boards: 8 V100 32GB GPUs per board, 6 NVSwitches per board, 512GB Total HBM2 Memory, interconnected by Plane Card | Twelve NVSwitches: 2.4 TB/sec bi-section bandwidth | Eight EDR Infiniband/100 GigE: 1600 Gb/sec Total Bi-directional Bandwidth | PCIe Switch Complex | Two Intel Xeon Platinum CPUs | 1.5 TB System Memory | 30 TB NVME SSDs Internal Storage | Dual 10/25 Gb/sec Ethernet
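The aggregate figures on this slide are consistent with its per-board and per-port numbers. A small sketch of that bookkeeping, using only values quoted on the slide (the constant names are illustrative, and counting both directions of each 100 Gb/sec port is an assumption about how the total was derived):

```python
# Per-component figures quoted on the DGX-2 slide.
GPU_BOARDS = 2
GPUS_PER_BOARD = 8
HBM2_GB_PER_GPU = 32
NVSWITCHES_PER_BOARD = 6
NETWORK_PORTS = 8          # EDR InfiniBand / 100 GigE
GBITS_PER_PORT = 100       # per direction (assumed)

total_gpus = GPU_BOARDS * GPUS_PER_BOARD
total_hbm2_gb = total_gpus * HBM2_GB_PER_GPU
total_nvswitches = GPU_BOARDS * NVSWITCHES_PER_BOARD
bidirectional_gbits = NETWORK_PORTS * GBITS_PER_PORT * 2  # both directions

print(f"GPUs: {total_gpus}")
print(f"HBM2 memory: {total_hbm2_gb} GB")
print(f"NVSwitches: {total_nvswitches}")
print(f"Network bandwidth: {bidirectional_gbits} Gb/sec bi-directional")
```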
  • 29. 10X PERFORMANCE GAIN IN LESS THAN A YEAR. Time to Train: DGX-1 with V100 (Sep ’17), 15 days; DGX-2 (Q3 ’18), 1.5 days; 10 Times Faster. Software improvements across the stack including NCCL, cuDNN, etc. Workload: FairSeq, 55 epochs to solution. PyTorch training performance.
  • 30. THE PERFORMANCE OF 300 SKYLAKE SERVERS. 300 Skylake Gold CPU Servers: 15 racks, $2.7M in servers. One DGX-2: SAME performance | 1/8 THE COST | 60X LESS SPACE | 18X LESS POWER
  • 31. 2X HIGHER PERFORMANCE WITH NVSWITCH. DGX-2 with NVSwitch vs. 2x DGX-1 (Volta): Weather Simulation (ECMWF benchmark), 2.4X FASTER; Language Processing (Mixture of Experts), 2.7X FASTER. 2 DGX-1V servers have dual socket Xeon E5 2698v4 Processor, 8 x V100 GPUs; servers connected via 4X 100Gb IB ports | DGX-2 server has dual-socket Xeon Platinum 8168 Processor, 16 V100 GPUs
  • 32. CRISIS MANAGEMENT SOLUTION. Natural disasters are increasingly causing major destruction to life, property and economies. DFKI is using the NVIDIA DGX-2 to evolve DeepEye, which uses satellite images enriched with social media content to identify natural disasters, into a crisis management solution. With the increased GPU memory and fully connected GPUs based on the NVSwitch architecture, DFKI can build bigger models and process more data to aid rescuers in their decision-making for faster, more efficient dispatching of resources.
  • 33. “Fujifilm applies AI in a wide range of fields. In healthcare, multiple NVIDIA GPUs will deliver high-speed computation to develop AI supporting image diagnostics. The introduction of this supercomputer will massively increase our processing power. We expect that AI learning that once took days to complete can now be completed within hours.” Akira Yoda, chief digital officer of FUJIFILM Corporation. - Pharmaceuticals - BioCDMO - Regenerative medicine - Analyzing and recognizing medical images - Simulations display materials and fine chemicals
  • 34. DL FROM DEVELOPMENT TO PRODUCTION: Accelerate Deep Learning Value. Experiment Refine Model Deploy Train at Scale Insights Procure DGX Station Install, Build, Test Training Productive Experimentation Fast Bring-up To Data Center To Desk From Idea installed iterate Inference To Results refine, re-train scale To Edge
  • 35. 36 THE CHALLENGE OF SCALING AI Addressing design, deployment and operations bottlenecks DESIGN GUESSWORK DEPLOYMENT COMPLEXITY MULTIPLE POINTS OF SUPPORT
  • 36. NVIDIA CONFIDENTIAL. DO NOT DISTRIBUTE. BUILDING AI FOR SDC IS HARD Every neural net in our DRIVE Software stack needs to handle 1000s of conditions and geolocations: Hazards, Animals, Bicycles, Pedestrians, Backlit, Snow, Vehicles, Day, Clear, Fog, Rain, Cloudy, Street Lamps, Night, Twilight
  • 37. SDC SCALE TODAY AT NVIDIA 1PB collected/week 12-camera+Radar+Lidar RIG mounted on 30 cars 4,000 GPUs in cluster = 500 PFLOPs 1,500 labelers 20M objects labeled/mo 15PB active training + test dataset 20 unique models 50 labeling tasks
  • 38. 40 “A-HA” MOMENTS IN AI DATA CENTERS Best Practices from Building the World’s Largest Deep Learning Environments | Rack Design: DL drives close to operational limits; similarities to HPC best practices | Networking: 100GbE preferred; high-bandwidth, ultra-low latency | Storage: datasets range from 10k’s to millions of objects; terabyte levels of storage and up | Facilities: assume higher watts per rack; higher FLOPS/watt = less DC floorspace required | Software: scale requires “cluster-aware” software | Example: autonomous vehicle = 1TB / hr; 100’s PB of raw data; billions of total images; millions of images for AI training; 10+ neural nets and 10+ parallel experiments; 1 DGX-1 runs 1 deep neural net in 1 day
  • 39. 41 THE NEW GPU DATA CENTER LEARNINGS
  • 40. 42 GPU READY VS. CPU ONLY DATA CENTER 1/40th the footprint | 1/20th the power
  • 41. 43 1) POWER & COOLING Cooling Techniques: ● Hot or cold aisle containment ● Rear water door heat exchangers ● Component-level water cooling Minimize power & floor space needs for better performance efficiency Improved performance per watt | Improved performance per $$ | Higher Density
  • 42. 44 SAMPLE GPU SERVER CONFIGURATIONS DNN, Analytics, and HPC workloads Things to consider: - Characterize peak power loads to avoid unnecessary downtime - Liquid cool – 3500 times more heat than 10 air-cooled systems - Component cool – capture 60-80% of server heat, reduce cost by 50%, 2-5x increase in density - Rule of Thumb: 100 cfm/kW of server load + a 5% overhead for air leakage MaxQ
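The airflow rule of thumb on this slide can be turned into a quick sizing calculation. The sketch below is illustrative only: the function name and parameters are hypothetical, and it assumes the "5% overhead" is applied multiplicatively on top of the 100 cfm/kW baseline, which the slide does not spell out.

```python
def required_airflow_cfm(server_load_kw, cfm_per_kw=100.0, leakage_overhead=0.05):
    """Estimate airflow (cfm) for a given server load using the slide's rule of thumb.

    Assumption: the 5% air-leakage overhead multiplies the baseline airflow.
    """
    if server_load_kw < 0:
        raise ValueError("server_load_kw must be non-negative")
    baseline = server_load_kw * cfm_per_kw
    return baseline * (1.0 + leakage_overhead)


if __name__ == "__main__":
    # 35 kW is the per-rack power figure quoted elsewhere in this deck.
    print(f"{required_airflow_cfm(35):.0f} cfm")
```

Under that assumption, a 35 kW rack works out to roughly 3,675 cfm. Treat the result as a first-pass planning number rather than a facilities specification.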
  • 43. 45 2) COMPUTE NETWORK High-bandwidth, low-latency, and highly efficient | 100 Gb Ethernet: minimize Ethernet adapter load on CPU; support cut-through communication; support RDMA; layer-two network with spine-leaf topology for high bisection bandwidth; fewer layer-three networks to minimize routing bottlenecks; design localized traffic for scalable apps | EDR (100 Gb) or HDR (200 Gb) InfiniBand: Option 1, fat-tree networks to maximize the total cluster bandwidth; Option 2, multiple InfiniBand connections per node for dense GPU nodes
  • 44. 46 EXAMPLE: MULTI-NODE SERVER COMPARISON with different High-Speed Interconnects EDR InfiniBand is 20X the performance of the 10Gb Ethernet based solution > 2x app performance
  • 45. 47 3) STORAGE REQUIREMENT Workload dependent (Parallel HPC Applications, Accelerated Analytics Applications, Vision Based DL Apps, Recurrent NN Apps): - Support multiple processes accessing the same files simultaneously - Support many threads and quick access to small pieces of data - Dominated by reads - Requires high streaming bandwidth - Fast random access - Fast memory-mapped (mmap) performance - Require any combination of fast bandwidth with random and small files
  • 46. 48 STORAGE ARCHITECTURE CONSIDERATIONS
      Use Case | Adequate Read Cache? | Network Type Recommendation | Network File System Options
      Data Analytics | N/A | 10 GbE | Object storage, NFS, or other system with good multi-threaded read and small-file performance
      HPC | N/A | 10/40/100 GbE, InfiniBand | NFS, or HPC-targeted file system with support for a large number of clients, fast single-node performance, and multi-threaded writes
      DL, 256x256 images | Yes | 10 GbE | NFS or storage with good small-file support
      DL, 1080p images | Yes | 10/40 GbE, InfiniBand | High-end NFS, HPC file system, or storage with fast streaming performance
      DL, 4k images | Yes | 40 GbE, InfiniBand | HPC file system, high-end NFS, or storage with fast streaming performance capable of 3+ GB/s per node
      DL, uncompressed images | Yes | InfiniBand, 40/100 GbE | HPC file system, high-end NFS, or storage with fast streaming performance capable of 3+ GB/s per node
      DL, datasets that are not cached | No | InfiniBand, 10/40/100 GbE | Same as above; aggregate storage performance must scale to serve all applications simultaneously
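As a small illustration, the storage guidance on this slide can be encoded as a lookup table so that a provisioning script could query it by workload. This is only a sketch: the `StorageGuidance` class, the dictionary keys, and the `recommend` function are hypothetical names introduced here, and the values are transcribed from the slide rather than from any NVIDIA tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageGuidance:
    read_cache_adequate: str  # "Yes", "No", or "N/A", as stated on the slide
    network: str              # recommended network type
    file_system: str          # recommended file system options


# Shared text for the two rows that carry the same file system recommendation.
_STREAMING_3GBS = (
    "HPC file system, high-end NFS, or storage with fast streaming "
    "performance capable of 3+ GB/s per node"
)

# Keys are hypothetical identifiers; values follow the slide's table rows.
GUIDANCE = {
    "data_analytics": StorageGuidance(
        "N/A", "10 GbE",
        "Object storage, NFS, or other system with good multi-threaded "
        "read and small-file performance"),
    "hpc": StorageGuidance(
        "N/A", "10/40/100 GbE, InfiniBand",
        "NFS or HPC-targeted file system"),
    "dl_256x256_images": StorageGuidance(
        "Yes", "10 GbE", "NFS or storage with good small-file support"),
    "dl_1080p_images": StorageGuidance(
        "Yes", "10/40 GbE, InfiniBand",
        "High-end NFS, HPC file system, or storage with fast streaming performance"),
    "dl_4k_images": StorageGuidance("Yes", "40 GbE, InfiniBand", _STREAMING_3GBS),
    "dl_uncompressed_images": StorageGuidance(
        "Yes", "InfiniBand, 40/100 GbE", _STREAMING_3GBS),
    "dl_uncached_datasets": StorageGuidance(
        "No", "InfiniBand, 10/40/100 GbE",
        _STREAMING_3GBS + "; aggregate performance must scale across all applications"),
}


def recommend(use_case):
    """Return the slide's guidance for a use case, or raise ValueError if unknown."""
    try:
        return GUIDANCE[use_case]
    except KeyError:
        known = ", ".join(sorted(GUIDANCE))
        raise ValueError(f"unknown use case {use_case!r}; expected one of: {known}") from None


if __name__ == "__main__":
    print(recommend("dl_4k_images").network)
```

Keeping the guidance as data rather than branching logic makes it easy to extend when new workload classes appear.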
  • 47. MAGLEV DATA CENTER ARCHITECTURE Each rack: 9 DGX-1 = 72 TESLA V100 GPUs = 9 PFLOPs | 12 CPU nodes for services & data management | 1.2PB per rack of cache can front object storage [Diagram: MagLev Platform on Kubernetes spanning four 35kW racks, each holding 12 CPU nodes and 9 DGX-1 systems, in front of Cloud Provider Object Storage and On Premise Object Storage] 1PB per week | 15PB Today
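The per-rack figures on this slide can be checked with a few lines of arithmetic. The sketch below derives a per-GPU figure purely from the slide's own numbers (9 PFLOPs across 72 GPUs) and scales it to the 4,000-GPU cluster mentioned elsewhere in the deck. All names are hypothetical, and the rack count assumes the cluster is built entirely from racks of this design, which the deck does not state.

```python
import math

GPUS_PER_DGX1 = 8    # eight Tesla V100 GPUs per DGX-1, per the deck
DGX1_PER_RACK = 9    # nine DGX-1 systems per rack, per the deck
RACK_PFLOPS = 9.0    # quoted per-rack figure

gpus_per_rack = GPUS_PER_DGX1 * DGX1_PER_RACK
pflops_per_gpu = RACK_PFLOPS / gpus_per_rack  # implied by the slide, not measured


def cluster_pflops(gpu_count):
    """Scale the slide's implied per-GPU figure to a cluster of gpu_count GPUs."""
    return gpu_count * pflops_per_gpu


def racks_needed(gpu_count):
    """Racks of this design needed to host gpu_count GPUs (assumption: uniform racks)."""
    return math.ceil(gpu_count / gpus_per_rack)


if __name__ == "__main__":
    print(gpus_per_rack, pflops_per_gpu)
    print(cluster_pflops(4000), racks_needed(4000))
```

With these inputs the per-rack GPU count comes out to 72, matching the slide. The deck quotes both 500 PFLOPs and 480 PFLOPs for the 4,000-GPU cluster; the per-rack figure reproduces the former.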
  • 49. 51 DATA SCIENCE WORKFLOW All Data ETL Manage Data Structured Data Store Data Preparation Training Model Training Visualization Evaluate Inference Deploy Slow Training Times for Data Scientists
  • 50. 52 RAPIDS OPEN SOURCE SOFTWARE Breakthrough performance for data science and machine learning workflows
  • 51. MAGLEV End to End Platform to Enable Industry-Grade AI Dev | “Collect ⇨ Select ⇨ Label ⇨ Train ⇨ Test” as programmatic workflows [Diagram: Ingest (1PB per week, 15PB Today) into a Data Lake | Data selection Jobs #1 to #N produce Selected Datasets | Labeling UI (1,500 Labelers, 20M objects labeled per month) produces Labeled Datasets | Training Jobs #1 to #N produce Trained Models | Testing Jobs #1 to #N produce Metrics & Logs, shown in the ML/Metrics UI | Large AI Dev team (20 models actively developed) runs Multi-Step Workflows (workflow = sequence of map jobs) | Cloud Kubernetes over 4000 GPU Cluster (= 480 PFLOPs)]
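The slide describes a workflow as a sequence of map jobs. The toy sketch below illustrates only that idea: each job is applied to every record before the next job starts, and each record keeps a history of the stages it passed through. It is not MagLev's API; the function names, the record layout, and the stage implementations are all hypothetical stand-ins, and real stages such as data selection or training would do far more than tag a record.

```python
from typing import Callable, Dict, Iterable, List, Sequence

Record = Dict[str, object]
MapJob = Callable[[Record], Record]


def stage(name: str) -> MapJob:
    """Build a placeholder map job that records its name in the record's history."""
    def job(record: Record) -> Record:
        history = list(record["history"]) + [name]
        return {**record, "history": history}
    return job


def run_workflow(jobs: Sequence[MapJob], records: Iterable[Record]) -> List[Record]:
    """Apply each map job, in order, to every record."""
    current = list(records)
    for job in jobs:
        current = [job(record) for record in current]
    return current


# Stage names follow the slide's "Collect, Select, Label, Train, Test" sequence.
STAGES = ["collect", "select", "label", "train", "test"]

if __name__ == "__main__":
    workflow = [stage(name) for name in STAGES]
    inputs = [{"id": i, "history": []} for i in range(3)]
    for result in run_workflow(workflow, inputs):
        print(result["id"], result["history"])
```

Carrying the history on each record is one simple way to get the traceability the deck emphasizes, since every output can be tied back to the stages that produced it.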
  • 52. MAGLEV: AUTOMATION & TRACEABILITY Empower Prod engineers to run or schedule complete workflows & version everything [Diagram: ML Developer: Develop Applications, Run/Debug Applications, Manual Workflow Launcher, Analyze Experiments/results, Publish | Production Engineer: Optimize app perf, Deploy prod applications, Git+CI based Workflow Launcher | Code Repository: App #1, App #2, App #N | Traced Asset Repository: Models, Datasets, Metrics, Code Version | 4000-GPU Cluster | NVIDIA DRIVE Car]
  • 54. 57 HIGH-DENSITY COMPUTE REFERENCE ARCH. | Nine DGX-1 Servers: • Eight Tesla V100 GPUs • NVIDIA GPUDirect™ over RDMA support • Run at MaxQ • 100 GbE networking (up to 4 x 100 GbE) | Twelve Storage Nodes: • 192 GB RAM • 3.8 TB SSD • 100 TB HDD (1.2 PB Total HDD) • 50 GbE networking | Network: • In-rack: 100 GbE to DGX-1 servers • In-rack: 50 GbE to storage nodes • Out-of-rack: 4 x 100 GbE (up to 8) | Rack: • 35 kW Power • 42U x 1200 mm x 700 mm (minimum) • Rear Door Cooler | DGX-1 POD (figure: 4 POD design with cooling): • NVIDIA DGX POD™ • Support scalability to hundreds of nodes • Based on proven SATURNV architecture
  • 55. 58 DGX REFERENCE ARCHITECTURE SOLUTIONS Growing ecosystem of offers for enterprise IT - more to come! Benefits: • No more design guesswork • Faster, simpler deployment • Predictable performance at scale • Simplified, single-point of support
  • 56. 59 SUPPORTING AI: ALTERNATIVE APPROACHES Installed/running | Problem! “My PyTorch CNN model is running 30% slower than yesterday!” “OK let me look into it” IT Admin
  • 58. 61 SUPPORTING AI: ALTERNATIVE APPROACHES Installed/running | Problem! Multiple paths to problem resolution: Framework? Libraries? O/S? GPU? Drivers? Server? Network? Storage? | Open source / forum | Server, Storage & Network Solution Providers
  • 59. 62 SUPPORTING AI WITH DGX REFERENCE ARCHITECTURE SOLUTIONS Problem! “My PyTorch CNN model is running 30% slower than yesterday!” IT Admin | DGX RA Solution Storage | AI Expertise | DGX VARs | “Update to PyTorch container XX.XX” | Running!
  • 60. 63 THE VALUE OF AI INFRASTRUCTURE REFERENCE ARCHITECTURES Reference architectures from NVIDIA and leading storage partners SCALABLE PERFORMANCE Simplified, validated, converged infrastructure offers FASTER, SIMPLIFIED DEPLOYMENT TRUSTED EXPERTISE AND SUPPORT Available through select partners as a turnkey solution DGX RA Solution Storage Effortless Productivity, Best Performance, Lowest TCO
  • 61. 64 PLATFORM IMPACT ON AI/DL TCO [Timeline figure comparing two approaches. 1. Designing and Building an AI Compute Platform from Scratch (Day 1 to Month 3): Study & exploration, Platform Design, HW & SW Integration, Troubleshooting, Software eng’g, Software optimization, Design and Build for Scale, Software re-optimization, Productive Experimentation, Training at Scale, Insights; CAPEX and OPEX add up to the “DIY” TCO, with time and budget spent on things other than data science. 2. Deploying an Integrated, Full-Stack AI Solution (Day 1 to Week 1): Study & exploration, Install and Deploy DGX, Productive Experimentation, Training at Scale, Insights; deployment cycle shortened; CAPEX; DGX TCO shown against “DIY” TCO.]
  • 62. 65 NVIDIA DGX SYSTEMS Faster AI Innovation and Insight The World’s First Portfolio of Purpose-Built AI Supercomputers Powered by NVIDIA GPU Cloud | Get Started in AI – Faster | Effortless Productivity | Performance Without Compromise | For More Information: nvidia.com/dgx