Slide 3
DEEP LEARNING EVERYWHERE
INTERNET & CLOUD: Image Classification, Speech Recognition, Language Translation, Language Processing, Sentiment Analysis, Recommendation
MEDIA & ENTERTAINMENT: Video Captioning, Video Search, Real-Time Translation
AUTONOMOUS MACHINES: Pedestrian Detection, Lane Tracking, Traffic Sign Recognition
SECURITY & DEFENSE: Face Detection, Video Surveillance, Satellite Imagery
MEDICINE & BIOLOGY: Cancer Cell Detection, Diabetic Grading, Drug Discovery
Slide 5
"SUPERHUMAN" RESULTS SPARK HYPERSCALE ADOPTION
[Chart: ImageNet — Accuracy %, 2010–2015. Hand-coded CV: 72% (2010), 74% (2011); Deep Learning: 84% (2012), 88% (2013), 93% (2014), 96% (2015), surpassing the human reference line.]
Cloud Services with AI Powered by NVIDIA: Alibaba/Aliyun, Amazon, Baidu, eBay, Facebook, Flickr, Google, iFLYTEK, iQIYI, JD.com, Orange, Periscope, Pinterest, Qihoo 360, Shazam, Skype, Sogou, Twitter, Yahoo, Supermarket, Yandex, Yelp
Slide 6
AI — THE NEXT TRILLION $ IT OPPORTUNITY
"By 2020, 80% of Big Data and Analytics deployments will need distributed micro analytics and 40% of all business analytics software will incorporate prescriptive analytics built on cognitive computing functionality. Both of these trends require a dramatic increase in processing power that could be enabled by GPUs." — IDC
"By 2018, over 50% of developer teams will embed cognitive services in their apps (vs 1% today), providing U.S. enterprises with over $60 billion annual savings by 2020." — IDC
Source: IDC Worldwide Big Data and Analytics 2016 Predictions, November 2015; IDC FutureScape: Worldwide Digital Strategy Consulting 2016 Predictions, November 2015.
Slide 7
NVIDIA DGX-1: The Essential Tool of Deep Learning Scientists
Deep Learning is a massive opportunity
Data Scientist productivity is vital
NVIDIA is the choice of the deep learning world
DGX-1 is fast, instantly productive
170 TFLOPS | 8x Tesla P100 16GB | NVLink Hybrid Cube Mesh
2x Xeon | 8 TB RAID 0 | Quad IB 100Gbps, Dual 10GbE | 3U
Slide 8
TESLA P100 WITH NVLINK
New GPU Architecture to Enable the World's Fastest Compute Node
Pascal Architecture: Highest Compute Performance
NVLink: GPU Interconnect for Maximum Scalability
CoWoS HBM2: Unifying Compute & Memory in a Single Package
Page Migration Engine: Simple Parallel Programming with Virtually Unlimited Memory (Unified Memory)
[Diagram: two CPUs with PCIe switches connecting to Tesla P100 GPUs linked by NVLink; Unified Memory spans CPU and GPU]
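The Page Migration Engine is what backs CUDA Unified Memory on P100: a single allocation is visible to both CPU and GPU, and pages migrate on demand instead of via explicit copies. A minimal sketch (assumes the CUDA toolkit; illustrative, not runnable without an NVIDIA GPU):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Simple kernel: scale every element of x by a.
__global__ void scale(float *x, int n, float a) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= a;
}

int main() {
    const int n = 1 << 20;
    float *x = nullptr;
    // One managed allocation, visible to both CPU and GPU;
    // no explicit cudaMemcpy anywhere in this program.
    cudaMallocManaged(&x, n * sizeof(float));
    for (int i = 0; i < n; ++i) x[i] = 1.0f;      // touched on the CPU
    scale<<<(n + 255) / 256, 256>>>(x, n, 2.0f);  // pages migrate to the GPU
    cudaDeviceSynchronize();
    printf("x[0] = %f\n", x[0]);                  // pages migrate back to the CPU
    cudaFree(x);
    return 0;
}
```

On Pascal the managed allocation may even exceed physical GPU memory, which is what the slide means by "virtually unlimited memory".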
Slide 9
NVIDIA DGX-1
WORLD'S FIRST DEEP LEARNING SUPERCOMPUTER
Engineered for deep learning | 170 TF FP16 | 8x Tesla P100
NVLink hybrid cube mesh | Accelerates major AI frameworks
Slide 10
NVIDIA DEEP LEARNING SDK
High-Performance GPU Acceleration for Deep Learning
APPLICATIONS
Computer Vision: Image Classification, Object Detection
Speech and Audio: Voice Recognition, Translation
Behavior: Sentiment Analysis, Recommendation Engines
FRAMEWORKS: Mocha.jl
DEEP LEARNING SDK
Deep Learning: cuDNN
Math Libraries: cuBLAS, cuSPARSE, cuFFT
Multi-GPU: NCCL
Slide 11
NVIDIA CUDNN
Building blocks for accelerating deep neural networks on GPUs
High-performance deep neural network training and inference
Accelerates Caffe, CNTK, TensorFlow, Theano, Torch
Performance continues to improve over time
"NVIDIA has improved the speed of cuDNN with each release while extending the interface to more operations and devices at the same time." — Evan Shelhamer, Lead Caffe Developer, UC Berkeley
developer.nvidia.com/cudnn
[Chart: AlexNet training speedup, 2014–2016, from K40 (cuDNN v1) to M40 (cuDNN v3) to Pascal (cuDNN v5), up to ~12x]
AlexNet training throughput based on 20 iterations; CPU: 1x E5-2680v3, 12 cores, 2.5 GHz.
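The "building blocks" cuDNN provides are operations like the 2D convolutions at the heart of CNNs. As a rough sketch of what such a primitive computes, here is a naive pure-Python version (illustration only; cuDNN runs highly tuned implementations of this on the GPU):

```python
def conv2d(image, kernel):
    """Naive 'valid' 2D convolution (strictly, cross-correlation, as in
    most deep learning frameworks): slide the kernel over the image and
    take a weighted sum at each position."""
    kh, kw = len(kernel), len(kernel[0])
    oh = len(image) - kh + 1
    ow = len(image[0]) - kw + 1
    out = [[0.0] * ow for _ in range(oh)]
    for y in range(oh):
        for x in range(ow):
            out[y][x] = sum(
                image[y + i][x + j] * kernel[i][j]
                for i in range(kh)
                for j in range(kw)
            )
    return out

# A 3x3 image of ones convolved with a 2x2 kernel of ones:
print(conv2d([[1.0] * 3] * 3, [[1.0] * 2] * 2))  # [[4.0, 4.0], [4.0, 4.0]]
```

Training a network applies this operation (and its gradients) millions of times, which is why a tuned GPU library dominates end-to-end throughput.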
Slide 12
NVIDIA DIGITS
Interactive Deep Learning GPU Training System
Process Data | Configure DNN | Monitor Progress | Visualize Layers | Test Image
developer.nvidia.com/digits
github.com/NVIDIA/DIGITS
Slide 13
DGX STACK
Fully integrated Deep Learning platform
Instant productivity: plug-and-play, supports every AI framework
Performance optimized across the entire stack
Always up-to-date via the cloud
Mixed framework environments, containerized
Direct access to NVIDIA experts
Slide 16
DGX-1 CONTAINER LAUNCH FLOW
Customer data stays on premise
compute.nvidia.com (via web browser): Node Management, User Authentication, Docker Image push/pull, Scheduler UI, HW/SW Metrics
LOCAL LAN: All Application Data, NFS Storage, DIGITS UI, Interactive Sessions
1. User schedules containers to run
3. User interacts with the application
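In practice the launch flow comes down to pulling a framework container image and running it on the DGX-1 with GPU access. An illustrative command sequence (the image name and mount paths here are assumptions, not exact registry paths; `nvidia-docker` is the wrapper that exposes the GPUs to the container):

```shell
# Pull a containerized framework image (name illustrative)
docker pull nvidia/digits

# Run it with GPU access; application data stays on the local NFS mount
nvidia-docker run -d --name digits -p 5000:5000 \
    -v /mnt/nfs/datasets:/data \
    nvidia/digits

# The DIGITS UI is then reachable at http://<dgx-host>:5000
```

Only the scheduling and image metadata flow through compute.nvidia.com; the dataset never leaves the local LAN.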
Slide 17
DIGITS FOR DGX-1
A complete GPU-accelerated deep learning workflow: MANAGE / AUGMENT → TRAIN / TEST → DEPLOY
DIGITS (with the MODEL ZOO) handles data management and training in the DATA CENTER; the GPU INFERENCE ENGINE deploys trained models to DATA CENTER, AUTOMOTIVE, and EMBEDDED targets
Slide 18
BUILT FOR THE DATA CENTER
24/7 Uptime: maximize reliability
Scalable Performance: boost data center throughput
Data Center Ready: simplify system operations
Slide 19
END-TO-END DESIGN FOR SYSTEM UPTIME (24/7 Uptime)
Extensive Qualification & Testing: system qual tests (thermal, stress, airflow rate, shock & vibe); long burn-in testing; zero error tolerance at aggressive clocks; even with differentiated engineering, 5% of GPUs are screened out
Differentiated Engineering: low operating voltage for long-term reliability; large guard-band for guaranteed quality; Error Correction Code (ECC) for data integrity
Guaranteed Quality: system monitoring and management for Tesla GPUs only; dedicated technical staff for failure analysis
Slide 20
DYNAMIC PAGE RETIREMENT MAXIMIZES UPTIME (24/7 Uptime)
GPU without Dynamic Page Retirement (DPR): an uncorrectable data error causes the application to crash, and the weak memory page stays active.
1. Users lose productivity as jobs continue to crash
2. IT managers need to physically open up the server and remove the bad GPU
3. Customer satisfaction risk with the RMA process
Tesla GPU with Dynamic Page Retirement: the weak memory page is retired.
1. Removes bad memory with a simple reboot
2. No physical work required for IT
3. Negligible impact: <0.01% of memory is retired
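The retirement logic can be pictured with a toy model (purely conceptual, not NVIDIA's implementation; the threshold and bookkeeping here are assumptions): pages that repeatedly report ECC errors are blacklisted so the usable pool excludes them after the next reboot.

```python
RETIRE_THRESHOLD = 2  # hypothetical: retire a page after repeated errors


class GpuMemory:
    """Toy model of dynamic page retirement bookkeeping."""

    def __init__(self, num_pages):
        self.num_pages = num_pages
        self.error_counts = {}   # page -> observed ECC error count
        self.retired = set()     # pages removed from the usable pool

    def record_ecc_error(self, page):
        """Count an ECC error; blacklist the page once it looks weak."""
        self.error_counts[page] = self.error_counts.get(page, 0) + 1
        if self.error_counts[page] >= RETIRE_THRESHOLD:
            self.retired.add(page)  # takes effect on reboot in practice

    def usable_pages(self):
        return self.num_pages - len(self.retired)


mem = GpuMemory(num_pages=1_000_000)
mem.record_ecc_error(42)
mem.record_ecc_error(42)   # second error on the same page: retired
print(mem.usable_pages())  # 999999
```

The payoff matches the slide: a weak page costs a tiny fraction of capacity instead of a crashed job and a physical GPU swap.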
Slide 21
DATA CENTER QUALIFIED BY SERVER OEMS (Data Center Ready)
Server with Tesla GPU: designed for max airflow through the GPU; supports airflow front-to-back & back-to-front; lower power consumption; GPU temp running Linpack: 54C
Server with unqualified GPU: works against server airflow; higher power consumption; lower reliability; GPU temp running Linpack: 71C
Slide 22
SCALE-OUT PERFORMANCE IN THE DATA CENTER (Scalable Performance)
GPUDirect RDMA: direct transfers between GPUs
67% lower GPU-to-GPU latency
5x higher GPU-to-GPU MPI bandwidth
Up to 2x faster application performance at scale with GPUDirect RDMA
[Chart: HOOMD-blue LJ Liquid benchmark, 256K particles; time-steps per second vs. number of nodes (8, 16, 32, 64, 96), with and without RDMA]
Slide 23
NVLINK DELIVERS SCALABLE PERFORMANCE (Scalable Performance)
More than 45x faster with 8x P100 interconnected with NVLink
[Chart: speed-up vs. dual-socket Haswell CPU across Caffe/AlexNet, VASP, HOOMD-blue, COSMO, MILC, Amber, and HACC; configurations: 2x Haswell CPU, 2x K80 (M40 for AlexNet), 2x P100, 4x P100, 8x P100]
Slide 24
DATA CENTER GPU MANAGEMENT (Data Center Ready)
Enterprise-Grade Management Tool for Operating the Data Center
Device Management (all GPUs supported), per-GPU configuration & monitoring: device identification; board monitoring; clock management
Data Center GPU Manager (Tesla GPUs only):
Active Health Monitoring: runtime health checks; prologue checks; epilogue checks
Diagnostics & System Validation: deep HW diagnostics; system validation tests
Policy & Group Config Management: pre-configured policies; job-level accounting; stateful configuration
Power & Clock Mgmt: dynamic power capping; synchronous clock boost
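Day to day, these capabilities are driven through the `dcgmi` command-line client and the standard driver tools (illustrative invocations; DCGM must be installed, and sub-command flags can vary by version):

```shell
dcgmi discovery -l    # list the GPUs DCGM can manage
dcgmi diag -r 1       # quick run of the deep HW diagnostics (level 1)
nvidia-smi -q -d ECC  # per-GPU ECC status via the standard driver tool
```

Cluster schedulers typically wire the diagnostic runs into job prologue/epilogue scripts, matching the health-check flow on the slide.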
Slide 25
DATA CENTER GPU MANAGER (Data Center Ready)
Integrated into leading industry 3rd-party tools for HPC: Moab Cluster Suite, TORQUE, PBS Professional, IBM Platform HPC, IBM Platform LSF, Bright Cluster Manager, StackIQ Boss for HPC with CUDA Pallet, Grid Engine