In this video from the 2019 Stanford HPC Conference, Steve Oberlin from NVIDIA presents: HPC + Ai: Machine Learning Models in Scientific Computing.
"Most AI researchers and industry pioneers agree that the wide availability and low cost of highly-efficient and powerful GPUs and accelerated computing parallel programming tools (originally developed to benefit HPC applications) catalyzed the modern revolution in AI/deep learning. Clearly, AI has benefited greatly from HPC. Now, AI methods and tools are starting to be applied to HPC applications to great effect. This talk will describe an emerging workflow that uses traditional numeric simulation codes to generate synthetic data sets to train machine learning algorithms, then employs the resulting AI models to predict the computed results, often with dramatic gains in efficiency, performance, and even accuracy. Some compelling success stories will be shared, and the implications of this new HPC + AI workflow on HPC applications and system architecture in a post-Moore’s Law world considered."
Watch the video: https://youtu.be/SV3cnWf39kc
Learn more: https://nvidia.com
and
http://hpcadvisorycouncil.com/events/2019/stanford-workshop/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
HPC + Ai: Machine Learning Models in Scientific Computing
1. HPC + AI: MACHINE LEARNING MODELS IN SCIENTIFIC COMPUTING
Steve Oberlin, CTO Accelerated Computing, NVIDIA
2. GRAND CHALLENGES REQUIRE MASSIVE COMPUTING
REINVENTING THE LI-ION BATTERY: 3M Node Hours | 7 Days on Titan
UNDERSTANDING HIV'S STRUCTURE: 10M Node Hours | 16 Days on Blue Waters
CLOUD-RESOLVING CLIMATE SIMULATIONS: 100M Node Hours | 840 Days on Piz Daint
9. NOW, JUST ADD HPC AND STIR…
[Chart: ImageNet challenge GPU entries grew from 4 (2010) to 60 to 110 (2014); Top-5 classification error fell 28% -> 26% -> 16% -> 12% -> 7% over 2010-2014]

Team               Date         Top-5 Test Error
GoogLeNet          2014         6.66%
Baidu Deep Image   01/12/2015   5.98%
Baidu Deep Image   02/05/2015   5.33%
Microsoft          02/05/2015   4.94%
Google             03/02/2015   4.82%
Baidu Deep Image   03/17/2015   4.83%

Classification Task: 1.2M images • 1000 object categories
Enter Deep Learning. Trained Human Performance: 5.1%
10. ALGORITHMS + BIG DATA + GPUS = THE BIG BANG OF MODERN AI
Milestones: IDSIA CNN on GPU; Stanford & NVIDIA large-scale DNN on GPU; U Toronto AlexNet on GPU; ImageNet; Google Photo; Captioning; NVIDIA BB8; Style Transfer; BRETT; Arterys FDA approval; AlphaGo; Super Resolution; Baidu Deep Voice; DuLight; NMT; superhuman ASR
Techniques: Auto Encoders; LSTM; GAN; Reinforcement Learning; Transfer Learning
recognition/classification -> recursion/time series -> generative
11. BEYOND RECOGNITION: DNNs Go Generative
Ting-Chun Wang, Ming-Yu Liu, Jun-Yan Zhu, Andrew Tao, Jan Kautz, and Bryan Catanzaro, "High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs", CVPR 2018.
13. BEYOND RECOGNITION: DNNs Go Generative
Aäron van den Oord et al. (Google DeepMind, London, UK), "WaveNet: A Generative Model for Raw Audio", https://arxiv.org/pdf/1609.03499.pdf
15. AI: A NEW COMPUTING PARADIGM
AI on a super-Moore's-Law progression
[Chart: AMBER performance (ns/day), Cellulose NVE dataset: K40 (2014, AMBER 14 / CUDA 4), K80 (2015, AMBER 14 / CUDA 6), P100 (2016, AMBER 16 / CUDA 8), V100 (2017, AMBER 16 / CUDA 9) — 4x in 3 years]
[Chart: GoogLeNet training throughput (images/s), ImageNet dataset: 8X K80 (2014, cuDNN 2 / CUDA 6), 8X Maxwell (2015, cuDNN 4 / CUDA 7), DGX-1 (2016, cuDNN 6 / CUDA 8 / NCCL 1.6), DGX-1V (2017, cuDNN 7 / CUDA 9 / NCCL 2) — 12x in 3 years, 65x in 5 years]
16. 2018: 10X AI GAIN IN ONE YEAR
PyTorch stack, time to train FAIRSEQ: DGX-1V (Sep '17) 15 days vs. DGX-2 (Q3 '18) 1.5 days
Software improvements across the stack, including NCCL, cuDNN, etc.
17. SOFTWARE, BY EXAMPLE
Deep Learning builds functions from examples of desired behavior.
Functions are the building blocks of software, and DL can approximate any function. Some functions are too complex to code by hand; generate those complex functions by example, and mix them freely with conventional software and algorithms.
[Diagram: a hurricane detector built as a neural network ŷ = f(obs), trained by an optimizer on examples labeled "hurricane" and "not a hurricane"]
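The "function from examples" idea above can be made concrete with a toy detector. This is a minimal sketch, not the talk's actual model: the two features and their distributions are invented stand-ins for real observations, and plain logistic regression stands in for the neural network.

```python
import numpy as np

# "Software by example": instead of hand-coding a hurricane detector,
# fit a function y_hat = f(obs) from labeled examples.
rng = np.random.default_rng(0)

# Hypothetical features: [max wind speed, central pressure anomaly].
# Hurricane-like events cluster at high wind / strong pressure drop.
hurricanes = rng.normal([70.0, -40.0], 5.0, size=(200, 2))
non_events = rng.normal([20.0, -5.0], 5.0, size=(200, 2))
X = np.vstack([hurricanes, non_events])
y = np.concatenate([np.ones(200), np.zeros(200)])

# Standardize features, then fit logistic regression by gradient descent
X = (X - X.mean(axis=0)) / X.std(axis=0)
w, b = np.zeros(2), 0.0
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))     # predicted probabilities
    grad_w = X.T @ (p - y) / len(y)            # gradient of log-loss
    grad_b = np.mean(p - y)
    w -= 0.5 * grad_w
    b -= 0.5 * grad_b

accuracy = np.mean((p > 0.5) == y)
print(f"training accuracy: {accuracy:.2f}")
```

The "optimizer" box in the slide's diagram corresponds to the gradient-descent loop here; swapping the linear model for a deep network changes f, not the workflow.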
18. THE POWER OF LEARNING FROM DATA: Predicting Chaos
Jaideep Pathak, Brian Hunt, Michelle Girvan, Zhixin Lu, and Edward Ott, "Model-Free Prediction of Large Spatiotemporally Chaotic Systems from Data: A Reservoir Computing Approach", Phys. Rev. Lett. 120, 024102, 12 January 2018.
19. BIG DATA IN SCIENCE
Big Science ingests/outputs Big Data: Large Hadron Collider, Square Kilometre Array, Johns Hopkins Turbulence Database
20. RECOGNITION/CLASSIFICATION
Heterogeneous event selection at the CMS experiment: a DNN reconstructs a higher rate of events at lower power.
"Heterogeneous Event Selection at the CMS experiment"
http://drive.google.com/file/d/0B596cb8D9K9kZjJzdzBRdGY0NFk/preview
21. RECOGNITION/CLASSIFICATION -> CONTROL
DL for plasma fusion stability: deep learning delivers better accuracy (~95% true positives vs. ~80% with prior methods), promising control of a live ITER tokamak.
"Accelerated Deep Learning Discovery in Fusion Energy Science"
http://on-demand-gtc.gputechconf.com/gtc-quicklink/7zGB7j
22. RECOGNITION/CLASSIFICATION -> CONTROL
DL for adaptive optics: enabling clearer views from the world's largest ground-based telescopes.
"Helping the Discovery of New Galaxies on the World's Largest Telescopes Using a Large GPU Cluster"
http://on-demand-gtc.gputechconf.com/gtc-quicklink/ewiELDW
24. 2015: USING NUMERIC SIMULATIONS TO TRAIN AI
"Data-driven Fluid Simulations using Regression Forests", http://people.inf.ethz.ch/ladickyl/fluid_sigasia15.pdf
26. TRAINING A DEEP LEARNING HPC MODEL
[Workflow diagram: SIMULATION (FP64/FP32) generates DATA, split into a TRAINING SET and a REGRESSION SET; TRAINING (FP32/FP16) fits the model; INFERENCE (FP16/INT8) runs on NEW DATA; REGRESSION TESTING (FP16/INT8) measures ERRORS, which feed back into the training set]
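The simulation -> training -> inference -> regression-testing loop above can be sketched end to end. This is a toy, assumed version: an analytic function stands in for the expensive numeric simulation, and random-feature least-squares regression stands in for the deep network.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(x):
    """Stand-in for an expensive numeric simulation: damped oscillation."""
    return np.exp(-0.3 * x) * np.sin(2.0 * x)

# SIMULATION: generate synthetic data, split into training and regression sets
x = rng.uniform(0.0, 6.0, 1000)
y = simulate(x)
x_train, y_train = x[:800], y[:800]
x_reg, y_reg = x[800:], y[800:]

# TRAINING: fit a surrogate model (random tanh features + least squares)
W = rng.normal(0.0, 2.0, 200)
B = rng.uniform(-6.0, 6.0, 200)
features = lambda x: np.tanh(np.outer(x, W) + B)
coef, *_ = np.linalg.lstsq(features(x_train), y_train, rcond=None)

# INFERENCE + REGRESSION TESTING: compare surrogate against the simulation
# on held-out data; large errors would flow back into the training set
errors = features(x_reg) @ coef - y_reg
rmse = np.sqrt(np.mean(errors ** 2))
print(f"held-out RMSE: {rmse:.2e}")
```

The key economics are in the last step: once trained, the surrogate is a cheap matrix product, while each call to the real simulation code would be orders of magnitude more expensive.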
28. DEEP LEARNING FOR QUANTUM CHEMISTRY
Background: Developing a new drug costs $2.5B and takes 10-15 years. Quantum chemistry (QC) simulations are important for accurately screening millions of potential drugs down to a few of the most promising candidates.
Challenge: QC simulation is computationally expensive, so researchers use approximations, compromising on accuracy. Screening 10M drug candidates takes 5 years of compute on CPUs.
Solution: Researchers at the University of Florida and the University of North Carolina leveraged GPU deep learning to develop ANAKIN-ME, which reproduces molecular energy surfaces at extreme speed (microseconds versus several minutes), with extremely high (DFT) accuracy, and at 5-6 orders of magnitude lower cost.
Impact: Faster, more accurate screening at far lower cost.

29. NEURAL NETWORK MODEL APPROACH
Training set: ~20M DFT data points; molecules with 1 to 8 atoms from the GDB database.
32. SATELLITE TO MODEL TRANSLATION
Automatically generate an inverse map from satellite radiances to weather-model variables. No analytic formula exists for such a conversion; data assimilation relies on a forward operator plus adjoint-sensitivity analysis. Deep learning can potentially obtain the inverse operator numerically.
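The trick of obtaining an inverse operator numerically can be sketched in a few lines. Everything here is a made-up stand-in: the forward operator below is a hypothetical monotone function, not a real radiative transfer model, and a polynomial fit stands in for the deep network.

```python
import numpy as np

rng = np.random.default_rng(1)

def forward(w):
    """Hypothetical forward operator: precipitable water (mm) -> radiance."""
    return 280.0 - 40.0 * np.tanh(0.02 * w)

# The forward map is cheap to evaluate, so generate (state, radiance) pairs
w = rng.uniform(0.0, 70.0, 2000)       # model variable samples
radiance = forward(w)                  # synthetic "observations"

# Fit the inverse map radiance -> state; standardize inputs for conditioning
mu, sigma = radiance.mean(), radiance.std()
coef = np.polyfit((radiance - mu) / sigma, w, deg=9)

# Apply the learned inverse to a fresh observation
w_hat = np.polyval(coef, (forward(35.0) - mu) / sigma)
print(f"recovered state for true w = 35.0: {w_hat:.2f}")
```

The same recipe scales up: run the forward model over many states, regress state on observation, and the regression becomes the inverse operator that no analytic formula provides.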
33. MODEL TRANSLATION BY CONDITIONAL GAN
An adversarial model outputs a physically plausible state from incomplete data, providing both forward and inverse maps for data assimilation and forecast verification.
Observation: GOES-15 band 3 | Model variable: GFS precipitable water | Training: 2014-2016 | Test: 2013
[Images: GOES-15 input with generated GFS target; GFS input with generated GOES-15 target]
34. DEEP LEARNING FOR MODEL CREATION: MIIDAPS-AI
Multi-Instrument Inversion and Data Assimilation Preprocessing System
Sid Boukabara (NOAA/NESDIS); Eric Maddy, Adam Neiss (Riverside Technology Inc.)
An inverse operator for multiple IR and microwave satellites, iteratively using the CRTM radiative transfer model. 5 seconds vs. 2 hours to process one day of data: a ~1400x speedup.
35. SLOW-MOTION SATELLITE LOOP
David Hall, NVIDIA
Input: GOES-15 band 3, GFS winds; 1 frame every 3 hours (11 input images)
Output: interpolated GOES-15; 1 frame every 18 minutes (110 output frames)
Applications: visualization, data augmentation, replacing dropped frames, reducing storage requirements
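The cadence arithmetic above is simply 3 hours / 10 = 18 minutes per output frame. As a baseline stand-in for the slide's learned interpolation model, here is plain linear blending between consecutive keyframes (toy arrays in place of real imagery); note that with endpoint bookkeeping, 11 keyframes yield 101 frames rather than the slide's 110, which presumably counts 10 outputs per input.

```python
import numpy as np

def interpolate_frames(frames, steps=10):
    """Insert `steps` evenly spaced blends between each consecutive pair."""
    out = []
    for a, b in zip(frames[:-1], frames[1:]):
        for t in np.linspace(0.0, 1.0, steps, endpoint=False):
            out.append((1.0 - t) * a + t * b)   # linear cross-fade
    out.append(frames[-1])                       # keep the final keyframe
    return out

# 11 toy "images" at 3-hour cadence; constant values 0..10 for clarity
keyframes = [np.full((4, 4), float(v)) for v in range(11)]
frames = interpolate_frames(keyframes)
print(len(frames))  # 101 frames at 18-minute cadence
```

A learned interpolator replaces the cross-fade with motion-aware synthesis (e.g., advecting cloud features along GFS winds), which is why the DL result looks like real cloud motion rather than a dissolve.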
36. DEEP LEARNING FOR CLIMATE MODELING
Resolving physics at sub-grid dimensions: DL enables faster, more accurate climate modeling and predictions.
37. DEEP LEARNING FOR CLIMATE ANALYTICS
Automating extreme weather detection in climate model output. 2018 Gordon Bell Prize winner: 1.13 EFLOPS (training at mixed precision).
Thorsten Kurth, Sean Treichler, Joshua Romero, Mayur Mudigonda, Nathan Luehr, Everett Phillips, Ankur Mahesh, Michael Matheson, Jack Deslippe, Massimiliano Fatica, Prabhat, and Mike Houston, "Exascale Deep Learning for Climate Analytics", SC 2018.
38. RESPECTING PHYSICS: Deep Learning for CFD
Physics-Informed Neural Networks constrain the learned solution with conservation laws:
[Equations: mass conservation, momentum conservation, transport]
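The slide's equation images did not survive extraction. For an incompressible flow with a passive scalar, the three labeled constraints typically take the standard textbook forms below; these are assumed forms, not necessarily the exact equations shown in the talk.

```latex
% Mass conservation (incompressibility):
\nabla \cdot \mathbf{u} = 0

% Momentum conservation (incompressible Navier-Stokes):
\frac{\partial \mathbf{u}}{\partial t} + (\mathbf{u} \cdot \nabla)\mathbf{u}
  = -\frac{1}{\rho}\nabla p + \nu\,\nabla^{2}\mathbf{u}

% Transport of a passive scalar c with diffusivity D:
\frac{\partial c}{\partial t} + \mathbf{u} \cdot \nabla c = D\,\nabla^{2} c
```

A physics-informed network adds the residuals of these equations, evaluated at sampled collocation points, to the data-fitting loss, so the learned field respects the physics even where no training data exists.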