In this video from the 2019 Stanford HPC Conference, Steve Oberlin from NVIDIA presents: HPC + Ai: Machine Learning Models in Scientific Computing.
"Most AI researchers and industry pioneers agree that the wide availability and low cost of highly-efficient and powerful GPUs and accelerated computing parallel programming tools (originally developed to benefit HPC applications) catalyzed the modern revolution in AI/deep learning. Clearly, AI has benefited greatly from HPC. Now, AI methods and tools are starting to be applied to HPC applications to great effect. This talk will describe an emerging workflow that uses traditional numeric simulation codes to generate synthetic data sets to train machine learning algorithms, then employs the resulting AI models to predict the computed results, often with dramatic gains in efficiency, performance, and even accuracy. Some compelling success stories will be shared, and the implications of this new HPC + AI workflow on HPC applications and system architecture in a post-Moore’s Law world considered."
Watch the video: https://youtu.be/SV3cnWf39kc
Learn more: https://nvidia.com
and
http://hpcadvisorycouncil.com/events/2019/stanford-workshop/
Sign up for our insideHPC Newsletter: http://insidehpc.com/newsletter
HPC + Ai: Machine Learning Models in Scientific Computing
1. HPC + AI: MACHINE LEARNING MODELS IN SCIENTIFIC COMPUTING
Steve Oberlin, CTO Accelerated Computing, NVIDIA
2. GRAND CHALLENGES REQUIRE MASSIVE COMPUTING
REINVENTING THE LI-ION BATTERY: 3M Node Hours | 7 Days on Titan
UNDERSTANDING HIV'S STRUCTURE: 10M Node Hours | 16 Days on Blue Waters
CLOUD-RESOLVING CLIMATE SIMULATIONS: 100M Node Hours | 840 Days on Piz Daint
9. NOW, JUST ADD HPC AND STIR…
[Chart: ImageNet challenge GPU entries grew from 4 (2010) to 60 to 110 (2014); Top-5 classification error fell 28% -> 26% -> 16% -> 12% -> 7% over 2010-2014]

Team               Date         Top-5 Test Error
GoogLeNet          2014         6.66%
Baidu Deep Image   01/12/2015   5.98%
Baidu Deep Image   02/05/2015   5.33%
Microsoft          02/05/2015   4.94%
Google             03/02/2015   4.82%
Baidu Deep Image   03/17/2015   4.83%

Classification Task: 1.2M images • 1000 object categories
Enter Deep Learning. Trained Human Performance: 5.1%
10. ALGORITHMS + BIG DATA + GPUS = THE BIG BANG OF MODERN AI
Milestones: IDSIA CNN on GPU; Stanford & NVIDIA large-scale DNN on GPU; U Toronto AlexNet on GPU; ImageNet; Google Photo; Captioning; NVIDIA BB8; Style Transfer; BRETT; Arterys FDA approval; AlphaGo; Super Resolution; Baidu Deep Voice; DuLight; NMT; superhuman ASR
Techniques: Auto Encoders; LSTM; GAN; Reinforcement Learning; Transfer Learning
recognition/classification -> recursion/time series -> generative
11. BEYOND RECOGNITION: DNNs Go Generative
Ting-Chun Wang, Ming-Yu Liu, Jun-Yan Zhu, Andrew Tao, Jan Kautz, and Bryan Catanzaro, "High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs", CVPR 2018.
13. BEYOND RECOGNITION: DNNs Go Generative
Aäron van den Oord et al. (Google DeepMind, London, UK), "WaveNet: A Generative Model for Raw Audio", https://arxiv.org/pdf/1609.03499.pdf
15. AI: A NEW COMPUTING PARADIGM
AI on a super-Moore's-Law progression
[Chart: AMBER performance (ns/day), Cellulose NVE dataset: K40 (2014, AMBER 14 / CUDA 4), K80 (2015, AMBER 14 / CUDA 6), P100 (2016, AMBER 16 / CUDA 8), V100 (2017, AMBER 16 / CUDA 9) — 4x in 3 years]
[Chart: GoogLeNet training throughput (images/s), ImageNet dataset: 8X K80 (2014, cuDNN 2 / CUDA 6), 8X Maxwell (2015, cuDNN 4 / CUDA 7), DGX-1 (2016, cuDNN 6 / CUDA 8 / NCCL 1.6), DGX-1V (2017, cuDNN 7 / CUDA 9 / NCCL 2) — 12x in 3 years, 65x in 5 years]
16. 2018: 10X AI GAIN IN ONE YEAR
PyTorch stack, time to train FAIRSEQ: DGX-1V (Sep '17) 15 days vs. DGX-2 (Q3 '18) 1.5 days
Software improvements across the stack, including NCCL, cuDNN, etc.
17. SOFTWARE, BY EXAMPLE
Deep Learning builds functions from examples of desired behavior.
Functions are the building blocks of software, and DL can approximate any function. Some functions are too complex to code by hand; generate those complex functions by example, and mix them freely with conventional software and algorithms.
[Diagram: a hurricane detector built as a neural network ŷ = f(obs), trained by an optimizer on examples labeled "hurricane" and "not a hurricane"]
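The "function from examples" idea above can be made concrete with a toy detector. This is a minimal sketch, not the talk's actual model: the two features and their distributions are invented stand-ins for real observations, and plain logistic regression stands in for the neural network.

```python
import numpy as np

# "Software by example": instead of hand-coding a hurricane detector,
# fit a function y_hat = f(obs) from labeled examples.
rng = np.random.default_rng(0)

# Hypothetical features: [max wind speed, central pressure anomaly].
# Hurricane-like events cluster at high wind / strong pressure drop.
hurricanes = rng.normal([70.0, -40.0], 5.0, size=(200, 2))
non_events = rng.normal([20.0, -5.0], 5.0, size=(200, 2))
X = np.vstack([hurricanes, non_events])
y = np.concatenate([np.ones(200), np.zeros(200)])

# Standardize features, then fit logistic regression by gradient descent
X = (X - X.mean(axis=0)) / X.std(axis=0)
w, b = np.zeros(2), 0.0
for _ in range(2000):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))     # predicted probabilities
    grad_w = X.T @ (p - y) / len(y)            # gradient of log-loss
    grad_b = np.mean(p - y)
    w -= 0.5 * grad_w
    b -= 0.5 * grad_b

accuracy = np.mean((p > 0.5) == y)
print(f"training accuracy: {accuracy:.2f}")
```

The "optimizer" box in the slide's diagram corresponds to the gradient-descent loop here; swapping the linear model for a deep network changes f, not the workflow.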
18. THE POWER OF LEARNING FROM DATA: Predicting Chaos
Jaideep Pathak, Brian Hunt, Michelle Girvan, Zhixin Lu, and Edward Ott, "Model-Free Prediction of Large Spatiotemporally Chaotic Systems from Data: A Reservoir Computing Approach", Phys. Rev. Lett. 120, 024102, 12 January 2018.
19. BIG DATA IN SCIENCE
Big Science ingests/outputs Big Data: Large Hadron Collider, Square Kilometre Array, Johns Hopkins Turbulence Database
20. RECOGNITION/CLASSIFICATION
Heterogeneous event selection at the CMS experiment: a DNN reconstructs a higher rate of events at lower power.
"Heterogeneous Event Selection at the CMS experiment"
http://drive.google.com/file/d/0B596cb8D9K9kZjJzdzBRdGY0NFk/preview
21. RECOGNITION/CLASSIFICATION -> CONTROL
DL for plasma fusion stability: deep learning delivers better accuracy (~95% true positives vs. ~80% with prior methods), promising control of a live ITER tokamak.
"Accelerated Deep Learning Discovery in Fusion Energy Science"
http://on-demand-gtc.gputechconf.com/gtc-quicklink/7zGB7j
22. RECOGNITION/CLASSIFICATION -> CONTROL
DL for adaptive optics: enabling clearer views from the world's largest ground-based telescopes.
"Helping the Discovery of New Galaxies on the World's Largest Telescopes Using a Large GPU Cluster"
http://on-demand-gtc.gputechconf.com/gtc-quicklink/ewiELDW
24. 2015: USING NUMERIC SIMULATIONS TO TRAIN AI
"Data-driven Fluid Simulations using Regression Forests", http://people.inf.ethz.ch/ladickyl/fluid_sigasia15.pdf
26. TRAINING A DEEP LEARNING HPC MODEL
[Workflow diagram: SIMULATION (FP64/FP32) generates DATA, split into a TRAINING SET and a REGRESSION SET; TRAINING (FP32/FP16) fits the model; INFERENCE (FP16/INT8) runs on NEW DATA; REGRESSION TESTING (FP16/INT8) measures ERRORS, which feed back into the training set]
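The simulation -> training -> inference -> regression-testing loop above can be sketched end to end. This is a toy, assumed version: an analytic function stands in for the expensive numeric simulation, and random-feature least-squares regression stands in for the deep network.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate(x):
    """Stand-in for an expensive numeric simulation: damped oscillation."""
    return np.exp(-0.3 * x) * np.sin(2.0 * x)

# SIMULATION: generate synthetic data, split into training and regression sets
x = rng.uniform(0.0, 6.0, 1000)
y = simulate(x)
x_train, y_train = x[:800], y[:800]
x_reg, y_reg = x[800:], y[800:]

# TRAINING: fit a surrogate model (random tanh features + least squares)
W = rng.normal(0.0, 2.0, 200)
B = rng.uniform(-6.0, 6.0, 200)
features = lambda x: np.tanh(np.outer(x, W) + B)
coef, *_ = np.linalg.lstsq(features(x_train), y_train, rcond=None)

# INFERENCE + REGRESSION TESTING: compare surrogate against the simulation
# on held-out data; large errors would flow back into the training set
errors = features(x_reg) @ coef - y_reg
rmse = np.sqrt(np.mean(errors ** 2))
print(f"held-out RMSE: {rmse:.2e}")
```

The key economics are in the last step: once trained, the surrogate is a cheap matrix product, while each call to the real simulation code would be orders of magnitude more expensive.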
28. DEEP LEARNING FOR QUANTUM CHEMISTRY
Background: Developing a new drug costs $2.5B and takes 10-15 years. Quantum chemistry (QC) simulations are important for accurately screening millions of potential drugs down to a few of the most promising candidates.
Challenge: QC simulation is computationally expensive, so researchers use approximations, compromising on accuracy. Screening 10M drug candidates takes 5 years of compute on CPUs.
Solution: Researchers at the University of Florida and the University of North Carolina leveraged GPU deep learning to develop ANAKIN-ME, which reproduces molecular energy surfaces at extreme speed (microseconds versus several minutes), with extremely high (DFT) accuracy, and at 5-6 orders of magnitude lower cost.
Impact: Faster, more accurate screening at far lower cost.

29. NEURAL NETWORK MODEL APPROACH
Training set: ~20M DFT data points; molecules with 1 to 8 atoms from the GDB database.
32. SATELLITE TO MODEL TRANSLATION
Automatically generate an inverse map from satellite radiances to weather-model variables. No analytic formula exists for such a conversion; data assimilation relies on a forward operator plus adjoint-sensitivity analysis. Deep learning can potentially obtain the inverse operator numerically.
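The trick of obtaining an inverse operator numerically can be sketched in a few lines. Everything here is a made-up stand-in: the forward operator below is a hypothetical monotone function, not a real radiative transfer model, and a polynomial fit stands in for the deep network.

```python
import numpy as np

rng = np.random.default_rng(1)

def forward(w):
    """Hypothetical forward operator: precipitable water (mm) -> radiance."""
    return 280.0 - 40.0 * np.tanh(0.02 * w)

# The forward map is cheap to evaluate, so generate (state, radiance) pairs
w = rng.uniform(0.0, 70.0, 2000)       # model variable samples
radiance = forward(w)                  # synthetic "observations"

# Fit the inverse map radiance -> state; standardize inputs for conditioning
mu, sigma = radiance.mean(), radiance.std()
coef = np.polyfit((radiance - mu) / sigma, w, deg=9)

# Apply the learned inverse to a fresh observation
w_hat = np.polyval(coef, (forward(35.0) - mu) / sigma)
print(f"recovered state for true w = 35.0: {w_hat:.2f}")
```

The same recipe scales up: run the forward model over many states, regress state on observation, and the regression becomes the inverse operator that no analytic formula provides.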
33. MODEL TRANSLATION BY CONDITIONAL GAN
An adversarial model outputs a physically plausible state from incomplete data, providing both forward and inverse maps for data assimilation and forecast verification.
Observation: GOES-15 band 3 | Model variable: GFS precipitable water | Training: 2014-2016 | Test: 2013
[Images: GOES-15 input with generated GFS target; GFS input with generated GOES-15 target]
34. DEEP LEARNING FOR MODEL CREATION: MIIDAPS-AI
Multi-Instrument Inversion and Data Assimilation Preprocessing System
Sid Boukabara (NOAA/NESDIS); Eric Maddy, Adam Neiss (Riverside Technology Inc.)
An inverse operator for multiple IR and microwave satellites, iteratively using the CRTM radiative transfer model. 5 seconds vs. 2 hours to process one day of data: a ~1400x speedup.
35. SLOW-MOTION SATELLITE LOOP
David Hall, NVIDIA
Input: GOES-15 band 3, GFS winds; 1 frame every 3 hours (11 input images)
Output: interpolated GOES-15; 1 frame every 18 minutes (110 output frames)
Applications: visualization, data augmentation, replacing dropped frames, reducing storage requirements
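The cadence arithmetic above is simply 3 hours / 10 = 18 minutes per output frame. As a baseline stand-in for the slide's learned interpolation model, here is plain linear blending between consecutive keyframes (toy arrays in place of real imagery); note that with endpoint bookkeeping, 11 keyframes yield 101 frames rather than the slide's 110, which presumably counts 10 outputs per input.

```python
import numpy as np

def interpolate_frames(frames, steps=10):
    """Insert `steps` evenly spaced blends between each consecutive pair."""
    out = []
    for a, b in zip(frames[:-1], frames[1:]):
        for t in np.linspace(0.0, 1.0, steps, endpoint=False):
            out.append((1.0 - t) * a + t * b)   # linear cross-fade
    out.append(frames[-1])                       # keep the final keyframe
    return out

# 11 toy "images" at 3-hour cadence; constant values 0..10 for clarity
keyframes = [np.full((4, 4), float(v)) for v in range(11)]
frames = interpolate_frames(keyframes)
print(len(frames))  # 101 frames at 18-minute cadence
```

A learned interpolator replaces the cross-fade with motion-aware synthesis (e.g., advecting cloud features along GFS winds), which is why the DL result looks like real cloud motion rather than a dissolve.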
36. DEEP LEARNING FOR CLIMATE MODELING
Resolving physics at sub-grid dimensions: DL enables faster, more accurate climate modeling and predictions.
37. DEEP LEARNING FOR CLIMATE ANALYTICS
Automating extreme weather detection in climate model output. 2018 Gordon Bell Prize winner: 1.13 EFLOPS (training at mixed precision).
Thorsten Kurth, Sean Treichler, Joshua Romero, Mayur Mudigonda, Nathan Luehr, Everett Phillips, Ankur Mahesh, Michael Matheson, Jack Deslippe, Massimiliano Fatica, Prabhat, and Mike Houston, "Exascale Deep Learning for Climate Analytics", SC 2018.
38. RESPECTING PHYSICS: Deep Learning for CFD
Physics-Informed Neural Networks constrain the learned solution with conservation laws:
[Equations: mass conservation, momentum conservation, transport]
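The slide's equation images did not survive extraction. For an incompressible flow with a passive scalar, the three labeled constraints typically take the standard textbook forms below; these are assumed forms, not necessarily the exact equations shown in the talk.

```latex
% Mass conservation (incompressibility):
\nabla \cdot \mathbf{u} = 0

% Momentum conservation (incompressible Navier-Stokes):
\frac{\partial \mathbf{u}}{\partial t} + (\mathbf{u} \cdot \nabla)\mathbf{u}
  = -\frac{1}{\rho}\nabla p + \nu\,\nabla^{2}\mathbf{u}

% Transport of a passive scalar c with diffusivity D:
\frac{\partial c}{\partial t} + \mathbf{u} \cdot \nabla c = D\,\nabla^{2} c
```

A physics-informed network adds the residuals of these equations, evaluated at sampled collocation points, to the data-fitting loss, so the learned field respects the physics even where no training data exists.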