1. 1
http://research.idi.ntnu.no/hpc-lab
The Power of GPU Computing
Thomas L. Falch and Dr. Anne C. Elster(*)
HPC-Lab, Dept. Computer and Info. Science
Norwegian University of Science & Technology
Trondheim, Norway
(*) Elster also holds a 0% Visiting Scientist appointment at
ICES, UT Austin, where she spends summers and sabbaticals
Falch & Elster: The Power of GPU Computing
NTNU CSE Seminar Oct 2, 2013
3. 3
Dr. Elster's HPC-Lab currently focuses on research related to novel GPU and multi-core architectures
> 40 Master students (since 2001)
> 15 masters projects on GPU for HPC
• Parallelization of Seismic and Image Related Applications on GPUs and Multi-Cores
• Modeling Heterogeneous Systems
• Parallel and Distributed Algorithms and Tools
• Performance Evaluation and Benchmarking
• Adaptive and Auto-Tuneable Algorithms and Implementations
Collaborators / Supporters:
AMD, ARM, CERN, NVIDIA, Statoil, Schlumberger, GE-Healthcare, and others
4. 4
Outline
• Introduction to GPU computing
• Overview of GPU projects at the HPC-Lab
– 3D Real-Time Snow simulation
– Flow simulations (porous media)
– Surface extraction
• Visualization of scattered point data
7. 7
The “Walls” (ref. Dr. David Patterson)
To increase processor performance one can:
1. Increase the system clock speed -> Power Wall(*)
2. Increase memory bandwidth -> more complex
3. Parallelize -> more complex
(*) The Power Wall: too much heat, and transistor performance degrades
(more power leakage as power increases)!
Now maxing out at 3-4GHz for general processors
11. 11
How to get to Exascale?
Limited by Power!
Solution?
BOF at SC'10 by Elster, Vazquez-Poletti & Perhac:
Towards Exa-Scale: Heterogeneous Clouds
(CPUs, GPUs and Embedded Devices)
12. 12
Outline
• Introduction to GPU computing
• Overview of GPU projects at the HPC-Lab
– 3D Real-Time Snow simulation
– Flow simulations (porous media)
– Surface extraction
• Visualization of scattered point data
13. 13
NTNU GPU Activities
NTNU is an NVIDIA CUDA Research & Teaching Center
• Elster teaches a senior parallel computing class with 50+ students
• Elster's HPC-Lab has graduated 20+ Master students in GPU computing (2007-2013)
14. 14
HPC-Lab History (last 6 yrs):
Fall 2006:
• First 2 student projects with GPU programming (Cg)
Christian Larsen (MS Fall Project, December 2006):
“Utilizing GPUs on Cluster Computers” (joint with Schlumberger)
Erik Axel Nielsen asks for FX 4800 card for project with GE Healthcare
• Elster head of Computational Science & Visualization program and helped NTNU acquire new IBM Supercomputer (Njord, 7+ TFLOPS, proprietary switch)
15. 15
HPC-Lab History (contin.):
2007:
Erik Axel Nielsen (Masters thesis, June 2007):
“Real-time Wavelet Filtering on the GPU” -- joint project with GE Healthcare.
The 40x GPU speedup of the algorithm led to our implementation being adopted the same fall in
their high-end cardiovascular ultrasound scanner.
Christian Larsen (Masters thesis, June 2007) Tore Fevang, Schlumberger (co-advisor):
"Framework for Polygonial Structures Computations on Clusters” (incl GPU parallelization)
Idar Borlaug (Masters thesis, June 2007):
“ Seismic Processing Using Parallel 3D FMM”
Thibault Collet (Masters thesis summer 2007):
"Massively Online Games with Food Chains"
Knut Imar Hagen (Masters thesis, June 2007)
“Fault-tolerance for MPI Codes on Computation
Clusters” (joint project with Statoil)
Nils Magnus Larsgård (Masters thesis summer 2007):
“Framework for Converting MPI Codes to Hybrid OpenMP/MPI Codes”
17. 17
HPC-Lab History (contin.):
2008:
HPC-Lab at IDI/NTNU opens in Oct. with
• Several NVIDIA donations
• Several quad-core machines (1-2 donated by Schlumberger)
Atle Rudshaug (Masters thesis, June 2008): “Optimizing & Parallelizing a Large
Commercial Code for Modeling Oil-well Networks” -- joint project with Yggdrasil
Andreas Bach (Masters thesis, September 2008): “Profiling and Optimizing
a Seismic Application on Modern Architectures” -- joint project with Statoil
Rune Hovland (Masters project, Dec 2008) :
"Latency and Bandwidth Impact on GPU Systems" (ParCo 2009 w/ Elster)
Daniele Giuseppe Spampinato (Masters Project, December 2008):
"Linear Optimizations with CUDA" (IPDPS MTAAP 2009 w/ Elster)
18. 18
Selected Master theses and Master reports
supervised by Dr. Elster in 2009
1) Robin Eidissen (Masters thesis, January 2009):
"Utilizing GPUs for Real-Time Visualization of Snow" (demoed @ SC'08-SC'10)
Eirik Aksnes and Henrik Hesland (MS Project, Jan 2009) :
"GPU Techniques for Porous Rock Visualization”
2) Rune Erlend Jensen (Masters thesis, May 2009, currently PhD student at HPC-Lab) :
"Techniques and Tools for Optimizing Codes on Modern Architectures:
A Low-Level Approach” (NR MS Thesis Award!)
3) Rune Johan Hovland (Masters thesis, June 2009),
Dr. Magnus Lie Hetland (co-advisor): "Throughput Computing on Future GPUs”
4) Henrik Hesland (Masters thesis, June 2009) Thorvald Natvig (co-advisor):
"GPU-Enabled Interactive Pore Detection for 3D Rock Visualization "
5) Eirik Ola Aksnes (Masters thesis, July 2009)
Ståle Fjeldstand & Atle Rudshaug, Numerical Rocks (co-advisors):
"Simulation of Fluid Flow Through Porous Rocks on Modern GPUs" (ParCo 2009)
6) Daniel Haugen (Masters thesis, July 2009) Tore Fevang, Schlumberger (co-advisor):
"Seismic Data Compression and GPU Memory Latency"
7) Åsmund Herikstad (Masters thesis, July 2009) Svein-Erik Måsøy, MedTek, NTNU (co-advisor)
"Parallel Techniques for Estimation and Correction of Aberration in Medical Ultrasound Imaging"
8) Owe Johansen (Masters thesis, July 2009) John Hybertsen & Jon André Haugen, Statoil (co-advisors):
"Seismic Shot Processing on GPU"
9) Daniele Giuseppe Spampinato (Masters thesis, July 2009; currently PhD student @ ETH)
"Modeling Communication on Multi-GPU Systems” (ParCo 2009)
19. 19
HPC-Lab -- Spring 2010
Dr. Anne C. Elster, Lab Director
Dr. John P. Ryan, Post Doc
Dr. Jan Perhac, Post Doc
Jan Christian Meyer (PhD stud.)
Thorvald Natvig (PhD stud.)
Rune E. Jensen (PhD stud.)
Master Students – Spring 2010
• Ahmed Aqrawi (Assist. TDT 4200)
• Aleksander Gjermundsen
• Andreas Hysing
• Øystein Krog
• Holger Ludvigsen (Assist. TDT 4205)
+ 2 Cybernetics students
+ 3 visualization students
+ 1-2 || arch/multicore students
+ 1 Marine student
Affiliates / Visitors
• Eirik O. Aksnes (tentative PhD, now consultant for Statoil)
• Gagandeep Singh (Math)
Refsnaes & Singh did FEM on GPU - collaboration between Kvamsdal & Elster
20. 20
HPC-Lab History (contin.):
2010:
• NVIDIA Fermi-based cards (470, C2050, C2070 (fall))
• More on OpenCL
Ahmed A. Aqrawi (Masters thesis, June 2010):
“Effects of Compression on Data Intensive Algorithms”
Aleksander Gjermundsen (Masters thesis, July 2010):
“Audio Processing on GPU”
Andreas Hysing (Masters thesis, Aug 2010): Parallel Inversion code (w/Statoil)
Øystein Krog (Masters thesis, June 2010):
“GPU-based Real-Time Snow Avalanche Simulations” (SPH)
Holger Ludvigsen (Masters thesis, June 2010), Dr. Frank Lindseth (co-advisor):
“Real-Time GPU-Based 3D Ultrasound Reconstruction
and Visualization”
Thorvald Natvig (PhD Dec 2010) “Automatic Run-Time Communication and I/O”
21. 21
HPC-Lab -- 2011
(Elster on sabbatical 2010/11)
Dr. Anne C. Elster, Lab Director
Dr. Ian Karlin, Post Doc
Jan Christian Meyer (PhD stud.)
Rune E. Jensen (PhD stud.)
Erik Smistad (PhD stud., Elster co-advisor)
Master Students – Spring 2011
• Fredrik Fossum (GPU rigid body simulation)
• Yngve S. Lindal (GPU proj @ CERN)
• Ole-Martin Brende (MedTech)
• Ove Stinessen (Statoil proj)
• Jarle Stensland (OpenCL BLAS)
• Thor Kristian Valderhaug (Numerical Rocks proj, Multi-GPU LBM)
• Geir Jostein Lien (2-yr Master Informatics, graduated 2012)
Affiliates / Visitors
• Miguel Martinez-delAmor (PhD student from Spain, Fall 2011)
22. 22
HPC-Lab: Master theses 2012
Kjetil Babington: Terrain Rendering Techniques for the HPC-Lab Snow Simulator
Thomas Løfsgaard Falch: 3D Visualization of X-ray Diffraction Data
Geir Jostein Lien: Auto-tunable GPU BLAS
Jan Magne Rovde: Real-Time Granular Flow Simulation Using
the PCISPH Method on GPGPU Devices Using CUDA
Frederik Magnus Johansen Vestre:
Enhancing and Porting the HPC-Lab Snow Simulator to OpenCL
on Mobile Platforms
Jan Christian Meyer, PhD Thesis (December):
Performance Modeling of Heterogeneous Systems
24. 24
HPC-Lab: Master Theses 2013
Lars Kirkholt Melhus (June):
Analyzing Contextual Bias of Program Execution on Modern CPUs
Magnus Mikalsen (June): OpenACC-based Snow Simulation
Andreas Nordahl (June): Enhancing the HPC-Lab Snow Simulator
with More Realistic Terrains and Other Interactive Features
Lars Espen Nordhus (June): Ray Tracing for Simulation of Wireless Networks
in 3D Scenes
Stian Aaraas Pedersen (June): Progressive Photon Mapping on GPUs
Andreas Skomedal (June): Heterogeneous FDTD for Seismic Processing
Henrik Holenbakken Knutsen (Sept): Enhancing Software Portability
with Hardware Parametrized Autotuning
25. 25
HPC-Lab: 2013/2014
Anne C. Elster – Director
Malik Khan – Post Doc to start Nov 1, 2013
PhD students:
● Rune Jensen
● Thomas Falch
● Samira Pakdel
● Ruben Spaans
Co-supervised by Elster:
● Johannes Kvam
● Erik Smistad
● Mehdi Bozorgi
● Lane Holloway (UT ECE Student)
+ 8 master students & 2 MedTech PhD students
26. 26
Outline
• Introduction to GPU computing
• Overview of GPU projects at the HPC-Lab
– 3D Real-Time Snow simulation
– Flow simulations (porous media)
– Surface extraction
• Visualization of scattered point data
30. 30
Add Road Generation
(Used A* algorithm, Demo @ SC11)
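The slide names A* but does not give the cost function or the terrain representation used in the snow simulator. As a minimal sketch of the algorithm itself, assuming a hypothetical 2D occupancy grid with unit step costs and a Manhattan-distance heuristic, a path search could look like this:

```python
import heapq

def a_star(grid, start, goal):
    """Shortest 4-connected path on a grid; grid[r][c] == 1 marks a blocked cell.

    Hypothetical setup for illustration: unit step cost, Manhattan heuristic.
    Returns the path as a list of (row, col) cells, or None if no path exists.
    """
    rows, cols = len(grid), len(grid[0])

    def h(cell):
        # Admissible heuristic for unit-cost, 4-connected movement.
        return abs(cell[0] - goal[0]) + abs(cell[1] - goal[1])

    open_heap = [(h(start), 0, start)]   # entries are (f = g + h, g, cell)
    came_from = {start: None}
    best_g = {start: 0}
    while open_heap:
        _, g, cell = heapq.heappop(open_heap)
        if cell == goal:
            path = []
            while cell is not None:      # walk parents back to the start
                path.append(cell)
                cell = came_from[cell]
            return path[::-1]
        if g > best_g[cell]:
            continue                     # stale heap entry
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and grid[nr][nc] == 0:
                ng = g + 1
                if ng < best_g.get((nr, nc), float("inf")):
                    best_g[(nr, nc)] = ng
                    came_from[(nr, nc)] = cell
                    heapq.heappush(open_heap, (ng + h((nr, nc)), ng, (nr, nc)))
    return None
```

For road generation over terrain, the unit step cost would presumably be replaced by a slope-dependent cost; that choice is not stated on the slide.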
31. 31
Add Ray-Tracing
32. 32
Outline
• Introduction to GPU computing
• Overview of GPU projects at the HPC-Lab
– 3D Real-Time Snow simulation
– Flow simulations (porous media)
– Surface extraction
• Visualization of scattered point data
33. 33
Simulations of Fluid Flow through
Porous Rocks using GPUs
Eirik Ola Aksnes & A.C. Elster (ParCo 2009)
+ current work with Thor Kristian Valderhaug using OpenCL
In collaboration with :
Numerical Rocks & NTNU Chemistry Dept.
Use the Lattice Boltzmann Method (LBM)
34. 34
Outline
• Introduction to GPU computing
• Overview of GPU projects at the HPC-Lab
– 3D Real-Time Snow simulation
– Flow simulations (porous media)
– Surface extraction
• Visualization of scattered point data
37. 37
3D Surface Extraction on GPUs
• Use Marching Cubes – algorithm for extracting a 3D
surface from a set of sampled scalars
• Algorithm used extensively for visualizing and analyzing
medical data (X-ray, MR) and the result of 3D
segmentation.
• Completely data parallel
• Challenge: How to store the result of each cube in parallel on the GPU
38. 38
3D Surface Extraction
-- Histogram data
• Challenge: How to store the result of each cube in parallel on GPU?
In a serial implementation this is simple – just use a stack and add the vertex data to the stack
• GPU Solution: Histogram Pyramids [1]
• A data structure that:
– Filters out cubes that have no triangles (stream reduction)
– Returns the total number of triangles
– Provides each cube with an index for memory storage
– Can be used efficiently by means of textures, yielding large speed-ups
[1] G. Ziegler et al: On-the-fly Point Clouds through Histogram Pyramids; Vision, Modeling, and Visualization 2006
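To make the three properties above concrete, here is a minimal CPU sketch of the idea, under simplifying assumptions: the cubes are laid out in a 1D array and each level sums pairs of elements (the GPU versions reduce 2x2 or 2x2x2 neighbourhoods stored in textures). The function names `build_pyramid` and `lookup` are hypothetical and not taken from the talk.

```python
def build_pyramid(counts):
    """Build a 1D histogram pyramid from per-cube triangle counts.

    levels[0] is the (zero-padded) base level and levels[-1][0] is the
    total number of triangles.
    """
    n = 1
    while n < len(counts):
        n *= 2
    base = list(counts) + [0] * (n - len(counts))   # pad to a power of two
    levels = [base]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([prev[i] + prev[i + 1] for i in range(0, len(prev), 2)])
    return levels

def lookup(levels, k):
    """Map output triangle index k to (cube index, triangle index within cube).

    The traversal goes top-down and is independent for every k, which is
    why each GPU thread can find its own input cube without a shared stack.
    """
    total = levels[-1][0]
    if not 0 <= k < total:
        raise IndexError(k)
    idx = 0
    for level in reversed(levels[:-1]):
        idx *= 2                 # descend to the left child
        left = level[idx]
        if k >= left:            # target lies in the right child
            k -= left
            idx += 1
    return idx, k
```

For example, with counts `[0, 2, 0, 1, 3, 0]` the top of the pyramid holds 6, empty cubes are never visited, and output slot 3 maps to the first triangle of cube 4.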
40. 40
3D Surface Extraction
-- Results:
         HPMC (Dyken et al.)                  vs.   Our OpenCL implementation
Size     Exec. time   FPS (avg)   Memory            Exec. time   FPS (avg)   Memory
512^3    3324 ms      0.3         490 MB            34 ms        0.3         121 MB
256^3    5 ms         223         122 MB            10 ms        105         40 MB
128^3    3 ms         394         44 MB             4 ms         233         26 MB
64^3     2 ms         519         22 MB             3 ms         319         22 MB
Our test system:
• Intel i5 750, 4GB RAM
• ATI Radeon 5870 (1GB RAM)
• AMD Catalyst 11.2 graphics driver
• APP SDK 2.3 w/ OpenCL 1.1
Note: OpenCL-OpenGL synch measured to be 2-20 ms, i.e. 70-90% for the smallest datasets
41. 41
Outline
• Introduction to GPU computing
• Overview of GPU projects at the HPC-Lab
– 3D Real-Time Snow simulation
– Flow simulations (porous media)
– Surface extraction
• Visualization of scattered point data
42. 42
Scattered Point Data
[Figure: panels (a), (b), (c)]
46. 46
Volume Ray Casting of
Scattered Point Data
[Figure labels: Eye/camera, Image, Ray, Bounding box]
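The first step of volume ray casting is typically to find where each ray enters and leaves the bounding box of the data. The talk does not show how this is done; a common choice is the slab method, sketched below under the assumption of an axis-aligned box. The function name `ray_box_interval` is hypothetical.

```python
import math

def ray_box_interval(origin, direction, box_min, box_max):
    """Slab-method intersection of a ray with an axis-aligned bounding box.

    Returns (t_near, t_far) such that origin + t * direction lies inside the
    box for t_near <= t <= t_far, or None if the ray misses the box.
    Only the part of the ray in front of the origin (t >= 0) is considered.
    """
    t_near, t_far = 0.0, math.inf
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if d == 0.0:
            # Ray is parallel to this pair of planes: inside the slab or a miss.
            if o < lo or o > hi:
                return None
            continue
        t0, t1 = (lo - o) / d, (hi - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near = max(t_near, t0)
        t_far = min(t_far, t1)
        if t_near > t_far:
            return None
    return t_near, t_far
```

For a unit-length direction, `t_far - t_near` is the length of the ray segment inside the box, i.e. the stretch along which samples would be accumulated.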
51. 51
Multi GPU
• Load distribution challenging
– Different hardware
– Different amount of work per thread (ray/pixel)
• Use previous image to divide work for next
• Ray length as proxy for amount of work
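One way to read the last two bullets: sum the ray lengths of each image row in the previous frame, then cut the rows into contiguous bands so that each GPU receives a share of the estimated work proportional to its speed. The sketch below illustrates that reading only; the row-wise split, the relative speed factors and the name `split_rows` are assumptions, not details given in the talk.

```python
def split_rows(row_cost, gpu_speed):
    """Split image rows into contiguous bands, one per GPU.

    row_cost[i]  : estimated work of row i, e.g. the sum of ray lengths
                   measured in the previous frame (assumed proxy).
    gpu_speed[g] : relative throughput of GPU g (hypothetical calibration).
    Returns a list of (start, end) half-open row ranges.
    """
    total_cost = float(sum(row_cost))
    total_speed = float(sum(gpu_speed))
    ranges = []
    start = 0
    acc = 0.0      # cost assigned to the GPUs handled so far
    share = 0.0    # cumulative fraction of total speed
    for g, speed in enumerate(gpu_speed):
        share += speed / total_speed
        target = share * total_cost
        end = start
        if g == len(gpu_speed) - 1:
            end = len(row_cost)            # last GPU takes the remainder
        else:
            while end < len(row_cost) and acc + row_cost[end] <= target:
                acc += row_cost[end]
                end += 1
        ranges.append((start, end))
        start = end
    return ranges
```

With row costs `[1, 1, 1, 1, 4, 4, 1, 1, 1, 1]` and two equally fast GPUs the split is rows 0-4 and rows 5-9, each band carrying an estimated cost of 8.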
52. 52
Results
[Result figures, slides 52-58]
62. 62
Dealing with bandwidth issues:
Compression of Large Seismic
Datasets on GPU (Aqrawi & Elster IPDPS 2011)
63. 63
Motivation
• Locality & I/O – a challenge for data-intensive algorithms
• Look at techniques for reducing memory bandwidth demands
– Hardware: HDD, SSD
– Compression: JPEG, MPEG, MP3 ...
– Explore GPU compression capabilities
• Seismic filtering process
– Transform coding works well for signal data*
* [H.S.Malvar 1992], [L.C.Duval 2000], [C.Larsen 2006], [D.Haugen 2009]
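As a rough illustration of why transform coding suits smooth signal data, the sketch below applies an orthonormal 1D DCT-II to a block, quantizes the coefficients with a uniform step, and reconstructs the block. It is a plain-Python toy under assumed parameters (block length, step size), not the GPU implementation discussed in the talk, and the function names are hypothetical.

```python
import math

def dct(x):
    """Orthonormal DCT-II of a list of floats."""
    n = len(x)
    out = []
    for k in range(n):
        s = sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        out.append(scale * s)
    return out

def idct(c):
    """Inverse of dct() (orthonormal DCT-III)."""
    n = len(c)
    out = []
    for i in range(n):
        s = c[0] * math.sqrt(1.0 / n)
        s += sum(c[k] * math.sqrt(2.0 / n) * math.cos(math.pi * (i + 0.5) * k / n)
                 for k in range(1, n))
        out.append(s)
    return out

def compress(block, step):
    """Transform, then quantize each coefficient to an integer multiple of step."""
    return [round(c / step) for c in dct(block)]

def decompress(quantized, step):
    """Dequantize and transform back; the result approximates the input block."""
    return idct([q * step for q in quantized])
```

For a smooth block such as a linear ramp, most of the energy lands in a few low-frequency coefficients, so many quantized values become zero and are cheap to encode with a subsequent entropy or run-length stage.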
64. 64
Results GPU acceleration
[Bar chart: Execution time comparison to Fermi architecture. Y-axis: execution time (s), 0-700. X-axis: algorithm (DCT 3D, DCT AAN 3D, LOT 1D). Series: Intel i7 Single, Intel i7 Quad, Nvidia Tesla c1060, Nvidia Tesla c2050.]
65. 65
Results I/O Speedup
[Bar chart: I/O speedup compared to platform (y-axis, 0-7) per compression algorithm (x-axis). Series: I/O Speedup HDD, I/O Speedup SSD. Algorithms: 3D AAN (GPU), 2D AAN (GPU), 1D AAN (GPU), 2D DCT (GPU), 1D DCT (GPU), 3D AAN (Quad), 2D AAN (Quad), 1D AAN (Quad), 2D DCT (Quad), 1D DCT (Quad), 3D AAN (Single), 2D AAN (Single), 1D AAN (Single), 2D DCT (Single), 1D DCT (Single), Huffman (GPU), Huffman (Quad), Huffman (Single), RLE (Quad), RLE (Single).]
66. 66
Summary Compression
• When optimizing for I/O, need an efficient compression rate AND a fast compression algorithm
• Compression can give up to:
– 6.2x I/O speedup on HDD (70 MB/s)
– 3.9x I/O speedup on SSD (140 MB/s)
• Achieved through
– Transform coding
– CPU & GPU co-op
– Asynch I/O
• Predictive model accurate within 5%
• Seismic compression library
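The slides do not spell out the predictive model. As a back-of-the-envelope illustration of why both compression ratio and codec speed matter, the sketch below assumes that asynchronous I/O fully overlaps disk transfer with (de)compression, so the slower of the two stages sets the pace. This is a hypothetical simplification, not the authors' model, and the example numbers are illustrative only.

```python
def io_speedup(disk_mb_s, ratio, codec_mb_s):
    """Estimated I/O speedup from storing data compressed.

    disk_mb_s  : raw disk bandwidth in MB/s
    ratio      : compression ratio (uncompressed size / compressed size)
    codec_mb_s : codec throughput in MB/s of uncompressed data
    Assumes disk transfer and codec run fully overlapped (hypothetical).
    """
    read_time = 1.0 / (ratio * disk_mb_s)   # seconds per MB of original data
    codec_time = 1.0 / codec_mb_s
    return (1.0 / disk_mb_s) / max(read_time, codec_time)
```

Under these assumptions a fast codec is limited by the compression ratio (an 8:1 ratio gives at most 8x), while a slow codec caps the speedup at `codec_mb_s / disk_mb_s` regardless of ratio. The same cap shrinks as the disk gets faster, which is consistent with the lower speedup reported for SSD than for HDD.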