
http://research.idi.ntnu.no/hpc-lab

The Power of GPU Computing
Thomas L. Falch and Dr. Anne C. Elster(*)
HPC-Lab, Dept. Computer and Info. Science
Norwegian University of Science & Technology
Trondheim, Norway
(*) Elster also holds a 0% Visiting Scientist appointment at
ICES, UT Austin, where she spends summers and sabbaticals

Falch & Elster: The Power of GPU Computing

NTNU CSE Seminar Oct 2, 2013
Thank yous to:

HPC-Lab Post Docs
and grad. students!

HPC-Lab 2012
Dr. Elster's HPC-Lab currently focuses
on research related to novel GPU
and multi-core architectures

> 40 Master students (since 2001)
> 15 masters projects on GPU for HPC

• Parallelization of Seismic and Image Related Applications on GPUs and Multi-Cores
• Modeling Heterogeneous Systems
• Parallel and Distributed Algorithms and Tools
• Performance Evaluation and Benchmarking
• Adaptive and Auto-Tuneable Algorithms and Implementations

Collaborators / Supporters:
AMD, ARM, CERN, NVIDIA, Statoil, Schlumberger, GE-Healthcare, and others

Outline


• Introduction to GPU computing
• Overview of GPU projects at the HPC-Lab
– 3D Real-Time Snow simulation
– Flow simulations (porous media)
– Surface extraction

• Visualization of scattered point data


The “Walls” (ref. Dr. David Patterson)

To increase processor performance one can:
1. Increase the system clock speed -> Power Wall(*)
2. Increase memory bandwidth -> more complex
3. Parallelize -> more complex

(*) The Power Wall: too much heat, and
transistor performance degrades
(more power leakage as power increases)!
-> Now maxing out at 3-4 GHz for general processors
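A rough numerical illustration of the Power Wall, not taken from the talk: the sketch below uses the standard first-order CMOS relation P = C * V^2 * f for dynamic power and ignores leakage. The function names, the capacitance value, and the assumption that supply voltage has to rise in proportion to clock frequency are illustrative only.

```python
def dynamic_power(capacitance_f, voltage_v, frequency_hz):
    """First-order CMOS dynamic power in watts: P = C * V^2 * f (leakage ignored)."""
    return capacitance_f * voltage_v ** 2 * frequency_hz


def relative_power(freq_scale, voltage_tracks_frequency=True):
    """Power relative to the baseline when the clock is scaled by freq_scale.

    Assumption: if voltage_tracks_frequency is True, the supply voltage is
    scaled by the same factor as the clock, so power grows with the cube.
    """
    v_scale = freq_scale if voltage_tracks_frequency else 1.0
    return v_scale ** 2 * freq_scale


if __name__ == "__main__":
    # Hypothetical chip: 1 nF switched capacitance, 1.0 V, 3 GHz.
    print("baseline power [W]:", dynamic_power(1e-9, 1.0, 3e9))
    for scale in (1.0, 1.5, 2.0):
        print("clock x", scale, "-> power x", relative_power(scale))
```

Under these assumptions, doubling the clock costs roughly eight times the power, while doubling the number of cores at a fixed clock only doubles it, which is the usual argument for going parallel.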

The TOP10 (June 2013)

Rank | Site | Manufacturer | Computer | Country | Cores | Rmax [Tflops] | Power [MW]
1 | National University of Defense Technology | NUDT | Tianhe-2 (MilkyWay-2) - TH-IVB-FEP Cluster, Intel Xeon E5-2692 12C 2.200GHz, TH Express-2, Intel Xeon Phi 31S1P | China | 3,120,000 | 33,862.7 | 17.8
2 | DOE/SC/Oak Ridge National Laboratory | Cray | Titan - Cray XK7, Opteron 6274 16C 2.200GHz, Cray Gemini interconnect, NVIDIA K20x | USA | 560,640 | 17,590.0 | 8.2
3 | DOE/NNSA/LLNL | IBM | Sequoia - BlueGene/Q, Power BQC 16C 1.60 GHz, Custom | USA | 1,572,864 | 17,173.2 | 7.9
4 | RIKEN Advanced Institute for Computational Science (AICS) | Fujitsu | K computer, SPARC64 VIIIfx 2.0GHz, Tofu interconnect | Japan | 705,024 | 10,510.0 | 12.7
5 | DOE/SC/Argonne National Laboratory | IBM | Mira - BlueGene/Q, Power BQC 16C 1.60GHz, Custom | USA | 786,432 | 8,586.6 | 3.9

Intel's Xeon Phi

(aka MIC, Knights Ferry/Knights Corner, Larrabee)

How to get to Exascale?


Limited by Power!
Solution?
(BOF at SC'10 by Elster, Vazquez-Poletti & Perhac:
Towards Exa-Scale: Heterogeneous Clouds
(CPUs, GPUs and Embedded Devices))

Outline
• Introduction to GPU computing
• Overview of GPU projects at the HPC-Lab
– 3D Real-Time Snow simulation
– Flow simulations (porous media)
– Surface extraction

• Visualization of scattered point data

NTNU GPU Activities

NTNU is an NVIDIA CUDA Research & Teaching Center
• Elster teaches a senior parallel computing class with
50+ students
• Elster's HPC-Lab has graduated 20+ Master
students in GPU computing (2007-2013)

HPC-Lab History (last 6 yrs):

Fall 2006:
• First 2 student projects with GPU programming (Cg)
Christian Larsen (MS Fall Project, December 2006):
“Utilizing GPUs on Cluster Computers” (joint with Schlumberger)
Erik Axel Nielsen asks for FX 4800 card for project with GE Healthcare
• Elster head of Computational Science & Visualization program and helped
NTNU acquire new IBM Supercomputer
(Njord, 7+ TFLOPS, proprietary switch)

HPC-Lab History (contin.):

2007:

Erik Axel Nielsen (Masters thesis, June 2007):
“Real-time Wavelet Filtering on the GPU” -- joint project with GE Healthcare.
40 times GPU speedup of algorithm led to our implementation being adopted the same fall in
their high-end cardiovascular ultrasound scanner.
Christian Larsen (Masters thesis, June 2007) Tore Fevang, Schlumberger (co-advisor):
“Framework for Polygonal Structures Computations on Clusters” (incl GPU parallelization)
Idar Borlaug (Masters thesis, June 2007):
“Seismic Processing Using Parallel 3D FMM”
Thibault Collet (Masters thesis summer 2007):
"Massively Online Games with Food Chains"
Knut Imar Hagen (Masters thesis, June 2007)
“Fault-tolerance for MPI Codes on Computation
Clusters” (joint project with Statoil)
Nils Magnus Larsgård (Masters thesis summer 2007):
“Framework for Converting MPI Codes to Hybrid OpenMP/MPI Codes”
HPC-Lab History (contin.):

2008:
• Quadcore Supercomputer at UiTø (Stallo), ca. 70 TF
• HPC-LAB at IDI/NTNU opens in Oct. with
– several NVIDIA donations
– several quad-core machines (1-2 donated by Schlumberger)

HPC-Lab History (contin.):

2008:
HPC-LAB at IDI/NTNU opens in Oct. with
– several NVIDIA donations
– several quad-core machines (1-2 donated by Schlumberger)

Atle Rudshaug (Masters thesis, June 2008): “Optimizing & Parallelizing a Large
Commercial Code for Modeling Oil-well Networks” -- joint project with Yggdrasil
Andreas Bach (Masters thesis, September 2008): “Profiling and Optimizing
a Seismic Application on Modern Architectures” -- joint project with Statoil
Rune Hovland (Masters project, Dec 2008) :
"Latency and Bandwidth Impact on GPU Systems" (ParCo 2009 w/ Elster)
Daniele Giuseppe Spampinato (Masters Project, December 2008):
"Linear Optimizations with CUDA" (IPDPS MTAAP 2009 w/ Elster)

Selected Master theses and Master reports
supervised by Dr. Elster in 2009
1) Robin Eidissen (Masters thesis, January 2009) :
"Utilizing GPUs for Real-Time Visualization of Snow” (demoed @ SC´08-SC´10)
Eirik Aksnes and Henrik Hesland (MS Project, Jan 2009) :
"GPU Techniques for Porous Rock Visualization”
2) Rune Erlend Jensen (Masters thesis, May 2009, currently PhD student at HPC-Lab) :
"Techniques and Tools for Optimizing Codes on Modern Architectures:
A Low-Level Approach” (NR MS Thesis Award!)
3) Rune Johan Hovland (Masters thesis, June 2009),
Dr. Magnus Lie Hetland (co-advisor): "Throughput Computing on Future GPUs”
4) Henrik Hesland (Masters thesis, June 2009) Thorvald Natvig (co-advisor):
"GPU-Enabled Interactive Pore Detection for 3D Rock Visualization "
5) Eirik Ola Aksnes (Masters thesis, July 2009)
Ståle Fjeldstand & Atle Rudshaug, Numerical Rocks (co-advisors):
"Simulation of Fluid Flow Through Porous Rocks on Modern GPUs" (ParCo 2009)
6) Daniel Haugen (Masters thesis, July 2009) Tore Fevang, Schlumberger (co-advisor):
"Seismic Data Compression and GPU Memory Latency"
7) Åsmund Herikstad (Masters thesis, July 2009) Svein-Erik Måsøy, MedTek, NTNU (co-advisor)
"Parallel Techniques for Estimation and Correction of Aberration in Medical Ultrasound Imaging"
8) Owe Johansen (Masters thesis, July 2009) John Hybertsen & Jon André Haugen, Statoil (co-advisors):
"Seismic Shot Processing on GPU"
9) Daniele Giuseppe Spampinato (Masters thesis, July 2009; currently PhD student @ ETH)
"Modeling Communication on Multi-GPU Systems” (ParCo 2009)
HPC-Lab -- Spring 2010

Dr. Anne C. Elster
Lab Director

Dr. John P. Ryan
Post Doc

Dr. Jan Perhac
Post Doc

Jan Christian
Meyer (PhD stud)

Thorvald Natvig
(PhD stud.)

Rune E. Jensen
(PhD stud.)

Master Students – Spring 2010

Ahmed Aqrawi
Assist. TDT 4200

Aleksander
Gjermundsen

Affiliates /Visitors

Andreas
Hysing

Øystein Krog

Holger Ludvigsen
Assist TDT 4205

+ 2 Cybernetics students
+ 3 visualization students
+ 1-2 || arch/multicore students
+ 1 Marine student

Eirik O. Aksnes
(tentative PhD,
Now consultant
for Statoil)

Gagandeep
Singh (Math)

Refsnaes & Singh did FEM on GPU - Collaborations between Kvamsdal & Elster

HPC-Lab History (contin.):

2010:
- NVIDIA Fermi-based cards (470, c2050, c2070 (fall))
- More on OpenCL

Ahmed A. Aqrawi (Masters thesis, June 2010):
“Effects of Compression on Data Intensive Algorithms”
Aleksander Gjermundsen (Masters thesis, July 2010):
“Audio Processing on GPU”
Andreas Hysing (Masters thesis, Aug 2010): Parallel Inversion code (w/Statoil)
Øystein Krog (Masters thesis, June 2010):
“GPU-based Real-Time Snow Avalanche Simulations” (SPH)
Holger Ludvigsen (Masters thesis, June 2010), Dr. Frank Lindseth (co-advisor):
“Real-Time GPU-Based 3D Ultrasound Reconstruction
and Visualization”
Thorvald Natvig (PhD Dec 2010) “Automatic Run-Time Communication and I/O”
HPC-Lab -- 2011
(Elster on sabbatical 2010/11)

Dr. Anne C. Elster
Lab Director

Dr. Ian Karlin
Post Doc

Jan Christian
Meyer (PhD stud)

Rune E. Jensen
(PhD stud.)


Erik Smistad
(PhD stud.)
Elster co-advisor

Master Students – Spring 2011

Fredrik Fossum
GPU Rigid body
simulation

Yngve S. Lindal
(GPU proj
@ CERN)

Affiliates /Visitors

Ole-Martin
Brende
(MedTech)

Ove
Stinessen
(Statoil proj)

Jarle
Stensland
(OpenCL BLAS)

Thor Kristian
Valderhaug
(Numerical Rocks
proj Multi-GPU LBM)

Geir Jostein
Lien
(2-yr Master
Informatics,
graduated
2012)

Miguel Martinez-del-Amor (PhD student from Spain, Fall 2011)

HPC-Lab: Master theses 2012

Kjetil Babington: Terrain Rendering Techniques for the HPC-Lab Snow Simulator

Thomas Løfsgaard Falch: 3D Visualization of X-ray Diffraction Data

Geir Jostein Lien: Auto-tunable GPU BLAS
Jan Magne Rovde: Real-Time Granular Flow Simulation Using
the PCISPH Method on GPGPU Devices Using CUDA
Frederik Magnus Johansen Vestre:
Enhancing and Porting the HPC-Lab Snow Simulator to OpenCL
on Mobile Platforms
Jan Christian Meyer, PhD Thesis (December):
Performance Modeling of Heterogeneous Systems
HPC-Lab: 2012/2013

Jan Christian Meyer, PhD Thesis (December):
Performance Modeling of Heterogeneous Systems

HPC-Lab: Master Theses 2013

Lars Kirkholt Melhus (June):
Analyzing Contextual Bias of Program Execution on Modern CPUs
Magnus Mikalsen (June): OpenACC-based Snow Simulation
Andreas Nordahl (June): Enhancing the HPC-Lab Snow Simulator
with More Realistic Terrains and Other Interactive Features
Lars Espen Nordhus (June): Ray Tracing for Simulation of Wireless Networks
in 3D Scenes
Stian Aaraas Pedersen (June): Progressive Photon Mapping on GPUs
Andreas Skomedal (June): Heterogeneous FDTD for Seismic Processing
Henrik Holenbakken Knutsen (Sept): Enhancing Software Portability
with Hardware Parametrized Autotuning
HPC-Lab: 2013/2014

Anne C. Elster – Director
Malik Khan – Post Doc to start Nov 1, 2013
PhD students:
● Rune Jensen, Johannes Kvam, Thomas Falch, Samira Pakdel, Ruben Spaans

Co-supervised by Elster:
● Johannes Kvam, Erik Smistad, Mehdi Bozorgi, Lane Holloway (UT ECE Student)

● + 8 master students & 2 MedTech PhD students
Outline

• Introduction to GPU computing
• Overview of GPU projects at the HPC-Lab
– 3D Real-Time Snow simulation
– Flow simulations (porous media)
– Surface extraction

• Visualization of scattered point data


Snow Simulation:
calc. 4+ million particles in real-time
using multi-core CPU + GPU


Snow Simulation – Wind field


Add more real-time features


Add Road Generation
(Used A* algorithm, Demo @ SC11)


Add Ray-Tracing

Outline

• Introduction to GPU computing
• Overview of GPU projects at the HPC-Lab
– 3D Real-Time Snow simulation
– Flow simulations (porous media)
– Surface extraction

• Visualization of scattered point data

Simulations of Fluid Flow through
Porous Rocks using GPUs

Eirik Ola Aksnes & A. C. Elster (ParCo 2009)
+ current work with Thor Kristian Valderhaug using OpenCL
In collaboration with:
Numerical Rocks & NTNU Chemistry Dept.
Use Lattice Boltzmann Method
(a.k.a. LBM)

Outline

• Introduction to GPU computing
• Overview of GPU projects at the HPC-Lab
– 3D Real-Time Snow simulation
– Flow simulations (porous media)
– Surface extraction

• Visualization of scattered point data

3D Surface Extraction

(w/ Dr. Frank Lindseth (SINTEF MedTek and NTNU)
and MS/PhD student Erik Smistad)

3D Surface Extraction on GPUs

• Use Marching Cubes – algorithm for extracting a 3D
surface from a set of sampled scalars
• Algorithm used extensively for visualizing and analyzing
medical data (X-ray, MR) and the result of 3D
segmentation.
• Completely data parallel
• Challenge:
How to store the result of each
cube in parallel on GPU
3D Surface Extraction
-- Histogram data

• Challenge: How to store the result of each cube in parallel on GPU?
In serial implementation this is simple – just use a stack and
add the vertex data to the stack

• GPU Solution: Histogram Pyramids [1]
• A data structure that:
– Filters out cubes that have no triangles (stream reduction)
– Returns total sum of triangles
– Provides each cube with an index for memory storage
– Can be efficiently used by means of textures, yielding large speed-ups

[1] G. Ziegler et al: On-the-fly Point Clouds through Histogram Pyramids; Vision, Modeling, and Visualization 2006
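
The idea behind histogram pyramids can be sketched on a 1D array. This is a simplified CPU illustration, not the OpenCL implementation from the talk: it assumes a 2:1 reduction per level (the 2D/3D variants reduce 4:1 or 8:1 and live in textures), and the names `build_pyramid` and `traverse` are made up for this example. Each input value is the number of triangles a cube produces; the top of the pyramid is the total, and a top-down traversal maps every output slot back to its cube.

```python
def build_pyramid(counts):
    """Build a 1D histogram pyramid by repeated pairwise sums.

    levels[0] is the input padded with zeros to a power of two;
    levels[-1] has one element, the total number of output elements.
    """
    size = 1
    while size < len(counts):
        size *= 2
    levels = [list(counts) + [0] * (size - len(counts))]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([prev[i] + prev[i + 1] for i in range(0, len(prev), 2)])
    return levels


def traverse(levels, key):
    """Map output index `key` to (cell index, local index inside that cell)."""
    total = levels[-1][0]
    if not 0 <= key < total:
        raise IndexError("output index out of range")
    idx = 0
    # Walk from the level just below the top down to the base level.
    for level in reversed(levels[:-1]):
        idx *= 2                 # left child of the current node
        left = level[idx]
        if key >= left:          # target lies in the right child
            key -= left
            idx += 1
    return idx, key


if __name__ == "__main__":
    triangles_per_cube = [0, 2, 0, 1, 3, 0]
    pyramid = build_pyramid(triangles_per_cube)
    print("total triangles:", pyramid[-1][0])
    for k in range(pyramid[-1][0]):
        print("output", k, "-> cube, local index:", traverse(pyramid, k))
```

Every output index can be resolved independently, so on a GPU one thread per output element can run the traversal without a shared stack; empty cubes are never visited, which is the stream-reduction effect listed above.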

3D Surface Extraction

-- Histogram Pyramids: Construction & Traversal

HP Construction

HP Traversal

3D Surface Extraction
-- Results: HPMC (Dyken et al.) vs. our OpenCL implementation

Size | HPMC: Exec. time | HPMC: FPS (avg) | HPMC: Memory | OpenCL: Exec. time | OpenCL: FPS (avg) | OpenCL: Memory
512^3 | 3324 ms | 0.3 | 490 MB | 34 ms | 0.3 | 121 MB
256^3 | 5 ms | 223 | 122 MB | 10 ms | 105 | 40 MB
128^3 | 3 ms | 394 | 44 MB | 4 ms | 233 | 26 MB
64^3 | 2 ms | 519 | 22 MB | 3 ms | 319 | 22 MB

Our test system:
• Intel i5 750, 4GB RAM
• ATI Radeon 5870 (1GB RAM)
• AMD Catalyst 11.2 graphics driver
• APP SDK 2.3 w/ OpenCL 1.1

Note: OpenCL-OpenGL synch measured to be 2-20 ms,
i.e. 70-90% for smallest datasets

Outline

• Introduction to GPU computing
• Overview of GPU projects at the HPC-Lab
– 3D Real-Time Snow simulation
– Flow simulations (porous media)
– Surface extraction

• Visualization of scattered point data


Scattered Point Data

[Figure: three panels, (a), (b) and (c)]

Examples

• Sensor networks
• Simulations (n-body, SPH)
• Post-processing/streaming
over network

X-ray Diffraction
[Figure: X-ray source, specimen and detector, with labels k_i, k_f and Q]

Volume Ray Casting
[Figure: eye/camera, image, ray and volume]

Volume Ray Casting of
Scattered Point Data

[Figure: eye/camera, image, ray and bounding box]

Interpolation

[Figure: labels r1, r2, r3]

Finding Neighbors


Optimizations
• Empty space skipping
• Early ray termination
• Filtering
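
Two of these optimizations can be shown on a single ray. The sketch below is a minimal CPU illustration and not the code from the talk: it assumes the ray has already been sampled front to back into (intensity, opacity) pairs, and `composite_ray` and its parameters are hypothetical names. Empty space skipping is reduced here to skipping individual transparent samples; a real renderer would use a spatial structure to jump over whole empty regions.

```python
def composite_ray(samples, opacity_threshold=0.99):
    """Front-to-back compositing of one ray.

    samples: iterable of (intensity, alpha) pairs, ordered from the eye outwards.
    Returns (accumulated color, accumulated opacity, number of samples shaded).
    """
    color = 0.0
    alpha = 0.0
    shaded = 0
    for intensity, a in samples:
        if a <= 0.0:
            continue                      # empty space: nothing to accumulate
        color += (1.0 - alpha) * a * intensity
        alpha += (1.0 - alpha) * a
        shaded += 1
        if alpha >= opacity_threshold:
            break                         # early ray termination
    return color, alpha, shaded


if __name__ == "__main__":
    ray = [(1.0, 0.0), (0.5, 1.0), (0.9, 0.5)]
    # The first sample is empty, the second is opaque, the third is never reached.
    print(composite_ray(ray))
```

With one thread per ray/pixel, the number of samples each thread shades varies from ray to ray, which is one source of the uneven work per thread discussed for the multi-GPU case.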


GPU Implementation
• CUDA C, almost same code
• One thread for each ray/pixel
• Remove recursion (for older hardware)
• Texture memory

Multi GPU


• Load distribution challenging
– Different hardware
– Different amount of work per thread (ray/pixel)

• Use previous image to divide work for next
• Ray length as proxy for amount of work
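
One possible way to turn the previous frame into a work division is sketched below. It is an assumption-laden illustration, not the scheme used in the talk: the image is split into contiguous column ranges, the per-column cost is taken to be the summed ray lengths measured in the previous frame, and each GPU has a known relative speed. `split_columns` and its arguments are hypothetical names.

```python
def split_columns(column_cost, gpu_speeds):
    """Split image columns into one contiguous half-open range per GPU.

    column_cost[i]: estimated work for column i, e.g. summed ray lengths
                    from the previous frame.
    gpu_speeds:     relative throughput of each GPU.
    Each GPU gets a share of the total cost proportional to its speed
    (greedy split, the last GPU takes whatever remains).
    """
    n = len(column_cost)
    total_cost = sum(column_cost)
    total_speed = sum(gpu_speeds)
    ranges = []
    start = 0
    col = 0
    accumulated = 0.0
    target = 0.0
    for g, speed in enumerate(gpu_speeds):
        if g == len(gpu_speeds) - 1:
            ranges.append((start, n))
            break
        target += total_cost * speed / total_speed
        while col < n and accumulated + column_cost[col] <= target:
            accumulated += column_cost[col]
            col += 1
        ranges.append((start, col))
        start = col
    return ranges


if __name__ == "__main__":
    cost = [1, 1, 2, 4, 4, 4]             # rays on the right are longer
    print(split_columns(cost, [1, 1]))    # two equal GPUs
    print(split_columns(cost, [3, 1]))    # first GPU three times faster
```

Because consecutive frames are usually similar, the split computed from the previous image is a reasonable guess for the next one, and it adapts as the camera moves.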


Results


Questions?

Thank yous to:

HPC-Lab Post Docs
and grad. students!
@ SC´07

Spring 2007

Spring 2010
Spring 2009


Modeling Heterogeneous Systems
“Optimized Barriers for
Heterogeneous Systems
Using MPI”

Jan Christian Meyer
PhD Student
Finishing summer 2011

(to be presented at
IEEE IPDPS 2011, HCW)


Dealing with bandwidth issues:
Compression of Large Seismic
Datasets on GPU (Aqrawi & Elster IPDPS 2011)

Motivation

Locality & I/O – challenge for data intensive algorithms
Look at techniques for reducing memory bandwidth
– Hardware: HDD, SSD
– Compression: JPEG, MPEG, MP3 ...
– Explore GPU compression capabilities

Seismic filtering process
– Transform coding works well for signal data*
* [H.S.Malvar 1992], [L.C.Duval 2000], [C.Larsen 2006], [D.Haugen 2009]


Results GPU acceleration
Execution time comparison to Fermi architecture
[Figure: bar chart of execution time (s) for the algorithms DCT 3D, DCT AAN 3D and LOT 1D on Intel i7 Single, Intel i7 Quad, Nvidia Tesla c1060 and Nvidia Tesla c2050]

Results I/O Speedup
[Figure: bar charts "I/O Speedup HDD" and "I/O Speedup SSD" showing I/O speedup compared to platform per compression algorithm: RLE (Single, Quad); Huffman, 1D DCT, 2D DCT, 1D AAN, 2D AAN and 3D AAN (Single, Quad, GPU)]

Summary Compression

– When optimizing for I/O, need efficient compression rate
AND fast compression algorithm
– Compression can give up to:
• 6.2x I/O speedup on HDD (70MB/s)
• 3.9x I/O speedup on SSD (140MB/s)
– Achieved through
• Transform coding
• CPU & GPU co-op
• Asynch I/O
– Predictive model accurate within 5%
– Seismic compression library
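
Why both the compression rate and the speed of the algorithm matter can be seen from a very simple throughput model. This is a back-of-the-envelope sketch and not the predictive model mentioned above: `io_speedup`, its parameters and the example numbers are assumptions chosen for illustration.

```python
def io_speedup(disk_mb_s, compression_ratio, decompress_mb_s, overlapped=True):
    """Estimated speedup of reading compressed data instead of raw data.

    disk_mb_s:         sustained disk bandwidth in MB/s
    compression_ratio: raw size divided by compressed size
    decompress_mb_s:   decompression throughput in MB/s of raw output
    overlapped:        True if reading and decompression run concurrently
                       (asynchronous I/O), False if they run back to back
    """
    raw_time = 1.0 / disk_mb_s                          # seconds per raw MB, no compression
    read_time = 1.0 / (disk_mb_s * compression_ratio)   # seconds per raw MB, compressed read
    decompress_time = 1.0 / decompress_mb_s
    if overlapped:
        compressed_time = max(read_time, decompress_time)
    else:
        compressed_time = read_time + decompress_time
    return raw_time / compressed_time


if __name__ == "__main__":
    # Hypothetical numbers: 100 MB/s disk, 4:1 compression, 200 MB/s decompressor.
    print(io_speedup(100, 4, 200))                    # limited by the decompressor
    print(io_speedup(100, 4, 200, overlapped=False))  # without asynchronous I/O
    print(io_speedup(100, 4, 10000))                  # fast decompressor: near the ratio
```

In this model the speedup can never exceed the compression ratio, and a slow decompressor caps it well below that, so a faster disk (SSD) leaves less room for gains than a slower one (HDD) for the same algorithm.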

Data dissemination and materials informatics at LBNL
 

Elster falch-gpu-cse-sem-oct2013

  • 1. The Power of GPU Computing. Thomas L. Falch and Dr. Anne C. Elster(*), HPC-Lab, Dept. of Computer and Info. Science, Norwegian University of Science & Technology, Trondheim, Norway. http://research.idi.ntnu.no/hpc-lab (*) Elster also holds a 0% Visiting Scientist appointment at ICES, UT Austin, where she spends summers and sabbaticals. Falch & Elster: The Power of GPU Computing, NTNU CSE Seminar, Oct 2, 2013
  • 2. Thanks to: HPC-Lab Post Docs and grad. students! (Photo: HPC-Lab 2012)
  • 3. Dr. Elster's HPC-Lab currently focuses on research related to novel GPU and multi-core architectures
      > 40 Master students (since 2001)
      > 15 master's projects on GPU for HPC
      Research areas:
      – Parallelization of Seismic and Image Related Applications on GPUs and Multi-Cores
      – Modeling Heterogeneous Systems
      – Parallel and Distributed Algorithms and Tools
      – Performance Evaluation and Benchmarking
      – Adaptive and Auto-Tuneable Algorithms and Implementations
      Collaborators / Supporters: AMD, ARM, CERN, NVIDIA, Statoil, Schlumberger, GE-Healthcare, and others
  • 4. Outline
      • Introduction to GPU computing
      • Overview of GPU projects at the HPC-Lab
        – 3D Real-Time Snow simulation
        – Flow simulations (porous media)
        – Surface extraction
      • Visualization of scattered point data
  • 7. The "Walls" (ref. Dr. David Patterson)
      To increase processor performance one can:
      1. Increase the system clock speed -> Power Wall(*)
      2. Increase memory bandwidth -> more complex
      3. Parallelize -> more complex
      (*) The Power Wall: too much heat, and transistor performance degrades (more power leakage as power increases)! Clock speeds are now maxing out at 3-4 GHz for general processors.
  • 9. The TOP10 (June 2013)
      Columns: Rank; Site; Manufacturer; Computer; Country; Cores; Rmax [Tflops]; Power [MW]
      1; National University of Defense Technology; NUDT; Tianhe-2 (MilkyWay-2) - TH-IVB-FEP Cluster, Intel Xeon E5-2692 12C 2.200GHz, TH Express-2, Intel Xeon Phi 31S1P; China; 3,120,000; 33,862.7; 17.8
      2; DOE/SC/Oak Ridge National Laboratory; Cray; Titan - Cray XK7, Opteron 6274 16C 2.200GHz, Cray Gemini interconnect, NVIDIA K20x; USA; 560,640; 17,590.0; 8.2
      3; DOE/NNSA/LLNL; IBM; Sequoia - BlueGene/Q, Power BQC 16C 1.60GHz, Custom; USA; 1,572,864; 17,173.2; 7.9
      4; RIKEN Advanced Institute for Computational Science (AICS); Fujitsu; K computer, SPARC64 VIIIfx 2.0GHz, Tofu interconnect; Japan; 705,024; 10,510.0; 12.7
      5; DOE/SC/Argonne National Laboratory; IBM; Mira - BlueGene/Q, Power BQC 16C 1.60GHz, Custom; USA; 786,432; 8,586.6; 3.9
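The Rmax and power columns of the TOP10 slide are enough to estimate energy efficiency, which is what the later "Limited by Power!" slide is about. The following is a minimal sketch, not part of the talk: the function names are hypothetical, and the exascale estimate assumes (unrealistically) that efficiency stays constant as a system is scaled up.

```python
# (name, Rmax in Tflops, power in MW), copied from the June 2013 list on the slide
SYSTEMS = [
    ("Tianhe-2", 33862.7, 17.8),
    ("Titan", 17590.0, 8.2),
    ("Sequoia", 17173.2, 7.9),
    ("K computer", 10510.0, 12.7),
    ("Mira", 8586.6, 3.9),
]


def mflops_per_watt(rmax_tflops, power_mw):
    """Energy efficiency; Tflops per MW is numerically equal to Mflops per W."""
    return rmax_tflops / power_mw


def exascale_power_mw(rmax_tflops, power_mw, target_tflops=1.0e6):
    """Power in MW for target_tflops (1 Eflops = 1e6 Tflops) at this system's
    efficiency. Assumes linear scaling, which is an illustrative simplification."""
    return target_tflops / mflops_per_watt(rmax_tflops, power_mw)


if __name__ == "__main__":
    for name, rmax, power in SYSTEMS:
        eff = mflops_per_watt(rmax, power)
        need = exascale_power_mw(rmax, power)
        print(f"{name:12s} {eff:8.1f} Mflops/W, exascale at this efficiency: {need:7.0f} MW")
```

Under that linear-scaling assumption, even the most efficient systems on the list would need several hundred MW to reach one Eflops, which illustrates why power is named as the limiting factor.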
  • 10. Intel's Xeon Phi (aka MIC, Knights Ferry/Knights Corner, Larrabee)
  • 11. How to get to Exascale? Limited by Power! Solution? (BOF at SC'10 by Elster, Vazquez-Poletti & Perhac: Towards Exa-Scale: Heterogeneous Clouds (CPUs, GPUs and Embedded Devices))
  • 12. Outline
      • Introduction to GPU computing
      • Overview of GPU projects at the HPC-Lab
        – 3D Real-Time Snow simulation
        – Flow simulations (porous media)
        – Surface extraction
      • Visualization of scattered point data
  • 13. NTNU GPU Activities
      NTNU is an NVIDIA CUDA Research & Teaching Center
      • Elster teaches a senior parallel computing class with 50+ students
      • Elster's HPC-Lab has graduated 20+ Master students in GPU computing (2007-2013)
  • 14. HPC-Lab History (last 6 yrs): Fall 2006:
      • First 2 student projects with GPU programming (Cg)
        – Christian Larsen (MS Fall Project, December 2006): "Utilizing GPUs on Cluster Computers" (joint with Schlumberger)
        – Erik Axel Nielsen asks for FX 4800 card for project with GE Healthcare
      • Elster is head of the Computational Science & Visualization program and helped NTNU acquire a new IBM supercomputer (Njord, 7+ TFLOPS, proprietary switch)
  • 15. HPC-Lab History (contin.): 2007:
      – Erik Axel Nielsen (Masters thesis, June 2007): "Real-time Wavelet Filtering on the GPU", joint project with GE Healthcare. A 40 times GPU speedup of the algorithm led to our implementation being adopted the same fall in their high-end cardiovascular ultrasound scanner.
      – Christian Larsen (Masters thesis, June 2007), Tore Fevang, Schlumberger (co-advisor): "Framework for Polygonial Structures Computations on Clusters" (incl. GPU parallelization)
      – Idar Borlaug (Masters thesis, June 2007): "Seismic Processing Using Parallel 3D FMM"
      – Thibault Collet (Masters thesis, summer 2007): "Massively Online Games with Food Chains"
      – Knut Imar Hagen (Masters thesis, June 2007): "Fault-tolerance for MPI Codes on Computation Clusters" (joint project with Statoil)
      – Nils Magnus Larsgård (Masters thesis, summer 2007): "Framework for Converting MPI Codes to Hybrid OpenMP/MPI Codes"
  • 16. HPC-Lab History (contin.): 2008:
      • Quad-core supercomputer at UiTø (Stallo), ca. 70 TF
      • HPC-LAB at IDI/NTNU opens in Oct. with
        – several NVIDIA donations
        – several quad-core machines (1-2 donated by Schlumberger)
  • 17. HPC-Lab History (contin.): 2008: HPC-LAB at IDI/NTNU opens in Oct. with several NVIDIA donations and several quad-core machines (1-2 donated by Schlumberger)
      – Atle Rudshaug (Masters thesis, June 2008): "Optimizing & Parallelizing a Large Commercial Code for Modeling Oil-well Networks", joint project with Yggdrasil
      – Andreas Bach (Masters thesis, September 2008): "Profiling and Optimizing a Seismic Application on Modern Architectures", joint project with Statoil
      – Rune Hovland (Masters project, Dec 2008): "Latency and Bandwidth Impact on GPU Systems" (ParCo 2009 w/ Elster)
      – Daniele Giuseppe Spampinato (Masters project, December 2008): "Linear Optimizations with CUDA" (IPDPS MTAAP 2009 w/ Elster)
  • 18. Selected Master theses and Master reports supervised by Dr. Elster in 2009
      1) Robin Eidissen (Masters thesis, January 2009): "Utilizing GPUs for Real-Time Visualization of Snow" (demoed @ SC'08-SC'10)
         Eirik Aksnes and Henrik Hesland (MS Project, Jan 2009): "GPU Techniques for Porous Rock Visualization"
      2) Rune Erlend Jensen (Masters thesis, May 2009, currently PhD student at HPC-Lab): "Techniques and Tools for Optimizing Codes on Modern Architectures: A Low-Level Approach" (NR MS Thesis Award!)
      3) Rune Johan Hovland (Masters thesis, June 2009), Dr. Magnus Lie Hetland (co-advisor): "Throughput Computing on Future GPUs"
      4) Henrik Hesland (Masters thesis, June 2009), Thorvald Natvig (co-advisor): "GPU-Enabled Interactive Pore Detection for 3D Rock Visualization"
      5) Eirik Ola Aksnes (Masters thesis, July 2009), Ståle Fjeldstand & Atle Rudshaug, Numerical Rocks (co-advisors): "Simulation of Fluid Flow Through Porous Rocks on Modern GPUs" (ParCo 2009)
      6) Daniel Haugen (Masters thesis, July 2009), Tore Fevang, Schlumberger (co-advisor): "Seismic Data Compression and GPU Memory Latency"
      7) Åsmund Herikstad (Masters thesis, July 2009), Svein-Erik Måsøy, MedTek, NTNU (co-advisor): "Parallel Techniques for Estimation and Correction of Aberration in Medical Ultrasound Imaging"
      8) Owe Johansen (Masters thesis, July 2009), John Hybertsen & Jon André Haugen, Statoil (co-advisors): "Seismic Shot Processing on GPU"
      9) Daniele Giuseppe Spampinato (Masters thesis, July 2009; currently PhD student @ ETH): "Modeling Communication on Multi-GPU Systems" (ParCo 2009)
  • 19. HPC-Lab -- Spring 2010
      Dr. Anne C. Elster, Lab Director; Dr. John P. Ryan, Post Doc; Dr. Jan Perhac, Post Doc; Jan Christian Meyer (PhD stud.); Thorvald Natvig (PhD stud.); Rune E. Jensen (PhD stud.)
      Master Students – Spring 2010: Ahmed Aqrawi (Assist. TDT 4200); Aleksander Gjermundsen; Andreas Hysing; Øystein Krog; Holger Ludvigsen (Assist. TDT 4205)
      Affiliates / Visitors: + 2 Cybernetics students; + 3 visualization students; + 1-2 || arch/multicore students; + 1 Marine student; Eirik O. Aksnes (tentative PhD, now consultant for Statoil); Gagandeep Singh (Math)
      Refsnaes & Singh did FEM on GPU (collaborations between Kvamsdal & Elster)
  • 20. HPC-Lab History (contin.): 2010: NVIDIA Fermi-based cards (470, C2050, C2070 (fall)); more on OpenCL
      – Ahmed A. Aqrawi (Masters thesis, June 2010): "Effects of Compression on Data Intensive Algorithms"
      – Aleksander Gjermundsen (Masters thesis, July 2010): "Audio Processing on GPU"
      – Andreas Hysing (Masters thesis, Aug 2010): Parallel Inversion code (w/ Statoil)
      – Øystein Krog (Masters thesis, June 2010): "GPU-based Real-Time Snow Avalanche Simulations" (SPH)
      – Holger Ludvigsen (Masters thesis, June 2010), Dr. Frank Lindseth (co-advisor): "Real-Time GPU-Based 3D Ultrasound Reconstruction and Visualization"
      – Thorvald Natvig (PhD, Dec 2010): "Automatic Run-Time Communication and I/O"
  • 21. HPC-Lab -- 2011 (Elster on sabbatical 2010/11)
      Dr. Anne C. Elster, Lab Director; Dr. Ian Karlin, Post Doc; Jan Christian Meyer (PhD stud.); Rune E. Jensen (PhD stud.); Erik Smistad (PhD stud., Elster co-advisor)
      Master Students – Spring 2011: Fredrik Fossum (GPU rigid body simulation); Yngve S. Lindal (GPU proj @ CERN)
      Affiliates / Visitors: Ole-Martin Brende (MedTech); Ove Stinessen (Statoil proj); Jarle Stensland (OpenCL BLAS); Thor Kristian Valderhaug (Numerical Rocks proj, Multi-GPU LBM); Geir Jostein Lien (2-yr Master Informatics, graduated 2012); Miguel Martinez-del-Amor (PhD student from Spain, Fall 2011)
  • 22. HPC-Lab: Master theses 2012
      – Kjetil Babington: Terrain Rendering Techniques for the HPC-Lab Snow Simulator
      – Thomas Løfsgaard Falch: 3D Visualization of X-ray Diffraction Data
      – Geir Jostein Lien: Auto-tunable GPU BLAS
      – Jan Magne Rovde: Real-Time Granular Flow Simulation Using the PCISPH Method on GPGPU Devices Using CUDA
      – Frederik Magnus Johansen Vestre: Enhancing and Porting the HPC-Lab Snow Simulator to OpenCL on Mobile Platforms
      Jan Christian Meyer, PhD thesis (December): Performance Modeling of Heterogeneous Systems
  • 23. HPC-Lab: 2012/2013. Jan Christian Meyer, PhD thesis (December): Performance Modeling of Heterogeneous Systems
  • 24. HPC-Lab: Master Theses 2013
      – Lark Kirkholt Melhus (June): Analyzing Contextual Bias of Program Execution on Modern CPUs
      – Magnus Mikalsen (June): OpenACC-based Snow Simulation
      – Andreas Nordahl (June): Enhancing the HPC-Lab Snow Simulator with More Realistic Terrains and Other Interactive Features
      – Lars Espen Nordhus (June): Ray Tracing for Simulation of Wireless Networks in 3D Scenes
      – Stian Aaraas Pedersen (June): Progressive Photon Mapping on GPUs
      – Andreas Skomedal (June): Heterogeneous FTDT for Seismic Processing
      – Henrik Holenbakken Knutsen (Sept): Enhancing Software Portability with Hardware Parametrized Autotuning
  • 25. HPC-Lab: 2013/2014
      Anne C. Elster, Director
      Malik Khan, Post Doc, to start Nov 1, 2013
      PhD students: Rune Jensen, Johannes Kvam, Thomas Falch, Samira Pakdel, Ruben Spaans
      Co-supervised by Elster: Johannes Kvam, Erik Smistad, Mehdi Bozorgi, Lane Holloway (UT ECE student)
      + 8 master students & 2 MedTech PhD students
  • 26. Outline
      • Introduction to GPU computing
      • Overview of GPU projects at the HPC-Lab
        – 3D Real-Time Snow simulation
        – Flow simulations (porous media)
        – Surface extraction
      • Visualization of scattered point data
  • 27. Snow Simulation: computes 4+ million particles in real time using multi-core CPU + GPU
  • 28. Snow Simulation – Wind field
  • 29. Add more real-time features
  • 30. Add Road Generation (used the A* algorithm, demo @ SC11)
  • 31. Add Ray-Tracing
  • 32. Outline
      • Introduction to GPU computing
      • Overview of GPU projects at the HPC-Lab
        – 3D Real-Time Snow simulation
        – Flow simulations (porous media)
        – Surface extraction
      • Visualization of scattered point data
  • 33. Simulations of Fluid Flow through Porous Rocks using GPUs
      Eirik Ola Aksnes & A. C. Elster (ParCo 2009) + current work with Thor Kristian Valderhaug using OpenCL
      In collaboration with: Numerical Rocks & NTNU Chemistry Dept.
      Uses the Lattice Boltzmann Method (LBM)
  • 34. Outline
      • Introduction to GPU computing
      • Overview of GPU projects at the HPC-Lab
        – 3D Real-Time Snow simulation
        – Flow simulations (porous media)
        – Surface extraction
      • Visualization of scattered point data
  • 35. 3D Surface Extraction (w/ Dr. Frank Lindseth (SINTEF MedTek and NTNU) and MS/PhD student Erik Smistad)
  • 36. 3D Surface Extraction (w/ Dr. Frank Lindseth (SINTEF MedTek and NTNU) and MS/PhD student Erik Smistad)
  • 37. 3D Surface Extraction on GPUs
      • Use Marching Cubes, an algorithm for extracting a 3D surface from a set of sampled scalars
      • The algorithm is used extensively for visualizing and analyzing medical data (X-ray, MR) and the results of 3D segmentation
      • Completely data parallel
      • Challenge: how to store the result of each cube in parallel on the GPU
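The "completely data parallel" property of Marching Cubes comes from its first step: each cube is classified independently by comparing its eight corner samples with the isovalue, giving an 8-bit case index that would then select an entry in a 256-entry triangle table. The sketch below shows only that classification step in plain Python, on a synthetic sphere; the corner ordering, the function names and the test volume are assumptions made for illustration, and the triangle table itself is omitted.

```python
from itertools import product

# Corner offsets (dx, dy, dz) of one cube. This ordering is an assumption;
# any fixed ordering works as long as the triangle table uses the same one.
CORNERS = [(0, 0, 0), (1, 0, 0), (1, 1, 0), (0, 1, 0),
           (0, 0, 1), (1, 0, 1), (1, 1, 1), (0, 1, 1)]


def cube_index(volume, x, y, z, iso):
    """8-bit case index of the cube whose lowest corner is (x, y, z).
    Bit i is set when corner i lies below the isovalue."""
    index = 0
    for bit, (dx, dy, dz) in enumerate(CORNERS):
        if volume[z + dz][y + dy][x + dx] < iso:
            index |= 1 << bit
    return index


def classify(volume, iso):
    """Classify every cube independently (one GPU thread per cube in a
    parallel version). Returns {(x, y, z): case index}."""
    nz, ny, nx = len(volume), len(volume[0]), len(volume[0][0])
    return {(x, y, z): cube_index(volume, x, y, z, iso)
            for z, y, x in product(range(nz - 1), range(ny - 1), range(nx - 1))}


def sphere_volume(n, radius):
    """Synthetic test data: signed distance to a sphere centred in an n^3 grid."""
    c = (n - 1) / 2
    return [[[((x - c) ** 2 + (y - c) ** 2 + (z - c) ** 2) ** 0.5 - radius
              for x in range(n)] for y in range(n)] for z in range(n)]


volume = sphere_volume(8, 2.5)
cases = classify(volume, 0.0)
# Only cubes with mixed corners (index neither 0 nor 255) produce triangles.
active = [pos for pos, idx in cases.items() if idx not in (0, 255)]
```

Because no cube reads another cube's result, the loop in `classify` maps directly onto one thread per cube; the hard part, as the slide notes, is deciding where each active cube writes its triangles.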
  • 38. 3D Surface Extraction -- Histogram data
      • Challenge: how to store the result of each cube in parallel on the GPU? In a serial implementation this is simple: just use a stack and add the vertex data to the stack
      • GPU solution: Histogram Pyramids [1]
      • A data structure that:
        – Filters out cubes that have no triangles (stream reduction)
        – Returns the total sum of triangles
        – Provides each cube with an index for memory storage
        – Can be used efficiently by means of textures, yielding large speed-ups
      [1] G. Ziegler et al.: On-the-fly Point Clouds through Histogram Pyramids; Vision, Modeling, and Visualization 2006
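The three properties listed for Histogram Pyramids (stream reduction, total count, per-element output index) can be seen in a small sketch. The version below is a deliberately simplified 1D pyramid with pairwise sums written in plain Python; the slides and the cited paper describe a texture-based GPU structure, so the layout, the function names and the example counts here are illustrative assumptions only.

```python
def build_pyramid(counts):
    """Build a 1D histogram pyramid from per-cube triangle counts.
    levels[0] is the (zero-padded) base, levels[-1] holds the grand total."""
    size = 1
    while size < len(counts):
        size *= 2
    level = list(counts) + [0] * (size - len(counts))
    levels = [level]
    while len(level) > 1:
        # Each parent stores the sum of its two children.
        level = [level[i] + level[i + 1] for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def traverse(levels, k):
    """Map output index k to (base cell, offset within that cell) by walking
    from the top of the pyramid down to the base."""
    total = levels[-1][0]
    if not 0 <= k < total:
        raise IndexError("output index out of range")
    pos = 0
    for level in reversed(levels[:-1]):
        pos *= 2                 # left child of the current node
        left = level[pos]
        if k >= left:            # the element lives in the right child
            k -= left
            pos += 1
    return pos, k


# Hypothetical per-cube triangle counts; empty cubes have count 0.
counts = [0, 2, 0, 0, 1, 3, 0, 0]
levels = build_pyramid(counts)
total_triangles = levels[-1][0]
placement = [traverse(levels, k) for k in range(total_triangles)]
```

Each output slot finds its source cube on its own, so one thread per output triangle can run the traversal without any shared stack, and cubes with a count of zero are never visited.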
  • 39. 3D Surface Extraction: Histogram Pyramids, construction and traversal (figures: HP Construction, HP Traversal)
  • 40. 3D Surface Extraction: Results, HPMC (Dyken et al.) vs. our OpenCL implementation

      Size    | HPMC (Dyken et al.)               | Our OpenCL implementation
              | Exec. time   FPS (avg)   Memory   | Exec. time   FPS (avg)   Memory
      512^3   | 3324 ms      0.3         490 MB   | 34 ms        0.3         121 MB
      256^3   | 5 ms         223         122 MB   | 10 ms        105         40 MB
      128^3   | 3 ms         394         44 MB    | 4 ms         233         26 MB
      64^3    | 2 ms         519         22 MB    | 3 ms         319         22 MB

      Our test system:
      – Intel i5 750, 4 GB RAM
      – ATI Radeon 5870 (1 GB RAM)
      – AMD Catalyst 11.2 graphics driver
      – APP SDK 2.3 with OpenCL 1.1
      Note: OpenCL-OpenGL synchronization was measured to be 2-20 ms, i.e. 70-90% for the smallest datasets
  • 41. Outline
      – Introduction to GPU computing
      – Overview of GPU projects at the HPC-Lab: 3D real-time snow simulation, flow simulations (porous media), surface extraction
      – Visualization of scattered point data
  • 42. Scattered Point Data (figure panels (a), (b), (c))
  • 43. Examples
      – Sensor networks
      – Simulations (n-body, SPH)
      – Post-processing/streaming over network
  • 45. Volume Ray Casting (figure labels: eye/camera, image, volume, ray)
  • 46. Volume Ray Casting of Scattered Point Data (figure labels: eye/camera, image, ray, bounding box)
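Before a ray can be sampled, the segment of the ray that lies inside the bounding box has to be found. A common way to do this is the slab method, sketched below. The slides do not say which intersection test was used, so this is an assumption, and the function name is hypothetical.

```python
def ray_box_intersection(origin, direction, box_min, box_max):
    """Slab test: return (t_near, t_far) along the ray, or None if the ray misses the box.

    The ray is origin + t * direction with t >= 0; all arguments are 3-tuples.
    """
    t_near, t_far = 0.0, float("inf")
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if d == 0.0:
            # Ray is parallel to this pair of planes: it must start between them.
            if o < lo or o > hi:
                return None
            continue
        t0, t1 = (lo - o) / d, (hi - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near = max(t_near, t0)
        t_far = min(t_far, t1)
        if t_near > t_far:
            return None
    return t_near, t_far
```

Sampling then only needs to run from `t_near` to `t_far`, and the difference between the two is the ray length inside the box.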
  • 47. Interpolation (figure labels: r1, r2, r3)
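The slide only shows a sample position and its distances r1, r2, r3 to nearby points; the weighting scheme itself is not given in the transcript. One common choice for scattered data is inverse distance weighting, sketched here purely as an assumed example (the function name and the `power` parameter are hypothetical).

```python
import math


def idw_interpolate(sample, points, values, power=2.0):
    """Inverse distance weighting: neighbours closer to the sample count more.

    sample and each entry of points are coordinate tuples; values holds one
    scalar per point. This is an assumed scheme, not taken from the slides.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for point, value in zip(points, values):
        r = math.dist(sample, point)
        if r == 0.0:
            return value  # the sample coincides with a data point
        weight = 1.0 / r ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total
```

In a ray caster this function would be evaluated at every sample position along every ray, using only the neighbours found near that position.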
  • 48. Finding Neighbors
  • 49. Optimizations
      – Empty space skipping
      – Early ray termination
      – Filtering (figure labels: A, B, C)
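Early ray termination can be illustrated with front-to-back compositing of scalar colors. This is a minimal sketch under assumptions: samples are already available as (color, opacity) pairs, and the threshold of 0.99 is an arbitrary illustrative value.

```python
def composite_ray(samples, opacity_threshold=0.99):
    """Front-to-back compositing with early ray termination.

    samples is a list of (color, opacity) pairs ordered from the eye outwards.
    Returns (color, opacity, samples_used).
    """
    color = 0.0
    opacity = 0.0
    used = 0
    for sample_color, sample_opacity in samples:
        used += 1
        remaining = 1.0 - opacity  # how much light can still pass through
        color += remaining * sample_opacity * sample_color
        opacity += remaining * sample_opacity
        if opacity >= opacity_threshold:
            break  # nothing behind this point is visible, so stop sampling
    return color, opacity, used
```

With the samples `[(1.0, 0.5), (1.0, 0.5), (1.0, 1.0), (0.5, 1.0)]` the ray becomes opaque at the third sample, so the fourth is never evaluated. Empty space skipping is the complementary idea: regions known to contain no data are stepped over without taking samples at all.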
  • 50. GPU Implementation
      – CUDA C, almost the same code
      – One thread for each ray/pixel
      – Remove recursion (for older hardware)
      – Texture memory
  • 51. Multi GPU
      – Load distribution is challenging:
          – Different hardware
          – Different amount of work per thread (ray/pixel)
      – Use the previous image to divide work for the next
      – Ray length as a proxy for the amount of work
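Dividing the next frame based on the previous one can be sketched as follows. The sketch assumes that the image is split into contiguous bands of rows, that each row's cost is the sum of the ray lengths measured in the previous frame, and that all GPUs are equally fast; none of these details are stated on the slide, and `split_rows` is a hypothetical name.

```python
def split_rows(row_costs, num_gpus):
    """Split image rows into num_gpus contiguous bands of roughly equal cost.

    row_costs[i] is the estimated work for row i, e.g. the summed ray lengths
    from the previous frame. Returns num_gpus + 1 row boundaries.
    """
    total = sum(row_costs)
    bounds = [0]
    running = 0.0
    for row, cost in enumerate(row_costs):
        running += cost
        # Close a band each time the accumulated cost reaches the next equal share.
        while len(bounds) < num_gpus and running >= total * len(bounds) / num_gpus:
            bounds.append(row + 1)
    while len(bounds) < num_gpus:  # only needed when there are no rows at all
        bounds.append(len(row_costs))
    bounds.append(len(row_costs))
    return bounds
```

For `row_costs = [1, 1, 1, 1, 2, 2]` and two GPUs the result is `[0, 4, 6]`: the first GPU renders four cheap rows and the second renders two expensive ones, each with a total cost of 4. Handling different hardware would require weighting each GPU's share by its measured speed.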
  • 52-58. Results (figure-only slides)
  • 59. Questions?
  • 60. Thank yous to: HPC-Lab Post Docs and grad. students! (photos: @ SC´07, Spring 2007, Spring 2010, Spring 2009)
  • 61. Modeling Heterogeneous Systems: “Optimized Barriers for Heterogeneous Systems Using MPI”, Jan Christian Meyer, PhD student finishing summer 2011 (to be presented at IEEE IPDPS 2011, HCW)
  • 62. Dealing with bandwidth issues: Compression of Large Seismic Datasets on GPU (Aqrawi & Elster, IPDPS 2011)
  • 63. Motivation
      – Locality & I/O: a challenge for data-intensive algorithms
      – Look at techniques for reducing memory bandwidth:
          – Hardware: HDD, SSD
          – Compression: JPEG, MPEG, MP3 ...
          – Explore GPU compression capabilities
      – Seismic filtering process:
          – Transform coding works well for signal data*
      * [H.S.Malvar 1992], [L.C.Duval 2000], [C.Larsen 2006], [D.Haugen 2009]
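Transform coding can be illustrated with a one-dimensional DCT: transform the signal, discard the small coefficients, and reconstruct from what is left. The sketch below is a plain O(n²) textbook DCT-II with its inverse, not the optimized DCT, AAN or LOT kernels from the talk; the function names and the threshold are assumptions for the example.

```python
import math


def dct(signal):
    """Unnormalized DCT-II of a list of floats."""
    n = len(signal)
    return [
        sum(signal[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        for k in range(n)
    ]


def idct(coeffs):
    """Inverse of dct() above (a scaled DCT-III)."""
    n = len(coeffs)
    return [
        (coeffs[0] / 2.0
         + sum(coeffs[k] * math.cos(math.pi * (i + 0.5) * k / n) for k in range(1, n)))
        * 2.0 / n
        for i in range(n)
    ]


def compress(signal, threshold):
    """Keep only the coefficients whose magnitude exceeds the threshold."""
    return [c if abs(c) > threshold else 0.0 for c in dct(signal)]
```

For smooth signals most of the energy ends up in a few low-frequency coefficients, so many coefficients can be zeroed (and then stored compactly) with little reconstruction error. A constant signal, for instance, needs only the first coefficient.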
  • 64. Results: GPU acceleration. Execution time comparison to the Fermi architecture (bar chart: execution time in seconds, 0 to 700, for the algorithms DCT 3D, DCT AAN 3D and LOT 1D on Intel i7 Single, Intel i7 Quad, Nvidia Tesla C1060 and Nvidia Tesla C2050)
  • 65. Results: I/O speedup (charts: I/O speedup on HDD and on SSD compared to platform, scale 0 to 7, per compression algorithm: RLE (Single, Quad); Huffman (Single, Quad, GPU); 1D DCT, 2D DCT, 1D AAN, 2D AAN and 3D AAN (each on Single, Quad, GPU))
  • 66. Summary: Compression
      – When optimizing for I/O, one needs both an efficient compression rate and a fast compression algorithm
      – Compression can give up to:
          – 6.2x I/O speedup on HDD (70 MB/s)
          – 3.9x I/O speedup on SSD (140 MB/s)
      – Achieved through:
          – Transform coding
          – CPU & GPU co-op
          – Asynch I/O
      – Predictive model accurate within 5%
      – Seismic compression library
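Why both the compression rate and the speed of the algorithm matter can be shown with a back-of-the-envelope model. This is not the predictive model mentioned on the slide, whose form is not given in the transcript. It is a simplified sketch that assumes disk transfer and (de)compression overlap perfectly thanks to asynchronous I/O, so the slower of the two determines the total time; all parameter names are hypothetical.

```python
def io_speedup(size_mb, disk_mb_per_s, compression_ratio, codec_mb_per_s):
    """Estimated I/O speedup from storing data compressed.

    Assumes disk transfer and the codec run fully overlapped, so the elapsed
    time is the maximum of the two. codec_mb_per_s is the codec throughput
    measured on uncompressed data.
    """
    plain_time = size_mb / disk_mb_per_s
    disk_time = (size_mb / compression_ratio) / disk_mb_per_s
    codec_time = size_mb / codec_mb_per_s
    return plain_time / max(disk_time, codec_time)
```

With a 70 MB/s disk and a compression ratio of 5, a codec running at 1000 MB/s gives a speedup of 5, limited by the disk. A codec running at 100 MB/s gives only about 1.4, limited by the codec. Under this model a faster disk lowers the achievable speedup for the same codec, which is consistent with the smaller gain reported for SSD than for HDD.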