2. Outline
● Introduction to GPU Computing
– Past: Graphics Processing and GPGPU
– Present: CUDA and OpenCL
– A bit on the architecture
● Why GPU?
● GPU vs. Multi-core and Distributed
● Open problems
● Where does this go?
19-Jan-2011 Computing Students talk 2
3. Introduction to GPU Computing
● Who has access to 1,000 processors?
YOU
6. Introduction to GPU Computing
● In the past
– GPU = Graphics Processing Unit
11. Introduction to GPU Computing
● In the past
– GPGPU = General Purpose computation using GPUs
13. Introduction to GPU Computing
● Now
– We have CUDA (NVIDIA, proprietary) and OpenCL (open standard)
__device__ float3 collideCell(int3 gridPos, uint index, ...)
{
    uint gridHash = calcGridHash(gridPos);
    ...
    for (uint j = startIndex; j < endIndex; j++) {
        if (j != index) {
            ...
            force += collideSpheres(...);
        }
    }
    return force;
}
14. Introduction to GPU Computing
● A (just a little) bit on the architecture of the latest NVIDIA GPU (Fermi)
– Very simple cores (even simpler than the Intel Atom)
– Little cache
16. Why GPU?
● Performance
17. Why GPU?
● People have used it, and it works.
– Bio-Informatics
– Finance
– Fluid Dynamics
– Data-mining
– Computer Vision
– Medical Imaging
– Numerical Analytics
18. Why GPU?
● A new, promising area
– Fast growing
– Ubiquitous
– New paradigm → new problems, new challenges
19. GPU vs. Multi-core
● A lot more threads of computation are required:
– The GPU has many more “cores” than a multi-core CPU.
– A GPU core is nowhere near as powerful as a CPU core.
20. GPU vs. Multi-core
● Challenges:
– Not all problems can easily be broken into many small sub-problems to be solved in parallel.
– Race conditions are much more serious.
– Atomic operations are still doable, but locking is a performance killer; lock-free algorithms are much preferable.
– Memory access bottleneck (memory is not that parallel)
– Debugging is a nightmare.
21. GPU vs. Distributed
● GPU allows much cheaper communication between
different threads.
● GPU memory is still limited compared to a distributed
system.
● GPU cores are not completely independent processors
– Need fine-grain parallelism
– Reaching the scalability of a distributed system is difficult.
22. Open problems
● Data-structures
● Algorithms
● Tools
● Theory
23. Open problems
● Data-structures
– Requirement: able to handle very high levels of concurrent access.
– Common data-structures like dynamic arrays, priority queues or hash tables are not very suitable for the GPU.
– Some existing work: kD-trees, quad-trees, read-only hash tables...
24. Open problems
● Algorithms
– Most sequential algorithms need serious re-design to make good
use of such a huge number of cores.
● Our computational geometry research: use the discrete
space computation to approximate the continuous space
result.
– Traditional parallel algorithms may or may not work.
● Usual assumption: infinite number of processors
● No serious study on this so far!
25. Open problems
● Tools
– Programming language: Better language or model to express
parallel algorithms?
– Compiler: Optimize GPU code? Auto-parallelization?
● There is some work on translating OpenMP to CUDA.
– Debugging tool? Maybe a whole new “art of debugging” is needed.
– Software engineering is currently far behind the hardware
development.
26. Open problems
● Theory
– Some traditional approaches:
● PRAM: CRCW, EREW. Too general.
● SIMD: Too restricted.
– Big-O analysis may not be good enough.
● Time complexity is relevant, but work complexity is more
important.
● Most GPU computing works only talk about actual running
time.
– Performance Modeling for GPU, anyone?
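One standard way to make the work/time distinction precise (the work-depth model from the PRAM literature, not stated on the slide) is Brent's bound:

```latex
% W(n): total operations performed (work); D(n): critical-path length (depth).
% With a greedy schedule on p processors, the running time T_p satisfies
T_p \;\le\; \frac{W(n)}{p} + D(n)
% Example: tree reduction of n numbers has W(n) = O(n) and D(n) = O(\log n),
% so T_p = O(n/p + \log n): near-linear speedup while p \ll n / \log n.
```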
27. Where does this go?
● Intel/AMD already have 6-core, 12-thread processors (maybe more).
● SeaMicro has a server with 512 Atom dual-core processors.
● AMD Fusion: CPU + GPU.
● The GPU may not stay forever, but massively multithreaded processing is definitely the future of computing.
28. Where to start?
● Check your PC.
– If it's not old enough to go to primary school, there's a high chance it has a GPU.
● Go to the NVIDIA/ATI website, download a development toolkit, and you're ready to go.
29. THANK YOU
● Any questions? Just ask.
● Any suggestions? What are you waiting for?
● Any problem or solution to discuss? Let's have a private talk
somewhere (j/k)