2. Introduction
A GPU is a processor optimized for 2D/3D graphics, video,
visual computing, and display.
It is a highly parallel, highly multithreaded multiprocessor
optimized for visual computing.
It provides real-time visual interaction with computed
objects via graphics, images, and video.
3. History
● Up to the late 90's
  – No GPUs
  – Much simpler VGA controllers
● Consisted of
  – A memory controller
  – Display generator + DRAM
● DRAM was either shared with the CPU or private
4. History
● By 1997
  – More complex VGA controllers
● Incorporated 3D acceleration functions in hardware
  – Triangle setup and rasterization
  – Texture mapping and shading
● Rasterization: converting a combination of shapes (lines, polygons, letters, …)
  into an image consisting of individual pixels
5. History
● By 2000
  – Single-chip graphics processors incorporated
    nearly all functions of the graphics pipeline of
    high-end workstations
    ● Beginning of the end of the high-end
      workstation market
  – The VGA controller was renamed the Graphics
    Processing Unit (GPU)
6. Current Trends
Well-defined APIs
OpenGL:
Open standard for 3D graphics programming
WebGL:
OpenGL extension for the web
DirectX:
Set of Microsoft multimedia programming interfaces
(Direct3D for 3D graphics)
Can implement novel graphics algorithms
Can use GPUs for non-conventional applications
7. Current Trends
Combining powers of CPU and GPU - heterogeneous
architectures
GPUs become scalable parallel processors
Moving from hardware-defined pipelining architectures to
more flexible programmable architectures
8. Architecture Evolution
[Diagram: CPU – Memory – Graphics card – Display]
● Floating-point co-processors attached to microprocessors
● Interest in providing hardware support for displays
● Led to graphics processing units (GPUs)
9. GPUs with dedicated pipelines
[Pipeline: Input stage → Vertex shader stage → Geometry shader stage →
Rasterizer stage → Pixel shading stage → Frame buffer, with access to
graphics memory]
Graphics chips generally had a
pipeline structure, with
individual stages performing
specialized operations, finally
leading to loading the frame buffer for
display.
Individual stages may have access
to graphics memory for storing
intermediate computed data.
10. Programming GPUs
● Will focus on parallel computing applications
● Must decompose the problem into a set of parallel
computations
● Ideally two-level, to match the GPU organization
12. GPGPU and CUDA
GPGPU
● General-Purpose computing on GPUs
● Uses the traditional graphics API and graphics pipeline
CUDA
● Compute Unified Device Architecture
● Parallel computing platform and programming model
● Invented by NVIDIA
● Single Program Multiple Data (SPMD) approach
13. CUDA
➢ CUDA programs are written in C
➢ Within C programs, call SIMT “kernel” routines that are
executed on the GPU
➢ Provides three abstractions
  ➢ Hierarchy of thread groups
  ➢ Shared memory
  ➢ Barrier synchronization
15. CUDA
● Lowest level of parallelism – the CUDA thread
● Compiler + hardware can gang thousands of CUDA threads
together, yielding various levels of parallelism within the
GPU: MIMD, SIMD, and instruction-level parallelism
● Single Instruction, Multiple Thread (SIMT)
16. Conventional C Code
// Invoke DAXPY
daxpy(n, 2.0, x, y);
// DAXPY in C
void daxpy(int n, double a, double *x, double *y)
{
  for (int i = 0; i < n; ++i)
    y[i] = a*x[i] + y[i];
}
17. Corresponding CUDA Code
// Invoke DAXPY with 256 threads per Thread Block
__host__
int nblocks = (n+255)/256;
daxpy<<<nblocks,256>>>(n, 2.0, x, y);
// DAXPY in CUDA
__global__
void daxpy(int n, double a, double *x, double *y)
{
  int i = blockIdx.x*blockDim.x + threadIdx.x;
  if (i < n) y[i] = a*x[i] + y[i];
}
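The CUDA version replaces the loop with a per-thread index computation. A plain-C emulation of that mapping (hypothetical names, for illustration only), including the i < n guard for the last, partially filled block:

```c
#include <stdio.h>

/* Plain-C emulation of the CUDA launch daxpy<<<nblocks, 256>>>:
 * each (block, thread) pair computes one global index i, and the
 * i < n guard skips the unused threads in the last, partially
 * filled block. */
void daxpy_emulated(int n, double a, double *x, double *y)
{
    int block_dim = 256;
    int nblocks = (n + 255) / 256;       /* ceiling division, as on the slide */

    for (int block_idx = 0; block_idx < nblocks; ++block_idx)
        for (int thread_idx = 0; thread_idx < block_dim; ++thread_idx) {
            int i = block_idx * block_dim + thread_idx;  /* global index */
            if (i < n)                   /* last block may be partly empty */
                y[i] = a * x[i] + y[i];
        }
}
```

With n = 300, for example, (300+255)/256 = 2 blocks are launched, and 212 of the 512 (block, thread) pairs fail the guard and do nothing.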
18. Cont...
● __device__ (or) __global__ --- functions that run on the GPU
● __host__ --- functions that run on the system processor
● CUDA variables declared with __device__ are allocated in
GPU memory, which is accessible by all the multithreaded
SIMD processors
● The call syntax for a function that uses the GPU is
name<<<dimGrid,dimBlock>>>(..parameter list..)
● GPU hardware handles threads
19.
● Threads are blocked together and executed in groups of
32 threads – a Thread Block
● The hardware that executes a whole block of threads is
called a Multithreaded SIMD Processor