2. INTRODUCTION
Me:
Chan Le – 3rd year undergraduate student
Double major in Computer Science & Management Science
A Vietnamese - KAIST ’13
Professor:
Won-Ki Jeong
GPU-accelerated large-scale biomedical image processing
Project:
Apply parallel programming to improve the performance of image processing
3. MOTIVATION
Biomedical researchers work with images
Really big images
Take a long time to process
Raw images are hard to analyze & use for research
Really noisy sometimes
Need to be preprocessed before use
Image preprocessing using serial algorithms is slow
Nowadays, parallel computing is developing
Thanks to the popularity of multi-core CPUs and GPUs
4. RELATED WORKS: USING PDE IN NOISE REDUCTION
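[Figure: pixel (x,y) with intensity $I_t$ and its four neighbours: $I_N$ at (x,y+1), $I_W$ at (x-1,y), $I_E$ at (x+1,y), $I_S$ at (x,y-1). Each difference is neighbour minus centre, e.g. $\Delta_W = I_W - I_t$; likewise $\Delta_N$, $\Delta_E$, $\Delta_S$.]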
Heat equation
At every pixel (x,y) of the image at time t:
$I_{t+1} = I_t + \Delta I$
$\Delta I = (\Delta_W + \Delta_N + \Delta_E + \Delta_S) / 4$
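The heat-equation update above can be sketched as a CUDA kernel in which each thread computes one pixel. This is only an illustrative sketch, not the project's actual code: the names `heat_step_kernel` and `heat_step`, the 16x16 block size, and the choice to clamp neighbours at the image border are all assumptions, and error checking of the CUDA calls is omitted for brevity.

```cuda
#include <cuda_runtime.h>
#include <vector>

// One thread updates one pixel: I_{t+1} = I_t + (dW + dN + dE + dS) / 4.
// Neighbours outside the image are clamped to the border pixel (assumption).
__global__ void heat_step_kernel(const float* in, float* out, int w, int h) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= w || y >= h) return;  // threads past the image edge do nothing

    float it = in[y * w + x];
    float dW = in[y * w + max(x - 1, 0)]     - it;
    float dE = in[y * w + min(x + 1, w - 1)] - it;
    float dS = in[max(y - 1, 0) * w + x]     - it;
    float dN = in[min(y + 1, h - 1) * w + x] - it;
    out[y * w + x] = it + (dW + dN + dE + dS) / 4.0f;
}

// Hypothetical host helper: copy the image to the GPU, run one step, copy back.
std::vector<float> heat_step(const std::vector<float>& img, int w, int h) {
    size_t bytes = img.size() * sizeof(float);
    float* d_in = nullptr;
    float* d_out = nullptr;
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMemcpy(d_in, img.data(), bytes, cudaMemcpyHostToDevice);

    dim3 block(16, 16);  // assumed block size
    dim3 grid((w + block.x - 1) / block.x, (h + block.y - 1) / block.y);
    heat_step_kernel<<<grid, block>>>(d_in, d_out, w, h);

    std::vector<float> result(img.size());
    cudaMemcpy(result.data(), d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return result;
}
```

The kernel reads from one buffer and writes to another, so every pixel sees the values of time t even though all threads run concurrently.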
5. RELATED WORKS: ANISOTROPIC DIFFUSION
Paper: Scale-space and edge detection using
anisotropic diffusion (Pietro Perona & Jitendra Malik, 1990)
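Basic idea: add a coefficient c to each of $\Delta_W$, $\Delta_N$, $\Delta_S$, $\Delta_E$
$\Delta I = (c_W \Delta_W + c_N \Delta_N + c_E \Delta_E + c_S \Delta_S) / 4$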
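How to calculate each c?
The paper proposes two functions of the local difference $\Delta$, where K is a constant controlling edge sensitivity:
$c = e^{-(|\Delta| / K)^2}$
$c = \dfrac{1}{1 + (|\Delta| / K)^2}$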
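A minimal CUDA sketch of one anisotropic diffusion step follows, assuming the exponential conduction function $c = e^{-(\Delta/K)^2}$ from the Perona & Malik paper and the same division by 4 as in the heat equation. The names `conduction`, `perona_malik_kernel`, and `perona_malik_step`, the parameter `K`, the clamped borders, and the 16x16 block size are illustrative assumptions rather than the project's actual code. Error checking is omitted.

```cuda
#include <cuda_runtime.h>
#include <cmath>
#include <vector>

// Exponential conduction function: near 1 for small differences (flat areas),
// near 0 for large differences (edges), so edges are smoothed much less.
__host__ __device__ inline float conduction(float delta, float K) {
    float r = delta / K;
    return expf(-r * r);
}

// One thread updates one pixel with coefficient-weighted differences.
__global__ void perona_malik_kernel(const float* in, float* out,
                                    int w, int h, float K) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= w || y >= h) return;

    float it = in[y * w + x];
    // Border neighbours are clamped to the edge pixel (assumption).
    float dW = in[y * w + max(x - 1, 0)]     - it;
    float dE = in[y * w + min(x + 1, w - 1)] - it;
    float dS = in[max(y - 1, 0) * w + x]     - it;
    float dN = in[min(y + 1, h - 1) * w + x] - it;

    float sum = conduction(dW, K) * dW + conduction(dN, K) * dN +
                conduction(dE, K) * dE + conduction(dS, K) * dS;
    out[y * w + x] = it + sum / 4.0f;
}

// Hypothetical host helper running a single diffusion step on the GPU.
std::vector<float> perona_malik_step(const std::vector<float>& img,
                                     int w, int h, float K) {
    size_t bytes = img.size() * sizeof(float);
    float* d_in = nullptr;
    float* d_out = nullptr;
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMemcpy(d_in, img.data(), bytes, cudaMemcpyHostToDevice);

    dim3 block(16, 16);  // assumed block size
    dim3 grid((w + block.x - 1) / block.x, (h + block.y - 1) / block.y);
    perona_malik_kernel<<<grid, block>>>(d_in, d_out, w, h, K);

    std::vector<float> result(img.size());
    cudaMemcpy(result.data(), d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return result;
}
```

With a very large `K` every coefficient approaches 1 and the step behaves like the plain heat equation. With a small `K` large differences get a coefficient near 0, so sharp edges are left almost untouched.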
7. NVIDIA CUDA
Serial vs Parallel program
Thread: unit of processing
In the past: a CPU had only 1 core -> 1 thread at a time
Nowadays: multiple cores -> multiple threads at a time
CUDA™ is a parallel computing platform and
programming model invented by NVIDIA.
http://www.nvidia.com/object/cuda_home_new.html
How could it help?
CPU: 1-6 cores
GPU: hundreds of cores
Can improve performance by a factor of 10 to 100, depending on the algorithm
8. MY IMPLEMENTATION
Implement Anisotropic Diffusion on the CUDA platform
1 thread handles 1 pixel
Divide the image into multiple sub-regions and process them in parallel to exploit multiple cores
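The mapping of threads to pixels and of thread blocks to sub-regions can be sketched as below. This is an assumed illustration, not the project's code: each thread writes the index of the block (sub-region) that owns its pixel, using a hypothetical 16x16 block size and the made-up names `label_tiles_kernel` and `label_tiles`. Error checking is omitted.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Each thread handles exactly one pixel and records which block
// (i.e. which sub-region of the image) that pixel belongs to.
__global__ void label_tiles_kernel(int* tile_of_pixel, int w, int h) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;  // pixel column
    int y = blockIdx.y * blockDim.y + threadIdx.y;  // pixel row
    if (x >= w || y >= h) return;  // partial tiles at the right/bottom edge
    tile_of_pixel[y * w + x] = blockIdx.y * gridDim.x + blockIdx.x;
}

// Hypothetical host helper: returns the sub-region index of every pixel.
std::vector<int> label_tiles(int w, int h) {
    size_t bytes = static_cast<size_t>(w) * h * sizeof(int);
    int* d_tiles = nullptr;
    cudaMalloc(&d_tiles, bytes);

    dim3 block(16, 16);  // one sub-region = 16x16 pixels (assumption)
    // Round up so the grid of sub-regions covers the whole image.
    dim3 grid((w + block.x - 1) / block.x, (h + block.y - 1) / block.y);
    label_tiles_kernel<<<grid, block>>>(d_tiles, w, h);

    std::vector<int> tiles(static_cast<size_t>(w) * h);
    cudaMemcpy(tiles.data(), d_tiles, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_tiles);
    return tiles;
}
```

For a 20x20 image this launches a 2x2 grid of blocks. The GPU schedules the blocks across its multiprocessors, which is how the sub-regions end up being processed in parallel.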
14. CONCLUSION
The result of this project could be used to improve the quality of images before use.
Utilizing GPU computing power could improve the performance of your program by 100-200 times
Partial Differential Equations are a good fit when designing parallel algorithms
However, the performance is limited by the GPU’s
memory size