INTRODUCTION TO CUDA
Prepared for Geek Camp Singapore 2011
                                  Raymond Tay
THE FREE LUNCH IS OVER – HERB SUTTER
WE NEED TO THINK BEYOND MULTI-CORE CPUS … WE NEED TO THINK MANY-CORE GPUS …
NVIDIA GPUS FLOPS
    FLOPS – floating-point operations per second. A measure of how many
     floating-point operations a GPU can perform each second. More is better.

GPUs beat CPUs
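A theoretical peak figure is commonly estimated as cores × clock rate × floating-point operations per core per cycle. The sketch below does only that arithmetic in C. The function name `peak_flops` and the sample numbers in the note after it are illustrative assumptions, not measurements of any particular card.

```c
#include <assert.h>

/* Theoretical peak, in floating-point operations per second.
 * All three inputs are assumptions supplied by the caller. */
double peak_flops(unsigned int cores, double clock_hz,
                  unsigned int flops_per_cycle)
{
    assert(clock_hz >= 0.0);
    return (double)cores * clock_hz * (double)flops_per_cycle;
}
```

For a hypothetical GPU with 512 cores running at 1.5 GHz and completing 2 operations per core per cycle, `peak_flops(512, 1.5e9, 2)` gives 1.536e12, roughly 1.5 teraflops. Real programs usually reach only a fraction of such a peak.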
NVIDIA GPUS MEMORY BANDWIDTH
    NVIDIA’s GPUs are massively parallel processors, so high memory
     bandwidth is needed to keep them supplied with data. It plays a big
     role in high-performance computing.

GPUs beat CPUs
GPU VS CPU

CPU
  • Optimised for low-latency access to cached data sets
  • Control logic for out-of-order and speculative execution

GPU
  • Optimised for data-parallel, throughput computation
  • Architecture tolerant of memory latency
  • More transistors dedicated to computation
I DON’T KNOW C/C++, SHOULD I LEAVE?
  Relax, no worries. Not to fret.

  Your Brain Asks:
  Wait a minute, why should I learn the C/C++ SDK?

  CUDA Answers:
  Efficiency!!!
WHAT DO I NEED TO BEGIN WITH CUDA?
  An NVIDIA CUDA-enabled graphics card, e.g. one based on the Fermi architecture
HOW DOES CUDA WORK

[Diagram: PCI Bus]

1.  Copy input data from CPU memory to GPU memory
2.  Load GPU program and execute, caching data on chip for performance
3.  Copy results from GPU memory to CPU memory
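EXAMPLE: BLOCK CYPHER

CPU Program

// Shift every element by shift_amount, wrapping around past alphabet_max.
void host_shift_cypher(unsigned int *input_array, unsigned int *output_array,
    unsigned int shift_amount, unsigned int alphabet_max,
    unsigned int array_length)
{
  for (unsigned int i = 0; i < array_length; i++)
  {
    unsigned int element = input_array[i];
    unsigned int shifted = element + shift_amount;
    if (shifted > alphabet_max)
    {
      shifted = shifted % (alphabet_max + 1);
    }
    output_array[i] = shifted;
  }
}

int main() {
  // Sample values; the slide leaves the setup out.
  unsigned int input_array[4] = {0, 1, 24, 25};
  unsigned int output_array[4];
  unsigned int shift_amount = 3, alphabet_max = 25, array_length = 4;
  host_shift_cypher(input_array, output_array,
      shift_amount, alphabet_max, array_length);
  return 0;
}

GPU Program

// Same computation, one element per thread.
__global__ void shift_cypher(unsigned int *input_array, unsigned int *output_array,
    unsigned int shift_amount, unsigned int alphabet_max,
    unsigned int array_length)
{
  unsigned int tid = threadIdx.x + blockIdx.x * blockDim.x;
  if (tid >= array_length) return;  // the last block may have spare threads
  unsigned int shifted = input_array[tid] + shift_amount;
  if (shifted > alphabet_max)
    shifted = shifted % (alphabet_max + 1);
  output_array[tid] = shifted;
}

int main() {
  // Sample values and buffer setup; the slide leaves these out.
  const unsigned int array_length = 4, block_size = 256;
  const unsigned int shift_amount = 3, alphabet_max = 25;
  unsigned int host_in[4] = {0, 1, 24, 25}, host_out[4];
  unsigned int *input_array, *output_array;  // GPU memory
  size_t bytes = array_length * sizeof(unsigned int);
  cudaMalloc(&input_array, bytes);
  cudaMalloc(&output_array, bytes);
  // Step 1: copy input from CPU memory to GPU memory.
  cudaMemcpy(input_array, host_in, bytes, cudaMemcpyHostToDevice);
  // Step 2: launch the kernel; round the block count up.
  dim3 dimGrid((array_length + block_size - 1) / block_size);
  dim3 dimBlock(block_size);
  shift_cypher<<<dimGrid, dimBlock>>>(input_array, output_array,
      shift_amount, alphabet_max, array_length);
  // Step 3: copy results from GPU memory back to CPU memory.
  cudaMemcpy(host_out, output_array, bytes, cudaMemcpyDeviceToHost);
  cudaFree(input_array);
  cudaFree(output_array);
  return 0;
}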
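The number of blocks in the launch has to round up, otherwise the tail of the array gets no threads whenever array_length is not a multiple of block_size. A minimal C sketch of that integer arithmetic follows. The helper names `grid_size` and `spare_threads` are hypothetical and not part of CUDA.

```c
#include <assert.h>

/* Smallest number of blocks such that blocks * block_size >= n. */
unsigned int grid_size(unsigned int n, unsigned int block_size)
{
    assert(block_size > 0);
    /* Written this way to avoid overflow in n + block_size - 1. */
    return n / block_size + (n % block_size != 0 ? 1u : 0u);
}

/* Threads in the last block that have no element to work on. */
unsigned int spare_threads(unsigned int n, unsigned int block_size)
{
    return grid_size(n, block_size) * block_size - n;
}
```

With 1000 elements and 256 threads per block, `grid_size` returns 4 and `spare_threads` returns 24. Those 24 threads are the reason a kernel should compare its index against array_length before reading or writing.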
EXAMPLE: VECTOR ADDITION
// CUDA CODE
// Each thread adds one pair of elements.
__global__ void VecAdd(const float* A, const float* B, float* C,
    unsigned int N)
{
  unsigned int i = blockDim.x * blockIdx.x + threadIdx.x;
  if (i < N)  // skip spare threads in the last block
    C[i] = A[i] + B[i];
}

// C CODE
// One loop visits every element in turn.
void VecAdd(const float* A, const float* B, float* C, unsigned int N)
{
  for (unsigned int i = 0; i < N; ++i)
    C[i] = A[i] + B[i];
}
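To see how the kernel's index expression covers the same elements as the loop, the launch can be emulated on the CPU. This is only a sketch in plain C: `emulate_vec_add` and `vec_add_thread` are hypothetical names, and the emulation runs the "threads" one after another rather than in parallel.

```c
#include <assert.h>
#include <stddef.h>

/* Work done by one emulated thread: the same body as the CUDA kernel,
 * with blockDim.x, blockIdx.x and threadIdx.x passed in explicitly. */
static void vec_add_thread(unsigned int block_dim, unsigned int block_idx,
                           unsigned int thread_idx,
                           const float *A, const float *B, float *C,
                           unsigned int N)
{
    unsigned int i = block_dim * block_idx + thread_idx;
    if (i < N)
        C[i] = A[i] + B[i];
}

/* Emulated launch: visits every (block, thread) pair sequentially. */
void emulate_vec_add(unsigned int grid_dim, unsigned int block_dim,
                     const float *A, const float *B, float *C,
                     unsigned int N)
{
    assert(A != NULL && B != NULL && C != NULL);
    for (unsigned int b = 0; b < grid_dim; ++b)
        for (unsigned int t = 0; t < block_dim; ++t)
            vec_add_thread(block_dim, b, t, A, B, C, N);
}
```

Calling `emulate_vec_add(2, 4, A, B, C, 5)` runs 8 emulated threads over 5 elements. The `i < N` check keeps the last 3 from touching memory past the end of the arrays.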
DEBUGGER

CUDA-GDB
  • Based on GDB
  • Linux
  • Mac OS X

Parallel Nsight
  • Plugin inside Visual Studio
VISUAL PROFILER & MEMCHECK

Profiler
  • Microsoft Windows
  • Linux
  • Mac OS X
  • Analyze performance

CUDA-MEMCHECK
  • Microsoft Windows
  • Linux
  • Mac OS X
  • Detect memory access errors
WHERE’S CUDA AT IN 2011?
  60,000 researchers use it to aid drug discovery
  470 universities teach CUDA
WHERE’S CUDA AT IN 2011? (PART 2..)
  NVIDIA Showcase (1000+ applications)
ADDITIONAL RESOURCES
    CUDA FAQ (http://tegradeveloper.nvidia.com/cuda-faq)
    CUDA Tools & Ecosystem (http://tegradeveloper.nvidia.com/cuda-tools-ecosystem)
    CUDA Downloads (http://tegradeveloper.nvidia.com/cuda-downloads)
    NVIDIA Forums (http://forums.nvidia.com/index.php?showforum=62)
    GPGPU (http://gpgpu.org)
    CUDA By Example, Jason Sanders & Edward Kandrot (http://tegradeveloper.nvidia.com/content/cuda-example-introduction-general-purpose-gpu-programming-0)
    GPU Computing Gems Emerald Edition, Editor in Chief: Prof Hwu Wen-Mei (http://www.amazon.com/GPU-Computing-Gems-Emerald-Applications/dp/0123849888/)
CUDA LIBRARIES
  Visit this site: http://developer.nvidia.com/cuda-tools-ecosystem#Libraries
  Thrust, CUFFT, CUBLAS, CUSP, NPP, OpenCV,
   GPU AI-Tree Search, GPU AI-Path Finding
  Many of the libraries are hosted on Google Code.
   Many more gems in there too!
THANK YOU
  @RaymondTayBL
