Seminar '11                             CUDA

Contents


              1   WHAT IS CUDA?

              2   EXECUTION MODEL


              3   IMPLEMENTATION


              4   APPLICATION




3/17/2012                                      2

What is CUDA?
 • CUDA – Compute Unified Device Architecture
     - Hardware and software architecture
     - For computing on the GPU
     - Developed by NVIDIA, first released in 2007
     - GPU
          - Performs a massive number of tasks simultaneously and quickly by using many ALUs
          - ALUs are programmable through graphics APIs

What is CUDA? (Contd…)
 • With CUDA, there is no need to map GPU computations onto graphics APIs
 • CUDA provides very fast number crunching
 • CUDA is well suited to highly parallel algorithms and large datasets
 • Consists of a heterogeneous programming model and a software environment
     - Hardware and software models
     - An extension of the C programming language
 • Designed to enable heterogeneous computation
     - Computation with both CPU and GPU

CUDA kernels & threads
 • Device = GPU
     - Executes the parallel portions of an application as kernels
 • Host = CPU
     - Executes the serial portions of an application
 • Kernel = function that runs on the device
     - One kernel executes at a time (Fermi-class GPUs can run several kernels concurrently)
     - Many threads execute each kernel
 • Host and device each possess their own memory
 • Host and device are connected by PCI Express x16

Arrays of parallel threads
 • A CUDA kernel is executed by an array of threads
     - All threads run the same code
     - Each thread has an ID that it uses to compute memory addresses
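
The idea that every thread runs the same code and uses its ID to pick a memory address can be sketched as follows. This is a minimal illustration, not code from the original slides: the kernel name add_offset and the host helper add_offset_on_device are hypothetical, and the sketch assumes the array fits in a single thread block.

```cuda
#include <stdlib.h>
#include <cuda_runtime.h>

// Kernel: every thread runs this same code; only threadIdx.x differs.
__global__ void add_offset(float *data, float offset)
{
    int id = threadIdx.x;          // this thread's ID within the block
    data[id] = data[id] + offset;  // the ID selects the element this thread touches
}

// Hypothetical host helper: runs add_offset on n elements using one block of
// n threads. Assumes 1 <= n <= the per-block thread limit of the device.
// Returns 0 on success, -1 on any CUDA error.
int add_offset_on_device(float *host_data, int n, float offset)
{
    float *dev_data = 0;
    size_t bytes = n * sizeof(float);
    if (cudaMalloc((void **)&dev_data, bytes) != cudaSuccess) return -1;

    cudaError_t err = cudaMemcpy(dev_data, host_data, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        add_offset<<<1, n>>>(dev_data, offset);   // 1 block, n threads
        err = cudaMemcpy(host_data, dev_data, bytes, cudaMemcpyDeviceToHost);
    }
    cudaFree(dev_data);
    return err == cudaSuccess ? 0 : -1;
}
```

Calling add_offset_on_device(v, 4, 10.0f) on a host array v of four floats would add 10 to each element, with thread 0 handling v[0], thread 1 handling v[1], and so on.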

Thread batching
 • Thread cooperation is valuable
     - Share results to avoid redundant computation
     - Share memory accesses
 • Thread block = group of threads
     - Threads cooperate using shared memory and synchronization
     - Thread ID is calculated as
          x + y·Dx (for a 2-dimensional block)
          (x, y) – thread index
          (Dx, Dy) – block size

Thread Batching (Contd…)
     - Thread ID is calculated as
          x + y·Dx + z·Dx·Dy (for a 3-dimensional block)
          (x, y, z) – thread index
          (Dx, Dy, Dz) – block size
 • Grid = group of thread blocks
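
The thread ID formulas can be written directly in code. The sketch below is illustrative only: flat_thread_id and record_thread_ids are hypothetical names, while threadIdx and blockDim are the built-in CUDA variables holding the thread index and the block size.

```cuda
#include <cuda_runtime.h>

// Flattened ID of the thread at index (x, y, z) in a block of size (Dx, Dy, Dz).
// Dz is not needed by the formula. For a 2-dimensional block pass z = 0,
// which reduces the result to x + y*Dx.
__host__ __device__ inline int flat_thread_id(int x, int y, int z, int Dx, int Dy)
{
    return x + y * Dx + z * Dx * Dy;
}

// Each thread computes its own flattened ID from the built-in variables
// and stores it at the matching position of the output array.
__global__ void record_thread_ids(int *ids)
{
    int id = flat_thread_id(threadIdx.x, threadIdx.y, threadIdx.z,
                            blockDim.x, blockDim.y);
    ids[id] = id;
}
```

A launch such as record_thread_ids<<<1, dim3(4, 3, 2)>>>(dev_ids), with dev_ids pointing to 24 integers on the device, would fill the array with the values 0 to 23.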

Thread Batching (Contd…)
 • Each block also has a block ID
     - Calculated in the same way as the thread ID
 • Threads in different blocks cannot cooperate
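
Cooperation inside a block, and the lack of it between blocks, can be illustrated with a per-block sum. This is a sketch under stated assumptions rather than the presenter's code: block_sum, sum_on_device and BLOCK_SIZE are hypothetical names, the block size is assumed to be a power of two, and n is assumed to be greater than zero. Because blocks cannot cooperate inside the kernel, each block writes its own partial result and the host combines them.

```cuda
#include <stdlib.h>
#include <cuda_runtime.h>

#define BLOCK_SIZE 256  // assumed block size; must be a power of two for the loop below

// Each block sums up to BLOCK_SIZE consecutive inputs and writes one partial sum.
// Threads of one block cooperate through shared memory and __syncthreads().
__global__ void block_sum(const float *in, float *partial, int n)
{
    __shared__ float cache[BLOCK_SIZE];
    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + tid;

    cache[tid] = (i < n) ? in[i] : 0.0f;  // one global load per thread
    __syncthreads();                      // wait until the whole block has loaded

    // Tree reduction: halve the number of active threads at each step.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (tid < stride) cache[tid] += cache[tid + stride];
        __syncthreads();
    }
    if (tid == 0) partial[blockIdx.x] = cache[0];
}

// Hypothetical host helper: stores the total in *result, returns 0 on success.
int sum_on_device(const float *host_in, int n, float *result)
{
    int blocks = (n + BLOCK_SIZE - 1) / BLOCK_SIZE;
    float *dev_in = 0, *dev_partial = 0;
    float *host_partial = (float *)malloc(blocks * sizeof(float));
    if (!host_partial) return -1;

    cudaError_t err = cudaMalloc((void **)&dev_in, n * sizeof(float));
    if (err == cudaSuccess) err = cudaMalloc((void **)&dev_partial, blocks * sizeof(float));
    if (err == cudaSuccess) err = cudaMemcpy(dev_in, host_in, n * sizeof(float), cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        block_sum<<<blocks, BLOCK_SIZE>>>(dev_in, dev_partial, n);
        err = cudaMemcpy(host_partial, dev_partial, blocks * sizeof(float), cudaMemcpyDeviceToHost);
    }
    *result = 0.0f;
    if (err == cudaSuccess)
        for (int b = 0; b < blocks; ++b) *result += host_partial[b];  // combine blocks on the host

    cudaFree(dev_in);
    cudaFree(dev_partial);
    free(host_partial);
    return err == cudaSuccess ? 0 : -1;
}
```

Combining the partial sums on the host is only one option; a second kernel launch over the partial results would serve the same purpose.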

Transparent Scalability
 • Hardware is free to schedule thread blocks on any processor
     - A kernel scales across parallel multiprocessors

CUDA architectures

    Architecture codename              G80           GT200         Fermi
    Release year                       2006          2008          2010
    Number of transistors              681 million   1.4 billion   3.0 billion
    Streaming multiprocessors (SM)     16            30            16
    Streaming processors (per SM)      8             8             32
    Streaming processors (total)       128           240           512
    Shared memory (per SM)             16 KB         16 KB         Configurable 48 KB or 16 KB
    L1 cache (per SM)                  None          None          Configurable 16 KB or 48 KB

8 & 10 Series Architecture
 [Figure: block diagrams of the G80 and GT200 architectures]

Kernel memory access
 • Per thread: registers and local memory
 • Per block: shared memory
 • Per device: global memory

Physical Memory Layout
 • "Local" memory resides in device DRAM
     - Use registers and shared memory to minimize local memory use
 • Host can read and write global memory but not shared memory
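
The memory spaces described above map onto CUDA declarations roughly as in the following sketch. All names here (scale_factor, scale_reversed, scale_reversed_on_device, TILE) are hypothetical and chosen for illustration; the sketch assumes a single block of at most TILE threads. The host touches only global memory, through cudaMemcpy and cudaMemcpyToSymbol, and has no way to address the shared array.

```cuda
#include <cuda_runtime.h>

#define TILE 64  // assumed upper bound on the number of elements (one block)

// Global memory: one copy per device; the host writes it via the runtime API.
__device__ float scale_factor;

// Writes out[t] = scale_factor * in[n - 1 - t] for a block of n threads.
__global__ void scale_reversed(const float *in, float *out)
{
    __shared__ float tile[TILE];  // shared memory: one copy per block
    int t = threadIdx.x;          // scalar locals normally live in per-thread registers

    tile[t] = in[t];              // stage the input in shared memory
    __syncthreads();              // make every thread's write visible to the block
    out[t] = scale_factor * tile[blockDim.x - 1 - t];
}

// Hypothetical host helper. Returns 0 on success, -1 on bad input or CUDA error.
int scale_reversed_on_device(const float *host_in, float *host_out, int n, float factor)
{
    if (n < 1 || n > TILE) return -1;
    float *dev_in = 0, *dev_out = 0;
    size_t bytes = n * sizeof(float);

    // The host writes a global-memory variable by symbol.
    cudaError_t err = cudaMemcpyToSymbol(scale_factor, &factor, sizeof(float));
    if (err == cudaSuccess) err = cudaMalloc((void **)&dev_in, bytes);
    if (err == cudaSuccess) err = cudaMalloc((void **)&dev_out, bytes);
    if (err == cudaSuccess) err = cudaMemcpy(dev_in, host_in, bytes, cudaMemcpyHostToDevice);
    if (err == cudaSuccess) {
        scale_reversed<<<1, n>>>(dev_in, dev_out);
        // The host reads the result back from global memory.
        err = cudaMemcpy(host_out, dev_out, bytes, cudaMemcpyDeviceToHost);
    }
    cudaFree(dev_in);
    cudaFree(dev_out);
    return err == cudaSuccess ? 0 : -1;
}
```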

Execution Model
 • Threads are executed by thread processors
 • Thread blocks are executed by multiprocessors
 • A kernel is launched as a grid of thread blocks
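
Launching a kernel as a grid of thread blocks might look like the sketch below. The names matrix_add and launch_matrix_add are hypothetical, and the 16 x 16 block size is an assumed, commonly used choice rather than something the slides prescribe. The grid is sized by rounding up, so a bounds check inside the kernel guards the threads that fall outside the matrix.

```cuda
#include <cuda_runtime.h>

// Adds two width x height matrices stored row by row.
__global__ void matrix_add(const float *a, const float *b, float *c,
                           int width, int height)
{
    // Global position = block position within the grid * block size + thread position.
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;

    if (col < width && row < height) {  // the grid may overhang the matrix edge
        int i = row * width + col;
        c[i] = a[i] + b[i];
    }
}

// Hypothetical launcher; a, b and c must be device pointers.
cudaError_t launch_matrix_add(const float *a, const float *b, float *c,
                              int width, int height)
{
    dim3 block(16, 16);                          // 256 threads per block
    dim3 grid((width + block.x - 1) / block.x,   // enough blocks to cover every column
              (height + block.y - 1) / block.y); // and every row
    matrix_add<<<grid, block>>>(a, b, c, width, height);
    return cudaGetLastError();                   // reports launch-configuration errors
}
```

For a 20 x 10 matrix this produces a grid of 2 x 1 blocks; the hardware is then free to place those blocks on whichever multiprocessors are available.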

CUDA software development
 [Figure: CUDA software development]

Compiling CUDA code
 • The CUDA compiler, nvcc, compiles .cu files, dividing the code into
    device code (NVIDIA assembly, PTX) and host C++ code.

Example
#include <assert.h>
#include <stdlib.h>
#include <cuda_runtime.h>

int main(void) {
    float *a_h, *b_h;          // host data
    float *a_d, *b_d;          // device data
    int N = 15, nBytes, i;

    nBytes = N * sizeof(float);
    a_h = (float *)malloc(nBytes);               // allocate host arrays
    b_h = (float *)malloc(nBytes);
    cudaMalloc((void **)&a_d, nBytes);           // allocate device arrays
    cudaMalloc((void **)&b_d, nBytes);

    for (i = 0; i < N; i++) a_h[i] = 100.f + i;  // initialize host data

    cudaMemcpy(a_d, a_h, nBytes, cudaMemcpyHostToDevice);    // host a_h -> device a_d
    cudaMemcpy(b_d, a_d, nBytes, cudaMemcpyDeviceToDevice);  // device a_d -> device b_d
    cudaMemcpy(b_h, b_d, nBytes, cudaMemcpyDeviceToHost);    // device b_d -> host b_h

    for (i = 0; i < N; i++) assert(a_h[i] == b_h[i]);        // the round trip must preserve the data

    free(a_h); free(b_h); cudaFree(a_d); cudaFree(b_d);
    return 0;
}

Applications
 • Finance
 • Numeric
 • Medical
 • Oil & Gas
 • Biophysics
 • Audio
 • Video
 • Imaging

Advantages
 • Provides shared memory
 • Cost effective
 • The gaming industry's demand for graphics cards has driven a great deal of
    research and investment into improving GPUs
 • Transparent scalability

Drawbacks
 • Despite having hundreds of "cores", a CUDA GPU is not as flexible as a CPU
 • Not as effective for personal computers

Future Scope
 • Implementation of CUDA on GPUs from other companies.
 • More and more streaming processors can be included.
 • CUDA support in a wide variety of programming languages.

Conclusion
 • CUDA brought significant innovations to the high-performance computing world.
 • CUDA simplified the development of general-purpose parallel applications.
 • These applications now have enough computational power to produce proper
    results in a short time.

Questions?

