INTRODUCTION TO OPENCL
Unai Lopez

Intelligent Systems Group

Department of Computer Architecture & Technology

University of the Basque Country
Outline
1)  Introduction

2)  Programming Basics


3)  “Hello World”


4)  Final remarks
OpenCL
•  Standard for the development of data parallel applications

•  Most commonly used for GPGPU applications:
 General-Purpose computing on Graphics Processing Units

•  A GPU comprises hundreds of compute cores




     [Figures: NVIDIA GTX 285 (240 compute cores); NVIDIA GT200b architecture]

•  Specialized for massively data parallel computation
OpenCL & GPGPU
•  GPGPU: use the GPU’s computing power to
 run massively parallel applications

•  Parallel applications achieve large speedups in Molecular
 Dynamics, Image Processing, Evolutionary Computation,…

•  All cases based on data parallelism:
 each thread processes a subset of the data

•  For example, a vector addition:

        A  +  B  =  C    (element-wise: thread i computes C[i] = A[i] + B[i])

   Thread ID    0   1   2   3   4   5   6   7   8   9   10   11
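
The "each thread processes a subset of the data" idea can be sketched in plain C. This is only a sequential stand-in, not OpenCL code: the names `add_subset`, `vector_add` and `num_threads` are made up for illustration, and the outer loop plays the role of threads that would run concurrently on a GPU. It assumes `num_threads` is at least 1.

```c
#include <stddef.h>

/* Work of one "thread": it handles indices tid, tid + num_threads, ...
   so the threads split the data between them without overlap. */
static void add_subset(const int *a, const int *b, int *c, size_t n,
                       size_t tid, size_t num_threads)
{
    for (size_t i = tid; i < n; i += num_threads)
        c[i] = a[i] + b[i];
}

/* Sequential emulation of the parallel launch: on a real device the
   num_threads calls below would execute at the same time. */
void vector_add(const int *a, const int *b, int *c, size_t n,
                size_t num_threads)
{
    for (size_t tid = 0; tid < num_threads; ++tid)
        add_subset(a, b, c, n, tid, num_threads);
}
```

With 12 elements and 12 threads, as in the figure above, each thread's subset is exactly one element; with fewer threads each one handles several.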
OpenCL
•  Furthermore, OpenCL provides portability:
  the same code can run on different architectures

•  For example:




Device               Cores   Clock
Intel Core i5 CPU    4       2.5 GHz
STI Cell B/E         8       3.2 GHz
Intel Xeon Phi       50      1 GHz
AMD HD 6950 GPU      1408    800 MHz
OpenCL
•  Provides the following abstraction:
 A compute device is composed of compute units




•  OpenCL platform: Host + Compute Devices

•  Each manufacturer provides an SDK:
   •  NVIDIA SDK for GPUs
   •  AMD APP for CPUs/GPUs
   •  Intel for CPUs
   •  IBM for PowerPC and Cell B/E
Programming Basics
•  Kernel: function that defines the behavior of each thread


•  For example, kernel for vector addition:
  __kernel void sumKernel (
  __global int* a, __global int* b, __global int* c)
  {
     int i = get_global_id(0);
     c[i] = a[i] + b[i];
  }


•  Written in OpenCL C: based on C99, plus a set of built-in functions, e.g.:
   •  get_global_id: returns the global index of the calling thread (work-item)
   •  barrier: synchronizes the threads of a work-group
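
To see what the kernel does for each thread, here is a plain-C emulation that needs no OpenCL runtime. It is a sketch under assumptions: `get_global_id` below is a hand-written stand-in for the OpenCL C built-in (it just returns a counter set by the launcher), the `__kernel` and `__global` qualifiers are dropped, and `launch_sumKernel` is a hypothetical helper that runs the threads one after another.

```c
#include <stddef.h>

/* Stand-in for the OpenCL C built-in: returns the index of the thread
   that is currently being emulated. */
static size_t current_id;
static size_t get_global_id(unsigned dim) { (void)dim; return current_id; }

/* Same body as the kernel above, written as an ordinary C function. */
static void sumKernel(const int *a, const int *b, int *c)
{
    size_t i = get_global_id(0);
    c[i] = a[i] + b[i];
}

/* Emulated launch: runs the kernel once per global index, in order.
   A real device would run these instances in parallel. */
void launch_sumKernel(const int *a, const int *b, int *c, size_t global_size)
{
    for (current_id = 0; current_id < global_size; ++current_id)
        sumKernel(a, b, c);
}
```

The point of the sketch is that the kernel contains no loop over the data: the loop lives in the launch, and each kernel instance only looks up its own index.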
Programming Basics
•  An OpenCL application consists of:
   •  Kernel file (OpenCL C): problem computation
   •  Host code (C): kernel management


•  Basic host application flow:
   1.  Load and compile the kernel
   2.  Copy data from host to device (e.g. from CPU to GPU)
   3.  Execute the kernel
   4.  Copy results from device to host
   5.  Release kernels and data from device memory
•  Commands are executed through a command queue on each device
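
The five steps of the host flow can be walked through with a plain-C emulation. This is not real OpenCL host code: "device memory" is ordinary heap memory, the kernel is a C function pointer, and `run_vector_add`, `sum_kernel` and `kernel_fn` are illustrative names. The comments name the OpenCL call that would typically take each step's place.

```c
#include <stdlib.h>
#include <string.h>

typedef void (*kernel_fn)(const int *a, const int *b, int *c, size_t id);

static void sum_kernel(const int *a, const int *b, int *c, size_t id)
{
    c[id] = a[id] + b[id];
}

/* Returns 0 on success, -1 if the emulated device memory cannot be allocated. */
int run_vector_add(const int *host_a, const int *host_b, int *host_c, size_t n)
{
    if (n == 0)
        return 0;                       /* nothing to compute */
    size_t bytes = n * sizeof(int);

    /* 1. Load and compile the kernel
          (OpenCL: clCreateProgramWithSource, clBuildProgram). */
    kernel_fn kernel = sum_kernel;

    /* Emulated device buffers (OpenCL: cl_mem objects). */
    int *dev_a = malloc(bytes);
    int *dev_b = malloc(bytes);
    int *dev_c = malloc(bytes);
    if (!dev_a || !dev_b || !dev_c) {
        free(dev_a); free(dev_b); free(dev_c);
        return -1;
    }

    /* 2. Copy data from host to device (OpenCL: clEnqueueWriteBuffer). */
    memcpy(dev_a, host_a, bytes);
    memcpy(dev_b, host_b, bytes);

    /* 3. Execute the kernel, one instance per element
          (OpenCL: clEnqueueNDRangeKernel). */
    for (size_t id = 0; id < n; ++id)
        kernel(dev_a, dev_b, dev_c, id);

    /* 4. Copy results from device to host (OpenCL: clEnqueueReadBuffer). */
    memcpy(host_c, dev_c, bytes);

    /* 5. Release device resources (OpenCL: clReleaseMemObject and friends). */
    free(dev_a); free(dev_b); free(dev_c);
    return 0;
}
```

In a real application steps 2 to 4 are not direct calls but commands placed on the device's command queue.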
Programming Basics
•  Host code: programmed using OpenCL API

•  API calls, such as:
   •  clCreateProgramWithSource: Load kernel source from char* strings
   •  clBuildProgram: Compile kernel
   •  clSetKernelArg: Set one kernel argument
   •  clEnqueueWriteBuffer / clEnqueueReadBuffer: Copy data to / from the device
   •  clEnqueueNDRangeKernel: Launch kernel on the device


•  API types, such as:
   •  cl_mem: Handle to a device memory object
   •  cl_program: Program object (holds the kernel code); a single kernel is a cl_kernel
   •  cl_float / cl_int / cl_uint: Fixed-size equivalents of C types
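
Since the kernel lives in its own file and clCreateProgramWithSource takes the source as char* strings, host programs commonly start by reading that file into memory. A minimal sketch of such a helper in standard C follows; `read_kernel_source` is a hypothetical name, not part of the OpenCL API, and the caller is assumed to free the returned buffer.

```c
#include <stdio.h>
#include <stdlib.h>

/* Reads a whole text file into a NUL-terminated heap buffer.
   Returns NULL on failure; on success stores the byte count in *length
   (if length is not NULL). The caller owns the buffer. */
char *read_kernel_source(const char *path, size_t *length)
{
    FILE *f = fopen(path, "rb");
    if (!f)
        return NULL;

    char *text = NULL;
    if (fseek(f, 0, SEEK_END) == 0) {
        long size = ftell(f);
        if (size >= 0 && fseek(f, 0, SEEK_SET) == 0) {
            text = malloc((size_t)size + 1);
            if (text) {
                size_t got = fread(text, 1, (size_t)size, f);
                text[got] = '\0';       /* terminate so it is usable as a string */
                if (length)
                    *length = got;
            }
        }
    }
    fclose(f);
    return text;
}
```

The returned pointer and length are the kind of values a host program would then hand to clCreateProgramWithSource before calling clBuildProgram.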
Hello World

•  Implementation of simple vector addition in OpenCL


•  Uses the default platform and device found in the system


•  Adjust the Makefile paths for your system


•  Run: vectorAdd <size_of_vector>
Final Remarks
•  OpenCL provides code portability, not performance portability:
 the same kernel usually needs tuning for each device

•  Alternative to NVIDIA CUDA,
 the programming model for NVIDIA GPU cards

•  Combinable with other parallel programming models:
   •  OpenMP for SMPs / MPI for MPPs


•  Large ecosystem around GPGPU programming, e.g. OpenACC:
 Develop GPGPU applications using compiler directives
           #pragma acc kernels
           for(i = 0; i< N; i++)
              c[i] = b[i] + a[i];
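
For context, the directive can be placed in a complete C function as sketched below. The function name `vector_add_acc` and the `copyin`/`copyout` data clauses are additions for illustration, not taken from the slide. A compiler without OpenACC support is expected to ignore the pragma (possibly with a warning) and run the loop sequentially, so the result is the same either way.

```c
/* Vector addition offloaded with an OpenACC directive.
   restrict tells the compiler the three arrays do not overlap. */
void vector_add_acc(const int *restrict a, const int *restrict b,
                    int *restrict c, int n)
{
    /* Ask the compiler to turn the loop into device kernels; the data
       clauses copy a and b to the device and c back to the host. */
    #pragma acc kernels copyin(a[0:n], b[0:n]) copyout(c[0:n])
    for (int i = 0; i < n; i++)
        c[i] = b[i] + a[i];
}
```

Compared with the OpenCL version, there is no separate kernel file and no explicit host flow: the directive leaves both to the compiler.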
More about OpenCL
•  Before starting to develop, get familiar with:
   •  Contexts, command queues, events,…
•  Documentation
   •  Khronos Group: maintainers of OpenCL
   •  OpenCL best-practices guides in the CUDA and AMD SDKs
   •  Programming Massively Parallel Processors (book, CUDA-oriented)


•  OpenCL sample applications:
   •  Most SDKs include example OpenCL applications
   •  Rodinia: http://lava.cs.virginia.edu/wiki/rodinia
   •  Parboil: http://impact.crhc.illinois.edu/parboil.aspx
Unai Lopez – ulopez009@ehu.es

