Research in GPU Computing




                     Cao Thanh Tung
Outline

 ●   Introduction to GPU Computing
              –   Past:      Graphics Processing and GPGPU
              –   Present:   CUDA and OpenCL
              –   A bit on the architecture
 ●   Why GPU?
 ●   GPU vs. Multi-core and Distributed
 ●   Open problems.
 ●   Where does this go?

19-Jan-2011                            Computing Students talk   2
Introduction to GPU Computing

 ●   Who has access to 1,000 processors?




                                                   YOU




Introduction to GPU Computing

 ●   In the past
              –   GPU = Graphics Processing Unit




Introduction to GPU Computing

 ●   In the past
              –   GPGPU = General Purpose computation using GPUs




Introduction to GPU Computing

 ●   Now
               –   GPU = General (no longer just Graphics) Processing Unit

              __device__ float3 collideCell(int3 gridPos, uint index...
              {
                  uint gridHash = calcGridHash(gridPos);
                  ...
                  for(uint j=startIndex; j<endIndex; j++) {
                      if (j != index) {
                          ...
                          force += collideSpheres(...);
                      }
                  }
                  return force;
              }

Introduction to GPU Computing

 ●   Now
               –   We have CUDA (NVIDIA, proprietary) and OpenCL (open standard)


Introduction to GPU Computing

 ●   A (just a little) bit on the
     architecture of the latest
     NVIDIA GPU (Fermi)
       –   Very simple core (even simpler
             than the Intel Atom)
       –   Little cache




Why GPU?

 ●   Performance




Why GPU?

 ●   People have used it, and it works.
              –   Bio-Informatics
              –   Finance
              –   Fluid Dynamics
              –   Data-mining
              –   Computer Vision
              –   Medical Imaging
              –   Numerical Analytics



Why GPU?

 ●   A new, promising area
              –   Fast growing
              –   Ubiquitous
              –   New paradigm → new problems, new challenges




GPU vs. Multi-core

 ●   A lot more threads of computation are required:
              –   A GPU has many more cores than a multi-core CPU.
              –   A GPU core is nowhere near as powerful as a CPU core.




GPU vs. Multi-core

 ●   Challenges:
              –   Not all problems can easily be broken into many small
                    sub-problems to be solved in parallel.
              –   Race conditions are much more serious.
              –   Atomic operations are still feasible, but locking is a performance
                    killer. Lock-free algorithms are strongly preferred.
              –   Memory access is a bottleneck (memory is not that parallel).
              –   Debugging is a nightmare.
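
The race-condition and atomic-operation points above can be sketched with a small histogram. This is an illustrative example, not code from the talk: the names `histogramKernel` and `histogramOnGpu` are hypothetical, and the layout (one thread per input byte, 256 bins) is an assumption chosen to keep it short.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical kernel: every thread classifies one input byte into one of
// 256 bins. Many threads may hit the same bin at the same time, so a plain
// "bins[v]++" would be a race. atomicAdd makes the update indivisible
// without taking a lock.
__global__ void histogramKernel(const unsigned char* data, int n,
                                unsigned int* bins)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        atomicAdd(&bins[data[i]], 1u);
    }
}

// Hypothetical host wrapper: copies the input to the GPU, runs the kernel
// and returns the 256 bin counts. Returns an empty vector on any CUDA error.
std::vector<unsigned int> histogramOnGpu(const std::vector<unsigned char>& input)
{
    const int n = static_cast<int>(input.size());
    const size_t binBytes = 256 * sizeof(unsigned int);
    std::vector<unsigned int> bins(256, 0u);
    if (n == 0) return bins;

    unsigned char* dData = nullptr;
    unsigned int* dBins = nullptr;
    bool ok = cudaMalloc(&dData, n) == cudaSuccess &&
              cudaMalloc(&dBins, binBytes) == cudaSuccess &&
              cudaMemcpy(dData, input.data(), n,
                         cudaMemcpyHostToDevice) == cudaSuccess &&
              cudaMemset(dBins, 0, binBytes) == cudaSuccess;
    if (ok) {
        const int threads = 256;
        const int blocks = (n + threads - 1) / threads;
        histogramKernel<<<blocks, threads>>>(dData, n, dBins);
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(bins.data(), dBins, binBytes,
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(dData);
    cudaFree(dBins);
    if (!ok) bins.clear();
    return bins;
}
```

Calling `histogramOnGpu` on a vector of bytes returns one count per byte value. Atomics on a few hot bins still serialize, which is one face of the memory bottleneck listed above; a common refinement is to accumulate per-block histograms in shared memory first.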




GPU vs. Distributed

 ●   A GPU allows much cheaper communication between
     different threads.
 ●   GPU memory is still limited compared to a distributed
     system.
 ●   GPU cores are not completely independent processors
              –   Fine-grained parallelism is needed.
              –   Reaching the scalability of a distributed system is difficult.
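
As a rough illustration of cheap inter-thread communication and fine-grained parallelism, the sketch below sums an array with one thread per element. Threads in a block exchange partial sums through shared memory and a barrier instead of messages. The names `blockSumKernel`, `sumOnGpu` and `kThreads` are hypothetical and not from the talk; the final per-block sums are added on the CPU only to keep the example short.

```cuda
#include <cuda_runtime.h>
#include <vector>

constexpr int kThreads = 256;  // threads per block; must be a power of two

// Hypothetical kernel: each block sums up to kThreads inputs. Threads share
// partial sums through on-chip shared memory and synchronise with
// __syncthreads().
__global__ void blockSumKernel(const float* in, int n, float* blockSums)
{
    __shared__ float partial[kThreads];
    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + tid;
    partial[tid] = (i < n) ? in[i] : 0.0f;
    __syncthreads();

    // Tree reduction: halve the number of active threads at each step.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (tid < stride) partial[tid] += partial[tid + stride];
        __syncthreads();
    }
    if (tid == 0) blockSums[blockIdx.x] = partial[0];
}

// Hypothetical host wrapper: writes the sum of `input` to *result.
// Returns false on any CUDA error.
bool sumOnGpu(const std::vector<float>& input, float* result)
{
    const int n = static_cast<int>(input.size());
    *result = 0.0f;
    if (n == 0) return true;
    const int blocks = (n + kThreads - 1) / kThreads;
    std::vector<float> blockSums(blocks, 0.0f);

    float* dIn = nullptr;
    float* dOut = nullptr;
    bool ok = cudaMalloc(&dIn, n * sizeof(float)) == cudaSuccess &&
              cudaMalloc(&dOut, blocks * sizeof(float)) == cudaSuccess &&
              cudaMemcpy(dIn, input.data(), n * sizeof(float),
                         cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        blockSumKernel<<<blocks, kThreads>>>(dIn, n, dOut);
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(blockSums.data(), dOut, blocks * sizeof(float),
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(dIn);
    cudaFree(dOut);
    if (!ok) return false;
    for (float s : blockSums) *result += s;
    return true;
}
```

The barrier only covers threads of one block, which is one reason GPU cores cannot be treated as fully independent processors.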




Open problems

 ●   Data-structures
 ●   Algorithms
 ●   Tools
 ●   Theory




Open problems

 ●   Data-structures
              –   Requirement: must handle a very high level of concurrent access.
              –   Common data structures such as dynamic arrays, priority queues and
                    hash tables are not well suited to the GPU.
              –   Some existing work: k-d tree, quadtree, read-only hash table...
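
To show why a read-only hash table is among the easier structures to put on a GPU, here is a minimal sketch, assuming an open-addressing table with linear probing that is built once on the CPU and only queried on the GPU. It is not the design of any of the existing work mentioned above. All names (`kEmpty`, `slotFor`, `lookupKernel`, `lookupOnGpu`) are hypothetical.

```cuda
#include <cuda_runtime.h>
#include <vector>

constexpr unsigned int kEmpty = 0xFFFFFFFFu;  // sentinel: slot holds no key

// Same hash on host and device. capacity must be a power of two.
__host__ __device__ inline unsigned int slotFor(unsigned int key,
                                                unsigned int capacity)
{
    return (key * 2654435761u) & (capacity - 1u);
}

// Each thread looks up one query by linear probing. The table is never
// written while the kernel runs, so no atomics or locks are needed.
__global__ void lookupKernel(const unsigned int* keys, const int* values,
                             unsigned int capacity,
                             const unsigned int* queries, int n, int* results)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    unsigned int q = queries[i];
    unsigned int slot = slotFor(q, capacity);
    int found = -1;  // -1 means "not present"
    for (unsigned int probes = 0; probes < capacity; ++probes) {
        unsigned int k = keys[slot];
        if (k == q) { found = values[slot]; break; }
        if (k == kEmpty) break;
        slot = (slot + 1u) & (capacity - 1u);
    }
    results[i] = found;
}

// Builds the table on the CPU, then answers all queries on the GPU.
// Keys must differ from kEmpty. Returns false on bad input or a CUDA error.
bool lookupOnGpu(const std::vector<unsigned int>& keysIn,
                 const std::vector<int>& valuesIn,
                 const std::vector<unsigned int>& queries,
                 std::vector<int>* results)
{
    const int n = static_cast<int>(queries.size());
    results->assign(n, -1);
    if (keysIn.size() != valuesIn.size()) return false;
    if (n == 0) return true;

    unsigned int capacity = 16;
    while (capacity < 2 * keysIn.size()) capacity *= 2;  // load factor <= 0.5
    std::vector<unsigned int> keys(capacity, kEmpty);
    std::vector<int> values(capacity, 0);
    for (size_t j = 0; j < keysIn.size(); ++j) {
        unsigned int slot = slotFor(keysIn[j], capacity);
        while (keys[slot] != kEmpty && keys[slot] != keysIn[j])
            slot = (slot + 1u) & (capacity - 1u);
        keys[slot] = keysIn[j];
        values[slot] = valuesIn[j];
    }

    unsigned int* dKeys = nullptr;
    unsigned int* dQueries = nullptr;
    int* dValues = nullptr;
    int* dResults = nullptr;
    const size_t keyBytes = capacity * sizeof(unsigned int);
    const size_t valBytes = capacity * sizeof(int);
    const size_t qBytes = n * sizeof(unsigned int);
    const size_t rBytes = n * sizeof(int);
    bool ok = cudaMalloc(&dKeys, keyBytes) == cudaSuccess &&
              cudaMalloc(&dValues, valBytes) == cudaSuccess &&
              cudaMalloc(&dQueries, qBytes) == cudaSuccess &&
              cudaMalloc(&dResults, rBytes) == cudaSuccess &&
              cudaMemcpy(dKeys, keys.data(), keyBytes, cudaMemcpyHostToDevice) == cudaSuccess &&
              cudaMemcpy(dValues, values.data(), valBytes, cudaMemcpyHostToDevice) == cudaSuccess &&
              cudaMemcpy(dQueries, queries.data(), qBytes, cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        lookupKernel<<<(n + 255) / 256, 256>>>(dKeys, dValues, capacity,
                                               dQueries, n, dResults);
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(results->data(), dResults, rBytes,
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(dKeys);
    cudaFree(dValues);
    cudaFree(dQueries);
    cudaFree(dResults);
    return ok;
}
```

Supporting concurrent inserts or deletes would need atomic compare-and-swap on the key slots and a strategy for resizing, which is where the difficulty described above begins.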




Open problems

 ●   Algorithms
              –   Most sequential algorithms need serious re-design to make good
                   use of such a huge number of cores.
                        ●   Our computational geometry research: use discrete-space
                             computation to approximate the continuous-space
                             result.
              –   Traditional parallel algorithms may or may not work.
                        ●   Usual assumption: infinite number of processors
                        ●   No serious study on this so far!
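
One way to picture "discrete space approximating continuous space" is a Voronoi diagram computed on a pixel grid: each grid cell records its nearest site, and a finer grid gives a closer approximation of the continuous diagram. The sketch below is a brute-force illustration under that assumption and is not the speaker's algorithm. The names `discreteVoronoiKernel` and `discreteVoronoiOnGpu` are hypothetical.

```cuda
#include <cuda_runtime.h>
#include <cfloat>
#include <vector>

// One thread per grid cell: scan all sites and keep the index of the one
// closest to the cell centre (squared Euclidean distance).
__global__ void discreteVoronoiKernel(const float2* sites, int numSites,
                                      int width, int height, int* owner)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    float cx = x + 0.5f;  // cell centre
    float cy = y + 0.5f;
    int best = -1;
    float bestDist = FLT_MAX;
    for (int s = 0; s < numSites; ++s) {
        float dx = sites[s].x - cx;
        float dy = sites[s].y - cy;
        float d = dx * dx + dy * dy;
        if (d < bestDist) { bestDist = d; best = s; }
    }
    owner[y * width + x] = best;
}

// Hypothetical host wrapper: returns, for each cell in row-major order, the
// index of its nearest site. Returns an empty vector on bad input or error.
std::vector<int> discreteVoronoiOnGpu(const std::vector<float2>& sites,
                                      int width, int height)
{
    std::vector<int> owner;
    if (sites.empty() || width <= 0 || height <= 0) return owner;
    const int numSites = static_cast<int>(sites.size());
    const size_t cells = static_cast<size_t>(width) * height;

    float2* dSites = nullptr;
    int* dOwner = nullptr;
    bool ok = cudaMalloc(&dSites, numSites * sizeof(float2)) == cudaSuccess &&
              cudaMalloc(&dOwner, cells * sizeof(int)) == cudaSuccess &&
              cudaMemcpy(dSites, sites.data(), numSites * sizeof(float2),
                         cudaMemcpyHostToDevice) == cudaSuccess;
    if (ok) {
        dim3 block(16, 16);
        dim3 grid((width + block.x - 1) / block.x,
                  (height + block.y - 1) / block.y);
        discreteVoronoiKernel<<<grid, block>>>(dSites, numSites,
                                               width, height, dOwner);
        owner.resize(cells);
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(owner.data(), dOwner, cells * sizeof(int),
                        cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(dSites);
    cudaFree(dOwner);
    if (!ok) owner.clear();
    return owner;
}
```

The brute-force loop does work proportional to cells times sites, so it only demonstrates the mapping of one thread per cell; it says nothing about how an efficient algorithm would be designed.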



Open problems

 ●   Tools
              –   Programming language: Better language or model to express
                    parallel algorithms?
              –   Compiler: Optimize GPU code? Auto-parallelization?
                        ●   There is some work on translating OpenMP to CUDA.
              –   Debugging tools? Maybe a whole new “art of debugging” is needed.


              –   Software engineering is currently far behind hardware
                    development.


Open problems

 ●   Theory
              –   Some traditional approaches:
                        ●   PRAM (CRCW, EREW): too general.
                        ●   SIMD: too restricted.
              –   Big-O analysis may not be good enough.
                        ●   Time complexity is relevant, but work complexity is more
                              important.
                        ●   Most GPU computing papers report only actual running
                             time.
              –   Performance modeling for the GPU, anyone?
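
A back-of-the-envelope comparison, not taken from the talk, may help show why work can matter more than parallel time. It assumes Brent's bound for p processors, where W is the total work and D the number of parallel steps, and compares two well-known prefix-sum algorithms: the step-efficient Hillis-Steele scan and the work-efficient Blelloch scan. The values n = 2^20 and p = 512 are arbitrary illustrative choices, and constants are rounded.

```latex
\begin{align*}
T_p &\le \frac{W}{p} + D \\[4pt]
\text{Hillis--Steele:}\quad W &\approx n\log_2 n,\quad D = \log_2 n
  &&\Rightarrow\; T_p \lesssim \frac{20 \cdot 2^{20}}{512} + 20 = 40\,980 \\[4pt]
\text{Blelloch:}\quad W &\approx 2n,\quad D = 2\log_2 n
  &&\Rightarrow\; T_p \lesssim \frac{2 \cdot 2^{20}}{512} + 40 = 4\,136
\end{align*}
```

With far fewer processors than elements, the algorithm with twice the number of steps but much less work comes out roughly ten times better under this estimate.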

Where does this go?

 ●   Intel/AMD already have 6-core, 12-thread processors
     (maybe more).
 ●   SeaMicro has a server with 512 Atom dual-core processors.
 ●   AMD Fusion: CPU + GPU.


 ●   The GPU may not stay forever, but massively multithreaded
     computing is definitely the future.


Where to start?

 ●   Check your PC.
              –   If it is not yet old enough to go to primary school, there's
                      a high chance it has a GPU.
 ●   Go to the NVIDIA or ATI website, download a development
     toolkit, and you're ready to go.
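
Once the NVIDIA toolkit is installed, a first program could simply ask the CUDA runtime which devices it can see. The sketch below assumes the CUDA route (OpenCL has its own query calls); the function names `countCudaDevices` and `printCudaDevices` are hypothetical.

```cuda
#include <cuda_runtime.h>
#include <cstdio>

// Returns the number of CUDA-capable devices, or 0 if the runtime reports
// an error (for example, no driver or no supported GPU).
int countCudaDevices()
{
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess) return 0;
    return count;
}

// Prints name, compute capability, multiprocessor count and global memory
// size for every device the runtime can see.
void printCudaDevices()
{
    int count = countCudaDevices();
    std::printf("CUDA devices found: %d\n", count);
    for (int d = 0; d < count; ++d) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, d) != cudaSuccess) continue;
        std::printf("  [%d] %s, compute capability %d.%d, "
                    "%d multiprocessors, %zu MiB\n",
                    d, prop.name, prop.major, prop.minor,
                    prop.multiProcessorCount,
                    prop.totalGlobalMem / (1024 * 1024));
    }
}
```

Calling `printCudaDevices()` from `main` and compiling the file with `nvcc` is enough to confirm that the toolkit and driver are working.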




THANK YOU

 ●   Any questions? Just ask.
 ●   Any suggestions? What are you waiting for?
 ●   Any problem or solution to discuss? Let's have a private talk
     somewhere (j/k)




BIOCHEMISTRY-CARBOHYDRATE METABOLISM CHAPTER 2.pptx
 
Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4
 

CSTalks - GPGPU - 19 Jan

  • 1. Research in GPU Computing Cao Thanh Tung
  • 2. Outline
      ● Introduction to GPU Computing
          – Past: Graphics Processing and GPGPU
          – Present: CUDA and OpenCL
          – A bit on the architecture
      ● Why GPU?
      ● GPU vs. Multi-core and Distributed
      ● Open problems
      ● Where does this go?
      19-Jan-2011 Computing Students talk 2
  • 3. Introduction to GPU Computing
      ● Who has access to 1,000 processors?
  • 5. Introduction to GPU Computing
      ● Who has access to 1,000 processors? YOU
  • 6. Introduction to GPU Computing
      ● In the past
          – GPU = Graphics Processing Unit
  • 11. Introduction to GPU Computing
      ● In the past
          – GPGPU = General-Purpose computation using GPUs
  • 12. Introduction to GPU Computing
      ● Now
          – GPU = General Processing Unit (previously Graphics Processing Unit)
        __device__ float3 collideCell(int3 gridPos, uint index...
        {
            uint gridHash = calcGridHash(gridPos);
            ...
            for(uint j=startIndex; j<endIndex; j++) {
                if (j != index) {
                    ...
                    force += collideSpheres(...);
                }
            }
            return force;
        }
  • 13. Introduction to GPU Computing
      ● Now
          – We have CUDA (NVIDIA, proprietary) and OpenCL (open standard)
        __device__ float3 collideCell(int3 gridPos, uint index...
        {
            uint gridHash = calcGridHash(gridPos);
            ...
            for(uint j=startIndex; j<endIndex; j++) {
                if (j != index) {
                    ...
                    force += collideSpheres(...);
                }
            }
            return force;
        }
  • 14. Introduction to GPU Computing
      ● A (just a little) bit on the architecture of the latest NVIDIA GPU (Fermi)
          – Very simple cores (even simpler than the Intel Atom)
          – Little cache
  • 15. Why GPU?
  • 16. Why GPU?
      ● Performance
  • 17. Why GPU?
      ● People have used it, and it works.
          – Bio-Informatics
          – Finance
          – Fluid Dynamics
          – Data-mining
          – Computer Vision
          – Medical Imaging
          – Numerical Analytics
  • 18. Why GPU?
      ● A new, promising area
          – Fast growing
          – Ubiquitous
          – New paradigm → new problems, new challenges
  • 19. GPU vs. Multi-core
      ● A lot more threads of computation are required:
          – The GPU has many more cores than a multi-core CPU.
          – A GPU core is nowhere near as powerful as a CPU core.
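
As a rough illustration of what "a lot more threads" looks like in practice, the sketch below launches thousands of lightweight CUDA threads that each update a few array elements. The kernel name `scaleAdd` and the host helper `scaleAddOnGpu` are hypothetical names chosen for this example, not something from the talk, and error handling is kept to a minimum.

```cuda
#include <algorithm>
#include <vector>
#include <cuda_runtime.h>

// Each thread starts at its global index and strides over the array, so a
// fixed-size grid of simple threads can cover any n (a grid-stride loop).
__global__ void scaleAdd(int n, float a, const float* x, float* y)
{
    int stride = blockDim.x * gridDim.x;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride)
        y[i] = a * x[i] + y[i];
}

// Hypothetical host helper: computes y = a * x + y on the GPU.
// Returns false if the sizes differ or a CUDA call fails.
bool scaleAddOnGpu(float a, const std::vector<float>& x, std::vector<float>& y)
{
    if (x.size() != y.size()) return false;
    int n = static_cast<int>(x.size());
    if (n == 0) return true;

    size_t bytes = n * sizeof(float);
    float* dx = nullptr;
    float* dy = nullptr;
    if (cudaMalloc(&dx, bytes) != cudaSuccess) return false;
    if (cudaMalloc(&dy, bytes) != cudaSuccess) { cudaFree(dx); return false; }

    cudaMemcpy(dx, x.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dy, y.data(), bytes, cudaMemcpyHostToDevice);

    // 256 threads per block, at most 1024 blocks (assumed values for the sketch).
    int threads = 256;
    int blocks = std::min((n + threads - 1) / threads, 1024);
    scaleAdd<<<blocks, threads>>>(n, a, dx, dy);

    // The copy back waits for the kernel to finish and reports any failure.
    bool ok = cudaMemcpy(y.data(), dy, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    cudaFree(dx);
    cudaFree(dy);
    return ok;
}
```

Calling `scaleAddOnGpu(2.0f, x, y)` from a host `main` runs up to 262,144 threads at once, far more than a multi-core CPU would schedule, which is the point the slide makes.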
  • 20. GPU vs. Multi-core
      ● Challenges:
          – Not all problems can easily be broken into many small sub-problems to be solved in parallel.
          – Race conditions are much more serious.
          – Atomic operations are still doable, but locking is a performance killer. Lock-free algorithms are much preferable.
          – Memory access bottleneck (memory is not that parallel)
          – Debugging is a nightmare.
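
The point about race conditions and atomic operations can be sketched with a byte histogram: many threads may hit the same bin at once, so the increment uses `atomicAdd` instead of a lock. The names `histogram` and `histogramOnGpu` and the 64 x 256 launch size are assumptions made for this example, and error checks are omitted for brevity.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Counts how often each byte value occurs. A plain `bins[data[i]]++` would be
// a race condition when two threads see the same value; atomicAdd makes the
// read-modify-write indivisible without any locking.
__global__ void histogram(const unsigned char* data, int n, unsigned int* bins)
{
    int stride = blockDim.x * gridDim.x;
    for (int i = blockIdx.x * blockDim.x + threadIdx.x; i < n; i += stride)
        atomicAdd(&bins[data[i]], 1u);
}

// Hypothetical host helper: returns the 256-bin histogram of `data`.
std::vector<unsigned int> histogramOnGpu(const std::vector<unsigned char>& data)
{
    std::vector<unsigned int> bins(256, 0);
    int n = static_cast<int>(data.size());
    if (n == 0) return bins;

    unsigned char* dData = nullptr;
    unsigned int* dBins = nullptr;
    size_t binBytes = bins.size() * sizeof(unsigned int);
    cudaMalloc(&dData, n);
    cudaMalloc(&dBins, binBytes);
    cudaMemcpy(dData, data.data(), n, cudaMemcpyHostToDevice);
    cudaMemset(dBins, 0, binBytes);            // all bins start at zero

    histogram<<<64, 256>>>(dData, n, dBins);   // assumed launch size

    cudaMemcpy(bins.data(), dBins, binBytes, cudaMemcpyDeviceToHost);
    cudaFree(dData);
    cudaFree(dBins);
    return bins;
}
```

Atomics on a heavily contended bin still serialize, so this is correct but not necessarily fast; it only shows the lock-free style the slide recommends.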
  • 21. GPU vs. Distributed
      ● GPU allows much cheaper communication between different threads.
      ● GPU memory is still limited compared to a distributed system.
      ● GPU cores are not completely independent processors
          – Need fine-grain parallelism
          – Reaching the scalability of a distributed system is difficult.
  • 22. Open problems
      ● Data-structures
      ● Algorithms
      ● Tools
      ● Theory
  • 23. Open problems
      ● Data-structures
          – Requirement: able to handle a very high level of concurrent access.
          – Common data-structures like dynamic arrays, priority queues or hash tables are not very suitable for the GPU.
          – Some existing work: kD-tree, quad-tree, read-only hash table...
  • 24. Open problems
      ● Algorithms
          – Most sequential algorithms need serious re-design to make good use of such a huge number of cores.
              ● Our computational geometry research: use discrete space computation to approximate the continuous space result.
          – Traditional parallel algorithms may or may not work.
              ● Usual assumption: infinite number of processors
              ● No serious study on this so far!
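
To give a feel for "discrete space computation", here is a brute-force sketch that approximates a Voronoi diagram on a pixel grid: every pixel gets one thread, which records the index of its nearest site. This is only an illustration of the general idea and is not the speaker's algorithm; the names `discreteVoronoi` and `voronoiOnGpu` are made up for the example, and error checks are omitted.

```cuda
#include <vector>
#include <cuda_runtime.h>

// One thread per pixel: scan all sites and keep the closest one.
// Ties go to the site with the lower index.
__global__ void discreteVoronoi(const float2* sites, int numSites,
                                int width, int height, int* owner)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= width || y >= height) return;

    int best = -1;
    float bestDist = 3.402823466e+38f;   // largest finite float
    for (int s = 0; s < numSites; ++s) {
        float dx = sites[s].x - x;
        float dy = sites[s].y - y;
        float d = dx * dx + dy * dy;     // squared distance is enough to compare
        if (d < bestDist) { bestDist = d; best = s; }
    }
    owner[y * width + x] = best;
}

// Hypothetical host helper: returns, for each pixel in row-major order,
// the index of the nearest site (or -1 if there are no sites).
std::vector<int> voronoiOnGpu(const std::vector<float2>& sites, int width, int height)
{
    std::vector<int> owner(static_cast<size_t>(width) * height, -1);
    if (owner.empty() || sites.empty()) return owner;

    float2* dSites = nullptr;
    int* dOwner = nullptr;
    cudaMalloc(&dSites, sites.size() * sizeof(float2));
    cudaMalloc(&dOwner, owner.size() * sizeof(int));
    cudaMemcpy(dSites, sites.data(), sites.size() * sizeof(float2), cudaMemcpyHostToDevice);

    dim3 block(16, 16);
    dim3 grid((width + block.x - 1) / block.x, (height + block.y - 1) / block.y);
    discreteVoronoi<<<grid, block>>>(dSites, static_cast<int>(sites.size()),
                                     width, height, dOwner);

    cudaMemcpy(owner.data(), dOwner, owner.size() * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dSites);
    cudaFree(dOwner);
    return owner;
}
```

The grid resolution controls how closely the labelled pixels approximate the continuous diagram; the brute-force inner loop is the part a redesigned algorithm would try to avoid.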
  • 25. Open problems
      ● Tools
          – Programming language: Better language or model to express parallel algorithms?
          – Compiler: Optimize GPU code? Auto-parallelization?
              ● There's some work on OpenMP to CUDA.
          – Debugging tool? Maybe a whole new “art of debugging” is needed.
          – Software engineering is currently far behind the hardware development.
  • 26. Open problems
      ● Theory
          – Some traditional approaches:
              ● PRAM: CRCW, EREW. Too general.
              ● SIMD: Too restricted.
          – Big-O analysis may not be good enough.
              ● Time complexity is relevant, but work complexity is more important.
              ● Most GPU computing work only reports actual running time.
          – Performance Modeling for GPU, anyone?
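
One way to see why work complexity matters more than time complexity is Brent's bound, which relates the running time on p processors to the total work W(n) and the parallel time (depth) D(n). The comparison below uses two well-known parallel prefix-sum algorithms as a worked example; it is an illustration added here, not an analysis from the talk.

```latex
\[
T_p(n) \;\le\; \frac{W(n)}{p} + D(n)
\]
\[
\begin{array}{lll}
\text{Hillis--Steele scan:} & W(n) = \Theta(n \log n), & D(n) = \Theta(\log n) \\
\text{Blelloch scan:}       & W(n) = \Theta(n),        & D(n) = \Theta(\log n)
\end{array}
\]
\[
p \ll n: \qquad
T_p^{\text{HS}}(n) \approx \frac{n \log n}{p},
\qquad
T_p^{\text{B}}(n) \approx \frac{n}{p}
\]
```

Both algorithms have the same parallel time with unlimited processors, but on a real GPU, where p is large yet far smaller than n, the work term dominates and the work-efficient scan wins by roughly a factor of log n.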
  • 27. Where does this go?
      ● Intel/AMD already have 6-core, 12-thread processors (maybe more).
      ● SeaMicro has a server with 512 Atom dual-core processors.
      ● AMD Fusion: CPU + GPU.
      ● The GPU may not stay forever, but massively multithreaded computing is definitely the future.
  • 28. Where to start?
      ● Check your PC.
          – Unless it is old enough to go to primary school, there's a high chance it has a GPU.
      ● Go to the NVIDIA/ATI website, download a development toolkit, and you're ready to go.
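
Once a toolkit is installed, "check your PC" can also be done in code. The sketch below assumes the NVIDIA CUDA toolkit and uses the runtime calls `cudaGetDeviceCount` and `cudaGetDeviceProperties`; the helper names `countCudaDevices` and `printCudaDevices` are made up for this example.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Returns the number of CUDA-capable GPUs, or 0 if the query fails
// (for example, when no driver or no NVIDIA GPU is present).
int countCudaDevices()
{
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess) return 0;
    return count;
}

// Prints the name, multiprocessor count and memory size of each GPU found.
void printCudaDevices()
{
    int count = countCudaDevices();
    if (count == 0) {
        std::printf("No CUDA-capable GPU found.\n");
        return;
    }
    for (int d = 0; d < count; ++d) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, d) != cudaSuccess) continue;
        std::printf("Device %d: %s, %d multiprocessors, %zu MiB global memory\n",
                    d, prop.name, prop.multiProcessorCount,
                    prop.totalGlobalMem >> 20);
    }
}
```

Calling `printCudaDevices()` from `main` and compiling with `nvcc` is enough to confirm the setup works. An ATI/AMD card would need the vendor's OpenCL toolkit instead, which this sketch does not cover.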
  • 29. THANK YOU
      ● Any questions? Just ask.
      ● Any suggestions? What are you waiting for?
      ● Any problem or solution to discuss? Let's have a private talk somewhere (j/k)