For Dummies
From a Dummy


Ngobrol Ilmiah PPIS #1
16 December 2012
M. Alfian Amrizal
Tohoku University
• Introduction to Parallel Computing
• GPU as an Accelerator




Classical science
  Nature, Observation, Theory, Physical Experiments

Modern science
  Numerical Simulations (pictured: SX-9, Tohoku University)

[Image credits: blogs.sundaymercury.net, conserve-energy-future.com]
• Quantum chemistry
• Cosmology
• CFD
• Medicine
• Material design

[Image credits: autoevolution.com, scidacreview.org, physicsworld.com, albertkents.com, solid.me.tut.ac.jp]
• Supercomputer
  – The most powerful computers that can be built [2]
  – First computer, “ENIAC” (1946) ⇒ 350 multiplications/sec
  – Today's supercomputers: > 1,000,000,000 × ENIAC
  – Today's processor speed: only ~ 1,000,000 × ENIAC (?)
  – The remaining factor comes from “parallel computing”

[Image credits: cbc.ca, datacenterknowledge.com, allvoices.com]
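
The gap between the two ratios on this slide is the part that parallelism has to supply. A rough sketch of the arithmetic, taking the slide's order-of-magnitude figures at face value (the variable names are illustrative and not from the talk):

```python
# Order-of-magnitude figures quoted on the slide (estimates, not measurements).
ENIAC_MULTS_PER_SEC = 350                 # ENIAC, 1946
SUPERCOMPUTER_SPEEDUP = 1_000_000_000     # today's supercomputer vs ENIAC
SINGLE_PROCESSOR_SPEEDUP = 1_000_000      # today's single processor vs ENIAC

# Whatever a single processor cannot deliver has to come from
# running many processors at the same time.
parallelism_factor = SUPERCOMPUTER_SPEEDUP // SINGLE_PROCESSOR_SPEEDUP

# Multiplication rate implied for the supercomputer.
supercomputer_mults_per_sec = ENIAC_MULTS_PER_SEC * SUPERCOMPUTER_SPEEDUP

print(f"Factor that must come from parallelism: ~{parallelism_factor}x")
print(f"Implied supercomputer rate: ~{supercomputer_mults_per_sec:.1e} mult/sec")
```

In other words, under these estimates roughly three of the nine orders of magnitude come from using many processors at once rather than from faster individual processors.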
CPU: The brain of the computer; all data is processed here

Memory: The computer's scratch pad; programs are loaded and run here

GPU: For graphics processing; used as an accelerator in HPC

Storage: Holds data and program files
• The free lunch is over!!
  – Heat
  – Power restriction
  – Transistor size
  CPUs aren't getting any faster
• Multicomputers
  – Distributed memory parallel computer
• Multicore (figure: Core1, Core2)
  – Shared memory parallel computer (e.g. dual core, quad core, etc.)
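
The two models differ mainly in how workers exchange data. The sketch below only imitates the two programming styles with Python threads; it is an illustration under that assumption, not a real multicomputer, and the names `shared_worker`, `message_worker`, and `mailbox` are hypothetical. Python threads also do not give a true parallel speedup for this kind of work.

```python
import queue
import threading

data = list(range(1, 101))
halves = [data[:50], data[50:]]

# Shared memory style: both workers update the same variable,
# so every update must be protected by a lock.
shared = {"total": 0}
lock = threading.Lock()

def shared_worker(chunk):
    partial = sum(chunk)
    with lock:
        shared["total"] += partial

workers = [threading.Thread(target=shared_worker, args=(h,)) for h in halves]
for w in workers:
    w.start()
for w in workers:
    w.join()

# Distributed memory style: each worker keeps its data private and
# communicates only by sending messages (a queue stands in for the network).
mailbox = queue.Queue()

def message_worker(chunk):
    mailbox.put(sum(chunk))

workers = [threading.Thread(target=message_worker, args=(h,)) for h in halves]
for w in workers:
    w.start()
for w in workers:
    w.join()
received = mailbox.get() + mailbox.get()

print(shared["total"], received)
```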
• Trends in HPC system design
     –    More nodes/processors/cores
     –    Deep memory hierarchies
     –    Non-uniform interconnect network
     –    Accelerators ← today’s topic

[Figure: two system diagrams built from boxes labeled N, P, C, M, and A (presumably nodes, processors, cores, memories, and accelerators).
 Left: "Good old days!" One proc. / node, one core / proc., uniform network…
 Right: "Too complicated …" How can we fully exploit the potential?]
• Programmers need to learn both hardware and software

Figure: Markus Pueschel
• We need a powerful computer
• CPU speed is no longer increasing
• Go parallel:
  – Multicomputer
  – Multicore
• System complexity requires programmers to learn both hardware and software
• Introduction to Parallel Computing
• GPU as an Accelerator
• Power is the problem
  – System size is limited by the power budget
• Heterogeneous systems are promising
  – CPU + Accelerator (= GPU)
  – CPU and GPU have their own strengths and weaknesses
  – CPU: few cores, high frequency (~GHz)
  – GPU: ~1000 cores, lower frequency (hundreds of MHz)
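
A back-of-the-envelope model shows why "few fast cores" and "many slow cores" suit different workloads. The core counts, clock rates, and task size below are hypothetical values chosen only to match the slide's orders of magnitude, and the model ignores memory and scheduling overhead entirely.

```python
import math

def time_for(tasks, cores, freq_hz, cycles_per_task):
    """Seconds to finish `tasks` independent, equal-sized tasks.

    Idealized: each core runs one task at a time at one operation
    per cycle, with no memory or scheduling overhead.
    """
    rounds = math.ceil(tasks / cores)
    return rounds * cycles_per_task / freq_hz

# Hypothetical processors (assumed figures, not from the slides).
CPU = {"cores": 4, "freq_hz": 3e9}
GPU = {"cores": 1000, "freq_hz": 750e6}
CYCLES_PER_TASK = 3_000_000

for tasks in (1, 100_000):
    t_cpu = time_for(tasks, cycles_per_task=CYCLES_PER_TASK, **CPU)
    t_gpu = time_for(tasks, cycles_per_task=CYCLES_PER_TASK, **GPU)
    print(f"{tasks:>7} task(s): CPU {t_cpu:.3f} s, GPU {t_gpu:.3f} s")
```

With a single task the faster clock wins, so the CPU finishes first. With many independent tasks the core count dominates, so the GPU finishes first.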
• Graphics Processing Unit (GPU)
      – Originally developed for quickly generating 2D and 3D graphics, images, and video
      – Highly parallel processor
      – GPUs are more power-efficient than CPUs [3]

*Image from nvidia.com
• CPU and GPU are very different processors
  – CPU: latency-oriented design (= speculative)
  – GPU: throughput-oriented design (= parallel)
[Figure: timeline. The CPU runs task 1, task 2, task 3, and task 4 one after another; the GPU runs task 1 to task 4 at the same time.]
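• Speculative execution using branch prediction shortens execution time,
  but it makes the hardware more complicated

    A = 2;
    B = 3;
    C = A+B;
    D = A*B;
    E = A-B;
    if ( C > 4 )    // branch: the CPU guesses the outcome before C is ready
    {
      A = 0;
    }
    B = 0;

[Figure: pipeline holding E, D, C, and "?" for the unresolved branch]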
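
The slides do not say which prediction scheme is meant. One common textbook scheme is the two-bit saturating counter, sketched here as an illustration only; the function name `predict_branches` and the example branch pattern are assumptions.

```python
def predict_branches(outcomes, state=2):
    """Simulate a two-bit saturating counter on a list of branch outcomes.

    `outcomes` holds True (taken) or False (not taken).
    States 0-1 predict "not taken"; states 2-3 predict "taken".
    Returns the number of correct predictions.
    """
    correct = 0
    for taken in outcomes:
        predicted_taken = state >= 2
        if predicted_taken == taken:
            correct += 1
        # Move one step toward the actual outcome, saturating at 0 and 3.
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return correct

# A loop-style branch: taken nine times, then not taken once, repeated 10 times.
loop_branch = ([True] * 9 + [False]) * 10
hits = predict_branches(loop_branch)
print(f"{hits} of {len(loop_branch)} predictions correct")
```

Every correct guess lets the processor keep working past the branch without waiting; every wrong guess means the speculative work is thrown away, which is part of the hardware complexity the slide refers to.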
• CPUs have a large cache memory and control unit
• GPUs devote more hardware resources to ALUs
• Many simple cores
  – No speculation features
     • Simple cores allow more cores on a chip
     • Simple core design allows fast context switches

[Figure: timeline for GPU Core A. While one thread waits on a memory access, the core context-switches to the computation of another thread.]
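
The effect of fast context switching can be sketched with a toy simulation of one core. The cycle counts are made up for illustration, and the model assumes a context switch costs nothing, which is an idealization of "fast".

```python
def core_utilization(threads, compute_cycles, memory_cycles, rounds):
    """Fraction of cycles one core spends computing.

    Each thread repeats `rounds` times: compute, then wait on memory.
    The core runs one thread at a time and switches to another ready
    thread at zero cost (an idealizing assumption).
    """
    ready_at = [0] * threads         # cycle at which each thread can compute again
    remaining = [rounds] * threads   # compute phases left per thread
    now = 0
    busy = 0
    while any(remaining):
        runnable = [t for t in range(threads)
                    if remaining[t] and ready_at[t] <= now]
        if not runnable:
            # Every unfinished thread is waiting on memory: the core stalls.
            now = min(ready_at[t] for t in range(threads) if remaining[t])
            continue
        t = runnable[0]
        now += compute_cycles
        busy += compute_cycles
        remaining[t] -= 1
        ready_at[t] = now + memory_cycles
    return busy / now if now else 0.0

# Hypothetical workload: 10 cycles of computation per 40-cycle memory access.
for n in (1, 2, 5):
    print(n, "thread(s):", round(core_utilization(n, 10, 40, 100), 2))
```

With one thread the core sits idle during most memory accesses. With enough threads to cover the memory latency, there is always some thread ready to compute and the core stays busy.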
• CPU and GPU are very different processors
  – Each has its own strengths and weaknesses
    • CPU has a few big cores to shorten execution time
    • GPU has many simple cores to increase throughput
  – CPU for serial execution and GPU for parallel execution
[1] Levin, E. “Grand challenges to computational science.” Communications of the ACM 32(12):1456-1457, December 1989.

[2] Kauffmann, William J. III, and Larry L. Smarr. Supercomputing and the Transformation.

[3] Nvidia. “Doing more with less of a scarce resource.” http://www.nvidia.com/object/gcr-energy-efficiency.html

Heterogeneous Parallel Computing with GPU: From a Dummy for Dummies
