Parallel Programming

       By Roman Okolovich
Overview
   Traditionally, computer software has been written for serial
    computation. To solve a problem, an algorithm is constructed
    and implemented as a serial stream of instructions. These
    instructions are executed on a central processing unit on one
    computer. Only one instruction may execute at a time—after
    that instruction is finished, the next is executed.
   Nowadays a single machine (PC) can have a multi-core and/or
    multi-processor architecture.
   In a multiprocessor architecture, two or more identical processors
    connect to a single shared main memory. Most multiprocessor systems
    today use an SMP (symmetric multiprocessing) architecture. In the case
    of multi-core processors, SMP applies to the cores, treating them as
    separate processors.
Speedup
   The amount of performance gained by
    the use of a multi-core processor is
    strongly dependent on the software
    algorithms and implementation. In
    particular, the possible gains are limited
    by the fraction of the software that can
    be "parallelized" to run on multiple cores
    simultaneously; this effect is described
    by Amdahl's law. In the best case, so-
    called embarrassingly parallel problems
    may realize speedup factors near the
     number of cores. Many typical applications, however, do not realize
     such large speedup factors, so the parallelization of software
     remains a significant ongoing topic of research.
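   A brief worked illustration of Amdahl's law: if a fraction P of a
    program can run in parallel on N cores, the maximum speedup is
    S(N) = 1 / ((1 - P) + P / N). With P = 0.9 and N = 8, for example,
    S = 1 / (0.1 + 0.9/8) ≈ 4.7, well below the ideal factor of 8, and
    even with unlimited cores the speedup cannot exceed 1 / (1 - P) = 10.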
Intel Atom
   Nokia Booklet 3G - Intel® Atom™ Z530, 1.6 GHz
   Intel Atom is the brand name for a line of ultra-low-voltage x86
    and x86-64 CPUs (or microprocessors) from Intel, designed in 45
    nm CMOS and used mainly in Netbooks, Nettops and MIDs.
   Intel Atom can execute up to two instructions per cycle. The
    performance of a single-core Atom is roughly half that of an
    equivalent Celeron.
   Hyper-threading (officially termed Hyper-Threading Technology
    or HTT) is an Intel-proprietary technology used to improve
    parallelization of computations (doing multiple tasks at once)
    performed on PC microprocessors.
   A processor with hyper-threading enabled is treated by the
    operating system as two processors instead of one. This means
    that only one processor is physically present but the operating
    system sees two virtual processors, and shares the workload
    between them.
   The advantages of hyper-threading include improved support for
    multi-threaded code, allowing multiple threads to run simultaneously,
    and improved reaction and response times.
Instruction level parallelism
   Instruction-level parallelism (ILP) is a measure of how
    many of the operations in a computer program can be
    performed simultaneously. Consider the following
    program:
   1. e = a + b
    2. f = c + d
    3. g = e * f
   Operation 3 depends on the results of operations 1 and
    2, so it cannot be calculated until both of them are
    completed. However, operations 1 and 2 do not depend
    on any other operation, so they can be calculated
    simultaneously. (See also: Data dependency) If we
    assume that each operation can be completed in one unit
    of time then these three instructions can be completed in
    a total of two units of time, giving an ILP of 3/2.
Qt 4's Multithreading
   Qt provides thread support in the form of platform-independent threading
    classes, a thread-safe way of posting events, and signal-slot connections
    across threads. This makes it easy to develop portable multithreaded Qt
    applications and take advantage of multiprocessor machines.
       QThread provides the means to start a new thread.
       QThreadStorage provides per-thread data storage.
       QThreadPool manages a pool of threads that run QRunnable objects.
       QRunnable is an abstract class representing a runnable object.
       QMutex provides a mutual exclusion lock, or mutex.
       QMutexLocker is a convenience class that automatically locks and unlocks a
        QMutex.
       QReadWriteLock provides a lock that allows simultaneous read access.
       QReadLocker and QWriteLocker are convenience classes that automatically lock
        and unlock a QReadWriteLock.
       QSemaphore provides an integer semaphore (a generalization of a mutex).
       QWaitCondition provides a way for threads to go to sleep until woken up by
        another thread.
       QAtomicInt provides atomic operations on integers.
       QAtomicPointer provides atomic operations on pointers.
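   A minimal sketch (illustrative, not taken from the Qt documentation)
    combining several of these classes: a QRunnable task whose run()
    increments a shared counter guarded by a QMutexLocker, executed on the
    global QThreadPool. CountTask and the counter are made-up names.

     // Illustrative sketch: CountTask and the counter are made-up names;
     // QRunnable, QThreadPool, QMutex and QMutexLocker are the Qt 4
     // classes listed above.
     #include <QtCore/QRunnable>
     #include <QtCore/QThreadPool>
     #include <QtCore/QMutex>
     #include <QtCore/QMutexLocker>
     #include <QtCore/QDebug>

     static int counter = 0;
     static QMutex counterMutex;

     class CountTask : public QRunnable
     {
     public:
         void run()
         {
             QMutexLocker locker(&counterMutex); // locks here, unlocks at scope exit
             ++counter;
         }
     };

     int main()
     {
         for (int i = 0; i < 100; ++i)
             QThreadPool::globalInstance()->start(new CountTask); // pool deletes finished tasks
         QThreadPool::globalInstance()->waitForDone();
         qDebug() << "counter =" << counter;                      // prints 100
         return 0;
     }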
OpenMP
   The OpenMP Application Program Interface (API) supports multi-platform
    shared-memory parallel programming in C/C++ and Fortran on all
    architectures, including Unix platforms and Windows NT platforms.
   OpenMP is a portable, scalable model that gives shared-memory parallel
    programmers a simple and flexible interface for developing parallel
    applications for platforms ranging from the desktop to the supercomputer.
   The designers of OpenMP wanted to provide an easy method to thread
    applications without requiring that the programmer know how to create,
    synchronize, and destroy threads or even requiring him or her to determine
    how many threads to create. To achieve these ends, the OpenMP designers
    developed a platform-independent set of compiler pragmas, directives,
    function calls, and environment variables that explicitly instruct the compiler
    how and where to insert threads into the application.
   Most loops can be threaded by inserting only one pragma right before the
    loop. Further, by leaving the nitty-gritty details to the compiler and OpenMP,
    you can spend more time determining which loops should be threaded and
    how to best restructure the algorithms for maximum performance.
OpenMP Example
#include <omp.h>
#include <stdio.h>

int main() {
#pragma omp parallel
  printf("Hello from thread %d, nthreads %d\n",
         omp_get_thread_num(), omp_get_num_threads());
}

//-------------------------------------------

#pragma omp parallel shared(n,a,b)
{
  #pragma omp for
  for (int i=0; i<n; i++)
  {
    a[i] = i + 1;
    #pragma omp parallel for
    /*-- Okay - This is a parallel region --*/
    for (int j=0; j<n; j++)
      b[i][j] = a[i];
  }
} /*-- End of parallel region --*/

//-------------------------------------------

#pragma omp parallel for
for (i=0; i < numPixels; i++)
{
   pGrayScaleBitmap[i] = (unsigned char)
            (pRGBBitmap[i].red * 0.299 +
             pRGBBitmap[i].green * 0.587 +
             pRGBBitmap[i].blue * 0.114);
}

• OpenMP places the following five restrictions on which loops can be threaded:
   • The loop variable must be of type signed integer. Unsigned integers,
     such as DWORDs, will not work.
   • The comparison operation must be of the form loop_variable <, <=, >, or >=
     loop_invariant_integer.
   • The third expression, or increment portion, of the for loop must be either
     integer addition or integer subtraction of a loop-invariant value.
   • If the comparison operation is < or <=, the loop variable must increment on
     every iteration; conversely, if the comparison operation is > or >=, the
     loop variable must decrement on every iteration.
   • The loop must be a basic block, meaning no jumps from the inside of the
     loop to the outside are permitted, with the exception of the exit statement,
     which terminates the whole application. If goto or break statements are
     used, they must jump within the loop, not outside it. The same goes for
     exception handling; exceptions must be caught within the loop.
OpenMP and Visual Studio
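   In Visual C++, OpenMP is enabled with the /openmp compiler switch
    (Project Properties > C/C++ > Language > OpenMP Support); with GCC the
    equivalent flag is -fopenmp.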
Intel Threading Building Blocks (TBB)
   Intel® Threading Building Blocks (Intel® TBB) is an award-winning C++ template
    library that abstracts threads to tasks to create reliable, portable, and scalable
    parallel applications. Just as the C++ Standard Template Library (STL) extends the
    core language, Intel TBB offers C++ users a higher level abstraction for parallelism.
    To implement Intel TBB, developers use familiar C++ templates and coding style,
    leaving low-level threading details to the library. It is also portable between
    architectures and operating systems.
   Intel® TBB for Windows (Linux, Mac OS) costs $299 per seat.


     #include <iostream>
     #include <string>
     #include "tbb/parallel_for.h"
     #include "tbb/blocked_range.h"
     using namespace tbb;
     using namespace std;

     int main() {
       //...
       parallel_for(blocked_range<size_t>(0, to_scan.size()),
                    SubStringFinder(to_scan, max, pos));
       //...
       return 0;
     }
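   SubStringFinder is defined elsewhere and is not shown on the slide; the
    sketch below (with made-up names ApplySquare and SquareAll) illustrates
    the same parallel_for body pattern: a functor whose operator() processes
    one blocked_range of indices.

     // Illustrative sketch of the parallel_for body pattern (ApplySquare and
     // SquareAll are made-up names, not part of TBB or the slide's example).
     #include "tbb/parallel_for.h"
     #include "tbb/blocked_range.h"

     class ApplySquare {
         float* const my_a;
     public:
         ApplySquare(float a[]) : my_a(a) {}
         void operator()(const tbb::blocked_range<size_t>& r) const {
             for (size_t i = r.begin(); i != r.end(); ++i)
                 my_a[i] = my_a[i] * my_a[i];   // each worker squares its own sub-range
         }
     };

     void SquareAll(float a[], size_t n) {
         // TBB splits [0, n) into chunks and runs ApplySquare on its worker threads
         tbb::parallel_for(tbb::blocked_range<size_t>(0, n), ApplySquare(a));
     }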
Parallel Pattern Library (PPL)
   The Concurrency Runtime is a concurrent programming framework for C++.
    The Concurrency Runtime simplifies parallel programming and helps you
    write robust, scalable, and responsive parallel applications.
   The features that the Concurrency Runtime provides are unified by a
    common work scheduler. This work scheduler implements a work-stealing
    algorithm that enables your application to scale as the number of available
    processors increases.
   The Concurrency Runtime enables the following programming patterns and
    concepts:
       Imperative data parallelism: Parallel algorithms distribute computations on
        collections or on sets of data across multiple processors.
       Task parallelism: Task objects distribute multiple independent operations across
        processors.
       Declarative data parallelism: Asynchronous agents and message passing enable
        you to declare what computation has to be performed, but not how it is performed.
       Asynchrony: Asynchronous agents make productive use of latency by doing work
        while waiting for data.
   The Concurrency Runtime is provided as part of the C Runtime Library
    (CRT).
   Only Visual Studio 2010 supports PPL
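   A minimal sketch of the imperative data-parallelism pattern above,
    assuming Visual Studio 2010's <ppl.h> (SquareAll is an illustrative
    name): Concurrency::parallel_for distributes the loop iterations across
    the processors managed by the runtime's scheduler.

     // Illustrative sketch; SquareAll is a made-up name, parallel_for is the
     // PPL algorithm from <ppl.h>.
     #include <ppl.h>
     #include <vector>

     void SquareAll(std::vector<double>& v)
     {
         Concurrency::parallel_for(size_t(0), v.size(), [&v](size_t i)
         {
             v[i] = v[i] * v[i];   // iterations may run concurrently on different cores
         });
     }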
Concurrency Runtime Architecture
   The Concurrency Runtime is divided into four components: the
    Parallel Patterns Library (PPL), the Asynchronous Agents Library,
    the work scheduler, and the resource manager. These components
     reside between the operating system and applications. The following
     illustration shows how the Concurrency Runtime components interact
     with the operating system and applications:
     #include <agents.h>
     using namespace Concurrency;

     struct LongRunningOperationMsg {
         LongRunningOperationMsg(int x, int y)
             : m_x(x), m_y(y) {}
         int m_x;
         int m_y;
     };

     call<LongRunningOperationMsg>* LongRunningOperationCall =
         new call<LongRunningOperationMsg>(
             [](LongRunningOperationMsg msg)
             {
                 LongRunningOperation(msg.m_x, msg.m_y);
             });

     void SomeFunction(int x, int y) {
         asend(LongRunningOperationCall,
               LongRunningOperationMsg(x, y));
     }
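   In this pattern, asend posts the message to the call block and returns
    immediately; the lambda registered with the call block is then invoked
    by the runtime on one of its worker threads when the message arrives.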
References
   Parallel computing
   Superscalar
   Simultaneous multithreading
   Hyper-threading
   Thread Support in Qt
   OpenMP
   Intel: Getting Started with OpenMP
   Intel® Threading Building Blocks (Intel® TBB)
   Intel® Threading Building Blocks 2.2 for Open Source
   Concurrency Runtime Library
   Four Ways to Use the Concurrency Runtime in Your C++
    Projects
   Parallel Programming in Native Code blog
