Programming with OpenMP*








© 2012 WIPRO LTD | WWW.WIPRO.COM
Agenda
    •What is OpenMP?
    •Parallel regions
    •Data-Sharing Attribute Clauses
    •Worksharing
    •OpenMP 3.0 Tasks
    •Synchronization
    •Runtime functions/environment variables
    •Optional Advanced topics




       Software & Services Group, Developer Products Division
What Is OpenMP?




      Copyright © 2010, Intel Corporation. All rights reserved.
      Copyright © 2011, Intel Corporation. All rights reserved.
      *Other brands and names are the property of their respective owners.
      3/14/2012
Programming Model

     Fork-Join Parallelism:
     •    Master thread spawns a team of threads as needed
     •    Parallelism is added incrementally: that is, the sequential
          program evolves into a parallel program
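
     The fork-join model above can be sketched in a few lines of C. This is
     only an illustration, not code from the original deck: the function name
     count_team_members is hypothetical, and the #else branch is an assumed
     serial fallback so the file still builds when OpenMP is switched off.

```c
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#else
/* Assumed serial fallback so the sketch builds without OpenMP support. */
static int omp_get_thread_num(void) { return 0; }
#endif

/* Runs one fork-join cycle and returns how many threads took part. */
int count_team_members(void)
{
    int members = 0;

    /* Fork: the master thread spawns a team; every thread runs the block.
       reduction(+:members) gives each thread a private counter and adds
       the counters together at the join. */
    #pragma omp parallel reduction(+:members)
    {
        int id = omp_get_thread_num();   /* private: declared inside the block */
        printf("hello from thread %d\n", id);
        members += 1;
    }
    /* Join: only the master thread continues past the closing brace. */
    return members;
}
```

     Calling count_team_members() from main returns 1 in a serial build and
     the team size in an OpenMP build, which shows that the same source
     serves as both the sequential and the parallel program.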




     [Figure: the master thread forks a team of threads at each parallel region and joins them afterwards]

A Few Details to Get Started

    •Compiler option /Qopenmp on Windows
          or -openmp on Linux (Intel compilers)
    •Most of the constructs in OpenMP are compiler
      directives or pragmas
       –For C and C++, the pragmas take the form:
                   #pragma omp construct [clause [clause]…]
       –For Fortran, the directives take one of the forms:
                   C$OMP construct [clause [clause]…]
                   !$OMP construct [clause [clause]…]
                   *$OMP construct [clause [clause]…]
    •Header file or Fortran module
                   #include "omp.h"
                   use omp_lib


Agenda
    •What is OpenMP?
    •Parallel regions
    •Data-Sharing Attribute Clauses
    •Worksharing
    •OpenMP 3.0 Tasks
    •Synchronization
    •Runtime functions/environment variables
    •Intel® Parallel Debugger Extension
    •Optional Advanced topics



Parallel Region & Structured Blocks (C/C++)
    • Most OpenMP constructs apply to structured blocks
         – Structured block: a block with one point of entry at the top
           and one point of exit at the bottom
         – The only “branches” allowed are STOP statements in Fortran
           and exit() in C/C++

    A structured block:

                #pragma omp parallel
                {
                  int id = omp_get_thread_num();

                more: res[id] = do_big_job(id);

                  if (conv(res[id])) goto more;
                }
                printf("All done\n");

    Not a structured block:

                if (go_now()) goto more;
                #pragma omp parallel
                {
                  int id = omp_get_thread_num();
                more: res[id] = do_big_job(id);
                  if (conv(res[id])) goto done;
                  goto more;
                }
                done: if (!really_done()) goto more;
Agenda
    •What is OpenMP?
    •Parallel regions
    •Data-Sharing Attribute Clauses
    •Worksharing
    •OpenMP 3.0 Tasks
    •Synchronization
    •Runtime functions/environment variables
    •Intel® Parallel Debugger Extension
    •Optional Advanced topics



Data-Sharing Attribute Clauses
    shared                 Declares one or more list items to be shared by tasks generated by a
                           parallel or task construct (default).

    private                Declares one or more list items to be private to a task.

    default                Allows the user to control the data-sharing attributes of variables that
                           are referenced in a parallel or task construct, and whose data-sharing
                           attributes are implicitly determined.

    firstprivate           Declares one or more list items to be private to a task, and initializes
                           each of them with the value that the corresponding original item has
                           when the construct is encountered.

    lastprivate            Declares one or more list items to be private to an implicit task, and
                           causes the corresponding original list item to be updated after the end
                           of the region.

    reduction              Specifies an operator and one or more list items. For each list item, a
                           private copy is created in each implicit task, and is initialized
                           appropriately for the operator. After the end of the region, the original
                           list item is updated with the values of the private copies using the
                           specified operator.
The Private Clause
     •Creates a separate copy of the variable for each task
             – The private copies are uninitialized; a C++ object is default constructed
             – The value the variable has outside the parallel region is not carried into the private copies
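
     A minimal sketch of the private clause, assuming a hypothetical helper
     named scale_by_two (not part of the original deck). It illustrates the
     point above: each private copy must be written before it is read.

```c
/* Doubles every element of in[] into out[]. */
void scale_by_two(const int *in, int *out, int n)
{
    int tmp = -1;   /* this -1 is not carried into the private copies */
    int i;

    /* Each thread gets its own uninitialized tmp; the loop index i is
       made private automatically by the loop construct. */
    #pragma omp parallel for private(tmp)
    for (i = 0; i < n; i++) {
        tmp = 2 * in[i];   /* write the private copy first ... */
        out[i] = tmp;      /* ... then it is safe to read it */
    }
}
```

     Without private(tmp), all threads would share one tmp and could
     overwrite each other's intermediate values.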




FirstPrivate Clause
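
     A minimal sketch of firstprivate, following the definition in the
     data-sharing attribute table: each private copy starts with the value
     the original variable had when the construct was encountered. The
     function name add_offset is an assumption made for illustration.

```c
/* Adds the same offset to every element of data[]. */
void add_offset(int *data, int n, int offset)
{
    int i;

    /* firstprivate(offset): every thread receives its own copy of offset,
       initialized to the value it has at this point. */
    #pragma omp parallel for firstprivate(offset)
    for (i = 0; i < n; i++) {
        data[i] += offset;
    }
}
```

     With private(offset) instead, the copies would be uninitialized and
     the result would be undefined.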




Lastprivate Clause


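     •The original shared variable is updated with the value from the
       sequentially last iteration
     •C++ objects are updated as if by assignment
                    #include <math.h>

                    /* a and b are assumed to be arrays defined elsewhere;
                       the slide does not show their declarations */
                    extern double a[], b[];

                    void sq2(int n,
                               double *lastterm)
                    {
                      double x; int i;
                      #pragma omp parallel
                      #pragma omp for lastprivate(x)
                      for (i = 0; i < n; i++){
                         x = a[i]*a[i] + b[i]*b[i];
                         b[i] = sqrt(x);
                      }
                      *lastterm = x;   /* value computed in iteration i = n-1 */
                    }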

OpenMP* Reduction Clause
                 reduction (op : list)

     •The variables in “list” must be shared in the
       enclosing parallel region
     •Inside parallel or work-sharing construct:
            – A PRIVATE copy of each list variable is created and initialized
              depending on the “op”

            – These copies are updated locally by threads

            – At end of construct, local copies are combined through “op”
              into a single value and combined with the value in the original
              SHARED variable
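
     The steps above can be sketched with a sum reduction. The function
     name sum_of_squares is hypothetical; the clause itself follows the
     reduction (op : list) form shown on this slide.

```c
/* Returns v[0]^2 + v[1]^2 + ... + v[n-1]^2. */
double sum_of_squares(const double *v, int n)
{
    double sum = 0.0;   /* the original shared variable */
    int i;

    /* Each thread gets a private sum initialized to 0 (the identity for +),
       updates it locally, and the copies are combined with the original
       sum at the end of the construct. */
    #pragma omp parallel for reduction(+:sum)
    for (i = 0; i < n; i++)
        sum += v[i] * v[i];

    return sum;
}
```

     Because the private copies may be combined in any order, floating-point
     results can differ in the last bits from a serial run when the values
     are not exactly representable.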




Activity 1 – Hello Worlds
     •Modify the “Hello, Worlds” serial code to run
      multithreaded using OpenMP*




Agenda
     •What is OpenMP?
     •Parallel regions
     •Data-Sharing Attribute Clauses
     •Worksharing
     •OpenMP 3.0 Tasks
     •Synchronization
     •Runtime functions/environment variables
     •Optional Advanced topics




Worksharing
     •Worksharing is the general term used in OpenMP
      to describe distribution of work across threads.

     •Three examples of worksharing in OpenMP are:
            – loop (omp for) construct
            – omp sections construct
            – omp task construct

     •Each automatically divides work among threads

Agenda
     •What is OpenMP?
     •Parallel regions
     •Data-Sharing Attribute Clauses
     •Worksharing – Parallel For
     •OpenMP 3.0 Tasks
     •Synchronization
     •Runtime functions/environment variables
     •Optional Advanced topics




omp for Construct


     // assume N=12
     #pragma omp parallel
     #pragma omp for
        for (i = 1; i < N+1; i++)
           c[i] = a[i] + b[i];

     •Threads are assigned an independent set of iterations
     •Threads must wait at the end of the work-sharing construct

     [Figure: with three threads, iterations i=1..4, i=5..8 and i=9..12 each go to one thread, followed by an implicit barrier]

Combining constructs
     •These two code segments are equivalent

        #pragma omp parallel
        {
           #pragma omp for
           for (i=0;i< MAX; i++) {
                  res[i] = huge();
           }
        }

        #pragma omp parallel for
           for (i=0;i< MAX; i++) {
                  res[i] = huge();
           }


The schedule clause
     The schedule clause affects how loop iterations are mapped onto
       threads
     schedule(static [,chunk])
               – Blocks of iterations of size “chunk” are assigned to threads
               – Round robin distribution
               – Low overhead, may cause load imbalance
     schedule(dynamic[,chunk])
               – Threads grab “chunk” iterations
               – When done with its iterations, a thread requests the next set
               – Higher threading overhead, can reduce load imbalance
     schedule(guided[,chunk])
               – Dynamic schedule starting with large blocks
               – Size of the blocks shrinks; no smaller than “chunk”
     schedule(runtime)
               – Scheduling is deferred until run time, and the schedule and chunk size are taken
                 from the run-sched-var ICV
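
     A short sketch of a case where a dynamic schedule may help: the cost of
     each iteration grows with i, so equal static blocks would leave the
     thread holding the last block with most of the work. The names
     uneven_work and total_uneven_work, and the chunk size 4, are
     assumptions chosen for illustration.

```c
/* Hypothetical workload: iteration i costs about i steps. */
static long uneven_work(int i)
{
    long acc = 0;
    int k;
    for (k = 0; k <= i; k++)
        acc += k;
    return acc;   /* equals i*(i+1)/2 */
}

/* Sums uneven_work(i) for i = 0 .. n-1. */
long total_uneven_work(int n)
{
    long total = 0;
    int i;

    /* Threads grab 4 iterations at a time and come back for more when
       done, which evens out the load at the price of extra scheduling. */
    #pragma omp parallel for schedule(dynamic, 4) reduction(+:total)
    for (i = 0; i < n; i++)
        total += uneven_work(i);

    return total;
}
```

     Replacing the clause with schedule(runtime) would let the choice be
     made when the program starts, for example through the OMP_SCHEDULE
     environment variable, without recompiling.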



Assigning Loop Iterations in OpenMP*:
     Which Schedule to Use

      Schedule Clause        When To Use
      STATIC                 Predictable and similar work per iteration
      DYNAMIC                Unpredictable, variable work per iteration
      GUIDED                 Special case of dynamic to reduce scheduling overhead
      RUNTIME                Modify schedule at run-time via environment variable



Schedule Clause Example

                    #pragma omp parallel for schedule (static, 8)
                       for( int i = start; i <= end; i += 2 )
                       {
                         if ( TestForPrime(i) ) gPrimesFound++;
                       }


     Iterations are divided into chunks of 8
       • If start = 3, then first chunk is i={3,5,7,9,11,13,15,17}




Nested Parallelism
#include <stdio.h>
#include <omp.h>

int main(int argc, char *argv[])
{
    int n;
    omp_set_nested(1);               /* enable nested parallel regions */
    #pragma omp parallel private(n)
    {
        n = omp_get_thread_num();    /* thread number in the outer team */
        #pragma omp parallel
        printf("thread number %d - nested thread number %d\n", n, omp_get_thread_num());
    }
    return 0;
}

Sample output (two threads at each level; line order may vary):

thread number 0 - nested thread number 0
thread number 0 - nested thread number 1
thread number 1 - nested thread number 0
thread number 1 - nested thread number 1

[Figure: tree of outer threads 0 and 1, each forking nested threads 0 and 1]


The collapse clause

     The collapse clause is new in OpenMP 3.0. It specifies how many
     loops are associated with the loop construct.

                          #pragma omp parallel for collapse(2)
                          for( i = 0; i < N; ++i )
                            for( k = 0; k < N; ++k )
                              for( j = 0; j < N; ++j )
                                foo();
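
     A compact sketch of collapse(2) on a two-level loop nest. ROWS, COLS
     and fill_table are hypothetical names; the point is that the two loops
     are merged into a single iteration space of ROWS*COLS iterations before
     being divided among threads, which helps when the outer loop alone has
     too few iterations to keep every thread busy.

```c
#define ROWS 4
#define COLS 6

/* Stores the flattened index i*COLS + j in every cell. */
void fill_table(int table[ROWS][COLS])
{
    int i, j;

    /* Both i and j belong to the associated loops, so both are private. */
    #pragma omp parallel for collapse(2)
    for (i = 0; i < ROWS; i++)
        for (j = 0; j < COLS; j++)
            table[i][j] = i * COLS + j;
}
```

     The collapsed loops must be perfectly nested: no statements may appear
     between the two for headers.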




Agenda
     •What is OpenMP?
     •Parallel regions
     •Data-Sharing Attribute Clauses
     •Worksharing – Parallel Sections
     •OpenMP 3.0 Tasks
     •Synchronization
     •Runtime functions/environment variables
     •Optional Advanced topics




Task Decomposition


     a = alice();
     b = bob();
     s = boss(a, b);
     c = cy();
     printf ("%6.2f\n",
          bigboss(s,c));

     alice, bob, and cy can be computed in parallel

     [Figure: dependency graph; alice and bob feed boss, and boss and cy feed bigboss]

omp sections

     #pragma omp sections
     • Must be inside a parallel region
     • Precedes a code block containing N blocks of code that may be
       executed concurrently by N threads
     • Encompasses each omp section

     #pragma omp section
     • Precedes each block of code within the encompassing block
       described above
     • May be omitted for the first parallel section after the parallel
       sections pragma
     • Enclosed program segments are distributed for parallel execution
       among available threads




Functional Level Parallelism with sections


     #pragma omp parallel sections
     {
     #pragma omp section /* Optional */
         a = alice();
     #pragma omp section
         b = bob();
     #pragma omp section
        c = cy();
     }

     s = boss(a, b);
     printf ("%6.2f\n",
             bigboss(s,c));
Advantage of Parallel Sections

     • Independent sections of code can
        execute concurrently – reduce
        execution time

                   #pragma omp parallel sections
                   {
                            #pragma omp section
                            phase1();
                            #pragma omp section
                            phase2();
                            #pragma omp section
                            phase3();
                   }

     [Figure: serial versus parallel execution timelines for the three phases]


Agenda
     •What is OpenMP?
     •Parallel regions
     •Data-Sharing Attribute Clauses
     •Worksharing
     •OpenMP 3.0 Tasks
     •Synchronization
     •Runtime functions/environment variables
     •Optional Advanced topics




New Addition to OpenMP

     •Tasks – Main change for OpenMP 3.0
     • Allows parallelization of irregular problems
             –unbounded loops
             –recursive algorithms
             –producer/consumer




What are tasks?

     • Tasks are independent units of work
     • Threads are assigned to perform the work of each task
        – Tasks may be deferred
        – Tasks may be executed immediately
        – The runtime system decides which of the above
     • Tasks are composed of:
        – code to execute
        – data environment
        – internal control variables (ICV)

     [Figure: serial versus parallel execution of tasks]



Simple Task Example

     #pragma omp parallel              // assume 8 threads: a pool of 8 threads is created here
     {
       #pragma omp single private(p)   // one thread gets to execute the while loop
       {
         …
         while (p) {
         #pragma omp task              // the single "while loop" thread creates a task
           {                           // for each instance of processwork()
             processwork(p);
           }
           p = p->next;
         }
       }
     }

Task Construct – Explicit Task View



     – A team of threads is created at the omp parallel construct
     – A single thread is chosen to execute the while loop; let's call this
        thread “L”
     – Thread L operates the while loop, creates tasks, and fetches next
        pointers
     – Each time L crosses the omp task construct it generates a new task,
        which a thread of the team then picks up
     – Each task is executed by one thread of the team
     – All tasks complete at the barrier at the end of the parallel region’s
        single construct

     #pragma omp parallel
     {
       #pragma omp single
       { // block 1
         node * p = head;
         while (p) { //block 2
           #pragma omp task
             process(p);
           p = p->next; //block 3
         }
       }
     }
Why are tasks useful?
     Tasks have the potential to parallelize irregular patterns and recursive function calls




Activity 4 – Parallel Fibonacci

     Objective:
     1. Create a parallel version of the Fibonacci sample that uses OpenMP
         tasks;
     2. Try to find a balance between the serial and parallel parts of the
         recursion.
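
     One possible shape of a solution is sketched below; it is not the
     activity's reference answer. The names fib_serial, fib_task and
     fib_parallel are hypothetical, and the CUTOFF value of 20 is an
     assumption that would need tuning: below it the recursion stays serial
     so that task-creation overhead does not dominate.

```c
#define CUTOFF 20   /* assumed threshold between task and serial recursion */

/* Plain recursive Fibonacci, used for small subproblems. */
static long fib_serial(int n)
{
    return n < 2 ? n : fib_serial(n - 1) + fib_serial(n - 2);
}

/* Task-based recursion: one task per branch above the cutoff. */
static long fib_task(int n)
{
    long x, y;

    if (n < CUTOFF)
        return fib_serial(n);

    /* x and y must be shared so the child tasks can write the results;
       n is firstprivate in each task by default. */
    #pragma omp task shared(x)
    x = fib_task(n - 1);
    #pragma omp task shared(y)
    y = fib_task(n - 2);

    /* Wait for both child tasks before reading x and y. */
    #pragma omp taskwait
    return x + y;
}

/* Creates the thread team and lets one thread start the recursion. */
long fib_parallel(int n)
{
    long result = 0;

    #pragma omp parallel
    {
        #pragma omp single
        result = fib_task(n);
    }
    return result;
}
```

     Raising CUTOFF moves more of the recursion into the serial part;
     lowering it creates more, smaller tasks.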




When are tasks guaranteed to be
     complete?
      Tasks are guaranteed to be complete:
      • At thread or task barriers
      • At the directive: #pragma omp barrier
      • At the directive: #pragma omp taskwait




Task Completion Example


     #pragma omp parallel
     {
        #pragma omp task       // multiple foo tasks created here, one for each thread
        foo();

        #pragma omp barrier    // all foo tasks guaranteed to be completed here
        #pragma omp single
        {
            #pragma omp task   // one bar task created here
            bar();
        }
     }                         // bar task guaranteed to be completed here


Agenda
     •What is OpenMP?
     •Parallel regions
     •Data-Sharing Attribute Clauses
     •Worksharing
     •OpenMP 3.0 Tasks
     •Synchronization
     •Runtime functions/environment variables
     •Optional Advanced topics




Example: Dot Product


             float dot_prod(float* a, float* b, int N)
             {
               float sum = 0.0;
             #pragma omp parallel for shared(sum)
                for(int i=0; i<N; i++) {
                  sum += a[i] * b[i];
                }
               return sum;
             }




Race Condition

     •A race condition is nondeterministic behavior caused by the times at which
       two or more threads access a shared variable
     •For example, suppose both Thread A and Thread B are executing the statement
                   area += 4.0 / (1.0 + x*x);




Two Timings




Protect Shared Data
     •Must protect access to shared, modifiable data


               float dot_prod(float* a, float* b, int N)
               {
                 float sum = 0.0;
               #pragma omp parallel for shared(sum)
                 for(int i=0; i<N; i++) {
               #pragma omp critical
                   sum += a[i] * b[i];
                 }
                 return sum;
               }



OpenMP* Critical Construct

     • #pragma omp critical [(lock_name)]
     •Defines a critical region on a structured block
     •Threads wait their turn: only one at a time calls consum(), thereby
       protecting RES from race conditions
     •Naming the critical construct RES_lock is optional
     •Good Practice – Name all critical sections

          float RES;

          #pragma omp critical (RES_lock)
          consum (B, RES);
              }
          }
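
Only part of the slide's code survives above, so here is a minimal sketch of how a named critical section might be used around consum(). The surrounding loop, the helper make_value(), the function accumulate() and the body of consum() are assumptions for illustration, not the original slide code.

```c
/* Hypothetical producer: stands in for whatever work yields each value. */
static float make_value(int i)
{
    return (float)i;
}

/* Hypothetical consumer: folds one value into the shared result. */
static void consum(float b, float *res)
{
    *res += b;
}

float accumulate(int niters)
{
    float res = 0.0f;                 /* shared by all threads */

    #pragma omp parallel
    {
        float b;                      /* private: declared inside the region */

        #pragma omp for
        for (int i = 0; i < niters; i++) {
            b = make_value(i);        /* runs in parallel */

            /* Only one thread at a time may update res. */
            #pragma omp critical (RES_lock)
            consum(b, &res);
        }
    }
    return res;
}
```

The expensive part (make_value) stays outside the critical section, so threads serialize only on the short update.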


OpenMP* Reduction Clause
                 reduction (op : list)


     •The variables in “list” must be shared in the
       enclosing parallel region
     •Inside parallel or work-sharing construct:
            – A PRIVATE copy of each list variable is created and initialized
              depending on the “op”

            – These copies are updated locally by threads

            – At end of construct, local copies are combined through “op”
              into a single value and combined with the value in the original
              SHARED variable
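
Applied to the earlier dot product, the reduction clause might look like the sketch below. It replaces the critical section with a per-thread partial sum; the const qualifiers and lower-case parameter name are small liberties taken here.

```c
float dot_prod(const float *a, const float *b, int n)
{
    float sum = 0.0f;

    /* Each thread gets a private sum initialized to 0; the private copies
       are added into the shared sum at the end of the loop. */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; i++)
        sum += a[i] * b[i];

    return sum;
}
```

Because threads no longer contend for one shared variable on every iteration, this form usually scales better than the critical-section version.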




Reduction Example
     #include <stdio.h>
     #include "omp.h"
     int main(int argc, char *argv[])
     {
      int count = 10;
      #pragma omp parallel reduction (+: count)
        { int num;
           num = omp_get_thread_num();
           count++;
           printf("count is equal : %d on thread %d \n", count, num);
        }
      printf("count at the end: %d\n", count);
     }

       Output with two threads:
       count is equal : 1 on thread 0
       count is equal : 1 on thread 1
       count at the end: 12

C/C++ Reduction Operations

     •A range of associative operators can be used with
       reduction
     •Initial values are the ones that make sense
       mathematically

                  Operator      Initial Value
                     +               0
                     *               1
                     -               0
                     ^               0
                     &              ~0
                     |               0
                     &&              1
                     ||              0




Numerical Integration Example




Activity 5 - Computing Pi


     •Parallelize the numerical integration code using OpenMP
     •What variables can be shared?
     •What variables need to be private?
     •What variables should be set up for reductions?

     #include <stdio.h>

     static long num_steps=100000;
     double step, pi;

     int main(void)
     { int i;
       double x, sum = 0.0;

       step = 1.0/(double) num_steps;
       for (i=0; i< num_steps; i++){
          x = (i+0.5)*step;
          sum = sum + 4.0/(1.0 + x*x);
       }
       pi = step * sum;
       printf("Pi = %f\n",pi);
       return 0;
     }
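
One way to answer the three questions is sketched below; it is a possible solution rather than the official one, and the function name compute_pi is an assumption. Here step and num_steps are only read, so they can be shared; x is declared inside the loop body, which makes it private; sum is the reduction variable.

```c
double compute_pi(long num_steps)
{
    double step = 1.0 / (double)num_steps;   /* read-only: safe to share */
    double sum = 0.0;                        /* accumulated: reduction */

    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < num_steps; i++) {
        double x = (i + 0.5) * step;         /* private: declared in the loop */
        sum += 4.0 / (1.0 + x * x);
    }
    return step * sum;
}
```

Calling compute_pi(100000) corresponds to the serial code's num_steps value.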
Single Construct

     •Denotes a block of code to be executed by only
      one thread
                  – Which thread executes the block is implementation defined
     •Implicit barrier at end
          #pragma omp parallel
          {
            DoManyThings();
          #pragma omp single
            {
              ExchangeBoundaries();
            } // threads wait here for single
            DoManyMoreThings();
          }


Master Construct

     •Denotes block of code to be executed only by
      the master thread
     •No implicit barrier at end

          #pragma omp parallel
          {
            DoManyThings();
          #pragma omp master
            {            // if not master skip to next stmt
              ExchangeBoundaries();
            }
            DoManyMoreThings();
          }



Implicit Barriers
     •Several OpenMP* constructs have implicit barriers
             – parallel – necessary barrier – cannot be removed
             – for
             – single
     •Unnecessary barriers hurt performance and can be
       removed with the nowait clause
             – The nowait clause is applicable to:
                – the for construct
                – the single construct




Nowait Clause
           #pragma omp for nowait
             for(...)
               {...};

           #pragma omp single nowait
             { [...] }



     •Use when threads unnecessarily wait between
       independent computations
                 #pragma omp for schedule(dynamic,1) nowait
                  for(int i=0; i<n; i++)
                    a[i] = bigFunc1(i);

                 #pragma omp for schedule(dynamic,1)
                  for(int j=0; j<m; j++)
                    b[j] = bigFunc2(j);
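
A self-contained version of the two loops above might look like the following sketch. The bodies of big_func1 and big_func2 and the wrapper fill_independent are placeholders assumed for illustration. The nowait on the first loop is safe only because the second loop never reads a[].

```c
/* Hypothetical stand-ins for the slide's bigFunc1 and bigFunc2. */
static double big_func1(int i) { return 2.0 * i; }
static double big_func2(int j) { return j + 1.0; }

void fill_independent(double *a, int n, double *b, int m)
{
    #pragma omp parallel
    {
        /* No barrier here: a thread that runs out of iterations
           moves straight on to the second loop. */
        #pragma omp for schedule(dynamic, 1) nowait
        for (int i = 0; i < n; i++)
            a[i] = big_func1(i);

        /* The implicit barrier at the end of this loop is kept. */
        #pragma omp for schedule(dynamic, 1)
        for (int j = 0; j < m; j++)
            b[j] = big_func2(j);
    }
}
```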


Barrier Construct
     •Explicit barrier synchronization
     •Each thread waits until all threads arrive


                         #pragma omp parallel shared (A, B, C)
                         {
                                 DoSomeWork(A,B);
                                 printf("Processed A into B\n");
                         #pragma omp barrier
                                 DoSomeWork(B,C);
                                 printf("Processed B into C\n");
                         }




Atomic Construct
     •Special case of a critical section
     •Applies only to simple update of memory location


               #pragma omp parallel for shared(x, y, index, n)
                 for (i = 0; i < n; i++) {
                   #pragma omp atomic
                      x[index[i]] += work1(i);
                   y[i] += work2(i);
                 }
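
The same pattern can be tried in a small runnable form. The histogram below is an illustrative assumption rather than part of the deck: several iterations may hit the same bin, so the increment needs atomic, just as x[index[i]] does above, while y[i] needs no protection because each iteration writes a distinct element.

```c
/* Count how many entries of index[] fall in each bin.
   bins[] must be zeroed by the caller; every index[i] must be a valid bin. */
void histogram(const int *index, int n, int *bins)
{
    #pragma omp parallel for
    for (int i = 0; i < n; i++) {
        /* Different iterations may target the same bin, so the
           read-modify-write must be atomic. */
        #pragma omp atomic
        bins[index[i]] += 1;
    }
}
```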




Agenda
     •What is OpenMP?
     •Parallel regions
     •Data-Sharing Attribute Clauses
     •Worksharing
     •OpenMP 3.0 Tasks
     •Synchronization
     •Runtime functions/environment variables
     •Optional Advanced topics




Environment Variables


     •Set the default number of threads
                    OMP_NUM_THREADS integer

     •Set the default scheduling protocol
                    OMP_SCHEDULE “schedule[, chunk_size]”
     •Enable dynamic thread adjustment
                    OMP_DYNAMIC [TRUE|FALSE]
     •Enable nested parallelism
                    OMP_NESTED [TRUE|FALSE]
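
To see what the runtime picked up from these variables, a program can query the corresponding settings. The sketch below assumes an OpenMP 3.0 or later runtime; the function name report_openmp_settings is made up for illustration, and the _OPENMP guard lets the file build without OpenMP enabled.

```c
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Prints the settings the environment variables influence and
   returns the maximum team size (1 when built without OpenMP). */
int report_openmp_settings(void)
{
#ifdef _OPENMP
    omp_sched_t kind;
    int chunk;

    omp_get_schedule(&kind, &chunk);   /* what schedule(runtime) loops will use */
    printf("max threads      : %d\n", omp_get_max_threads());
    printf("dynamic adjust   : %d\n", omp_get_dynamic());
    printf("runtime schedule : kind %d, chunk %d\n", (int)kind, chunk);
    return omp_get_max_threads();
#else
    printf("compiled without OpenMP support\n");
    return 1;
#endif
}
```

Running the program with different values of OMP_NUM_THREADS, OMP_SCHEDULE and OMP_DYNAMIC set in the environment should change the reported values.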




20+ Library Routines


Library Routines
     •To fix the number of threads used in a program
                    – Set the number of threads
                    – Then save the number returned
      #include <omp.h>

      int main ()
      {
        int num_threads;
        // Request as many threads as you have processors.
        omp_set_num_threads (omp_get_num_procs ());

          #pragma omp parallel
          {
            int id = omp_get_thread_num ();

              // Protect this operation because memory stores are not atomic
              #pragma omp single
              num_threads = omp_get_num_threads ();

              do_lots_of_stuff (id);
          }
      }

Agenda
     •What is OpenMP?
     •Parallel regions
     •Data-Sharing Attribute Clauses
     •Worksharing
     •OpenMP 3.0 Tasks
     •Synchronization
     •Runtime functions/environment variables
     •Optional Advanced topics
       – Intel® Parallel Debugger Extension
       – Advanced Concepts



Key Features


       Thread Shared Data Event Detection
            Break on Thread Shared Data Access (read/write)
       Re-entrant Function Detection
       SIMD SSE Registers Window
       Enhanced OpenMP* Support
            Serialize OpenMP threaded application execution on the
             fly
            Insight into thread groups, barriers, locks, wait lists etc.




Debugger Extensions in Microsoft* VS


     The Intel® Debugger Extensions are a plug-in to Visual Studio
     2005/2008/2010 that adds features to the Microsoft* debugger.

     The parallel debugger extensions have buttons similar to those in the Linux
     version of IDB. The functionality is identical.


Shared Data Events Detection

 • Shared data access is a major problem in multithreaded applications
      Can cause hard-to-diagnose intermittent program failure
      Tool support is required for detection

 [Diagram labels: Visual Studio*, GUI, GUI Extensions, Debug Engine, Debugger Extension,
  Compiler, Instrumented Application, Memory Access Instrumentation, Intel Debug Runtime,
  RTL events, Normal debugging]

 Technology built on:
      Code & debug info instrumentation by Intel compiler (/Qopenmp /debug:parallel)
      Debug runtime library that collects data access traces and triggers debugger tool
       (libiomp5md.dll, pdbx.dll)
      Debugger Extension in Visual Studio Debug Engine collects and exports information
      GUI integration provides parallelism views and user interactivity




Shared Data Events Detection – contd.

     Data sharing detection is part of overall debug process
      Breakpoint model (stop on
       detection)
      GUI extensions show results
       & link to source

      Filter capabilities to hide
       false positives

     New powerful
     data breakpoint types
      Stop when 2nd thread
       accesses specific address
      Stop on read from address



       Key User Benefit:
       A simplified feature to detect shared data accesses from multiple threads

Shared Data Events Detection - Usage


     Enabling Shared Data Events Detection
     automatically enables “Stop On Event”




                                                              Future detection and logging of events
                                                              can be suppressed




Shared Data Events Detection - Filtering

     Data sharing detection is selective
          Data Filter
                Specific data items and
                 variables can be excluded




          Code Filter
                Functions can be excluded
                Source files can be excluded
                Address ranges can be
                 excluded




Re-Entrant Call Detection


     Automatically halts execution when a function is executed by more
      than one thread at any given point in time.




     Helps identify re-entrancy requirements and problems in multithreaded
       applications



Re-Entrant Call Detection

     • After execution halts and the user clicks OK, the program
       counter points exactly to where the problem occurred.




SIMD SSE Debugging Window
     SSE Registers Window
     SSE Registers display of variables used for SIMD
      operations
     Free Formatting for flexible representation of data
     In-depth insight into data parallelization and
      vectorization
     In-Place Edit




Enhanced OpenMP* Debugging Support
     •Dedicated OpenMP runtime object
       information windows
            OpenMP Task and Spawn Tree lists
            Barrier and Lock information
            Task Wait lists
            Thread Team worker lists

     •Serialize Parallel Regions
          Change number of parallel threads
           dynamically during runtime to 1 or N (all)
          Verify code correctness for serial execution
           vs. Parallel execution
          Identify whether a runtime issue is really
           parallelism related

     User benefit:
     Detailed execution state information for OpenMP applications (deadlock
     detection). Influences execution behavior without recompile!



OpenMP* Information Windows

     • Tasks, Spawn Trees, Teams, Barriers, Locks
     • (Task ID reporting incorrect)




Serialize Parallel Regions

     Problem:
     A parallel loop computes a wrong result. Is it a concurrency or
     algorithm issue?

     Parallel Debug Support
     Runtime access to the OpenMP num_thread property
     Set to 1 for serial execution of next parallel block

     User Benefit
     Verification of an algorithm “on-the-fly” without slowing down the
     entire application to serial execution
     On demand serial debugging without recompile/restart

     [Screenshot labels: Disable parallel, Enable parallel]




Agenda
     •What is OpenMP?
     •Parallel regions
     •Data-Sharing Attribute Clauses
     •Worksharing
     •OpenMP 3.0 Tasks
     •Synchronization
     •Runtime functions/environment variables
     •Optional Advanced topics
       – Intel® Parallel Debugger Extension
       – Advanced Concepts



Parallel Construct – Implicit Task View

      –Tasks are created in
       OpenMP even without an
       explicit task directive.
      –Let's look at how tasks
       are created implicitly for
       the code snippet below
           – Thread encountering parallel
             construct packages up a set of
             implicit tasks
           – Team of threads is created.
           – Each thread in team is
             assigned to one of the tasks
             (and tied to it).
           – Barrier holds original master
             thread until all implicit tasks
             are finished.

      [Code snippet, truncated in the source: a parallel construct
       whose block begins "{ int mydata"]
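The steps above can be sketched in C as follows. This is an illustration under assumptions, not the slide's original snippet: the function name `run_region`, the array `seen`, the value stored in `mydata`, and the request for four threads are all made up for the example.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

#define MAX_THREADS 64

/* Runs one parallel region and returns the team size. Each implicit task
   records its own mydata value in seen[] at its thread number. */
int run_region(int seen[MAX_THREADS])
{
    int team = 1;
    #pragma omp parallel num_threads(4)   /* one implicit task per thread */
    {
        int id = 0;
#ifdef _OPENMP
        id = omp_get_thread_num();
#endif
        int mydata = 100 + id;   /* declared in the block: private to this implicit task */
        if (id < MAX_THREADS)
            seen[id] = mydata;
        #pragma omp single
        {
#ifdef _OPENMP
            team = omp_get_num_threads();
#endif
        }
    }   /* implicit barrier: the master continues only after all implicit tasks finish */
    return team;
}
```

Because `mydata` is declared inside the structured block, every implicit task gets its own copy, so the threads never overwrite each other's value.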
Task Construct

      #pragma omp task [clause[[,]clause] ...]
                 structured-block

      where clause can be one of:


                 if (expression)
                 untied
                 shared (list)
                 private (list)
                 firstprivate (list)
                 default( shared | none )
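As a minimal sketch of how these clauses combine, the recursive Fibonacci below creates one task per recursive call. The function names and the cutoff of 10 in the `if` clause are arbitrary choices for illustration.

```c
/* Computes Fibonacci numbers with one task per recursive call. */
static long fib_task(int n)
{
    long a, b;
    if (n < 2)
        return n;
    /* a and b are shared so the child tasks can write their results back;
       n is copied into each task. When the if expression is false, the
       task is not deferred and runs immediately on the encountering thread. */
    #pragma omp task shared(a) firstprivate(n) if(n > 10)
    a = fib_task(n - 1);
    #pragma omp task shared(b) firstprivate(n) if(n > 10)
    b = fib_task(n - 2);
    #pragma omp taskwait          /* wait for both child tasks */
    return a + b;
}

long fib(int n)
{
    long result = 0;
    #pragma omp parallel
    {
        #pragma omp single        /* one thread starts the recursion */
        result = fib_task(n);
    }
    return result;
}
```

Without `shared(a)` and `shared(b)`, the local variables would be firstprivate inside the tasks and the results would be lost.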
Tied & Untied Tasks
        – Tied Tasks:
           – A tied task gets a thread assigned to it at its first execution,
             and the same thread services the task for its lifetime
           – A thread executing a tied task can be suspended and sent
             off to execute some other task, but eventually the same
             thread will return to resume execution of its original tied
             task
           – Tasks are tied unless explicitly declared untied


        – Untied Tasks:
           – An untied task has no long-term association with any given
             thread. Any thread not otherwise occupied is free to
             execute an untied task. The thread assigned to execute an
             untied task may only change at a "task scheduling point".
           – An untied task is created by adding the "untied" clause to
             the task directive
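A small sketch of the `untied` clause in use: a generating task is marked untied, so that if it is suspended, any thread in the team may resume it. The helper `process`, the constant `N_ITEMS`, and the function name `process_all` are hypothetical stand-ins for real work.

```c
#define N_ITEMS 1000

/* Hypothetical per-item work: squares the index. */
static long process(int i) { return (long)i * i; }

/* Creates one task per item from an untied generating task and
   returns the sum of all results. */
long process_all(void)
{
    long total = 0;
    #pragma omp parallel
    {
        #pragma omp single
        {
            /* untied: a suspended generator may be resumed by any thread */
            #pragma omp task untied shared(total)
            {
                for (int i = 0; i < N_ITEMS; i++) {
                    #pragma omp task firstprivate(i) shared(total)
                    {
                        long r = process(i);
                        #pragma omp atomic
                        total += r;
                    }
                }
            }
        }   /* implicit barrier of single: all generated tasks complete here */
    }
    return total;
}
```

Dropping `untied` leaves the result unchanged; it only changes which threads are allowed to continue the generating loop after a suspension.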



Task switching
     • Task switching: the act of a thread switching from the execution of
       one task to another task.

     • The purpose of task switching is to distribute threads among the
       unassigned tasks in the team to avoid piling up long queues of
       unassigned tasks
     • Task switching, for tied tasks, can only occur at task scheduling
       points located within the following constructs

            encountered task constructs
            encountered taskwait constructs
            encountered barrier directives
            implicit barrier regions
            at the end of the tied task region

     • Untied tasks have implementation-dependent scheduling points



Task switching example
     The thread executing the "for" loop, a.k.a. the generating
     task, generates many tasks in a short time, so...
     The SINGLE generating task will have to suspend for a
     while when the "task pool" fills up
     • Task switching is invoked to start draining the "pool"
     • When the "pool" is sufficiently drained, the single task
        can begin generating more tasks again

                           int exp; // exp is either true or false (not used below)
                           #pragma omp single
                           {
                               // generating task: creates one task per item
                               for (i=0; i<ONEZILLION; i++)
                               {
                                #pragma omp task firstprivate(i) // each task keeps its own copy of i
                                    process(item[i]);
                               }
                           }

Optional foil - OpenMP* API

     •Get the thread number within a team
       int omp_get_thread_num(void);
     •Get the number of threads in a team
       int omp_get_num_threads(void);


     •Usually not needed for OpenMP codes
             – Can lead to code not being serially consistent
             – Does have specific uses (debugging)
             – Must include a header file
             #include <omp.h>
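The two calls above can be tried with a short sketch like the one below. The function name `hello_team` is made up for the example, and the `_OPENMP` guard is one way to keep the code serially consistent: it still compiles and runs when OpenMP is switched off.

```c
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>   /* declares the OpenMP runtime functions */
#endif

/* Prints one line per thread and returns the team size. */
int hello_team(void)
{
    int nthreads = 1;
    #pragma omp parallel
    {
        int tid = 0;
#ifdef _OPENMP
        tid = omp_get_thread_num();          /* 0 .. team size - 1 */
        #pragma omp master
        nthreads = omp_get_num_threads();    /* same value on every thread */
#endif
        printf("hello from thread %d\n", tid);
    }
    return nthreads;
}
```

Outside a parallel region, `omp_get_num_threads()` returns 1, which is why the team size is captured inside the region.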




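Optional foil - Monte Carlo Pi

     $$\frac{\text{\# of darts hitting circle}}{\text{\# of darts in square}} = \frac{\tfrac{1}{4}\pi r^2}{r^2} = \frac{\pi}{4}$$

     $$\pi = 4 \cdot \frac{\text{\# of darts hitting circle}}{\text{\# of darts in square}}$$

     loop 1 to MAX
         x.coor = (random#)
         y.coor = (random#)
         dist = sqrt(x^2 + y^2)
         if (dist <= 1)
             hits = hits + 1
     pi = 4 * hits / MAX

     [Figure: darts thrown at a square of side r containing a quarter circle of radius r]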




Optional foil - Making Monte Carlo’s Parallel

                  hits = 0
                  call SEED48(1)
                  DO I = 1, max
                      x = DRAND48()
                      y = DRAND48()
                      IF (SQRT(x*x + y*y) .LT. 1) THEN
                          hits = hits+1
                      ENDIF
                  END DO
                  pi = REAL(hits)/REAL(max) * 4.0




                    What is the challenge here?
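One reading of the challenge: every iteration updates `hits`, and DRAND48 keeps a single hidden state that all callers share, so a naive parallel loop would race on both. The C sketch below addresses this with a `reduction` on `hits` and a private generator state per thread. The generator `next_uniform` (an xorshift64* variant), the seeding constant, and the function name `estimate_pi` are illustrative assumptions, not part of the slides.

```c
#include <stdint.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Small illustrative generator (xorshift64*), standing in for a
   per-thread random stream. Returns a double in [0, 1). */
static double next_uniform(uint64_t *state)
{
    uint64_t x = *state;
    x ^= x >> 12;
    x ^= x << 25;
    x ^= x >> 27;
    *state = x;
    return (double)((x * 2685821657736338717ULL) >> 11) / 9007199254740992.0;
}

/* Estimates pi from max random points in the unit square. */
double estimate_pi(long max)
{
    long hits = 0;
    #pragma omp parallel reduction(+:hits)
    {
        int tid = 0;
#ifdef _OPENMP
        tid = omp_get_thread_num();
#endif
        /* Private, nonzero generator state, seeded differently per thread. */
        uint64_t state = 0x9E3779B97F4A7C15ULL * (uint64_t)(tid + 1);

        #pragma omp for
        for (long i = 0; i < max; i++) {
            double x = next_uniform(&state);
            double y = next_uniform(&state);
            if (x * x + y * y <= 1.0)
                hits++;          /* updates this thread's private copy */
        }
    }   /* private hit counts are summed into hits here */
    return 4.0 * (double)hits / (double)max;
}
```

The estimate varies with the number of threads because each thread draws from a different stream, but it should stay close to 3.14 for a large `max`.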
Optional Activity 6: Computing Pi

     •Use the Intel® Math Kernel Library (Intel® MKL)
       VSL:
             – Intel MKL's VSL (Vector Statistical Library)

             – VSL creates an array of random numbers, rather than a single random number

             – VSL can have multiple seeds (one for each thread)

     •Objective:
             – Use basic OpenMP* syntax to make Pi parallel

             – Choose the best code to divide the task up

             – Properly categorize all variables


Intel open mp

  • 1. Programming with OpenMP* Software & 1 © 2012 WIPRO LTD | WWW.WIPRO.COM
  • 2. Agenda •What is OpenMP? •Parallel regions •Data-Sharing Attribute Clauses •Worksharing •OpenMP 3.0 Tasks •Synchronization •Runtime functions/environment variables •Optional Advanced topics Software & Services Group, Developer Products Division Software & Services Group 2 © 2012 WIPRO LTD | WWW.WIPRO.COM
  • 3. What Is OpenMP? Software & Services Group, Developer Products Division Software & Services Group Developer©Products Division All rights reserved. Copyright 2010, Intel Corporation. Copyright© 2011, Intel Corporation. All rights reserved. *Other brands and names are the property of their *Other respective owners.the propertyaretheir respective owners. 3/14/2012 3 3 3 brands and names of © 2012 WIPRO LTD | WWW.WIPRO.COM
  • 4. Programming Model Fork-Join Parallelism: • Master thread spawns a team of threads as needed • Parallelism is added incrementally: that is, the sequential program evolves into a parallel program Master Thread Parallel Regions Software & Services Group, Developer Products Division Software & ServicesDivision Developer Products Group 3/14/2012 4 4 4 © 2012 WIPRO LTD | WWW.WIPRO.COM
  • 5. A Few Details to Get Started •Compiler option /Qopenmp in Windows or –openmp in Linux •Most of the constructs in OpenMP are compiler directives or pragmas –For C and C++, the pragmas take the form: #pragma omp construct [clause [clause]…] –For Fortran, the directives take one of the forms: C$OMP construct [clause [clause]…] !$OMP construct [clause [clause]…] *$OMP construct [clause [clause]…] •Header file or Fortran module #include “omp.h” use omp_lib Software & Services Group, Developer Products Division Software & Services Group 5 © 2012 WIPRO LTD | WWW.WIPRO.COM
  • 6. Agenda •What is OpenMP? •Parallel regions •Data-Sharing Attribute Clauses •Worksharing •OpenMP 3.0 Tasks •Synchronization •Runtime functions/environment variables •Intel® Parallel Debugger Extension •Optional Advanced topics Software & Services Group, Developer Products Division Software & Services Group 6 © 2012 WIPRO LTD | WWW.WIPRO.COM
  • 7. Parallel Region & Structured Blocks (C/C++) • Most OpenMP constructs apply to structured blocks – Structured block: a block with one point of entry at the top and one point of exit at the bottom – The only “branches” allowed are STOP statements in Fortran and exit() in C/C++ #pragma omp parallel if (go_now()) goto more; { #pragma omp parallel int id = omp_get_thread_num(); { int id = omp_get_thread_num(); more: res[id] = do_big_job (id); more: res[id] = do_big_job(id); if (conv (res[id])) goto done; if (conv (res[id])) goto more; goto more; } } printf (“All donen”); done: if (!really_done()) goto more; A structured block Not a structured block Software & Services Group, Developer Products Division Software & Services Group Developer©Products Division All rights reserved. Copyright 2010, Intel Corporation. Copyright© 2011, Intel Corporation. All rights reserved. *Other brands and names are the property of their *Other respective owners.the propertyaretheir respective owners. brands and names of 3/14/2012 7 7 7 © 2012 WIPRO LTD | WWW.WIPRO.COM
  • 8. Agenda •What is OpenMP? •Parallel regions •Data-Sharing Attribute Clauses •Worksharing •OpenMP 3.0 Tasks •Synchronization •Runtime functions/environment variables •Intel® Parallel Debugger Extension •Optional Advanced topics Software & Services Group, Developer Products Division Software & Services Group 8 © 2012 WIPRO LTD | WWW.WIPRO.COM
  • 9. Data-Sharing Attribute Clauses shared Declares one or more list items to be shared by tasks generated by a Parallel or task construct(Default). private Declares one or more list items to be private to a task. default Allows the user to control the data-sharing attributes of variables that Are referenced in a parallel or task construct,and whose data-sharing Attributes are implicitly determined firstprivate Declares one or more listitems to be private to a task,and initializes Each of them with the value that the corresponding original item has When the construct is encountered. lastprivate Declares one or more list items to be private to an implicit task,and Causes the corresponding original list item to be updated after the end Of the region. reduction Specifies an operator and oneor more list items.For each listitem,a Private copy is created in each implicit task,and isinitialized Appropriately for the operator.After the end of the region,the original List item is updated with the values of the private copies using the Specified operator. Software & Services Group Developer Products Division 3/14/2012 9 9 9 © 2012 WIPRO LTD | WWW.WIPRO.COM
  • 10. The Private Clause •Reproduces the variable for each task – Variables are un-initialized; C++ object is default constructed – Any value external to the parallel region is undefined Software & Services Group, Developer Products Division Software & Services Group 10 © 2012 WIPRO LTD | WWW.WIPRO.COM
  • 11. FirstPrivate Clause 11 © 2012 WIPRO LTD | WWW.WIPRO.COM
  • 12. Lastprivate Clause •Variables update shared variable using value from last iteration •C++ objects are updated as if by assignment void sq2(int n, double *lastterm) { double x; int i; #pragma omp parallel #pragma omp for lastprivate(x) for (i = 0; i < n; i++){ x = a[i]*a[i] + b[i]*b[i]; b[i] = sqrt(x); } *lastterm = x; } Software & Services Group, Developer Products Division Software & Services Group Developer Products Division 3/14/2012 12 12 12 © 2012 WIPRO LTD | WWW.WIPRO.COM
  • 13. OpenMP* Reduction Clause reduction (op : list) •The variables in “list” must be shared in the enclosing parallel region •Inside parallel or work-sharing construct: – A PRIVATE copy of each list variable is created and initialized depending on the “op” – These copies are updated locally by threads – At end of construct, local copies are combined through “op” into a single value and combined with the value in the original SHARED variable Software & Services Group, Developer Products Division Software & Services Group 13 © 2012 WIPRO LTD | WWW.WIPRO.COM
  • 14. Activity 1 – Hello Worlds •Modify the “Hello, Worlds” serial code to run multithreaded using OpenMP* Software & Services Group, Developer Products Division Software & Services Group 14 © 2012 WIPRO LTD | WWW.WIPRO.COM
  • 15. Agenda •What is OpenMP? •Parallel regions •Data-Sharing Attribute Clauses •Worksharing •OpenMP 3.0 Tasks •Synchronization •Runtime functions/environment variables •Optional Advanced topics Software & Services Group, Developer Products Division Software & Services Group 15 © 2012 WIPRO LTD | WWW.WIPRO.COM
  • 16. Worksharing •Worksharing is the general term used in OpenMP to describe distribution of work across threads. •Three examples of worksharing in OpenMP are: •loop(omp for) construct •omp sections construct •omp task construct Automatically divides work among threads Software & Services Group, Developer Products Division Software &2010, Intel Corporation. All rights reserved. Developer©Products Division Copyright Services Group Copyright© 2011, Intel Corporation. All rights reserved. *Other brands and names are the property of their *Other respective owners.the propertyaretheir respective owners. brands and names of 3/14/2012 16 16 16 © 2012 WIPRO LTD | WWW.WIPRO.COM
  • 17. Agenda •What is OpenMP? •Parallel regions •Data-Sharing Attribute Clauses •Worksharing – Parallel For •OpenMP 3.0 Tasks •Synchronization •Runtime functions/environment variables •Optional Advanced topics Software & Services Group, Developer Products Division Software & Services Group 17 © 2012 WIPRO LTD | WWW.WIPRO.COM
  • 18. omp for Construct // assume N=12 #pragma omp parallel #pragma omp parallel #pragma omp for for(i = 1, i < N+1, i++) #pragma omp for c[i] = a[i] + b[i]; i=1 i=5 i=9 i=2 i=6 i = 10 •Threads are assigned an i=3 i=7 i = 11 i=4 i=8 i = 12 independent set of iterations Implicit barrier •Threads must wait at the end of work-sharing construct Software & Services Group, Developer Products Division Software & Services Group Developer Products Division 3/14/2012 18 18 18 © 2012 WIPRO LTD | WWW.WIPRO.COM
  • 19. Combining constructs
      • These two code segments are equivalent
      #pragma omp parallel
      {
          #pragma omp for
          for (i = 0; i < MAX; i++) {
              res[i] = huge();
          }
      }

      #pragma omp parallel for
      for (i = 0; i < MAX; i++) {
          res[i] = huge();
      }
  • 20. The schedule clause
      The schedule clause affects how loop iterations are mapped onto threads
      schedule(static [,chunk])
          – Blocks of iterations of size “chunk” are assigned to threads
          – Round robin distribution
          – Low overhead, may cause load imbalance
      schedule(dynamic [,chunk])
          – Threads grab “chunk” iterations
          – When done with its iterations, a thread requests the next set
          – Higher threading overhead, can reduce load imbalance
      schedule(guided [,chunk])
          – Dynamic schedule starting with large blocks
          – Size of the blocks shrinks; no smaller than “chunk”
      schedule(runtime)
          – Scheduling is deferred until run time, and the schedule and chunk size are taken from the run-sched-var ICV
  • 21. Assigning Loop Iterations in OpenMP*: Which Schedule to Use
      Schedule clause   When to use
      STATIC            Predictable and similar work per iteration
      DYNAMIC           Unpredictable, variable work per iteration
      GUIDED            Special case of dynamic to reduce scheduling overhead
      RUNTIME           Modify schedule at run time via environment variable
  • 22. Schedule Clause Example
      #pragma omp parallel for schedule (static, 8)
      for (int i = start; i <= end; i += 2)
      {
          if (TestForPrime(i)) gPrimesFound++;
      }
      • Iterations are divided into chunks of 8
      • If start = 3, then the first chunk is i = {3,5,7,9,11,13,15,17}
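The schedule example can be turned into a small self-contained sketch. Here is_prime stands in for the slide's TestForPrime(), whose implementation is not shown in the deck, and count_odd_primes is a hypothetical wrapper. The sketch also uses a reduction for the counter, because a plain shared increment such as gPrimesFound++ would be a race; that choice is an assumption of this sketch, not something the slide states.

```c
/* Trial-division primality test. This is a stand-in for the slide's
 * TestForPrime(), which is not shown in the deck. */
static int is_prime(int n)
{
    int d;
    if (n < 2) return 0;
    if (n % 2 == 0) return n == 2;
    for (d = 3; (long)d * d <= n; d += 2)
        if (n % d == 0) return 0;
    return 1;
}

/* Count primes among start, start+2, ..., up to and including end.
 * count_odd_primes is a hypothetical name for this illustration. */
int count_odd_primes(int start, int end)
{
    int found = 0;
    int i;
    /* Chunks of 8 iterations are handed out round robin; each thread
     * accumulates into a private copy of found that is summed at the end. */
    #pragma omp parallel for schedule(static, 8) reduction(+:found)
    for (i = start; i <= end; i += 2)
        if (is_prime(i))
            found++;
    return found;
}
```

Because the cost of a primality test grows with the candidate, swapping in schedule(dynamic, 8) is a reasonable experiment when the static split leaves some threads idle.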
  • 23. Nested Parallelism
      #include <stdio.h>
      #include <omp.h>
      int main(int argc, char *argv[])
      {
          int n;
          omp_set_nested(1);
          #pragma omp parallel private(n)
          {
              n = omp_get_thread_num();
              #pragma omp parallel
              printf("thread number %d - nested thread number %d\n", n, omp_get_thread_num());
          }
      }
      Output with two threads at each level (line order may vary):
      thread number 0 - nested thread number 0
      thread number 0 - nested thread number 1
      thread number 1 - nested thread number 0
      thread number 1 - nested thread number 1
  • 24. The collapse clause
      The collapse clause is new in OpenMP 3.0. It specifies how many nested loops are associated with the loop construct.
      #pragma omp parallel for collapse(2)
      for (i = 0; i < N; ++i)
          for (k = 0; k < N; ++k)
              for (j = 0; j < N; ++j)
                  foo();
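A small sketch of collapse(2) on a doubly nested loop follows. The array dimensions and the name fill_table are assumptions chosen for illustration.

```c
#define ROWS 4   /* illustrative sizes, not from the slides */
#define COLS 6

/* Fill table[i][j] with its row-major index.
 * fill_table is a hypothetical name for this sketch. */
void fill_table(int table[ROWS][COLS])
{
    int i, j;
    /* collapse(2) merges the i and j loops into a single iteration space
     * of ROWS * COLS iterations, which is then divided among the threads.
     * Both loop variables are private because both loops are associated
     * with the construct. */
    #pragma omp parallel for collapse(2)
    for (i = 0; i < ROWS; i++)
        for (j = 0; j < COLS; j++)
            table[i][j] = i * COLS + j;
}
```

Collapsing tends to help when the outer loop alone has too few iterations to keep every thread busy.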
  • 25. Agenda • What is OpenMP? • Parallel regions • Data-Sharing Attribute Clauses • Worksharing – Parallel Sections • OpenMP 3.0 Tasks • Synchronization • Runtime functions/environment variables • Optional Advanced topics
  • 26. Task Decomposition
      a = alice();
      b = bob();
      s = boss(a, b);
      c = cy();
      printf("%6.2f\n", bigboss(s, c));
      [Figure: dependency graph in which alice and bob feed boss, and boss and cy feed bigboss]
      alice, bob, and cy can be computed in parallel
  • 27. omp sections
      • #pragma omp sections
          • Must be inside a parallel region
          • Precedes a code block containing N blocks of code that may be executed concurrently by N threads
          • Encompasses each omp section
      • #pragma omp section
          • Precedes each block of code within the encompassing block described above
          • May be omitted for the first parallel section after the parallel sections pragma
          • Enclosed program segments are distributed for parallel execution among available threads
  • 28. Functional Level Parallelism w sections
      #pragma omp parallel sections
      {
          #pragma omp section /* Optional */
          a = alice();
          #pragma omp section
          b = bob();
          #pragma omp section
          c = cy();
      }
      s = boss(a, b);
      printf("%6.2f\n", bigboss(s, c));
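To make the sections example compilable, the sketch below supplies trivial bodies for alice, bob, cy, boss, and bigboss. Those bodies and the wrapper run_pipeline are placeholders invented for this illustration; only the structure of the sections construct follows the slide.

```c
/* Placeholder workers; the real functions are not shown in the deck. */
static double alice(void) { return 1.0; }
static double bob(void)   { return 2.0; }
static double cy(void)    { return 3.0; }
static double boss(double a, double b)    { return a + b; }
static double bigboss(double s, double c) { return s * c; }

/* run_pipeline is a hypothetical wrapper name for this sketch. */
double run_pipeline(void)
{
    double a = 0.0, b = 0.0, c = 0.0, s;
    /* The three independent calls may run on three different threads.
     * a, b, and c are shared, and each is written by exactly one section. */
    #pragma omp parallel sections
    {
        #pragma omp section
        a = alice();
        #pragma omp section
        b = bob();
        #pragma omp section
        c = cy();
    }
    /* The implicit barrier at the end of the construct guarantees that
     * a, b, and c are all available before the dependent calls below. */
    s = boss(a, b);
    return bigboss(s, c);
}
```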
  • 29. Advantage of Parallel Sections
      • Independent sections of code can execute concurrently, which reduces execution time
      #pragma omp parallel sections
      {
          #pragma omp section
          phase1();
          #pragma omp section
          phase2();
          #pragma omp section
          phase3();
      }
      [Figure: serial versus parallel execution timeline]
  • 30. Agenda • What is OpenMP? • Parallel regions • Data-Sharing Attribute Clauses • Worksharing • OpenMP 3.0 Tasks • Synchronization • Runtime functions/environment variables • Optional Advanced topics
  • 31. New Addition to OpenMP
      • Tasks – Main change for OpenMP 3.0
      • Allows parallelization of irregular problems
          – unbounded loops
          – recursive algorithms
          – producer/consumer
  • 32. What are tasks?
      • Tasks are independent units of work
      • Threads are assigned to perform the work of each task
          – Tasks may be deferred
          – Tasks may be executed immediately
          – The runtime system decides which of the above
      • Tasks are composed of:
          – code to execute
          – data environment
          – internal control variables (ICV)
      [Figure: serial versus parallel execution timeline]
  • 33. Simple Task Example
      #pragma omp parallel                 // assume 8 threads: a pool of 8 threads is created here
      {
          #pragma omp single private(p)    // one thread gets to execute the while loop
          {
              …
              while (p) {
                  #pragma omp task         // the single “while loop” thread creates a task for each instance of processwork()
                  {
                      processwork(p);
                  }
                  p = p->next;
              }
          }
      }
  • 34. Task Construct – Explicit Task View
      #pragma omp parallel
      {
          #pragma omp single
          {   // block 1
              node * p = head;
              while (p) {   // block 2
                  #pragma omp task
                  process(p);
                  p = p->next;   // block 3
              }
          }
      }
      – A team of threads is created at the omp parallel construct
      – A single thread is chosen to execute the while loop; let's call this thread “L”
      – Thread L operates the while loop, creates tasks, and fetches next pointers
      – Each time L crosses the omp task construct it generates a new task, which is then assigned to a thread of the team
      – Each task is executed by one thread of the team
      – All tasks complete at the barrier at the end of the parallel region's single construct
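The linked-list pattern above can be sketched as a complete function. The node layout, the squaring done in process(), and the name process_list are assumptions made so that the example has an observable result; the slides do not define them.

```c
/* Hypothetical list node for this sketch. */
typedef struct node {
    int value;
    int result;
    struct node *next;
} node;

/* Stand-in for the slide's process(): store the square of the value. */
static void process(node *p)
{
    p->result = p->value * p->value;
}

/* process_list is a hypothetical name for this illustration. */
void process_list(node *head)
{
    #pragma omp parallel
    {
        /* Only one thread walks the list and generates the tasks. */
        #pragma omp single
        {
            node *p = head;
            while (p) {
                /* firstprivate(p) captures the current pointer value, so
                 * each task works on the node that was current when the
                 * task was created. */
                #pragma omp task firstprivate(p)
                process(p);
                p = p->next;
            }
        }
        /* The implicit barrier at the end of single is where all generated
         * tasks are guaranteed to have completed. */
    }
}
```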
  • 35. Why are tasks useful? Tasks have the potential to parallelize irregular patterns and recursive function calls
  • 36. Activity 4 – Parallel Fibonacci
      Objective:
      1. Create a parallel version of the Fibonacci sample that uses OpenMP tasks.
      2. Try to find a balance between the serial and parallel parts of the recursion.
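One possible shape for this activity is sketched below; it is not the course's reference solution. The function names and the cutoff value FIB_CUTOFF are assumptions, and the best cutoff would have to be found by measurement.

```c
/* Plain serial recursion, used below the cutoff. */
static long fib_serial(int n)
{
    return n < 2 ? n : fib_serial(n - 1) + fib_serial(n - 2);
}

/* Hypothetical threshold below which spawning tasks is assumed not to pay off. */
#define FIB_CUTOFF 20

/* Recursive step: spawn one task per branch, then wait for both. */
static long fib_task(int n)
{
    long x, y;
    if (n < FIB_CUTOFF)
        return fib_serial(n);
    /* x and y must be shared so the child tasks write the parent's copies. */
    #pragma omp task shared(x)
    x = fib_task(n - 1);
    #pragma omp task shared(y)
    y = fib_task(n - 2);
    /* taskwait suspends this task until both child tasks have finished. */
    #pragma omp taskwait
    return x + y;
}

/* Entry point: one thread starts the recursion, the team executes the tasks. */
long fib_parallel(int n)
{
    long result = 0;
    #pragma omp parallel
    {
        #pragma omp single
        result = fib_task(n);
    }
    return result;
}
```

The cutoff addresses the second objective: small subproblems run serially so that task creation overhead does not dominate.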
  • 37. When are tasks guaranteed to be complete?
      Tasks are guaranteed to be complete:
      • At thread or task barriers
      • At the directive: #pragma omp barrier
      • At the directive: #pragma omp taskwait (for the child tasks generated by the current task)
  • 38. Task Completion Example
      #pragma omp parallel
      {
          #pragma omp task        // multiple foo tasks created here, one for each thread
          foo();
          #pragma omp barrier     // all foo tasks guaranteed to be completed here
          #pragma omp single
          {
              #pragma omp task    // one bar task created here
              bar();
          }                       // bar task guaranteed to be completed here
      }
  • 39. Agenda • What is OpenMP? • Parallel regions • Data-Sharing Attribute Clauses • Worksharing • OpenMP 3.0 Tasks • Synchronization • Runtime functions/environment variables • Optional Advanced topics
  • 40. Example: Dot Product
      float dot_prod(float* a, float* b, int N)
      {
          float sum = 0.0;
          #pragma omp parallel for shared(sum)
          for (int i = 0; i < N; i++) {
              sum += a[i] * b[i];
          }
          return sum;
      }
  • 41. Race Condition
      • A race condition is nondeterministic behavior caused by the times at which two or more threads access a shared variable
      • For example, suppose both Thread A and Thread B are executing the statement
          area += 4.0 / (1.0 + x*x);
  • 42. Two Timings
  • 43. Protect Shared Data
      • Must protect access to shared, modifiable data
      float dot_prod(float* a, float* b, int N)
      {
          float sum = 0.0;
          #pragma omp parallel for shared(sum)
          for (int i = 0; i < N; i++) {
              #pragma omp critical
              sum += a[i] * b[i];
          }
          return sum;
      }
  • 44. OpenMP* Critical Construct
      #pragma omp critical [(lock_name)]
      • Defines a critical region on a structured block
      • Threads wait their turn: only one at a time calls consum(), thereby protecting RES from race conditions
      • Naming the critical construct (here RES_lock) is optional
      • Good practice: name all critical sections
      float RES;
      …
              #pragma omp critical (RES_lock)
              consum(B, RES);
          }
      }
  • 45. OpenMP* Reduction Clause
      reduction (op : list)
      • The variables in “list” must be shared in the enclosing parallel region
      • Inside a parallel or work-sharing construct:
          – A PRIVATE copy of each list variable is created and initialized depending on the “op”
          – These copies are updated locally by threads
          – At the end of the construct, local copies are combined through “op” into a single value and combined with the value in the original SHARED variable
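Applying the reduction clause to the earlier dot product gives the sketch below. It removes the race on sum without serializing every update the way the critical version does. The const qualifiers and the lowercase parameter name are small changes made for this sketch.

```c
/* Dot product with a sum reduction. */
float dot_prod(const float *a, const float *b, int n)
{
    float sum = 0.0f;
    int i;
    /* Each thread gets a private copy of sum initialized to 0; the private
     * copies are added into the original sum when the loop ends. */
    #pragma omp parallel for reduction(+:sum)
    for (i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}
```

Floating-point addition is not exactly associative, so the result can differ in the last bits from the serial sum depending on how iterations are grouped.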
  • 46. Reduction Example
      #include <stdio.h>
      #include "omp.h"
      int main(int argc, char *argv[])
      {
          int count = 10;
          #pragma omp parallel reduction (+: count)
          {
              int num;
              num = omp_get_thread_num();
              count++;
              printf("count is equal : %d on thread %d \n", count, num);
          }
          printf("count at the end: %d\n", count);
      }
      Output with two threads:
      count is equal : 1 on thread 0
      count is equal : 1 on thread 1
      count at the end: 12
  • 47. C/C++ Reduction Operations
      • A range of associative operators can be used with reduction
      • Initial values are the ones that make sense mathematically
      Operator   Initial value     Operator   Initial value
      +          0                 &          ~0
      *          1                 |          0
      -          0                 &&         1
      ^          0                 ||         0
  • 48. Numerical Integration Example
  • 49. Activity 5 - Computing Pi
      static long num_steps = 100000;
      double step, pi;
      int main()
      {
          int i;
          double x, sum = 0.0;
          step = 1.0/(double) num_steps;
          for (i = 0; i < num_steps; i++) {
              x = (i + 0.5) * step;
              sum = sum + 4.0/(1.0 + x*x);
          }
          pi = step * sum;
          printf("Pi = %f\n", pi);
      }
      • Parallelize the numerical integration code using OpenMP
      • What variables can be shared?
      • What variables need to be private?
      • What variables should be set up for reductions?
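One way to answer the three questions is sketched here, as an illustration rather than the official solution. The function name compute_pi and passing num_steps as a parameter are assumptions of this sketch: step is shared (read only), x is private, and sum is a reduction variable.

```c
/* Midpoint-rule integration of 4/(1+x^2) over [0,1].
 * compute_pi is a hypothetical name for this sketch. */
double compute_pi(long num_steps)
{
    double step = 1.0 / (double)num_steps;   /* shared, read only */
    double sum = 0.0;                        /* reduction variable */
    long i;
    #pragma omp parallel for reduction(+:sum)
    for (i = 0; i < num_steps; i++) {
        /* x is declared inside the loop body, so each thread has its own. */
        double x = (i + 0.5) * step;
        sum += 4.0 / (1.0 + x * x);
    }
    return step * sum;
}
```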
  • 50. Single Construct
      • Denotes a block of code to be executed by only one thread
          – Which thread executes it is implementation defined
      • Implicit barrier at end
      #pragma omp parallel
      {
          DoManyThings();
          #pragma omp single
          {
              ExchangeBoundaries();
          }   // threads wait here for single
          DoManyMoreThings();
      }
  • 51. Master Construct
      • Denotes a block of code to be executed only by the master thread
      • No implicit barrier at end
      #pragma omp parallel
      {
          DoManyThings();
          #pragma omp master
          {   // if not master skip to next stmt
              ExchangeBoundaries();
          }
          DoManyMoreThings();
      }
  • 52. Implicit Barriers
      • Several OpenMP* constructs have implicit barriers
          – parallel: necessary barrier, cannot be removed
          – for
          – single
      • Unnecessary barriers hurt performance and can be removed with the nowait clause
          – The nowait clause is applicable to:
              – the for construct
              – the single construct
  • 53. Nowait Clause
      #pragma omp for nowait
      for (...)
      {...};

      #pragma omp single nowait
      { [...] }
      • Use when threads unnecessarily wait between independent computations
      #pragma omp for schedule(dynamic,1) nowait
      for (int i = 0; i < n; i++)
          a[i] = bigFunc1(i);
      #pragma omp for schedule(dynamic,1)
      for (int j = 0; j < m; j++)
          b[j] = bigFunc2(j);
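The two-loop pattern above only works inside a parallel region, which the slide leaves out. A minimal complete sketch follows; fill_both and the arithmetic used in place of bigFunc1 and bigFunc2 are placeholders invented for this example.

```c
/* Fill two unrelated arrays in one parallel region.
 * fill_both and the loop bodies are placeholders for this sketch. */
void fill_both(int *a, int n, int *b, int m)
{
    #pragma omp parallel
    {
        int i, j;
        /* nowait: a thread that runs out of work here moves straight on to
         * the second loop instead of idling at the barrier. This is safe
         * because the second loop never reads a[]. */
        #pragma omp for schedule(dynamic, 1) nowait
        for (i = 0; i < n; i++)
            a[i] = i * i;

        /* This loop keeps its implicit barrier. */
        #pragma omp for schedule(dynamic, 1)
        for (j = 0; j < m; j++)
            b[j] = 2 * j;
    }
}
```

If the second loop depended on results of the first, dropping the barrier would introduce a race, so nowait is only appropriate for independent computations.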
  • 54. Barrier Construct
      • Explicit barrier synchronization
      • Each thread waits until all threads arrive
      #pragma omp parallel shared (A, B, C)
      {
          DoSomeWork(A, B);
          printf("Processed A into B\n");
          #pragma omp barrier
          DoSomeWork(B, C);
          printf("Processed B into C\n");
      }
  • 55. Atomic Construct
      • Special case of a critical section
      • Applies only to a simple update of a memory location
      #pragma omp parallel for shared(x, y, index, n)
      for (i = 0; i < n; i++) {
          #pragma omp atomic
          x[index[i]] += work1(i);   // atomic covers only this statement
          y[i] += work2(i);          // each iteration updates its own y[i], no protection needed
      }
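A histogram is a common case where atomic fits, because different iterations may update the same bin. The sketch below assumes non-negative input values; the function name histogram and its parameters are hypothetical.

```c
/* Count how many values fall into each of nbins bins (value modulo nbins).
 * Assumes every data[i] is non-negative and bins[] starts out zeroed.
 * histogram is a hypothetical name for this sketch. */
void histogram(const int *data, int n, int *bins, int nbins)
{
    int i;
    #pragma omp parallel for
    for (i = 0; i < n; i++) {
        int k = data[i] % nbins;
        /* Several threads may hit the same bin, so the increment must be
         * atomic. Only this single update statement is protected. */
        #pragma omp atomic
        bins[k] += 1;
    }
}
```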
  • 56. Agenda • What is OpenMP? • Parallel regions • Data-Sharing Attribute Clauses • Worksharing • OpenMP 3.0 Tasks • Synchronization • Runtime functions/environment variables • Optional Advanced topics
  • 57. Environment Variables
      • Set the default number of threads: OMP_NUM_THREADS integer
      • Set the default scheduling protocol: OMP_SCHEDULE “schedule[, chunk_size]”
      • Enable dynamic thread adjustment: OMP_DYNAMIC [TRUE|FALSE]
      • Enable nested parallelism: OMP_NESTED [TRUE|FALSE]
  • 58. 20+ Library Routines
  • 59. Library Routines
      • To fix the number of threads used in a program
          – Set the number of threads
          – Then save the number returned
      #include <omp.h>
      int main()
      {
          int num_threads;
          omp_set_num_threads(omp_get_num_procs());   // request as many threads as you have processors
          #pragma omp parallel
          {
              int id = omp_get_thread_num();
              #pragma omp single                      // protect this operation because memory stores are not atomic
              num_threads = omp_get_num_threads();
              do_lots_of_stuff(id);
          }
      }
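The same request-then-record pattern can be written so that it also builds when OpenMP is disabled, by guarding the runtime calls with the standard _OPENMP macro. The function name team_size is an assumption of this sketch.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Return the number of threads actually granted to a parallel region.
 * team_size is a hypothetical name for this sketch. */
int team_size(void)
{
    int num_threads = 1;   /* value used when compiled without OpenMP */
#ifdef _OPENMP
    /* Request one thread per available processor. The runtime may still
     * grant fewer, which is why the real count is read inside the region. */
    omp_set_num_threads(omp_get_num_procs());
#endif
    #pragma omp parallel
    {
        /* single ensures that exactly one thread performs the store. */
        #pragma omp single
        {
#ifdef _OPENMP
            num_threads = omp_get_num_threads();
#endif
        }
    }
    return num_threads;
}
```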
  • 60. Agenda • What is OpenMP? • Parallel regions • Data-Sharing Attribute Clauses • Worksharing • OpenMP 3.0 Tasks • Synchronization • Runtime functions/environment variables • Optional Advanced topics – Intel® Parallel Debugger Extension – Advanced Concepts
  • 61. Key Features
      • Thread Shared Data Event Detection
          – Break on Thread Shared Data Access (read/write)
      • Re-entrant Function Detection
      • SIMD SSE Registers Window
      • Enhanced OpenMP* Support
          – Serialize OpenMP threaded application execution on the fly
          – Insight into thread groups, barriers, locks, wait lists etc.
  • 62. Debugger Extensions in Microsoft* VS
      The Intel® Debugger Extensions are a plug-in to Visual Studio 2005/2008/2010 that adds features to the Microsoft* debugger. The parallel debugger extensions have buttons similar to those in the Linux version of IDB. The functionality is identical.
  • 63. Shared Data Events Detection
      • Shared data access is a major problem in multi-threaded applications
           Can cause hard-to-diagnose intermittent program failures
           Tool support is required for detection
      [Figure: block diagram showing the Visual Studio* GUI and GUI extension, the debug engine with the debugger extension, compiler memory access instrumentation, the instrumented application, the Intel debug runtime, and RTL events]
      Technology built on:
           Code & debug info instrumentation by the Intel compiler (/Qopenmp /debug:parallel)
           Debug runtime library that collects data access traces and triggers the debugger tool (libiomp5md.dll, pdbx.dll)
           Debugger Extension in the Visual Studio Debug Engine collects and exports information
           GUI integration provides parallelism views and user interactivity
  • 64. Shared Data Events Detection – contd.
      Data sharing detection is part of the overall debug process
           Breakpoint model (stop on detection)
           GUI extensions show results & link to source
           Filter capabilities to hide false positives
      New powerful data breakpoint types
           Stop when a 2nd thread accesses a specific address
           Stop on read from address
      Key User Benefit: A simplified feature to detect shared data accesses from multiple threads
  • 65. Shared Data Events Detection - Usage
      • Enabling Shared Data Events Detection automatically enables “Stop On Event”
      • Future detection and logging of events can be suppressed
  • 66. Shared Data Events Detection - Filtering
      Data sharing detection is selective
           Data Filter
               Specific data items and variables can be excluded
           Code Filter
               Functions can be excluded
               Source files can be excluded
               Address ranges can be excluded
  • 67. Re-Entrant Call Detection
      • Automatically halts execution when a function is executed by more than one thread at any given point in time
      • Helps identify reentrancy requirements and problems in multi-threaded applications
  • 68. Re-Entrant Call Detection
      • After execution is halted and the user clicks OK, the program counter points exactly to where the problem occurred
  • 69. SIMD SSE Debugging Window
      • SSE Registers Window: displays the SSE registers and variables used for SIMD operations
      • Free formatting for flexible representation of data
      • In-depth insight into data parallelization and vectorization
      • In-place edit
  • 70. Enhanced OpenMP* Debugging Support
      • Dedicated OpenMP runtime object information windows
           OpenMP Task and Spawn Tree lists
           Barrier and Lock information
           Task Wait lists
           Thread Team worker lists
      • Serialize Parallel Regions
           Change the number of parallel threads dynamically during runtime to 1 or N (all)
           Verify code correctness for serial execution vs. parallel execution
           Identify whether a runtime issue is really parallelism related
      User benefit: Detailed execution state information for OpenMP applications (deadlock detection). Influences execution behavior without recompiling.
  • 71. OpenMP* Information Windows
      • Tasks, Spawn Trees, Teams, Barriers, Locks
      • (Task ID reporting incorrect)
  • 72. Serialize Parallel Regions
      Problem: A parallel loop computes a wrong result. Is it a concurrency issue or an algorithm issue?
      Parallel Debug Support
          • Runtime access to the OpenMP num_thread property
          • Set to 1 for serial execution of the next parallel block
      User Benefit
          • Verification of an algorithm “on-the-fly” without slowing down the entire application to serial execution
          • On demand serial debugging without recompile/restart
      [Figure: code listing annotated with “Disable parallel” and “Enable parallel” markers]
  • 73. Agenda • What is OpenMP? • Parallel regions • Data-Sharing Attribute Clauses • Worksharing • OpenMP 3.0 Tasks • Synchronization • Runtime functions/environment variables • Optional Advanced topics – Intel® Parallel Debugger Extension – Advanced Concepts
  • 74. Parallel Construct – Implicit Task View
      – Tasks are created in OpenMP even without an explicit task directive.
      – Let's look at how tasks are created implicitly for the code snippet below
          – The thread encountering the parallel construct packages up a set of implicit tasks
          – A team of threads is created.
          – Each thread in the team is assigned to one of the tasks (and tied to it).
          – A barrier holds the original master thread until all implicit tasks are finished.
      #pragma omp parallel
      { int mydata …
  • 75. Task Construct
      #pragma omp task [clause[[,]clause] ...]
          structured-block
      where clause can be one of:
          if (expression)
          untied
          shared (list)
          private (list)
          firstprivate (list)
          default( shared | none )
  • 76. Tied & Untied Tasks
      – Tied Tasks:
          – A tied task gets a thread assigned to it at its first execution, and the same thread services the task for its lifetime
          – A thread executing a tied task can be suspended and sent off to execute some other task, but eventually the same thread will return to resume execution of its original tied task
          – Tasks are tied unless explicitly declared untied
      – Untied Tasks:
          – An untied task has no long term association with any given thread. Any thread not otherwise occupied is free to execute an untied task. The thread assigned to execute an untied task may only change at a "task scheduling point".
          – An untied task is created by appending the “untied” clause to the task directive
  • 77. Task switching
      • Task switching: the act of a thread switching from the execution of one task to another task.
      • The purpose of task switching is to distribute threads among the unassigned tasks in the team, to avoid piling up long queues of unassigned tasks
      • Task switching, for tied tasks, can only occur at task scheduling points located within the following:
          – encountered task constructs
          – encountered taskwait constructs
          – encountered barrier directives
          – implicit barrier regions
          – at the end of the tied task region
      • Untied tasks have implementation dependent scheduling points
  • 78. Task switching example
      The thread executing the “for loop”, AKA the generating task, generates many tasks in a short time, so...
      • The SINGLE generating task will have to suspend for a while when the “task pool” fills up
      • Task switching is invoked to start draining the “pool”
      • When the “pool” is sufficiently drained, the single task can begin generating more tasks again
      int exp; //exp either T or F;
      #pragma omp single
      {
          for (i = 0; i < ONEZILLION; i++)
              #pragma omp task
              process(item[i]);
      }
  • 79. Optional foil - OpenMP* API
      • Get the thread number within a team
          int omp_get_thread_num(void);
      • Get the number of threads in a team
          int omp_get_num_threads(void);
      • Usually not needed for OpenMP codes
          – Can lead to code not being serially consistent
          – Does have specific uses (debugging)
          – Must include a header file: #include <omp.h>
  • 80. Optional foil - Monte Carlo Pi
      $\frac{\text{\# of darts hitting circle}}{\text{\# of darts in square}} = \frac{\frac{1}{4}\pi r^2}{r^2} = \frac{1}{4}\pi$
      $\pi = 4 \cdot \frac{\text{\# of darts hitting circle}}{\text{\# of darts in square}}$
      [Figure: quarter circle of radius r inscribed in a square of side r]
      loop 1 to MAX
          x.coor = (random#)
          y.coor = (random#)
          dist = sqrt(x^2 + y^2)
          if (dist <= 1)
              hits = hits + 1
      pi = 4 * hits/MAX
  • 81. Optional foil - Making Monte Carlo’s Parallel
      hits = 0
      call SEED48(1)
      DO I = 1, max
          x = DRAND48()
          y = DRAND48()
          IF (SQRT(x*x + y*y) .LT. 1) THEN
              hits = hits + 1
          ENDIF
      END DO
      pi = REAL(hits)/REAL(max) * 4.0
      What is the challenge here?
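The challenge is that a generator such as DRAND48 keeps one hidden state shared by all threads. One way around it, sketched below in C, is to split the darts into independent streams, each with its own generator state, and to combine the hit counts with a reduction. The small linear congruential generator, the seeding scheme, and the names next_uniform and monte_carlo_pi are assumptions made for this illustration; they are not statistically rigorous, and the deck itself points to Intel MKL's VSL for this purpose.

```c
#include <stdint.h>

/* Minimal 64-bit linear congruential generator returning a value in [0, 1).
 * Chosen only to keep the sketch self-contained. */
static double next_uniform(uint64_t *state)
{
    *state = *state * 6364136223846793005ULL + 1442695040888963407ULL;
    return (double)(*state >> 11) / 9007199254740992.0;   /* divide by 2^53 */
}

/* Estimate pi from `streams` independent batches of darts.
 * monte_carlo_pi is a hypothetical name for this sketch. */
double monte_carlo_pi(int streams, long darts_per_stream)
{
    long hits = 0;
    int s;
    /* Each iteration owns its generator state, so no random-number state
     * is shared between threads; hits is combined with a reduction. */
    #pragma omp parallel for reduction(+:hits)
    for (s = 0; s < streams; s++) {
        uint64_t state = 0x9E3779B97F4A7C15ULL * (uint64_t)(s + 1);   /* per-stream seed */
        long k;
        for (k = 0; k < darts_per_stream; k++) {
            double x = next_uniform(&state);
            double y = next_uniform(&state);
            if (x * x + y * y <= 1.0)
                hits++;
        }
    }
    return 4.0 * (double)hits / ((double)streams * (double)darts_per_stream);
}
```

Because the seed depends on the stream index rather than on the thread number, the estimate is the same no matter how many threads execute the loop.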
  • 82. Optional Activity 6: Computing Pi
      • Use the Intel® Math Kernel Library (Intel® MKL) VSL:
          – Intel MKL's VSL (Vector Statistics Libraries)
          – VSL creates an array of random numbers, rather than a single random number
          – VSL can have multiple seeds (one for each thread)
      • Objective:
          – Use basic OpenMP* syntax to make the Pi computation parallel
          – Choose the best code to divide the task up
          – Categorize all variables properly