Enviar búsqueda
Cargar
Intel open mp
•
1 recomendación
•
682 vistas
Piyush Mittal
Seguir
Denunciar
Compartir
Denunciar
Compartir
1 de 82
Descargar ahora
Descargar para leer sin conexión
Recomendados
Java 8-revealed
Java 8-revealed
Hamed Hatami
Zend server for IBM i update 5.6
Zend server for IBM i update 5.6
Zend by Rogue Wave Software
Profile Guided Optimization
Profile Guided Optimization
Northwest C++ Users' Group
JavaOne 2012 - CON11234 - Multi device Content Display and a Smart Use of Ann...
JavaOne 2012 - CON11234 - Multi device Content Display and a Smart Use of Ann...
gdigugli
Performance and Extensibility with EMF
Performance and Extensibility with EMF
Kenn Hussey
OkAPI meet symfony, symfony meet OkAPI
OkAPI meet symfony, symfony meet OkAPI
Lukas Smith
EclipseCon 2007: Effective Use of the Eclipse Modeling Framework
EclipseCon 2007: Effective Use of the Eclipse Modeling Framework
Dave Steinberg
Smart Annotation Processing - Marseille JUG
Smart Annotation Processing - Marseille JUG
gdigugli
Recomendados
Java 8-revealed
Java 8-revealed
Hamed Hatami
Zend server for IBM i update 5.6
Zend server for IBM i update 5.6
Zend by Rogue Wave Software
Profile Guided Optimization
Profile Guided Optimization
Northwest C++ Users' Group
JavaOne 2012 - CON11234 - Multi device Content Display and a Smart Use of Ann...
JavaOne 2012 - CON11234 - Multi device Content Display and a Smart Use of Ann...
gdigugli
Performance and Extensibility with EMF
Performance and Extensibility with EMF
Kenn Hussey
OkAPI meet symfony, symfony meet OkAPI
OkAPI meet symfony, symfony meet OkAPI
Lukas Smith
EclipseCon 2007: Effective Use of the Eclipse Modeling Framework
EclipseCon 2007: Effective Use of the Eclipse Modeling Framework
Dave Steinberg
Smart Annotation Processing - Marseille JUG
Smart Annotation Processing - Marseille JUG
gdigugli
The Java Carputer
The Java Carputer
Simon Ritter
Key topics when migrating from FAST to Solr, EuroCon 2010
Key topics when migrating from FAST to Solr, EuroCon 2010
Cominvent AS
Towards JVM Dynamic Languages Toolchain
Towards JVM Dynamic Languages Toolchain
Attila Szegedi
Oop05 6
Oop05 6
schwaa
AOP in C# 2013
AOP in C# 2013
Antya Dev
Smart annotation processing - Paris JUG
Smart annotation processing - Paris JUG
gdigugli
Basics of building a blackfin application
Basics of building a blackfin application
Pantech ProLabs India Pvt Ltd
An Introduction to SPL, the Standard PHP Library
An Introduction to SPL, the Standard PHP Library
Robin Fernandes
Basic java part_ii
Basic java part_ii
Khaled AlGhazaly
Pgloader
Pgloader
tapoueh
Modules of the twenties
Modules of the twenties
Puppet
Enriching EMF Models with Scala (quick overview)
Enriching EMF Models with Scala (quick overview)
Filip Krikava
Itinerary Website (Web Development Document)
Itinerary Website (Web Development Document)
Traitet Thepbandansuk
Apache Harmony: An Open Innovation
Apache Harmony: An Open Innovation
Tim Ellison
Dvm
Dvm
Shivam Sharma
Puppetizing Your Organization
Puppetizing Your Organization
Robert Nelson
Mike Taulty OData (NxtGen User Group UK)
Mike Taulty OData (NxtGen User Group UK)
ukdpe
CDO Ignite
CDO Ignite
Holmes70
Constructor injection and other new features for Declarative Services 1.4
Constructor injection and other new features for Declarative Services 1.4
bjhargrave
TAO DAYS - API (IT Session)
TAO DAYS - API (IT Session)
Open Assessment Technologies
Reflection
Reflection
Piyush Mittal
Arp and rarp
Arp and rarp
Piyush Mittal
Más contenido relacionado
La actualidad más candente
The Java Carputer
The Java Carputer
Simon Ritter
Key topics when migrating from FAST to Solr, EuroCon 2010
Key topics when migrating from FAST to Solr, EuroCon 2010
Cominvent AS
Towards JVM Dynamic Languages Toolchain
Towards JVM Dynamic Languages Toolchain
Attila Szegedi
Oop05 6
Oop05 6
schwaa
AOP in C# 2013
AOP in C# 2013
Antya Dev
Smart annotation processing - Paris JUG
Smart annotation processing - Paris JUG
gdigugli
Basics of building a blackfin application
Basics of building a blackfin application
Pantech ProLabs India Pvt Ltd
An Introduction to SPL, the Standard PHP Library
An Introduction to SPL, the Standard PHP Library
Robin Fernandes
Basic java part_ii
Basic java part_ii
Khaled AlGhazaly
Pgloader
Pgloader
tapoueh
Modules of the twenties
Modules of the twenties
Puppet
Enriching EMF Models with Scala (quick overview)
Enriching EMF Models with Scala (quick overview)
Filip Krikava
Itinerary Website (Web Development Document)
Itinerary Website (Web Development Document)
Traitet Thepbandansuk
Apache Harmony: An Open Innovation
Apache Harmony: An Open Innovation
Tim Ellison
Dvm
Dvm
Shivam Sharma
Puppetizing Your Organization
Puppetizing Your Organization
Robert Nelson
Mike Taulty OData (NxtGen User Group UK)
Mike Taulty OData (NxtGen User Group UK)
ukdpe
CDO Ignite
CDO Ignite
Holmes70
Constructor injection and other new features for Declarative Services 1.4
Constructor injection and other new features for Declarative Services 1.4
bjhargrave
TAO DAYS - API (IT Session)
TAO DAYS - API (IT Session)
Open Assessment Technologies
La actualidad más candente
(20)
The Java Carputer
The Java Carputer
Key topics when migrating from FAST to Solr, EuroCon 2010
Key topics when migrating from FAST to Solr, EuroCon 2010
Towards JVM Dynamic Languages Toolchain
Towards JVM Dynamic Languages Toolchain
Oop05 6
Oop05 6
AOP in C# 2013
AOP in C# 2013
Smart annotation processing - Paris JUG
Smart annotation processing - Paris JUG
Basics of building a blackfin application
Basics of building a blackfin application
An Introduction to SPL, the Standard PHP Library
An Introduction to SPL, the Standard PHP Library
Basic java part_ii
Basic java part_ii
Pgloader
Pgloader
Modules of the twenties
Modules of the twenties
Enriching EMF Models with Scala (quick overview)
Enriching EMF Models with Scala (quick overview)
Itinerary Website (Web Development Document)
Itinerary Website (Web Development Document)
Apache Harmony: An Open Innovation
Apache Harmony: An Open Innovation
Dvm
Dvm
Puppetizing Your Organization
Puppetizing Your Organization
Mike Taulty OData (NxtGen User Group UK)
Mike Taulty OData (NxtGen User Group UK)
CDO Ignite
CDO Ignite
Constructor injection and other new features for Declarative Services 1.4
Constructor injection and other new features for Declarative Services 1.4
TAO DAYS - API (IT Session)
TAO DAYS - API (IT Session)
Destacado
Reflection
Reflection
Piyush Mittal
Arp and rarp
Arp and rarp
Piyush Mittal
Google app engine cheat sheet
Google app engine cheat sheet
Piyush Mittal
Intro to parallel computing
Intro to parallel computing
Piyush Mittal
Vi cheat sheet
Vi cheat sheet
Piyush Mittal
Power mock
Power mock
Piyush Mittal
Channel coding
Channel coding
Piyush Mittal
Online signature recognition
Online signature recognition
Piyush Mittal
Basics of Coding Theory
Basics of Coding Theory
Piyush Mittal
Security in mobile ad hoc networks
Security in mobile ad hoc networks
Piyush Mittal
Destacado
(10)
Reflection
Reflection
Arp and rarp
Arp and rarp
Google app engine cheat sheet
Google app engine cheat sheet
Intro to parallel computing
Intro to parallel computing
Vi cheat sheet
Vi cheat sheet
Power mock
Power mock
Channel coding
Channel coding
Online signature recognition
Online signature recognition
Basics of Coding Theory
Basics of Coding Theory
Security in mobile ad hoc networks
Security in mobile ad hoc networks
Similar a Intel open mp
Better Code: Concurrency
Better Code: Concurrency
Platonov Sergey
OpenERP Technical Memento V0.7.3
OpenERP Technical Memento V0.7.3
Borni DHIFI
Zend Products and PHP for IBMi
Zend Products and PHP for IBMi
Shlomo Vanunu
Open Source RAD with OpenERP 7.0
Open Source RAD with OpenERP 7.0
Quang Ngoc
A Fabric/Puppet Build/Deploy System
A Fabric/Puppet Build/Deploy System
adrian_nye
NFF-GO (YANFF) - Yet Another Network Function Framework
NFF-GO (YANFF) - Yet Another Network Function Framework
Michelle Holley
Golang
Golang
Software Infrastructure
Golang
Golang
Fatih Şimşek
NTTs Journey with Openstack-final
NTTs Journey with Openstack-final
shintaro mizuno
Room 2 - 3 - Nguyễn Hoài Nam & Nguyễn Việt Hùng - Terraform & Pulumi Comparin...
Room 2 - 3 - Nguyễn Hoài Nam & Nguyễn Việt Hùng - Terraform & Pulumi Comparin...
Vietnam Open Infrastructure User Group
Terraform vs Pulumi
Terraform vs Pulumi
HoaiNam307
Lambdas and Streams in Java SE 8: Making Bulk Operations simple - Simon Ritter
Lambdas and Streams in Java SE 8: Making Bulk Operations simple - Simon Ritter
JAXLondon2014
Lambdas And Streams in JDK8
Lambdas And Streams in JDK8
Simon Ritter
ConFoo Montreal - Microservices for building an IDE - The innards of JetBrain...
ConFoo Montreal - Microservices for building an IDE - The innards of JetBrain...
Maarten Balliauw
First Steps in Python Programming
First Steps in Python Programming
Dozie Agbo
Building scalable and language independent java services using apache thrift
Building scalable and language independent java services using apache thrift
Talentica Software
Multiplatform shared codebase with Kotlin/Native - UA Mobile 2019
Multiplatform shared codebase with Kotlin/Native - UA Mobile 2019
UA Mobile
Multiplatform shared codebase with Kotlin/Native - UA Mobile 2019
Multiplatform shared codebase with Kotlin/Native - UA Mobile 2019
Eugene Kurko
Building scalable and language-independent Java services using Apache Thrift ...
Building scalable and language-independent Java services using Apache Thrift ...
IndicThreads
NDC Sydney 2019 - Microservices for building an IDE – The innards of JetBrain...
NDC Sydney 2019 - Microservices for building an IDE – The innards of JetBrain...
Maarten Balliauw
Similar a Intel open mp
(20)
Better Code: Concurrency
Better Code: Concurrency
OpenERP Technical Memento V0.7.3
OpenERP Technical Memento V0.7.3
Zend Products and PHP for IBMi
Zend Products and PHP for IBMi
Open Source RAD with OpenERP 7.0
Open Source RAD with OpenERP 7.0
A Fabric/Puppet Build/Deploy System
A Fabric/Puppet Build/Deploy System
NFF-GO (YANFF) - Yet Another Network Function Framework
NFF-GO (YANFF) - Yet Another Network Function Framework
Golang
Golang
Golang
Golang
NTTs Journey with Openstack-final
NTTs Journey with Openstack-final
Room 2 - 3 - Nguyễn Hoài Nam & Nguyễn Việt Hùng - Terraform & Pulumi Comparin...
Room 2 - 3 - Nguyễn Hoài Nam & Nguyễn Việt Hùng - Terraform & Pulumi Comparin...
Terraform vs Pulumi
Terraform vs Pulumi
Lambdas and Streams in Java SE 8: Making Bulk Operations simple - Simon Ritter
Lambdas and Streams in Java SE 8: Making Bulk Operations simple - Simon Ritter
Lambdas And Streams in JDK8
Lambdas And Streams in JDK8
ConFoo Montreal - Microservices for building an IDE - The innards of JetBrain...
ConFoo Montreal - Microservices for building an IDE - The innards of JetBrain...
First Steps in Python Programming
First Steps in Python Programming
Building scalable and language independent java services using apache thrift
Building scalable and language independent java services using apache thrift
Multiplatform shared codebase with Kotlin/Native - UA Mobile 2019
Multiplatform shared codebase with Kotlin/Native - UA Mobile 2019
Multiplatform shared codebase with Kotlin/Native - UA Mobile 2019
Multiplatform shared codebase with Kotlin/Native - UA Mobile 2019
Building scalable and language-independent Java services using Apache Thrift ...
Building scalable and language-independent Java services using Apache Thrift ...
NDC Sydney 2019 - Microservices for building an IDE – The innards of JetBrain...
NDC Sydney 2019 - Microservices for building an IDE – The innards of JetBrain...
Más de Piyush Mittal
Design pattern tutorial
Design pattern tutorial
Piyush Mittal
Gpu archi
Gpu archi
Piyush Mittal
Cuda Architecture
Cuda Architecture
Piyush Mittal
Cuda toolkit reference manual
Cuda toolkit reference manual
Piyush Mittal
Matrix multiplication using CUDA
Matrix multiplication using CUDA
Piyush Mittal
Java cheat sheet
Java cheat sheet
Piyush Mittal
Git cheat sheet
Git cheat sheet
Piyush Mittal
Css cheat sheet
Css cheat sheet
Piyush Mittal
Cpp cheat sheet
Cpp cheat sheet
Piyush Mittal
Ubuntu cheat sheet
Ubuntu cheat sheet
Piyush Mittal
Php cheat sheet
Php cheat sheet
Piyush Mittal
oracle 9i cheat sheet
oracle 9i cheat sheet
Piyush Mittal
Open ssh cheet sheat
Open ssh cheet sheat
Piyush Mittal
Opencl cheet sheet
Opencl cheet sheet
Piyush Mittal
Open MP cheet sheet
Open MP cheet sheet
Piyush Mittal
My sql cheat sheet
My sql cheat sheet
Piyush Mittal
Matlab cheatsheet
Matlab cheatsheet
Piyush Mittal
Latex cheat sheet
Latex cheat sheet
Piyush Mittal
Gdb cheat sheet
Gdb cheat sheet
Piyush Mittal
Linked lists
Linked lists
Piyush Mittal
Más de Piyush Mittal
(20)
Design pattern tutorial
Design pattern tutorial
Gpu archi
Gpu archi
Cuda Architecture
Cuda Architecture
Cuda toolkit reference manual
Cuda toolkit reference manual
Matrix multiplication using CUDA
Matrix multiplication using CUDA
Java cheat sheet
Java cheat sheet
Git cheat sheet
Git cheat sheet
Css cheat sheet
Css cheat sheet
Cpp cheat sheet
Cpp cheat sheet
Ubuntu cheat sheet
Ubuntu cheat sheet
Php cheat sheet
Php cheat sheet
oracle 9i cheat sheet
oracle 9i cheat sheet
Open ssh cheet sheat
Open ssh cheet sheat
Opencl cheet sheet
Opencl cheet sheet
Open MP cheet sheet
Open MP cheet sheet
My sql cheat sheet
My sql cheat sheet
Matlab cheatsheet
Matlab cheatsheet
Latex cheat sheet
Latex cheat sheet
Gdb cheat sheet
Gdb cheat sheet
Linked lists
Linked lists
Intel open mp
1.
Programming with OpenMP*
Software & 1 © 2012 WIPRO LTD | WWW.WIPRO.COM
2.
Agenda
•What is OpenMP? •Parallel regions •Data-Sharing Attribute Clauses •Worksharing •OpenMP 3.0 Tasks •Synchronization •Runtime functions/environment variables •Optional Advanced topics Software & Services Group, Developer Products Division Software & Services Group 2 © 2012 WIPRO LTD | WWW.WIPRO.COM
3.
What Is OpenMP?
Software & Services Group, Developer Products Division Software & Services Group Developer©Products Division All rights reserved. Copyright 2010, Intel Corporation. Copyright© 2011, Intel Corporation. All rights reserved. *Other brands and names are the property of their *Other respective owners.the propertyaretheir respective owners. 3/14/2012 3 3 3 brands and names of © 2012 WIPRO LTD | WWW.WIPRO.COM
4.
Programming Model
Fork-Join Parallelism: • Master thread spawns a team of threads as needed • Parallelism is added incrementally: that is, the sequential program evolves into a parallel program Master Thread Parallel Regions Software & Services Group, Developer Products Division Software & ServicesDivision Developer Products Group 3/14/2012 4 4 4 © 2012 WIPRO LTD | WWW.WIPRO.COM
5.
A Few Details
to Get Started •Compiler option /Qopenmp in Windows or –openmp in Linux •Most of the constructs in OpenMP are compiler directives or pragmas –For C and C++, the pragmas take the form: #pragma omp construct [clause [clause]…] –For Fortran, the directives take one of the forms: C$OMP construct [clause [clause]…] !$OMP construct [clause [clause]…] *$OMP construct [clause [clause]…] •Header file or Fortran module #include “omp.h” use omp_lib Software & Services Group, Developer Products Division Software & Services Group 5 © 2012 WIPRO LTD | WWW.WIPRO.COM
6.
Agenda
•What is OpenMP? •Parallel regions •Data-Sharing Attribute Clauses •Worksharing •OpenMP 3.0 Tasks •Synchronization •Runtime functions/environment variables •Intel® Parallel Debugger Extension •Optional Advanced topics Software & Services Group, Developer Products Division Software & Services Group 6 © 2012 WIPRO LTD | WWW.WIPRO.COM
7.
Parallel Region &
Structured Blocks (C/C++) • Most OpenMP constructs apply to structured blocks – Structured block: a block with one point of entry at the top and one point of exit at the bottom – The only “branches” allowed are STOP statements in Fortran and exit() in C/C++ #pragma omp parallel if (go_now()) goto more; { #pragma omp parallel int id = omp_get_thread_num(); { int id = omp_get_thread_num(); more: res[id] = do_big_job (id); more: res[id] = do_big_job(id); if (conv (res[id])) goto done; if (conv (res[id])) goto more; goto more; } } printf (“All donen”); done: if (!really_done()) goto more; A structured block Not a structured block Software & Services Group, Developer Products Division Software & Services Group Developer©Products Division All rights reserved. Copyright 2010, Intel Corporation. Copyright© 2011, Intel Corporation. All rights reserved. *Other brands and names are the property of their *Other respective owners.the propertyaretheir respective owners. brands and names of 3/14/2012 7 7 7 © 2012 WIPRO LTD | WWW.WIPRO.COM
8.
Agenda
•What is OpenMP? •Parallel regions •Data-Sharing Attribute Clauses •Worksharing •OpenMP 3.0 Tasks •Synchronization •Runtime functions/environment variables •Intel® Parallel Debugger Extension •Optional Advanced topics Software & Services Group, Developer Products Division Software & Services Group 8 © 2012 WIPRO LTD | WWW.WIPRO.COM
9.
Data-Sharing Attribute Clauses
shared Declares one or more list items to be shared by tasks generated by a Parallel or task construct(Default). private Declares one or more list items to be private to a task. default Allows the user to control the data-sharing attributes of variables that Are referenced in a parallel or task construct,and whose data-sharing Attributes are implicitly determined firstprivate Declares one or more listitems to be private to a task,and initializes Each of them with the value that the corresponding original item has When the construct is encountered. lastprivate Declares one or more list items to be private to an implicit task,and Causes the corresponding original list item to be updated after the end Of the region. reduction Specifies an operator and oneor more list items.For each listitem,a Private copy is created in each implicit task,and isinitialized Appropriately for the operator.After the end of the region,the original List item is updated with the values of the private copies using the Specified operator. Software & Services Group Developer Products Division 3/14/2012 9 9 9 © 2012 WIPRO LTD | WWW.WIPRO.COM
10.
The Private Clause
•Reproduces the variable for each task – Variables are un-initialized; C++ object is default constructed – Any value external to the parallel region is undefined Software & Services Group, Developer Products Division Software & Services Group 10 © 2012 WIPRO LTD | WWW.WIPRO.COM
11.
FirstPrivate Clause 11
© 2012 WIPRO LTD | WWW.WIPRO.COM
12.
Lastprivate Clause
•Variables update shared variable using value from last iteration •C++ objects are updated as if by assignment void sq2(int n, double *lastterm) { double x; int i; #pragma omp parallel #pragma omp for lastprivate(x) for (i = 0; i < n; i++){ x = a[i]*a[i] + b[i]*b[i]; b[i] = sqrt(x); } *lastterm = x; } Software & Services Group, Developer Products Division Software & Services Group Developer Products Division 3/14/2012 12 12 12 © 2012 WIPRO LTD | WWW.WIPRO.COM
13.
OpenMP* Reduction Clause
reduction (op : list) •The variables in “list” must be shared in the enclosing parallel region •Inside parallel or work-sharing construct: – A PRIVATE copy of each list variable is created and initialized depending on the “op” – These copies are updated locally by threads – At end of construct, local copies are combined through “op” into a single value and combined with the value in the original SHARED variable Software & Services Group, Developer Products Division Software & Services Group 13 © 2012 WIPRO LTD | WWW.WIPRO.COM
14.
Activity 1 –
Hello Worlds •Modify the “Hello, Worlds” serial code to run multithreaded using OpenMP* Software & Services Group, Developer Products Division Software & Services Group 14 © 2012 WIPRO LTD | WWW.WIPRO.COM
15.
Agenda
•What is OpenMP? •Parallel regions •Data-Sharing Attribute Clauses •Worksharing •OpenMP 3.0 Tasks •Synchronization •Runtime functions/environment variables •Optional Advanced topics Software & Services Group, Developer Products Division Software & Services Group 15 © 2012 WIPRO LTD | WWW.WIPRO.COM
16.
Worksharing
•Worksharing is the general term used in OpenMP to describe distribution of work across threads. •Three examples of worksharing in OpenMP are: •loop(omp for) construct •omp sections construct •omp task construct Automatically divides work among threads Software & Services Group, Developer Products Division Software &2010, Intel Corporation. All rights reserved. Developer©Products Division Copyright Services Group Copyright© 2011, Intel Corporation. All rights reserved. *Other brands and names are the property of their *Other respective owners.the propertyaretheir respective owners. brands and names of 3/14/2012 16 16 16 © 2012 WIPRO LTD | WWW.WIPRO.COM
17.
Agenda
•What is OpenMP? •Parallel regions •Data-Sharing Attribute Clauses •Worksharing – Parallel For •OpenMP 3.0 Tasks •Synchronization •Runtime functions/environment variables •Optional Advanced topics Software & Services Group, Developer Products Division Software & Services Group 17 © 2012 WIPRO LTD | WWW.WIPRO.COM
18.
omp for Construct
// assume N=12 #pragma omp parallel #pragma omp parallel #pragma omp for for(i = 1, i < N+1, i++) #pragma omp for c[i] = a[i] + b[i]; i=1 i=5 i=9 i=2 i=6 i = 10 •Threads are assigned an i=3 i=7 i = 11 i=4 i=8 i = 12 independent set of iterations Implicit barrier •Threads must wait at the end of work-sharing construct Software & Services Group, Developer Products Division Software & Services Group Developer Products Division 3/14/2012 18 18 18 © 2012 WIPRO LTD | WWW.WIPRO.COM
19.
Combining constructs
•These two code segments are equivalent #pragma omp parallel { #pragma omp for for (i=0;i< MAX; i++) { res[i] = huge(); } } #pragma omp parallel for for (i=0;i< MAX; i++) { res[i] = huge(); } Software & Services Group, Developer Products Division Software &2010, Intel Corporation. All rights reserved. Services Group Developer©Products Division Copyright Copyright© 2011, Intel Corporation. All rights reserved. *Other brands and names are the property of their *Other respective owners.the propertyaretheir respective owners. brands and names of 3/14/2012 19 19 19 © 2012 WIPRO LTD | WWW.WIPRO.COM
20.
The schedule clause
The schedule clause affects how loop iterations are mapped onto threads schedule(static [,chunk]) – Blocks of iterations of size “chunk” to threads – Round robin distribution – Low overhead, may cause load imbalance schedule(dynamic[,chunk]) – Threads grab “chunk” iterations – When done with iterations, thread requests next set – Higher threading overhead, can reduce load imbalance schedule(guided[,chunk]) – Dynamic schedule starting with large block – Size of the blocks shrink; no smaller than “chunk” schedule(runtime) – Scheduling is deferred until run time, and the schedule and chunk size are taken from the run-sched-var ICV Software & Services Group, Developer Products Division Software & Services Group 20 © 2012 WIPRO LTD | WWW.WIPRO.COM
21.
Assigning Loop Iterations
in OpenMP*: Which Schedule to Use ScheduleClause WhenToUse STATIC Predictable and similar work Per iteration DYNAMIC Unpredictable,variable work Per iteration GUIDED Special case of dynamic to Reduce schedulingo verhead RUNTIME Modify schedule at run-time Via environment variable Software & Services Group, Developer Products Division Software & ServicesDivision Developer Products Group 3/14/2012 22 21 22 © 2012 WIPRO LTD | WWW.WIPRO.COM
22.
Schedule Clause Example
#pragma omp parallel for schedule (static, 8) for( int i = start; i <= end; i += 2 ) { if ( TestForPrime(i) ) gPrimesFound++; } Iterations are divided into chunks of 8 • If start = 3, then first chunk is i={3,5,7,9,11,13,15,17} Software & Services Group, Developer Products Division Software &2010, Intel Corporation. All rights reserved. Services Group Developer©Products Division Copyright Copyright© 2011, Intel Corporation. All rights reserved. *Other brands and names are the property of their *Other respective owners.the propertyaretheir respective owners. brands and names of 3/14/2012 23 23 22 © 2012 WIPRO LTD | WWW.WIPRO.COM
23.
Nested Parallelism #include <stdio.h> #include
<omp.h> int main(int argc, char *argv[]) { int n; omp_set_nested(1); #pragma omp parallel private(n) { n=omp_get_thread_num(); #pragma omp parallel printf("thread number %d - nested thread number %dn", n, omp_get_thread_num()); } } thread number 0 - nested thread number 0 0 0 thread number 0 - nested thread number 1 1 thread number 1 - nested thread number 0 thread number 1 - nested thread number 1 1 0 1 Software & Services Group, Developer Products Division Software & Services Group Developer Products Division 3/14/2012 24 24 23 © 2012 WIPRO LTD | WWW.WIPRO.COM
24.
The collapse clause
Collapse is new concept in OpenMP 3.0. May be used to specify how many loops are associated with the loop construct. #pragma omp parallel for collapse(2) for( i = 0; i < N; ++i ) for( k = 0; k < N; ++k ) for( j = 0; j < N; ++j ) foo(); Software & Services Group, Developer Products Division Software & Services Group 24 © 2012 WIPRO LTD | WWW.WIPRO.COM
25.
Agenda
•What is OpenMP? •Parallel regions •Data-Sharing Attribute Clauses •Worksharing – Parallel Sections •OpenMP 3.0 Tasks •Synchronization •Runtime functions/environment variables •Optional Advanced topics Software & Services Group, Developer Products Division Software & Services Group 25 © 2012 WIPRO LTD | WWW.WIPRO.COM
26.
Task Decomposition
a = alice(); b = bob(); s = boss(a, b); alice bob c = cy(); printf ("%6.2fn", bigboss(s,c)); boss cy big alice,bob, and cy boss can be computed in parallel Software & Services Group, Developer Products Division Software & Services Group Developer Products Division 3/14/2012 28 28 26 © 2012 WIPRO LTD | WWW.WIPRO.COM
27.
omp sections
• #pragma omp sections • Must be inside a parallel region • Precedes a code block containing of N blocks of code that may be executed concurrently by N threads • Encompasses each omp section • #pragma omp section • Precedes each block of code within the encompassing block described above • May be omitted for first parallel section after the parallel sections pragma • Enclosed program segments are distributed for parallel execution among available threads Software & Services Group, Developer Products Division Software & Services Group 27 © 2012 WIPRO LTD | WWW.WIPRO.COM
28.
Functional Level Parallelism
w sections #pragma omp parallel sections { #pragma omp section /* Optional */ a = alice(); #pragma omp section b = bob(); #pragma omp section c = cy(); } s = boss(a, b); printf ("%6.2fn", bigboss(s,c)); Software & Services Group, Developer Products Division Software &2010, Intel Corporation. All rights reserved. Services Group Developer©Products Division Copyright Copyright© 2011, Intel Corporation. All rights reserved. *Other brands and names are the property of their *Other respective owners.the propertyaretheir respective owners. 3/14/2012 30 30 28 brands and names of © 2012 WIPRO LTD | WWW.WIPRO.COM
29.
Advantage of Parallel
Sections • Independent sections of code can execute concurrently – reduce execution time #pragma omp parallel sections { #pragma omp section phase1(); #pragma omp section phase2(); #pragma omp section phase3(); } Serial Parallel Developer©Products Division All rights reserved. Copyright 2010, Intel Corporation. Copyright© 2011, Intel Corporation. All rights reserved. *Other brands and names are the property of their *Other respective owners.the propertyaretheir respective owners. brands and names of 3/14/2012 31 31 29 © 2012 WIPRO LTD | WWW.WIPRO.COM
30.
Agenda
•What is OpenMP? •Parallel regions •Data-Sharing Attribute Clauses •Worksharing •OpenMP 3.0 Tasks •Synchronization •Runtime functions/environment variables •Optional Advanced topics Software & Services Group, Developer Products Division Software & Services Group 30 © 2012 WIPRO LTD | WWW.WIPRO.COM
31.
New Addition to
OpenMP •Tasks – Main change for OpenMP 3.0 • Allows parallelization of irregular problems –unbounded loops –recursive algorithms –producer/consumer Software & Services Group, Developer Products Division Software & Services Group 31 © 2012 WIPRO LTD | WWW.WIPRO.COM
32.
What are tasks?
• Tasks are independent units of work • Threads are assigned to perform the work of each task – Tasks may be deferred • Tasks may be executed immediately • The runtime system decides which of the above – Tasks are composed of: – code to execute – data environment – internal control variables (ICV) Serial Parallel Software & Services Group, Developer Products Division Software & Services Group Developer©Products Division All rights reserved. Copyright 2010, Intel Corporation. Copyright© 2011, Intel Corporation. All rights reserved. *Other brands and names are the property of their *Other respective owners.the propertyaretheir respective owners. brands and names of 3/14/2012 34 34 32 © 2012 WIPRO LTD | WWW.WIPRO.COM
33.
Simple Task Example
#pragma omp parallel // assume 8 threads A pool of 8 threads is { created here #pragma omp single private(p) { … One thread gets to execute while (p) { the while loop #pragma omp task { The single “while loop” processwork(p); thread creates a task for } each instance of p = p->next; processwork() } } } Software & Services Group, Developer Products Division Software & Services Group Developer©Products Division All rights reserved. Copyright 2010, Intel Corporation. Copyright© 2011, Intel Corporation. All rights reserved. *Other brands and names are the property of their *Other respective owners.the propertyaretheir respective owners. brands and names of 3/14/2012 35 35 33 © 2012 WIPRO LTD | WWW.WIPRO.COM
34.
Task Construct –
Explicit Task View – A team of threads is created at the omp parallel construct #pragma omp parallel – A single thread is chosen to { execute the while loop – lets #pragma omp single call this thread “L” – Thread L operates the while { // block 1 loop, creates tasks, and node * p = head; fetches next pointers while (p) { //block 2 – Each time L crosses the omp #pragma omp task task construct it generates a new task and has a thread process(p); assigned to it p = p->next; //block 3 – Each task runs in its own } thread – All tasks complete at the } barrier at the end of the } parallel region’s single construct Software & Services Group, Developer Products Division Software & Services Group Developer Products Division 3/14/2012 36 36 34 © 2012 WIPRO LTD | WWW.WIPRO.COM
35.
Why are tasks
useful? Have potential to parallelize irregular patterns and recursive function calls 35 © 2012 WIPRO LTD | WWW.WIPRO.COM
36.
Activity 4 –
Parallel Fibonacci Objective: 1. create a parallel version of Fibonacci sample. That uses OpenMP tasks; 2. try to find balance between serial and parallel recursion parts. Software & Services Group, Developer Products Division Software & Services Group 36 © 2012 WIPRO LTD | WWW.WIPRO.COM
37.
When are tasks
gauranteed to be complete? Tasks are gauranteed to be complete: • At thread or task barriers • At the directive: #pragma omp barrier • At the directive: #pragma omp taskwait Software & Services Group, Developer Products Division Software & Services Group 37 © 2012 WIPRO LTD | WWW.WIPRO.COM
38.
Task Completion Example
#pragma omp parallel Multiple foo tasks created { here – one for each thread #pragma omp task foo(); All foo tasks guaranteed to #pragma omp barrier be completed here #pragma omp single { #pragma omp task One bar task created here bar(); } } bar task guaranteed to be completed here Developer Products Division 3/14/2012 40 40 38 © 2012 WIPRO LTD | WWW.WIPRO.COM
39.
Agenda
•What is OpenMP? •Parallel regions •Data-Sharing Attribute Clauses •Worksharing •OpenMP 3.0 Tasks •Synchronization •Runtime functions/environment variables •Optional Advanced topics Software & Services Group, Developer Products Division Software & Services Group 39 © 2012 WIPRO LTD | WWW.WIPRO.COM
40.
Example: Dot Product
float dot_prod(float* a, float* b, int N) { float sum = 0.0; #pragma omp parallel for shared(sum) for(int i=0; i<N; i++) { sum += a[i] * b[i]; } return sum; } Software & Services Group, Developer Products Division Software & Services Group Developer Products Division 3/14/2012 42 42 40 © 2012 WIPRO LTD | WWW.WIPRO.COM
41.
Race Condition
•A race condition is nondeterministic behavior caused by the times at which two or more threads access a shared variable •For example, suppose both Thread A and Thread B are executing the statement • area += 4.0 / (1.0 + x*x); Software & Services Group, Developer Products Division Software & Services Group Developer Products Division 3/14/2012 43 43 41 © 2012 WIPRO LTD | WWW.WIPRO.COM
42.
Two Timings 42
© 2012 WIPRO LTD | WWW.WIPRO.COM
43.
Protect Shared Data
•Must protect access to shared, modifiable data float dot_prod(float* a, float* b, int N) { float sum = 0.0; #pragma omp parallel for shared(sum) for(int i=0; i<N; i++) { #pragma omp critical sum += a[i] * b[i]; } return sum; } Software & Services Group, Developer Products Division Software &2010, Intel Corporation. All rights reserved. Services Group Developer©Products Division Copyright Copyright© 2011, Intel Corporation. All rights reserved. *Other brands and names are the property of their *Other respective owners.the propertyaretheir respective owners. brands and names of 3/14/2012 45 45 43 © 2012 WIPRO LTD | WWW.WIPRO.COM
44.
OpenMP* Critical Construct
• #pragma omp critical [(lock_name)] •Defines a critical region on a structured block float RES; Threads wait their turn –at a time, only one calls consum() thereby protecting RES from race conditions Naming the critical #pragma omp critical (RES_lock) construct RES_lock is consum (B, RES); optional } } Good Practice – Name all critical sections Software & Services Group, Developer Products Division Software & Services Group Developer Products Division 3/14/2012 46 46 44 © 2012 WIPRO LTD | WWW.WIPRO.COM
45.
OpenMP* Reduction Clause
reduction (op : list) •The variables in “list” must be shared in the enclosing parallel region •Inside parallel or work-sharing construct: – A PRIVATE copy of each list variable is created and initialized depending on the “op” – These copies are updated locally by threads – At end of construct, local copies are combined through “op” into a single value and combined with the value in the original SHARED variable Software & Services Group, Developer Products Division Software & Services Group 45 © 2012 WIPRO LTD | WWW.WIPRO.COM
46.
Reduction Example
#include <stdio.h> #include "omp.h" int main(int argc, char *argv[]) { int count = 10; #pragma omp parallel reduction (+: count) { int num; num = omp_get_thread_num(); count++; printf("count is equal : %d on thread %d n", count, num); } printf("count at the end: %dn", count); } count is equal : 1 on thread 0 count is equal : 1 on thread 1 count at the end: 12 Software & Services Group, Developer Products Division Software &2010, Intel Corporation. All rights reserved. Services Group Developer©Products Division Copyright Copyright© 2011, Intel Corporation. All rights reserved. *Other brands and names are the property of their *Other respective owners.the propertyaretheir respective owners. brands and names of 3/14/2012 48 48 46 © 2012 WIPRO LTD | WWW.WIPRO.COM
47.
C/C++ Reduction Operations
•A range of associative operands can be used with reduction •Initial values are the ones that make sense mathematically Operand InitialValue Operand InitialValue + 0 & ~0 * 1 | 0 - 0 && 1 ^ 0 || 0 Developer©Products Division All rights reserved. Software &Intel Corporation. Copyright 2010, Services Group, Developer Products DivisionAll rights reserved. Copyright© 2011, Intel Corporation. 3/14/2012 49 49 Software & Services are the property of their *Other respective owners.the propertyaretheir respective owners. *Other brands and names Group brands and names of 47 © 2012 WIPRO LTD | WWW.WIPRO.COM
48.
Numerical Integration Example 48
© 2012 WIPRO LTD | WWW.WIPRO.COM
49.
Activity 5 -
Computing Pi static long num_steps=100000; •Parallelize the double step, pi; numerical void main() integration code { int i; using OpenMP double x, sum = 0.0; step = 1.0/(double) num_steps; •What variables can for (i=0; i< num_steps; i++){ be shared? x = (i+0.5)*step; sum = sum + 4.0/(1.0 + x*x); •What variables need } to be private? pi = step * sum; printf(“Pi = %fn”,pi); •What variables should be set up for reductions? Software & Services Group, Developer Products Division Software & Services Group Developer©Products Division All rights reserved. Copyright 2010, Intel Corporation. Copyright© 2011, Intel Corporation. All rights reserved. *Other brands and names are the property of their *Other respective owners.the property of their respective owners. 3/14/2012 51 51 49 © 2012 WIPRO LTD | WWW.WIPRO.COM
50.
Single Construct
•Denotes block of code to be executed by only one thread – Implementation defined •Implicit barrier at end #pragma omp parallel { DoManyThings(); #pragma omp single { ExchangeBoundaries(); } // threads wait here for single DoManyMoreThings(); } Software & Services Group, Developer Products Division Software &2010, Intel Corporation. All rights reserved. Services Group Developer©Products Division Copyright Copyright© 2011, Intel Corporation. All rights reserved. *Other brands and names are the property of their *Other respective owners.the propertyaretheir respective owners. brands and names of 3/14/2012 52 52 50 © 2012 WIPRO LTD | WWW.WIPRO.COM
51.
Master Construct
•Denotes block of code to be executed only by the master thread •No implicit barrier at end #pragma omp parallel { DoManyThings(); #pragma omp master { // if not master skip to next stmt ExchangeBoundaries(); } DoManyMoreThings(); } Software & Services Group, Developer Products Division Software & Services Group Developer Products Division 3/14/2012 53 53 51 © 2012 WIPRO LTD | WWW.WIPRO.COM
52.
Implicit Barriers
•Several OpenMP* constructs have implicit barriers – Parallel – necessary barrier – cannot be removed – for – single •Unnecessary barriers hurt performance and can be removed with the nowait clause – The nowait clause is applicable to: – For clause – Single clause Software & Services Group, Developer Products Division Software & Services Group 52 © 2012 WIPRO LTD | WWW.WIPRO.COM
53.
Nowait Clause
#pragma omp for nowait for(...) #pragma single nowait {...}; { [...] } •Use when threads unnecessarily wait between independent computations #pragma omp for schedule(dynamic,1) nowait for(int i=0; i<n; i++) a[i] = bigFunc1(i); #pragma omp for schedule(dynamic,1) for(int j=0; j<m; j++) b[j] = bigFunc2(j); Software & Services Group, Developer Products Division Software & Services Group Developer Products Division 3/14/2012 55 55 53 © 2012 WIPRO LTD | WWW.WIPRO.COM
54.
Barrier Construct
•Explicit barrier synchronization •Each thread waits until all threads arrive #pragma omp parallel shared (A, B, C) { DoSomeWork(A,B); printf(“Processed A into Bn”); #pragma omp barrier DoSomeWork(B,C); printf(“Processed B into Cn”); } Software & Services Group, Developer Products Division Software &2010, Intel Corporation. All rights reserved. Services Group Developer©Products Division Copyright Copyright© 2011, Intel Corporation. All rights reserved. *Other brands and names are the property of their *Other respective owners.the propertyaretheir respective owners. brands and names of 3/14/2012 56 56 54 © 2012 WIPRO LTD | WWW.WIPRO.COM
55.
Atomic Construct
•Special case of a critical section •Applies only to simple update of memory location #pragma omp parallel for shared(x, y, index, n) for (i = 0; i < n; i++) { #pragma omp atomic x[index[i]] += work1(i); y[i] += work2(i); } Software & Services Group, Developer Products Division Software &2010, Intel Corporation. All rights reserved. Services Group Developer©Products Division Copyright Copyright© 2011, Intel Corporation. All rights reserved. *Other brands and names are the property of their *Other respective owners.the propertyaretheir respective owners. brands and names of 3/14/2012 57 57 55 © 2012 WIPRO LTD | WWW.WIPRO.COM
56.
Agenda
•What is OpenMP? •Parallel regions •Data-Sharing Attribute Clauses •Worksharing •OpenMP 3.0 Tasks •Synchronization •Runtime functions/environment variables •Optional Advanced topics Software & Services Group, Developer Products Division Software & Services Group 56 © 2012 WIPRO LTD | WWW.WIPRO.COM
57.
Environment Variables
•Set the default number of threads OMP_NUM_THREADS integer •Set the default scheduling protocol OMP_SCHEDULE “schedule[, chunk_size]” •Enable dynamic thread adjustment OMP_DYNAMIC [TRUE|FALSE] •Enable nested parallelism OMP_NESTED [TRUE|FALSE] Software & Services Group, Developer Products Division Software & Services Group 57 © 2012 WIPRO LTD | WWW.WIPRO.COM
58.
20+ Library Routines
Software & Services Group, Developer Products Division Software & Services Group Developer Products Division 3/14/2012 60 60 58 © 2012 WIPRO LTD | WWW.WIPRO.COM
59.
Library Routines
•To fix the number of threads used in a program – Set the number of threads – Then save the number returned #include <omp.h> Request as many threads as you have processors. void main () { int num_threads; omp_set_num_threads (omp_num_procs ()); #pragma omp parallel Protect this { operation because int id = omp_get_thread_num (); memory stores are #pragma omp single not atomic num_threads = omp_get_num_threads (); do_lots_of_stuff (id); } } Software & Services Group, Developer Products Division Software & Services Group Developer©Products Division All rights reserved. Copyright 2010, Intel Corporation. Copyright© 2011, Intel Corporation. All rights reserved. *Other brands and names are the property of their *Other respective owners.the propertyaretheir respective owners. brands and names of 3/14/2012 61 61 59 © 2012 WIPRO LTD | WWW.WIPRO.COM
60.
Agenda
•What is OpenMP? •Parallel regions •Data-Sharing Attribute Clauses •Worksharing •OpenMP 3.0 Tasks •Synchronization •Runtime functions/environment variables •Optional Advanced topics – Intel® Parallel Debugger Extension – Advanced Concepts Software & Services Group, Developer Products Division Software & Services Group 60 © 2012 WIPRO LTD | WWW.WIPRO.COM
61.
Key Features
Thread Shared Data Event Detection Break on Thread Shared Data Access (read/write) Re-entrant Function Detection SIMD SSE Registers Window Enhanced OpenMP* Support Serialize OpenMP threaded application execution on the fly Insight into thread groups, barriers, locks, wait lists etc. Software & Services Group, Developer Products Division Software & Services Group 61 © 2012 WIPRO LTD | WWW.WIPRO.COM
62.
Debugger Extensions in
Microsoft* VS The Intel® Debugger Extensions is a plug-in to Visual Studio 2005/2008/2010 and add features to the Microsoft* debugger. The parallel debugger extensions has similar buttons as the Linux version of IDB. The functionality is identical. Software & Services Group, Developer Products Division Developer& ServicesDivision Software Products Group 3/14/2012 64 64 62 © 2012 WIPRO LTD | WWW.WIPRO.COM
63.
Shared Data Events
Detection • Shared data access is a major Visual Studio* problem in multi- threaded Compiler applications Instrumented Can cause hard to diagnose intermittent Application GUI program failure s GUI Extension Tool support is required for detection Memory Acces Instrumentation Normal debugging Debug Engine Intel Debug Runtime RTL Debugger Extension Technology built on: events Code & debug info Instrumentation by Intel compiler (/Qopenmp /debug:parallel) Debug runtime library that collects data access traces and triggers debugger tool (libiomp5md.dll, pdbx.dll) Debugger Extension in Visual Studio Debug Engine collects and exports information GUI integration provides parallelism views and user interactivity Software & Services Group, Developer Products Division Software & Services Group Developer Products Division 3/14/2012 65 65 63 © 2012 WIPRO LTD | WWW.WIPRO.COM
64.
Shared Data Events
Detection – contd. Data sharing detection is part of overall debug process Breakpoint model (stop on detection) GUI extensions show results & link to source Filter capabilities to hide false positives New powerful data breakpoint types Stop when 2nd thread accesses specific address Stop on read from address Key User Benefit: A simplified feature to detect shared data accesses from multiple threads Software & Services Group, Developer Products Division Software & ServicesDivision Developer Products Group 3/14/2012 66 66 64 © 2012 WIPRO LTD | WWW.WIPRO.COM
65.
Shared Data Events
Detection - Usage Enabling Shared Data Events Detection automatically enables “Stop On Event” Future detection and logging of events can be suppressed Software & Services Group, Developer Products Division Software & Services Group Developer Products Division 3/14/2012 67 67 65 © 2012 WIPRO LTD | WWW.WIPRO.COM
66.
Shared Data Events
Detection - Filtering Data sharing detection is selective Data Filter Specific data items and variables can be excluded Code Filter Functions can be excluded Source files can be excluded Address ranges can be excluded Software & Services Group, Developer Products Division Software & Services Group Developer Products Division 3/14/2012 68 68 66 © 2012 WIPRO LTD | WWW.WIPRO.COM
67.
Re-Entrant Call Detection
Automatically halts execution when a function is executed by more than one thread at any given point in time. Allows to identify reentrancy requirements/problems for multi- threaded applications Software & Services Group, Developer Products Division Software & ServicesDivision Developer Products Group 3/14/2012 69 67 69 © 2012 WIPRO LTD | WWW.WIPRO.COM
68.
Re-Entrant Call Detection
• After execution is halted and user clicks ok the program counter will point exactly to where the problem occurred. Software & Services Group, Developer Products Division Software & ServicesDivision Developer Products Group 3/14/2012 70 68 70 © 2012 WIPRO LTD | WWW.WIPRO.COM
69.
SIMD SSE Debugging
Window SSE Registers Window SSE Registers display of variables used for SIMD operations Free Formatting for flexible representation of data In-depth insight into data parallelization and vectorization. In-Place Edit Software & Services Group, Developer Products Division Developer©Products Division All rights reserved. Copyright 2010, Intel Corporation. Copyright© 2011, Intel Corporation. All rights reserved. *Other brands and names are the property of their *Other respective owners.the propertyaretheir respective owners. brands and names of 3/14/2012 71 71 69 © 2012 WIPRO LTD | WWW.WIPRO.COM
70.
Enhanced OpenMP* Debugging
Support •Dedicated OpenMP runtime object - information Windows OpenMP Task and Spawn Tree lists Barrier and Lock information Task Wait lists Thread Team worker lists •Serialize Parallel Regions Change number of parallel threads dynamically during runtime to 1 or N (all) Verify code correctness for serial execution vs. Parallel execution Identify whether a runtime issue is really parallelism related User benefit: Detailed execution state information for OpenMP applications (deadlock detection). Influences execution behavior without recompile! Software & Services Group, Developer Products Division Software & Services Group Developer Products Division 3/14/2012 72 72 70 © 2012 WIPRO LTD | WWW.WIPRO.COM
71.
OpenMP* Information Windows
• Tasks, Spawn Trees, Teams, Barriers, Locks • (Task ID reporting incorrect) Software & Services Group, Developer Products Division Software & ServicesDivision Developer Products Group 3/14/2012 73 71 73 © 2012 WIPRO LTD | WWW.WIPRO.COM
72.
Serialize Parallel Regions
Problem: Parallel loop computes a wrong result. Is it a concurrency or algorithm issue ? Parallel Debug Support … Runtime access to the OpenMP num_thread property Set to 1 for serial execution of next parallel block Disable parallel User Benefit Verification of a algorithm “on-the-fly” without slowing down the entire application to serial execution On demand serial debugging without recompile/restart Enable parallel … Software & Services Group, Developer Products Division Software & Services Group Developer Products Division 3/14/2012 74 74 72 © 2012 WIPRO LTD | WWW.WIPRO.COM
73.
Agenda
•What is OpenMP? •Parallel regions •Data-Sharing Attribute Clauses •Worksharing •OpenMP 3.0 Tasks •Synchronization •Runtime functions/environment variables •Optional Advanced topics – Intel® Parallel Debugger Extension – Advanced Concepts Software & Services Group, Developer Products Division Software & Services Group 73 © 2012 WIPRO LTD | WWW.WIPRO.COM
74.
Parallel Construct –
Implicit Task View –Tasks are created in OpenMP even without an explicit task directive. –Lets look at how tasks are created implicitly for the code snippet below – Thread encountering parallel construct packages up a set of implicit tasks – Team of threads is created. – Each thread in team is assigned to one of the tasks parallel (and tied to it). – Barrier holds original master { int mydata thread until all implicit tasks are finished. Software & Services Group, Developer Products Division Software & Services Group Developer Products Division 3/14/2012 76 76 74 © 2012 WIPRO LTD | WWW.WIPRO.COM
75.
Task Construct
#pragma omp task [clause[[,]clause] ...] structured-block where clause can be one of: if (expression) untied shared (list) private (list) firstprivate (list) default( shared | none ) Software & Services Group, Developer Products Division Software & Services Group 75 © 2012 WIPRO LTD | WWW.WIPRO.COM
76.
Tied & Untied
Tasks – Tied Tasks: – A tied task gets a thread assigned to it at its first execution and the same thread services the task for its lifetime – A thread executing a tied task, can be suspended, and sent off to execute some other task, but eventually, the same thread will return to resume execution of its original tied task – Tasks are tied unless explicitly declared untied – Untied Tasks: – An united task has no long term association with any given thread. Any thread not otherwise occupied is free to execute an untied task. The thread assigned to execute an untied task may only change at a "task scheduling point". – An untied task is created by appending “untied” to the task clause 76 © 2012 WIPRO LTD | WWW.WIPRO.COM
77.
Task switching
• task switching The act of a thread switching from the execution of one task to another task. • The purpose of task switching is distribute threads among the unassigned tasks in the team to avoid piling up long queues of unassigned tasks • Task switching, for tied tasks, can only occur at task scheduling points located within the following constructs encountered task constructs encountered taskwait constructs encountered barrier directives implicit barrier regions at the end of the tied task region • Untied tasks have implementation dependent scheduling points Software & Services Group, Developer Products Division Software & Services Group 77 © 2012 WIPRO LTD | WWW.WIPRO.COM
78.
Task switching example
The thread executing the “for loop” , AKA the generating task, generates many tasks in a short time so... The SINGLE generating task will have to suspend for a while when “task pool” fills up • Task switching is invoked to start draining the “pool” • When “pool” is sufficiently drained – then the single task can being generating more tasks again int exp; //exp either T or F; #pragma omp single { for (i=0; i<ONEZILLION; i++) #pragma omp task process(item[i]); } } Developer Products Division Group, Software & Services Developer Products Division Software & Services Group 3/14/2012 80 80 78 © 2012 WIPRO LTD | WWW.WIPRO.COM
79.
Optional foil -
OpenMP* API •Get the thread number within a team int omp_get_thread_num(void); •Get the number of threads in a team int omp_get_num_threads(void); •Usually not needed for OpenMP codes – Can lead to code not being serially consistent – Does have specific uses (debugging) – Must include a header file #include <omp.h> Software & Services Group, Developer Products Division Software & Services Group 79 © 2012 WIPRO LTD | WWW.WIPRO.COM
80.
Optional foil -
Monte Carlo Pi 1 # of darts hitting circle 4r2 #of dartsin square r2 # of darts hitting circle 4 #of dartsin square loop 1 to MAX x.coor=(random#) y.coor=(random#) dist=sqrt(x^2 + y^2) r if (dist <= 1) hits=hits+1 pi = 4 * hits/MAX Software & Services Group, Developer Products Division Software & Services Group Developer Products Division 3/14/2012 82 82 80 © 2012 WIPRO LTD | WWW.WIPRO.COM
81.
Optional foil -
Making Monte Carlo’s Parallel hits = 0 call SEED48(1) DO I = 1, max x = DRAND48() y = DRAND48() IF (SQRT(x*x + y*y) .LT. 1) THEN hits = hits+1 ENDIF END DO pi = REAL(hits)/REAL(max) * 4.0 What is the challenge here? Software & Services Group, Developer Products Division Software & Services Group Developer Products Division 3/14/2012 83 83 81 © 2012 WIPRO LTD | WWW.WIPRO.COM
82.
Optional Activity 6:
Computing Pi •Use the Intel® Math Kernel Library (Intel® MKL) VSL: – Intel MKL’s VSL (Vector Statistics Libraries) – VSL creates an array, rather than a single random number – VSL can have multiple seeds (one for each thread) •Objective: – Use basic OpenMP* syntax to make Pi parallel – Choose the best code to divide the task up – Categorize properly all variables Software & Services Group, Developer Products Division Software & Services Group 82 © 2012 WIPRO LTD | WWW.WIPRO.COM
Descargar ahora