1
ITCS4145/5145, Parallel Programming B. Wilkinson Fall 2010, Oct 27, 2010
Programming with Shared Memory
Part 2
Introduction to OpenMP
2
OpenMP
An accepted standard developed in the late 1990s by a
group of industry specialists.
Consists of a small set of compiler directives, augmented
with a small set of library routines and environment variables,
with Fortran and C/C++ as the base languages.
Several OpenMP compilers available.
3
Wikipedia OpenMP
http://en.wikipedia.org/wiki/File:OpenMP_language_extensions.svg
4
OpenMP
• Uses a thread-based shared memory programming
model
• OpenMP programs will create multiple threads
• All threads have access to global memory
• Data can be shared among all threads or private to one
thread
• Synchronization occurs but is often implicit
5
OpenMP uses the “fork-join” model, but thread-based.
Initially, a single thread, the master thread, executes.
The parallel directive creates a team of threads, with a specified
block of code executed by the multiple threads in parallel.
The exact number of threads in the team is determined in one
of several ways.
Other directives are used within a parallel construct to specify
parallel for loops and different blocks of code for threads.
6
[Figure: fork/join model. The master thread forks a team of threads at each parallel region; the threads synchronize and join at the end of the region.]
7
For C/C++, the OpenMP directives are contained in #pragma
statements.
Format:
#pragma omp directive_name ...
where omp is an OpenMP keyword.
There may be additional parameters (clauses) after the directive name
for different options.
Some directives require code to be specified in a structured
block that follows the directive; the directive and the structured
block together form a “construct”.
8
Parallel Directive
#pragma omp parallel
structured_block
creates multiple threads, each one executing the specified
structured_block (a single statement or a compound
statement created with { ... }, with a single entry point and a
single exit point).
Implicit barrier at end of construct.
Directive corresponds to forall construct.
9
Hello world example
#pragma omp parallel
{
printf("Hello World from thread = %dn", omp_get_thread_num(),
omp_get_num_threads());
}
Output from an 8-processor/core machine:
Hello World from thread 0 of 8
Hello World from thread 4 of 8
Hello World from thread 3 of 8
Hello World from thread 2 of 8
Hello World from thread 7 of 8
Hello World from thread 1 of 8
Hello World from thread 6 of 8
Hello World from thread 5 of 8
OpenMP
directive for a
parallel region
Opening brace must be on a new line
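The snippet above is not a complete program. A minimal compilable version, as a sketch (assuming a compiler with OpenMP support, e.g. gcc -fopenmp hello.c):

#include <stdio.h>
#include <omp.h>

int main(void)
{
    #pragma omp parallel
    {
        /* every thread in the team executes this block */
        printf("Hello World from thread %d of %d\n",
               omp_get_thread_num(), omp_get_num_threads());
    }   /* implicit barrier; threads join here */
    return 0;
}

As the sample output above shows, the order of the printed lines is nondeterministic.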
10
Private and shared variables
Variables could be declared within each parallel region, but
OpenMP provides the private clause.
int tid;
…
#pragma omp parallel private(tid)
{
tid = omp_get_thread_num();
printf("Hello World from thread = %dn", tid);
}
Each thread
has a local
variable tid
Also a shared clause available.
11
12
Number of threads in a team
Established by one of:
1. a num_threads clause on the parallel directive, or
2. the omp_set_num_threads() library routine being previously
called, or
3. the environment variable OMP_NUM_THREADS being defined,
checked in the order given; if none of the above applies, the number is system dependent.
The number of threads available can also be altered dynamically to
achieve the best use of system resources.
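A sketch of the first two mechanisms in code (the environment variable would be set in the shell, e.g. export OMP_NUM_THREADS=8, before running):

#include <stdio.h>
#include <omp.h>

int main(void)
{
    omp_set_num_threads(4);               /* 2. library routine, called before the region */

    #pragma omp parallel                  /* team of 4 threads */
    printf("team of %d threads\n", omp_get_num_threads());

    #pragma omp parallel num_threads(2)   /* 1. clause on the directive takes precedence */
    printf("team of %d threads\n", omp_get_num_threads());

    return 0;
}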
13
Work-Sharing
Three constructs in this classification:
sections
for
single
In all cases, there is an implicit barrier at the end of the construct
unless a nowait clause is included, which overrides the barrier.
Note: These constructs do not start a new team of threads.
That is done by an enclosing parallel construct.
14
Sections
The construct
#pragma omp sections
{
#pragma omp section
structured_block
.
.
.
#pragma omp section
structured_block
}
causes the structured blocks to be shared among the threads in the team.
The first section directive is optional.
Blocks
executed by
available
threads
15
Example
#pragma omp parallel shared(a,b,c,d,nthreads) private(i,tid)
{
tid = omp_get_thread_num();
#pragma omp sections nowait
{
#pragma omp section
{
printf("Thread %d doing section 1n",tid);
for (i=0; i<N; i++) {
c[i] = a[i] + b[i];
printf("Thread %d: c[%d]= %fn",tid,i,c[i]);
}
}
#pragma omp section
{
printf("Thread %d doing section 2n",tid);
for (i=0; i<N; i++) {
d[i] = a[i] * b[i];
printf("Thread %d: d[%d]= %fn",tid,i,d[i]);
}
}
} /* end of sections */
} /* end of parallel section */
One
thread
does this
Another
thread
does this
16
For Loop
#pragma omp for
for ( i = 0; …. )
causes the for loop to be divided into parts and the parts shared
among the threads in the team. The for loop must be of a simple form.
The way the loop is divided can be specified by an additional “schedule”
clause.
Example
schedule (static, chunk_size)
The loop is divided into chunks of chunk_size iterations,
allocated to threads in a round-robin fashion.
For loop of a
simple form
17
Example
#pragma omp parallel shared(a,b,c,nthreads,chunk) private(i,tid)
{
tid = omp_get_thread_num();
if (tid == 0) {
nthreads = omp_get_num_threads();
printf("Number of threads = %dn", nthreads);
}
printf("Thread %d starting...n",tid);
#pragma omp for schedule(dynamic,chunk)
for (i=0; i<N; i++) {
c[i] = a[i] + b[i];
printf("Thread %d: c[%d]= %fn",tid,i,c[i]);
}
} /* end of parallel section */
The if (tid == 0) block is executed by one thread only; the for loop iterations are shared among the team.
18
Single
The directive
#pragma omp single
structured block
causes the structured block to be executed by one thread only.
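A typical use is to let exactly one thread do I/O or initialization inside a parallel region; a sketch (do_work() is a hypothetical per-thread function):

#pragma omp parallel
{
    #pragma omp single
    printf("initialization done by exactly one thread\n");
    /* implicit barrier here: the others wait until the single block completes */

    do_work(omp_get_thread_num());
}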
19
Combined Parallel Work-sharing
Constructs
If a parallel directive is followed by a single for directive, it
can be combined into:
#pragma omp parallel for
<for loop>
with similar effects.
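For example, the earlier vector-addition loop can be written as one combined construct (a, b, c, i, and N as declared in the earlier examples):

#pragma omp parallel for shared(a,b,c) private(i)
for (i = 0; i < N; i++)
    c[i] = a[i] + b[i];       /* iterations divided among the team of threads */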
20
If a parallel directive is followed by a single sections directive,
it can be combined into
#pragma omp parallel sections
{
#pragma omp section
structured_block
#pragma omp section
structured_block
.
.
.
}
with similar effect. (In both cases, the nowait clause is not
allowed.)
21
Master Directive
The master directive:
#pragma omp master
structured_block
causes the master thread to execute the structured block.
Different from the constructs in the work-sharing group in that there is
no implied barrier at the end of the construct (nor at the
beginning). Other threads encountering this directive will
ignore it and the associated structured block, and will move
on.
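A sketch contrasting master with single: only the master thread executes the block and, unlike single, no one waits for it unless a barrier is added explicitly:

#pragma omp parallel
{
    #pragma omp master
    printf("progress report from the master thread only\n");
    /* no implied barrier: the other threads continue immediately */

    #pragma omp barrier       /* optional explicit barrier if the others must wait */
}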
22
Loop Scheduling and Partitioning
OpenMP offers scheduling clauses to add to for construct:
• Static
#pragma omp parallel for schedule (static,chunk_size)
Partitions loop iterations into equal sized chunks specified by
chunk_size. Chunks assigned to threads in round robin
fashion.
• Dynamic
#pragma omp parallel for schedule (dynamic,chunk_size)
Uses an internal work queue. Chunk-sized blocks of iterations are
assigned to threads as they become available.
23
• Guided
#pragma omp parallel for schedule (guided,chunk_size)
Similar to dynamic, but the chunk size starts large and gets smaller
to reduce the time threads spend going back to the work queue:
chunk size = (number of iterations remaining) / (2 * number of threads)
• Runtime
#pragma omp parallel for schedule (runtime)
Uses the OMP_SCHEDULE environment variable to specify which of
static, dynamic, or guided should be used.
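With schedule (runtime) the policy can be changed without recompiling; a sketch (heavy_work() is a hypothetical function, c and N as before):

/* in the shell, before running:  export OMP_SCHEDULE="guided,4"   (or "static,10", "dynamic,2", ...) */

#pragma omp parallel for schedule(runtime)
for (int i = 0; i < N; i++)
    c[i] = heavy_work(i);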
24
Reduction clause
Used to combine the results of the iterations into a single
value; cf. MPI_Reduce().
Can be used with parallel, for, and sections.
Example
sum = 0;
#pragma omp parallel for reduction(+:sum)
for (k = 0; k < 100; k++ ) {
sum = sum + funct(k);
}
A private copy of sum is created for each thread by the compiler.
Each private copy is added to sum at the end.
This eliminates the need for a critical section here.
Operation
Variable
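A complete, checkable version of this example, assuming funct(k) simply returns k so the expected result is 0 + 1 + ... + 99 = 4950:

#include <stdio.h>

int funct(int k) { return k; }            /* stand-in for the real per-iteration work */

int main(void)
{
    int k, sum = 0;
    #pragma omp parallel for reduction(+:sum)
    for (k = 0; k < 100; k++)
        sum = sum + funct(k);             /* each thread adds into its own private copy */
    printf("sum = %d\n", sum);            /* private copies combined: prints 4950 */
    return 0;
}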
25
Private variables
private clause – creates private copies of variables for
each thread
firstprivate clause – as the private clause but initializes each
copy to the value the variable had immediately prior to the parallel
construct.
lastprivate clause – as private but “the value of each
lastprivate variable from the sequentially last iteration of
the associated loop, or the lexically last section directive,
is assigned to the variable’s original object.”
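A small sketch of the difference between the clauses (N as before):

int x = 10, last = -1;

#pragma omp parallel for firstprivate(x) lastprivate(last)
for (int i = 0; i < N; i++) {
    x += i;        /* each thread's private x started at 10 (firstprivate) */
    last = i;      /* the value from the sequentially last iteration survives */
}
/* afterwards: the original x is still 10; last == N-1 (lastprivate) */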
26
Synchronization Constructs
Critical
The critical directive allows only one thread at a time to execute the
associated structured block. When one or more threads
reach the critical directive:
#pragma omp critical name
structured_block
they will wait until no other thread is executing the same
critical section (one with the same name), and then one
thread will proceed to execute the structured block.
name is optional. All critical sections with no name map to
one undefined name.
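A typical use is protecting an update to a shared variable that cannot be expressed as a simple atomic operation; a sketch (compute() is a hypothetical function, N as before):

int max_so_far = 0;

#pragma omp parallel for shared(max_so_far)
for (int i = 0; i < N; i++) {
    int v = compute(i);
    #pragma omp critical (maxupdate)    /* named critical section */
    {
        if (v > max_so_far)
            max_so_far = v;
    }
}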
27
Barrier
When a thread reaches the barrier
#pragma omp barrier
it waits until all threads have reached the barrier and then they
all proceed together.
There are restrictions on the placement of the barrier directive in a
program. In particular, all threads in the team must be able to reach the
barrier.
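A sketch: no thread starts the second phase until every thread has finished the first (phase1() and phase2() are hypothetical work functions):

#pragma omp parallel
{
    int tid = omp_get_thread_num();

    phase1(tid);

    #pragma omp barrier     /* all threads wait here */

    phase2(tid);
}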
28
Atomic
The atomic directive
#pragma omp atomic
expression_statement
implements a critical section efficiently when the critical
section simply updates a variable (adds one, subtracts one,
or does some other simple arithmetic operation as defined
by expression_statement).
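For example, incrementing a shared counter (test() is a hypothetical predicate, N as before):

int count = 0;

#pragma omp parallel for shared(count)
for (int i = 0; i < N; i++) {
    if (test(i)) {
        #pragma omp atomic
        count++;            /* only this update is protected, more cheaply than a critical section */
    }
}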
29
Flush
A synchronization point which causes a thread to have a
“consistent” view of certain or all shared variables in memory.
All current read and write operations on the variables are allowed to
complete and the values are written back to memory, but memory
operations in code after the flush are not started until it completes.
Format:
#pragma omp flush (variable_list)
Applies only to the thread executing the flush, not to all threads in the team.
A flush occurs automatically at entry to and exit from parallel and critical
directives, and at the exit of for, sections, and single (if a nowait
clause is not present).
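A classic use is a shared flag between two threads: the writer flushes after setting it, the reader flushes before testing it. A sketch only; the spin-waiting is shown purely for illustration:

int flag = 0;                            /* shared signal */

#pragma omp parallel sections shared(flag)
{
    #pragma omp section
    {                                    /* producer */
        /* ... prepare data ... */
        flag = 1;
        #pragma omp flush (flag)         /* make the write visible */
    }
    #pragma omp section
    {                                    /* consumer */
        int ready = 0;
        while (!ready) {
            #pragma omp flush (flag)     /* re-read the shared value */
            ready = flag;
        }
        /* ... use the data ... */
    }
}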
30
Ordered clause
Used in conjunction with for and parallel for directives to
cause an iteration to be executed in the order that it
would have occurred if written as a sequential loop.
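Two pieces are needed: an ordered clause on the for (or parallel for) directive, and an ordered directive around the statements that must run in loop order; a sketch (work() is a hypothetical function, N as before):

#pragma omp parallel for ordered schedule(dynamic)
for (int i = 0; i < N; i++) {
    int r = work(i);                     /* this part still runs in parallel */

    #pragma omp ordered
    printf("result %d: %d\n", i, r);     /* printed strictly in iteration order */
}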
31
More information
Full information on OpenMP at
http://openmp.org/wp/
